Compare detection strategies per chunk/detector #2922

rgmz · 2024-06-05T23:19:58Z

Description:

This is an alternative implementation to #2918, with two main advantages:

It only requires the data to be downloaded and parsed/chunked once
It provides specific context about the affected detector(s) and chunk(s), rather than vague high-level metrics which are hard to action.

Note: Anything I deleted wasn't necessarily because I didn't think it was useful, it was just the easiest way to implement this.

Example

Compare per chunk/detector

When testing #2894 against a known true positive, it provides immediate and unambiguous feedback that there's a discrepancy.

$ ./trufflehog github --repo="..." --token="..." --compare-detection-strategies  --allow-verification-overlap
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-05T19:11:32-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "XlWGy", "with_units": false, "target_count": 0, "source_manager_units_configurable": true}
...
2024-06-05T19:11:34-04:00       error   trufflehog      Scan results do not match       {"detector_worker_id": "vMNq4", "chunk": {"Github":{"link":"...","repository":"...","commit":"...","email":"...","file":"...","timestamp":"2019-05-06 16:03:25 +0000","line":709}}, "detector": "KubeConfig", "error": "mismatch between custom span and entire chunk: 0 vs 1"}

Compare per scan

Information is only logged at the end, and doesn't provide any insight to which chunk(s) or detector(s) were affected.

$ ./trufflehog github --repo="..." --token="..." --compare-detection-strategies  --allow-verification-overlap
🐷🔑🐷  TruffleHog. Unearth your secrets. 🐷🔑🐷

2024-06-05T19:22:41-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "2xLZv", "with_units": false, "target_count": 0, "source_manager_units_configurable": true}
2024-06-05T19:22:41-04:00       info-0  trufflehog      running source  {"source_manager_worker_id": "lzBGQ", "with_units": false, "target_count": 0, "source_manager_units_configurable": true}
2024-06-05T19:22:41-04:00       info-0  trufflehog      Completed enumeration   {"num_repos": 1, "num_orgs": 0, "num_members": 0}
2024-06-05T19:22:41-04:00       info-0  trufflehog      Completed enumeration   {"num_repos": 1, "num_orgs": 0, "num_members": 0}
2024-06-05T19:22:42-04:00       info-0  trufflehog      scanning repo   {"source_manager_worker_id": "2xLZv", "repo": "..."}
2024-06-05T19:22:42-04:00       info-0  trufflehog      scanning repo   {"source_manager_worker_id": "2xLZv", "repo": "..."}
2024-06-05T19:23:39-04:00       info-0  trufflehog      scanning repo   {"source_manager_worker_id": "lzBGQ", "repo": "..."}
2024-06-05T19:23:39-04:00       info-0  trufflehog      scanning repo   {"source_manager_worker_id": "lzBGQ", "repo": "..."}
Comparison of scan results:
Custom span - Chunks: 169, Bytes: 706634, Verified Secrets: 0, Unverified Secrets: 0, Duration: 1.453095634s
Entire chunk - Chunks: 169, Bytes: 706634, Verified Secrets: 0, Unverified Secrets: 3, Duration: 1.637307744s

Checklist:

Tests passing (make test-community)?
Lint passing (make lint this requires golangci-lint)?

ahrav · 2024-06-06T00:14:09Z

Compare per scan
No error is returned. This may be a bug?

@ahrav there seems to be a bug with EntireChunkSpanCalculator.

I am testing against #2894. It uses the keyword current-context, however, that key can be at the start, middle, or end of the config. Given the following input, I would expect EntireChunkSpanCalculator to return the entire chunk (a), but it only returns data after the keyword (b). Without --scan-entire-chunk, it returns (c).

(These are approximations, not exact return values.)

A: Full Chunk

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0t...
    server: https://console-kubernetes.example.com:443
  name: console-kubernetes-example-com:443
- cluster:
    certificate-authority: ./ca.crt
    server: https://192.168.99.100:8443
  name: minikube
contexts:
- context:
    cluster: console-kubernetes-example-com:443
    user: "system:serviceaccount:default:example/console-kubernetes-example-com:443"
  name: default
- context:
    cluster: minikube
    user: minikube
  name: minikube
current-context: default
kind: Config
preferences: {}
users:
- name: "system:serviceaccount:default:example/console-kubernetes-example-com:443"
  user:
    as-user-extra: {}
    client-certificate-data: LS0t...
    client-key-data: LS0t...
- name: minikube
  user:
    as-user-extra: {}
    client-certificate: ./client.crt
    client-key: ./client.key

B: Full Chunk After Keyword

current-context: default
kind: Config
preferences: {}
users:
- name: "system:serviceaccount:default:example/console-kubernetes-example-com:443"
  user:
    as-user-extra: {}
    client-certificate-data: LS0t...
    client-key-data: LS0...
- name: minikube
  user:
    as-user-extra: {}
    client-certificate: ./client.crt
    client-key: ./client.key

C: Custom Span

current-context: default
kind: Config
preferences: {}
users:
- name: "system:serviceaccount:default:example/console-kubernetes-example-com:443"
  user:
    as-user-extra: {}
    client-certificate-data: LS0t...

Yea, that's my mistake. I should have a fix for that tonight. Thanks for the reminder.

rgmz · 2024-12-02T13:45:09Z

pkg/engine/engine.go

@@ -1094,14 +1126,24 @@ func (e *Engine) detectChunk(ctx context.Context, data detectableChunk) {
 			results = e.filterResults(ctx, data.detector, results)
 		}

+	HandleResults:
+		results = e.filterResults(ctx, data.detector, results)


Idk if this line is still necessary.

rgmz · 2024-12-02T13:45:48Z

pkg/engine/engine.go

+	if (!e.compareScanStrategies) || (e.compareScanStrategies && !scanEntireChunk) {
+		data.wgDoneFn()
+	}


This is a bit brittle. It might be better if moved to detectorWorker.

rgmz requested a review from a team as a code owner June 5, 2024 23:19

rgmz force-pushed the feat/inline-compare branch from e6c8d7d to 298b178 Compare June 5, 2024 23:20

rgmz requested a review from a team as a code owner June 5, 2024 23:20

rgmz force-pushed the feat/inline-compare branch from 298b178 to e6c8d7d Compare June 5, 2024 23:21

This comment was marked as outdated.

Sign in to view

rgmz mentioned this pull request Jun 6, 2024

Add explicit warning when an overlap is detected #2929

Open

2 tasks

rgmz force-pushed the feat/inline-compare branch 2 times, most recently from d845167 to 809cdcf Compare June 11, 2024 18:34

rgmz force-pushed the feat/inline-compare branch 5 times, most recently from ffecb81 to e835036 Compare June 21, 2024 02:51

rgmz force-pushed the feat/inline-compare branch 2 times, most recently from ef30b7c to 6457ae4 Compare July 1, 2024 18:37

rgmz force-pushed the feat/inline-compare branch from 6457ae4 to c6f7138 Compare November 3, 2024 14:37

rgmz requested a review from a team as a code owner November 3, 2024 14:37

rgmz force-pushed the feat/inline-compare branch 3 times, most recently from 6cd8337 to 479c463 Compare November 11, 2024 19:20

rgmz force-pushed the feat/inline-compare branch from 479c463 to c85c7cb Compare December 2, 2024 13:39

rgmz requested a review from a team as a code owner December 2, 2024 13:39

rgmz changed the title ~~POC: Compare detection strategies per chunk/detector~~ Compare detection strategies per chunk/detector Dec 2, 2024

rgmz force-pushed the feat/inline-compare branch from c85c7cb to 727bf58 Compare December 2, 2024 13:43

rgmz commented Dec 2, 2024

View reviewed changes

rgmz force-pushed the feat/inline-compare branch from 727bf58 to 184d3a0 Compare December 15, 2024 15:29

feat(engine): compare results per chunk/detector

60b5791

rgmz force-pushed the feat/inline-compare branch from 184d3a0 to 60b5791 Compare December 21, 2024 16:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compare detection strategies per chunk/detector #2922

Compare detection strategies per chunk/detector #2922

rgmz commented Jun 5, 2024 •

edited

Loading

This comment was marked as outdated.

ahrav commented Jun 6, 2024

rgmz Dec 2, 2024

rgmz Dec 2, 2024

Compare detection strategies per chunk/detector #2922

Are you sure you want to change the base?

Compare detection strategies per chunk/detector #2922

Conversation

rgmz commented Jun 5, 2024 • edited Loading

Description:

Example

Checklist:

This comment was marked as outdated.

ahrav commented Jun 6, 2024

rgmz Dec 2, 2024

Choose a reason for hiding this comment

rgmz Dec 2, 2024

Choose a reason for hiding this comment

rgmz commented Jun 5, 2024 •

edited

Loading