Multi-line include/exclude in pipeline mode #508

johnhtodd · 2023-12-14T01:19:06Z

Is your feature request related to a problem? Please describe.
It would be useful to support multiple lines in the matching syntax, with an implied "OR" between lines that share the same key.

Describe the solution you'd like
Currently, the pipeline mode branch supports syntax like this:

  - name: tag-queries
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_Q*."
          dns.qname: "^.*\\.google\\.com$"
        greater-than:
          dns.length: 50
      policy: "drop-unmatched"
    transforms:
      atags:
        tags: [ "TAG-QUERIES:tag-queries" ]
    routes: [ match-queries ]

It would be very useful to allow additional matches without jamming them all into one line, with something like this:

  - name: tag-queries
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_Q*."
          dns.qname: "^.*\\.google\\.com$"
          dns.qname: "^.*\\.youtube\\.com$"
          dns.qname: "^.*\\.gmail\\.com$"
        greater-than:
          dns.length: 50
      policy: "drop-unmatched"
    transforms:
      atags:
        tags: [ "TAG-QUERIES:tag-queries" ]
    routes: [ match-queries ]

The same would apply to "exclude:" lines (no example shown). This wouldn't be limited to "dns.qname"; it would apply to any matched component of the packet.

Describe alternatives you've considered
Making a giant unmanageable regexp on one line is... possible. But terrifying. If I have only a few matching statements, it would be great to just put a few lines in.

It would also be ideal if files were supported in matching lines, so that very long lists of include/exclude filters could be ingested from an external source. For example:

      dns.qname: file:/var/collector/names-to-include.txt

... but this seems like a separate feature request. :-)


dmachard · 2023-12-17T21:03:20Z

A list of regexes can easily be supported with a minor update (it is easier to implement):

dnsmessage:
  matching:
    include:
      dns.qtype: [ "TXT", "MX" ]
      dns.qname: 
        - "^.*\\.apple\\.com$"
        - "^.*\\.google\\.com$"
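A list also sidesteps a YAML limitation: mapping keys must be unique, so repeating `dns.qname` several times under one `include:` is rejected by strict parsers. One way to accept both a single value and a list for the same key is to normalize the decoded value into a list. This sketch assumes the config has been decoded into generic Go values (`string`, `int`, `[]any`); `toStringList` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strconv"
)

// toStringList normalizes a decoded config value that may be a single
// scalar (string or int) or a list of scalars into a []string.
// Hypothetical helper for illustration.
func toStringList(v any) ([]string, error) {
	switch t := v.(type) {
	case string:
		return []string{t}, nil
	case int:
		return []string{strconv.Itoa(t)}, nil
	case []any:
		out := make([]string, 0, len(t))
		for _, item := range t {
			switch s := item.(type) {
			case string:
				out = append(out, s)
			case int:
				out = append(out, strconv.Itoa(s))
			default:
				return nil, fmt.Errorf("unsupported list item type %T", item)
			}
		}
		return out, nil
	default:
		return nil, fmt.Errorf("unsupported value type %T", v)
	}
}

func main() {
	// dns.qtype: [ "TXT", "MX" ] and dns.opcode: 0, as generic decoded values.
	qtypes, err := toStringList([]any{"TXT", "MX"})
	if err != nil {
		panic(err)
	}
	opcode, err := toStringList(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(qtypes, opcode) // [TXT MX] [0]
}
```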

Here is an adaptation of the configuration that supports files in a generic way:

dnsmessage:
  matching:
    include:
      dns.opcode: 0
      dns.length:
        greater-than: 50
      dns.qname:
        file-list: "./testsdata/filtering_keep_domains_regex.txt"
        file-kind: "domain_list"
    exclude:
      dns.qtype: [ "TXT", "MX" ]
  policy: "drop-unmatched"

This logic has been implemented in the pipeline branch.
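The configuration above mixes three kinds of condition: an exact value (`dns.opcode: 0`), a comparison (`dns.length: greater-than: 50`) and a list loaded from a file (`dns.qname: file-list`). A minimal sketch of how such conditions might be evaluated, assuming that every `include` condition must hold, that any `exclude` match drops the message, and that `drop-unmatched` discards whatever fails that test. The `Message` fields and the `Condition` helpers are hypothetical, and the `domain_list` file is shown already loaded and matched by exact name, which is also an assumption.

```go
package main

import "fmt"

// Message holds only the fields used in this example (hypothetical).
type Message struct {
	Opcode int
	Length int
	Qname  string
	Qtype  string
}

// Condition is a single test applied to a message.
type Condition func(m Message) bool

// opcodeEquals models "dns.opcode: 0".
func opcodeEquals(v int) Condition {
	return func(m Message) bool { return m.Opcode == v }
}

// lengthGreaterThan models "dns.length: greater-than: 50".
func lengthGreaterThan(v int) Condition {
	return func(m Message) bool { return m.Length > v }
}

// qnameInList models a preloaded "domain_list", assumed to match exact names.
func qnameInList(names map[string]struct{}) Condition {
	return func(m Message) bool {
		_, ok := names[m.Qname]
		return ok
	}
}

// qtypeIn models a list value such as [ "TXT", "MX" ].
func qtypeIn(types ...string) Condition {
	return func(m Message) bool {
		for _, t := range types {
			if m.Qtype == t {
				return true
			}
		}
		return false
	}
}

// keep applies an assumed "drop-unmatched" policy: all include
// conditions must hold and no exclude condition may hold.
func keep(m Message, include, exclude []Condition) bool {
	for _, c := range include {
		if !c(m) {
			return false
		}
	}
	for _, c := range exclude {
		if c(m) {
			return false
		}
	}
	return true
}

func main() {
	domains := map[string]struct{}{"www.apple.com": {}, "www.google.com": {}}
	include := []Condition{opcodeEquals(0), lengthGreaterThan(50), qnameInList(domains)}
	exclude := []Condition{qtypeIn("TXT", "MX")}

	fmt.Println(keep(Message{Length: 80, Qname: "www.google.com", Qtype: "A"}, include, exclude))  // true
	fmt.Println(keep(Message{Length: 80, Qname: "www.google.com", Qtype: "MX"}, include, exclude)) // false
	fmt.Println(keep(Message{Length: 30, Qname: "www.google.com", Qtype: "A"}, include, exclude))  // false
}
```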

johnhtodd · 2023-12-18T06:08:18Z

This is good - I'll look at it on Tuesday when I'm back from travel. Thank you for the quick code changes!

I'm not quite clear why the "file-kind" definition is required. Wouldn't the match depend on what context the matching file is loaded into? Why would there need to be any parsing of any kind? I can see how matching can be applied to qname, resource records, EDNS data, geoIP data, TLD data, qtype... pretty much any field.

I'm very interested in how matching can apply to tags, because tag management deeper in the processing chain (on different machines, centrally located) seems to me to be a critical part of how go-dnscollector arrays interact with each other. Otherwise, we are left using (argh!) port numbers as indicators of intent, which makes me sad.

I made my example a bit more generic to perhaps allow for expansion in the future.

Your example is this:

      dns.qname:
        file-list: "./testsdata/filtering_keep_domains_regex.txt"
        file-kind: "domain_list"

My thinking looks more like this:

      dns.qname:
        match-source: "file:./testsdata/filtering_keep_domains_regex.txt"

because maybe the future could have something like this:

      dns.qname:
        match-source: "https://filters.example.com/testsdata/filtering_keep_domains_regex.txt"
        match-source-refresh: 86400

...and it is possible to imagine future plug-in methods like "script:" or "sftp:" or "axfr:" for developers who want to be adventurous.
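The `match-source` idea amounts to dispatching on the URI scheme. A sketch using Go's standard `net/url` package: `file:` and `https:` are handled, and any other scheme (such as the imagined `script:`, `sftp:` or `axfr:`) is reported as unsupported until something registers a loader for it. The `loaders` registry and `loadSource` function are hypothetical; the loaders here only describe what they would do instead of reading or fetching anything, and `match-source-refresh` is not modelled.

```go
package main

import (
	"fmt"
	"net/url"
)

// loader handles one URI scheme. In this sketch it only describes what it
// would do; a real loader would return the list entries.
type loader func(u *url.URL) (string, error)

// loaders is a hypothetical registry that plug-ins could extend.
var loaders = map[string]loader{
	"file": func(u *url.URL) (string, error) {
		// "file:./relative" parses into Opaque, "file:/absolute" into Path.
		path := u.Opaque
		if path == "" {
			path = u.Path
		}
		return "read local file " + path, nil
	},
	"https": func(u *url.URL) (string, error) {
		return "fetch " + u.String(), nil
	},
}

// loadSource dispatches a match-source value on its URI scheme.
func loadSource(source string) (string, error) {
	u, err := url.Parse(source)
	if err != nil {
		return "", err
	}
	load, ok := loaders[u.Scheme]
	if !ok {
		return "", fmt.Errorf("unsupported match-source scheme %q", u.Scheme)
	}
	return load(u)
}

func main() {
	for _, src := range []string{
		"file:./testsdata/filtering_keep_domains_regex.txt",
		"https://filters.example.com/testsdata/filtering_keep_domains_regex.txt",
		"axfr:example.com",
	} {
		desc, err := loadSource(src)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(desc)
	}
}
```

Keeping the scheme-to-loader mapping in a registry means new source types can be added without touching the matching code itself.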

dmachard · 2023-12-18T10:24:18Z

> I'm not quite clear why the "file-kind" definition is required. Wouldn't the match depend on what context the matching file is loaded into? Why would there need to be any parsing of any kind? I can see how matching can be applied to qname, resource records, EDNS data, geoIP data, TLD data, qtype... pretty much any field.

It can be necessary to know the type of content:

If the source list contains IPs, I need to know that in order to preload them into a specific internal Go data type.
If the source contains a list of regexes, I need to know that in order to compile each regex before starting.
If the source list contains only plain strings without regexes, I need to know that to avoid using regexes at all.
etc.

Otherwise, your match-source approach with plugins is better :)
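The point that the kind of list decides how it is preloaded can be sketched as follows: regex lists are compiled once up front, IP lists are parsed into prefixes with the standard `net/netip` package, and plain domain lists go into a set for exact lookups with no regex involved. Only `domain_list` appears in the configuration above; the kind names `regex_list` and `ip_list`, the `List` type and the `preload` function are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
	"regexp"
	"strings"
)

// List is a preloaded match list; which field is filled depends on the kind.
type List struct {
	regexes  []*regexp.Regexp
	prefixes []netip.Prefix
	domains  map[string]struct{}
}

// normalize lower-cases a domain and drops a trailing root dot.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, "."))
}

// preload parses entries once, according to kind, so per-message matching
// never re-parses them. Kind names other than "domain_list" are hypothetical.
func preload(kind string, entries []string) (*List, error) {
	l := &List{}
	switch kind {
	case "regex_list":
		for _, e := range entries {
			re, err := regexp.Compile(e)
			if err != nil {
				return nil, fmt.Errorf("bad regex %q: %w", e, err)
			}
			l.regexes = append(l.regexes, re)
		}
	case "ip_list":
		for _, e := range entries {
			if !strings.Contains(e, "/") {
				// A bare address is treated as a single-host prefix.
				addr, err := netip.ParseAddr(e)
				if err != nil {
					return nil, err
				}
				l.prefixes = append(l.prefixes, netip.PrefixFrom(addr, addr.BitLen()))
				continue
			}
			p, err := netip.ParsePrefix(e)
			if err != nil {
				return nil, err
			}
			l.prefixes = append(l.prefixes, p.Masked())
		}
	case "domain_list":
		// Plain strings: a set lookup, no regex needed.
		l.domains = make(map[string]struct{}, len(entries))
		for _, e := range entries {
			l.domains[normalize(e)] = struct{}{}
		}
	default:
		return nil, fmt.Errorf("unknown list kind %q", kind)
	}
	return l, nil
}

// Match reports whether value matches any preloaded entry.
func (l *List) Match(value string) bool {
	for _, re := range l.regexes {
		if re.MatchString(value) {
			return true
		}
	}
	if addr, err := netip.ParseAddr(value); err == nil {
		for _, p := range l.prefixes {
			if p.Contains(addr) {
				return true
			}
		}
	}
	_, ok := l.domains[normalize(value)]
	return ok
}

func main() {
	ips, err := preload("ip_list", []string{"192.0.2.0/24", "2001:db8::1"})
	if err != nil {
		panic(err)
	}
	fmt.Println(ips.Match("192.0.2.53"), ips.Match("198.51.100.1")) // true false
}
```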

dmachard added the feature request label Dec 15, 2023

dmachard added the pipelines label Feb 9, 2024
