Multi-line include/exclude in pipeline mode #508

Open
johnhtodd opened this issue Dec 14, 2023 · 3 comments

@johnhtodd

Is your feature request related to a problem? Please describe.
It would be useful to let a match be written across multiple lines, with an implied "OR" between lines that share the same key.

Describe the solution you'd like
Currently, in the pipeline mode branch, there exists syntax like this:

  - name: tag-queries
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_Q*."
          dns.qname: "^.*\\.google\\.com$"
        greater-than:
          dns.length: 50
      policy: "drop-unmatched"
    transforms:
      atags:
        tags: [ "TAG-QUERIES:tag-queries" ]
    routes: [ match-queries ]

It would be very useful to have additional matching performed without jamming it all on one line, so something like this:

  - name: tag-queries
    dnsmessage:
      matching:
        include:
          dnstap.operation: "CLIENT_Q*."
          dns.qname: "^.*\\.google\\.com$"
          dns.qname: "^.*\\.youtube\\.com$"
          dns.qname: "^.*\\.gmail\\.com$"
        greater-than:
          dns.length: 50
      policy: "drop-unmatched"
    transforms:
      atags:
        tags: [ "TAG-QUERIES:tag-queries" ]
    routes: [ match-queries ]

The same would apply to "exclude:" lines (no example shown). This wouldn't be limited to "dns.qname" - it would work for any matched component of the packet.
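
To make the requested semantics concrete, here is a minimal Python sketch (not go-dnscollector code). It assumes patterns under the same key are ORed, that different keys are ANDed, and that a message is a flat dict of field names to values. The `matches` function and the message layout are hypothetical.

```python
import re

def matches(include, message):
    """Return True when every key matches at least one of its patterns."""
    for key, patterns in include.items():
        # Accept either a single pattern or a list of patterns per key.
        if isinstance(patterns, str):
            patterns = [patterns]
        value = str(message.get(key, ""))
        # OR within one key: any pattern may match.
        if not any(re.search(pattern, value) for pattern in patterns):
            return False  # AND across keys (an assumption, see above)
    return True

include = {
    "dnstap.operation": "CLIENT_QUERY",
    "dns.qname": [
        r"^.*\.google\.com$",
        r"^.*\.youtube\.com$",
        r"^.*\.gmail\.com$",
    ],
}

query = {"dnstap.operation": "CLIENT_QUERY", "dns.qname": "www.youtube.com"}
print(matches(include, query))
```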

Describe alternatives you've considered
Making a giant unmanageable regexp on one line is... possible. But terrifying. If I have only a few matching statements, it would be great to just put a few lines in.

It also would be ideal if files were supported in matching lines, so very long lists of include/exclude filters could be ingested from an external source. So:
dns.qname: file:/var/collector/names-to-include.txt
... but this seems like a separate feature request. :-)
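
A rough Python sketch of what loading such a file could look like, assuming one regex per line with blank lines and `#` comments skipped. That file format, and the names `parse_patterns`, `load_patterns` and `matches_any`, are assumptions for illustration only.

```python
import re
from pathlib import Path

def parse_patterns(lines):
    """Compile one regex per non-empty, non-comment line."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed file format)
        patterns.append(re.compile(line))
    return patterns

def load_patterns(path):
    """Read a pattern file from disk and compile its entries."""
    return parse_patterns(Path(path).read_text(encoding="utf-8").splitlines())

def matches_any(patterns, value):
    """Return True when at least one compiled pattern matches the value."""
    return any(pattern.search(value) for pattern in patterns)
```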

@dmachard
Owner

dmachard commented Dec 17, 2023

A list of regexes can easily be supported with a minor update (this is easier to implement):

dnsmessage:
  matching:
    include:
      dns.qtype: [ "TXT", "MX" ]
      dns.qname: 
        - "^.*\\.apple\\.com$"
        - "^.*\\.google\\.com$"

Here is an adaptation of the configuration that supports files in a generic way:

dnsmessage:
  matching:
    include:
      dns.opcode: 0
      dns.length:
        greater-than: 50
      dns.qname:
        file-list: "./testsdata/filtering_keep_domains_regex.txt"
        file-kind: "domain_list"
    exclude:
      dns.qtype: [ "TXT", "MX" ]
  policy: "drop-unmatched"

This logic has been implemented in the pipeline branch.
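
For readers following along, the value shapes in the example above (a scalar, a list, or a mapping with an operator) could be interpreted roughly as in the Python sketch below. This is only an illustration under assumed semantics, not the code in the pipeline branch: strings are treated as regexes, other scalars as equality, lists as OR, and only `greater-than` is handled among the operators.

```python
import re

def condition_holds(expected, actual):
    """Evaluate one configured condition against one field value."""
    if isinstance(expected, dict):
        # Operator form, e.g. {"greater-than": 50}.
        if "greater-than" in expected:
            return actual > expected["greater-than"]
        raise ValueError(f"unsupported operator(s): {sorted(expected)}")
    if isinstance(expected, list):
        # List form: any element may match (OR).
        return any(condition_holds(item, actual) for item in expected)
    if isinstance(expected, str):
        # String form: treated as a regex (assumption).
        return re.search(expected, str(actual)) is not None
    # Any other scalar, e.g. an integer opcode: plain equality.
    return expected == actual
```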

@johnhtodd
Author

johnhtodd commented Dec 18, 2023

This is good - I'll look at it on Tuesday when I'm back from travel. Thank you for the quick code changes!

I'm not quite clear why the "file-kind" definition is required. Wouldn't the match depend on what context the matching file is loaded into? Why would there need to be any parsing of any kind? I can see how matching can be applied to qname, resource records, EDNS data, geoIP data, TLD data, qtype... pretty much any field.

I'm very interested in how matching can apply to tags, because tag management deeper in the processing chain (on different machines, centrally located) seems to me to be a critical part of how go-dnscollector arrays interact with each other. Otherwise, we are left using (argh!) port numbers as indicators of intent, which makes me sad.

I made my example a bit more generic to perhaps allow for expansion in the future.

Your example is this:

      dns.qname:
        file-list: "./testsdata/filtering_keep_domains_regex.txt"
        file-kind: "domain_list"

My example thinking looks more like this:

      dns.qname:
        match-source: "file:./testsdata/filtering_keep_domains_regex.txt"

because maybe the future could have something like this:

      dns.qname:
        match-source: "https://filters.example.com/testsdata/filtering_keep_domains_regex.txt"
        match-source-refresh: 86400

...and it is possible to imagine future plug-in methods like "script:" or "sftp:" or "axfr:" for developers who want to be adventurous.
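
A small Python sketch of the idea, to show how a scheme prefix could select a loader. Everything here is hypothetical: the `LOADERS` registry, the function names, and the choice to split on the first `:` are assumptions, and only a `file` loader is shown.

```python
def load_file(location):
    """Read non-empty lines from a local file."""
    with open(location, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]

# Hypothetical plug-in registry: scheme name -> loader function.
LOADERS = {"file": load_file}

def split_source(source):
    """Split 'scheme:location' on the first colon."""
    scheme, separator, location = source.partition(":")
    if not separator or not scheme:
        raise ValueError(f"match-source needs a scheme prefix: {source!r}")
    return scheme, location

def load_match_source(source, loaders=LOADERS):
    """Dispatch to the loader registered for the source's scheme."""
    scheme, location = split_source(source)
    if scheme not in loaders:
        raise ValueError(f"no loader registered for scheme {scheme!r}")
    return loaders[scheme](location)
```

New methods such as "https:" or "script:" would then only need an extra entry in the registry, leaving the matching configuration unchanged.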

@dmachard
Owner

> I'm not quite clear why the "file-kind" definition is required. Wouldn't the match depend on what context the matching file is loaded into? Why would there need to be any parsing of any kind? I can see how matching can be applied to qname, resource records, EDNS data, geoIP data, TLD data, qtype... pretty much any field.

It can be necessary to know the type of content:

  • If the source list contains IPs, I need to know that so I can preload them into the specific internal golang dataset type
  • If the source contains a list of regexes, I need to know that so I can compile each regex before starting
  • If the source list contains just basic strings without regex, we need to know that to avoid using regex
  • etc...

Otherwise, your match-source approach with plugins is better :)
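
As an illustration of why the content kind matters, here is a Python sketch (the real collector is written in Go, so this is not its implementation). It prepares a different matcher per kind, so the work is done once at load time. The kind names `ip_list`, `regex_list` and `string_list` are made up for this example; only `domain_list` appears in the configuration above.

```python
import ipaddress
import re

def compile_matcher(kind, entries):
    """Build a value -> bool matcher suited to the declared content kind."""
    if kind == "ip_list":
        # Parse addresses and prefixes once, then test membership.
        networks = [ipaddress.ip_network(entry, strict=False) for entry in entries]
        return lambda value: any(ipaddress.ip_address(value) in net for net in networks)
    if kind == "regex_list":
        # Compile every regex up front instead of on each message.
        compiled = [re.compile(entry) for entry in entries]
        return lambda value: any(rx.search(value) for rx in compiled)
    if kind == "string_list":
        # Plain strings: a set lookup avoids regex entirely.
        exact = frozenset(entries)
        return lambda value: value in exact
    raise ValueError(f"unknown content kind: {kind!r}")
```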
