Query versioning + Sharding Support #2163

nyanshak · 2019-12-06T01:36:15Z

I've been trying to figure out how to slowly roll out changes to queries, and I've been pointed to sharding. It looks like an excellent solution for new queries, but I feel the experience for modifying existing queries could be better.

New queries: Create new query with, say, 5% rollout, then gradually increase to 100% rollout over time.

Existing queries (current situation): Create new query with a different name (since queries can't have the same name), with rollout 5%. Gradually increase to 100%, then delete the old query. This also involves alerting config (on logs or whatever) to be modified to look at both names, and there's likely to be some overlap where you'll log duplicate data (once for old query, once for new query, as the changes are rolled out).

Existing queries (proposal): Add version support for queries. As a user, you might start with v0 of the query, rolled out similar to "New queries" above. When you want to modify the query, you create a v1 of the query, with 5% rollout, and decrease rollout of v0 to 95%. Gradually you shift the balance until v1 has 100% rollout and v0 has 0% rollout.

Additional impacts: fleetctl would somehow need to support this concept, so one example (pretty rough, definitely would want to polish this with the community):

---
apiVersion: v1
kind: query
spec:
  name: example_query
  description: Example Query
  queries:
    - query: select * from foo;
      version: 0
      rollout: 95
    - query: select foo as foo_rename, bar from foo;
      version: 1
      rollout: 5
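
To make the rollout percentages above concrete, here is a minimal sketch of how a host could be assigned to a query version deterministically. This is only an illustration under assumptions, not how Fleet implements sharding: the function names `bucket_for_host` and `pick_version`, the use of SHA-256, and the dictionary shape are all hypothetical.

```python
import hashlib
from typing import Dict, List


def bucket_for_host(host_id: str, query_name: str) -> int:
    """Map a host to a stable bucket in the range 0..99 (hypothetical scheme)."""
    digest = hashlib.sha256(f"{query_name}:{host_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def pick_version(host_id: str, query_name: str, versions: List[Dict]) -> Dict:
    """Pick the query version a host should run, given rollout percentages."""
    total = sum(v["rollout"] for v in versions)
    if total != 100:
        raise ValueError(f"rollout percentages sum to {total}, expected 100")

    bucket = bucket_for_host(host_id, query_name)
    upper = 0
    # Walk versions newest first, so with two versions, raising the newer
    # version's rollout only ever moves hosts from the old version to the new.
    for v in sorted(versions, key=lambda v: v["version"], reverse=True):
        upper += v["rollout"]
        if bucket < upper:
            return v
    raise AssertionError("unreachable: rollouts sum to 100")


versions = [
    {"version": 0, "rollout": 95, "query": "select * from foo;"},
    {"version": 1, "rollout": 5, "query": "select foo as foo_rename, bar from foo;"},
]
chosen = pick_version("host-1234", "example_query", versions)
print(chosen["version"], chosen["query"])
```

Because the bucket depends only on the host identifier and the query name, a host keeps the same version between check-ins, which avoids the same host logging results for both versions.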


zwass · 2020-01-29T18:50:56Z

I have put some thought into this, and it's not clear to me that the proposed concept would improve the situation.

You note as a disadvantage of the current approach:

> This also involves alerting config (on logs or whatever) to be modified to look at both names, and there's likely to be some overlap where you'll log duplicate data (once for old query, once for new query, as the changes are rolled out).

It seems to me that the proposal would come with further concerns about this:

Alerting config must be specified in a way that it is resilient to changes to the schema of the output logs.
Alerting seems likely to miss alerts unless it is properly configured to use both old and new columns.

I can see that it might be nice to reduce duplication of data, but I'm not sure it is worth the tradeoff.

In the current situation, adding some_query_v2 and alerting on that allows you to test the new query side by side with the old one, without disturbing your pipelines. Then you could phase out the existing query.

What do you think? I am open to further discussion on this.

nyanshak · 2020-02-02T17:27:17Z

> Alerting config must be specified in a way that it is resilient to changes to the schema of the output logs.
> Alerting seems likely to miss alerts unless it is properly configured to use both old and new columns.

I think these are both great concerns, and I think that they're problematic in either scenario (keep things the way things are or change to this proposal). I didn't intend this proposal to primarily address these problems. I think you could end up with the exact same scenario with alerting config either way.

Keep things the way they are:

alert 0: name=query_v0 <some_other_search>
alert 1: name=query_v1 <some_maybe_changed_search_fields>

Proposal:

alert 0: name=query version=0 <some_other_search>
alert 1: name=query version=1 <some_other_search>
Alternatively, a version could be appended to the name field, perhaps controlled by a Fleet option.
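
As a rough illustration of the "Proposal" alerts above, the sketch below filters result-log records by name and, optionally, by version. The `version` field in the log record is the hypothetical addition from this proposal and does not exist in logs today; `matches_alert` and the simplified record shape are assumptions made for the example.

```python
from typing import Dict, List, Optional


def matches_alert(record: Dict, name: str, version: Optional[int] = None) -> bool:
    """Return True if a record belongs to the named query (and version, if given)."""
    if record.get("name") != name:
        return False
    # With no version given, the alert covers every version of the query.
    return version is None or record.get("version") == version


# Simplified, made-up records; "version" is the hypothetical new field.
records: List[Dict] = [
    {"name": "query", "version": 0, "columns": {"foo": "1"}},
    {"name": "query", "version": 1, "columns": {"foo_rename": "1", "bar": "2"}},
    {"name": "other_query", "version": 0, "columns": {}},
]

v1_only = [r for r in records if matches_alert(r, "query", version=1)]
any_version = [r for r in records if matches_alert(r, "query")]
print(len(v1_only), len(any_version))
```

An alert that omits the version keeps working across a rollout, while one pinned to a version can check columns that only exist in that version's output.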

It doesn't really do much to solve the alerting config problems, but it makes it much easier to support progressive rollout of query changes.

For example, let's say that I have a query that I know is expensive. I want to modify the query to JOIN on a new table to get some additional field added to the results. If it's an expensive query, I don't want to run the query essentially multiple times on the same host because of the potential impact to the normal workload of the system. I am also concerned that there may be problems with the new query, so I don't want to roll out the changes to all of my hosts at once.

In the current system, I'm really not sure how to do this. The best I can think of is to somehow divide the hosts by decorators and create separate queries to target each decorator, but this involves duplicating query config in addition to the alert config. That is, query_v0 targets decorator_0, query_v1 targets decorator_1, etc., and then the alert configs need to be duplicated as well: alert_query_v0, alert_query_v1, ...

In the proposed system, I would be able to apply different versions of the query to some different % of hosts, meaning I don't have to manage a complicated decorator / query config, while also providing benefits of progressive rollouts of new queries.

After writing this out, I imagine the actual implementation would have the targeting of the versions applied in the pack config rather than directly in the query config to be more consistent with the separation of concerns of packs vs queries.

Maybe you still would use separate queries, but add additional targeting abilities to the pack config:

---
apiVersion: v1
kind: pack
spec:
  name: example_pack
  queries:
    - name: my_awesome_query # assume this will be the `name` in logs
      interval: 7200
      snapshot: true
      versions:
        - name: my_query_v0 # this is a query defined elsewhere, called my_query_v0
          version: 0         # to possibly tag the query, could be auto-generated
          rollout: 95
        - name: my_query_v1 # this is a query defined elsewhere, called my_query_v1
          version: 1         # to possibly tag the query, could be auto-generated
          rollout: 5
---
... definitions for my_query_v0, my_query_v1 here ...
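
A pack entry like the one above introduces a few ways to get the config wrong (percentages that don't add up, a version pointing at a query that was never defined). The following sketch shows the kind of check a tool could run before applying such a pack. The `versions` key is the hypothetical field from this proposal, and `validate_pack_entry` is a made-up helper, not an existing fleetctl feature.

```python
from typing import Dict, List, Set


def validate_pack_entry(entry: Dict, defined_queries: Set[str]) -> List[str]:
    """Return a list of problems found in one pack query entry (empty if none)."""
    problems: List[str] = []
    versions = entry.get("versions", [])
    if not versions:
        return ["no versions listed"]

    total = sum(v["rollout"] for v in versions)
    if total != 100:
        problems.append(f"rollout percentages sum to {total}, expected 100")

    seen = set()
    for v in versions:
        if not 0 <= v["rollout"] <= 100:
            problems.append(f"rollout {v['rollout']} is outside 0..100")
        if v["version"] in seen:
            problems.append(f"duplicate version {v['version']}")
        seen.add(v["version"])
        if v["name"] not in defined_queries:
            problems.append(f"query {v['name']!r} is not defined")
    return problems


entry = {
    "name": "my_awesome_query",
    "interval": 7200,
    "snapshot": True,
    "versions": [
        {"name": "my_query_v0", "version": 0, "rollout": 95},
        {"name": "my_query_v1", "version": 1, "rollout": 5},
    ],
}
print(validate_pack_entry(entry, {"my_query_v0", "my_query_v1"}))
```

Returning a list of problems rather than raising on the first one lets a user fix every issue in the pack in one pass.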

tl;dr: The proposal isn't really aimed at the alerting config; it's aimed at providing progressive rollouts of new / modified queries.

zwass added the Feature Request label Jan 14, 2020

zwass added the Needs: Discussion label Jan 29, 2020

nyanshak mentioned this issue Jan 30, 2021

Blueprint discussion: query versioning / sharding support fleetdm/fleet#249

Open
