-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Badger storage plugin: query service to support spanKind when retrieve operations for a given service. #1922
Comments
I'll add notes here so that I don't forget (or if someone else is implementing):
Thus, replace the following code from reader: if query.OperationName != "" {
indexSearchKey = append(indexSearchKey, operationNameIndexKey)
indexSearchKey = append(indexSearchKey, []byte(query.ServiceName+query.OperationName)...)
} else {
indexSearchKey = append(indexSearchKey, serviceNameIndexKey)
indexSearchKey = append(indexSearchKey, []byte(query.ServiceName)...)
} Instead, write a single key with operationIndexKey and in the form of In the cache.go, remove two full-table-scans and replace with one which would warm the cache for both services, operations as well any theoretical operations+kind query. Bonus is one less key written to the db with a slight increase in complexity for reading. |
Not familiar with badger, but should the key be query.ServiceName+tags.span.kind+query.OperationName because we want to get a list of operation names given the service name & span.kind? |
It slightly depends. There's two different use-cases for that information, fetching the list of operations vs fetching the list of traces. For traces, is the span.kind used or not? You should optimize for that use-case and then the "list of operations" for UI purposes is done with the cache.go and a table scan of possible pairs of span.kind + operations meaning it doesn't really matter in which order they are. To clarify, the list of available operations is done once in the startup and never again before a restart. So don't optimize for that, but optimize for the use-case of traces-fetching where the order for smaller TraceID scanning is more important. |
Make sense. I didn't know it's also used for fetching traces. |
As far I could understand, operations are fetched only from cache not from the store! So do we need to change the cache only? @yurishkuro |
Cache loads from the store, to the store needs to support this |
Got it! Thinking of changing the service key to serviceName+span.kind and operations to serviceName+span.kind+operationName. Will it be viable? |
Probably. As long as the list can be scanned. |
There is a problem in this approach! For getting traces, |
Please elaborate. Why would this lead to extra scanning? You also talk about migration - the migration is to be avoided at all costs. Users should be able to upgrade to next version of Jaeger and still use their existing data. If we must we can implement dual look-ups and phase them out eventually (e.g. make them opt-out first, then opt-in). |
Because as far I could understand from the comment is that it is trying to change the index to
Now regarding migration, if we have to avoid it, we can do one thing we can look up for old index during two cases, when no span kind tag is sent or when span kind of unspecified is sent, this means we can label as all the old spans as unspecified. I don't know whether this is correct! Would be interested to know your as I am new to this changing database structures in production! |
I tend to agree that a single combined index is not the right approach. It enforces a specific hierarchy I think given the current architecture of the indices we just need to add a 3rd index that includes the kind. It is similar to the overall indexing approach we did in Cassandra. And this would be easily backwards compatible. I would say the old data that was not written with the extra index will not be searchable if the |
Ok, so for now what we could do is introduce a 3rd index with kind and old data will not be accessible when kind tag is there from user but I couldn't understand the working of |
right now the cutoff isn't especially useful because the UI ticket for querying with span kind is still not implemented. If it were implemented we would have to decide how to deal with that. I also think we should include support for this query in the storage capabilities API so that UI could not try to query by kind if the backend doesn't support that. |
Requirement - what kind of business use case are you trying to solve?
Related to #1920
Update badger reader and writer such that we can retrieve operations by service name and span kind
Problem - what in Jaeger blocks you from solving the requirement?
Proposal - what do you suggest to solve the problem or improve the existing situation?
Any open questions to address
The text was updated successfully, but these errors were encountered: