Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Badger storage plugin: query service to support spanKind when retrieve operations for a given service. #1922

Open
guo0693 opened this issue Nov 13, 2019 · 14 comments · May be fixed by #6376
Open
Labels
good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/badger Issues related to badger storage

Comments

@guo0693
Copy link
Contributor

guo0693 commented Nov 13, 2019

Requirement - what kind of business use case are you trying to solve?

Related to #1920

Update badger reader and writer such that we can retrieve operations by service name and span kind

Problem - what in Jaeger blocks you from solving the requirement?

Proposal - what do you suggest to solve the problem or improve the existing situation?

Any open questions to address

@guo0693 guo0693 changed the title Badger storage update: query service to support spanKind when retrieve operations for a given service. Badger storage plugin: query service to support spanKind when retrieve operations for a given service. Nov 13, 2019
@burmanm
Copy link
Contributor

burmanm commented Nov 17, 2019

I'll add notes here so that I don't forget (or if someone else is implementing):

  • A single index key is enough for serviceName, operationName + span.Kind indexing

Thus, replace the following code from reader:

		if query.OperationName != "" {
			indexSearchKey = append(indexSearchKey, operationNameIndexKey)
			indexSearchKey = append(indexSearchKey, []byte(query.ServiceName+query.OperationName)...)
		} else {
			indexSearchKey = append(indexSearchKey, serviceNameIndexKey)
			indexSearchKey = append(indexSearchKey, []byte(query.ServiceName)...)
		}

Instead, write a single key with operationIndexKey and in the form of query.ServiceName+query.OperationName+tags.span.kind. Then, in the reader scanning, take into account the length of the searched index key such as query.ServiceName has total length of serviceName+overhead+timestamp+sizeOfTraceID and not more. This can be added to the scanFunction to skip keys which do not match here (or stop searching if we bypassed the match).

In the cache.go, remove two full-table-scans and replace with one which would warm the cache for both services, operations as well any theoretical operations+kind query.

Bonus is one less key written to the db with a slight increase in complexity for reading.

@guo0693
Copy link
Contributor Author

guo0693 commented Dec 2, 2019

Not familiar with badger, but should the key be query.ServiceName+tags.span.kind+query.OperationName because we want to get a list of operation names given the service name & span.kind?

@burmanm
Copy link
Contributor

burmanm commented Dec 2, 2019

It slightly depends. There's two different use-cases for that information, fetching the list of operations vs fetching the list of traces. For traces, is the span.kind used or not? You should optimize for that use-case and then the "list of operations" for UI purposes is done with the cache.go and a table scan of possible pairs of span.kind + operations meaning it doesn't really matter in which order they are.

To clarify, the list of available operations is done once in the startup and never again before a restart. So don't optimize for that, but optimize for the use-case of traces-fetching where the order for smaller TraceID scanning is more important.

@guo0693
Copy link
Contributor Author

guo0693 commented Dec 2, 2019

Make sense. I didn't know it's also used for fetching traces.

@pavolloffay pavolloffay added the storage/badger Issues related to badger storage label Dec 3, 2019
@yurishkuro yurishkuro added help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners labels Feb 3, 2024
@Manik2708
Copy link
Contributor

As far I could understand, operations are fetched only from cache not from the store! So do we need to change the cache only? @yurishkuro

@yurishkuro
Copy link
Member

Cache loads from the store, to the store needs to support this

@Manik2708
Copy link
Contributor

Cache loads from the store, to the store needs to support this

Got it! Thinking of changing the service key to serviceName+span.kind and operations to serviceName+span.kind+operationName. Will it be viable?

@yurishkuro
Copy link
Member

Probably. As long as the list can be scanned.

@Manik2708
Copy link
Contributor

Cache loads from the store, to the store needs to support this

Got it! Thinking of changing the service key to serviceName+span.kind and operations to serviceName+span.kind+operationName. Will it be viable?

There is a problem in this approach! For getting traces, span.kind is always to be passed in the Query as a tag and fetching the spans of all kind will be impossible without multiple iterations and scans. The approach given in this: #1922 (comment) is also leading to this problem. I could think of this solution:
Creating an additional operationNameWithKindIndexKey with span.kind instead of embedding them in existing keys. When in tags "span.kind" would be present we can use this key else we could use the default operation key (without kind hence will fetch spans for all kinds). With this we can create a migration script to create this new key for all operations, then cache can be refilled with operations and its kinds! If getting spans of all kinds is not a concern then we can skip this and embed the span.kind in operationNameIndexKey and when no tags of span.kind are given we can by default add an unspecified tag while reading that implies when no tags will be given only unspecified spans would be fetched. Migration script here would be adding the kinds in same key. @yurishkuro Please see this and give me direction or a more appropriate way!

@yurishkuro
Copy link
Member

The approach given in this: #1922 (comment) is also leading to this problem.

Please elaborate. Why would this lead to extra scanning?

You also talk about migration - the migration is to be avoided at all costs. Users should be able to upgrade to next version of Jaeger and still use their existing data. If we must we can implement dual look-ups and phase them out eventually (e.g. make them opt-out first, then opt-in).

@Manik2708
Copy link
Contributor

Please elaborate. Why would this lead to extra scanning?

Because as far I could understand from the comment is that it is trying to change the index to <operationNameIndexKey><serviceName+operationName+spanKind><startTime><traceId>. Now 2 cases rises:

  1. When user sends a tag of "span.kind" then we can just add the kind to operationNameIndexKey while reading and send back the result.
  2. When no tag is sent by the user, this implies we need to send spans of all kind. Now scanning happens in a way that if we want to find trace ids now, we have to embed the kind that means either we have to scan for all kinds or we have to append 6 keys (as there are 6 kinds of span) to the reader so that all kinds can be sent

Now regarding migration, if we have to avoid it, we can do one thing we can look up for old index during two cases, when no span kind tag is sent or when span kind of unspecified is sent, this means we can label as all the old spans as unspecified. I don't know whether this is correct! Would be interested to know your as I am new to this changing database structures in production!

@yurishkuro
Copy link
Member

I tend to agree that a single combined index is not the right approach. It enforces a specific hierarchy service > operation > kind, which would be ok if that was the end of the key, but since we also follow this with the timestamps then the searches for only higher levels of hierarchy cannot be efficiently reduced to known time ranges only.

I think given the current architecture of the indices we just need to add a 3rd index that includes the kind. It is similar to the overall indexing approach we did in Cassandra. And this would be easily backwards compatible. I would say the old data that was not written with the extra index will not be searchable if the kind is present - perhaps we can introduce an optional cutoff_time parameter that will help the storage make these decisions. What I would rather avoid is having to support with-kind queries against the old index, as it means loading all the traces for a given operation and then filtering them out by span_kind, which is messy.

@Manik2708
Copy link
Contributor

Ok, so for now what we could do is introduce a 3rd index with kind and old data will not be accessible when kind tag is there from user but I couldn't understand the working of cutoff_time. How will it help the storage to make decisions?

@yurishkuro
Copy link
Member

right now the cutoff isn't especially useful because the UI ticket for querying with span kind is still not implemented. If it were implemented we would have to decide how to deal with that.

I also think we should include support for this query in the storage capabilities API so that UI could not try to query by kind if the backend doesn't support that.

@Manik2708 Manik2708 linked a pull request Dec 17, 2024 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement storage/badger Issues related to badger storage
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants