Metric queries return null data until whisper file exists #629
Comments
@obfuscurity: I didn't talk to @mleinart, but I had a look in the code to change that behaviour myself. IMHO it would be interesting to search the cache for series not yet present on disk; however, it needs some changes, since carbon-cache stores series in a hashtable, so resolving glob expressions would need another memory structure in addition to the hashtable (or, less likely, instead of it). This is probably why Graphite-web does not request carbon-cache first.
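For illustration, a minimal Python sketch of the situation described above, assuming the cache is just a dict keyed by exact metric name (the names `cache` and `find_in_cache` are hypothetical, not carbon's actual code): with only a hashtable, a glob query can be answered only by scanning every key, which is the cost the extra index structure would avoid.

```python
import fnmatch

# Hypothetical stand-in for carbon-cache's in-memory store:
# exact metric name -> list of (timestamp, value) datapoints.
cache = {}

def find_in_cache(pattern):
    # Resolving a glob like "servers.*.cpu.user" against a plain hashtable
    # requires an O(n) scan over every cached series name.
    return [name for name in cache if fnmatch.fnmatchcase(name, pattern)]
```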
Any updates on this? I am running into the same problem.
@ocervell - unfortunately not yet.
It completely slipped off our radar. Developers at my work hit this constantly; I'm even using a dirty hack for 0.9.x.
This issue is 3 years old. Any update?
The big thing that would be needed for this to work would be the ability for carbon to respond to a find request, since right now if the whisper file doesn't exist then the standard finder is never going to find the series and the code won't even get as far as calling the whisper reader. If carbon were to provide a find method, then it seems like it should be possible to move all the cache functionality out into a finder and reader for the cache, and handle merging the cached data into the final results via the new MultiReader mechanism. The biggest issue there seems to be aligning the data from the cache, since it's stored with the raw timestamps and not aligned to a step.
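As an illustration of that alignment problem, here is a minimal sketch (the function name is assumed, not graphite-web code) that snaps raw cached datapoints onto fixed step boundaries so they could be merged with on-disk values:

```python
def align_to_step(datapoints, start, end, step):
    """Place raw (timestamp, value) pairs from the cache into fixed-step buckets."""
    values = [None] * ((end - start) // step)
    for ts, value in datapoints:
        if start <= ts < end:
            values[(ts - start) // step] = value  # last datapoint in a bucket wins
    return values
```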
@ocervell: There are two main reasons why this issue is not in active development yet. First, that logic is buried quite deep in whisper itself, and the fix for this is quite big and massive, as @DanCech said.
I'm hitting the above scenario. I have to lower MAX_UPDATES_PER_SECOND because it kills the disk's throughput, but that causes the cache to grow. Then queries start returning null for a few minutes because the cache is not being checked.
@deniszh, I saw that you included this issue in the 1.1.0 and 1.2.0 milestones; can you confirm in which version it will definitely be included? It seems we are struggling with the same problem.
Milestone 1.2.0 means only 'not now'. Currently we have no solution for that problem, and we are not actively working on it.
I just wanted to add my two cents. We also run a large Graphite installation. This problem has been apparent for a while, but it was tolerable because we were using NVMe-backed storage. But I recently converted to IOPS-provisioned EBS volumes (because we were losing historical data every time an NVMe drive bombed out) and the problem has been exacerbated. It's problematic for us because a lot of our monitoring and auto-scaling relies on being able to retrieve timely metrics from the Graphite front-end. I should note that for us, some metrics' current values are retrievable from the cache, while others aren't. I don't know if this is because of a hashing problem or because of the sheer number of metrics in the cache at any given time. But in either case, the fact that it retrieves some of them means it shouldn't be a big deal to update the code to retrieve any of them, right? In our case, we're not using wildcards, if that makes a difference.
This issue is about metrics for which a .wsp file does not exist yet; those never return results. I don't get why it would sometimes return data for you. In graphite-project/carbon#782 (comment) you indicated that the metrics you are having trouble with already have a .wsp file. Is this another issue you are experiencing?
@piotr1212 OK, if this issue #629 is about metrics for which a .wsp file does not exist yet, then my problem is different. Sorry to bother you.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is still a thing AFAIK.
There is no easy way to fix this without changing the whole architecture.
@piotr1212 The whole architecture? I don't think so, but it would need some kind of index in addition to the hashtable in the write cache, in order to be able to find new series names in the cache with a globbing pattern, and not only by knowing their exact names after globbing on the filesystem.
@g76r Graphite-web needs to figure out which cache to query; as long as globbing is not done, it just doesn't know which cache has that metric. Querying all caches for every metric with a glob is not scalable at all. So you would need an index daemon which keeps a list (or rather a tree or some faster data structure) of all metrics. When a cache receives a metric, it would first have to check whether it already exists on disk, instead of just pushing it to the cache and worrying about creating it later at write time. It can check it on disk, which is slow, or keep an internal index. With an internal index you have to reindex at startup or save it to disk.
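A minimal sketch of such an internal index, assuming a tree keyed on dotted name components (the `MetricIndex` class is hypothetical, not part of carbon):

```python
class MetricIndex:
    """In-memory tree of metric names, one level per dotted path component."""

    def __init__(self):
        self.root = {}

    def add(self, metric):
        node = self.root
        for part in metric.split('.'):
            node = node.setdefault(part, {})

    def contains(self, metric):
        node = self.root
        for part in metric.split('.'):
            if part not in node:
                return False
            node = node[part]
        return True
```

As noted above, such an index would have to be rebuilt at startup by walking the storage directory, or persisted to disk and reloaded.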
@piotr1212 graphite-web already queries every carbon instance to complete on-disk data with its still-in-memory data, because graphite-web has no way to know which cache contains still-in-memory data.
PS:
@piotr1212 IMHO what you call a distributed index is actually the carbon-cache daemon.
No, it uses carbon's consistent hashing to determine which cache to query. I don't know if it ever worked differently. Basically:
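For illustration only (assumed names and hashing scheme, not carbon's actual implementation), consistent-hash routing looks roughly like this: the metric name is hashed onto a ring of cache instances and only the owning node is queried, which is why a glob pattern cannot be routed without first knowing the concrete metric names.

```python
import hashlib
from bisect import bisect

class HashRing:
    def __init__(self, nodes, replicas=100):
        # Each node gets several points on the ring for smoother distribution.
        self.ring = sorted(
            (int(hashlib.md5(f"{node}:{i}".encode()).hexdigest(), 16), node)
            for node in nodes
            for i in range(replicas)
        )

    def get_node(self, key):
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        idx = bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["cache-a:7002", "cache-b:7002"])
ring.get_node("servers.web01.cpu.user")  # only this cache is asked for the series
```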
Another option, which would be easy to implement, would be to create the new files earlier.
In large Graphite installations the queues can get really long; it can take an hour for Graphite to write all metrics in the queue. New db files are created when the metric is written, which can take too long. This change separates the creation of metrics from writing data to them and moves the creation to an earlier moment. Whenever a new metric is received, its name is pushed to a new_metric list. The first step in the writer loop is to check whether any new metrics have been received and create them if they don't exist on disk yet. After the creation, the writer continues as usual with writing metrics from the queue, but it does not check whether the file already exists, to prevent the check from occurring twice and impacting IO. If the file does not exist at this point, it is logged. A rough sketch of the idea is below. Fixes: graphite-project/graphite-web#629
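A rough, self-contained sketch of that flow (illustrative names only — `new_metrics`, `writer_loop_iteration` — not the actual carbon code), using the whisper library to create files before the datapoint queue is drained:

```python
import os
import whisper  # the whisper library carbon uses for .wsp files

new_metrics = []   # metric names seen for the first time
cache = {}         # metric name -> list of (timestamp, value) datapoints

def wsp_path(storage_dir, metric):
    return os.path.join(storage_dir, metric.replace('.', os.sep) + '.wsp')

def writer_loop_iteration(storage_dir, archives):
    # Step 1: create database files for newly received metrics up front.
    while new_metrics:
        path = wsp_path(storage_dir, new_metrics.pop())
        if not os.path.exists(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
            whisper.create(path, archives)  # archives: list of (secondsPerPoint, points)
    # Step 2: drain the cache without re-checking existence; only log if missing.
    for metric in list(cache):
        path = wsp_path(storage_dir, metric)
        if not os.path.exists(path):
            print("db file missing for", metric)  # carbon would log this instead
            continue
        whisper.update_many(path, cache.pop(metric))
```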
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Scenario
Cause
Querying the webapp for a specific metric path returns no data until it hits disk even if data is in the cache, due to the short-circuit linked here.
Solution
Update `readers.py` such that it performs a best-effort check of the cache for the requested metric path before giving up and returning `None`.
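A minimal sketch of what that best-effort check could look like (assuming a CarbonLink-style client exposing `query(metric)`; the `fetch_with_cache_fallback` function and its signature are illustrative, not the actual readers.py code):

```python
def fetch_with_cache_fallback(metric, start, end, step, carbonlink):
    # No whisper file on disk yet: ask the carbon cache before returning None.
    try:
        cached = carbonlink.query(metric)   # list of (timestamp, value) pairs
    except Exception:
        return None                         # cache unreachable: keep old behaviour
    if not cached:
        return None
    values = [None] * ((end - start) // step)
    for ts, value in cached:
        if start <= ts < end:
            values[(ts - start) // step] = value
    return (start, end, step), values       # (time_info, values), as readers return
```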
Downsides/challenges