Flamegraph generation memory usage is quite high for large inputs #201

Open
itamarst opened this issue Dec 6, 2020 · 7 comments

@itamarst
Contributor

itamarst commented Dec 6, 2020

Memory usage for inferno-flamegraph, and the equivalent Rust API usage, is proportional to input file size. A 44MB file results in 60MB of memory usage for me, while a 3KB input file results in 3MB (presumably the minimum).

I discovered this when processing a 440MB file, which resulted in hundreds of MB RAM usage, which is embarrassing when one is implementing a memory profiler 😁 So now I'm prefiltering out tiny irrelevant frames, which is why it's 44MB and not 440MB. Still, less memory usage would be nice.

Now, the output file is typically more like 1 megabyte or less, because all those repeating frames in the input file get combined into a graph in the output. So it ought to be possible to reduce memory usage quite a lot in the internal representation as well.

My completely unverified guess as to the problem: my input files have quite long strings for frame names, and multiple copies of each string are being stored in memory when the data structures are built up. If this is the case, usage of a string interner in the right place might be quite helpful, and potentially even speed up runtime because more data would fit in the CPU memory caches.
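A minimal sketch of the interning idea, assuming folded-stack lines of the form `frames;separated;by;semicolons count`. The `Interner` type and `parse_folded` function are hypothetical names for illustration and are not part of inferno's API. This simple version still keeps two copies of each distinct name (one as the map key, one in the lookup table); a tuned interner would avoid that.

```rust
use std::collections::HashMap;

/// Toy string interner: each distinct frame name gets a compact `u32` id.
/// Hypothetical type for illustration; not inferno's data structure.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the id for `name`, allocating a new id the first time it is seen.
    pub fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.ids.insert(name.to_owned(), id);
        self.names.push(name.to_owned());
        id
    }

    /// Looks up the frame name behind an id handed out by `intern`.
    pub fn resolve(&self, id: u32) -> &str {
        &self.names[id as usize]
    }

    /// Number of distinct frame names seen so far.
    pub fn len(&self) -> usize {
        self.names.len()
    }
}

/// Parses one folded line such as "A;B;C 123" into interned frame ids
/// and the sample count. Returns `None` for malformed lines.
pub fn parse_folded(line: &str, interner: &mut Interner) -> Option<(Vec<u32>, u64)> {
    let (stack, count) = line.rsplit_once(' ')?;
    let count = count.trim().parse::<u64>().ok()?;
    let frames = stack.split(';').map(|f| interner.intern(f)).collect();
    Some((frames, count))
}
```

With this layout, a long frame name that appears in thousands of stacks costs four bytes per occurrence plus one stored copy of the text.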

@jonhoo
Owner

jonhoo commented Dec 6, 2020

That is interesting indeed. Are you seeing this problem with inferno-flamegraph or inferno-stack-collapse? If it is indeed the former, my guess is that this comes from the need to sort the input before processing it. It's an unfortunate property of the algorithm it uses to merge stack frames that it requires the input to be sorted, which means reading it all into memory and then sorting the lines. If your input is already sorted, you could try the --no-sort flag which assumes the input is already sorted, and therefore should be able to avoid reading it all into memory.
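To illustrate why sorted input permits streaming, here is a rough sketch that is not inferno's actual merge code. When lines are sorted, stacks that share a prefix are adjacent, so comparing each stack with the previous one is enough to find the frames that merge. The function name `count_merged_frames` is made up for this example; it only counts the distinct merged frames and holds just one stack in memory at a time.

```rust
use std::io::{self, BufRead};

/// Number of leading frames two stacks have in common.
fn common_prefix(prev: &[String], cur: &[&str]) -> usize {
    prev.iter()
        .zip(cur)
        .take_while(|(p, c)| p.as_str() == **c)
        .count()
}

/// Streams pre-sorted folded lines ("A;B;C 123") and counts how many
/// distinct merged frames they contain. Only the previous stack is kept,
/// so memory use does not grow with the number of input lines.
pub fn count_merged_frames<R: BufRead>(input: R) -> io::Result<usize> {
    let mut prev: Vec<String> = Vec::new();
    let mut frames = 0;
    for line in input.lines() {
        let line = line?;
        // Lines without a trailing sample count are skipped.
        if let Some((stack, _count)) = line.rsplit_once(' ') {
            let cur: Vec<&str> = stack.split(';').collect();
            let shared = common_prefix(&prev, &cur);
            // Frames past the shared prefix are new; the rest merge
            // with frames already opened by the previous line.
            frames += cur.len() - shared;
            prev.truncate(shared);
            prev.extend(cur[shared..].iter().map(|s| s.to_string()));
        }
    }
    Ok(frames)
}
```

On unsorted input the same prefix could reappear later and be counted twice, which is why this style of merging needs the sort.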

As far as the blow-up goes, a 1.4x blowup is unfortunate, though surprisingly good given little mind has been paid to optimizing memory use. That is, given that the input file has to be in memory to sort it! I'm a little strapped for time, but if you want to do some digging, this article has some good tips on profiling memory use in Rust!

@itamarst
Contributor Author

itamarst commented Dec 6, 2020

Pre-sorting might help for me, yeah. But—

Consider an input that looks like this:

A;B;C 123
A;B;D 345

The strings A and B repeat. In fact, most of the memory used by the loaded lines will be repeated frame names. That is what makes me think you could go not just from 1.4× to 1×, but plausibly to 0.1× or even better.
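One way to check how much repetition a given trace has is to compare the total bytes of frame names with the bytes of distinct frame names. The sketch below assumes the folded format shown above; `frame_bytes` is a hypothetical helper, not an inferno function.

```rust
use std::collections::HashSet;

/// Returns (total bytes of frame names, bytes of distinct frame names)
/// for folded-stack input such as "A;B;C 123".
pub fn frame_bytes(input: &str) -> (usize, usize) {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut total = 0;
    let mut unique = 0;
    for line in input.lines() {
        // Drop the trailing sample count; skip lines without one.
        if let Some((stack, _count)) = line.rsplit_once(' ') {
            for frame in stack.split(';') {
                total += frame.len();
                // `insert` returns true only the first time a name is seen.
                if seen.insert(frame) {
                    unique += frame.len();
                }
            }
        }
    }
    (total, unique)
}
```

For the two lines above this gives 6 total bytes against 4 distinct bytes; the gap widens as more stacks share long frame names.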

@jonhoo
Owner

jonhoo commented Dec 8, 2020

That's a neat idea. I wonder how well it'll turn out in practice though. Currently we store a single string A;B;C and a single string A;B;D, but with the proposed change we'd store two vectors each holding three (interned) strings. Assuming interned strings are stored as a u32, that's:

2× String (8b pointer, 8b length, 8b capacity) + "A;B;C" + "A;B;D" = 58b

versus

2× Vec (8b pointer, 8b length, 8b capacity) + 4× distinct interned String (24b each) + 6× u32 + "A" + "B" + "C" + "D" = 172b

Of course the benefits add up the more strings there are, but if the strings are generally short (like main) it probably doesn't end up buying that much. It'd be super interesting to see experiments on this on some real traces!
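The estimate above can be reproduced with a small calculator, which also makes it easy to try other inputs. This is a back-of-the-envelope sketch under the same assumptions as the comment (8-byte pointers, a 24-byte `String`/`Vec` header, `u32` ids); it ignores allocator and hash-map overhead, and `flat_bytes` and `interned_bytes` are names invented for this example.

```rust
use std::collections::BTreeSet;

/// Assumed header size for a `String` or `Vec` on a 64-bit target:
/// pointer + length + capacity.
const HEADER: usize = 8 + 8 + 8;
/// One `u32` id per interned frame reference.
const ID: usize = 4;

/// Bytes needed to store each stack as one flat `String`.
pub fn flat_bytes(stacks: &[&str]) -> usize {
    stacks.iter().map(|s| HEADER + s.len()).sum()
}

/// Bytes needed to store each stack as a `Vec<u32>` of interned ids,
/// plus one `String` per distinct frame name.
pub fn interned_bytes(stacks: &[&str]) -> usize {
    let mut distinct = BTreeSet::new();
    let mut total = 0;
    for s in stacks {
        total += HEADER; // the Vec header for this stack
        for frame in s.split(';') {
            total += ID;
            distinct.insert(frame);
        }
    }
    total + distinct.iter().map(|f| HEADER + f.len()).sum::<usize>()
}
```

For `["A;B;C", "A;B;D"]` these return 58 and 172, matching the figures above. Feeding in the stacks from a real trace shows where the break-even point lies for a given mix of frame-name lengths and repetition.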

@itamarst
Contributor Author

Going to try to do this.

@itamarst
Contributor Author

Or at least, try to find some way to reduce memory usage; I'm getting hundreds of MB in memory use.

@itamarst
Contributor Author

It's possible that it's easier for me to just do this on my side, where each unique frame text is mapped to a unique Unicode character, and then I search and replace on the resulting SVG. Which feels terrible, but is plausibly less work. Will think about it some more as I read code.
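A rough sketch of that workaround, assuming placeholders are drawn from the Unicode Private Use Area (U+E000 to U+F8FF, 6400 code points) so they cannot collide with real frame text. `Placeholders` is a hypothetical type, nothing here calls inferno, and frame names substituted back into SVG would still need XML escaping, which this sketch does not do.

```rust
use std::collections::HashMap;

/// Maps each distinct frame name to one Private Use Area character.
#[derive(Default)]
pub struct Placeholders {
    by_name: HashMap<String, char>,
    names: Vec<String>,
}

impl Placeholders {
    /// Rewrites a folded line so that every frame is a single character.
    /// Returns `None` if the line has no count or the code points run out.
    pub fn encode_line(&mut self, line: &str) -> Option<String> {
        let (stack, count) = line.rsplit_once(' ')?;
        let mut out = String::new();
        for (i, frame) in stack.split(';').enumerate() {
            if i > 0 {
                out.push(';');
            }
            out.push(self.placeholder(frame)?);
        }
        out.push(' ');
        out.push_str(count);
        Some(out)
    }

    /// Returns the placeholder for `frame`, assigning the next free one if needed.
    fn placeholder(&mut self, frame: &str) -> Option<char> {
        if let Some(&c) = self.by_name.get(frame) {
            return Some(c);
        }
        let c = char::from_u32(0xE000 + self.names.len() as u32)
            .filter(|c| *c <= '\u{F8FF}')?;
        self.by_name.insert(frame.to_owned(), c);
        self.names.push(frame.to_owned());
        Some(c)
    }

    /// Expands placeholders back into frame names in rendered output.
    pub fn decode(&self, text: &str) -> String {
        let mut out = String::with_capacity(text.len());
        for c in text.chars() {
            let name = (c as u32)
                .checked_sub(0xE000)
                .and_then(|i| self.names.get(i as usize));
            match name {
                Some(name) => out.push_str(name),
                None => out.push(c),
            }
        }
        out
    }
}
```

The encoded lines would be passed to the flame graph generator and `decode` applied to its output. Any logic in the generator that depends on the length of a frame name would see one-character names, so the result may differ from a run on the original input.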

@itamarst
Contributor Author

After further thought, I'm going to go back and see why my files are so big; the input file sizes do seem excessive even for a worst-case scenario.
