Avoid repeated string join()s in analyze_macros() #750
Change addressing #749
When an input file contains on the order of hundreds of thousands of strings, analyze_macros() becomes very slow. This change instead stores the extracted strings in their own structure and adds them to the module source once at the end, rather than incrementally.

Example with input file 4a87ee5ecd46a3fab735656b77d0e4fea8d3d72f3a6e0fb791999a2dfe8d59d2 (available on VirusTotal), using Python 3.9.10 on macOS:
Timing without this patch:
python3 ./oletools/olevba.py > /dev/null 929.50s user 1896.56s system 58% cpu 1:20:32.49 total
Timing with this patch:
python3 ./oletools/olevba.py > /dev/null 5.81s user 2.43s system 95% cpu 8.605 total
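The change described above (collecting extracted strings first, then adding them to the module source in a single step) can be illustrated with a minimal sketch. This is not olevba's actual code: the function names `build_incremental` and `build_batched` are hypothetical, and the sketch assumes the strings are newline-joined onto the source. The point is that re-joining the growing source on every string copies all accumulated text each time, so total work grows quadratically with the number of strings, whereas joining once is linear.

```python
import time
from typing import List


def build_incremental(source: str, strings: List[str]) -> str:
    """Hypothetical slow pattern: re-join the growing source for every string."""
    for s in strings:
        # Each join builds a brand-new string containing everything so far,
        # so n strings cost roughly O(n^2) character copies in total.
        source = "\n".join([source, s])
    return source


def build_batched(source: str, strings: List[str]) -> str:
    """Hypothetical fast pattern: collect strings separately, join once at the end."""
    extracted: List[str] = []
    for s in strings:
        # Appending to a list is amortized O(1); no text is copied here.
        extracted.append(s)
    # A single join copies each character once: O(total length).
    return "\n".join([source] + extracted)


if __name__ == "__main__":
    strings = [f"string_{i}" for i in range(20000)]

    start = time.perf_counter()
    slow = build_incremental("module source", strings)
    print(f"incremental: {time.perf_counter() - start:.3f}s")

    start = time.perf_counter()
    fast = build_batched("module source", strings)
    print(f"batched:     {time.perf_counter() - start:.3f}s")

    # Both approaches produce identical output; only the cost differs.
    assert slow == fast
```

Both functions return the same text; the batched version simply defers the concatenation, which is the behavior this PR adopts in analyze_macros().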