-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How diffsitter works, How performant it is and What cases are exemplatory good and bad #149
Comments
I can write up how it works. I started out with a very naive min-edit-distance algorithm, but I was thinking of switching to Myers since you're effectively comparing an in-order traversal of the graph and Myers is extremely efficient in terms of space (profiling showed that the current biggest bottleneck in diffsitter is I can also try operating directly on the graph differences and see which approach is the fastest, I've been meaning to improve the performance for a while now. I'll add documentation with the known problems and non-goals soon (TM). These are great suggestions, thanks :) |
What do the devs here think about considering various aspects mentioned here in a future release? |
@afnanenayet Curious what the state of this is in 2024? Searching the repo, it looks like the Myers algorithm may have been implemented in September 2021, eg.: I was just trying to diff a large bundled/minified JS file (~8.4MB; ~249,128 lines long), and noticed that majority of the time was spent in the Minimising the debug output to just the relevant timing events: // Start
2024-01-30T07:07:58.282Z INFO libdiffsitter::parse > Deduced language "typescript" from extension "js" provided from user mappings
// ..snip..
2024-01-30T07:07:59.791Z INFO TimerFinished > parse::parse_file(), Elapsed=1.50364822s
// ..snip..
2024-01-30T07:08:01.160Z INFO TimerFinished > parse::parse_file(), Elapsed=1.364231801s
2024-01-30T07:08:01.985Z INFO TimerFinished > ast::from_ts_tree(), Elapsed=825.767557ms
2024-01-30T07:08:02.972Z INFO TimerFinished > ast::process(), Elapsed=1.812168007s
2024-01-30T07:08:03.814Z INFO TimerFinished > ast::from_ts_tree(), Elapsed=841.781925ms
2024-01-30T07:08:04.787Z INFO TimerFinished > ast::process(), Elapsed=1.815188386s
2024-01-30T07:20:22.478Z INFO TimerFinished > diff::compute_edit_script(), Elapsed=737.685108096s
// ..snip..
2024-01-30T07:20:22.822Z DEBUG libdiffsitter::render::unified > Printing line 96301
// Finish As a point of comparison, ⇒ time git difftool --tool difftastic HEAD~1 HEAD -- unpacked/_next/static/chunks/pages/_app.js | subl
git difftool --tool difftastic HEAD~1 HEAD -- 2.63s user 0.46s system 47% cpu 6.494 total
subl 0.01s user 0.02s system 0% cpu 6.746 total Though with a lot of this printed among the diff output:
Edit: It seems when
I tried overriding that in my # https://github.com/Wilfred/difftastic
[difftool "difftastic"]
cmd = difft --byte-limit 20971520 "$LOCAL" "$REMOTE" And then running it again, but then I just got a different set of errors: ⇒ time git difftool --tool difftastic HEAD~1 HEAD -- unpacked/_next/static/chunks/pages/_app.js | subl
git difftool --tool difftastic HEAD~1 HEAD -- 12.42s user 1.10s system 79% cpu 17.043 total
subl 0.01s user 0.02s system 0% cpu 17.248 total
|
Could you send me the file you were trying to diff? Would love to do some profiling and see if there's anything I can do to make diffsitter faster. Alternatively we could have some timeout for computing the edit script and fallback to a simpler diff algo or error out. I'm actually working on ditching the myers diff algo in favor of computing a diff on the tree directly |
Also very intrigued by this. What did you see that led you to suspect this? If you get a chance to investigate feel free to throw me whatever you got. Thanks for all the work you've put in to raising issues and helping with debugging, it's much appreciated! |
@afnanenayet Was trying to diff these 2 (basically to see the changes between the 2 builds/commits):
@afnanenayet For my specific use case / interest here, timing out would be suboptimal, as ideally I would like it to run even if it takes a longer time to do it (assuming the output ends up being good). Though a timeout/error for others in more normal use cases might make sense.
@afnanenayet Oo, interesting! Is that for speed reasons, or better outcomes, or?
@afnanenayet See below for more details, but it was basically a gut feel based on not seeing the diff output (I might have closed the file it was piping to by accident though), and not seeing the 'end hunk' line in the debug logs. You can see the full output/etc in this comment, look for this part:
@afnanenayet Thanks for making an awesome tool! It's exciting to see such quick/positive responses; I had somewhat assumed that it might have been abandonware. |
Nope! Not abandonware, I'm just busy and have been working on these things really slowly |
Haha, totally understand that feel. |
Difftastic has a complete section dedicated what methods it uses, but has the performance not written in the repo but in the inspired projects:
https://github.com/Wilfred/difftastic#how-it-works
Could you elaborate what algorithm you use for the diff? A* and dijkstra for graph difference are the most prominent ones, where A* is much more performant on multiple changes (in a function) but has overall worse results.
Performance of examples (ideally with benchmarks), Known problems and Non-Goals would complete the first impression.
I am asking due for an evaluation of neovim what to use, but I am not affiliated or core member (only neovim user) and would like to see things moving forward.
The text was updated successfully, but these errors were encountered: