Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Feature request: node hashes #78

Open
vmarkovtsev opened this issue Mar 8, 2018 · 6 comments · May be fixed by #105
Open

Feature request: node hashes #78

vmarkovtsev opened this issue Mar 8, 2018 · 6 comments · May be fixed by #105
Assignees

Comments

@vmarkovtsev
Copy link
Contributor

I have already seen several times that it is often required to quickly compare two sets of UAST nodes.
The naive approach is to compare each to each which is O(n^2). If we had a standard way to hash UAST nodes, we would solve this problem in O(n).

There are two ways of hashing nodes: a single tiny change rewrites the hash (1) and locality sensitive hashing (2). (1) is easy to implement by hashing the serialization of a node. (2) requires some research, and I've got a sufficiently working implementation in https://github.com/src-d/treediff/blob/master/treediff.py#L33 which recursively uses hashes of the sorted children.

@juanjux
Copy link
Contributor

juanjux commented Mar 8, 2018

You could also use the node positions in the tree relative to the root as "hash" (key, really). We'll almost surely be adding hashes or keys as part of the "UAST 2" since they'll be needed to implement some of the approved features.

@vmarkovtsev
Copy link
Contributor Author

vmarkovtsev commented Mar 8, 2018

Node positions are not influenced by the token values, are they.

@juanjux
Copy link
Contributor

juanjux commented Mar 8, 2018

Sorry, I wasn't clear, I didn't mean the token positions in the source code but the node positions in the tree, take a look at the ascii "art" at the start of this (superceed) draft:

https://gist.github.com/juanjux/5c905c8bfbf2091b6da43eb76dd822e9

@dennwc
Copy link
Member

dennwc commented Sep 5, 2018

@vmarkovtsev Do you need the hashes specifically or (a hash-based) Equal operation will work for you as well?

@vmarkovtsev
Copy link
Contributor Author

Yep tree positions matter.

I need the explicit hashes, Equal does not help with going from O(n^2) to O(n)

@dennwc
Copy link
Member

dennwc commented Sep 6, 2018

I assumed that this method can compare node sets and hashes are cached by libuast automatically.

But yeah, explicit hashes are easier to implement.

@dennwc dennwc self-assigned this May 2, 2019
dennwc pushed a commit to dennwc/libuast that referenced this issue Jun 14, 2019
Signed-off-by: Denys Smirnov <denys@sourced.tech>
dennwc pushed a commit to dennwc/libuast that referenced this issue Jun 25, 2019
Signed-off-by: Denys Smirnov <denys@sourced.tech>
@dennwc dennwc linked a pull request Jun 25, 2019 that will close this issue
dennwc pushed a commit to dennwc/libuast that referenced this issue Jul 4, 2019
Signed-off-by: Denys Smirnov <denys@sourced.tech>
dennwc pushed a commit to dennwc/libuast that referenced this issue Jul 4, 2019
Signed-off-by: Denys Smirnov <denys@sourced.tech>
dennwc pushed a commit to dennwc/libuast that referenced this issue Jul 9, 2019
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants