-
Notifications
You must be signed in to change notification settings - Fork 15
Feature request: node hashes #78
Comments
You could also use the node positions in the tree relative to the root as "hash" (key, really). We'll almost surely be adding hashes or keys as part of the "UAST 2" since they'll be needed to implement some of the approved features. |
Node positions are not influenced by the token values, are they. |
Sorry, I wasn't clear, I didn't mean the token positions in the source code but the node positions in the tree, take a look at the ascii "art" at the start of this (superceed) draft: https://gist.github.com/juanjux/5c905c8bfbf2091b6da43eb76dd822e9 |
@vmarkovtsev Do you need the hashes specifically or (a hash-based) |
Yep tree positions matter. I need the explicit hashes, |
I assumed that this method can compare node sets and hashes are cached by libuast automatically. But yeah, explicit hashes are easier to implement. |
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Signed-off-by: Denys Smirnov <denys@sourced.tech>
Signed-off-by: Denys Smirnov <denys@sourced.tech>
I have already seen several times that it is often required to quickly compare two sets of UAST nodes.
The naive approach is to compare each to each which is O(n^2). If we had a standard way to hash UAST nodes, we would solve this problem in O(n).
There are two ways of hashing nodes: a single tiny change rewrites the hash (1) and locality sensitive hashing (2). (1) is easy to implement by hashing the serialization of a node. (2) requires some research, and I've got a sufficiently working implementation in https://github.com/src-d/treediff/blob/master/treediff.py#L33 which recursively uses hashes of the sorted children.
The text was updated successfully, but these errors were encountered: