[idea] Generic code tokenizer #44
Comments
Right now we have a tokenizer for JS in style-analyzer. If we start this project, that code should be considered as an entry point.
Should this be transferred to src-d/feature-idea? @vmarkovtsev @EgorBu
Good idea, @m09
Somehow I cannot transfer it; GitHub does not find the feature-idea repo 😕 Edit: it seems we need someone who is an admin in both ml-backlog and feature-idea to transfer the issue.
Calling @smola to the rescue to transfer the issue to feature-idea :)
In many tasks, feature extraction for source code relies heavily on tokenization and on structural information. If we want to use the suggestion feature on GitHub, we must work with tokenized code.
This part is very important for everybody in the MLonCode area, and it is still quite complicated to do.
Proposal: extend the bblfsh client or create a new module that could be used by many different projects.
TLDR: code -> (bblfsh) -> nodes carrying the information required by the feature extractor -> "".join(nodes) == code
This module could be used by many researchers in this area. Related issue: bblfsh/bblfshd#231
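To make the round-trip requirement `"".join(nodes) == code` concrete, here is a minimal sketch in Python. It is not the style-analyzer tokenizer and it does not use bblfsh at all; the `TOKEN_RE` pattern and the `tokenize` helper are hypothetical names chosen for illustration, and the regex only approximates a real lexer. The point is just that every character, including whitespace, ends up in exactly one token.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not the bblfsh or style-analyzer rules):
# identifiers, digit runs, whitespace runs, or any single remaining character.
# The final "." alternative guarantees that no character is ever skipped.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*|\d+|\s+|.", re.DOTALL)


def tokenize(code: str) -> List[str]:
    """Split code into tokens without dropping any character."""
    return TOKEN_RE.findall(code)


code = "function add(a, b) {\n  return a + b;\n}\n"
nodes = tokenize(code)

# The lossless property from the TLDR: joining the tokens restores the input.
assert "".join(nodes) == code
```

A generic tokenizer built on bblfsh would presumably attach node types and positions to each token as well; the sketch above only demonstrates the lossless property that a feature extractor can rely on.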