This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

UASTv2 has no tokens for JSX Nodes #58

Closed
zurk opened this issue Feb 13, 2019 · 8 comments

zurk commented Feb 13, 2019

JSX is a syntax extension to JavaScript. It can be converted to plain JS with the babel tool.
The bug I found is that UASTv1 provides tokens for the JSX nodes while UASTv2 doesn't.
Here is the data to reproduce:

The JSX code

class TestIdTestApp extends React.Component {
  render() {
    return (
      <View>
        <Text testID="Text">text</Text>
      </View>
    );
  }
}
UASTv1
Start Position  Token              Internal Role      Roles Tree                                              
                                                                                                              
(0, 1, 1)       |''|               File               FILE                                                    
(0, 1, 1)       |''|               Program            ┣ MODULE                                                
(0, 1, 1)       |''|               ClassDeclaration   ┃ ┣ DECLARATION, TYPE, STATEMENT                        
(6, 1, 7)       |'TestIdTestApp'|  Identifier         ┃ ┃ ┣ EXPRESSION, IDENTIFIER, TYPE, NAME                
(28, 1, 29)     |''|               MemberExpression   ┃ ┃ ┣ QUALIFIED, EXPRESSION, IDENTIFIER, TYPE, BASE     
(28, 1, 29)     |'React'|          Identifier         ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER                          
(34, 1, 35)     |'Component'|      Identifier         ┃ ┃ ┃ ┗ EXPRESSION, IDENTIFIER                          
(44, 1, 45)     |''|               ClassBody          ┃ ┃ ┣ TYPE, BODY                                        
(48, 2, 3)      |''|               ClassMethod        ┃ ┃ ┃ ┣ DECLARATION, FUNCTION, STATEMENT                
(48, 2, 3)      |'render'|         Identifier         ┃ ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER, KEY, NAME             
(57, 2, 12)     |''|               BlockStatement     ┃ ┃ ┃ ┃ ┣ FUNCTION, BODY, STATEMENT, BLOCK, SCOPE, VALUE
(63, 3, 5)      |''|               ReturnStatement    ┃ ┃ ┃ ┃ ┃ ┣ STATEMENT, RETURN                           
(78, 4, 7)      |''|               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                               
(78, 4, 7)      |''|               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(79, 4, 8)      |'View'|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ UNANNOTATED                           
(84, 4, 13)     |'\n        '|     JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(93, 5, 9)      |''|               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(93, 5, 9)      |''|               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                           
(94, 5, 10)     |'Text'|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                         
(99, 5, 15)     |''|               JSXAttribute       ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                         
(99, 5, 15)     |'testID'|         JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                       
(106, 5, 22)    |'Text'|           StringLiteral      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ EXPRESSION, LITERAL, STRING       
(113, 5, 29)    |'text'|           JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                           
(117, 5, 33)    |''|               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                           
(119, 5, 35)    |'Text'|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ UNANNOTATED                         
(124, 5, 40)    |'\n      '|       JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(131, 6, 7)     |''|               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(133, 6, 9)     |'View'|           JSXIdentifier      ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ UNANNOTATED           
UASTv2
(0, 1, 1)       |''|               File               FILE                                                    
(0, 1, 1)       |''|               Program            ┣ MODULE                                                
(0, 1, 1)       |''|               ClassDeclaration   ┃ ┣ DECLARATION, TYPE, STATEMENT                        
(6, 1, 7)       |'TestIdTestApp'|  Identifier         ┃ ┃ ┣ EXPRESSION, IDENTIFIER, TYPE, NAME                
(28, 1, 29)     |''|               MemberExpression   ┃ ┃ ┣ QUALIFIED, EXPRESSION, IDENTIFIER, TYPE, BASE     
(28, 1, 29)     |'React'|          Identifier         ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER                          
(34, 1, 35)     |'Component'|      Identifier         ┃ ┃ ┃ ┗ EXPRESSION, IDENTIFIER                          
(44, 1, 45)     |''|               ClassBody          ┃ ┃ ┣ TYPE, BODY                                        
(48, 2, 3)      |''|               ClassMethod        ┃ ┃ ┃ ┣ DECLARATION, FUNCTION, STATEMENT                
(48, 2, 3)      |'render'|         Identifier         ┃ ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER, KEY, NAME             
(57, 2, 12)     |''|               BlockStatement     ┃ ┃ ┃ ┃ ┣ STATEMENT, BLOCK, SCOPE, FUNCTION, BODY, VALUE
(63, 3, 5)      |''|               ReturnStatement    ┃ ┃ ┃ ┃ ┃ ┣ STATEMENT, RETURN                           
(78, 4, 7)      |''|               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE                                
(78, 4, 7)      |''|               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ BLOCK, INCOMPLETE                       
(79, 4, 8)      |''|               JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ IDENTIFIER, INCOMPLETE                
(84, 4, 13)     |''|               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(93, 5, 9)      |''|               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE                              
(93, 5, 9)      |''|               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ BLOCK, INCOMPLETE                     
(94, 5, 10)     |''|               JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ IDENTIFIER, INCOMPLETE              
(99, 5, 15)     |''|               JSXAttribute       ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE                          
(99, 5, 15)     |''|               JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ IDENTIFIER, INCOMPLETE            
(106, 5, 22)    |'"Text"'|         StringLiteral      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ EXPRESSION, LITERAL, STRING       
(113, 5, 29)    |''|               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                           
(117, 5, 33)    |''|               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                           
(119, 5, 35)    |''|               JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ IDENTIFIER, INCOMPLETE              
(124, 5, 40)    |''|               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(131, 6, 7)     |''|               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED                             
(133, 6, 9)     |''|               JSXIdentifier      ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ IDENTIFIER, INCOMPLETE   

It should be easy to fix because all the positional information is preserved.
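
As a rough illustration of why the saved positions should be enough: if a node carries both a start and an end offset, its token can be recovered by slicing the source. This is only a sketch under assumptions. `token_from_offsets` is a hypothetical helper, the end offsets are worked out by hand here (the tables above list only start positions), and offsets are assumed to be byte offsets into the UTF-8 source.

```python
def token_from_offsets(source: bytes, start: int, end: int) -> str:
    """Recover a node's text by slicing the source between two byte offsets."""
    return source[start:end].decode("utf-8")


source = """
class TestIdTestApp extends React.Component {
  render() {
    return (
      <View>
        <Text testID="Text">text</Text>
      </View>
    );
  }
}
""".strip().encode()

# Start offsets are taken from the tables above; end offsets are assumptions
# (start + length of the expected token), since the tables do not show them.
view_name = token_from_offsets(source, 79, 83)    # first JSXIdentifier
inner_text = token_from_offsets(source, 113, 117)  # JSXText inside <Text>
print(view_name, inner_text)
```

A client could apply the same slicing to any node whose token comes back empty, provided the node really exposes both offsets.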

Drivers are v1.2.0 and v2.6.0.

zurk commented Feb 13, 2019

Also, I am not sure we can count JSX as the same language as JS. Could you kindly explain why it is treated that way?

zurk commented Feb 19, 2019

If you want to reproduce these trees:

Python Code
# my bblfsh is v2.12.6
import bblfsh
from bblfsh import role_name  # the only helper used below

def node2raw(node, internal_type=True):
    token = "|%s|" % node.token[:20].replace("\n", "\\n")
    roles = ", ".join([role_name(k) for k in node.roles])
    internal_role = node.internal_type
    positions = str(node.start_position.line) if node.start_position.line != 0 else ""
    return [positions, token, internal_role, roles] if internal_type else [positions, token, roles]
        
def uast_roles(uast):
    # Depth-first, pre-order traversal yielding (depth, node) pairs.
    stack = [(0, uast)]
    while stack:
        level, node = stack.pop()
        yield level, node
        stack.extend((level+1, c) for c in node.children[::-1])

def rreplace(s, old, new, times):
    li = s.rsplit(old, times)
    return new.join(li)
    
def print_table(table):
    col_size = [max(len(cell) for cell in table_col) for table_col in zip(*table)]
    for raw in table:
        print("  ".join(cell.ljust(size) for cell, size in zip(raw, col_size)))
    
def print_uast_structure(uast, internal_type=True):
    if internal_type:
        table = [["#", "Token", "Internal Role", "Roles Tree"], ["", "", "", ""]]
    else:
        table = [["#", "Token", "Roles Tree"], ["", "", ""]]
    old_level = 0
    for level, node in uast_roles(uast):
        raw = node2raw(node, internal_type)
        if level != 0:
            raw[-1] = "┃ " * (level-1) + "┣ " + raw[-1]
        if level < old_level:
            # Moving back up the tree: close the open branches of the previous row.
            table[-1][-1] = rreplace(table[-1][-1].replace("┣", "┗"), "┃", "┗", old_level-level-1)
        table.append(raw)
        old_level = level
    table[-1][-1] = table[-1][-1].replace("┣", "┗").replace("┃", "┗")
    print_table(table)
    
        
def print_nodes(nodes):
    table = [["#", "Token", "Roles"]]
    for node in nodes:
        table.append(node2raw(node, False))
    print_table(table)
    
code = """
class TestIdTestApp extends React.Component {
  render() {
    return (
      <View>
        <Text testID="Text">text</Text>
      </View>
    );
  }
}
""".strip().encode()

bc = bblfsh.BblfshClient("localhost:9432")

response = bc.parse("", language="JavaScript", contents=code)
print_uast_structure(response.uast)

bzz commented Feb 22, 2019

I will get back to this as soon as the driver is migrated to the latest SDK version and all the missing annotations are added (#59, #56).

bzz self-assigned this on Feb 22, 2019
bzz added the question label and removed the enhancement label on Mar 3, 2019

bzz commented Mar 3, 2019

@zurk the second example you gave uses bblfsh protocol.v2, but in "annotated" mode, which is the equivalent of bblfsh-cli -m annotated.

Full UASTv2, which is the "semantic" mode, is only supported by bblfsh/python-client#128.

So here is the output of your script with one line modified to work better in this mode:

#  Token            Internal Role      Roles Tree

1  ||               File               FILE
1  ||               Program            ┣ MODULE
1  ||               ClassDeclaration   ┃ ┣ DECLARATION, TYPE, STATEMENT
1  |TestIdTestApp|  Identifier         ┃ ┃ ┣ EXPRESSION, IDENTIFIER, TYPE, NAME
1  ||               MemberExpression   ┃ ┃ ┣ QUALIFIED, EXPRESSION, IDENTIFIER, TYPE, BASE
1  |React|          Identifier         ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER
1  |Component|      Identifier         ┃ ┃ ┃ ┗ EXPRESSION, IDENTIFIER
1  ||               ClassBody          ┃ ┃ ┣ TYPE, BODY
2  ||               ClassMethod        ┃ ┃ ┃ ┣ DECLARATION, FUNCTION, STATEMENT
2  |render|         Identifier         ┃ ┃ ┃ ┃ ┣ EXPRESSION, IDENTIFIER, KEY, NAME
2  ||               BlockStatement     ┃ ┃ ┃ ┃ ┣ STATEMENT, BLOCK, SCOPE, FUNCTION, BODY, VALUE
3  ||               ReturnStatement    ┃ ┃ ┃ ┃ ┃ ┣ STATEMENT, RETURN
4  ||               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE
4  ||               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ BLOCK, INCOMPLETE
4  |View|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ IDENTIFIER, INCOMPLETE
4  ||               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED
5  ||               JSXElement         ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE
5  ||               JSXOpeningElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ BLOCK, INCOMPLETE
5  |Text|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ IDENTIFIER, INCOMPLETE
5  ||               JSXAttribute       ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ INCOMPLETE
5  |testID|         JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ IDENTIFIER, INCOMPLETE
5  |"Text"|         StringLiteral      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ EXPRESSION, LITERAL, STRING
5  ||               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED
5  ||               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED
5  |Text|           JSXIdentifier      ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┗ ┗ IDENTIFIER, INCOMPLETE
5  ||               JSXText            ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED
6  ||               JSXClosingElement  ┃ ┃ ┃ ┃ ┃ ┃ ┃ ┣ UNANNOTATED
6  |View|           JSXIdentifier      ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ ┗ IDENTIFIER, INCOMPLETE

The only change is

    token = "|%s|" % (node.token[:20].replace("\n", "\\n") if node.token else node.properties["name"])

and, as you can see, it now includes the text of all JSXIdentifiers, which all have the Identifier role.
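
For readers without a bblfsh server at hand, the fallback can be sketched on stand-in objects. This is a minimal sketch, not the client's API: `node_text` is a hypothetical helper and the `SimpleNamespace` nodes only mimic the `token` and `properties` fields used in the line above. A plain dict raises `KeyError` on a missing key, so the sketch uses `.get` with an empty default, and it truncates the name as well as the token.

```python
from types import SimpleNamespace


def node_text(node, max_len=20):
    """Return the display text of a node: its token, else its "name" property."""
    # Prefer the token; fall back to the "name" property when the token is empty.
    text = node.token or node.properties.get("name", "")
    return "|%s|" % text[:max_len].replace("\n", "\\n")


# Hypothetical stand-ins for three kinds of nodes seen in the output above.
jsx_identifier = SimpleNamespace(token="", properties={"name": "View"})
string_literal = SimpleNamespace(token='"Text"', properties={})
jsx_element = SimpleNamespace(token="", properties={})

for node in (jsx_identifier, string_literal, jsx_element):
    print(node_text(node))
```

Nodes that have neither a token nor a name, such as JSXElement, still print as `||`, which matches the rows in the output above.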

In my experience, bblfsh-cli is the best way to check the UAST structure, so you can see what the Semantic UASTv2 schema looks like and adapt your client code for it (hint: there is not going to be token field anywhere in it, Identifiers have only names)

bzz commented Mar 3, 2019

To your second question

Also, I am not sure we can count JSX as the same language as JS. Could you kindly explain why it is treated that way?

Long story short: it depends on the convention :)

Using the enry CLI you can quickly verify that enry treats JSX as a language distinct from JavaScript. In bblfsh, however, there is no driver for JSX, so parsing will fail unless the client manually overrides the language so that the file is parsed by the javascript driver. In the CLI that can be done with bblfsh-sdk -l javascript <your-file>.jsx.

So, on the bblfsh side, a single driver does support multiple languages, which makes sense: strictly speaking there is no single language called JavaScript, only many ECMAScript language standards that are all parsed by the same tool you mention, babel, including JSX through a plugin. This fact is not well documented right now, BUT we are looking to change that in Q2.

If you are interested in a sneak peek into the future, there is a proposal at the RFC stage on the project's mailing list for language alias support that covers exactly this problem.

Let me know if that answers your questions!

zurk commented Mar 4, 2019

Answering #58 (comment):
Well, I still do not see the tokens for JSXText that are present in the UASTv1 example. Is it a bug, or were they removed on purpose?

and adapt your client code for it (hint: there is not going to be token field anywhere in it, Identifiers have only names)

If you are worried about style-analyzer, I think we will not use the annotated layer. We have an implementation of a code tokenizer based on UAST tokens, so we actually need not just identifiers but all the tokens that the UAST has.
But for adapting the sourced-ml package, it is a good feature for sure 👍

About JSX #58 (comment)
Thanks for the clarification, now I wonder how lookout deals with it... Going to ask in https://github.com/src-d/lookout.

bzz commented Mar 5, 2019

Answering #58 (comment):
Well, I still do not see the tokens for JSXText that are present in the UASTv1 example. Is it a bug, or were they removed on purpose?

Would you be so kind as to also check that in the CLI, following the instructions I posted above, and let me know if you think something is missing there?

About JSX #58 (comment)
Thanks for the clarification, now I wonder how lookout deals with it...

I believe lookout does not add anything to that and just uses enry to detect the language if IncludeLanguages is set.

Right now bblfshd is adapted to accept enry names, so a difference like C# (enry) vs csharp (bblfsh driver) is mitigated. In the case of .jsx, however, I'm almost sure it will not get a UAST, as the request must have javascript set as the language.
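
Until something like language aliases lands, a client can apply the override itself. The sketch below is a hypothetical client-side workaround, not how bblfshd or lookout resolves names: `DRIVER_ALIASES` and `driver_language` are invented for illustration, and the table contains only the two pairs mentioned in this thread.

```python
# Hypothetical alias table: detected language name -> bblfsh driver name.
# Only the pairs mentioned in this thread are listed.
DRIVER_ALIASES = {
    "jsx": "javascript",
    "c#": "csharp",
}


def driver_language(detected: str) -> str:
    """Return the language name to put in the parse request."""
    # Names without a known alias are passed through unchanged.
    return DRIVER_ALIASES.get(detected.strip().lower(), detected)


for name in ("JSX", "C#", "JavaScript"):
    print(name, "->", driver_language(name))

# With the client from the reproduction script earlier in the thread,
# the override would then be applied like this:
# bc.parse("", language=driver_language("JSX"), contents=code)
```

The C# entry is redundant if bblfshd already accepts the enry name; it is kept only to show the shape of the mapping.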

zurk commented Mar 5, 2019

bblfsh-cli works as expected.
OK, I think some tokens were missing because I use an old and possibly incompatible Python client.

For me this issue is solved, thanks @bzz! Feel free to close it.
