This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Python client v3 (UASTv2) #128

Merged
merged 53 commits into from
Mar 12, 2019
Commits
9b5b502
use the new libuast; rewrite bindings using cpp layer of libuast
Oct 4, 2018
2788a1a
working bindings prototype
Oct 11, 2018
8e39632
fix string memory management
Oct 16, 2018
d3b6bf9
refactoring: NodeExtType->PyNodeExtType for consistency
bzz Oct 16, 2018
6ce2b11
refactoring: NodeExt->PyNodeExt for consistency
bzz Oct 16, 2018
7bac87f
refactoring: PyUastType->PyContextType for consistency
bzz Oct 16, 2018
8b84e4a
refactoring: PyUast->PyContext for consistency
bzz Oct 16, 2018
1fa1d0d
refactoring: fix comments + fmt after rebase
bzz Oct 17, 2018
9e88733
apply review feedback
bzz Oct 17, 2018
98b3ef8
fix replace
Oct 17, 2018
96abf64
Build fixes, comment out v1 things, some other adjustements
Oct 18, 2018
151e61c
Recover grpc sdk v1 protocol for some grpc objects
Oct 22, 2018
7f583ea
Forward port the aliases refactor by Vadim
Oct 22, 2018
6ba57fc
Forward port travis changes
Oct 24, 2018
24fd7b6
Merge branch 'master' into v3
juanjux Oct 24, 2018
a2ca471
fix pip install
Oct 30, 2018
e308038
update the client to use both protocols
Oct 30, 2018
0d675e1
Remove unused and broken import
Oct 31, 2018
91b798b
Compile the ext module from an static libuast object
Oct 31, 2018
acec219
enable building the client with static libuast
Nov 1, 2018
2fd570c
do not free the query string in filter, it seems to be borrowed
Nov 1, 2018
d05770c
improve the native Python wrappers and update the readme
Nov 1, 2018
272acc9
fix error handling in native extension
Nov 2, 2018
270445b
Explicit cast to char* to avoid nasty warning with latest G++
Nov 8, 2018
1f977e4
PEP8
Nov 8, 2018
20890e0
Renamed PyContext to PythonContext to avoid symbol conflict in 3.7+
Nov 8, 2018
2c983a9
Use same name for Windows an Linux static lib before the extension
Nov 8, 2018
0bcf223
Add several needed static libs for Windows
Nov 8, 2018
8401a1c
Several improvements (see desc)
Nov 15, 2018
bd8c2d5
Several Improvements (II)
Nov 16, 2018
f964c46
Make iterators great (and working) again
Nov 16, 2018
ea7d615
fix usage of parsed string arguments in filter
Dec 5, 2018
1c73766
properly deallocate python objects
Dec 5, 2018
7cb563a
free encoding buffer
Dec 5, 2018
cd1d90d
bump versions
Dec 5, 2018
a55abc4
Unittests and other fixes
Dec 11, 2018
d74a514
Merge branch 'master' into v3
juanjux Dec 11, 2018
75170a6
Uncommented failed test
Dec 11, 2018
c4fd5be
Enabled unnitesting in travis
Dec 11, 2018
753efb4
Run docker and install python driver from travis
Dec 11, 2018
9876503
Commented out the node afected by SDK issue 340
Dec 12, 2018
7098328
Merge branch 'master' into v3
juanjux Dec 12, 2018
7ad8c6f
Remove Python 3.5 from Travis
Dec 12, 2018
a2752b7
Use range for grpcio and grpciotools
Dec 12, 2018
a99f5be
Fixed some of @bzz feedback from review
Dec 12, 2018
b983ea2
add error checks for iterators and clarify comments
Dec 13, 2018
9e3f415
Fixed from @zurk review (thanks!)
Dec 18, 2018
988eb5e
Merge branch 'v3' of https://github.com/dennwc/client-python into v3
Dec 18, 2018
ba93944
Merge branch 'master' into v3
juanjux Dec 18, 2018
d485273
Fixes and improvements from @vmarkovtsev review
Dec 18, 2018
66ccfed
PEP8 fix
Dec 18, 2018
a020666
Changed ModeDict to a Modes enum-like class
Dec 18, 2018
9b094aa
Allow to create Clients with an instanced grpc channel as suggested b…
Dec 18, 2018
7 changes: 5 additions & 2 deletions .travis.yml
@@ -1,22 +1,26 @@
language: python
sudo: true
dist: xenial
env:
- BBLFSHD_VERSION=v2.9.1 BBLFSH_PYTHON_VERSION=v2.3.0
services:
- docker
cache:
  directories:
  - $HOME/.cache/pip
python:
- "3.5"
- "3.6"
- "3.7"
install:
- docker run --privileged -d -p 9432:9432 --name bblfshd bblfsh/bblfshd:$BBLFSHD_VERSION
- docker exec bblfshd bblfshctl driver install bblfsh/python-driver:$BBLFSH_PYTHON_VERSION
- wget https://github.com/bblfsh/client-python/releases/download/v2.2.1/protobuf-python_3.4.1-1_amd64.deb
- sudo dpkg -i protobuf-python_3.4.1-1_amd64.deb
- pip3 install --upgrade pip
- pip3 install -r requirements.txt
- python3 setup.py --getdeps --log
- pip3 install . --upgrade
- cd bblfsh && python3 -m unittest discover
- if [[ -z "$TRAVIS_TAG" ]]; then exit 0; fi
- if [[ $TRAVIS_PYTHON_VERSION != '3.6' ]]; then exit 0; fi # disable double uploads to pypi
- echo "[distutils]" > .pypirc
@@ -28,6 +32,5 @@ install:
- HOME=. python setup.py sdist upload
script:
- python3 setup.py build_ext -i
- python3 -m unittest discover .
notifications:
  email: false
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -7,5 +7,5 @@ include Makefile
include github.com/gogo/protobuf/gogoproto/gogo.proto
include gopkg.in/bblfsh/sdk.v1/protocol/generated.proto
include gopkg.in/bblfsh/sdk.v1/uast/generated.proto
include bblfsh/memtracker.h
include bblfsh/libuast/libuast.hpp
prune bblfsh/libuast
60 changes: 45 additions & 15 deletions README.md
@@ -18,16 +18,18 @@ pip install bblfsh
```bash
git clone https://github.com/bblfsh/client-python.git
cd client-python
pip install -r requirements.txt
python setup.py --getdeps
python setup.py install
# or: pip install .
```

### Dependencies

You need to install `libxml2` and its header files. You will also need the `curl` CLI tool to download `libuast`, and `g++` for building the [libuast Python bindings](https://github.com/bblfsh/client-python/blob/0037d762563ab49b3daac8a7577f7103a5628fc6/setup.py#L17).
You will also need the `curl` CLI tool to download `libuast`, and `g++` for building the [libuast Python bindings](https://github.com/bblfsh/client-python/blob/0037d762563ab49b3daac8a7577f7103a5628fc6/setup.py#L17).
The command for Debian and derived distributions would be:

```bash
sudo apt install libxml2-dev
sudo apt install curl
sudo apt install build-essential
```
@@ -49,21 +51,49 @@ Please, read the [getting started](https://doc.bblf.sh/using-babelfish/getting-s
import bblfsh

client = bblfsh.BblfshClient("0.0.0.0:9432")
uast = client.parse("/path/to/file.py").uast
print(uast)
# "filter" allows you to use XPath queries to filter on result nodes:
print(bblfsh.filter(uast, "//Import[@roleImport and @roleDeclaration]//alias"))

# filter\_[bool|string|number] must be used when using XPath functions returning
# these types:
print(bblfsh.filter_bool(uast, "boolean(//*[@strtOffset or @endOffset])"))
print(bblfsh.filter_string(uast, "name(//*[1])"))
print(bblfsh.filter_number(uast, "count(//*)"))
ctx = client.parse("/path/to/file.py")
print(ctx)
# or to get the results in a dictionary:
resdict = ctx.get_all()

# You can also iterate on several tree iteration orders:
it = bblfsh.iterator(uast, bblfsh.TreeOrder.PRE_ORDER)
# "filter" allows you to use XPath queries to filter on result nodes:
it = ctx.filter("//python:Call")
for node in it:
    print(node.internal_type)
    print(node)
    # or:
    doSomething(node.get())

# filter must be used when using XPath functions returning these types:
# XPath queries can return different types (dicts, int, float, bool or str).
# Calling get() on an item returns the right type, but if you must ensure
# that you are getting the expected type (to avoid errors in the queries) there
# are alternative typed versions:
x = next(ctx.filter("boolean(//*[@strtOffset or @endOffset])")).get_bool()
y = next(ctx.filter("name(//*[1])")).get_str()
z = next(ctx.filter("count(//*)")).get_int() # or get_float()
Review comment (Contributor):

This is not very Pythonic. IDK if it is hard to guess the type, but the following is more Pythonic:

x = next(ctx.filter("boolean(//*[@strtOffset or @endOffset])").iterate())
for name in ctx.filter("name(//*[1])").iterate():
    print(name)

Where iterate() would be an unordered iterator over the resulting values. Or name it results(). Or remove it and iterate directly, anyway.

Reply from @juanjux (Contributor), Dec 18, 2018:

Those are typed functions for queries that return boolean/string/integer/float values instead of nodes. The "typed get" is there so that the user doesn't have to run isinstance on the results every time to check that they're right (the normal get() is still available if preferred).
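
The trade-off in this thread can be sketched without the client at all. The `Result` class below is hypothetical and is not the bblfsh API; it only illustrates how a typed getter turns an unexpected result type into an immediate error, instead of leaving the `isinstance` check to every caller.

```python
class Result:
    """Hypothetical wrapper around one query result (not the bblfsh API)."""

    def __init__(self, value):
        self._value = value

    def get(self):
        # Untyped access: the caller receives whatever the query produced.
        return self._value

    def _get_typed(self, expected):
        # bool is a subclass of int in Python, so reject it explicitly
        # when a plain int is expected.
        if expected is int and isinstance(self._value, bool):
            raise TypeError("expected int, got bool")
        if not isinstance(self._value, expected):
            raise TypeError("expected {}, got {}".format(
                expected.__name__, type(self._value).__name__))
        return self._value

    def get_bool(self):
        return self._get_typed(bool)

    def get_str(self):
        return self._get_typed(str)

    def get_int(self):
        return self._get_typed(int)


count = Result(42)
print(count.get_int())  # the value, checked to be an int
try:
    count.get_str()     # wrong type: fails here, not later in the caller
except TypeError as exc:
    print("query returned an unexpected type:", exc)
```

With the untyped `get()`, the same mistake would only surface wherever the value is first used as a string.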


# You can also iterate using iteration orders different than the
# default preorder using the `iterate` method on `parse` result or node objects:

# Directly over parse results
it = client.parse("/path/to/file.py").iterate(bblfsh.TreeOrder.POST_ORDER)
for i in it: ...

# Over filter results (which by default are already iterators with PRE_ORDER):
ctx = client.parse("file.py")
newiter = ctx.filter("//python:Call").iterate(bblfsh.TreeOrder.LEVEL_ORDER)
for i in newiter: ...

# Over individual node objects to change the iteration order of
# a specific subtree:
ctx = client.parse("file.py")
first_node = next(ctx)
newiter = first_node.iterate(bblfsh.TreeOrder.POSITION_ORDER)
for i in newiter: ...

# You can also get the non semantic UAST or native AST:
ctx = client.parse("file.py", mode=bblfsh.ModeDict["NATIVE"])
Review comment (Contributor):

We should use an enum class or define separate constants instead of raw strings.

Reply (Contributor):

It's mapping the protocol_v2_module.Mode.DESCRIPTOR.values_by_name field. Maybe we could generate variable names from these strings using eval or exec but that's almost worse.
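
One way to get an enum class without `eval` or `exec` is the functional API of the standard `enum` module, which builds members from a name-to-number mapping at import time. The sketch below uses a hard-coded dict (the hypothetical `MODE_VALUES`, filled with the mode names and numbers noted in this PR's `aliases.py`) as a stand-in for `Mode.DESCRIPTOR.values_by_name`; it is an illustration, not what the PR merged.

```python
import enum

# Stand-in for:
#   {k: v.number for k, v in Mode.DESCRIPTOR.values_by_name.items()}
MODE_VALUES = {
    "DEFAULT_MODE": 0,
    "NATIVE": 1,
    "PREPROCESSED": 2,
    "ANNOTATED": 4,
    "SEMANTIC": 8,
}

# The functional API accepts a mapping of member names to values.
Modes = enum.IntEnum("Modes", MODE_VALUES)

print(Modes.NATIVE.name, Modes.NATIVE.value)
print(Modes["ANNOTATED"].value)   # lookup by name
print(Modes(8).name)              # lookup by number
print([m.name for m in Modes])    # members iterate in definition order
```

Because `IntEnum` members compare equal to plain integers, such a class should be usable wherever a raw mode number is expected.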

# Possible values for ModeDict: DEFAULT_MODE, NATIVE, PREPROCESSED, ANNOTATED, SEMANTIC
```
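
For readers unfamiliar with the `TreeOrder` names, the following self-contained sketch shows what pre-order, post-order and level-order mean on a small tree. It uses plain `(label, children)` tuples rather than UAST nodes and does not call the client; the tree and the function names are illustrative assumptions.

```python
from collections import deque

# A tiny tree made of (label, [children]) tuples.
TREE = ("Module", [
    ("Import", [("alias", [])]),
    ("Call", [("Name", []), ("Arg", [])]),
])


def pre_order(node):
    """Visit a node before its children."""
    label, children = node
    yield label
    for child in children:
        yield from pre_order(child)


def post_order(node):
    """Visit a node after all of its children."""
    label, children = node
    for child in children:
        yield from post_order(child)
    yield label


def level_order(node):
    """Visit nodes breadth-first, one depth level at a time."""
    queue = deque([node])
    while queue:
        label, children = queue.popleft()
        yield label
        queue.extend(children)


print(list(pre_order(TREE)))
# ['Module', 'Import', 'alias', 'Call', 'Name', 'Arg']
print(list(post_order(TREE)))
# ['alias', 'Import', 'Name', 'Arg', 'Call', 'Module']
print(list(level_order(TREE)))
# ['Module', 'Import', 'Call', 'alias', 'Name', 'Arg']
```

`POSITION_ORDER` is left out because it depends on source positions, which this toy tree does not carry.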

Please read the [Babelfish clients](https://doc.bblf.sh/using-babelfish/clients.html)
23 changes: 8 additions & 15 deletions bblfsh/__init__.py
@@ -1,33 +1,26 @@
from bblfsh.client import BblfshClient
from bblfsh.pyuast import filter, filter_bool, filter_number, filter_string, iterator
from bblfsh.pyuast import decode, iterator, uast
from bblfsh.tree_order import TreeOrder
from bblfsh.aliases import *

class TreeOrder:
    PRE_ORDER = 0
    POST_ORDER = 1
    LEVEL_ORDER = 2
    POSITION_ORDER = 3

# "in" is a reserved keyword in Python thus can't be used as package name, so
# we import by string

class RoleSearchException(Exception):
    pass


def role_id(role_name: str) -> int:
def role_id(rname: str) -> int:
    try:
        name = DESCRIPTOR.enum_types_by_name["Role"].values_by_name[role_name].number
        name = DESCRIPTOR.enum_types_by_name["Role"].values_by_name[rname].number
    except KeyError:
        raise RoleSearchException("Role with name '{}' not found".format(role_name))
        raise RoleSearchException("Role with name '{}' not found".format(rname))

    return name


def role_name(role_id: int) -> str:
def role_name(rid: int) -> str:
    try:
        id_ = DESCRIPTOR.enum_types_by_name["Role"].values_by_number[role_id].name
        id_ = DESCRIPTOR.enum_types_by_name["Role"].values_by_number[rid].name
    except KeyError:
        raise RoleSearchException("Role with ID '{}' not found".format(role_id))
        raise RoleSearchException("Role with ID '{}' not found".format(rid))

    return id_
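
The parameter rename in this file removes a shadowing hazard: `role_id` used to take an argument called `role_name`, and `role_name` one called `role_id`, so each function hid its sibling inside its own body. A minimal sketch of the same two-way lookup follows. It assumes a small made-up role table in place of the protobuf `DESCRIPTOR`; the names and numbers in `_ROLES_BY_NAME` are illustrative and not taken from the SDK.

```python
class RoleSearchException(Exception):
    pass


# Made-up role table; the real module reads these from the protobuf descriptor.
_ROLES_BY_NAME = {"INVALID": 0, "IDENTIFIER": 1, "QUALIFIED": 2}
_ROLES_BY_NUMBER = {number: name for name, number in _ROLES_BY_NAME.items()}


def role_id(rname: str) -> int:
    """Return the numeric ID for a role name."""
    try:
        return _ROLES_BY_NAME[rname]
    except KeyError:
        # "from None" hides the internal KeyError from the traceback.
        raise RoleSearchException(
            "Role with name '{}' not found".format(rname)) from None


def role_name(rid: int) -> str:
    """Return the role name for a numeric ID."""
    try:
        return _ROLES_BY_NUMBER[rid]
    except KeyError:
        raise RoleSearchException(
            "Role with ID '{}' not found".format(rid)) from None


# With distinct parameter names each function can call the other freely.
print(role_name(role_id("IDENTIFIER")))
try:
    role_id("NO_SUCH_ROLE")
except RoleSearchException as exc:
    print(exc)
```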
63 changes: 28 additions & 35 deletions bblfsh/__main__.py
@@ -1,69 +1,62 @@
import argparse
import pprint
import sys

import bblfsh
from bblfsh.pyuast import filter

from bblfsh.client import BblfshClient
from bblfsh.launcher import ensure_bblfsh_is_running


def setup():
def setup() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Query for a UAST to Babelfish and dump it to stdout."
    )
    parser.add_argument("-e", "--endpoint", default="0.0.0.0:9432",
                        help="bblfsh gRPC endpoint.")
                        help="bblfsh gRPC endpoint.", type=str)
    parser.add_argument("-f", "--file", required=True,
                        help="File to parse.")
                        help="File to parse.", type=str)
    parser.add_argument("-l", "--language", default=None,
                        help="File's language. The default is to autodetect.")
                        help="File's language. The default is to autodetect.", type=str)
    parser.add_argument("--disable-bblfsh-autorun", action="store_true",
                        help="Do not automatically launch Babelfish server "
                             "if it is not running.")

    parser.add_argument("-q", "--query", default="", help="xpath query")
    parser.add_argument("-m", "--mapn", default="", help="transform function of the results (n)")
    parser.add_argument("-a", "--array", help='print results as an array', action='store_true')
    parser.add_argument("-q", "--query", default="", help="xpath query", type=str)
    parser.add_argument("-a", "--array", help='print results as a parseable Python array', action='store_true')

    args = parser.parse_args()
    return args
    return parser.parse_args()

def run_query(root: bblfsh.Node, query: str, mapn: str, as_array: bool) -> None:
    result = list(filter(root, query))

    if not result:
def run_query(uast, query: str, array: bool) -> None:
    result_iter = uast.filter(query)
    if not result_iter:
        print("Nothing found")

    else:
        if mapn:
            result = [eval(mapn) for n in result]
        result_list = [x.load() for x in result_iter]

        if as_array:
            print("results[{}] = {}".format(len(result), result))
        else:
            print("Running xpath query: {}".format(query))
            print("FOUND {} roots".format(len(result)))
        if array:
            pprint.pprint(result_list)
        else:
            print("%d Results:" % len(result_list))
            for i, node in enumerate(result_list):
                print("== {} ==================================".format(i+1))
                print(node)

            for i, node in enumerate(result):
                print("== {} ==================================".format(i+1))
                print(node)

def main():
def main() -> int:
    args = setup()
    if not args.disable_bblfsh_autorun:
        ensure_bblfsh_is_running()

    client = BblfshClient(args.endpoint)
    response = client.parse(args.file, args.language)
    root = response.uast
    if len(response.errors):
        sys.stderr.write("\n".join(response.errors) + "\n")
    query = args.query
    if query:
        run_query(root, query, args.mapn, args.array)
    ctx = client.parse(args.file, args.language)

    if args.query:
        run_query(ctx, args.query, array=args.array)
    else:
        print(root)
        pprint.pprint(ctx.load())

    return 0


if __name__ == "__main__":
    sys.exit(main())
56 changes: 34 additions & 22 deletions bblfsh/aliases.py
@@ -1,27 +1,39 @@
__all__ = ["DESCRIPTOR", "Node", "Position", "ParseResponse", "NativeParseResponse",
"ParseRequest", "NativeParseRequest", "VersionRequest", "ProtocolServiceStub"]

import importlib

from bblfsh.sdkversion import VERSION
import google

# "in" is a reserved keyword in Python thus can't be used as package name, so
# we import by string
uast_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.%s.uast.generated_pb2" % VERSION)
protocol_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.%s.protocol.generated_pb2" % VERSION)
protocol_grpc_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.%s.protocol.generated_pb2_grpc" % VERSION)
uast_v2_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.v2.uast.generated_pb2")
protocol_v2_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.v2.protocol.generated_pb2")
protocol_grpc_v2_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.v2.protocol.generated_pb2_grpc")
protocol_v1_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.v1.protocol.generated_pb2")
protocol_grpc_v1_module = importlib.import_module(
"bblfsh.gopkg.in.bblfsh.sdk.v1.protocol.generated_pb2_grpc")

DESCRIPTOR = uast_v2_module.DESCRIPTOR
ParseRequest = protocol_v2_module.ParseRequest
ParseResponse = protocol_v2_module.ParseResponse
ParseError = protocol_v2_module.ParseError
Mode = protocol_v2_module.Mode
ModeType = google.protobuf.internal.enum_type_wrapper.EnumTypeWrapper


class Modes:
    pass

# Current values: {'DEFAULT_MODE': 0, 'NATIVE': 1, 'PREPROCESSED': 2, 'ANNOTATED': 4, 'SEMANTIC': 8}
for k, v in Mode.DESCRIPTOR.values_by_name.items():
    setattr(Modes, k, v.number)

DriverStub = protocol_grpc_v2_module.DriverStub
DriverServicer = protocol_grpc_v2_module.DriverServicer

DESCRIPTOR = uast_module.DESCRIPTOR
Node = uast_module.Node
Position = uast_module.Position
ParseResponse = protocol_module.ParseResponse
NativeParseResponse = protocol_module.NativeParseResponse
ParseRequest = protocol_module.ParseRequest
NativeParseRequest = protocol_module.NativeParseRequest
VersionRequest = protocol_module.VersionRequest
SupportedLanguagesRequest = protocol_module.SupportedLanguagesRequest
SupportedLanguagesResponse = protocol_module.SupportedLanguagesResponse
ProtocolServiceStub = protocol_grpc_module.ProtocolServiceStub
VersionRequest = protocol_v1_module.VersionRequest
VersionResponse = protocol_v1_module.VersionResponse
SupportedLanguagesRequest = protocol_v1_module.SupportedLanguagesRequest
SupportedLanguagesResponse = protocol_v1_module.SupportedLanguagesResponse
ProtocolServiceStub = protocol_grpc_v1_module.ProtocolServiceStub
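
The reason this module imports by string can be shown in isolation: a dotted `import` statement containing the keyword `in` is rejected by the parser, while `importlib.import_module` takes the path as an ordinary string. In the sketch below the package path `pkg.in.mod` is made up, and a standard-library module stands in for the generated protobuf modules.

```python
import importlib

# "in" is a keyword, so a dotted import that contains it does not compile.
try:
    compile("import pkg.in.mod", "<demo>", "exec")
except SyntaxError as exc:
    print("rejected by the parser:", exc.msg)

# Importing by string never goes through the statement parser.
# json.decoder stands in for the generated module here.
decoder = importlib.import_module("json.decoder")
print(decoder.JSONDecoder().decode('{"mode": 1}'))
```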