Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

xpath query is not validated #332

Open
kuba-- opened this issue Nov 28, 2018 · 16 comments
Open

xpath query is not validated #332

kuba-- opened this issue Nov 28, 2018 · 16 comments
Labels

Comments

@kuba--
Copy link
Member

kuba-- commented Nov 28, 2018

The issue is related to:

Generally SDK uses 3rd-party package (github.com/antchfx/xpath) to compile xpath queries.
Unfortunately lot of incorrect xpaths passes, e.g.:

    x, err = Compile("//*[@role=\"Return\"]")
    if err != nil {
        t.Fatalf("should be correct but got error %s", err)
    }
    t.Logf("%v\n", x)

    x, err = Compile("BAD")
    if err != nil {
        t.Fatalf("should be correct but got error %s", err)
    }
    t.Logf("%v\n", x)

    x, err = Compile("this is a totally crap. For sure not valid @xpath")
    if err != nil {
        t.Fatalf("should be correct but got error %s", err)
    }
    t.Logf("%v\n", x)
@dennwc dennwc added the bug label Nov 28, 2018
@dennwc
Copy link
Member

dennwc commented Nov 29, 2018

Yes, the library isn't ideal. We should fix it upstream.

But also there were few discussions that XPath might not be the best language going forward, so maybe we will just use a new language? Will need to decide at some point.

@kuba--
Copy link
Member Author

kuba-- commented Nov 29, 2018

Ok. I'm wondering if graphql would be a good choice here.

@dennwc
Copy link
Member

dennwc commented Nov 29, 2018

Real GraphQL needs a srits schema and we don't have one by design.

But I've implemented a GraphQL endpoint with dynamic schema for Cayley, so in practice it can be done.

The main problem is that GraphQL still cannot change the shape of a response. If you are querying multiple nested nodes, but only want few value extracted from them, you will end up requesting a skeleton of that subtree with few values in there. We need to check if there are any alternatives that can allow querying exactly what the user needs.

@smola
Copy link
Member

smola commented Dec 12, 2018

Keep in mind that we do support, recommend and demo XPath today. Feel free to open a separate issue and start discussing proposals for the future, but this is a real issue today.

@juanjux
Copy link
Contributor

juanjux commented Dec 12, 2018

Since the errors on the test code always say the same, so it's not clear to me what of the examples are supossed to be right or wrong, but I've tested with an online XPath validator. I've also tested using the client-python v3 (which uses the SDK and thus the library).

  1. The first example works on the web (and produce results) even if it uses double quotes. It also works with the Python client.
  2. The second example doesn't produce any error so it seems to be valid XPath. Doesn't fail either with the client, just produce empty results on both places.
  3. Third example doesn't validate with the web but doesn't produce any error with the client, so this one really seems to be a bug.

@bzz
Copy link
Contributor

bzz commented Jan 9, 2019

@kuba-- @ajnavarro if you guys have any thoughts on results of @juanjux preliminary research - please feel free to share

@kuba--
Copy link
Member Author

kuba-- commented Jan 10, 2019

For me it's all about optimization. If we can reject wrong query earlier instead of waiting till the end to say "no results" (because query is no valid) then it's better. In other words we'll take what LA team give us ;) The open question is - do we wanna change 3rd library to parse xpath, or stick to what we use so far?
@ajnavarro - correct me if I'm wrong.

@ajnavarro
Copy link

@kuba-- totally agree.

I think the problem here is that almost any string is a valid XPath query. Maybe we should add some extra validation for our use case, like the supported node names. Per example, if the user tries to query //uast:Idenitfier instead of //uast:Identifier, in my opinion, we should throw an error before executing the query.

@kuba--
Copy link
Member Author

kuba-- commented Jan 14, 2019

We may introduce waterfall of validators. XPath library should validate just xpath. So, bblfsh will be sure that got valid xpath and can focus on validating own syntax and reject e.g. non-uast queries:
//ast:Paradigm

@dennwc
Copy link
Member

dennwc commented Jan 14, 2019

This will require us to have a strict schema exposed to XPath. And before even trying this we will need to refactor XPath library to allow such hooks to be installed.

@kuba--
Copy link
Member Author

kuba-- commented Jan 14, 2019

No worries, I can help ;)
But frankly, this is just a wish. Actually we already have a kind of schema, but not written down, to let us generate validators. It's like having yacc file for our xpath.
I believe, any syntax tree cannot be universal if it doesn't have universal validators. It's like a programming language where in theory you can use any syntax but sometimes it will return something and sometimes not. It should't work like this. If you use fn instead of func it should tell you that this is wrong, even if both cases it's a valid literal.

@dennwc
Copy link
Member

dennwc commented Jan 14, 2019

In general, I think it's a good idea, but we are not ready for it right now.

For example "our" XPath is well defined for Semantic nodes (uast:*), but we know nearly nothing about nodes unique to each programming language (cpp:*). I have plans to derive this schema later, but for now, it's only possible to assert for uast:-related queries.

@kuba--
Copy link
Member Author

kuba-- commented Jan 14, 2019

So let's do it in "baby steps". Personally, I prefer many small features/tools which we can add one by one instead of one big 🎉 effect.

@dennwc
Copy link
Member

dennwc commented Jan 14, 2019

Totally agree! We can start by making an XPath library better. It's a bit messy right now.

@kuba--
Copy link
Member Author

kuba-- commented Mar 22, 2019

Talking about "baby steps", how about changing the current xpath library (github.com/antchfx/xpath) to the gopkg.in/xmlpath.v2?

At least in following test it can find the last invalid xpath:

package main

import (
	. "gopkg.in/xmlpath.v2"
	// . "github.com/antchfx/xpath"
)
func main() {
	MustCompile("//*[@role=\"Return\"]")
	MustCompile("BAD")

	MustCompile("this is a totally crap. For sure not valid @xpath") // <----
}

@dennwc
Copy link
Member

dennwc commented Mar 22, 2019

I checked the library briefly, it looks better but has one downside: it cannot work with virtual nodes, only with full XML tree.

Still, it may be an option to fork the library and make it work with some node interface instead of a specific type.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants