Define Ambiguity node #337

dennwc · 2018-12-05T20:22:19Z

This PR defines a new node type for SemUAST called Ambiguity.

This node will represent multiple alternative meanings of the same node in case driver cannot determine its type with a 100% certainty (without running the compiler or symbol resolution), but can propose a small number of alternative meanings.

The first use case is that languages like Python, Go and some others have built-in names like True/False that can be redefined. The driver might mark them as Identifier and be done with it, but this won't be that helpful for the end user. In fact, the driver knows that in this specific language the True identifier with 99% certainty is a boolean literal. So instead of dumping a generic Identifier, the driver may emit an Ambiguity node with both Bool and Identifier nodes.

The second use case is that sometimes operators might be overloaded in the language itself. I don't want to state that this will solve an issue of user-defined operator overloading, though. The current use case is a bit simpler for now. For example, in Python there is a % operator that can be a modulo operator or a string interpolation operator. The driver may emit both possible versions as an Ambiguity node if it cannot assert the type of the first operand.

Later, this may help to identify places in UAST that needs additional analysis. And it will immediately allow searching for nodes of more specific types in Babelfish instead of generic ones.

This change is

Signed-off-by: Denys Smirnov <denys@sourced.tech>

juanjux · 2018-12-06T09:25:12Z

While I think this is a nice feature to have, it was unplanned and the impact of this and the Bool PR is not trivial since would require to update/debug all drivers again so we should probably discuss not so much when to merge this (since it seems to be backward compatible until you start upgrading nodes) but when to upgrade the drivers.

I suspect having finished C# and Typescript drivers (plus issues that are starting to pile) have more priority than doing another pass over the existing ones, but this is something that @creachadair will have to talk with management.

I also have couple questions:

Imagine I have some annotation (for example Argument) that set some node that is, on a previous annotation, converted into another semantic node (for example, Identifier) that, with this, would be ambiguous so instead of just Identifier it could be Ambiguous:Identifier|String. Wont this break all those annotations? (Argument in this case).
If the answer to the previous question is "Yes", how would the new annotation work to express "This argument name is an Identifier or an Ambiguitiy node that could be an Identifier". We should strive to avoid making annotations/the DSL more complex than they already are so if there was some way in the SDK to get an Ambiguity node match any annotation that expect one of its possible values it would be great.

creachadair · 2018-12-06T17:04:23Z

While I think this is a nice feature to have, it was unplanned and the impact of this and the Bool PR is not trivial

If I understand the proposal correctly, this is effectively adding sum types to our abstract type system. That's not necessarily a bad idea—several languages have variations of sum-typing that we might want to incorporate anyway. However, I agree with @juanjux that it's a much larger design question, and we should consider it as a new feature rather than adding it in piecewise.

I tend to agree that short term we should probably be aiming for "breadth" of coverage (e.g., more languages) before "depth" of coverage (e.g., more detailed types). Both are valuable, but I think the immediate impact for our customers is higher if we go broad first.

dennwc · 2018-12-06T17:46:21Z

The aim of the proposal is to start a discussion about this specific problem. Just in case - I'm not proposing to update all the drivers :)

I also agree that extending language coverage should be the highest priority. This PR is more like a brain-dump of the problem I was thinking about in the background. If you think the proposal is reasonable - let's discuss it (same - in the background).

@juanjux Those are really good questions. First, this node type won't break annotation code, because Semantic pass happens before the Annotation pass and the last one is already prepared to deal with arbitrary Semantic types. And annotations won't ever target Semantic types directly, because it doesn't make sense to do so (Semantic types are "roles" themselves).
For the second question, I think you are referring to the normalization pass, not to the annotation pass. There is a way to make this "or" selector work, but I think it's unnecessary. Adding Ambiguity, by the definition, will happen as the last transform, and all other transforms will work on a "pure" tree.
And if you are thinking about XPath as well, we can solve it in a better way. Since we control the projection now, and projection knows the schema, we can "squash" the Ambiguity node and pretend that there are two alternative nodes in that XML tree. Queries will just work with both types of nodes in this case.

@creachadair I'm not sure that this is similar to sum types. Sum types assume that there is a set of defined types and the sum type may consist of few of them. And this is somehow expressed in AST/code/type system/whatever.
This change, in contrast, aims to describe a specific instance of an AST node and provides alternative interpretations for it. This won't happen automatically for specific node types.

Instead, the transformation code will search for specific patterns (x % y, for example, which is py:BinOp %) and will try to assert some condition (x is a string).
If the assertion is successful, the transform will replace the node with a specific UAST type (string interpolation in this case, if we imagine we define it).
And if the assertion cannot be made (x is an identifier, and the driver won't bother with flow analysis to find out the type), the transform will replace the node with an Ambiguity (either "modulo" or "string interpolation" node).

I think it's a useful change, in general. For most languages, we know 2-3 possible meanings of operators or some identifiers and we can easily be more specific than "unknown binop %" or "identifier true".

Denys Smirnov added 2 commits December 5, 2018 21:29

uast: properly load node arrays with registered types

474baf3

Signed-off-by: Denys Smirnov <denys@sourced.tech>

uast: define Ambiguity node

49571d7

Signed-off-by: Denys Smirnov <denys@sourced.tech>

dennwc self-assigned this Dec 5, 2018

dennwc requested review from creachadair, juanjux and bzz December 5, 2018 20:22

This was referenced Dec 5, 2018

Define Bool node (boolean literal) #338

Merged

Proposal: Allow DAGs in UAST #339

Open

dennwc mentioned this pull request Jan 14, 2019

uast: properly load node arrays with registered types #344

Merged

creachadair removed request for creachadair and juanjux February 20, 2022 16:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define Ambiguity node #337

Define Ambiguity node #337

dennwc commented Dec 5, 2018 •

edited by creachadair

Loading

juanjux commented Dec 6, 2018 •

edited

Loading

creachadair commented Dec 6, 2018

dennwc commented Dec 6, 2018

Define Ambiguity node #337

Are you sure you want to change the base?

Define Ambiguity node #337

Conversation

dennwc commented Dec 5, 2018 • edited by creachadair Loading

juanjux commented Dec 6, 2018 • edited Loading

creachadair commented Dec 6, 2018

dennwc commented Dec 6, 2018

dennwc commented Dec 5, 2018 •

edited by creachadair

Loading

juanjux commented Dec 6, 2018 •

edited

Loading