Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alma's (and Raku's) quasi scoping is significantly different from Scheme's/Lisp's #567

Closed
masak opened this issue Sep 17, 2021 · 16 comments
Closed

Comments

@masak
Copy link
Owner

masak commented Sep 17, 2021

(This issue is a pretext to say hi to @vendethiel. Hi! 👋)

(Issue already resolved and will be closed as effectively WONTFIX in a week or so from its creation. Discussion still encouraged.)

To summarize:

  • In Lisp and Scheme, even when things are nice and lexically scoped, the notion of scope actualizes very late, at evaluation-time. Specifically, it does not exist as part of read-time.

  • In Raku and Alma, scoping is the concern of the parser. That is, the parser is interested not just in parsing a variable/identifier, but also in making sure that it's bound to something in scope.

To take the smallest possible example that highlights this difference:

  • The string "foo" reads just fine into a symbol foo in any Lisp or Scheme, but when you try to evaluate the result, you get a lookup error, because the environment doesn't bind foo.

  • The string "$foo" doesn't parse in Raku, and the string "foo" doesn't parse in Alma. Both parsers will throw X::Undeclared. You would get AST nodes corresponding to a variable lookup if you didn't get the error — but that's a bit like saying that you'd have been in time for work if you weren't in jail for going twice the speed limit.

Maybe one way to say this is that the Raku/Alma parsers emit not just an AST, but a scoped AST. This is not an established term at all, at least I don't think so, but it gets the point across. Raku and Alma's parsers emit scoped ASTs, which in practice means that they have already done an initial preprocessing of the AST, a kind of abstract interpretation or partial evaluation where all the static lookups have already been made.

(As I was typing out this issue, I also learned about abstract binding trees (Edit: changed link), which seems to be a different name for scoped ASTs. It's in Bob Harper's book Practical Foundations for Programming Languages, which I have not read. But it's good to know someone else has had the same idea.) (Edit: There's some concrete implementation information about ABTs in this PFPL supplement, section 3, "Elaboration" (which seems to mean "make an ABT from an AST").)

What's more, this difference "spreads" to quasis. In Raku/Alma, quasis are closures — quite orthogonally from their role as a data encoding of source code. This flows quite naturally from the above design decisions.

The same is not true in Lisp/Scheme. That is to say, a quasiquote containing foo is valid Lisp/Scheme, no matter whether foo was defined or not in the context surrounding the quasiquote. In fact, pointing to the surrounding context feels a little bit random and unrelated! Meanwhile, in Raku, quasi { $foo } is definitely a compile-time error if $foo was not defined outside the quasi; and in Alma, ditto quasi { foo }.

Quoth S06:

Within a quasiquote, variable and function names resolve according
to the lexical scope of the macro definition.  Unrecognized symbols raise
errors when the macro is being compiled, I<not> when it's being used.

"Resolve" is a nice word — it's like a runtime lookup, except that it's static, so what we're looking up isn't a value, it's more like a binder/definition site. A quasi does not have to occur in a macro definition (just like a when doesn't have to occur in a given), but we get the point being made.

Also S06:

Use of a macro argument in a quasiquote without unquoting should provide a
warning, as this is very likely to be an error.

    # Oops; size of the AST of the argument
    macro mouse ($arg) { quasi { $arg.elems } }

I'm not sure I agree that it should provide a warning. "Very likely" is not the same as "definitely"; see here for an example in Alma where I do this quite deliberately, and wouldn't want the warning. (Spec still has a point; it's likely a mistake and a kind of "level mixup". I do that all the time. But Raku as a language tends to be forgiving rather than forbidding in cases of late-bound stuff like this — cf. method calls that are guaranteed to fail at runtime, or return types on routines that conflict with the actual returned value; all of them late-bound and late errors.)

Also, an interesting bit from S06:

Quasiquotes default to hygienic lexical scoping, just like closures.
The visibility of lexical variables is limited to the quasi expression
by default.

I don't know in which sense "hygienic" is invoked here, but "lexical scoping" is easy to interpret. This is the phrase that means that quasis have their own block scope, as has been discussed in #30 (somewhere...).

Anyway, I find it interesting — Lisp and Scheme choose one "obvious" design alternative, where the object code and the meta code don't share any lexical scoping. Raku and Alma choose the other path: they're nested lexically, because that's what it looks like they should be doing.

I wonder if a hidden kind of prior art is the way eval works in Perl 5 — there, we also get lexical-lookup errors at compile time.

(Later edit: I also wonder if it's kind of a timing issue, with lexical scoping itself. Quotation is from the very beginnings of Lisp; quasiquotation from the, uh, mid-70s, and lexical scoping got going somewhere around that time as well. It's possible quasiquotation (a) was mostly based on quoting, which doesn't have a notion of lexical scope at all, and (b) was also early enough to be mostly outside of the light-cone of lexical scoping, so it didn't occur to people as an idea at all. This is all assuming that it even needs an explanation in the Lisp world — when you quote something, it's data. It gets read but not processed or otherwise given any meaning before being evaluated.)

What's your view on this, @vendethiel? I know you've hinted at this difference a few times. I think I finally understand it better now. It's quite an obvious difference, even though neither Scheme nor Raku seems to call it out as one.

@masak
Copy link
Owner Author

masak commented Sep 18, 2021

Heh, vendethiel's view on this is apparently the "shifty eyes" emoji. Fair enough.

@masak
Copy link
Owner Author

masak commented Sep 18, 2021

Five years ago I kvetched about the unquotes in Raku not being able to "nest" the way they can in Lisp. That is, you can nest quasis just fine, but there's no such thing as a nested unquote.

I just realized two things:

  • It doesn't really detract from the expressive power of quasis and unquotes, because you can always rewrite No, that's not it. I still don't have a working intuition for nested unquotes in the setting Alma establishes. I understand them pretty well in Lisp/Scheme! I'll get back to this point when I've thought it through more.
  • Even though Raku/Alma have a smaller ability to reach "outer" names due to nested unquotes not being a thing, they also have a greater ability to reach outer names due to the natural lexical scoping outlined in this issue, which Raku/Alma quasis have but Lisp/Scheme quasiquotes don't.

@masak
Copy link
Owner Author

masak commented Sep 18, 2021

Also, looking with fresh eyes at @vendethiel's cont-defun example, I see that unquote doesn't suffer from the "no nested unquotes" limitation the way {{{ }}} does. Five years is too long ago, so I'm not sure how well I realized that back then. I did write that I didn't see why we should stick to the {{{ }}} syntax, but I don't know if that also meant that I saw that unquote was less limited.

macro cont-defun(Q::Identifier $name, Q::ParamList $parms, Q::Block $body) {
  my $new-name = $name.scope.declare("cont-$name.bare()");
  quasi {
    macro unquote(Q::Identifier @ $name)(unquote(Q::ParamList @ $parms)) { # should the ParamList unquote be bare, without parenthesis?
      quasi {  # an unquote inside of another one
        my $f = unquote(Q::Identifier @ unquote(Q::Identifier @ $new-name)); # ... just to make this more readable
        $f(unquote(Q::ParamList @ unquote(Q::ParamList @ $parms))); # same q. for ParamList
      };
    }; # end of inner macro
    sub unquote(Q::Identifier @ $new-name)(unquote(Q::ParamList @ $parms))
      unquote-unhygienic(Q::Block @ $block); # mmh... needs that "unhygienic" so that `$parms` are visible.
  };
}

Going to open up a separate issue about adding a nested unquote to the test suite, just to have it in there. That's going to require the unquote syntax, but... we might be able to hack that in.

@vendethiel
Copy link
Collaborator

vendethiel commented Sep 18, 2021

Heh, vendethiel's view on this is apparently the "shifty eyes" emoji. Fair enough.

More so "I need some rest before looking at this properly" :-). (Which means of course that four and a half in the morning is a fitting time to answer... Also, it's been too long since we last talked!).

It does resonate with me, though. I think this was my biggest hurdle when trying to understand Raku macros.

I think the reason there's not that much to read on the difference is because macros have mostly been implemented in late-bound languages.

Even though Raku/Alma have a smaller ability to reach "outer" names due to nested unquotes not being a thing, they also have a greater ability to reach outer names due to the natural lexical scoping outlined in this issue, which Raku/Alma quasis have but Lisp/Scheme quasiquotes don't.

Non-hygienic lisps and hygienic lisps have wildly different stories here. The ones that are hygienic have a smaller ability to reach "outer" names like Raku/Alma, but it might also mean they require less unquoting (scheme templates).

@masak
Copy link
Owner Author

masak commented Sep 18, 2021

Also, it's been too long since we last talked!

Yes. 🎉👯‍♂️🚀 Hi! 🦄😎🤗

I think I "burned out" there on Alma for a bit — a lot of the obvious next steps with the language are big huge undertakings the size of #242. (They are listed here, by the way.) I am still on a bit of a quest to Make Bel Go Fast, but hoping to be able to put that aside and come back to Alma's Phase II sometime Soon™. I haven't forgotten the walker we started building — I think that one is not far from being practically useful.

I think the reason there's not that much to read on the difference is because macros have mostly been implemented in late-bound languages.

This is something I too have understood better lately. The whole subfield of Denotational Semantics seems to be about researchers asking "how does it work?" and then answering that question by implementing an interpreter for the semantics.

Much of my $dayjob has been focused around the question of "what can we shift left?". That is, which late-bound things are possible to bind early without changing the semantics. This pulls together such disparate topics as compiler errors/warnings, IDE errors/warnings, abstract interpretation, partial evaluation, staged compilation, and heck, probably Futamura's Three Ghosts of Christmas as well. The question seems to amount to "if this is the interpreter, what would be the compiler?". (This is also the question that drives the work on the Bel compiler.)

Raku/Alma's "static resolution" mechanism is the left-shiftable part of regular runtime lookup. The "lexical" of "lexical scoping" already hints that this is possible: that there is a part of the scoping that's evident from the program text itself, and hence early/static.

Non-hygienic lisps and hygienic lisps have wildly different stories here. The ones that are hygienic have a smaller ability to reach "outer" names like Raku/Alma, but it might also mean they require less unquoting (scheme templates).

I still need to understand this better. At least lately I've actually been starting to read and process/classify all the macro/hygiene papers I've collected over the years. It's brain-bashingly hard, but also rewarding.

four and a half in the morning

🙀

@masak
Copy link
Owner Author

masak commented Sep 18, 2021

  • It doesn't really detract from the expressive power of quasis and unquotes, because you can always rewrite No, that's not it. I still don't have a working intuition for nested unquotes in the setting Alma establishes. I understand them pretty well in Lisp/Scheme! I'll get back to this point when I've thought it through more.

TimToady's restriction: if an outer quasi < ... > contains an inner quasi ⟦ ... ⟧, then you can use ⟦⟦⟦ ... ⟧⟧⟧ unquotes but not <<< ... >>> unquotes in the inner quasi. The reason for this is that even though the quasis nest, the parser is only ever "in" at most one quasi at a time, keeping track of its stopper and its unquote syntax. Inner quasis do not syntactically have the power to use outer unquotes.

The quasis-are-lambdas view: A quasi represents "an AST with holes", but the concept of holes is fundamentally a negative-space concept — instead of thinking of holes as a Qtype, we should think of the whole quasi as being a lambda-expression/anonymous function for building a complete AST where the holes correspond to parameters. Getting a concrete AST from the quasi happens by invoking the lambda-expression, passing an argument (the expression inside of an unquote) for each parameter.

The lambda-expression approach should be read as a conceptual abstraction — on the implementation level, the quasi can be code-generated into something like an anonymous function. But it's not first-class and cannot be passed around or otherwise observed from userspace — any attempt to evaluate the quasi just returns the result of calling the anonymous function. A lambda-expression is sufficient but not necessary, I guess. I take this to be what S04 talks about with "When is a closure not a closure".

All this skirts around the question of what the AST representation of a quasi should be. As metaprogrammers, we are morbidly interested in having an AST representation of code in general and quasis in particular, to analyze and transform. Saying that unquotes are a non-Qtype then constitutes a bit of a non-answer, or at least an unhelpful/unsatisfactory answer.

Assume we have this Alma code (with unquote() so that we can talk about nested unquotes):

macro moo(x) {
    return quasi {
        quasi {
            unquote(unquote(x))
        }
    };
}

macro zar(y) {
    my qua = moo(y);
    return qua;
}

say( zar("OH HAI") );

At compile-time zar gets invoked with y bound to "OH HAI", then moo gets invoked with x bound to "OH HAI", then... moo returns an AST that (somehow) represents the inner quasi, the one with the unquote-unquote.

So moo acts to "peel off" the outer layer of quasi, and then zar acts to peel off the inner layer. That's not super-precise; a quote, just by evaluating, already collapses down to an AST. It masquerades as code, but its value is actually data. The macro expansion is responsible for "up-leveling" that data to code, by expanding it into (in this case) the my qua rhs and the say argument.

I think unquote(unquote(x)) makes sense to me now. Consider what just unquote(x) would give in this context: it would give the AST passed to moo from zar — that's an AST looking like what quasi { y } would give us at that point. (Macro arguments get "auto-quoted".) Then that would also be what say would print in the end: a representation of that AST. The doubled unquote, then, is because we are dealing with a double layer of ASTs to unpack.

This is how far my insufficient brain will carry me. What I can't get it to answer is — what does this all mean if quasis are lambdas? The quasi in a quasi would be a lambda in a lambda, surely. But the double unquote construction corresponds to... what? Some kind of double evaluation, except... the macro isn't really evaluating things, it's just arranging for code to be evaluated by splicing together ASTs from different places. What's the AST-splicing equivalent of "insert not the AST itself here, but the AST that comes from evaluating this AST"?

This comment ran away from me a bit. I was going to end up with a demonstration of how unquote(unquote(x)) doesn't even make any sense in Raku/Alma's view of scoping — and how that has some nice connection back to TimToady's restriction. But instead unquote(unquote(x)) seems to have some kind of reasonable semantics associated with it... it's just that I can't imagine how it's be expressed.

@masak
Copy link
Owner Author

masak commented Sep 24, 2021

Closing.

Maybe the summary is this: quasiquotes in Scheme are more "quote", and those in Alma/Raku are more "quasi".

@masak masak closed this as completed Sep 24, 2021
@masak
Copy link
Owner Author

masak commented Oct 13, 2021

I just learned from this blog post that the whole "code objects are closures" thing is discussed in Quasiquotation in Lisp (1999) by Alan Bawden.

Any practical system for staged computation will probably have the property Taha and Sheard call “cross-stage persistence.” A system has cross-stage persistence if variables bound in earlier stages can still be referenced in later stages. […] One of the reasons our first version of make-list-proc failed was because Lisp’s quasiquote lacks cross-stage-persistence. […] With cross-stage persistence, variables in constructed code can reference variables from the lexically enclosing environment. In such a system the code objects returned by a quasiquotation must be more than a simple S-expression-like data structure; in order to capture the lexical environment where the quasiquote expression was written, code objects must be closures.

@vendethiel
Copy link
Collaborator

I think unquote(unquote(x)) makes sense to me now.

It's starting to make less and less sense to me, in the context of Alma and Raku.
I've had that sentiment since I first read your message, but I wasn't sure how to formulate it, your last quote helped me find a term I lacked.

"unquote" is about splicing an element in, sure, but if it were only that we could get away with a (eval ast) (or (melt ast) like we had previously).

I think unquote is actually more about reaching across a different stage. Since there's no "cross stage persistence", you can't just type x and expect it to refer to the correct symbol. This is obviously directly tied to hygiene (there's not much more info required for cross stage persistence than there is for hygiene, I'd say probably less).

@masak
Copy link
Owner Author

masak commented Oct 14, 2021

@vendethiel How marvellous — I had thought about this too, and come to diametrically opposite conclusions.

Very briefly, because I don't have the headspace right now:

  • But Raku and Alma do have "cross stage persistence", do they not? I mean, that's what this whole thing is about, macros running at compile time (stage 0) and binding values which can then be consumed at runtime (stage 1). And Alan Bawden's "code objects must be closures" rings very familiar (I had this insight at a hotel breakfast in Copenhagen in ~2013). I have chosen the implementation strategy "detached variables do direct lookup", but I think it's in some sense isomorphic to what Bawden is saying.

  • I realized another thing about ,,x — that is, unquoting something twice. Let's say we're in two quasi blocks, like so: quasi < quasi ⟦ ... ⟧ >. The ,,x Lisp syntax would correspond to ⟦⟦⟦ <<< $x >>> ⟧⟧⟧ in Raku. This is so brain-bendy to think about that I wrote it inside-out on my first attempt. Rationale: first we need to unwrap the inner quasi ⟦ ... ⟧ (ending up in the "syntactic context" outside it but still inside the outer one), and then we need to unwrap the outer quasi < ... >. I use "first" and "then" very provocatively, and it's important to remember that we are in dialogue with the parser (whose notion of order is left-to-right), not the evaluator (whose notion of order is bottom-up/innermost-first). All in all, it now seems to me that TimToady was right this whole time — it just took me ~8 years to figure out why. What's worse, he was right on purely syntactic/parser-based grounds, not needing to reason his way through it like I just did, but just looking at what is parser (STD.pm6) could and could not do in such a context.

Um, I don't know if TimToady would consider ⟦⟦⟦ <<< $x >>> ⟧⟧⟧ any more workable than just <<< $x >>>. Maybe not. If I have a chance to ask him at some point, I will.

@vendethiel
Copy link
Collaborator

vendethiel commented Oct 14, 2021

  • But Raku and Alma do have "cross stage persistence", do they not?

They do. That's why I'm unsure double unquote is as important as it is in i.e. Common Lisp. I'm not even sure you absolutely need unquote at all, except for syntactic purposes (though auto-unquote of an ast would feel very magical).

@masak
Copy link
Owner Author

masak commented Oct 15, 2021

Scala macros and Scala's LMS both have auto-unquote, to great effect. But my personal conclusion is that they get away with this because their macros are operating on type-checked ASTs, after type checking/inference. If you know that x is an Expr<Integer> and not just an Integer, you can auto-unquote x accurately/reliably.

It's a neat idea, and I'm definitely jealous. Then again, I don't think Raku (or Alma) would be willing to pay the price involved in pushing types that deeply into macros/quotation.

@vendethiel
Copy link
Collaborator

It's a neat idea, and I'm definitely jealous. Then again, I don't think Raku (or Alma) would be willing to pay the price involved in pushing types that deeply into macros/quotation.

Well, you could make it magical but it feels a bit wrong.

Scheme templates have no "normal values" so they can auto-unquote (and they don't have an explicit unquote mechanism).

Still, I'm unsure what use there is for unquote(unquote(x)) in Alma.

@masak
Copy link
Owner Author

masak commented Nov 4, 2021

I like where we ended up in this issue. I just found another "data point" of sorts.

In Everything old is new again: Quoted domain specific languages, Philip Wadler talks about using quasiquotation to build a DSL for LINQ-like queries, which are then optimized and sent to a database engine. Around the 23-minute mark, though, he mentions a "use case" for quasiquotation and unquoting that we haven't brought up: using a variable w (without unquoting it), but that variable was declared in a surrounding quote — that is: quote( declare w; unquote( quote( use w ) ) ).

Three things, in no particular order:

  • This is the kind of thing I could do (and probably have done) in Bel without a problem. Same thing with Common Lisp. The reason being, of course, that those Lisps have no notion of binding in quotes (and no cross-stage persistence), and in this case they bumble forward without a care in the world, and things accidentally work out. Yay lack of hygiene!
  • If I'm thinking about this right, cross-stage persistence won't save us here either. It's not powerful enough. Cross-stage persistence helps a definition from outside the quote be visible inside the quote — but in this case, the definition is not just outside the quote in question, it's also inside another one. The fact that it's lexically nested around the quote in question means bupkis, scoping-wise.
  • This seems like a reasonable thing to want. But instead of making me more certain what would be "a nice semantics" for all this, it just makes me painfully aware that there's yet another dimension to an issue which is already pretty weird and hard to think about.

Apparently this (quote-unquote-quote) works in MetaOCaml. I should try MetaOCaml.

@masak
Copy link
Owner Author

masak commented Dec 18, 2023

At compile-time zar gets invoked with y bound to "OH HAI", then moo gets invoked with x bound to "OH HAI", then... moo returns an AST that (somehow) represents the inner quasi, the one with the unquote-unquote.

This is wrong, although past-masak is making a mistake here that is very easy to make with macro calls. I don't believe it has a name, but if it did, it should be something like "confusing AST with values".

zar indeed gets invoked with y bound to (the AST) "OH HAI", but "then" is not correct since the macro call to moo was above the one to zar in the source code. moo gets invoked (before) with x bound to (the AST) y. Specifically, "OH HAI" doesn't propagate in through two layers as intended, due to timing issues. I haven't looked at what would be required to fix this issue, I just now noticed the thinko itself.

@masak
Copy link
Owner Author

masak commented Dec 26, 2023

Just found this, which unnerves me slightly:

Re: "the Lisp way" is more than one way, unfortunately. It's amazing how difficult a problem is nested quasiquotation. R5RS Scheme and Common LISP handle it slightly differently, which I learned after an argument with R. Kent Dybvig (who was clearly right!). Having a CL background, I was arguing that my code, based on the appendix in CLtL2 with corrections vetted by Steele, was correct, and learned that the semantics in R5RS differed. This Gauche blog entry describes a similar discrepancy.

I wonder if this is connected to masak/bel#427 — not sure, but it might well be.

Edit: A later comment also feels insightful:

First there is the metamacro case where macros expand into macro definitions. Second there is the case where a quasiquoted expression is used as an argument to another quasiquote.

You have to be careful, especially if your lisp allows variable captures, or you'll wind up evaluating something with respect to the wrong environment.

I think that those two identified cases are really good to use as litmus tests/test cases to think through in order to evaluate one's quasiquote design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants