Support for heredoc strings #19

Open
nilium opened this issue Nov 6, 2018 · 0 comments
Labels
enhancement New feature or request

nilium commented Nov 6, 2018

Support for heredoc strings would be handy, probably as <<TAG ... TAG for the verbatim form and <<-TAG ... TAG for the indented form. Newlines are already allowed in literal strings, but a multi-line literal pushes any remaining parameters of a statement or section onto the lines following the literal. With heredoc support, it's possible to keep a clean statement line and put one or more heredoc bodies after the statement or section (for sections, it looks a little weird when the opening brace is on the same line).

This likely requires modifying the lexer a bit, probably to keep a queue of unfulfilled heredoc tags and to start reading their bodies after encountering the newline that ends the line they're declared on. E.g.,

section {
    run-job load-script <<-SCRIPT flag <<SOMETHINGELSE;
        function main()
            print("OK")
        end
        SCRIPT
    This preserves leading whitespace
SOMETHINGELSE
}

In the above case, <<-SCRIPT is terminated when the lexer encounters the sequence "\n" ws* "SCRIPT" ( eof | "\n" ), so any whitespace before the closing tag is ignored. As part of lexing, after storing the raw string, the lexer would trim the minimum-length leading-whitespace prefix from all lines. Empty lines are ignored when computing that prefix, but lines that contain only whitespace are not, so a short whitespace-only line shortens the prefix for every line. For the sake of keeping things simple, the prefix is just the shortest one measured in bytes, with no consideration of tab width or anything else.
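A minimal sketch of the trimming rule described above, in Python purely for illustration. The function name `trim_common_prefix` is hypothetical, and treating only spaces and tabs as whitespace is an assumption; the real lexer may define whitespace differently.

```python
def trim_common_prefix(raw: str) -> str:
    """Remove the shortest leading-whitespace prefix from every line of raw.

    Assumption: "whitespace" means spaces and tabs, each one byte wide,
    so character counts and byte counts agree for the prefix.
    """
    lines = raw.split("\n")
    # Empty lines are skipped when measuring; whitespace-only lines are not,
    # so they can shorten the prefix that gets cut.
    widths = [len(line) - len(line.lstrip(" \t")) for line in lines if line != ""]
    cut = min(widths, default=0)
    # Slicing an empty line leaves it empty, so blank lines pass through.
    return "\n".join(line[cut:] for line in lines)
```

Because only the byte length of the prefix is compared, a line indented with one tab and a line indented with four spaces are not treated as equally indented; the tab line would set the cut to one byte.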

In the second case, <<SOMETHINGELSE, there is no change between the raw text and the token's string value. No prefix trimming is done, and the terminator must be exactly "\nSOMETHINGELSE" ( eof | "\n" ), with no whitespace before the tag.
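The two terminator rules could be expressed as regular expressions, roughly as follows. This is a sketch only: `read_heredoc_body` is a hypothetical helper, ws is assumed to mean spaces and tabs, and keeping the newline before the terminator as part of the body is an assumption made to match the token values listed later in this issue.

```python
import re


def read_heredoc_body(text: str, tag: str, indented: bool):
    """Split text into (raw_body, rest) at the heredoc terminator.

    text must begin with the newline that ends the declaration line, so an
    empty body (terminator on the very next line) is handled the same way
    as any other.
    """
    ws = r"[ \t]*" if indented else ""  # <<-TAG allows whitespace before TAG
    # "\n" ws* TAG ( eof | "\n" ); the lookahead leaves the final newline unread.
    terminator = re.compile(r"\n" + ws + re.escape(tag) + r"(?=\n|\Z)")
    m = terminator.search(text)
    if m is None:
        raise ValueError(f"unterminated heredoc: {tag}")
    # Skip the declaration line's newline; keep the newline that precedes
    # the terminator so every body line ends in "\n".
    raw = text[1 : m.start() + 1]
    return raw, text[m.end():]
```

With `indented=False`, an indented copy of the tag is just another body line, which is what lets the verbatim form contain text that looks like its own terminator.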

So, basically, when "<<" "-"? tag is encountered, the heredoc string's body begins after the next newline. If multiple heredocs are declared on one line, their bodies have to be parsed in declaration order. This affects the order in which tokens are output: a heredoc token can't be returned until its body has been read, so the tokens that follow it on the line have to be buffered and returned only after the heredoc token is. In the above example, the sequence of tokens, starting with "run-job" and ending with the semicolon, should be:

...
| 05  | TWord          | "run-job"
| 06  | TWhitespace    | " "
| 07  | TWord          | "load-script"
| 08  | TWhitespace    | " "
| 09  | THeredoc       | "function main()\n    print(\"OK\")\nend\n"
| 10  | TWhitespace    | " "
| 11  | TWord          | "flag"
| 12  | TWhitespace    | " "
| 13  | THeredoc       | "    This preserves leading whitespace\n"
| 14  | TSemicolon     | ";"
| 15  | TWhitespace    | "\n"
...
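To make the ordering concrete, here is a rough Python sketch that lexes one statement line plus its heredoc bodies. It is not the real lexer: `TOKEN_RE` and `lex_statement` are hypothetical, the word and tag patterns are simplified guesses, and instead of streaming with a token buffer it leaves a placeholder for each heredoc and fills the placeholders in declaration order, which yields the same token order. It also assumes the TWhitespace "\n" token is the newline that ends the statement line.

```python
import re
from collections import deque

# Simplified, assumed token rules for one statement line (illustration only).
TOKEN_RE = re.compile(
    r"(?P<THeredoc><<-?[A-Za-z_]\w*)"
    r"|(?P<TWhitespace>[ \t]+)"
    r"|(?P<TSemicolon>;)"
    r"|(?P<TWord>[^\s;]+)"
)


def lex_statement(src: str):
    """Return (tokens, rest) for a statement line followed by heredoc bodies."""
    line, newline, rest = src.partition("\n")
    tokens = []
    pending = deque()  # (token index, tag, indented), in declaration order
    for m in TOKEN_RE.finditer(line):
        kind, text = m.lastgroup, m.group()
        if kind == "THeredoc":
            indented = text.startswith("<<-")
            tag = text[3:] if indented else text[2:]
            pending.append((len(tokens), tag, indented))
            tokens.append((kind, None))  # placeholder until the body is read
        else:
            tokens.append((kind, text))
    if newline:
        tokens.append(("TWhitespace", "\n"))

    lines = rest.split("\n")
    pos = 0
    while pending:
        index, tag, indented = pending.popleft()
        body = []
        while True:
            if pos >= len(lines):
                raise ValueError(f"unterminated heredoc: {tag}")
            current = lines[pos]
            pos += 1
            # <<-TAG ignores whitespace before the closing tag; <<TAG does not.
            if (current.lstrip(" \t") if indented else current) == tag:
                break
            body.append(current)
        if indented:
            # Shortest leading-whitespace prefix; empty lines are not measured.
            cut = min((len(b) - len(b.lstrip(" \t")) for b in body if b), default=0)
            body = [b[cut:] for b in body]
        tokens[index] = ("THeredoc", "".join(b + "\n" for b in body))
    return tokens, "\n".join(lines[pos:])
```

Run on the statement from the example (starting at run-job), this produces the token kinds and values listed as 05 through 15 above, and returns the text after the SOMETHINGELSE line as the unread remainder.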
@nilium added the enhancement (New feature or request) label Nov 6, 2018