Support for heredoc strings #19

nilium · 2018-11-06T16:57:43Z

Support for heredoc strings (probably <<TAG ... TAG for the verbatim form and <<-TAG ... TAG for the indented form) would be handy. Although newlines are allowed in literal strings, a multi-line literal pushes the remaining parameters of a statement or section onto the lines following the literal. With heredoc support, the statement line stays clean and one or more heredoc bodies follow the statement or section (for sections, it looks a little weird when the opening brace is on the same line).

This likely requires modifying the lexer a bit, probably to keep a queue of unfulfilled heredoc tags and to start reading their bodies after encountering the newline that ends the line they're declared on. E.g.,

section {
    run-job load-script <<-SCRIPT flag <<SOMETHINGELSE;
        function main()
            print("OK")
        end
        SCRIPT
    This preserves leading whitespace
SOMETHINGELSE
}

In the above case, <<-SCRIPT is terminated when the lexer encounters the sequence "\n" ws* "SCRIPT" ( eof | "\n" ); that is, any whitespace before the closing tag is ignored. As part of lexing, after storing the raw string, the lexer would trim the minimum-length whitespace prefix from all lines. Empty lines are ignored when computing that prefix, but lines that contain only whitespace necessarily affect it. For the sake of keeping things simple, this is only the shortest prefix in bytes, with no consideration of tab width or anything else.
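
The trimming rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name trim_min_prefix is hypothetical, whitespace is taken to mean spaces and tabs, and the raw body is assumed to keep the newline of its last line.

```python
def trim_min_prefix(raw: str) -> str:
    """Strip the shortest leading-whitespace prefix from every line of raw.

    Assumption: "whitespace" means spaces and tabs. Both are one byte in
    UTF-8, so counting characters here is the same as counting bytes.
    """
    lines = raw.split("\n")
    # Empty lines are skipped when measuring; whitespace-only lines are not,
    # so a short whitespace-only line can shrink the prefix.
    widths = [len(line) - len(line.lstrip(" \t")) for line in lines if line != ""]
    prefix_len = min(widths, default=0)
    # Slicing an empty line yields an empty line, so those pass through as-is.
    return "\n".join(line[prefix_len:] for line in lines)
```

Applied to the raw <<-SCRIPT body from the example (three lines indented by 8, 12, and 8 spaces), this removes 8 bytes from each line.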

In the second case, <<SOMETHINGELSE, there is no change between the raw text and the token's string value. No prefix trimming is done, and the terminator must be exactly "\nSOMETHINGELSE" ( eof | "\n" ), with no whitespace before the tag.
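
One possible way to find the terminator for both forms is to scan the body line by line. The sketch below is an assumption-laden illustration, not the proposed implementation: read_heredoc_body and its parameters are hypothetical names, the body is assumed to start right after the newline that ends the declaring line, the raw body keeps the newline of its last line (matching the token values listed further down), and the terminator is assumed to consume its own trailing newline.

```python
def read_heredoc_body(src: str, pos: int, tag: str, indented: bool):
    """Return (raw_body, next_pos) for a heredoc body starting at src[pos].

    indented=True models <<-TAG (whitespace allowed before the closing tag);
    indented=False models <<TAG (the closing tag must start the line).
    """
    start = pos
    while pos < len(src):
        newline = src.find("\n", pos)
        line_end = len(src) if newline == -1 else newline
        line = src[pos:line_end]
        candidate = line.lstrip(" \t") if indented else line
        if candidate == tag:
            # The terminator line is followed by either EOF or "\n".
            next_pos = line_end if newline == -1 else line_end + 1
            return src[start:pos], next_pos
        if newline == -1:
            break
        pos = newline + 1
    raise ValueError(f"unterminated heredoc <<{'-' if indented else ''}{tag}")
```

Scanning by line means the newline that precedes the closing tag is always the one ending the previous line, so two bodies can follow each other without the second terminator needing a newline of its own in front of it.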

So, basically, when "<<" "-"? tag is encountered, the heredoc string's body begins parsing after the next newline. If multiple heredocs are declared on the same line, their bodies have to be parsed in declaration order. This affects the order in which tokens are output: once a heredoc tag is encountered, the lexer has to buffer the tokens that follow it and return them only after the heredoc token itself has been returned. In the above example, the sequence of tokens, starting with "run-job" and running through the newline after the semicolon, should be:

...
| 05  | TWord          | "run-job"
| 06  | TWhitespace    | " "
| 07  | TWord          | "load-script"
| 08  | TWhitespace    | " "
| 09  | THeredoc       | "function main()\n    print(\"OK\")\nend\n"
| 10  | TWhitespace    | " "
| 11  | TWord          | "flag"
| 12  | TWhitespace    | " "
| 13  | THeredoc       | "    This preserves leading whitespace\n"
| 14  | TSemicolon     | ";"
| 15  | TWhitespace    | "\n"
...
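
The ordering above can be reproduced with a small sketch. Everything in it is an assumption made for illustration: the regular expression covers only the token kinds in the example statement, the function names are hypothetical, and a placeholder slot in a list stands in for the token buffering a streaming lexer would need.

```python
import re

# Hypothetical grammar: heredoc marker, blanks, semicolon, or a bare word.
TOKEN_RE = re.compile(
    r"<<(?P<dash>-?)(?P<tag>\w+)|(?P<ws>[ \t]+)|(?P<semi>;)|(?P<word>[^\s;<]+)"
)


def read_body(src, pos, tag, indented):
    """Return (raw_body, next_pos); the body ends at a line consisting of tag."""
    start = pos
    while pos < len(src):
        newline = src.find("\n", pos)
        end = len(src) if newline == -1 else newline
        line = src[pos:end]
        if (line.lstrip(" \t") if indented else line) == tag:
            return src[start:pos], min(end + 1, len(src))
        pos = end + 1
    raise ValueError("unterminated heredoc " + tag)


def trim(raw):
    """Drop the shortest whitespace prefix, ignoring empty lines."""
    lines = raw.split("\n")
    n = min((len(l) - len(l.lstrip(" \t")) for l in lines if l), default=0)
    return "\n".join(l[n:] for l in lines)


def lex(src):
    tokens = []   # output order; heredoc slots are filled in later
    pending = []  # FIFO of (slot index, tag, indented) awaiting a body
    pos = 0
    while pos < len(src):
        if src[pos] == "\n":
            tokens.append(("TWhitespace", "\n"))
            pos += 1
            # The declaring line is over: read the bodies in declaration order.
            for index, tag, indented in pending:
                raw, pos = read_body(src, pos, tag, indented)
                tokens[index] = ("THeredoc", trim(raw) if indented else raw)
            pending.clear()
            continue
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.group("tag"):
            pending.append((len(tokens), m.group("tag"), m.group("dash") == "-"))
            tokens.append(None)  # placeholder until the body has been read
        elif m.group("ws"):
            tokens.append(("TWhitespace", m.group()))
        elif m.group("semi"):
            tokens.append(("TSemicolon", ";"))
        else:
            tokens.append(("TWord", m.group()))
        pos = m.end()
    if pending:
        raise ValueError("heredoc declared without a following body")
    return tokens
```

Calling lex on the run-job statement plus its two bodies yields the eleven tokens listed above, with each THeredoc sitting where its <<TAG marker appeared. A streaming lexer could not patch a list after the fact, so it would instead hold back the tokens after the first marker until the bodies have been read.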

nilium added the enhancement New feature or request label Nov 6, 2018

nilium added this to codf Config Language Oct 6, 2022
