Parse function breaks when there’s a line ending in a string #100

ariasuni · 2018-03-20T18:08:27Z

In: tree = parser.parse('var i = "test\nvalue"')
Illegal character '"' at 1:8 after LexToken(EQ,'=',1,6)
Illegal character '"' at 1:19 after LexToken(ID,'value',1,14)

In: tree.to_ecma()
'var i = test;\nvalue;'

The behavior is the same with \r.

metatoaster · 2018-04-18T22:32:05Z

No, never mind: if an actual newline character (rather than an escape sequence) occurs inside a string token, Node.js doesn't accept it either.

$ cat | node
var i = "test
value"
[stdin]:1
var i = "test
        ^^^^^

SyntaxError: Invalid or unexpected token

ES5 (which is what slimit supports) doesn't have multiline strings like Python does, so fortunately for the parsers, this is a genuine syntax error in the provided ES5 script, which the parser correctly reported.

However, if you meant to use an escape sequence representing the newline, this will work (note the raw string prefix r):

>>> from slimit.parser import Parser
>>> print(Parser().parse(r'var i = "test\nvalue"').to_ecma())
var i = "test\nvalue";
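The difference between the two inputs can be checked without any parser. The following is a minimal sketch using only plain Python; the names cooked, raw and describe are made up for illustration and are not part of slimit.

```python
# The same JavaScript source written as a regular and as a raw Python literal.
cooked = 'var i = "test\nvalue"'   # Python turns \n into a real line feed
raw = r'var i = "test\nvalue"'     # backslash and 'n' stay as two characters


def describe(source):
    """Return (length, contains_real_newline) for a program string."""
    return len(source), '\n' in source


print(describe(cooked))
print(describe(raw))
```

Only the raw form hands the JavaScript parser the two-character escape sequence; the regular form hands it a string literal that is split across two lines.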

ariasuni · 2018-04-19T01:36:57Z

Well, I had this problem when trying to scrape information out of working JavaScript code on a high-traffic website.

metatoaster · 2018-04-19T03:00:29Z

Can you please provide the link to the example that choked?

metatoaster · 2018-04-19T03:26:19Z

Anyway, I do see what you mean: I had mistakenly used my patched version of slimit, which correctly reports that input as a parsing error. The correct behavior with that input is to throw a SyntaxError exception, which my patched version (and calmjs.parse) does. The ECMA-262 specification defines this as invalid syntax in section 7.8.4 (specifically "A line terminator character cannot appear in a string literal" at the bottom of that section, where a "line terminator" includes newline characters).

To make things clearer, this is the input JavaScript with the invalid syntax:

var i = "test
value"

Assume that input is assigned to program in the following Python code:

>>> from slimit.parser import Parser
>>> parser = Parser()
>>> node = parser.parse(program)
Illegal character '"' at 1:8 after LexToken(EQ,'=',1,6)
Illegal character '"' at 1:19 after LexToken(ID,'value',1,14)
>>> print(node.to_ecma())
var i = test;
value;

This changed the program entirely: slimit erroneously parsed the whole input without raising an error and produced an incorrect AST. This is where my initial confusion lay (I saw the output, used it as input, and only then noticed the quotes in the original input). The correct behavior is implemented in calmjs.parse, which correctly treats this as a syntax error:

>>> from calmjs.parse import es5
>>> es5(program)
Traceback (most recent call last):
...
calmjs.parse.exceptions.ECMASyntaxError: Illegal character '"' at 1:9 after '=' at 1:7
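The rule quoted from section 7.8.4 can also be sketched as a standalone pre-check. This is only an illustration, not how slimit or calmjs.parse tokenize: find_raw_line_terminator is a hypothetical helper, it ignores comments and regex literals, and it assumes the ES5 line terminator set (LF, CR, U+2028, U+2029) and the ES5 line continuation (a backslash directly followed by a line terminator).

```python
# Characters that ES5 treats as line terminators (assumed set: LF, CR, LS, PS).
LINE_TERMINATORS = {'\n', '\r', '\u2028', '\u2029'}


def find_raw_line_terminator(source):
    """Return the index of the first line terminator found unescaped inside
    a quoted string literal, or None if there is none.

    Hypothetical helper: comments and regex literals are not handled.
    """
    quote = None  # the quote character of the literal being scanned, if any
    i = 0
    while i < len(source):
        ch = source[i]
        if quote is None:
            if ch in ('"', "'"):
                quote = ch
        elif ch == '\\':
            # Skip the escaped character. A backslash followed by a line
            # terminator is a line continuation; treat CR LF as one unit.
            if source[i + 1:i + 3] == '\r\n':
                i += 1
            i += 1
        elif ch == quote:
            quote = None
        elif ch in LINE_TERMINATORS:
            return i
        i += 1
    return None


program = 'var i = "test\nvalue"'
print(find_raw_line_terminator(program))
print(find_raw_line_terminator(r'var i = "test\nvalue"'))
```

A check like this could flag extracts that will not parse before they are handed to a parser, though the parser's own syntax error remains the authoritative answer.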

ariasuni · 2018-04-19T14:54:36Z

Well, I’m probably mistaken: this should have been a non-working JavaScript extract among the working ones, because I have the same kind of error with newlines inside strings in my web browser’s console.

ariasuni mentioned this issue Mar 20, 2018

Parse function breaks when there’s a line ending in a string scrapinghub/js2xml#32

Closed
