Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve backslash in raw string literal #6152

Merged
merged 7 commits into from
Jul 31, 2023

Conversation

harupy
Copy link
Contributor

@harupy harupy commented Jul 28, 2023

Summary

Fix #5941

Test Plan

Existing tests

@github-actions
Copy link
Contributor

github-actions bot commented Jul 28, 2023

PR Check Results

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00      9.0±0.05ms     4.5 MB/sec    1.00      9.0±0.06ms     4.5 MB/sec
formatter/numpy/ctypeslib.py               1.00   1730.8±7.25µs     9.6 MB/sec    1.00  1726.8±15.26µs     9.6 MB/sec
formatter/numpy/globals.py                 1.00    181.1±0.53µs    16.3 MB/sec    1.00    180.2±0.88µs    16.4 MB/sec
formatter/pydantic/types.py                1.01      3.8±0.01ms     6.7 MB/sec    1.00      3.8±0.02ms     6.7 MB/sec
linter/all-rules/large/dataset.py          1.02     12.0±0.12ms     3.4 MB/sec    1.00     11.8±0.12ms     3.5 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      3.0±0.02ms     5.5 MB/sec    1.00      3.0±0.01ms     5.6 MB/sec
linter/all-rules/numpy/globals.py          1.00    325.1±1.64µs     9.1 MB/sec    1.00    326.4±0.92µs     9.0 MB/sec
linter/all-rules/pydantic/types.py         1.03      5.5±0.05ms     4.7 MB/sec    1.00      5.3±0.02ms     4.8 MB/sec
linter/default-rules/large/dataset.py      1.04      6.7±0.03ms     6.0 MB/sec    1.00      6.5±0.03ms     6.3 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.03   1314.0±2.36µs    12.7 MB/sec    1.00   1278.9±2.27µs    13.0 MB/sec
linter/default-rules/numpy/globals.py      1.04    129.8±0.63µs    22.7 MB/sec    1.00    125.2±0.35µs    23.6 MB/sec
linter/default-rules/pydantic/types.py     1.03      2.9±0.01ms     8.8 MB/sec    1.00      2.8±0.01ms     9.1 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.00     14.0±0.77ms     2.9 MB/sec    1.02     14.3±0.71ms     2.8 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.6±0.14ms     6.3 MB/sec    1.02      2.7±0.11ms     6.2 MB/sec
formatter/numpy/globals.py                 1.00   298.1±16.71µs     9.9 MB/sec    1.00   297.3±17.43µs     9.9 MB/sec
formatter/pydantic/types.py                1.00      5.6±0.24ms     4.5 MB/sec    1.04      5.9±0.28ms     4.3 MB/sec
linter/all-rules/large/dataset.py          1.04     20.1±0.90ms     2.0 MB/sec    1.00     19.2±0.97ms     2.1 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.01      4.9±0.29ms     3.4 MB/sec    1.00      4.9±0.26ms     3.4 MB/sec
linter/all-rules/numpy/globals.py          1.00   577.0±41.43µs     5.1 MB/sec    1.05   606.8±35.50µs     4.9 MB/sec
linter/all-rules/pydantic/types.py         1.00      8.5±0.55ms     3.0 MB/sec    1.01      8.6±0.47ms     3.0 MB/sec
linter/default-rules/large/dataset.py      1.04     11.0±0.46ms     3.7 MB/sec    1.00     10.6±0.53ms     3.8 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.03      2.2±0.11ms     7.6 MB/sec    1.00      2.1±0.09ms     7.9 MB/sec
linter/default-rules/numpy/globals.py      1.00   238.4±11.41µs    12.4 MB/sec    1.04   247.2±13.92µs    11.9 MB/sec
linter/default-rules/pydantic/types.py     1.01      4.6±0.17ms     5.6 MB/sec    1.00      4.5±0.26ms     5.6 MB/sec

@@ -468,7 +476,7 @@ fn normalize_string(input: &str, quotes: StringQuotes) -> (Cow<str>, ContainsNew
} else if c == '\n' {
newlines = ContainsNewlines::Yes;
} else if !quotes.triple {
if c == '\\' {
if !is_raw && c == '\\' {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to move this check to line 490 instead because we need to make sure that quotes are properly escaped. Can you add the following test

r'It\'s normalizing \' and " quotes'

This should be formatted as:

r"It's normalizing ' and \" quotes"r

Copy link
Contributor Author

@harupy harupy Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaReiser r'It\'s normalizing \' and " quotes' is not equivalent to r"It's normalizing ' and \" quotes":

>>> r'It\'s normalizing \' and " quotes'
'It\\\'s normalizing \\\' and " quotes'
>>> r"It's normalizing ' and \" quotes"
'It\'s normalizing \' and \\" quotes'
>>> r'It\'s normalizing \' and " quotes' == r"It's normalizing ' and \" quotes"
False

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I now played with your PR and found an example that produces invalid syntax:

# Input
r'Not-so-tricky "quote \'\''

# Ruff
r"Not-so-tricky "quote \'\'"

# Black
r'Not-so-tricky "quote \'\''

Note how Ruff changes the quotes from ' to " but fails to escape the ".

We need to play a bit more with black to understand how black determines the preferred quotes for raw strings, and how the normalization has to work. Maybe @konstin knows more, because I'm not that familiar with Python and I must say, the escaping logic behind raw strings is confusing to me (you have to escape quotes)

Copy link
Contributor Author

@harupy harupy Jul 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the investigation. I'm reading black's source code. It looks like black returns the original string if it contains unescaped opposite quotes:

https://github.com/psf/black/blob/1a972e3e11b144912155babdf48ff23d68059d57/src/black/strings.py#L201

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh i find python's raw string escaping rules confusing and i think there are cases that are just not properly representable (as the case above where black returns)

@konstin konstin added the formatter Related to the formatter label Jul 31, 2023
@MichaReiser MichaReiser enabled auto-merge (squash) July 31, 2023 12:44
@MichaReiser
Copy link
Member

Thanks for implementing Raw-strings. Something I entirely overlooked.

konstin added a commit that referenced this pull request Jul 31, 2023
**Summary** Print the errors when the formatter ecosystem checks failed.
Im not happy that we current collect the log in the first place, but
this is the less invasive change and we need it to unblock reviewing
#6152.

**Test Plan**
https://github.com/astral-sh/ruff/actions/runs/5713112075/job/15477879403?pr=6188
@konstin
Copy link
Member

konstin commented Jul 31, 2023

The formatter ecosystem checks are currently failing in CI. You can run the script locally (scripts/formatter_progress.sh). You can also merge main where to get the fix from #6187 that will print the ecosystem check results in CI also.

@MichaReiser MichaReiser merged commit 0274de1 into astral-sh:main Jul 31, 2023
15 of 16 checks passed
@harupy harupy deleted the preserve-backslash branch July 31, 2023 13:02
@harupy
Copy link
Contributor Author

harupy commented Jul 31, 2023

@MichaReiser @konstin Thanks for reviewing this PR :)

@konstin
Copy link
Member

konstin commented Jul 31, 2023

Fixed the end-of-string unescaped quote in #6202

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve single \ in raw string literals
3 participants