feat: make frontend normalize line endings to LF #3903
Conversation
```diff
@@ -125,7 +125,7 @@ def applyDocumentChange (oldText : FileMap) : (change : Lsp.TextDocumentContentC
   | TextDocumentContentChangeEvent.rangeChange (range : Range) (newText : String) =>
     replaceLspRange oldText range newText
   | TextDocumentContentChangeEvent.fullChange (newText : String) =>
-    newText.toFileMap
+    newText.crlfToLf.toFileMap
```
Would it make sense to move `crlfToLf` into `toFileMap`?
The `replaceLspRange` case made me decide not to incorporate `crlfToLf` into `toFileMap`. We only want to normalize the line endings of the new text, and it's worth making it explicit where we're normalizing.
There's still something unsatisfactory, however: if somehow a `\r\n` is split and re-merged by edit operations, then we can get into a situation where there's an unnormalized `\r\n`. I'm not sure if this is an issue that could crop up in practice.
If we're not happy with leaving it in the current form, here are some solutions that came to mind:
- Have a second FileMap with the unnormalized text. Make sure LSP operations edit that, and then re-normalize it to create the actual FileMap used by elaboration each time.
- Do that, but accelerate it somehow. There's probably some fancy data structure supporting inserting-and-normalizing.
- Make the LSP throw an error when it detects a loose `\r`. It's not clear how to get this information back to the user in a nice way.
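The split-and-re-merge hazard can be made concrete (a sketch that assumes `String.crlfToLf` from this patch; `#guard` evaluates the checks):

```lean
-- A lone `\r` is untouched by normalization:
#guard "\r".crlfToLf = "\r"
-- So if an edit inserts just "\r" before an existing "\n" and only the
-- inserted fragment is normalized, an unnormalized CRLF survives:
#guard "a" ++ "\r".crlfToLf ++ "\nb" = "a\r\nb"
```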
The current state seems fine. I'm more worried about accidentally forgetting `crlfToLf` somewhere in the future, but I agree that this is not that important, and explicitly seeing the normalization in the code is helpful, too.
To reduce differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF.

Effects:
- Lake's hashes already use normalized line endings. This change makes the hashes be faithful to what Lean sees.
- Docstrings are affected by line endings. In particular, this fixes `#guard_msgs` failing multiline tests for Windows users using CRLF.
- Now strings don't have different lengths depending on the platform. The following theorem is true for LF files and false for CRLF files.
  ```lean
  example : "
  ".length = 1 := rfl
  ```
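Spelled with explicit escapes (an illustrative restatement, not part of the patch), the platform dependence looks like this:

```lean
-- an LF newline is a single character:
example : "\n".length = 1 := rfl
-- a CRLF sequence is two characters, so the theorem in the description
-- fails when the file, and hence the string literal, uses CRLF:
example : "\r\n".length = 2 := rfl
```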
Mathlib CI status (docs):
Now that we have position termination proofs in core (e.g., …
Now that the Lean frontend normalises all line endings to LF (since leanprover/lean4#3903), this check is not necessary any more. It is also one fewer Python linter to rewrite in Lean.
To eliminate parsing differences between Windows and other platforms, the frontend now normalizes all CRLF line endings to LF, like in Rust.

Effects:
- This fixes `#guard_msgs` failing multiline tests for Windows users using CRLF.

Note: the normalization will take `\r\r\n` and turn it into `\r\n`. In the elaborator, we reject loose `\r`'s that appear in whitespace. Rust instead takes the approach of making the normalization routine fail. They do this so that there's no downstream confusion about any `\r\n` that appears.

Implementation note: the LSP maintains its own copy of a source file that it updates when edit operations are applied. We are assuming that edit operations never split or join CRLFs. If this assumption is not correct, then the LSP copy of a source file can become slightly out of sync. If this is an issue, there is some discussion here.
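The `\r\r\n` behavior described above can be spelled out (again a sketch assuming `String.crlfToLf` from this patch):

```lean
-- Only the trailing `\r\n` pair collapses; the preceding `\r` remains,
-- so the normalized output can itself still contain a CRLF:
#guard "a\r\r\nb".crlfToLf = "a\r\nb"
-- Ordinary CRLF and already-normalized text behave as expected:
#guard "a\r\nb".crlfToLf = "a\nb"
#guard "a\nb".crlfToLf = "a\nb"
```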