Skip partial duplicates when applying multi-edit fixes #6144

charliermarsh · 2023-07-28T04:18:57Z

Summary

Right now, if we have two fixes that have an overlapping edit, but not an identical set of edits, they'll conflict, causing us to do another linter traversal. Here, I've enabled the fixer to support partially overlapping edits, which (as an example) let's us greatly reduce the number of iterations required in the test suite.

The most common case here is that in which a bunch of edits need to import some symbol, and then use that symbol, but in different ways. In that case, all edits will have a common fix (to import the symbol), but deviate in some way. With this change, we can do all of those edits in one pass.

Note that the simplest way to enable this was to store sorted edits on Fix. We don't allow modifying the edits on Fix once it's constructed, so this is an easy change, and allows us to avoid a bunch of clones and traversals later on.

Closes #5800.

charliermarsh · 2023-07-28T04:19:36Z

crates/ruff_diagnostics/src/fix.rs

+#[derive(Debug, Clone)]
+pub struct Edits<'a>(Cow<'a, [Edit]>);
+
+impl<'a> Edits<'a> {


I'm disappointed with what's happening here. The ergonomics of it are fine, but it's so inefficient. inorder does a full clone, retain does a full clone, etc.

charliermarsh · 2023-07-28T04:19:48Z

crates/ruff/src/test.rs

@@ -85,7 +85,7 @@ pub fn test_snippet(contents: &str, settings: &Settings) -> Vec<Message> {
 }

 thread_local! {
-    static MAX_ITERATIONS: std::cell::Cell<usize> = std::cell::Cell::new(30);
+    static MAX_ITERATIONS: std::cell::Cell<usize> = std::cell::Cell::new(8);


Some payoff.

charliermarsh · 2023-07-28T04:20:00Z

crates/ruff/src/message/diff.rs

-            .fix
-            .edits()
-            .iter()
-            .sorted_unstable_by_key(|edit| edit.start())


It does feel good to DRY this up though -- this is now a method on Edits.

github-actions · 2023-07-28T04:45:21Z

PR Check Results

Ecosystem

✅ ecosystem check detected no changes.

Benchmark

Linux

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.02      8.6±0.20ms     4.7 MB/sec    1.00      8.4±0.13ms     4.8 MB/sec
formatter/numpy/ctypeslib.py               1.01  1662.2±29.19µs    10.0 MB/sec    1.00  1647.3±22.76µs    10.1 MB/sec
formatter/numpy/globals.py                 1.00    180.0±3.23µs    16.4 MB/sec    1.01    181.1±4.41µs    16.3 MB/sec
formatter/pydantic/types.py                1.00      3.6±0.05ms     7.1 MB/sec    1.00      3.6±0.05ms     7.1 MB/sec
linter/all-rules/large/dataset.py          1.00     11.1±0.08ms     3.7 MB/sec    1.00     11.1±0.08ms     3.7 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      2.8±0.02ms     5.9 MB/sec    1.00      2.8±0.01ms     5.9 MB/sec
linter/all-rules/numpy/globals.py          1.00    383.1±1.62µs     7.7 MB/sec    1.00    383.7±0.58µs     7.7 MB/sec
linter/all-rules/pydantic/types.py         1.00      5.0±0.05ms     5.1 MB/sec    1.02      5.1±0.11ms     5.0 MB/sec
linter/default-rules/large/dataset.py      1.00      6.0±0.02ms     6.8 MB/sec    1.00      6.0±0.04ms     6.8 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1251.9±17.78µs    13.3 MB/sec    1.00  1251.7±18.74µs    13.3 MB/sec
linter/default-rules/numpy/globals.py      1.01    136.6±0.18µs    21.6 MB/sec    1.00    135.5±0.96µs    21.8 MB/sec
linter/default-rules/pydantic/types.py     1.00      2.6±0.03ms     9.7 MB/sec    1.01      2.6±0.06ms     9.6 MB/sec

Windows

group                                      main                                   pr
-----                                      ----                                   --
formatter/large/dataset.py                 1.01     10.6±0.19ms     3.8 MB/sec    1.00     10.5±0.13ms     3.9 MB/sec
formatter/numpy/ctypeslib.py               1.00      2.0±0.04ms     8.2 MB/sec    1.00      2.0±0.06ms     8.2 MB/sec
formatter/numpy/globals.py                 1.00    218.4±6.69µs    13.5 MB/sec    1.01    221.3±6.65µs    13.3 MB/sec
formatter/pydantic/types.py                1.01      4.4±0.09ms     5.8 MB/sec    1.00      4.4±0.08ms     5.8 MB/sec
linter/all-rules/large/dataset.py          1.00     14.2±0.24ms     2.9 MB/sec    1.00     14.2±0.15ms     2.9 MB/sec
linter/all-rules/numpy/ctypeslib.py        1.00      3.7±0.04ms     4.6 MB/sec    1.00      3.7±0.06ms     4.5 MB/sec
linter/all-rules/numpy/globals.py          1.00    444.7±8.76µs     6.6 MB/sec    1.00   444.3±11.36µs     6.6 MB/sec
linter/all-rules/pydantic/types.py         1.00      6.3±0.13ms     4.0 MB/sec    1.00      6.4±0.12ms     4.0 MB/sec
linter/default-rules/large/dataset.py      1.00      7.8±0.11ms     5.2 MB/sec    1.00      7.8±0.11ms     5.2 MB/sec
linter/default-rules/numpy/ctypeslib.py    1.00  1564.1±23.52µs    10.6 MB/sec    1.00  1557.2±29.38µs    10.7 MB/sec
linter/default-rules/numpy/globals.py      1.00    170.1±3.28µs    17.3 MB/sec    1.01    171.9±5.03µs    17.2 MB/sec
linter/default-rules/pydantic/types.py     1.00      3.3±0.03ms     7.6 MB/sec    1.01      3.4±0.05ms     7.6 MB/sec

dhruvmanila

I like the Edits interface! :)

dhruvmanila · 2023-07-28T05:19:59Z

crates/ruff/src/autofix/mod.rs

+        // Remove any edits that were applied as part of a previous fix.
+        let mut edits = fix.edits();
+        edits.retain(|edit| !applied.contains(edit));


Do you think it's feasible to create a Vec<&Edit> which needs to be applied? i.e., filtering out the Edit which has been applied but not in place. This would avoid cloning the Edit.

MichaReiser

This is an excellent observation.

I think we can implement this in a way that significantly reduces the time we iterate over edits and removes the in place mutation by

Sort the edits of a fix
Skip over all edits of that fix that have already been applied
Take the start position of the first edit, this is the min_start and test if the fix doesn't overlap with another fix
Apply the fix(es)

I think this should simplify the implementation

MichaReiser · 2023-07-28T05:38:49Z

crates/ruff_diagnostics/src/fix.rs

+
+/// A collection of [`Edit`] elements to be applied to a source file.
+#[derive(Debug, Clone)]
+pub struct Edits<'a>(Cow<'a, [Edit]>);


Nit: It should be fine to restrict Edits to &'a [Edit]. It doesn't have to be aware that some usages will use a Cow to avoid clones:

pub struct Edits<'a>(&'a [Edit]); impl<'a> Edits<'a> { pub fn from_slice(edits: &'a [Edit]) -> Self {...} pub fn as_slice(&self) -> &[Edit] { self.0 } } # usage let mut plain_edits = Cow::Borrowed(fix.edits().as_slice()); if needs_mutating { plain_edits = Cow::Owned(vec![edit]); } let edits = Edits::from_slice(&plain_edits);

MichaReiser · 2023-07-28T05:39:31Z

crates/ruff_diagnostics/src/fix.rs

@@ -132,8 +134,8 @@ impl Fix {
    }

    /// Return a slice of the [`Edit`] elements in the [`Fix`].
-    pub fn edits(&self) -> &[Edit] {
-        &self.edits
+    pub fn edits(&self) -> Edits {


The methods above (like min_start) should now use self.edits().min_start() to avoid duplication.

Ah I actually ended to remove that entirely.

MichaReiser · 2023-07-28T05:40:50Z

crates/ruff_diagnostics/src/fix.rs

+    pub fn inorder(&self) -> impl IntoIterator<Item = Edit> {
+        let mut edits = self.to_vec();
+        edits.sort_by_key(Edit::start);
+        edits.into_iter()
+    }


Nit: I would call this method to_sorted to clarify that this operation is potentially expensive.

MichaReiser · 2023-07-28T05:51:21Z

crates/ruff_diagnostics/src/fix.rs

+    where
+        F: FnMut(&Edit) -> bool,
+    {
+        self.0.to_mut().retain(f);


This eagerly clones the edits even if f never returns false.

We can avoid this by iterating manually and only take the slow path that allocates a new Vec if the predicate returns false

let mut edits = self.0.iter().enumerate(); let last_index = loop { match edits.next() { Some((index, edit)) => { if !f(edit) { break index; } }, None => { return Cow::Borrowed(self.edits); } } let mut edits = Vec::from_slice(&self.edits[..last_index]); edits.extend(self.edits[last_index + 1..].iter().filter(f).cloned()); // TODO requires bounds checks. Cow::Owned(edits)

Part of what I meant when I commented that "I'm disappointed with what's happening here." 😂

MichaReiser · 2023-07-28T05:52:20Z

crates/ruff_diagnostics/src/fix.rs

+        edits.into_iter()
+    }
+
+    /// Filter the [`Edit`] elements in the [`Fix`] by the given predicate.


Suggested change

/// Filter the [`Edit`] elements in the [`Fix`] by the given predicate.

/// Retains the [`Edit`] elements in the [`Fix`] for which the predicate `f` returns true

MichaReiser · 2023-07-28T05:54:35Z

crates/ruff_diagnostics/src/fix.rs

+        let mut edits = self.to_vec();
+        edits.sort_by_key(Edit::start);
+        edits.into_iter()


Could you use sort_unstable_by_key? I mean, it doesn't really make a difference because sort_unstable_by_key allocates too, but you would then take benefit of only returning an Iterator

Are you referring to the methods on itertools?

Yes, I think that's where they're defined. I should have marked this as a Nit.

Just confirming, I had the same instinct but this crate doesn't depend on itertools right now and I thought I'd get more heat for adding the dependency 😂

MichaReiser · 2023-07-28T05:54:43Z

crates/ruff_diagnostics/src/fix.rs

+
+    /// Return an iterator over the [`Edit`] elements in the [`Fix`], sorted by their start
+    /// position.
+    pub fn inorder(&self) -> impl IntoIterator<Item = Edit> {


Suggested change

pub fn inorder(&self) -> impl IntoIterator<Item = Edit> {

pub fn inorder(&self) -> impl Iterator<Item = Edit> {

MichaReiser · 2023-07-28T05:59:55Z

crates/ruff_diagnostics/src/fix.rs

+    }
+
+    /// Filter the [`Edit`] elements in the [`Fix`] by the given predicate.
+    pub fn retain<F>(&mut self, f: F)


I find the retain method somewhat confusing. It isn't explicit that it doesn't necessary mutate the elemnt in place and, instead, creates new elements.

We may even be able to get away by filtering the edits instead of calling retain (iterates over all edits once), min (iterates over all edits once), and inorder (iterates over all edits once).

Sort the edits

Getting min is now for free -> It's the start position of the first edit

We can now check adhoc if this specific edit has already been applied, rather than filtering them out previously

I think this is doable... but note that we want the min to be the position of the first non-filtered edit. We may still be able to check it ad hoc though.

charliermarsh · 2023-07-28T18:06:12Z

This is a bit better could likely be better still.

charliermarsh · 2023-07-28T18:07:15Z

crates/ruff/src/autofix/mod.rs

-        if let IsolationLevel::Group(id) = fix.isolation() {
-            if !isolated.insert(id) {
-                continue;
+                first = false;


This is kind of confusing (definitely easier to read locally than as a diff), but in short: we want to apply these constraints at the fix level, but if all of the fix's edits were already applied, we don't need to enforce them at all. So we do so lazily, only checking if the location is acceptable when we reach the first edit-to-apply.

charliermarsh · 2023-07-28T18:23:19Z

Okay, the latest version stores sorted edits rather than sorting on the fly. I think this is a lot more efficient as we sort once (and iterate through the edits exactly once) and no longer need to do any clones anywhere. min_start also becomes O(1).

crates/ruff_diagnostics/src/fix.rs

MichaReiser · 2023-07-29T08:33:51Z

crates/ruff/src/autofix/mod.rs

@@ -50,7 +50,7 @@ fn apply_fixes<'a>(
    let mut fixed = FxHashMap::default();
    let mut source_map = SourceMap::default();

-    for (rule, fix) in diagnostics
+    'outer: for (rule, fix) in diagnostics


Nit: If you want to avoid the labels, but it requires an additional vector:

let mut edits = fix .edits() .iter() .filter(|edit| !applied.contains(edit)) .peekable(); // Lazily enforce any isolation and positional requirements (e.g., avoid applying // overlapping fixes, but avoid applying this requirement if all fixes in the edit were // already applied). if let Some(first) = edits.peek() { // If this fix requires isolation, and we've already applied another fix in the // same isolation group, skip it. if let IsolationLevel::Group(id) = fix.isolation() { if !isolated.insert(id) { continue; } } // Best-effort approach: if this fix overlaps with a fix we've already applied, // skip it. if last_pos.map_or(false, |last_pos| last_pos >= first.start()) { continue; } } for edit in edits { // Add all contents from `last_pos` to `fix.location`. let slice = locator.slice(TextRange::new(last_pos.unwrap_or_default(), edit.start())); output.push_str(slice); // Add the start source marker for the patch. source_map.push_start_marker(edit, output.text_len()); // Add the patch itself. output.push_str(edit.content().unwrap_or_default()); // Add the end source marker for the added patch. source_map.push_end_marker(edit, output.text_len()); // Track that the edit was applied. last_pos = Some(edit.end()); applied_edits.push(edit); } applied.extend(applied_edits.drain(..)); *fixed.entry(rule).or_default() += 1; }

I don't have a preference to be honest. I just thought this is complicated, and then noticed that it is because we read from and write to applied at the same time.

I think what you have here is a big clarity improvement, worth the allocation.

charliermarsh commented Jul 28, 2023

View reviewed changes

charliermarsh added the internal An internal refactor or improvement label Jul 28, 2023

charliermarsh force-pushed the charlie/duplicate-fixes branch 2 times, most recently from 7d8a9bf to 2e8630c Compare July 28, 2023 04:31

charliermarsh requested review from dhruvmanila and MichaReiser July 28, 2023 04:34

dhruvmanila approved these changes Jul 28, 2023

View reviewed changes

MichaReiser requested changes Jul 28, 2023

View reviewed changes

charliermarsh force-pushed the charlie/duplicate-fixes branch from 2e8630c to be7d6cf Compare July 28, 2023 18:05

charliermarsh requested a review from MichaReiser July 28, 2023 18:06

charliermarsh commented Jul 28, 2023

View reviewed changes

charliermarsh force-pushed the charlie/duplicate-fixes branch from 7c0d3e9 to 4b1eaca Compare July 29, 2023 00:28

MichaReiser approved these changes Jul 29, 2023

View reviewed changes

charliermarsh added 4 commits July 29, 2023 07:53

Skip partial duplicates when applying multi-edit fixes

9d21371

Simplify implementation and API

b24a4aa

Store sorted edits

9445248

Remove labels

8a554f4

charliermarsh force-pushed the charlie/duplicate-fixes branch from 4b1eaca to 8a554f4 Compare July 29, 2023 12:02

charliermarsh enabled auto-merge (squash) July 29, 2023 12:02

charliermarsh merged commit 4231ed2 into main Jul 29, 2023
16 checks passed

charliermarsh deleted the charlie/duplicate-fixes branch July 29, 2023 12:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip partial duplicates when applying multi-edit fixes #6144

Skip partial duplicates when applying multi-edit fixes #6144

charliermarsh commented Jul 28, 2023 •

edited

Loading

charliermarsh Jul 28, 2023

charliermarsh Jul 28, 2023

charliermarsh Jul 28, 2023

github-actions bot commented Jul 28, 2023 •

edited

Loading

dhruvmanila left a comment

dhruvmanila Jul 28, 2023

MichaReiser left a comment

MichaReiser Jul 28, 2023

MichaReiser Jul 28, 2023

charliermarsh Jul 28, 2023

MichaReiser Jul 28, 2023

MichaReiser Jul 28, 2023

charliermarsh Jul 28, 2023

MichaReiser Jul 28, 2023

MichaReiser Jul 28, 2023

charliermarsh Jul 28, 2023

MichaReiser Jul 28, 2023

charliermarsh Jul 28, 2023

MichaReiser Jul 28, 2023

MichaReiser Jul 28, 2023

charliermarsh Jul 28, 2023

charliermarsh commented Jul 28, 2023

charliermarsh Jul 28, 2023

charliermarsh commented Jul 28, 2023

MichaReiser Jul 29, 2023

charliermarsh Jul 29, 2023

	/// Filter the [`Edit`] elements in the [`Fix`] by the given predicate.
	/// Retains the [`Edit`] elements in the [`Fix`] for which the predicate `f` returns true

	pub fn inorder(&self) -> impl IntoIterator<Item = Edit> {
	pub fn inorder(&self) -> impl Iterator<Item = Edit> {

Skip partial duplicates when applying multi-edit fixes #6144

Skip partial duplicates when applying multi-edit fixes #6144

Conversation

charliermarsh commented Jul 28, 2023 • edited Loading

Summary

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Jul 28, 2023 • edited Loading

PR Check Results

Ecosystem

Benchmark

Linux

Windows

dhruvmanila left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

MichaReiser left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

charliermarsh commented Jul 28, 2023

Choose a reason for hiding this comment

charliermarsh commented Jul 28, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

charliermarsh commented Jul 28, 2023 •

edited

Loading

github-actions bot commented Jul 28, 2023 •

edited

Loading