-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip partial duplicates when applying multi-edit fixes #6144
Conversation
crates/ruff_diagnostics/src/fix.rs
Outdated
#[derive(Debug, Clone)] | ||
pub struct Edits<'a>(Cow<'a, [Edit]>); | ||
|
||
impl<'a> Edits<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm disappointed with what's happening here. The ergonomics of it are fine, but it's so inefficient. inorder
does a full clone, retain
does a full clone, etc.
@@ -85,7 +85,7 @@ pub fn test_snippet(contents: &str, settings: &Settings) -> Vec<Message> { | |||
} | |||
|
|||
thread_local! { | |||
static MAX_ITERATIONS: std::cell::Cell<usize> = std::cell::Cell::new(30); | |||
static MAX_ITERATIONS: std::cell::Cell<usize> = std::cell::Cell::new(8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some payoff.
.fix | ||
.edits() | ||
.iter() | ||
.sorted_unstable_by_key(|edit| edit.start()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does feel good to DRY this up though -- this is now a method on Edits
.
7d8a9bf
to
2e8630c
Compare
PR Check ResultsEcosystem✅ ecosystem check detected no changes. BenchmarkLinux
Windows
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Edits
interface! :)
crates/ruff/src/autofix/mod.rs
Outdated
// Remove any edits that were applied as part of a previous fix. | ||
let mut edits = fix.edits(); | ||
edits.retain(|edit| !applied.contains(edit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's feasible to create a Vec<&Edit>
which needs to be applied? i.e., filtering out the Edit
which has been applied but not in place. This would avoid cloning the Edit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent observation.
I think we can implement this in a way that significantly reduces the time we iterate over edits and removes the in place mutation by
- Sort the edits of a fix
- Skip over all edits of that fix that have already been applied
- Take the start position of the first edit, this is the
min_start
and test if the fix doesn't overlap with another fix - Apply the fix(es)
I think this should simplify the implementation
crates/ruff_diagnostics/src/fix.rs
Outdated
|
||
/// A collection of [`Edit`] elements to be applied to a source file. | ||
#[derive(Debug, Clone)] | ||
pub struct Edits<'a>(Cow<'a, [Edit]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It should be fine to restrict Edits
to &'a [Edit]
. It doesn't have to be aware that some usages will use a Cow
to avoid clones:
pub struct Edits<'a>(&'a [Edit]);
impl<'a> Edits<'a> {
pub fn from_slice(edits: &'a [Edit]) -> Self {...}
pub fn as_slice(&self) -> &[Edit] { self.0 }
}
# usage
let mut plain_edits = Cow::Borrowed(fix.edits().as_slice());
if needs_mutating {
plain_edits = Cow::Owned(vec![edit]);
}
let edits = Edits::from_slice(&plain_edits);
crates/ruff_diagnostics/src/fix.rs
Outdated
@@ -132,8 +134,8 @@ impl Fix { | |||
} | |||
|
|||
/// Return a slice of the [`Edit`] elements in the [`Fix`]. | |||
pub fn edits(&self) -> &[Edit] { | |||
&self.edits | |||
pub fn edits(&self) -> Edits { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The methods above (like min_start
) should now use self.edits().min_start()
to avoid duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I actually ended to remove that entirely.
crates/ruff_diagnostics/src/fix.rs
Outdated
pub fn inorder(&self) -> impl IntoIterator<Item = Edit> { | ||
let mut edits = self.to_vec(); | ||
edits.sort_by_key(Edit::start); | ||
edits.into_iter() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would call this method to_sorted
to clarify that this operation is potentially expensive.
crates/ruff_diagnostics/src/fix.rs
Outdated
where | ||
F: FnMut(&Edit) -> bool, | ||
{ | ||
self.0.to_mut().retain(f); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This eagerly clones the edits even if f
never returns false.
We can avoid this by iterating manually and only take the slow path that allocates a new Vec
if the predicate returns false
let mut edits = self.0.iter().enumerate();
let last_index = loop {
match edits.next() {
Some((index, edit)) => {
if !f(edit) {
break index;
}
},
None => {
return Cow::Borrowed(self.edits);
}
}
let mut edits = Vec::from_slice(&self.edits[..last_index]);
edits.extend(self.edits[last_index + 1..].iter().filter(f).cloned()); // TODO requires bounds checks.
Cow::Owned(edits)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of what I meant when I commented that "I'm disappointed with what's happening here." 😂
crates/ruff_diagnostics/src/fix.rs
Outdated
edits.into_iter() | ||
} | ||
|
||
/// Filter the [`Edit`] elements in the [`Fix`] by the given predicate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Filter the [`Edit`] elements in the [`Fix`] by the given predicate. | |
/// Retains the [`Edit`] elements in the [`Fix`] for which the predicate `f` returns true |
crates/ruff_diagnostics/src/fix.rs
Outdated
let mut edits = self.to_vec(); | ||
edits.sort_by_key(Edit::start); | ||
edits.into_iter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use sort_unstable_by_key
? I mean, it doesn't really make a difference because sort_unstable_by_key
allocates too, but you would then take benefit of only returning an Iterator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you referring to the methods on itertools
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's where they're defined. I should have marked this as a Nit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming, I had the same instinct but this crate doesn't depend on itertools
right now and I thought I'd get more heat for adding the dependency 😂
crates/ruff_diagnostics/src/fix.rs
Outdated
|
||
/// Return an iterator over the [`Edit`] elements in the [`Fix`], sorted by their start | ||
/// position. | ||
pub fn inorder(&self) -> impl IntoIterator<Item = Edit> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub fn inorder(&self) -> impl IntoIterator<Item = Edit> { | |
pub fn inorder(&self) -> impl Iterator<Item = Edit> { |
crates/ruff_diagnostics/src/fix.rs
Outdated
} | ||
|
||
/// Filter the [`Edit`] elements in the [`Fix`] by the given predicate. | ||
pub fn retain<F>(&mut self, f: F) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the retain
method somewhat confusing. It isn't explicit that it doesn't necessary mutate the elemnt in place and, instead, creates new elements.
We may even be able to get away by filtering the edits
instead of calling retain
(iterates over all edits once), min
(iterates over all edits once), and inorder
(iterates over all edits once).
- Sort the edits
- Getting
min
is now for free -> It's the start position of the first edit - We can now check adhoc if this specific edit has already been applied, rather than filtering them out previously
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is doable... but note that we want the min
to be the position of the first non-filtered edit. We may still be able to check it ad hoc though.
2e8630c
to
be7d6cf
Compare
This is a bit better could likely be better still. |
crates/ruff/src/autofix/mod.rs
Outdated
if let IsolationLevel::Group(id) = fix.isolation() { | ||
if !isolated.insert(id) { | ||
continue; | ||
first = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of confusing (definitely easier to read locally than as a diff), but in short: we want to apply these constraints at the fix level, but if all of the fix's edits were already applied, we don't need to enforce them at all. So we do so lazily, only checking if the location is acceptable when we reach the first edit-to-apply.
Okay, the latest version stores sorted edits rather than sorting on the fly. I think this is a lot more efficient as we sort once (and iterate through the edits exactly once) and no longer need to do any clones anywhere. |
7c0d3e9
to
4b1eaca
Compare
crates/ruff/src/autofix/mod.rs
Outdated
@@ -50,7 +50,7 @@ fn apply_fixes<'a>( | |||
let mut fixed = FxHashMap::default(); | |||
let mut source_map = SourceMap::default(); | |||
|
|||
for (rule, fix) in diagnostics | |||
'outer: for (rule, fix) in diagnostics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: If you want to avoid the labels, but it requires an additional vector:
let mut edits = fix
.edits()
.iter()
.filter(|edit| !applied.contains(edit))
.peekable();
// Lazily enforce any isolation and positional requirements (e.g., avoid applying
// overlapping fixes, but avoid applying this requirement if all fixes in the edit were
// already applied).
if let Some(first) = edits.peek() {
// If this fix requires isolation, and we've already applied another fix in the
// same isolation group, skip it.
if let IsolationLevel::Group(id) = fix.isolation() {
if !isolated.insert(id) {
continue;
}
}
// Best-effort approach: if this fix overlaps with a fix we've already applied,
// skip it.
if last_pos.map_or(false, |last_pos| last_pos >= first.start()) {
continue;
}
}
for edit in edits {
// Add all contents from `last_pos` to `fix.location`.
let slice = locator.slice(TextRange::new(last_pos.unwrap_or_default(), edit.start()));
output.push_str(slice);
// Add the start source marker for the patch.
source_map.push_start_marker(edit, output.text_len());
// Add the patch itself.
output.push_str(edit.content().unwrap_or_default());
// Add the end source marker for the added patch.
source_map.push_end_marker(edit, output.text_len());
// Track that the edit was applied.
last_pos = Some(edit.end());
applied_edits.push(edit);
}
applied.extend(applied_edits.drain(..));
*fixed.entry(rule).or_default() += 1;
}
I don't have a preference to be honest. I just thought this is complicated, and then noticed that it is because we read from and write to applied
at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you have here is a big clarity improvement, worth the allocation.
4b1eaca
to
8a554f4
Compare
Summary
Right now, if we have two fixes that have an overlapping edit, but not an identical set of edits, they'll conflict, causing us to do another linter traversal. Here, I've enabled the fixer to support partially overlapping edits, which (as an example) let's us greatly reduce the number of iterations required in the test suite.
The most common case here is that in which a bunch of edits need to import some symbol, and then use that symbol, but in different ways. In that case, all edits will have a common fix (to import the symbol), but deviate in some way. With this change, we can do all of those edits in one pass.
Note that the simplest way to enable this was to store sorted edits on
Fix
. We don't allow modifying the edits onFix
once it's constructed, so this is an easy change, and allows us to avoid a bunch of clones and traversals later on.Closes #5800.