-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cancel Timeout Tasks and Fail tests if too many Timeouts #2204
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I left some minor comments.
crates/task-impls/src/consensus.rs
Outdated
#[cfg(async_executor_impl = "async-std")] | ||
timeout_task.cancel().await; | ||
#[cfg(async_executor_impl = "tokio")] | ||
timeout_task.abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Can we add a helper function to cancel the timeout task depending on their async executor, and use it to simplify places like this?
let num_decided = self.success_nodes.len(); | ||
let num_failed = self.failed_nodes.len(); | ||
let remaining_nodes = total_num_nodes - (num_decided + num_failed); | ||
remaining_nodes + num_decided >= threshold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this function could be simplified as total_num_nodes - num_failed >= threshold
, i.e., we don't need to calculate num_decided
and remaining_nodes
.
@@ -108,7 +108,10 @@ async fn test_with_failures_half_f() { | |||
metadata.spinning_properties = SpinningTaskDescription { | |||
node_changes: vec![(5, dead_nodes)], | |||
}; | |||
|
|||
// TODO: this should only have 3 failures for each down leader, investigate why it fails additional views |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this is strange. It made sense before we set num_successful_views
to 22, but now those 3 nodes should each fail only once.
Closes #2165
This PR:
ViewTimedout
event to the application when the consensus timeout task expiresThis PR does not:
num_failed_views
was effectively infinite).Key places to review: