How countdown abort evaluation works
In order to use Abort-Mission properly, you need to understand how the metrics related to test instance post-processing are recorded and how countdown abort works.
In the first example, we will look at the case where there were no successful test instance post-processing executions by the time we reached the burn-in threshold we set (1 in this case, to keep it simple). In this case, Abort-Mission will conclude that the test configuration is so wrong that the test has no chance whatsoever, so it will simply skip it, as the diagram below shows.
Because of the low burn-in, the evaluator becomes very trigger-happy: if the first test fails, the rest won't even get a chance. At the same time, if the first test passes post-processing successfully, the countdown abort knows that there is hope, and no matter how many tests fail during post-processing, the test runs will only be aborted once the mission failure threshold is reached.
As you can see, this can turn the preparation of the first test run into a coin toss with the power to kill all other test runs if it fails.
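To make this concrete, here is a minimal sketch of the countdown abort decision described above. This is not the real Abort-Mission API: the class and method names (`CountdownStats`, `shouldAbortCountdown`) are made up for illustration, and the mission failure threshold is deliberately left out of the model.

```java
// Hypothetical model of the countdown abort decision -- NOT the real
// Abort-Mission API. Tracks post-processing outcomes for one matcher.
public final class CountdownStats {

    private final int burnInThreshold; // e.g. 1 in the example above
    private int successCount;
    private int failureCount;

    public CountdownStats(int burnInThreshold) {
        this.burnInThreshold = burnInThreshold;
    }

    public void logSuccess() {
        successCount++;
    }

    public void logFailure() {
        failureCount++;
    }

    // Evaluated before each run to decide whether the countdown may start.
    public boolean shouldAbortCountdown() {
        // While the burn-in is still in progress, never abort:
        // we do not have enough measurements yet.
        if (successCount + failureCount < burnInThreshold) {
            return false;
        }
        // Burn-in complete: abort only if post-processing never succeeded.
        // A single success means "there is hope", and from then on only the
        // mission failure threshold (not modeled here) can abort test runs.
        return successCount == 0;
    }
}
```

With `burnInThreshold = 1`, a single post-processing failure makes every later evaluation return `true`, which is exactly the coin-toss effect described above.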
In the next example, we decided to use a larger burn-in threshold (for example, 3) and got lucky at the same time: the first test managed to pass post-processing and only failed during the test method, as the diagram below shows:
Because of this, when our second test fails during post-processing, we know two things: the context is not failing all the time (it worked before), and the burn-in is still in progress. So we decide to go forward and fail spectacularly.
During the third run, the evaluation reaches the same conclusion, so the countdown completes and the pre-run evaluation can decide whether we should abort. The longer burn-in ensures that we go forward regardless of the two previous failures (third time's the charm!).
In this particular case, this decision saves the day and the third run produces both a well-deserved success and valuable data about the tested feature.
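Replaying this second example with the hypothetical `CountdownStats` sketch from earlier shows why none of the three runs were aborted:

```java
// Walkthrough of the burn-in = 3 example using the sketch above.
public final class BurnInWalkthrough {
    public static void main(String[] args) {
        CountdownStats stats = new CountdownStats(3);

        System.out.println(stats.shouldAbortCountdown()); // false: burn-in in progress (0 of 3)
        stats.logSuccess();  // run 1: post-processing passed, test method failed

        System.out.println(stats.shouldAbortCountdown()); // false: burn-in in progress (1 of 3)
        stats.logFailure();  // run 2: post-processing failed

        System.out.println(stats.shouldAbortCountdown()); // false: burn-in in progress (2 of 3)
        stats.logSuccess();  // run 3: post-processing passed, test method passed

        // Burn-in is complete (3 measurements) and we saw at least one
        // success, so the countdown abort stays disarmed from now on.
        System.out.println(stats.shouldAbortCountdown()); // false
    }
}
```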
There is no one-size-fits-all configuration for your project. It is really up to you to decide how you would like to handle test failures.
Important factors you can consider when making this decision:
- How many of your tests match the health check matcher/evaluator? A lower number of tests demands a lower burn-in.
- Are these tests really using the same resources/dependencies? If not, you might want to either use a larger burn-in or accept the risk of suppressing tests that would otherwise be in a perfectly fine state.
- If tests fail, is it more important to see that they failed and stop the build, or to get more reliable information about your other tests? You can experiment with the lowest burn-in setting to optimize for speed, or set the burn-in higher to learn more about the failures.
- Keep in mind that if something fails in 50% of the cases, the chance of it failing twice in a row is only 25% (see the calculation after this list). You might not want to go crazy high with the burn-in threshold, as that can effectively turn off countdown abort evaluations.
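A quick back-of-the-envelope calculation for that last point, assuming independent failures with probability $p$: the countdown abort only fires when every run during the burn-in fails, and that probability shrinks exponentially with the burn-in size $n$:

$$P(\text{all } n \text{ burn-in runs fail}) = p^{\,n}, \qquad 0.5^{2} = 0.25, \qquad 0.5^{5} \approx 0.03$$

So with a flaky resource failing half the time, a burn-in of 5 would trigger the countdown abort in only about 3% of builds, which is close to never.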