Case study: Lowkey Vault

Esta Nagy edited this page Jul 2, 2023 · 6 revisions

Disclaimer: I am the owner of both applications described on this page, therefore I could (at least in theory) influence the results if I wanted to. To make the situation clearer, I have made available the demo branch which produced the results with Abort-Mission active. If you find anything that would corrupt these results, please raise it openly so that I can address the issue properly! Thank you!

About the project

Lowkey Vault is an Azure Key Vault test double (one of the few), supporting a long list of operations on secrets, keys, and certificates. Its Lowkey Vault App module contains unit and integration tests verifying each class and the functionality of the REST API. The tested application is then copied into a Docker image by the Lowkey Vault Docker module, and a container is started from that image so that the functionality can be tested end-to-end.

The problem at hand

This page focuses on the Docker module and the more than 750 test cases executed by Cucumber. The scenarios are grouped by functionality, testing creation, import, backup, etc., but there is overlap: all of these tests must create or import an entity (secret/key/certificate) in order to test how it behaves when it is, for example, changed, deleted, or used. Because of this, Abort-Mission was set up to treat each functionality and entity type combination as a dependency, for example KeyCreate or SecretDelete. These dependencies are attached to the scenarios using Cucumber tags. This way, if any major feature is fully broken, the related tests can be skipped right away. Of course, the unit and integration tests should already catch fully broken features, but there are issues they cannot catch.
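To illustrate the idea (this is not the real Abort-Mission API, just a minimal sketch with hypothetical names, assuming a simple per-dependency failure threshold): each scenario declares the dependencies it touches, and once a dependency has failed often enough, every scenario depending on it is aborted instead of executed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, NOT the real Abort-Mission API: track failures per
// dependency tag and abort dependent scenarios once a tag looks broken.
final class DependencyHealth {
    private final Map<String, AtomicInteger> failures = new ConcurrentHashMap<>();
    private final int threshold;

    DependencyHealth(final int threshold) {
        this.threshold = threshold;
    }

    // Called when a scenario tagged with this dependency fails.
    void reportFailure(final String dependency) {
        failures.computeIfAbsent(dependency, k -> new AtomicInteger()).incrementAndGet();
    }

    // A scenario tagged with e.g. "KeyImport" is aborted once that
    // dependency has failed at least `threshold` times.
    boolean shouldAbort(final String... dependencies) {
        for (final String dependency : dependencies) {
            final AtomicInteger count = failures.get(dependency);
            if (count != null && count.get() >= threshold) {
                return true;
            }
        }
        return false;
    }
}
```

With such a setup, a scenario tagged with both KeyImport and KeyDelete would be skipped as soon as KeyImport is known to be broken, which is exactly the behavior exercised in this case study.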

The methodology

I changed a single line in the key import flow. With this defect, imports succeed, but as soon as a test evaluates the status of the imported key, it finds that the key is disabled, so the test fails. I executed the tests with the standard configuration, then reran them after turning off aborts (changing all evaluators to report-only).
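The shape of the seeded defect can be sketched as follows (hypothetical classes, not the actual Lowkey Vault source): the import call completes normally, but the stored key ignores the requested enabled flag, so only tests asserting on the key's status catch it.

```java
// Hypothetical model of the seeded defect, not the real Lowkey Vault code.
final class KeyVaultKey {
    private final boolean enabled;

    KeyVaultKey(final boolean enabled) {
        this.enabled = enabled;
    }

    boolean isEnabled() {
        return enabled;
    }
}

final class KeyImporter {
    private final boolean seededBug;

    KeyImporter(final boolean seededBug) {
        this.seededBug = seededBug;
    }

    // With the seeded bug, the enabled flag requested by the caller is
    // ignored and the key is always stored as disabled. The import itself
    // still succeeds, so only a later status assertion fails.
    KeyVaultKey importKey(final boolean enabledRequested) {
        return new KeyVaultKey(seededBug ? false : enabledRequested);
    }
}
```

This explains why every scenario depending on key import fails at the assertion step rather than at the import step itself.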

The results

The images below show the side-by-side comparison to make the overall difference in color obvious; the meaningful details are discussed below the diagrams.

Armed: Abort decisions allowed

*(Image: Lowkey Vault results - Armed)*

Disarmed: Failures are only counted

*(Image: Lowkey Vault results - Failing)*

Conclusion

As the images above show, the first run ended after 2 minutes of execution, having aborted 124 times after observing the first 4 failures. With the same code base, the rerun, in which Abort-Mission only collected information for the reports, completed after 2 minutes and 18 seconds, collecting a total of 80 failures.

The good news is that Abort-Mission prevented all failing test executions after the initial burn-in (4 failed tests). At the same time, it decided to abort in 48 cases where the execution would have succeeded. We could improve this with even more granular dependencies, indicating which scenarios only use the minimal part of create or import, and which also perform deep assertions. Unfortunately, this would take a lot of time and effort, and we would probably never see a return on the investment. It would also merely tailor our mission outline (the Abort-Mission configuration) to a lab-grown scenario rather than a real mistake or issue.
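The figures above fit together arithmetically: the disarmed run shows 80 failures in total, of which 4 occurred as burn-in in the armed run too, so 76 failures were prevented by aborting; subtracting those from the 124 abort decisions leaves the 48 false positives. A quick sketch of that derivation (the helper names are mine, only the input numbers come from the report):

```java
// Deriving the reported figures from the two runs.
final class AbortMissionStats {
    // Failures that aborting spared us: all disarmed failures minus the
    // burn-in failures that still occurred in the armed run.
    static int preventedFailures(final int disarmedFailures, final int armedFailures) {
        return disarmedFailures - armedFailures; // 80 - 4 = 76
    }

    // Abort decisions that did not correspond to a prevented failure.
    static int falsePositiveAborts(final int armedAborts, final int preventedFailures) {
        return armedAborts - preventedFailures; // 124 - 76 = 48
    }
}
```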

Despite the high number of false-positive abort decisions, we still saw clear benefits. We did not need to wait an additional 18 seconds (even while running the tests on 10 parallel threads), and the detailed error messages remained visible. Furthermore, looking at the reports, we can easily see which functionality might be affected, thanks to the detailed log and the filtering capabilities of the report.
