# Cockpit Test Neural Network

A neural network that can easily be trained to recognize whether a CI failure is a real failure or was caused by a flaky test.

## How to Run
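Based on the command-line switches described below, a training run might look like the following (the exact argument syntax is an assumption; `-t` and the data file are mentioned in the Output section):

```sh
cp_test_nn -t 2017-06-21-2017-07-21-cockpit-test-data.jsonl
```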

## Output

When you run `cp_test_nn`, you'll see a bunch of numbers printed. These show the progress of building feature sets from the learning data set (`2017-06-21-2017-07-21-cockpit-test-data.jsonl`) and from the validation set (`2017-06-22-2017-07-22-cockpit-test-data.jsonl`). When this is done (it might take a couple of minutes), the program will output something like:

```
Success rate: 0.9375314861460957
False positives: 0.0015113350125944584
False negatives: 0.02216624685138539
Unsure: 0.038790931989924435
```

Your numbers may vary slightly due to random initialization. Generally, a success rate above 0.9 on the validation dataset is good; it shows that the approach is sound. The program is still in a very early phase and it should be possible to improve on this. Note that the success rate is artificially decreased by our choice of a fairly high threshold (currently 0.75) that we consider safe for classifying an example as FLAKE or NOT FLAKE. If the probabilities are below this threshold, we classify the example as UNSURE. The threshold can be changed via the `-r <threshold>` command-line switch (in particular, `-r 0.5` will yield no UNSURE results and classify all examples as FLAKE or NOT FLAKE).
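A minimal sketch of this thresholding, assuming the network emits a single flake probability (the function name and internals here are hypothetical, not the program's actual code):

```python
def classify(p_flake, threshold=0.75):
    """Classify an example given the network's flake probability.

    Anything whose class probability does not reach the threshold falls
    into the UNSURE band. With threshold=0.5 that band is empty, so every
    example comes out FLAKE or NOT FLAKE (the effect of -r 0.5).
    """
    if p_flake >= threshold:
        return "FLAKE"
    if p_flake <= 1 - threshold:
        return "NOT FLAKE"
    return "UNSURE"
```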

The `-s serialized` switch makes the program serialize the trained neural network to a file called `serialized`. On future invocations, you can use `-l serialized` instead of `-t ...` to load the serialized neural network instead of training it again.
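For example (reusing the training invocation sketched above, whose exact syntax is an assumption):

```sh
# First run: train on the learning data set and save the network
cp_test_nn -t 2017-06-21-2017-07-21-cockpit-test-data.jsonl -s serialized

# Later runs: load the saved network instead of retraining
cp_test_nn -l serialized
```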