Reproducible Open Benchmarks - Top Tagger Demo

About

The Top Tagger Demo is part of the Reproducible Open Benchmarks for Data Analysis Platform (ROB). This demo contains part of the code used for the comparison analysis published in The Machine Learning Landscape of Top Taggers:

"Based on the established task of identifying boosted, hadronically decaying top quarks, we compare a wide range of modern machine learning approaches. We find that they are extremely powerful and great fun."

The majority of the code in this repository has been adopted from the following repositories: Tree Network in Network (TreeNiN) for Jet Physics and Top Tagging Comparison Analysis.

Getting Started

The demo requires an instance of the ROB Web Service and the ROB Command Line Interface. You can follow the instructions on the Flask Web API - Demo Setup site to set up and run the Web API. The ROB Command Line Interface page contains instructions for installing the client. Below is a summary of the important steps to set up the demo.

First, install the required packages (note that we recommend installing these packages in a virtual environment).
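
For example, a virtual environment can be created and activated as follows (the directory name venv is just an example, not part of the demo):

python -m venv venv
source venv/bin/activate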

pip install rob-flask
pip install rob-client

Make sure to set the environment variables that configure the database accordingly. The following example uses a SQLite database that is created in the current working directory:

export FLOWSERV_DATABASE=sqlite:///./db.sqlite
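
The connection string follows the SQLAlchemy URL format, so other databases can be configured the same way. For example, assuming a local PostgreSQL server with a database named flowserv (server, user, and database name are assumptions, not part of the demo):

export FLOWSERV_DATABASE=postgresql://user:password@localhost/flowserv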

This demo does not use the default workflow controller. It uses the Docker-based controller instead. To configure the Web API accordingly, set the following environment variables:

export FLOWSERV_BACKEND_MODULE=flowserv.controller.serial.docker
export FLOWSERV_BACKEND_CLASS=DockerWorkflowEngine

Note

The demo requires that you have an installed and running Docker daemon.
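
You can verify that the daemon is reachable before starting the Web API, for example with:

docker info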

The other relevant environment variables to set are:

# Store all files in a subfolder .rob in the current working directory
export FLOWSERV_API_DIR=./.rob
export FLOWSERV_API_PATH=/rob/api/v1
# Configure flask
export FLASK_APP=robflask.api
export FLASK_ENV=development

If the Web API is running on your local machine with the default settings, there is no need to configure additional environment variables. If the Web API is running on a different machine or port, set the environment variables FLOWSERV_API_HOST, FLOWSERV_API_PORT, and FLOWSERV_API_PATH accordingly (see the documentation for details).
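
For example, if the Web API were running on a host named rob.example.com on port 5000 (hypothetical values), the client environment would be configured as:

export FLOWSERV_API_HOST=rob.example.com
export FLOWSERV_API_PORT=5000
export FLOWSERV_API_PATH=/rob/api/v1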

Before starting the Flask web server, make sure to initialize the database and install the Top Tagger Demo:

flowserv init
flowserv install toptagger

flask run
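
Once Flask is running, you can check that the API is reachable (assuming Flask's default host and port) with a simple request such as:

curl http://127.0.0.1:5000/rob/api/v1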

Run the Benchmark

To run the benchmark you will need to open a second terminal (in the same working directory as the terminal that is running Flask). Set the environment variables as follows (make sure that you activate the virtual environment first):

export FLOWSERV_API_PATH=/rob/api/v1

Start by registering a new user alice. After you have created the user, you can also switch to the Web User Interface. Then create a submission for the Top Tagger benchmark.

# Register new user
rob register -u alice -p mypwd
# Login as alice
eval $(rob login -u alice -p mypwd)
# Set the Top Tagger benchmark as the default benchmark. Replace *xxxxxxxx*
# with the actual benchmark identifier
rob benchmarks list
export ROB_BENCHMARK=xxxxxxxx
# Create a new submission for the benchmark. Use the unique identifier of
# the created submission as the default submission
eval $(rob submissions create -n 'SimpleNet')

The repository provides several different implementations for the predictor (an illustrative sketch of the first one follows the list):

  • max-value.py: Uses the maximum value in a given sequence as the prediction result.
  • max-n-value.py: Uses the maximum value in a given sequence and adds a given constant value as the prediction result.
  • last-two-diff.py: Uses the difference between the last two values in a given sequence as the prediction result.
  • add-first.py: Uses the sum of the first value and the last value in a given sequence to determine the result.
  • AddDiffOfLastTwoValues.java: Implementation of the predictor that uses Java as the programming language instead of Python. Uses the sum of the last value and the difference between the last value and the next-to-last value in a given sequence as the prediction result.
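
For illustration only, here is a minimal sketch of what a predictor along the lines of max-value.py could look like, assuming the input file contains one comma-separated sequence of numbers per line and predictions are written to an output file (the file format and command-line arguments are assumptions, not the actual demo code):

import sys


def predict(sequence):
    # Use the maximum value in the sequence as the prediction result.
    return max(sequence)


if __name__ == '__main__':
    # Input file with one sequence per line; output file for predictions.
    infile, outfile = sys.argv[1], sys.argv[2]
    with open(infile) as f_in, open(outfile, 'w') as f_out:
        for line in f_in:
            values = [float(v) for v in line.strip().split(',') if v]
            f_out.write('{}\n'.format(predict(values)))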

Create a new benchmark run. In this demo, all code files are contained in the repository and can be run using the toptaggerdemo:0.1 Docker container image. Use the following as the command for the ML step (all other template parameters should use the default values):

python code/SimpleNet.py results/processed_test_jets.pkl data/evaluate/ results/

# Start a new run
rob runs start
# Check run status
rob runs list

Once the run has completed successfully, you can view the current benchmark results.

rob benchmarks leaders

Screenshots

ROB Home Page

ROB Home Screenshot

Benchmark Overview

Benchmark Overview Screenshot

Current Benchmark Results

Current Benchmark Results Screenshot

Start New Benchmark Run

Start New Benchmark Run Screenshot

Running Benchmark Status

Running Benchmark Status Screenshot

Successful Benchmark Run

Successful Benchmark Run Screenshot
