# Data workflows for the numer.ai machine learning competition
Currently implemented:
- fetch and extract the datasets
- train and predict
- automatic upload
## Fetch and extract the datasets

Fetches the dataset zipfile and extracts the contents to `output-path`.

- `output-path`: where the datasets should be saved (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
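A minimal sketch of what such a task can look like with Luigi is shown below; the class name and marker-file convention are illustrative assumptions, not necessarily what this repository uses:

```python
import io
import os
import urllib.request
import zipfile

import luigi


class FetchAndExtractData(luigi.Task):
    """Downloads the dataset zipfile and extracts it into output_path."""

    output_path = luigi.Parameter(default='./data/')
    dataset_path = luigi.Parameter()

    def output(self):
        # marker file whose existence tells Luigi the extraction is done
        return luigi.LocalTarget(os.path.join(self.output_path, '.extracted'))

    def run(self):
        os.makedirs(self.output_path, exist_ok=True)
        payload = urllib.request.urlopen(self.dataset_path).read()
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            archive.extractall(self.output_path)
        with self.output().open('w') as marker:
            marker.write('done')
```

The marker file makes the task idempotent: once it exists, Luigi considers the task complete and re-runs skip the download.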
## Train and predict

Trains a Bernoulli Naïve Bayes classifier and predicts the targets. The output
file is saved at `output-path` with a custom, timestamped file name.

- `output-path`: where the datasets should be saved (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
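Independent of the Luigi plumbing, the modelling step boils down to a few lines of scikit-learn. A sketch, assuming the CSVs follow the usual numer.ai layout with `id`, `feature*`, and `target` columns (the file and column names here are assumptions):

```python
import os
from datetime import datetime

import pandas as pd
from sklearn.naive_bayes import BernoulliNB


def train_and_predict(train_csv, tournament_csv, output_path='./data/'):
    """Fit a Bernoulli Naive Bayes model and write timestamped predictions."""
    train = pd.read_csv(train_csv)
    tournament = pd.read_csv(tournament_csv)

    # assume numer.ai-style columns: an 'id', many 'feature*', one 'target'
    features = [c for c in train.columns if c.startswith('feature')]

    model = BernoulliNB()
    model.fit(train[features], train['target'])

    # probability of the positive class
    probability = model.predict_proba(tournament[features])[:, 1]

    # custom, timestamped file name, as described above
    name = datetime.now().strftime('predictions_%Y-%m-%dT%H%M%S.csv')
    out_file = os.path.join(output_path, name)
    out = pd.DataFrame({'id': tournament['id'], 'probability': probability})
    out.to_csv(out_file, index=False)
    return out_file
```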
## Automatic upload

Uploads the predictions if not already uploaded.

- `output-path`: where the datasets should be saved (defaults to `./data/`)
- `dataset-path`: URI of the remote dataset
- `usermail`: user email
- `userpass`: user password
- `filepath`: path to the file to be uploaded
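One way to get the "only upload once" behaviour is to lean on Luigi's completeness check. A sketch under the assumption that the third-party `numerapi` client is used for the actual upload (the class and parameter names are assumptions):

```python
import luigi
from numerapi import NumerAPI  # assumed third-party client, not pinned by this README


class UploadPredictions(luigi.Task):
    """Uploads a predictions file once; re-runs are skipped via a marker file."""

    filepath = luigi.Parameter()
    public_id = luigi.Parameter()
    secret = luigi.Parameter()

    def output(self):
        # an existing marker file means "already uploaded", so Luigi skips the task
        return luigi.LocalTarget(self.filepath + '.uploaded')

    def run(self):
        napi = NumerAPI(public_id=self.public_id, secret_key=self.secret)
        napi.upload_predictions(self.filepath)
        with self.output().open('w') as marker:
            marker.write('uploaded')
```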
## Setup

Prepare the project:

```
pip install -r requirements.txt --ignore-installed
```
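The exact contents of `requirements.txt` live in this repository; judging from the tasks above, it would cover at least the workflow engine and the modelling stack, along these lines (the exact package set and versions are assumptions):

```
luigi
pandas
scikit-learn
numerapi
```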
If not already done, create an API key here with at least the following permissions:

- Upload submissions.
- View historical submission info.
- View user info (e.g. balance, withdrawal history).
To run the complete pipeline:

```
env PYTHONPATH='.' luigi --local-scheduler --module workflow Workflow --secret="YOURSECRET" --public-id="YOURPUBLICID"
```
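The `--module workflow Workflow` part tells Luigi to import `workflow.py` and run its `Workflow` task; `--secret` and `--public-id` are mapped to task parameters (dashes on the command line become underscores in Python). A rough sketch of that wiring, where everything except the `Workflow` name and the two parameters is an assumption:

```python
import luigi


class Workflow(luigi.WrapperTask):
    """Top-level task invoked by the command above."""

    # --secret and --public-id on the CLI map to these parameters
    secret = luigi.Parameter()
    public_id = luigi.Parameter()

    def requires(self):
        # the concrete dependency chain is defined in this repository, e.g.:
        # yield UploadPredictions(public_id=self.public_id, secret=self.secret)
        return []
```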