BERT based Starter Kit for IndoML'24 DataThon

A simple BERT-based baseline for the DataThon @ IndoML'24.

Data & Details: After registering here, you can get the data from here; download the raw data and store it in a directory (ideally named data/).

Preprocess: Run

python src/preprocess.py --data_dir <your_data_directory>
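
For example, if you kept the raw files in the suggested data/ directory, the call is simply:

python src/preprocess.py --data_dir data/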

Download BERT model and tokenizer: You also need the BERT model and tokenizer saved in the appropriate directories; to download them, run

python src/downloadBERT.py
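
For reference, this step amounts to fetching the checkpoint with the Hugging Face transformers library and saving it locally. The sketch below is only a hedged approximation: the checkpoint name (bert-base-uncased) and the save directory (models/bert-base-uncased) are assumptions, not taken from the actual src/downloadBERT.py.

# Rough sketch of what src/downloadBERT.py likely does; checkpoint name and
# save path are illustrative assumptions.
from transformers import BertModel, BertTokenizerFast

model = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Save both locally so trainer.py can load them without re-downloading.
model.save_pretrained("models/bert-base-uncased")
tokenizer.save_pretrained("models/bert-base-uncased")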

Train & Test: The rest of the code works across configurations, from a single CPU, to multi-GPU, to multi-machine setups.

python3 src/trainer.py --output <some_output_column>

The code will automatically pick up multiple GPUs; alternatively, you can restrict which devices it sees by prefixing the launch command with CUDA_VISIBLE_DEVICES=x,y,z. Feel free to modify any component of this code as you see fit.
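
As a rough illustration of how the automatic multi-GPU pickup can work (the actual trainer.py may instead use DistributedDataParallel or a launcher; this is a minimal single-process sketch with an assumed stand-in model):

import torch
from transformers import BertForSequenceClassification

# Stand-in classifier; trainer.py presumably loads the locally saved BERT
# from the download step instead of pulling from the Hub.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

if torch.cuda.device_count() > 1:
    # Replicates the forward pass across every GPU left visible,
    # e.g. those listed in CUDA_VISIBLE_DEVICES.
    model = torch.nn.DataParallel(model)

A concrete launch restricted to two GPUs would then look like:

CUDA_VISIBLE_DEVICES=0,1 python3 src/trainer.py --output <some_output_column>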

All the best!
