The dataset can be downloaded at https://challengedata.ens.fr/participants/challenges/35/
Warning: preprocessing produces a very large sparse TF-IDF matrix, so training must be run on a machine with a large amount of RAM.
Here, we use a Google Cloud Platform virtual machine: n1-highmem-16, with 16 vCPUs and 104 GB of RAM.
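To judge whether a smaller machine could work, the memory taken by the sparse matrix itself can be estimated ahead of time. The sketch below is only a rough illustration: it assumes the matrix is stored in CSR format with 8-byte float values and 4-byte integer indices, and the function name and the document counts are hypothetical, not taken from this project. Training usually needs several times this amount, since models and intermediate copies also live in RAM.

```python
def csr_memory_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Approximate size in bytes of a CSR sparse matrix.

    CSR stores three arrays: the non-zero values (data), the column
    index of each non-zero value (indices), and one row pointer per
    row plus one (indptr).
    """
    data = nnz * value_bytes
    indices = nnz * index_bytes
    indptr = (n_rows + 1) * index_bytes
    return data + indices + indptr


# Hypothetical corpus: 1,000,000 documents with about 200 distinct terms each.
n_docs = 1_000_000
avg_terms_per_doc = 200
size = csr_memory_bytes(n_docs, n_docs * avg_terms_per_doc)
print(f"Estimated matrix size: {size / 1e9:.2f} GB")
```

If the number of non-zero entries is too large for 4-byte indices, pass `index_bytes=8` to get a more conservative estimate.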