src/main/resources/docgen/templates/neuclir22-ru-dt-splade.template

# Anserini Regressions: NeuCLIR22 &mdash; Russian (Document Translation)

This page presents **document translation** regression experiments for the [TREC 2022 NeuCLIR Track](https://neuclir.github.io/), Russian, with the following configuration:

+ Queries: English
+ Documents: Machine-translated documents from Russian into English (corpus provided by the organizers)
+ Model: [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil)

The exact configurations for these regressions are stored in [this YAML file](${yaml}).
Note that this page is automatically generated from [this template](${template}) as part of Anserini's regression pipeline, so do not modify this page directly; modify the template instead.

We make available a version of the corpus that has already been encoded with [SPLADE CoCondenser SelfDistil](https://huggingface.co/naver/splade-cocondenser-selfdistil), i.e., we performed model inference on every document and stored the output sparse vectors.
Thus, no neural inference is required to reproduce these experiments; see instructions below.

From one of our Waterloo servers (e.g., `orca`), the following command will perform the complete regression, end to end:

```
python src/main/python/run_regression.py --index --verify --search --regression ${test_name}
```

## Corpus Download

Download the corpus and unpack into `collections/`:

```bash
wget https://rgw.cs.uwaterloo.ca/pyserini/data/neuclir22-ru-en-splade.tar -P collections/
tar xvf collections/neuclir22-ru-en-splade.tar -C collections/
```

To confirm, `neuclir22-ru-en-splade.tar` is 5.5 GB and has MD5 checksum `828e44e1ab8af19e5e1224539abe347c`.
With the corpus downloaded, the following command will perform the remaining steps below:

```bash
python src/main/python/run_regression.py --index --verify --search --regression ${test_name} \
  --corpus-path collections/${corpus}
```

## Indexing

Typical indexing command:

```
${index_cmds}
```

For additional details, see explanation of [common indexing options](${root_path}/docs/common-indexing-options.md).

## Retrieval

After indexing has completed, you should be able to perform retrieval as follows:

```
${ranking_cmds}
```

Evaluation can be performed using `trec_eval`:

```
${eval_cmds}
```

## Effectiveness

With the above commands, you should be able to reproduce the following results:

${effectiveness}

## Reproduction Log[*](${root_path}/docs/reproducibility.md)

To add to this reproduction log, modify [this template](${template}) and run `bin/build.sh` to rebuild the documentation.