pt.java #447

Merged
merged 161 commits into from
Aug 16, 2024
23b59d1
wip
seanmacavaney Jul 20, 2024
cab0900
Merge remote-tracking branch 'origin/master' into java
seanmacavaney Jul 20, 2024
e474044
integration
seanmacavaney Jul 21, 2024
cebba5e
wip
seanmacavaney Jul 21, 2024
8bddd31
this call no longer needed
cmacdonald Jul 26, 2024
78c4ed2
resolve helper version differently
cmacdonald Jul 26, 2024
56463f2
split terrier out into a subpackage
seanmacavaney Jul 26, 2024
9dd2a37
perhaps a nicer way to maintain a set of classes?
seanmacavaney Jul 26, 2024
1b50854
oops
seanmacavaney Jul 26, 2024
df38982
move general java stuff to its own subpackage
seanmacavaney Jul 26, 2024
c13ac38
move batchretrieve.py -> terrier/retriever.py
seanmacavaney Jul 26, 2024
a9aa472
should fix broken build
seanmacavaney Jul 26, 2024
44a8fab
wip
seanmacavaney Jul 26, 2024
ecef753
fix some tests
seanmacavaney Jul 27, 2024
0a22452
fix more tests
seanmacavaney Jul 27, 2024
7e020d5
wip
seanmacavaney Jul 27, 2024
13b020e
new global place for java configurations, allowing them to be passed …
seanmacavaney Jul 27, 2024
af930ce
fixing a few config errors
seanmacavaney Jul 27, 2024
24ef0ee
wip
seanmacavaney Jul 27, 2024
ff08773
wip
seanmacavaney Jul 27, 2024
f825aa7
wip
seanmacavaney Jul 27, 2024
2baaf09
debugging messages
seanmacavaney Jul 28, 2024
6f26326
python version compat
seanmacavaney Jul 28, 2024
1f16c03
warnings get collapsed?
seanmacavaney Jul 28, 2024
3ee9117
debugging
seanmacavaney Jul 28, 2024
bfc6553
isolate test failures
seanmacavaney Jul 28, 2024
b639cee
isolate test failures
seanmacavaney Jul 28, 2024
71009c5
isolation
seanmacavaney Jul 28, 2024
d0cc671
refactor
seanmacavaney Jul 28, 2024
e8301fe
ray test was disabled :(
seanmacavaney Jul 28, 2024
0f4d290
uggh so was multiprocess :(
seanmacavaney Jul 28, 2024
ed2ece0
wip
seanmacavaney Jul 28, 2024
42205b5
wip
seanmacavaney Jul 28, 2024
78ab681
whoops
seanmacavaney Jul 28, 2024
9385ffa
given it works for me, seems to be platform-dependent? Worth trying
seanmacavaney Jul 28, 2024
d968def
okay, so only breaking on macos-latest. Let's disable that for now so…
seanmacavaney Jul 28, 2024
06af94a
splitting out some general java vs terrier stuff
seanmacavaney Jul 28, 2024
0fd8108
forgot to remove bootstrap
seanmacavaney Jul 28, 2024
ea115a8
wip
seanmacavaney Jul 28, 2024
cade2ec
wip
seanmacavaney Jul 28, 2024
a7f64fd
import
seanmacavaney Jul 28, 2024
342873f
imports :(
seanmacavaney Jul 28, 2024
c54bd40
weird
seanmacavaney Jul 28, 2024
8c636d9
eps
seanmacavaney Jul 28, 2024
9089069
wip
seanmacavaney Jul 28, 2024
613a38e
IndexRef was being loaded before the protocol_map was set
seanmacavaney Jul 28, 2024
4ed1aa3
re-enable test_parallel
seanmacavaney Jul 28, 2024
a214c65
refactoring java __init__.py
seanmacavaney Jul 28, 2024
a0ec531
more cleanup
seanmacavaney Jul 28, 2024
8907465
move ApplicationSetup to terrier scope
seanmacavaney Jul 28, 2024
8255096
cleanup
seanmacavaney Jul 28, 2024
36871c7
explanation
seanmacavaney Jul 29, 2024
5e0cd4a
anserini
seanmacavaney Jul 29, 2024
aae99ef
I think pyserini==0.25.0 is a bad build somehow, based on the error m…
seanmacavaney Jul 29, 2024
60e265e
also disable anserini.java._post_init if not installed
seanmacavaney Jul 29, 2024
0858a88
allow pt.java.required to be used as a function too
seanmacavaney Jul 29, 2024
feddbc0
update pt.io for pt.java
seanmacavaney Jul 29, 2024
85b92cd
update pt.ltr for pt.java
seanmacavaney Jul 29, 2024
bb35f2b
update pt.model for pt.java
seanmacavaney Jul 29, 2024
6b33453
Revert "update pt.model for pt.java"
seanmacavaney Jul 29, 2024
6ec9135
Revert "update pt.ltr for pt.java"
seanmacavaney Jul 29, 2024
ab379dc
move rewrite.py and index.py under terrier/ and update for pt.java
seanmacavaney Jul 29, 2024
8fa0158
fix tests
seanmacavaney Jul 29, 2024
d7d03c4
fix more tests
seanmacavaney Jul 29, 2024
7bdbedc
update text.py for pt.java
seanmacavaney Jul 29, 2024
f38e7aa
fix more tests
seanmacavaney Jul 29, 2024
79eec88
import cleanup in __init__.py
seanmacavaney Jul 29, 2024
459624c
fix broken test
seanmacavaney Jul 29, 2024
b88832f
more cleanup
seanmacavaney Jul 29, 2024
3fd8de1
fix broken test
seanmacavaney Jul 29, 2024
9c0924a
fix broken test
seanmacavaney Jul 29, 2024
b69e310
make java configuration object more pythonic
seanmacavaney Jul 29, 2024
803560a
moved check_version under pt.terrier and deprecated the one in the ma…
seanmacavaney Jul 29, 2024
6daaab7
deprecation in the main namespace
seanmacavaney Jul 29, 2024
8191a8e
helper_version stuff is sorted in a previous commit
seanmacavaney Jul 29, 2024
d854228
more main namespace deprecation
seanmacavaney Jul 29, 2024
4600980
doesn't look like we need global properties
seanmacavaney Jul 29, 2024
3ddb386
further cleanup of main namespace
seanmacavaney Jul 29, 2024
8ab397e
split out terrier stemmer, tokeniser, and stopwords
seanmacavaney Jul 29, 2024
a9583f9
removed autoclasses from index.py
seanmacavaney Jul 29, 2024
348c48e
added missing pt.java.required in index.py
seanmacavaney Jul 29, 2024
ea19a42
flake8 extension to check for missing pt.java.required annotations
seanmacavaney Jul 30, 2024
0b27047
for ease of reading in gh actions
seanmacavaney Jul 30, 2024
80f8ab6
warning for the other case too: annotated but doesn't use java
seanmacavaney Jul 30, 2024
ed6947e
fixed missing pt.java.required decorator and allow decorators on classes
seanmacavaney Jul 30, 2024
8d13dff
python compatibility
seanmacavaney Jul 30, 2024
5cce327
the order of decorators is important?
seanmacavaney Jul 30, 2024
e70402f
check for out-of-order pt.java.required
seanmacavaney Jul 30, 2024
6ee75d0
prf
seanmacavaney Jul 30, 2024
935458c
refactoring
seanmacavaney Jul 30, 2024
d0c57ff
avoid greedy java init in tests
seanmacavaney Jul 30, 2024
f299c45
more deprecation/backcompat
seanmacavaney Jul 30, 2024
215b780
no_download support and automatic "offline" mode handling in mavenres…
seanmacavaney Jul 30, 2024
4311bdc
alright, one of the last things remaining is this pesky macos test
seanmacavaney Jul 30, 2024
5ccfbf0
so what's the deal? Is it the python version?
seanmacavaney Jul 30, 2024
d940884
ok so must be macos-latest?
seanmacavaney Jul 30, 2024
52f646e
is it an arm issue? also disable other configs to save time
seanmacavaney Jul 30, 2024
ad9a378
this is actually the arm image name for macos-13
seanmacavaney Jul 30, 2024
9df53f1
hmm that didn't work, here's a version on 14 with intel
seanmacavaney Jul 30, 2024
cfd7d27
revert for the time being
seanmacavaney Jul 30, 2024
57c6cc3
more informative java loaded message
seanmacavaney Jul 30, 2024
4054fd1
management of java settings, etc
seanmacavaney Jul 30, 2024
423b78b
disable arm64 again
seanmacavaney Jul 30, 2024
3c9bc21
disallow changing java settings once it's started
seanmacavaney Jul 31, 2024
2cffed6
cleaner pre_init/post_init/etc configuration
seanmacavaney Jul 31, 2024
865784d
missing decorator
seanmacavaney Jul 31, 2024
dc3625a
misc fixes
seanmacavaney Jul 31, 2024
8b27581
anserini messages
seanmacavaney Jul 31, 2024
9869fe3
reorg pyterrier/java files to make more sense
seanmacavaney Jul 31, 2024
0d1acdd
organize java files
seanmacavaney Jul 31, 2024
c3a55b4
remove deprecated calls in tests
seanmacavaney Jul 31, 2024
e1d7d3e
documentation updates
seanmacavaney Jul 31, 2024
b88ef8e
private modules
seanmacavaney Jul 31, 2024
1e03262
note about what triggered init in java started message
seanmacavaney Jul 31, 2024
6fb88f7
more concise java started message
seanmacavaney Jul 31, 2024
e721392
error when no java initializers found
seanmacavaney Jul 31, 2024
2f6a094
avoid the annoying "bootstrap configuration" message
seanmacavaney Jul 31, 2024
783f39a
avoid the annoying "bootstrap configuration" message
seanmacavaney Jul 31, 2024
b36408d
Merge remote-tracking branch 'origin/java' into java
seanmacavaney Jul 31, 2024
dace9ba
updated documentation about init
seanmacavaney Aug 3, 2024
efc3279
renamed java Java Initializers to better clarify their purpose
seanmacavaney Aug 3, 2024
c820681
I think I prefer the kwargs style for JavaClasses
seanmacavaney Aug 3, 2024
7d857e4
normalize how the extra requirement decorators are done
seanmacavaney Aug 3, 2024
8fdca57
whoops
seanmacavaney Aug 3, 2024
864dbe6
fix comment typos
cmacdonald Aug 6, 2024
28aca38
more expressive deprecated warning for pt.init()
seanmacavaney Aug 6, 2024
d7b49f9
add comment to warning
cmacdonald Aug 7, 2024
434bebd
move to pt.terrier.Retrieve etc by default
cmacdonald Aug 7, 2024
71c374e
update README examples
cmacdonald Aug 7, 2024
81ba34f
udpate all references to BatchRetrieve in the documentation
cmacdonald Aug 7, 2024
94fd5ae
fix typo comment
cmacdonald Aug 7, 2024
3a3b74e
always load prf package
seanmacavaney Aug 9, 2024
34ef71c
split out parallel tests into a separate gh action
seanmacavaney Aug 9, 2024
5e8f5b4
move retrieve.py back to retriever.py
seanmacavaney Aug 10, 2024
12fef49
Retrieve -> Retriever
seanmacavaney Aug 10, 2024
ae3d08f
Merge branch 'java' into java_terrier_retrieve
seanmacavaney Aug 10, 2024
8347a3b
fix typo
seanmacavaney Aug 10, 2024
e6da26b
a bunch of BatchRetrieve -> Retriever (mostly in docs)
seanmacavaney Aug 10, 2024
bdcfa88
merge error
seanmacavaney Aug 10, 2024
cb271f6
Merge pull request #449 from terrier-org/java_terrier_retrieve
seanmacavaney Aug 10, 2024
972ce73
Merge commit '4dd0752' into java_backport_453
cmacdonald Aug 13, 2024
bf785ee
fix warning message to correctly split out pkg defn
cmacdonald Aug 13, 2024
4be5ee8
upgrade for java branch
cmacdonald Aug 13, 2024
87ccb05
make it easier to take env variables
cmacdonald Aug 13, 2024
59e2adc
use !r repr for strings
cmacdonald Aug 13, 2024
488bd03
Update push.yml
seanmacavaney Aug 13, 2024
8134536
deprecated modifies the class so need stubs for deprecated transformers
seanmacavaney Aug 14, 2024
4eb0799
use tiebreak results for RM3
cmacdonald Aug 14, 2024
d249116
reduce digits for term weights to avoid rounding errors on different …
cmacdonald Aug 14, 2024
608419a
wip - docvectors support in FeaturesRetrieve
cmacdonald Aug 14, 2024
f903c90
improved test cases
cmacdonald Aug 15, 2024
b5b5b87
configurable DV matching implementation
cmacdonald Aug 15, 2024
7c868e9
ensure 5.10 is there before tests will run
cmacdonald Aug 15, 2024
087c29d
Merge pull request #455 from terrier-org/dv
seanmacavaney Aug 15, 2024
bc6cb51
Revert "Merge pull request #455 from terrier-org/dv"
seanmacavaney Aug 16, 2024
1c46751
Merge pull request #454 from terrier-org/java_backport_453
seanmacavaney Aug 16, 2024
cad18c1
flash before RM3
cmacdonald Aug 16, 2024
d726e4a
mergeing rm3 tests back into the main rewrite test file
seanmacavaney Aug 16, 2024
57fe30d
in a short interim
seanmacavaney Aug 16, 2024
8159417
add deprecation on (F)BR.from_dataset
cmacdonald Aug 16, 2024
81bc4d0
Merge branch 'java' of github.com:terrier-org/pyterrier into java
cmacdonald Aug 16, 2024
20 changes: 6 additions & 14 deletions .github/workflows/push.yml
@@ -6,8 +6,7 @@ name: Continuous Testing
 on:
   push:
     branches: [ master ]
-  pull_request:
-    branches: [ master ]
+  pull_request: {}

 jobs:
   build:
@@ -18,11 +17,11 @@ jobs:
         java: [11, 13]
         os: ['ubuntu-latest', 'macos-13', 'windows-latest']
         terrier: ['snapshot'] #'5.3', '5.4-SNAPSHOT',
-        include:
-          - os: 'macos-latest'
-            python-version: '3.9'
-            java: 11
-            terrier: 'snapshot'
+        # include:
+        #   - os: 'macos-latest'
+        #     python-version: '3.9'
+        #     java: 11
+        #     terrier: 'snapshot'

     runs-on: ${{ matrix.os }}
     steps:
@@ -79,13 +78,6 @@ jobs:
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
         #flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

-    - name: RM3 unit tests
-      env:
-        TERRIER_VERSION: ${{ matrix.terrier }}
-      run: |
-        pytest -p no:faulthandler tests/test_rewrite_rm3.py
-        # Hide underlying Jnius problem by disabling faulthandler: https://github.com/pytest-dev/pytest/issues/7634
-
     - name: Flash unit tests
       env:
         TERRIER_VERSION: ${{ matrix.terrier }}
26 changes: 26 additions & 0 deletions .github/workflows/style.yml
@@ -0,0 +1,26 @@
+name: Code Style Checks
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  build:
+    runs-on: 'ubuntu-latest'
+    steps:
+    - uses: actions/checkout@v4
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
+      with:
+        python-version: '3.10'
+
+    - name: Install
+      run: |
+        pip install flake8 ./extras/pyterrier-flake8-ext/
+
+    - name: pt.java.required checks
+      run: |
+        flake8 ./pyterrier --select=PT --show-source --statistics --count
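
The style workflow in this PR runs flake8 with `--select=PT` over an extension installed from `./extras/pyterrier-flake8-ext/`, and the commit log describes it as checking for missing `pt.java.required` annotations. The extension's source is not shown here, so what follows is a guess at the shape such a check could take, using flake8's AST-plugin convention (a class that takes `tree` and exposes `run()`). The `autoclass` marker heuristic, the class name, and the `PT100` code are all assumptions made for illustration.

```python
import ast

# Hypothetical heuristic: any reference to these names implies Java use.
JAVA_MARKERS = {"autoclass"}


def _decorator_names(func):
    """Return the trailing name of each decorator, e.g. 'required'."""
    names = []
    for dec in func.decorator_list:
        target = dec.func if isinstance(dec, ast.Call) else dec
        if isinstance(target, ast.Attribute):
            names.append(target.attr)
        elif isinstance(target, ast.Name):
            names.append(target.id)
    return names


def _uses_java(func):
    """True if the function references any marker name."""
    for node in ast.walk(func):
        if isinstance(node, ast.Name) and node.id in JAVA_MARKERS:
            return True
        if isinstance(node, ast.Attribute) and node.attr in JAVA_MARKERS:
            return True
    return False


class RequiredChecker:
    """Checker in the shape flake8 expects from an AST plugin."""
    name = "example-java-required"
    version = "0.0.1"

    def __init__(self, tree):
        self.tree = tree

    def run(self):
        # Yield (line, column, message, checker type) for each violation.
        for node in ast.walk(self.tree):
            if not isinstance(node, ast.FunctionDef):
                continue
            annotated = "required" in _decorator_names(node)
            if _uses_java(node) and not annotated:
                yield (node.lineno, node.col_offset,
                       "PT100 uses Java but lacks @pt.java.required",
                       type(self))
```

The class can be exercised without flake8 by passing it the result of `ast.parse(source)` and iterating over `run()`.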
@@ -15,7 +15,8 @@ jobs:
     strategy:
       matrix:
         python-version: ['3.10']
-        java: [13]
+        anserini-version: ['==0.19.0', '==0.22.0', '==0.36.0', '']
+        java: [21]
         os: ['ubuntu-latest']
         terrier: ['snapshot'] #'5.3', '5.4-SNAPSHOT',

@@ -63,5 +64,5 @@ jobs:
       env:
         TERRIER_VERSION: ${{ matrix.terrier }}
       run: |
-        pip install pyserini==0.22.0 faiss-cpu torch
+        pip install pyserini${{ matrix.anserini-version }} faiss-cpu torch
         pytest --durations=20 -p no:faulthandler tests/anserini/
72 changes: 72 additions & 0 deletions .github/workflows/test-parallel.yaml
@@ -0,0 +1,72 @@
+name: Continuous Testing of Parallel Components
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  build:
+
+    strategy:
+      matrix:
+        python-version: ['3.8', '3.11']
+        java: [11, 13]
+        os: ['ubuntu-latest']
+        terrier: ['snapshot']
+
+    runs-on: ${{ matrix.os }}
+    steps:
+
+    - name: Setup dependencies for xgBoost on macOs-latest
+      if: matrix.os == 'macOs-latest'
+      run: |
+        brew install libomp
+
+    - uses: actions/checkout@v4
+
+    - name: Set up Python ${{ matrix.python-version }}
+      if: matrix.os != 'self-hosted'
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Setup java
+      if: matrix.os != 'self-hosted'
+      uses: actions/setup-java@v4
+      with:
+        java-version: ${{ matrix.java }}
+        distribution: 'zulu'
+
+    - name: Install Terrier snapshot
+      if: matrix.terrier == '5.4-SNAPSHOT'
+      run: |
+        git clone https://github.com/terrier-org/terrier-core.git
+        cd terrier-core
+        mvn -B -DskipTests install
+
+    # follows https://medium.com/ai2-blog/python-caching-in-github-actions-e9452698e98d
+    - name: Loading Python & dependencies from cache
+      if: matrix.os != 'self-hosted'
+      uses: actions/cache@v4
+      with:
+        path: ${{ env.pythonLocation }}
+        key: ${{ runner.os }}-${{ env.pythonLocation }}-${{ hashFiles('requirements.txt') }}-${{ hashFiles('requirements-test.txt') }}
+
+    - name: Install Python dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install --upgrade --upgrade-strategy eager -r requirements.txt
+        pip install --upgrade --upgrade-strategy eager -r requirements-test.txt
+        #install this software
+        pip install --timeout=120 .
+        pip install pytest
+
+    - name: All unit tests
+      env:
+        TERRIER_VERSION: ${{ matrix.terrier }}
+        PARALLEL_TESTING: '1'
+      run: |
+        pytest --durations=20 -p no:faulthandler tests/test_grid.py tests/test_grid.py tests/test_parallel.py tests/test_pool.py
+
8 changes: 4 additions & 4 deletions README.md
@@ -36,7 +36,7 @@ See the [indexing documentation](https://pyterrier.readthedocs.io/en/latest/terr
 ```python
 topics = pt.io.read_topics(topicsFile)
 qrels = pt.io.read_qrels(qrelsFile)
-BM25_br = pt.BatchRetrieve(index, wmodel="BM25")
+BM25_br = pt.terrier.Retriever(index, wmodel="BM25")
 res = BM25_br.transform(topics)
 pt.Evaluate(res, qrels, metrics = ['map'])
 ```
@@ -56,7 +56,7 @@ There is a worked example in the [experiment notebook](examples/notebooks/experi

 PyTerrier makes it easy to develop complex retrieval pipelines using Python operators such as `>>` to chain different retrieval components. Each retrieval approach is a [transformer](https://pyterrier.readthedocs.io/en/latest/transformer.html), having one key method, `transform()`, which takes a single Pandas dataframe as input, and returns another dataframe. Two examples might encapsulate applying the sequential dependence model, or a query expansion process:
 ```python
-sdm_bm25 = pt.rewrite.SDM() >> pt.BatchRetrieve(indexref, wmodel="BM25")
+sdm_bm25 = pt.rewrite.SDM() >> pt.terrier.Retriever(indexref, wmodel="BM25")
 bo1_qe = BM25_br >> pt.rewrite.Bo1QueryExpansion() >> BM25_br
 ```

@@ -83,8 +83,8 @@ You can see examples of how to use these, including notebooks that run on Google
 Complex learning to rank pipelines, including for learning-to-rank, can be constructed using PyTerrier's operator language. For example, to combine two features and make them available for learning, we can use the `**` operator.
 ```python
 two_features = BM25_br >> (
-    pt.BatchRetrieve(indexref, wmodel="DirichletLM") **
-    pt.BatchRetrieve(indexref, wmodel="PL2")
+    pt.terrier.Retriever(indexref, wmodel="DirichletLM") **
+    pt.terrier.Retriever(indexref, wmodel="PL2")
 )
 ```

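
The README hunks in this PR swap `pt.BatchRetrieve` for `pt.terrier.Retriever`, and the commit log mentions keeping stubs for deprecated transformers. One common way to keep an old name working while steering users to the new one is a thin subclass that warns on construction. This sketch assumes that approach; both classes here are stand-ins, and PyTerrier's actual deprecation mechanism may differ.

```python
import warnings


class Retriever:
    """Stand-in for the new class; not PyTerrier's implementation."""

    def __init__(self, index, wmodel="BM25"):
        self.index = index
        self.wmodel = wmodel


class BatchRetrieve(Retriever):
    """Stub subclass, so the warning attaches to the old name only."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "BatchRetrieve is deprecated; use Retriever instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)
```

Because the stub is a subclass, `isinstance(obj, Retriever)` checks keep passing for objects built through the old name.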
4 changes: 2 additions & 2 deletions docs/anserini.rst
@@ -16,15 +16,15 @@ Comparative retrieval from Anserini and Terrier::
     trIndex = "/path/to/data.properties"
     luceneIndex "/path/to/lucene-index-dir"

-    BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
+    BM25_tr = pt.terrier.Retriever(trIndex, wmodel="BM25")
     BM25_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="BM25")

     pt.Experiment([BM25_tr, BM25_ai], topics, qrels, eval_metrics=["map"])


 AnseriniBatchRetrieve can also be used as a re-ranker::

-    BM25_tr = pt.BatchRetrieve(trIndex, wmodel="BM25")
+    BM25_tr = pt.terrier.Retriever(trIndex, wmodel="BM25")
     QLD_ai = pt.anserini.AnseriniBatchRetrieve(luceneIndex, wmodel="QLD")

     pipe = BM25_tr >> QLD_ai
4 changes: 2 additions & 2 deletions docs/apply.rst
@@ -85,9 +85,9 @@ Its also possible to construct a transformer that makes a new column on a row-wi

 For instance, if the column you are creating is called rank_2, it might be created as follows::

-    pipe = pt.BatchRetrieve(index) >> pt.apply.rank_2(lambda row: row["rank"] * 2)
+    pipe = pt.terrier.Retriever(index) >> pt.apply.rank_2(lambda row: row["rank"] * 2)

 To create a transformer that drops a column, you can instead pass `drop=True` as a kwarg::

-    pipe = pt.BatchRetrieve(index, metadata=["docno", "text"] >> pt.text.scorer() >> pt.apply.text(drop=True)
+    pipe = pt.terrier.Retriever(index, metadata=["docno", "text"] >> pt.text.scorer() >> pt.apply.text(drop=True)

1 change: 0 additions & 1 deletion docs/conf.py
@@ -21,7 +21,6 @@
 # -- Dataset table listing -----------------------------------------------------
 import pyterrier as pt
 import textwrap
-pt.init()

 from extras import generate_includes
 if not "QUICK" in os.environ:
2 changes: 1 addition & 1 deletion docs/datamodel.md
@@ -51,7 +51,7 @@ A dataframe representing which documents are retrieved and scored for a given qu

 Note that rank is computed by sorting by qid ascending, then score descending. The first rank for each query is 0. The `pyterrier.model.add_rank()` function is used for adding the rank column.

-Optional columns might support additional transformers, such as text (for the contents of the documents), url or title columns. Their presence can facilitate more advanced transformers, such as BERT-based transformers which operate on the raw text of the documents. For instance, if the Terrier index has additional metadata attributes, these can be included by BatchRetrieve using the `metadata` kwarg, i.e. `pt.BatchRetrieve(index, metadata=["docno", "title", "body"])`.
+Optional columns might support additional transformers, such as text (for the contents of the documents), url or title columns. Their presence can facilitate more advanced transformers, such as BERT-based transformers which operate on the raw text of the documents. For instance, if the Terrier index has additional metadata attributes, these can be included by BatchRetrieve using the `metadata` kwarg, i.e. `pt.terrier.Retriever(index, metadata=["docno", "title", "body"])`.

 Note that the retrieved documents is a subset of the cartesian product of documents and queries; it is important that the query (text) attribute is present for at least ONE document rather than all documents for a given query.

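
The ranking rule quoted in the `docs/datamodel.md` hunk (sort by qid ascending, then score descending, with ranks starting at 0) can be illustrated on plain dictionaries. This is not `pyterrier.model.add_rank()` itself, which operates on dataframes; `add_rank_sketch` is a hypothetical helper, and ties are left to Python's stable sort rather than to anything the source specifies.

```python
from itertools import groupby
from operator import itemgetter


def add_rank_sketch(rows):
    """Return copies of rows ordered by qid, then descending score,
    each with a 0-based 'rank' within its query."""
    ordered = sorted(rows, key=lambda r: (r["qid"], -r["score"]))
    ranked = []
    for _, group in groupby(ordered, key=itemgetter("qid")):
        for rank, row in enumerate(group):
            ranked.append({**row, "rank": rank})
    return ranked


results = [
    {"qid": "q2", "docno": "d1", "score": 1.0},
    {"qid": "q1", "docno": "d2", "score": 0.5},
    {"qid": "q1", "docno": "d3", "score": 2.0},
]
for row in add_rank_sketch(results):
    print(row["qid"], row["docno"], row["rank"])
```

Here the highest-scoring document for `q1` (`d3`) receives rank 0, and ranking restarts at 0 for `q2`.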
18 changes: 9 additions & 9 deletions docs/datasets.rst
@@ -11,11 +11,11 @@ each defined dataset can download and provide easy access to:
  - relevance assessments (aka, labels or qrels), as a dataframe, ready for evaluation
  - ready-made Terrier indices, where appropriate

-.. autofunction:: pyterrier.datasets.list_datasets()
+.. autofunction:: pyterrier.datasets.list_datasets

-.. autofunction:: pyterrier.datasets.find_datasets()
+.. autofunction:: pyterrier.datasets.find_datasets

-.. autofunction:: pyterrier.datasets.get_dataset()
+.. autofunction:: pyterrier.datasets.get_dataset

 .. autoclass:: pyterrier.datasets.Dataset
     :members:
@@ -27,8 +27,8 @@ Many of the PyTerrier unit tests are based on the `Vaswani NPL test collection <
 PyTerrier provides a ready-made index on the `Terrier Data Repository <http://data.terrier.org/>`_. This allows experiments to be easily conducted::

     dataset = pt.get_dataset("vaswani")
-    bm25 = pt.BatchRetrieve.from_dataset(dataset, "terrier_stemmed", wmodel="BM25")
-    dph = pt.BatchRetrieve.from_dataset(dataset, "terrier_stemmed", wmodel="DPH")
+    bm25 = pt.terrier.Retriever.from_dataset(dataset, "terrier_stemmed", wmodel="BM25")
+    dph = pt.terrier.Retriever.from_dataset(dataset, "terrier_stemmed", wmodel="DPH")
     pt.Experiment(
         [bm25, dph],
         dataset.get_topics(),
@@ -44,8 +44,8 @@ Indexing and then retrieval of documents from the `MSMARCO document corpus <http
     indexref = indexer.index(dataset.get_corpus())
     index = pt.IndexFactory.of(indexref)

-    DPH_br = pt.BatchRetrieve(index, wmodel="DPH") % 100
-    BM25_br = pt.BatchRetrieve(index, wmodel="BM25") % 100
+    DPH_br = pt.terrier.Retriever(index, wmodel="DPH") % 100
+    BM25_br = pt.terrier.Retriever(index, wmodel="BM25") % 100
     # this runs an experiment to obtain results on the TREC 2019 Deep Learning track queries and qrels
     pt.Experiment(
         [DPH_br, BM25_br],
@@ -62,8 +62,8 @@ You can also index datasets that include a corpus using IterDictIndexer and get_
     indexref = indexer.index(dataset.get_corpus_iter(), fields=('title', 'abstract'))
     index = pt.IndexFactory.of(indexref)

-    DPH_br = pt.BatchRetrieve(index, wmodel="DPH") % 100
-    BM25_br = pt.BatchRetrieve(index, wmodel="BM25") % 100
+    DPH_br = pt.terrier.Retriever(index, wmodel="DPH") % 100
+    BM25_br = pt.terrier.Retriever(index, wmodel="BM25") % 100
     # this runs an experiment to obtain results on the TREC COVID queries and qrels
     pt.Experiment(
         [DPH_br, BM25_br],
4 changes: 2 additions & 2 deletions docs/experiments.rst
@@ -27,8 +27,8 @@ Getting average effectiveness over a set of topics::
     # vaswani dataset provides an index, topics and qrels

     # lets generate two BRs to compare
-    tfidf = pt.BatchRetrieve(dataset.get_index(), wmodel="TF_IDF")
-    bm25 = pt.BatchRetrieve(dataset.get_index(), wmodel="BM25")
+    tfidf = pt.terrier.Retriever(dataset.get_index(), wmodel="TF_IDF")
+    bm25 = pt.terrier.Retriever(dataset.get_index(), wmodel="BM25")

     pt.Experiment(
         [tfidf, bm25],
8 changes: 4 additions & 4 deletions docs/experiments/Robust04.md
@@ -34,10 +34,10 @@ Here we define and evaluate standard weighting models.

 ```python

-BM25 = pt.BatchRetrieve(index, wmodel="BM25")
-DPH = pt.BatchRetrieve(index, wmodel="DPH")
-PL2 = pt.BatchRetrieve(index, wmodel="PL2")
-DLM = pt.BatchRetrieve(index, wmodel="DirichletLM")
+BM25 = pt.terrier.Retriever(index, wmodel="BM25")
+DPH = pt.terrier.Retriever(index, wmodel="DPH")
+PL2 = pt.terrier.Retriever(index, wmodel="PL2")
+DLM = pt.terrier.Retriever(index, wmodel="DirichletLM")

 pt.Experiment(
     [BM25, DPH, PL2, DLM],
4 changes: 2 additions & 2 deletions docs/extras/generate_includes.py
@@ -49,8 +49,8 @@ def experiment_includes():
         os.path.join(tempfile.gettempdir(), "vaswani_index")
     ).index(pt.get_dataset('vaswani').get_corpus_iter())

-    tfidf = pt.BatchRetrieve(indexref, wmodel="TF_IDF")
-    bm25 = pt.BatchRetrieve(indexref, wmodel="BM25")
+    tfidf = pt.terrier.Retriever(indexref, wmodel="TF_IDF")
+    bm25 = pt.terrier.Retriever(indexref, wmodel="BM25")

     table = pt.Experiment(
         [tfidf, bm25],