TREC CaST #255
Great, thanks! I won't have much time to contribute directly to this for the next week or so. Regarding the tasks:
This is my understanding, yes. We can prefix each collection by year.
My vote is to put it on Hugging Face, and potentially include a backup location too. I can add it under https://huggingface.co/irds.
OK, this is done (uploaded the offset files to HF). I see you support Python 3.7; is that still ongoing, or are you planning to switch to 3.8?
Given that 3.7 has reached end of life, I think it's reasonable to bump the minimum Python version to 3.8, especially if there are features from the core library that you want to use from 3.8.
OK great, should I modify the CI myself, or will you do it in the main branch?
Remaining quick question (for now): should I move the generic classes outside of
I will test the PR a bit more thoroughly by testing my models on TREC CaST, so no emergency.
The generic classes can move outside the cast directory. I'm keen to apply them in other settings.
You can modify the CI in this branch, that's alright.
Can you temporarily give me "write" access to https://huggingface.co/irds (so I can transfer the ownership), or otherwise copy the files from https://huggingface.co/datasets/bpiwowar/trec_cast_offsets/tree/main?
- Moved generic classes out of trec_cast.py
- Fixed some bugs and problems
Done!
I'm having trouble running some of the tests locally:
Is there a missing
I modified the code so the LZ4 store is built (I kept the "on-the-fly" code though, since for simple cases, where only a prefix is added, this might still be useful).
Thanks! I'm building the tests and such now :)
I think it's mostly there! I'm only having some problems with v2. First, when I try to do a lookup, it fails in
Great! For v2 (if I understood correctly), assessments are at the document level, even though there is an official passage corpus. I added a
Can you give me the command for the lookup you are testing with?
Ran both integration tests and the tests in
Self-explanatory, the
Error's coming from line
As for the
OK, the integration tests should be fixed now.
The tests in
This is the entire error trace:
The tests should run now.
Thanks @bpiwowar and @andreaschari!
This pull request is for TREC CaST 2019 to 2022
Generic classes
For the moment, it contains generic handler classes that might be moved to other places (and need to be tested):
- PrefixedDocs: allows using document collections that are merged by adding a collection-specific prefix to the ID of each document
- DocsSubset and Dupes: handle duplicates in any document collection
Cleanup before merge
I suggest the following steps before merging with master:
downloads.json
(so move them)
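
As an aside on the generic classes listed above, here is a minimal sketch of the two ideas: merging collections under collection-specific ID prefixes, and skipping known duplicates. It is an illustration only, not the code from this PR: `Doc`, `PrefixedCollection`, and `drop_duplicates` are hypothetical names, the collections are plain in-memory dicts, and the prefixes `MARCO_` and `CAR_` are example values.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Set


class Doc(NamedTuple):
    doc_id: str
    text: str


class PrefixedCollection:
    """Hypothetical stand-in for the prefixing idea (not the PR's PrefixedDocs)."""

    def __init__(self, collections: Dict[str, Dict[str, Doc]]):
        # Maps a prefix (e.g. "MARCO_") to an in-memory {doc_id: Doc} store.
        self._collections = collections

    def __iter__(self) -> Iterator[Doc]:
        # Yield every document with its collection prefix prepended to the ID.
        for prefix, docs in self._collections.items():
            for doc in docs.values():
                yield doc._replace(doc_id=prefix + doc.doc_id)

    def lookup(self, doc_id: str) -> Optional[Doc]:
        # Route the lookup to the collection whose prefix matches,
        # then strip the prefix to find the original document.
        for prefix, docs in self._collections.items():
            if doc_id.startswith(prefix):
                inner = docs.get(doc_id[len(prefix):])
                if inner is not None:
                    return inner._replace(doc_id=doc_id)
        return None


def drop_duplicates(docs: Iterable[Doc], duplicate_ids: Set[str]) -> Iterator[Doc]:
    """Skip documents whose (prefixed) ID is listed as a duplicate."""
    for doc in docs:
        if doc.doc_id not in duplicate_ids:
            yield doc


# Example: two collections that both contain a document with ID "1".
merged = PrefixedCollection({
    "MARCO_": {"1": Doc("1", "a passage")},
    "CAR_": {"1": Doc("1", "a paragraph")},
})
unique_docs = list(drop_duplicates(merged, {"CAR_1"}))
```

The prefix keeps IDs unique across the merged collections, so the same raw ID can exist in several sources without clashing, and a lookup only needs the prefix to know which source to query.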