PyPi package #913
Comments
I think most of the work is in proper versioning and releases. Right now, code and data are automatically synchronized because they live in the same repository, but we cannot guarantee that an old version of the library works with new data (and we should not try to change that), and we currently have no versioning at all. Extracting the code into its own git repo and embedding it here creates a lot of overhead (speaking from experience with these setups in an academic setting), and I don't know how we would have version numbers & releases while keeping the code in here.
Great points, @akoehn. Versioning would indeed require more thought. We could have a file in Conversely, though, you could say that the lack of versioning currently makes it less attractive for people to build on our API, since it could change at any moment without clear documentation. That's why I'm wondering how many people would even be interested in this, to see if it makes sense to think about this.
Are you thinking of the
I'm not sure what problems you foresee here; version numbers for the Python package could be kept in a subdirectory where the package lives (say |
No, I meant another repo. The thing is that fixing a bug is straightforward now. With a separate repository, you would need to check out both acl-anthology and the anthology code, make changes to the code, publish it locally (or otherwise make sure it is used by acl-anthology), test whether your fix worked, and repeat. The easiest way would probably be to generate a PyPI package from the current setup, where the core anthology code base is together with the library part in one repository and we don't have to think about versioning all the time.
There's a first usable version of a PyPI library now: https://pypi.org/project/acl-anthology-py/ I'm currently developing this in a separate repo, but I've thought about the versioning issues and think it should probably be moved into this repo, as keeping it in sync with the data format here (XML schema etc.) does seem like a headache otherwise. I don't see a big problem with having version numbers & releases within this repo, though. Over the coming weeks, I'll prepare a feature branch here that merges in this library, so that we can continue the discussion here.
There was some discussion on whether we should make our anthology library into a PyPI package. This would make it easier for people to use our Python interface to the Anthology, e.g., to build external tools or run analyses. It might even encourage people to contribute and add functionality to the library itself.
Requirements to achieve this (off the top of my head):
A mechanism to download/update the Anthology XML data from within the Python package.
Many Python packages download external data as part of their functionality (e.g., NLTK, torchtext), and I've personally used GitPython to do exactly this with the ACL Anthology for my recent Anthology analysis paper. I believe this is completely solvable.
Proper documentation. If we want to promote our Python API in this way, we should have at least succinct, user-friendly documentation that gets people started on how to use it. I believe that might be a good thing to have anyway, to help future volunteers for the Anthology who might work on the Python API. I'd also be happy to help prepare it.
Faster loading, as discussed in "Faster loading of Anthology class" (#835), could be a major factor for usability. I have more ideas in this direction that I want to look into at some point, but maybe it's more of a "nice-to-have" than an actual blocker?
Most importantly, I think it would be great to gauge the community's interest in this. If you'd be interested in and see value in working with Anthology data through a pip-installable library, give a thumbs up here!
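To make the first requirement more concrete, the download/update mechanism could look roughly like the sketch below. It shells out to the git command line instead of using GitPython (a substitution made here only to keep the example dependency-free). The repository URL, the function names `sync_command` and `sync_data`, and the choice of a shallow clone are all assumptions for illustration, not part of the existing library.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Assumption: the Anthology data is fetched from this GitHub repository.
REPO_URL = "https://github.com/acl-org/acl-anthology.git"


def sync_command(target: Path, url: str = REPO_URL) -> list[str]:
    """Return the git command that brings `target` up to date.

    If `target` already contains a git checkout, fast-forward it;
    otherwise make a shallow clone of `url` into `target`.
    """
    if (target / ".git").is_dir():
        return ["git", "-C", str(target), "pull", "--ff-only"]
    return ["git", "clone", "--depth", "1", url, str(target)]


def sync_data(target: Path, url: str = REPO_URL) -> None:
    """Download or update the Anthology data (hypothetical helper)."""
    # Make sure the parent directory exists before git writes into it.
    target.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if git exits with an error.
    subprocess.run(sync_command(target, url), check=True)
```

Calling `sync_data(Path.home() / ".cache" / "acl-anthology")` would clone on first use and fast-forward on later calls; the cache location is likewise only an example.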