forked from EleutherAI/lm-evaluation-harness
Commit
Merge branch 'EleutherAI:main' into main
Showing 227 changed files with 3,868 additions and 1,028 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
name: Publish Python distribution to PyPI | ||
|
||
on: | ||
push: | ||
tags: | ||
- '*' | ||
|
||
jobs: | ||
build: | ||
name: Build distribution | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- uses: actions/checkout@v4 | ||
- name: Set up Python | ||
uses: actions/setup-python@v4 | ||
with: | ||
python-version: "3.x" | ||
|
||
- name: Install pypa/build | ||
run: >- | ||
python3 -m | ||
pip install | ||
build | ||
--user | ||
- name: Build a binary wheel and a source tarball | ||
run: python3 -m build | ||
- name: Store the distribution packages | ||
uses: actions/upload-artifact@v3 | ||
with: | ||
name: python-package-distributions | ||
path: dist/ | ||
|
||
publish-to-pypi: | ||
name: >- | ||
Publish Python distribution to PyPI | ||
if: startsWith(github.ref, 'refs/tags/') # only publish to PyPI on tag pushes | ||
needs: | ||
- build | ||
runs-on: ubuntu-latest | ||
environment: | ||
name: pypi | ||
url: https://pypi.org/p/lm_eval | ||
permissions: | ||
id-token: write # IMPORTANT: mandatory for trusted publishing | ||
|
||
steps: | ||
- name: Download all the dists | ||
uses: actions/download-artifact@v3 | ||
with: | ||
name: python-package-distributions | ||
path: dist/ | ||
- name: Publish distribution to PyPI | ||
uses: pypa/gh-action-pypi-publish@release/v1 | ||
|
||
publish-to-testpypi: | ||
name: Publish Python distribution to TestPyPI | ||
needs: | ||
- build | ||
runs-on: ubuntu-latest | ||
|
||
environment: | ||
name: testpypi | ||
url: https://test.pypi.org/p/lm_eval | ||
|
||
permissions: | ||
id-token: write # IMPORTANT: mandatory for trusted publishing | ||
|
||
steps: | ||
- name: Download all the dists | ||
uses: actions/download-artifact@v3 | ||
with: | ||
name: python-package-distributions | ||
path: dist/ | ||
- name: Publish distribution to TestPyPI | ||
uses: pypa/gh-action-pypi-publish@release/v1 | ||
with: | ||
repository-url: https://test.pypi.org/legacy/ |
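
The build job above uploads whatever `python3 -m build` writes to `dist/`, and both publish jobs download that same artifact. As a minimal sketch (not part of this commit), the following Python helper could be run locally before pushing a tag to confirm that `dist/` holds both a wheel and a source tarball. The function names `classify_dists` and `dist_is_complete` are hypothetical, and the check relies only on the standard `.whl` and `.tar.gz` file extensions.

```python
from pathlib import Path


def classify_dists(filenames):
    """Split distribution file names into wheels and source tarballs by extension."""
    wheels = sorted(n for n in filenames if n.endswith(".whl"))
    sdists = sorted(n for n in filenames if n.endswith(".tar.gz"))
    return {"wheels": wheels, "sdists": sdists}


def dist_is_complete(filenames):
    """Return True when at least one wheel and one source tarball are present."""
    found = classify_dists(filenames)
    return bool(found["wheels"]) and bool(found["sdists"])


if __name__ == "__main__":
    # Assumes the script runs from the repository root after `python3 -m build`.
    names = [p.name for p in Path("dist").glob("*")]
    print(classify_dists(names))
    print("complete:", dist_is_complete(names))
```

If the check reports an incomplete `dist/`, re-running the build step before tagging is likely cheaper than discovering the problem in the publish jobs.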
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,10 @@ | ||
@software{eval-harness, | ||
author = {Gao, Leo and | ||
Tow, Jonathan and | ||
Biderman, Stella and | ||
Black, Sid and | ||
DiPofi, Anthony and | ||
Foster, Charles and | ||
Golding, Laurence and | ||
Hsu, Jeffrey and | ||
McDonell, Kyle and | ||
Muennighoff, Niklas and | ||
Phang, Jason and | ||
Reynolds, Laria and | ||
Tang, Eric and | ||
Thite, Anish and | ||
Wang, Ben and | ||
Wang, Kevin and | ||
Zou, Andy}, | ||
@misc{eval-harness, | ||
author = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy}, | ||
title = {A framework for few-shot language model evaluation}, | ||
month = sep, | ||
year = 2021, | ||
month = 12, | ||
year = 2023, | ||
publisher = {Zenodo}, | ||
version = {v0.0.1}, | ||
doi = {10.5281/zenodo.5371628}, | ||
url = {https://doi.org/10.5281/zenodo.5371628} | ||
version = {v0.4.0}, | ||
doi = {10.5281/zenodo.10256836}, | ||
url = {https://zenodo.org/records/10256836} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
# Contributing to LM Evaluation Harness | ||
|
||
Welcome and thank you for your interest in the LM Evaluation Harness! We welcome contributions and feedback and appreciate your time spent with our library, and hope you find it useful! | ||
|
||
We intend LM Evaluation Harness to be a broadly useful and | ||
|
||
## Important Resources | ||
|
||
Information about LM Evaluation Harness is available in several places: | ||
|
||
- Our [documentation pages](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/docs) | ||
- We occasionally use [GitHub Milestones](https://github.com/EleutherAI/lm-evaluation-harness/milestones) to track progress toward specific near-term version releases. | ||
- We maintain a [Project Board](https://github.com/orgs/EleutherAI/projects/25) for tracking current work items and PRs, and for future roadmap items or feature requests. | ||
- Further discussion and support conversations are located in the #lm-thunderdome channel of the [EleutherAI Discord](https://discord.gg/eleutherai). | ||
|
||
## Code Style | ||
|
||
LM Evaluation Harness uses [ruff](https://github.com/astral-sh/ruff) for linting via [pre-commit](https://pre-commit.com/). | ||
|
||
You can install linters and dev tools via | ||
|
||
```pip install lm_eval[dev]``` | ||
|
||
Then, run | ||
|
||
```pre-commit install``` | ||
|
||
in order to ensure linters and other checks will be run upon committing. | ||
|
||
## Testing | ||
|
||
We use [pytest](https://docs.pytest.org/en/latest/) for running unit tests. All library unit tests can be run via: | ||
|
||
``` | ||
python -m pytest --ignore=tests/tests_master --ignore=tests/extra | ||
``` | ||
|
||
## Contributor License Agreement | ||
|
||
We ask that new contributors agree to a Contributor License Agreement affirming that EleutherAI has the rights to use your contribution to our library. | ||
First-time pull requests will have a reply added by @CLAassistant containing instructions for how to confirm this, and we require it before merging your PR. | ||
|
||
|
||
## Contribution Best Practices | ||
|
||
We recommend a few best practices to make your contributions or reported errors easier to assist with. | ||
|
||
**For Pull Requests:** | ||
- PRs should be titled descriptively, and be opened with a brief description of the scope and intent of the new contribution. | ||
- New features should have appropriate documentation added alongside them. | ||
- Aim for code maintainability, and minimize code copying. | ||
- If opening a task, try to share test results on the task using a publicly-available model, and if any public results are available on the task, compare to them. | ||
|
||
**For Feature Requests:** | ||
- Provide a short paragraph's worth of description. What is the feature you are requesting? What motivates it, and what is an example use case? How does this differ from what is currently supported? | ||
|
||
**For Bug Reports**: | ||
- Provide a short description of the bug. | ||
- Provide a *reproducible example*--what is the command you run with our library that results in this error? Have you tried any other steps to resolve it? | ||
- Provide a *full error traceback* of the error that occurs, if applicable. A one-line error message or small screenshot snippet is unhelpful without the surrounding context. | ||
- Note what version of the codebase you are using, and any specifics of your environment and setup that may be relevant. | ||
|
||
**For Requesting New Tasks**: | ||
- Provide a 1-2 sentence description of what the task is and what it evaluates. | ||
- Provide a link to the paper introducing the task. | ||
- Provide a link to where the dataset can be found. | ||
- Provide a link to a paper containing results on an open-source model on the task, for use in comparisons and implementation validation. | ||
- If applicable, link to any codebase that has implemented the task (especially the original publication's codebase, if existent). | ||
|
||
## How Can I Get Involved? | ||
|
||
To quickly get started, we maintain a list of good first issues, which can be found [on our project board](https://github.com/orgs/EleutherAI/projects/25/views/8) or by [filtering GH Issues](https://github.com/EleutherAI/lm-evaluation-harness/issues?q=is%3Aopen+label%3A%22good+first+issue%22+label%3A%22help+wanted%22). These are typically smaller code changes or self-contained features which can be added without extensive familiarity with library internals, and we recommend new contributors consider taking a stab at one of these first if they are feeling uncertain where to begin. | ||
|
||
There are a number of distinct ways to contribute to LM Evaluation Harness, and all are extremely helpful! Some ways to contribute include: | ||
- **Implementing and verifying new evaluation tasks**: Is there a task you'd like to see LM Evaluation Harness support? Consider opening an issue requesting it, or helping add it! Verifying and cross-checking task implementations with their original versions is also a very valuable form of assistance in ensuring standardized evaluation. | ||
- **Improving documentation** - Improvements to the documentation, or noting pain points / gaps in documentation, are helpful in order for us to improve the user experience of the library and clarity + coverage of documentation. | ||
- **Testing and devops** - We are very grateful for any assistance in adding tests for the library that can be run for new PRs, and other devops workflows. | ||
- **Adding new modeling / inference library integrations** - We hope to support a broad range of commonly-used inference libraries popular among the community, and welcome PRs for new integrations, so long as they are documented properly and maintainable. | ||
- **Proposing or Contributing New Features** - We want LM Evaluation Harness to support a broad range of evaluation use cases. If you would like a feature that is not currently supported, feel free to open an issue describing the feature and, if applicable, how you intend to implement it. We would be happy to give feedback on the cleanest way to implement new functionality, and to coordinate with interested contributors via GH discussions or Discord. | ||
|
||
We hope that this has been helpful, and appreciate your interest in contributing! Further questions can be directed to [our Discord](https://discord.gg/eleutherai). |
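
The bug-report guidance above asks for a full traceback plus version and environment details. As a minimal sketch of one way to gather these (not something the project provides), the Python helper below formats them as text that can be pasted into an issue. The function names `package_version` and `build_bug_report` are hypothetical. The distribution name `lm_eval` is taken from the `pip install lm_eval[dev]` command in the guide.

```python
import platform
import traceback
from importlib import metadata


def package_version(name):
    """Return the installed version of `name`, or a placeholder if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def build_bug_report(exc, package="lm_eval"):
    """Format environment details plus the full traceback of `exc` as plain text."""
    tb = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    lines = [
        f"python: {platform.python_version()}",
        f"platform: {platform.platform()}",
        f"{package}: {package_version(package)}",
        "traceback:",
        tb.rstrip(),
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    try:
        # Stand-in for the failing call; replace with the real reproduction.
        raise ValueError("example failure")
    except ValueError as err:
        print(build_bug_report(err))
```

The output still needs the exact command that triggered the error added by hand, since the helper only sees the exception it is given.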