Weasel project templates for research projects using spaCy

New-Languages-for-NLP/project-templates

🪐 Project Templates

Weasel (formerly spaCy projects) lets you manage and share end-to-end workflows for different use cases and domains, and orchestrate the training, packaging, and serving of your custom pipelines. You can start by cloning a predefined project template, then adjust it to fit your needs, load in your data, train a pipeline, export it as a Python package, and share your results with other researchers.
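
To make "workflow" concrete: a project defines named commands, and a workflow is an ordered list of those command names. The Python sketch below assumes a project.yml-like structure with a commands list (each entry having a name and a script) and a workflows mapping; the command names, script paths, and the expand_workflow helper are hypothetical, and Weasel itself handles more than this (variables, dependency tracking, and so on).

```python
from typing import Dict, List

# Hypothetical project definition, loosely mirroring a project.yml:
# named commands with shell scripts, plus workflows listing command names in order.
PROJECT = {
    "commands": [
        {"name": "preprocess", "script": ["python scripts/preprocess.py"]},
        {"name": "train", "script": ["python -m spacy train configs/config.cfg"]},
        {"name": "package", "script": ["python -m spacy package training/model-best packages"]},
    ],
    "workflows": {"all": ["preprocess", "train", "package"]},
}


def expand_workflow(project: Dict, workflow: str) -> List[str]:
    """Return the shell scripts a workflow would run, in execution order."""
    commands = {cmd["name"]: cmd for cmd in project["commands"]}
    scripts: List[str] = []
    for name in project["workflows"][workflow]:
        if name not in commands:
            raise KeyError(f"workflow {workflow!r} references unknown command {name!r}")
        scripts.extend(commands[name]["script"])
    return scripts


for line in expand_workflow(PROJECT, "all"):
    print(line)
```

Keeping workflows as lists of command names (rather than duplicating scripts) means each step can still be run on its own, as in the Quickstart.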

This repository contains starter templates published by the New Languages for NLP team aimed at researchers in the humanities who are interested in using spaCy for their projects, especially those who are working in languages not currently supported by spaCy and other NLP tools.

ⓘ These project templates require Weasel, which is included by default with spaCy v3.7+. Just install spaCy via pip with pip install spacy and you're ready to get started. Make sure to use a fresh virtual environment.

🗃 Templates

| Template | Description |
| --- | --- |
| core_inception | Train a core model for a new language using Cadet and INCEpTION |

🚀 Quickstart

Projects can be used via the weasel CLI, or through the spacy project alias. To find out more about a command, add --help. For detailed instructions, see the Weasel documentation or spaCy projects usage guide.

  1. Clone the project template you want to use.
    spacy project clone core_inception my_new_project --repo https://github.com/New-Languages-for-NLP/project-templates
  2. Install any project requirements.
    cd my_new_project
    python -m pip install -r requirements.txt
  3. Fetch assets (data, weights) defined in the project.yml.
    spacy project assets
  4. Run a command defined in the project.yml.
    spacy project run preprocess
  5. Run a workflow of multiple steps in order.
    spacy project run all
  6. Adapt the template to your use case: load in your own data, adjust the settings and model, and publish your results.
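
Commands that declare dependencies and outputs are generally only re-run when those files change. The sketch below illustrates that change-detection idea under simplifying assumptions: each command lists deps and outputs, and a lock entry stores MD5 hashes from the previous run. The file names, the snapshot and should_run helpers, and the lock-entry layout are hypothetical and are not Weasel's actual project.lock format.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical, simplified lock entry: {"deps": {path: md5}, "outputs": {path: md5}}.
LockEntry = Dict[str, Dict[str, str]]


def file_hash(path: Path) -> str:
    """MD5 hex digest of a file's bytes, or "" if the file does not exist."""
    if not path.exists():
        return ""
    return hashlib.md5(path.read_bytes()).hexdigest()


def snapshot(paths: List[Path]) -> Dict[str, str]:
    """Map each path to its current hash."""
    return {str(p): file_hash(p) for p in paths}


def should_run(deps: List[Path], outputs: List[Path], lock: Optional[LockEntry]) -> bool:
    """Decide whether a command needs to (re-)run."""
    if lock is None:
        return True  # never run before
    if any(not p.exists() for p in outputs):
        return True  # an expected output is missing
    return snapshot(deps) != lock["deps"] or snapshot(outputs) != lock["outputs"]


with tempfile.TemporaryDirectory() as tmp:
    dep = Path(tmp) / "corpus.txt"    # hypothetical input file
    out = Path(tmp) / "corpus.spacy"  # hypothetical output file
    dep.write_text("raw text")
    out.write_text("converted")
    print(should_run([dep], [out], None))  # True: no previous run
    lock = {"deps": snapshot([dep]), "outputs": snapshot([out])}
    print(should_run([dep], [out], lock))  # False: nothing changed
    dep.write_text("edited raw text")
    print(should_run([dep], [out], lock))  # True: a dependency changed
```

Hashing file contents rather than comparing timestamps means that touching a file without changing it does not trigger a rerun.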

👷‍♀️ Repository maintenance

To keep the project templates and their documentation up to date, this repo contains several scripts:

| Script | Description |
| --- | --- |
| update_docs.py | Update all auto-generated docs in the given root. Calls into spacy project document and only replaces the auto-generated sections, not any custom content before or after. |
| update_configs.py | Update and auto-fill all config.cfg files included in the repo, similar to spacy init fill-config. Can be used to keep the configs up to date with changes in spaCy. |
| update_projects_jsonl.py | Update the projects.jsonl file in the given root. Should be run from the root of the repo. |
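
update_docs.py replaces only the auto-generated sections of a README and leaves custom content alone. A minimal Python sketch of that marker-based replacement follows, assuming the generated section is delimited by a pair of HTML comment markers; the marker strings and the replace_generated helper are hypothetical and not necessarily what spacy project document emits.

```python
import re

# Hypothetical marker comments delimiting the generated section; the markers
# that `spacy project document` actually writes may differ.
MARKER_START = "<!-- AUTO-GENERATED DOCS START -->"
MARKER_END = "<!-- AUTO-GENERATED DOCS END -->"
_SECTION = re.compile(re.escape(MARKER_START) + r".*?" + re.escape(MARKER_END), re.DOTALL)


def replace_generated(readme: str, generated: str) -> str:
    """Swap in new generated docs, leaving custom content before/after intact."""
    block = f"{MARKER_START}\n{generated.strip()}\n{MARKER_END}"
    if _SECTION.search(readme) is None:
        # No markers yet: append the generated block after the custom content.
        return readme.rstrip() + "\n\n" + block + "\n"
    # A function replacement avoids re-interpreting backslashes in `block`.
    return _SECTION.sub(lambda _match: block, readme, count=1)


readme = (
    "# My project\nCustom intro.\n\n"
    f"{MARKER_START}\nold command table\n{MARKER_END}\n\n"
    "Custom footer.\n"
)
print(replace_generated(readme, "new command table"))
```

Because the regex is non-greedy and replaces only the first marked section, running the update repeatedly with the same generated text leaves the README unchanged.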

This README and several of the scripts were adapted from the main Explosion projects repository.
