PyLaia 1.0.0
carmocca authored Dec 2, 2020
2 parents e8f6ebf + dd792a4 commit d796239
Showing 537 changed files with 9,275 additions and 277,706 deletions.
66 changes: 66 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,66 @@
name: Laia CI

on:
  push:
    branches: [master]
    paths-ignore: ['README.md']
  pull_request:
  schedule:
    # at 07:00 on Sunday
    - cron: '0 7 * * sun'

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.9
      - name: Run pre-commit
        run: |
          pip install pre-commit
          pre-commit run --all-files
  pytest:
    name: Python ${{ matrix.python }} - PyTorch ${{ matrix.torch }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # TODO: include 3.9 when NumPy adds support
        python: [3.8, 3.7, 3.6]
        # TODO: include 1.7.* when 1.7.1 is released
        torch: [1.6.*, 1.5.*, 1.4.*]
        # fix torchvision and nnutils versions for each torch version
        include:
          #- torch: 1.7.*
          #  # TODO: update when released
          #  nnutils: 1.6.*
          #  torchvision: 0.8.*
          - torch: 1.6.*
            nnutils: 1.6.*
            torchvision: 0.7.*
          - torch: 1.5.*
            nnutils: 1.5.*
            torchvision: 0.6.*
          - torch: 1.4.*
            nnutils: 1.4.*
            torchvision: 0.5.*
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python }}
      - name: Configure requirements
        run: |
          sed -i s/^torch\<1.7.0$/torch==${{ matrix.torch }}/ requirements.txt
          sed -i s/^torchvision\<0.8.0$/torchvision==${{ matrix.torchvision }}/ requirements.txt
          sed -i s/^nnutils-pytorch$/nnutils-pytorch==${{ matrix.nnutils }}/ requirements.txt
      - name: Install requirements
        run: pip install -e .[test]
      - name: Run pytest
        run: pytest --cov=laia tests
      - uses: codecov/codecov-action@v1
        # upload coverage only for main job
        if: ${{ matrix.python == '3.8' && matrix.torch == '1.6.*'}}
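
As a reading aid for the workflow above, the following Python sketch mimics what the `pytest` job's matrix and its "Configure requirements" step appear to do: each Python/PyTorch pair becomes one job, the `include` entries attach matching `nnutils` and `torchvision` pins, and the three `sed` calls rewrite `requirements.txt` accordingly. The function names `expand_matrix` and `pin_requirements` are hypothetical, and the assumed contents of `requirements.txt` (lines reading `torch<1.7.0`, `torchvision<0.8.0` and `nnutils-pytorch`) are inferred from the `sed` patterns rather than shown in this diff.

```python
import itertools
import re

# Values copied from the workflow's matrix.
PYTHONS = ["3.8", "3.7", "3.6"]
TORCHES = ["1.6.*", "1.5.*", "1.4.*"]

# Companion pins taken from the `include` entries, keyed by torch version.
PINS = {
    "1.6.*": {"nnutils": "1.6.*", "torchvision": "0.7.*"},
    "1.5.*": {"nnutils": "1.5.*", "torchvision": "0.6.*"},
    "1.4.*": {"nnutils": "1.4.*", "torchvision": "0.5.*"},
}


def expand_matrix():
    """Return one dict per CI job: every python/torch pair plus its pins."""
    return [
        {"python": python, "torch": torch, **PINS[torch]}
        for python, torch in itertools.product(PYTHONS, TORCHES)
    ]


def pin_requirements(text, job):
    """Apply the equivalent of the three sed substitutions to `text`.

    Unlike the sed patterns, the dots here are escaped so that they only
    match literal dots; this is a small deviation, not the workflow's regex.
    """
    rules = [
        (r"^torch<1\.7\.0$", "torch==" + job["torch"]),
        (r"^torchvision<0\.8\.0$", "torchvision==" + job["torchvision"]),
        (r"^nnutils-pytorch$", "nnutils-pytorch==" + job["nnutils"]),
    ]
    for pattern, replacement in rules:
        # MULTILINE makes ^ and $ match per line, as sed does.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    jobs = expand_matrix()
    print(len(jobs), "jobs")
    # Assumed requirements.txt contents, for illustration only.
    sample = "torch<1.7.0\ntorchvision<0.8.0\nnnutils-pytorch\n"
    print(pin_requirements(sample, jobs[0]))
```

Pinning through substitution keeps a single `requirements.txt` with loose upper bounds for users, while CI still tests each supported PyTorch release line against its matching `torchvision` and `nnutils-pytorch` versions.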
42 changes: 10 additions & 32 deletions .gitignore
@@ -46,39 +46,15 @@ coverage.xml
*.cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/*build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# dotenv
.env

@@ -87,13 +63,6 @@ celerybeat-schedule
venv/
ENV/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

@@ -111,4 +80,13 @@ tmp/
kk/

# Project specific files
laia/version.py
laia/version.py

# data
datasets/
test-resources/

# benchmarks
benchmarks/basic
benchmarks/distributed
benchmarks/half
3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

10 changes: 7 additions & 3 deletions .pre-commit-config.yaml
@@ -1,12 +1,16 @@
exclude: egs/*
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v2.3.0
rev: v3.3.0
hooks:
- id: check-yaml
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://github.com/timothycrosley/isort
rev: 5.6.4
hooks:
- id: isort
args: [--profile, black]
- repo: https://github.com/psf/black
rev: 19.3b0
rev: 20.8b1
hooks:
- id: black
33 changes: 0 additions & 33 deletions .travis.yml

This file was deleted.

37 changes: 37 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,37 @@
# Contributing

Contributions are welcome, whether by reporting bugs, requesting features, or creating a pull request yourself.

Use this recipe to get ready to work on PyLaia:

```console
# clone PyLaia
git clone https://github.com/jpuigcerver/PyLaia
cd PyLaia

# use a clean Python environment.
# you can skip this if you prefer conda
virtualenv laia-env
source laia-env/bin/activate

# install all dependencies in editable mode,
# including those for development and testing
pip install --editable ".[dev,test]"

# set-up pre-commit hooks
pre-commit install
```

You can run the test suite (including a coverage report) with:

```console
pytest --cov=laia tests
```

Do not worry about code formatting: `pre-commit` will do the work for you when you try to commit. You can also run it manually with:

```console
pre-commit run --all-files
```

Commits and pull requests are tested using GitHub Actions CI, so you don't have to worry about testing each Python and PyTorch version combination.
6 changes: 6 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,6 @@
exclude .pre-commit-config.yaml
exclude CONTRIBUTING.md

prune .github
prune benchmarks
prune tests
69 changes: 40 additions & 29 deletions README.md
@@ -1,17 +1,24 @@
<div align="center">

# PyLaia

[![Build Status](https://travis-ci.com/jpuigcerver/PyLaia.svg?token=HF64eTvPxEUcjjUPXpgm&branch=master)](https://travis-ci.com/jpuigcerver/PyLaia)
[![Python Version](https://img.shields.io/badge/python-3.5%2C%203.6%2C%203.7-blue.svg)](https://www.python.org/)
[![Code Style](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/ambv/black)
**PyLaia is a device agnostic, PyTorch based, deep learning toolkit for handwritten document analysis.**

**It is also a successor to [Laia](https://github.com/jpuigcerver/Laia).**

[![Build](https://img.shields.io/github/workflow/status/jpuigcerver/PyLaia/Laia%20CI?&label=Build&logo=GitHub&labelColor=1b1f23)](https://github.com/jpuigcerver/PyLaia/actions?query=workflow%3A%22Laia+CI%22)
[![Coverage](https://img.shields.io/codecov/c/github/jpuigcerver/PyLaia?&label=Coverage&logo=Codecov&logoColor=ffffff&labelColor=f01f7a)](https://codecov.io/gh/jpuigcerver/PyLaia)
[![Code quality](https://img.shields.io/codefactor/grade/github/jpuigcerver/PyLaia?&label=CodeFactor&logo=CodeFactor&labelColor=2782f7)](https://www.codefactor.io/repository/github/jpuigcerver/PyLaia)

PyLaia is a device agnostic, PyTorch based, deep learning toolkit specialized
for handwritten document analysis. It is also a successor to
[Laia](https://github.com/jpuigcerver/Laia).
[![Python: 3.6+](https://img.shields.io/badge/Python-3.6%2B-FFD43B.svg?&logo=Python&logoColor=white&labelColor=306998)](https://www.python.org/)
[![PyTorch: 1.4.0+](https://img.shields.io/badge/PyTorch-1.4.0%2B-8628d5.svg?&logo=PyTorch&logoColor=white&labelColor=%23ee4c2c)](https://pytorch.org/)
[![pre-commit: enabled](https://img.shields.io/badge/pre--commit-enabled-76877c?&logo=pre-commit&labelColor=1f2d23)](https://github.com/pre-commit/pre-commit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?)](https://github.com/ambv/black)

> **Disclaimer**: The easiest way to learn to use PyLaia is to follow the
> [IAM example for HTR](egs/iam-htr). Apologies for not having a better
> documentation at this moment, I will keep improving it and adding other
> examples.
</div>

Get started by having a look at our [Wiki](https://github.com/jpuigcerver/PyLaia/wiki)!
###### Several (mostly undocumented) examples of its use are provided at [PyLaia-examples](https://github.com/carmocca/PyLaia-examples).

## Installation

@@ -20,26 +27,30 @@ In order to install PyLaia, follow this recipe:
```bash
git clone https://github.com/jpuigcerver/PyLaia
cd PyLaia
pip install -r requirements.txt
python setup.py install
pip install -e .
```

The following Python scripts will be installed in your system:

- **pylaia-htr-create-model**: Create a VGG-like model with BLSTMs on top for
handwriting text recognition. The script has different options to customize
the model. The architecture is based on the paper ["Are Multidimensional
Recurrent Layers Really Necessary for Handwritten Text Recognition?"](https://ieeexplore.ieee.org/document/8269951)
(2017) by J. Puigcerver.
- **pylaia-htr-decode-ctc**: Decode text line images using a trained model and
the CTC algorithm.
- **pylaia-htr-train-ctc**: Train a model using the CTC algorithm and a set of
text-line images and their transcripts.
- **pylaia-htr-netout**: Dump the output of the model for a set of text-line images
in order to decode using an external language model.

Some examples need additional tools and packages, which are not installed
with `pip install -r requirements.txt`.
For instance, typically ImageMagick is used to process images, or Kaldi
is employed to perform Viterbi decoding (and lattice generation) combining
the output of the neural network with a n-gram language model.
- [`pylaia-htr-create-model`](laia/scripts/htr/create_model.py): Create a VGG-like model with BLSTMs on top for handwriting text recognition. The script has different options to customize the model. The architecture is based on the paper ["Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?"](https://ieeexplore.ieee.org/document/8269951) (2017) by J. Puigcerver.
- [`pylaia-htr-train-ctc`](laia/scripts/htr/train_ctc.py): Train a model using the CTC algorithm and a set of text-line images and their transcripts.
- [`pylaia-htr-decode-ctc`](laia/scripts/htr/decode_ctc.py): Decode text line images using a trained model and the CTC algorithm. It can also output the char/word segmentation boundaries of the symbols recognized.
- [`pylaia-htr-netout`](laia/scripts/htr/netout.py): Dump the output of the model for a set of text-line images in order to decode using an external language model.

## Acknowledgments

Work on this toolkit was financially supported by the [Pattern Recognition and Human Language Technology (PRHLT) Research Center](https://www.prhlt.upv.es/wp/).

## BibTeX

```
@misc{puigcerver2018pylaia,
author = {Joan Puigcerver and Carlos Mocholí},
title = {PyLaia},
year = {2018},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jpuigcerver/PyLaia}},
commit = {commit SHA}
}
```
19 changes: 19 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,19 @@
# Benchmarks

This is a minimal collection of training benchmarks used to evaluate PyLaia's performance.

You can think of these as an extended suite of training tests that require a GPU and therefore cannot be run in CI.

### Data

All the tests use a synthetic dataset we call "MNIST-lines", where MNIST digits are randomly selected to form text-line images, with spaces randomly added.

For larger experiments using real datasets, please have a look at the PyLaia examples [repository](https://github.com/carmocca/PyLaia-examples).
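
The "MNIST-lines" construction described above could be sketched roughly as follows. This is only an illustration of the idea (pick digits at random, concatenate them horizontally, and occasionally insert a blank gap); it is not the generator used by these benchmarks. The function name `make_line`, the `<space>` symbol, and the `space_prob` and `space_width` parameters are all assumptions, and images are plain nested lists so the sketch has no dependencies.

```python
import random


def make_line(digit_images, labels, length, space_prob=0.2, space_width=14, seed=None):
    """Concatenate randomly chosen digit images into one text-line image.

    digit_images: list of 2D lists (rows of pixel values), all the same height.
    labels: the digit label of each image, in the same order.
    Returns (image, transcript), where image is a list of pixel rows and
    transcript is the list of symbols appearing in the line.
    """
    rng = random.Random(seed)
    height = len(digit_images[0])
    rows = [[] for _ in range(height)]
    transcript = []
    for position in range(length):
        # Randomly insert a blank gap between two digits (never at the start).
        if position > 0 and rng.random() < space_prob:
            for row in rows:
                row.extend([0] * space_width)
            transcript.append("<space>")
        # Pick a digit at random and append its pixels to every row.
        index = rng.randrange(len(digit_images))
        for row, glyph_row in zip(rows, digit_images[index]):
            row.extend(glyph_row)
        transcript.append(str(labels[index]))
    return rows, transcript


if __name__ == "__main__":
    # Tiny stand-in "digits" (2x3 pixels) instead of real 28x28 MNIST images.
    glyphs = [[[value] * 3 for _ in range(2)] for value in (1, 2, 3)]
    image, transcript = make_line(glyphs, [1, 2, 3], length=5, seed=0)
    print(transcript, len(image), len(image[0]))
```

Returning the transcript alongside the image mirrors what a CTC training setup needs: a variable-width line image paired with its symbol sequence.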

### Running

The following are available. Note that all of them require that CUDA is available:

- `basic.py`: Run Laia's CRNN model for a fixed number of epochs.
- `distributed.py`: The same run, on 2 GPUs.
- `half.py`: The same run, using AMP's 16-bit precision.