decode ctc
Solène Tarride edited this page Feb 15, 2023 · 2 revisions
Once your PyLaia model is trained, you can use it to run predictions on test images. To improve results, you can also combine it with a statistical n-gram language model.
- Set your configuration config_decode.yaml:
common:
  experiment_dirname: experiment
decode:
  convert_spaces: true
  join_string: ''
img_list: test_img_list.txt
syms: syms.txt
- Predict using PyLaia
pylaia-htr-decode-ctc --config config_decode.yaml | tee predict.txt
- Use ngram-count to generate the ARPA language model
ngram-count -text lm_text.txt -order 6 -lm language_model.arpa.gz -wbdiscount 6
with lm_text.txt in the following format for a character-based LM:
f o r <space> d e t <space> t i l f æ l d e <space> d e t <space> s k u l d e <space> l y k k e s <space> D i g
a t <space> o p d r i v e <space> d e t <space> o m s k r e v n e <space> e x p l : <space> a f
« F r u <space> I n g e r » , <space> a t <space> s e n d e <space> m i g <space> s a m m e
t i l <space> B e r c h t e s g a d e n <space> i n <space> B a y e r n ,
d a <space> d e t <space> s å l e d e s <space> s i k k r e s t <space> o g <space> h u r t i g s t <space> k o m -
m e r <space> m i g <space> i h æ n d e . <space> T ø r <space> j e g <space> b e d e <space> D i g <space> g ø r e
M o r g e n b l a d e t s <space> e x p e d : <space> o p m æ r k s o m <space> p å
m i n <space> n y e <space> a d r e s s e ?
Note that you should also be able to use a KenLM language model, although this has not been tested.
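The character-based format shown above can be produced from ordinary transcriptions with a few lines of Python. This is a minimal sketch, not part of PyLaia: the function names `to_char_lm_line` and `write_char_lm_corpus` and the input file name `transcriptions.txt` are hypothetical, and it assumes one transcription per line in UTF-8.

```python
def to_char_lm_line(text: str, space_token: str = "<space>") -> str:
    """Convert one transcription into space-separated characters.

    Runs of whitespace are collapsed into a single space, and each
    space is replaced by `space_token`.
    """
    normalized = " ".join(text.split())
    tokens = [space_token if ch == " " else ch for ch in normalized]
    return " ".join(tokens)


def write_char_lm_corpus(src_path: str, dst_path: str) -> None:
    """Write a character-level LM corpus, skipping empty lines."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(to_char_lm_line(line) + "\n")


if __name__ == "__main__":
    # Hypothetical input file name; lm_text.txt is the file passed to ngram-count.
    import sys
    if len(sys.argv) == 3:
        write_char_lm_corpus(sys.argv[1], sys.argv[2])
```

For example, `python make_lm_text.py transcriptions.txt lm_text.txt` (script name hypothetical) would turn a line such as `for det` into `f o r <space> d e t`.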
- Set your configuration config_decode_lm.yaml:
common:
  experiment_dirname: experiment
  model_filename: experiment
decode:
  convert_spaces: true
  join_string: ''
  use_language_model: true
  language_model_path: language_model.arpa.gz
  language_model_weight: 1.5
  tokens_path: tokens.txt
  lexicon_path: lexicon.txt
img_list: test_img_list.txt
syms: syms.txt
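The page does not describe how tokens.txt and lexicon.txt are built. The sketch below shows one plausible way to derive them from syms.txt for a character-level model. It rests on assumptions that are not stated on this page: that syms.txt holds one `symbol index` pair per line, that tokens.txt lists one token per line, and that the lexicon maps each token to itself. The helper names are hypothetical; check the expected formats against your PyLaia version before relying on it.

```python
def parse_symbols(lines):
    """Return the symbol (first column) of each non-empty line.

    Assumes each line looks like '<symbol> <index>', e.g. 'a 5'.
    """
    symbols = []
    for line in lines:
        parts = line.split()
        if parts:
            symbols.append(parts[0])
    return symbols


def lexicon_lines(symbols):
    """Map every token to itself (assumed character-level lexicon)."""
    return [f"{sym} {sym}" for sym in symbols]


def write_tokens_and_lexicon(syms_path, tokens_path, lexicon_path):
    """Write tokens and lexicon files derived from the symbols file."""
    with open(syms_path, encoding="utf-8") as f:
        symbols = parse_symbols(f)
    with open(tokens_path, "w", encoding="utf-8") as f:
        f.write("\n".join(symbols) + "\n")
    with open(lexicon_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lexicon_lines(symbols)) + "\n")


if __name__ == "__main__":
    # File names match the configuration above.
    import os
    if os.path.exists("syms.txt"):
        write_tokens_and_lexicon("syms.txt", "tokens.txt", "lexicon.txt")
```

Whether special symbols such as the CTC blank belong in these files is also an assumption here; the sketch simply copies every symbol it finds.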
- Predict using PyLaia (CPU-only)
pylaia-htr-decode-ctc --config config_decode_lm.yaml | tee predict_lm.txt