Training process #28

Open
ivankrylatskoe opened this issue Jun 8, 2022 · 3 comments

ivankrylatskoe commented Jun 8, 2022

Hello, Sushant!

For the past few days I have been trying to reproduce the results of this repository.
I followed the guide in README.md, but the outcome was different.

Steps:

  1. Clone the repo into a new directory.
  2. Download the IAM database from the official site.
  3. Copy the lines.txt file and the lines directory to the data directory (13 353 records).
  4. In the file DataLoader.py, change the following line:
    gtText_list = lineSplit[9].split('|')
    to this:
    gtText_list = lineSplit[8].split('|')
    This is required because the element at index 8 (not 9) contains the ground truth labels. For example:
    a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from
  5. Run the following command from the src_tensorflow2 directory:
    python main.py --train
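
To illustrate the index change in step 4, here is a minimal sketch that splits the sample record above on spaces and reads the transcription from index 8. The function name `parse_iam_line` is hypothetical and is not part of DataLoader.py; rejoining the fields from index 8 onward is a defensive assumption in case a transcription ever contains a space.

```python
def parse_iam_line(line):
    """Split one lines.txt record into (line_id, status, words).

    Hypothetical helper for illustration; not the repository's code.
    """
    fields = line.strip().split(' ')
    if len(fields) < 9:
        raise ValueError("expected at least 9 space-separated fields")
    # Indices 0..7 hold the id, status and numeric metadata;
    # the ground truth text starts at index 8.
    gt_text = ' '.join(fields[8:])
    return fields[0], fields[1], gt_text.split('|')


sample = "a01-000u-00 ok 154 19 408 746 1661 89 A|MOVE|to|stop|Mr.|Gaitskell|from"
line_id, status, words = parse_iam_line(sample)
print(line_id, status, ' '.join(words))
```

With index 9 instead of 8, the split would fail on this record with an IndexError, because the list has only 9 elements.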

Environment:

Python: 3.7.9
TensorFlow: 2.7.0

Expected behaviour:

CER is expected to decrease gradually to approximately the value given in README.md: 8.32%.
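
For reference, a minimal sketch of one common way to compute CER: the total Levenshtein edit distance between recognized and ground truth strings, divided by the total number of ground truth characters. This is an assumption about the metric, and the repository's exact computation may differ. The names `edit_distance` and `character_error_rate` and the sample pairs are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs):
    """CER in percent over (recognized, ground_truth) string pairs."""
    total_edits = sum(edit_distance(rec, gt) for rec, gt in pairs)
    total_chars = sum(len(gt) for _, gt in pairs)
    if total_chars == 0:
        raise ValueError("ground truth contains no characters")
    return 100.0 * total_edits / total_chars


# Made-up recognizer outputs, for illustration only.
pairs = [("A MOVE to stop", "A MOVE to stop"),
         ("Mr. Gaitskel", "Mr. Gaitskell")]
cer = character_error_rate(pairs)
print(f"Character error rate: {cer:.6f}%")
```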

Actual behaviour:

First try:
CER after epoch 1: 28.1%
CER after epoch 2: 21.0%
But from the 3rd through at least the 12th epoch, CER stays between 45% and 52% and shows no sign of going down.

Second try:
After 8th epoch:
Train loss: 62.25793147463152
Val loss: 64.84262824781013
Character error rate: 45.535652%

After 21st epoch:
Train loss: 56.68565004330704
Val loss: 66.37841461644028
Character error rate: 44.809107%

Could you describe the correct way to train the model?

Update 2022-06-09
It seems that the problem reproduces only with the code in the src_tensorflow2 directory.
The code in the src_tensorflow1 directory (using TF 1.15.5) gives a CER of 19% after the third epoch, with the loss still going down.

Update 2022-06-10
The code in the src_tensorflow1 directory (using TF 1.15.5) does not give stable results either.
I ran the training from scratch 3 more times, and each time CER stopped decreasing after some epoch.

mogam1l commented Jun 21, 2022

I have the same issue currently. It seems to me like a batch size / learning rate problem. Both should probably be decreased.

@monika153

I have also used this code and ran into an issue while training. Could you describe in detail the changes that need to be made? Do I also have to specify the location of the folders? If yes, then where?

@Lilyane1

> I have the same issue currently. It seems to me like a batch size/ learning rate problem. They should probably be decreased

Did this work?
