incorrect number of training data read while training syntactic HMM #7

Open
GoogleCodeExporter opened this issue Mar 15, 2015 · 0 comments


Hello,
I am trying to train the syntactic HMM on my data. My training data contains
10050 parallel sentences with parsed target trees.

wc output of my training data
-------------------------------
   10050   284765  1599230 corpus.en
   10050   804959  4284275 corpus.entrees
   10050   228873  5058993 corpus.ta
   30150  1318597 10942498 total


When I run the alignment, the log file indicates that only 9811 sentences are
read instead of 10050. The relevant part of the log file is shown below.
After training finishes, I also get alignments for only 9811 sentences.

PS: I don't have any test data; my test data directories are empty. I have
also attached my config file.

main() {
  Execution directory: en-ta/alignment_models/berkeley/lc_tok_10000_S
  Preparing Training Data
  Unknown number of training, 0 test
  Training models: 2 stages {
    Training stage 1: MODEL1 and MODEL1 jointly for 5 iterations {
      Initializing forward model [7.9s, cum. 7.9s]
      Initializing reverse model [5.2s, cum. 13s]
      Joint Train: 9811 sentences, jointly {
        Iteration 1/5 {
          Sentence 1/9811
          Sentence 2/9811
          Sentence 3/9811
          Sentence 169/9811
          Sentence 3304/9811
          Sentence 7650/9811
          Log-likelihood 1 = -1337616.882
          Log-likelihood 2 = -1336443.902
          ... 9805 lines omitted ...
        } [20s, cum. 20s]
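
The gap is 10050 - 9811 = 239 sentence pairs. A minimal sketch for narrowing down which pairs might be getting skipped is given here. The log does not say why pairs are dropped, so every check in it (empty lines, empty or unbalanced trees, a length cutoff) is a guess rather than the aligner's actual filtering rule. The function names and the `max_len` value are hypothetical.

```python
from collections import Counter


def classify_pair(english, foreign, tree, max_len=200):
    """Label one sentence pair with a guessed reason it might be skipped.

    All checks are assumptions for diagnosis only; they are not taken
    from the aligner's source or documentation.
    """
    if not english.strip() or not foreign.strip():
        return "empty_sentence"
    if not tree.strip():
        return "empty_tree"
    if tree.count("(") != tree.count(")"):
        return "unbalanced_tree"
    # max_len is a hypothetical cutoff, counted in whitespace-separated tokens.
    if len(english.split()) > max_len or len(foreign.split()) > max_len:
        return "too_long"
    return "ok"


def summarize(english_lines, foreign_lines, tree_lines, max_len=200):
    """Count how many line-aligned pairs fall under each label."""
    counts = Counter()
    for english, foreign, tree in zip(english_lines, foreign_lines, tree_lines):
        counts[classify_pair(english, foreign, tree, max_len)] += 1
    return counts


def summarize_files(english_path, foreign_path, tree_path, max_len=200):
    """Same as summarize(), reading three line-aligned files from disk."""
    with open(english_path, encoding="utf-8") as en, \
         open(foreign_path, encoding="utf-8") as fr, \
         open(tree_path, encoding="utf-8") as tr:
        return summarize(en, fr, tr, max_len)
```

Calling `summarize_files("corpus.en", "corpus.ta", "corpus.entrees")` on the files from the wc listing returns a count per label. If the labels other than `ok` add up to roughly 239, those lines are the first place to look; if they do not, the cause is something this sketch does not check.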

Please let me know if I am missing something.

Original issue reported on code.google.com by loganath...@gmail.com on 2 Aug 2013 at 10:10

Attachments:
