Error when running the aishell example with the stage 7 LM #2254

Closed
programYoung opened this issue Dec 21, 2023 · 8 comments



@programYoung

file data/local/lm/heldout: 10000 sentences, 89496 words, 0 OOVs
0 zeroprobs, logprob= -272791.2 ppl= 551.7352 ppl1= 1117.077
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.557482 56970 mutable-fst.h:119] MutableFst::Read: Unknown FST type "vector" (arc type = "standard"): standard input
arpa2fst --read-symbol-table=data/lang_test/words.txt --keep-symbols=true -
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.572870 56980 kaldi-io.cc:772] Error opening input stream data/lang_test/words.txt
E1221 07:52:41.572912 56980 kaldi-io.cc:842] Input::Stream(), not open.
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.573434 56984 symbol-table.h:230] SymbolTable::ReadText: Can't open file data/lang_test/words.txt
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.573830 56985 fst.cc:64] FstHeader::Read: Bad FST header: standard input
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.574206 56986 fst.cc:64] FstHeader::Read: Bad FST header: standard input
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.666182 56981 fst.cc:64] FstHeader::Read: Bad FST header: standard input
Checking how stochastic G is (the first of these numbers should be small):
fstisstochastic data/lang_test/G.fst
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.671222 56987 fst.cc:64] FstHeader::Read: Bad FST header: data/lang_test/G.fst
E1221 07:52:41.671249 56987 kaldi-fst-io.cc:62] Reading FST: error reading FST header from data/lang_test/G.fst
E1221 07:52:41.671262 56987 kaldi-fst-io.cc:74] FST with arc type is not supported.
E1221 07:52:41.671274 56987 fst.h:246] Fst::Read: Unknown FST type (arc type = standard):
E1221 07:52:41.671283 56987 kaldi-fst-io.cc:87] Could not read fst from data/lang_test/G.fst
tools/fst/make_tlg.sh: line 30: 56987 Segmentation fault (core dumped) fstisstochastic $tgt_lang/G.fst
fstdeterminizestar --use-log=true
fstminimizeencoded
fsttablecompose data/lang_test/L.fst data/lang_test/G.fst
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.773803 56988 kaldi-io.cc:772] Error opening input stream data/lang_test/L.fst
E1221 07:52:41.773844 56988 kaldi-io.cc:842] Input::Stream(), not open.
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.869423 56989 fst.cc:64] FstHeader::Read: Bad FST header: -
E1221 07:52:41.869463 56989 kaldi-fst-io.cc:38] Reading FST: error reading FST header from standard input
E1221 07:52:41.869478 56989 fst.h:827] FstImpl::ReadHeader: FST not of type vector:
E1221 07:52:41.869488 56989 kaldi-fst-io.cc:43] Could not read fst from standard input
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:41.964859 56990 fst.cc:64] FstHeader::Read: Bad FST header: -
E1221 07:52:41.964905 56990 kaldi-fst-io.cc:38] Reading FST: error reading FST header from standard input
E1221 07:52:41.964923 56990 fst.h:827] FstImpl::ReadHeader: FST not of type vector:
E1221 07:52:41.964939 56990 kaldi-fst-io.cc:43] Could not read fst from standard input
WARNING: Logging before InitGoogleLogging() is written to STDERR
E1221 07:52:42.051904 56991 fst.cc:64] FstHeader::Read: Bad FST header: standard input

I built runtime/libtorch with cmake as required. While tracing the errors, I found that the following step in tools/fst/compile_lexicon_token_fst.sh did not run successfully:
tools/fst/ctc_token_fst_compact.py $dir/tokens.txt |
fstcompile --isymbols=$dir/tokens.txt --osymbols=$dir/tokens.txt --keep_isymbols=false --keep_osymbols=false |
fstarcsort --sort_type=olabel > $dir/T.fst || exit 1;
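The log above shows later steps failing because earlier outputs (words.txt, L.fst, G.fst) are missing or unreadable, so it can help to check the intermediate files before reading the downstream errors. The following is a minimal sketch, not part of WeNet: `check_fst_inputs` is a hypothetical helper name, and which files to pass is an assumption based on the paths in the log.

```shell
# Hypothetical helper, not part of WeNet: report which of the given files
# are missing or empty. Prints one line per problem and returns non-zero
# if any file is missing or empty.
check_fst_inputs() {
  local rc=0
  local f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      rc=1
    elif [ ! -s "$f" ]; then
      echo "empty: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example call (paths taken from the log; adjust to your setup):
#   check_fst_inputs data/lang_test/words.txt data/lang_test/L.fst data/lang_test/G.fst
```

A file that exists and is non-empty can still have a bad FST header, so this only narrows down which step failed first.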

@xingchensong
Member

#2001 (comment) Take a look at this and see whether it resolves the problem.

@programYoung
Author

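#2001 (comment) Take a look at this and see whether it resolves the problem.

My understanding is that the OpenFst 1.7.2 bundled with wenet cannot be used directly for the stage 7 graph construction. Is that right? Also, does fixing the error MutableFst::Read: Unknown FST type "vector" (arc type = "standard"): standard input require rolling back to the stable 1.6.5 release?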

@xingchensong
Member

First try the original 1.7.2, run it, and see whether the MutableFst error still occurs.

@programYoung
Author

Thanks, that did help. After switching to the original version, the errors went away.

@maiphong0411

How do I check the OpenFst version?

@programYoung
Author

How do I check the OpenFst version?

You can find it in /wenet/runtime/core/cmake/openfst.cmake
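As a rough sketch of that check: the first helper below lists version-like strings on lines of a file that mention openfst, and the second shows which binaries the FST tools resolve to on PATH. Both function names are hypothetical, the layout of openfst.cmake is an assumption (the grep only looks for `x.y.z` patterns), and the tool list is taken from the commands in the log above.

```shell
# Hypothetical helper: print version-like strings (x.y.z) found on lines
# of the given file that mention "openfst" (case-insensitive).
openfst_versions_in() {
  grep -i 'openfst' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | sort -u
}

# Hypothetical helper: show which binary each FST tool resolves to on PATH,
# to spot a mix of tools built against different OpenFst versions.
report_fst_tools() {
  local tool
  for tool in fstcompile fstarcsort arpa2fst fstisstochastic; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

# Example calls (path relative to the wenet checkout; adjust as needed):
#   openfst_versions_in runtime/core/cmake/openfst.cmake
#   report_fst_tools
```

If the tools resolve to different install locations, they may come from different OpenFst builds.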

@maiphong0411

I am working on a local machine that cannot reach external websites. How can I use OpenFst from http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.7.2.tar.gz ?
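One possible approach, sketched under assumptions: download the tarball on a machine that has network access, copy it over, and unpack it locally. `unpack_openfst` is a hypothetical helper name, and the thread does not confirm how WeNet should be pointed at the resulting build.

```shell
# Hypothetical helper: unpack a locally copied OpenFst tarball into a
# destination directory and print the top-level directory it created.
unpack_openfst() {
  local tarball="$1" dest="$2"
  if [ ! -f "$tarball" ]; then
    echo "tarball not found: $tarball" >&2
    return 1
  fi
  mkdir -p "$dest" || return 1
  tar -xzf "$tarball" -C "$dest" || return 1
  # First path component of the first archive entry, e.g. openfst-1.7.2
  tar -tzf "$tarball" | sed -n '1{s|/.*||;p;}'
}

# Example call (the tarball was copied from a machine with network access):
#   unpack_openfst ./openfst-1.7.2.tar.gz ./third_party
```

The OpenFst source tarball ships a standard `configure` script, so it can then be built with `./configure`, `make`, and `make install`. Making the stage 7 scripts pick up those binaries (for example by putting their directory first on PATH) is an assumption to verify against your setup.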

@maiphong0411

I am using 1.6.5 and it works for me. Thanks a lot.
