What happened to unknown words? #21

Open
zhangriqi opened this issue May 31, 2019 · 2 comments

Comments

@zhangriqi

Hello, I noticed there's an "<unk>" entry in the dictionary and that we only keep the most frequent words. But I don't really understand what happens to new words that appear only in the test data and not in the training set. Are they all mapped to "<unk>"? Please tell me what I'm missing. I appreciate it.

@ruidan
Owner

ruidan commented Jun 3, 2019

Words that are not in the vocab are mapped to a special token, "<unk>".
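
For illustration, the mapping described above can be sketched roughly as follows. This is a minimal, hypothetical example and not the repository's actual code: the names `build_vocab` and `encode`, the choice of index 0 for "<unk>", and the toy sentences are all assumptions made for the sketch.

```python
from collections import Counter

UNK = "<unk>"  # special token for out-of-vocabulary words


def build_vocab(train_sentences, max_size):
    """Map the max_size most frequent training words to integer ids.

    Index 0 is reserved for UNK (an assumption made for this sketch).
    """
    counts = Counter(word for sent in train_sentences for word in sent)
    vocab = {UNK: 0}
    for word, _ in counts.most_common(max_size):
        vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Replace each word by its id; unseen words fall back to the UNK id."""
    return [vocab.get(word, vocab[UNK]) for word in sentence]


# Toy data: "pizza" and "cold" never occur in training.
train = [["the", "food", "is", "good"], ["the", "service", "is", "slow"]]
vocab = build_vocab(train, max_size=3)
test_ids = encode(["the", "pizza", "is", "cold"], vocab)
print(test_ids)
```

In this sketch, a word that appears only in the test data, and a training word that was cut off by the frequency limit, both end up with the same "<unk>" id.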

@Gwynny

Gwynny commented Dec 12, 2020

How is it mapped to embeddings? Is there a place for these tokens in the embedding matrix? The same question applies to padding.
