CAUTION
If you are a beginner in NLP, I don't recommend using this tag. I recommend using the latest version of the master branch instead. I will not accept requests or complaints about this tag. (^O^).
We created the seed file for this neologism dictionary for a POS tagger on 2020-08-20.
The seed file in this tag (v0.0.7) will never be updated.
Therefore, this tag is useful for the following purposes:
- Experiments to evaluate research results
- Reproducing the experimental results of others
- Producing morphological analysis results that will never change
We created the seed file using the following resources:
- Dump data of Hatena Keyword
- Japanese postal code data (ken_all.lzh)
- The list of station names across all of Japan
- The entry data of person names (last name / first name)
- The entry data of emojis from Unicode 10.0 and Emoji 5.0
- The entry data of Kaomoji strings
- The entry data of adverbs
- The entry data of adjectives
- The entry data of adjective verbs
- The entry data of interjections
- The entry data of orthographic variants of general nouns
- A large number of documents crawled from the Web