Word2Vec Bahasa Indonesia

Word2Vec for bahasa Indonesia, trained on a Wikipedia dataset

Installation

git clone https://github.com/deryrahman/word2vec-bahasa-indonesia.git
cd word2vec-bahasa-indonesia
pip install -r requirements.txt

Train

python train.py

Some useful arguments

usage: train.py [-h] [--model_path MODEL_PATH]
                [--extracted_path EXTRACTED_PATH] [--dump_path DUMP_PATH]
                [--dim DIM] [--stem STEM]

Word2Vec: Generating word2vec model for bahasa Indonesia

optional arguments:
  -h, --help                        show this help message and exit
  --model_path MODEL_PATH           path for saving trained models
  --extracted_path EXTRACTED_PATH   path for extracting text
  --dump_path DUMP_PATH             path for dump data
  --dim DIM                         embedding size
  --stem STEM                       use stemmer or not. (default false)
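
To make the flags above concrete, here is a minimal sketch of an argparse parser that would accept the same options. It is not taken from train.py: the `str_to_bool` helper, the accepted true/false spellings, the `int` type for `--dim`, and the sample values are all assumptions made for illustration.

```python
import argparse


def str_to_bool(value):
    """Interpret common true/false spellings (hypothetical helper, not from train.py)."""
    text = value.strip().lower()
    if text in {"true", "1", "yes", "y"}:
        return True
    if text in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


def build_parser():
    """Build a parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="train.py",
        description="Word2Vec: Generating word2vec model for bahasa Indonesia",
    )
    parser.add_argument("--model_path", help="path for saving trained models")
    parser.add_argument("--extracted_path", help="path for extracting text")
    parser.add_argument("--dump_path", help="path for dump data")
    # Assumption: the embedding size is parsed as an integer.
    parser.add_argument("--dim", type=int, help="embedding size")
    # Assumption: --stem takes an explicit value such as "true" or "false".
    parser.add_argument("--stem", type=str_to_bool, default=False,
                        help="use stemmer or not. (default false)")
    return parser


if __name__ == "__main__":
    # Sample values only; pick a dimension that suits your use case.
    args = build_parser().parse_args(["--dim", "300", "--stem", "true"])
    print(args.dim, args.stem)
```

Under these assumptions, the equivalent command line would be `python train.py --dim 300 --stem true`. Run `python train.py -h` to confirm what the script actually accepts.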

Use Pre-Trained Model

You can use the trained model in the model folder, or download it directly from my drive and extract it into the model folder.

See example.py for a quick look at how to use the model, and refer to the gensim documentation for details.
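
As a rough illustration of what querying the model can look like with gensim, here is a minimal sketch. It is not a copy of example.py. It assumes the model was saved with gensim's `Word2Vec.save`, and the function names `load_word_vectors` and `similar_words` are hypothetical. The model file path is passed on the command line because the file name inside the model folder is not stated here.

```python
import sys


def load_word_vectors(model_path):
    """Load a saved gensim Word2Vec model and return its word vectors."""
    # Imported lazily so similar_words() can be reused without gensim installed.
    from gensim.models import Word2Vec

    # Assumption: the file was written by Word2Vec.save().
    return Word2Vec.load(str(model_path)).wv


def similar_words(word_vectors, word, topn=5):
    """Return the topn nearest words, or an empty list if the word is unknown."""
    if word not in word_vectors:
        return []
    return [neighbor for neighbor, _score in word_vectors.most_similar(word, topn=topn)]


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python query_model.py MODEL_FILE WORD")
    else:
        vectors = load_word_vectors(sys.argv[1])
        print(similar_words(vectors, sys.argv[2]))
```

The script name `query_model.py` is also just a placeholder. If the model was trained with `--stem`, query words would probably need to be stemmed the same way before lookup.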

References

Medium - diekanugraha

License

Open sourced under the MIT license.
