German Morphological Processing for Word Embeddings & Named Entity Recognition

FID-Biodiversity/GermanWordEmbeddings-NER

This short script performs grammar-dependent morphological processing of raw text data. The data can be either a large text corpus used for computing word embeddings or a smaller labeled dataset used for training a neural network on a given downstream task (e.g. named entity recognition). Running this script before any training improves the quality of the original resources, ultimately leading to better final performance.
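
As a rough illustration of the idea above, the following minimal sketch applies one token-level normalization step to both kinds of input, so that the embedding vocabulary and the labeled data end up with matching word forms. It is not the script from this repository: the lemma table `TOY_LEMMAS` and the functions `normalize_token`, `process_corpus_line`, and `process_labeled_sentence` are hypothetical names introduced here, and simple lookup-based lemmatization is assumed as a stand-in for the actual grammar-dependent rules.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lemma table (assumption for illustration only);
# the rules used by the real script are not described in this README.
TOY_LEMMAS: Dict[str, str] = {
    "Häuser": "Haus",
    "Städten": "Stadt",
}


def normalize_token(token: str) -> str:
    """Return the base form of a token if it is in the table, else the token itself."""
    return TOY_LEMMAS.get(token, token)


def process_corpus_line(line: str) -> str:
    """Normalize every whitespace-separated token of an unlabeled corpus line."""
    return " ".join(normalize_token(tok) for tok in line.split())


def process_labeled_sentence(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Normalize the tokens of (token, label) pairs and leave the labels untouched."""
    return [(normalize_token(tok), label) for tok, label in rows]


if __name__ == "__main__":
    # Unlabeled text, as used for computing word embeddings.
    print(process_corpus_line("Die Häuser in den Städten"))
    # Labeled text, as used for training an NER model.
    print(process_labeled_sentence([("Städten", "O"), ("Berlin", "B-LOC")]))
```

Applying the same function to both resources is the point of the sketch: a token normalized in the labeled dataset can then be looked up under the same form in the embeddings.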

The pre-trained word embeddings produced with this morphological processing are provided (under the CC-BY-4.0 license) at the following link.

NOTE: The outputs of this script, i.e. (1) word embeddings and (2) labeled datasets, can be used to train the NER Tagger in order to reproduce and evaluate the performance boost. Further details can be found in the reference below. Please cite the reference if you use it in your work.

Requirements

Data

Unlabeled text corpora

Labeled datasets for German named entity recognition

Cite

Sajawel Ahmed and Alexander Mehler, "Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora" in Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018. [PDF]

BibTeX

@InProceedings{Ahmed:Mehler:2018,
author		= {Sajawel Ahmed and Alexander Mehler},
title		= {{Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora}},
booktitle	= {Proceedings of the 17th IEEE International Conference on Machine Learning and Applications (ICMLA)},
location	= {Orlando, Florida, USA},
pdf		= {https://arxiv.org/pdf/1807.10675.pdf},
year		= 2018
}
