Intrinsic and Extrinsic Evaluation of the Quality of Biomedical Embeddings in Different Languages
Authors: Paula M. Franceschini, Henrique D. P. dos Santos and Renata Vieira
Abstract: Lately, language models have been applied to several tasks in biomedical natural language processing. Several public language models are available online, each built from different corpora. In this paper, we evaluate public word embedding models trained on both general and biomedical corpora for English and Portuguese. We present intrinsic evaluations based on semantic analogies that use word pairs extracted from the MeSH biomedical thesaurus and from benchmarks available for general-domain evaluation. For extrinsic evaluation, we rely on a classification task over Electronic Health Records. Our experiments show that biomedical embeddings better capture the semantics of biomedical analogies in both languages. In the extrinsic evaluation, on the other hand, embeddings trained on larger general-domain corpora appeared equally or more effective for the classification task.
Keywords: Biomedical Embeddings, MeSH thesaurus, Multilanguage Evaluation
This project belongs to GIAS at PUCRS, Brazil