
Word2Vec word embedding analysis of Autism Spectrum Disorder (ASD) article abstracts available on PubMed.


gbnegrini/word2vec-asd



As my first experience with Natural Language Processing (NLP), I decided to explore word embeddings in abstracts related to my PhD research topic, Autism Spectrum Disorder (ASD).

Briefly, the Biopython Entrez module was used to fetch the abstracts from NCBI's database. After a series of preprocessing steps (removal of unwanted characters, stop-word removal, and word stemming), the Word2Vec neural network was used to learn vector representations of words ("word embeddings").
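The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the notebooks: the regular expression, the deliberately tiny `STOP_WORDS` set, and the helper names `preprocess` and `build_sentences` are hypothetical. Stemming is left as a pluggable function (for example, NLTK's `PorterStemmer().stem` could be passed in if NLTK is installed). The output is one token list per abstract, which is the "sentences" format that Word2Vec implementations such as gensim's accept.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a
# complete list such as NLTK's English stop words.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "with"}

# In this sketch, anything that is not a lowercase ASCII letter or whitespace
# counts as an "unwanted" character (digits, punctuation, symbols).
UNWANTED = re.compile(r"[^a-z\s]")


def preprocess(
    abstract: str,
    stem: Callable[[str], str] = lambda word: word,  # identity = no stemming
) -> List[str]:
    """Lowercase, replace unwanted characters with spaces, drop stop words, stem."""
    text = UNWANTED.sub(" ", abstract.lower())
    tokens = text.split()
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


def build_sentences(abstracts: Iterable[str], **kwargs) -> List[List[str]]:
    """Turn each abstract into one token list (one "sentence" for Word2Vec)."""
    return [preprocess(a, **kwargs) for a in abstracts]


if __name__ == "__main__":
    docs = ["Children with ASD showed atypical social-communication in 2019."]
    print(build_sentences(docs))
```

Replacing characters with spaces instead of deleting them keeps hyphenated terms such as "social-communication" as two separate tokens rather than fusing them into one.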

Files

- AutismArticlesRaw_v2.csv: dataset containing the abstract, PMID, title, and publication year of each article.
- PreProcessedAbstracts_v2.csv: same as AutismArticlesRaw_v2.csv but with the preprocessed abstracts.
- autismPMIDlist.txt: PMID list used to fetch the abstracts from the NCBI's database.
- Part_1_ASD_papers_word2vec.ipynb: first analysis with 21148 abstracts.
- Part_2_ASD_papers_word2vec.ipynb: final analysis with improved dataset and 37518 abstracts.
- first_model.png: first model t-SNE plot.
- final_model.png: final model t-SNE plot.
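Fetching the abstracts for the PMIDs in autismPMIDlist.txt with Biopython's Entrez module could look roughly like the sketch below. It rests on assumptions not stated in this README: the file is taken to hold one PMID per line, the batch size of 200 is arbitrary, and the helper names `read_pmids`, `batched`, and `fetch_abstracts` are hypothetical. The network call needs Biopython installed and a contact e-mail address, which NCBI asks E-utilities users to provide.

```python
from typing import Iterable, Iterator, List


def read_pmids(lines: Iterable[str]) -> List[str]:
    """Collect PMIDs, assuming one numeric ID per line; other lines are skipped."""
    return [line.strip() for line in lines if line.strip().isdigit()]


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_abstracts(pmids: List[str], email: str, batch_size: int = 200) -> List[str]:
    """Download plain-text PubMed abstracts in batches (requires Biopython)."""
    from Bio import Entrez  # imported lazily so the helpers above work without it

    Entrez.email = email  # NCBI asks for a contact address with each request
    texts = []
    for chunk in batched(pmids, batch_size):
        handle = Entrez.efetch(db="pubmed", id=",".join(chunk),
                               rettype="abstract", retmode="text")
        try:
            texts.append(handle.read())  # one raw text blob per batch
        finally:
            handle.close()
    return texts
```

A possible usage, assuming the file sits in the working directory: open autismPMIDlist.txt, pass the file object to `read_pmids`, then call `fetch_abstracts(pmids, email="you@example.org")`. Each returned string still has to be split into individual abstracts; requesting `retmode="xml"` and parsing with `Entrez.read` is an alternative that yields structured records.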
