Lexical Comparison Between News Mediums by Using Word Embeddings for Media Bias Identification
The PostgreSQL database, which contains a large number of news articles, was provided by the Data & Knowledge Engineering Group of the University of Wuppertal.
Preprocessing_and_training.ipynb contains data exploration, pre-processing and word embedding training. Lexical_comparison.ipynb contains the training of the linear mapping matrix and the lexical comparison.
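The idea behind the linear mapping step can be illustrated with a minimal sketch. It is not the notebook's code: it assumes the mapping is fitted by ordinary least squares on the common vocabulary and that words are then compared with plain cosine similarity. The toy vectors, the names `fit_linear_mapping` and `cosine_similarity_rows`, and the placeholder vocabulary are all hypothetical.

```python
import numpy as np


def fit_linear_mapping(source_vecs, target_vecs):
    """Return W minimizing ||source_vecs @ W - target_vecs|| (least squares)."""
    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W


def cosine_similarity_rows(a, b):
    """Cosine similarity between corresponding rows of a and b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a_unit * b_unit, axis=1)


# Toy stand-ins for two embedding spaces over a shared vocabulary (assumption).
rng = np.random.default_rng(0)
common_vocab = [f"word{i}" for i in range(50)]
source = rng.normal(size=(len(common_vocab), 4))   # e.g. one outlet's vectors
true_map = rng.normal(size=(4, 4))
target = source @ true_map                         # e.g. the other outlet's vectors
target[1] = -target[1]                             # one word used very differently

W = fit_linear_mapping(source, target)
mapped = source @ W                                # source vectors in target space
similarities = cosine_similarity_rows(mapped, target)

# Words with the lowest similarity after mapping are candidates for manual review.
most_distant = common_vocab[int(np.argmin(similarities))]
print(most_distant, float(similarities.min()))
```

In this sketch, a word whose usage differs between the two corpora ends up with a low cosine similarity after mapping, which is what makes it stand out for manual analysis.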
The folder contains files that were used for manual analysis: the distant words, the top 20 words most similar to controversial and bias words, and the distances (pure and adjusted cosine similarities) for the common vocabulary.
The folder contains Word2Vec models trained on HuffPost articles and on Breitbart articles, as well as the HuffPost vectors mapped to Breitbart.