Skip to content

Latest commit

 

History

History
104 lines (60 loc) · 4.41 KB

README.md

File metadata and controls

104 lines (60 loc) · 4.41 KB

replication_project_598

The goal of this repo is to complete a professional, publication-worthy replication of a notable scientific paper. This is part of UW's Data Science Master's Program, DATA 598 - Reproducibility for Data Science

Title: Political Fake News Characterization Replication Using Twitter

Contributors:

Maggie Weatherly

Juan Solorio

Anmol Srivastava

Andy De La Fuente

Matt Rhodes

Contributing:

We welcome contributions from everyone. Before you get started, please see our contributor guidelines. Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

Code of Conduct

Contributing

Content:

The goal of this repository is to reproduce the results found in the paper "Characterizing Political Fake News in Twitter by its Meta-Data". The main claim of the paper is that their are significant difference among multiple aspects of the tweets that contained Fake News vs Authentic News. Some of the main differences found were the distribution of followers, the number of URLs on tweets, and the verification of the users. This claim is important to replicate because fake news is everywhere and it is not going away anytime soon. We need to understand if the tweets made are becoming more advanced (harder to dicern) or if there is still a good enough amount of dissimilarity to identify a post that is fake news.

Citation: J. Amador, A. Oehmichen, M. Molina-Solana (2017). “Characterizing Political Fake News in Twitter by its Meta-Data”. arXiv:1712.05999v1.

Data:

The data was obtained from the original GitHub repository for the project :https://github.com/alumag/ADAFake/tree/master/source . The data contains the features: is_fake_news fake_news_category tweet_id created_at retweet_count text user_screen_name user_verified user_friends_count user_followers_count user_favourites_count tweet_source geo_coordinates num_hashtags num_mentions num_urls num_media

According to the original research team, "data was collected using search terms related to the presidential election held in the United States on November 8th 2016. Particularly, we queried Twitter’s streaming API, more precisely the filter endpoint of the streaming API, using the following hashtags and user handles: #MyVote2016, #ElectionDay, #electionnight, @realDonaldTrump and @HillaryClinton. The data collection ran for just one day (Nov 8th 2016)". An expert then cureated the tweets to classify them as Fake or not Fake in accordance to categorization provided by Rubin VL, Chen Y, Conroy NJ. Deception detection for news: three types of fakes. In Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community. 2015, 83:1–83:4.

Dependencies:

  • Session info ----------------------------------------------------------------------------

setting value

version R version 3.6.2 (2019-12-12)

os Windows 10 x64

system x86_64, mingw32

ui RStudio

language (EN)

collate English_United States.1252

ctype English_United States.1252

tz America/Los_Angeles

date 2020-01-26

  • Packages --------------------------------------------------------------------------------

tensorflow 2.0.0 2019-10-02 [1] CRAN (R 3.6.2)

mlr * 2.17.0 2020-01-10 [1] CRAN (R 3.6.2)

NLP * 0.2-0 2018-10-18 [1] CRAN (R 3.6.0)

plyr * 1.8.4 2016-06-08 [1] CRAN (R 3.6.1)

keras * 2.2.5.0 2019-10-08 [1] CRAN (R 3.6.2)

magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.2)

Group Information:

Campus Wire Group URL: https://campuswire.com/c/G54B6BAAE/rooms/CD4A0F78D

Github Project Repo URL: https://github.com/mweath/replication_project_598.git