Gender Bias in Wikipedia Biographies

Objective

Wikipedia is widely used as a training corpus for many natural language processing (NLP) models. Meanwhile, according to a survey conducted by the Wikimedia Foundation, fewer than 18% of Wikipedia biographies are about women. To understand how women are represented in English Wikipedia biographies, this project aims to answer the following questions through text analysis in R:

To what extent do biographies about female subjects differ from those about male subjects?
What terms are more strongly associated with each gender?

Key Findings

By training classification models to predict subject gender from biography text, we find that our models outperform a no-skill, random baseline. This suggests that Wikipedia editors tend to use different words and/or expressions when describing female and male subjects.
By looking at the most predictive terms for each gender, we find that Wikipedia editors are more likely to use terms relating to gender (e.g. "female", "woman") and family (e.g. "husband", "mother") in biographies about women, suggesting that male is often treated as the norm.
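
To make the idea of gender-associated terms concrete, here is a minimal sketch in base R. It is not the project's classification model: it swaps in a simpler score, the log ratio of smoothed term frequencies between the two groups. The toy texts, the `tokenize` and `count_terms` helpers, and the add-one smoothing are all assumptions made for illustration.

```r
# Toy biographies; the texts and labels are invented for illustration only.
bios <- data.frame(
  text = c(
    "She was the first woman to win the award and a mother of two",
    "Her husband was also a scientist",
    "He won the award and played for the national team",
    "He was elected to parliament"
  ),
  gender = c("female", "female", "male", "male"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: lowercase and split on anything that is not a letter.
tokenize <- function(x) {
  tokens <- unlist(strsplit(tolower(x), "[^a-z]+"))
  tokens[nzchar(tokens)]
}

# Shared vocabulary, so both count vectors line up term by term.
vocab <- sort(unique(tokenize(bios$text)))

# Hypothetical helper: count every vocabulary term in a set of texts.
count_terms <- function(texts) {
  as.numeric(table(factor(tokenize(texts), levels = vocab)))
}
n_female <- count_terms(bios$text[bios$gender == "female"])
n_male   <- count_terms(bios$text[bios$gender == "male"])

# Add-one smoothing avoids taking the log of zero for unseen terms.
p_female <- (n_female + 1) / sum(n_female + 1)
p_male   <- (n_male + 1) / sum(n_male + 1)

# Positive scores lean towards female biographies, negative towards male.
log_ratio <- setNames(log(p_female / p_male), vocab)

# Inspect the terms at either end of the scale.
head(sort(log_ratio, decreasing = TRUE), 5)
head(sort(log_ratio), 5)
```

On a real corpus a score like this is only a rough first look. Coefficients from a fitted classifier, as used in this project, additionally account for terms that co-occur.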

Data

The dataset includes 19.4K English Wikipedia biographies from the four categories (artist, athlete, scientist, and politician) that have the most biographical articles on Wikipedia. The data was created by:

Randomly sampling 5K entry names from each category using Wikipedia biography metadata
Retrieving article text for the sampled entries using the wikipedia package
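
The sampling step above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code in this repository: the `metadata` table, its `name` and `category` columns, and the `sample_entries` function are hypothetical, and the toy example draws 5 names per category instead of 5K. The retrieval step is left out because it depends on network access.

```r
set.seed(42)  # fixed seed so the sample is reproducible

# Hypothetical metadata table: one row per biography entry.
metadata <- data.frame(
  name = paste0("entry_", 1:40),
  category = rep(c("artist", "athlete", "scientist", "politician"), each = 10),
  stringsAsFactors = FALSE
)

# Draw up to n_per_category entry names from each category, without replacement.
sample_entries <- function(meta, n_per_category) {
  by_category <- split(meta$name, meta$category)
  sampled <- lapply(by_category, function(names_in_category) {
    n <- min(n_per_category, length(names_in_category))
    # sample.int indexes positions, which stays safe for length-one groups.
    names_in_category[sample.int(length(names_in_category), n)]
  })
  data.frame(
    name = unlist(sampled, use.names = FALSE),
    category = rep(names(sampled), lengths(sampled)),
    stringsAsFactors = FALSE
  )
}

sampled <- sample_entries(metadata, n_per_category = 5)  # 5K in the project
table(sampled$category)
```

Sampling the same number of names from every category keeps the four groups balanced, so differences found later are less likely to reflect one category dominating the corpus.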


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gender Bias in Wikipedia Biographies

Objective

Key Findings

Data

About

Releases

Packages

Languages

yipenglai/Wikipedia-Gender-Bias

Folders and files

Latest commit

History

Repository files navigation

Gender Bias in Wikipedia Biographies

Objective

Key Findings

Data

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages