Skip to content

Comparison between closed and active Stack Exchange communities; dynamic of the users' reputation; the influence of the core users on the community sustainability

Notifications You must be signed in to change notification settings

ana-vranic/Stack-Exchange-communities

Repository files navigation

StackExchange-communities

This code is used in paper:

Vranić, A., Tomašević, A., Alorić, A. et al. Sustainability of Stack Exchange Q&A communities: the role of trust. EPJ Data Sci. 12, 4 (2023). https://doi.org/10.1140/epjds/s13688-023-00381-x

Data

Communities are categorized as:

  • closed or "Area 51"
  • active or beta communities

Area 51 community filenames have prefix denoting the date origin of the StackExchange archive file containing the data. For example: 050112astronomy folder is related to Area 51 version of the astronomy community, while astronomy refers to the beta astronomy community.

XML raw data

Beta Stack Exchange communities are available here.

Area 51 Stack Exchange communities can be downloaded from Area51.

data/raw_data/... From raw xml data we select questions, answers, comments, accepted_answers, users and votes for the first 180 days of each community.


Interactions

data/interactions/...

For each community we have several .csv files containing all recorded interactions of a given type. These CSV files are obtained by transforming raw XML data using code provided in src/data_preparation.

  • ...interactions_post_questions.csv Posted questions
  • ...interactions_questions_answers.csv Questions and answers
  • ...interactions_comments.csv All posted comments
  • ...interactions_comments_questions.csv Comments posted directly on a question
  • ...interactions_comments_answers.csv Comments posted on answers
  • ...interactions_acc_answers.csv Accepted answers
  • ...interactions_votes.csv Votes cast on questions, answers and comments

Detailed explanation of the columns of these .csv files are given here.


Reputations

data/reputations/...

Values of dynamic reputation for each user for each of 180 days in given communites are stored as CSV files. eng refers to engagement reputation and pop refers to popularity reputation.

Each row of CSV is unique user in a given community and each column is each day starting from 0 (first day).


Code

  • src/data_preparation holds several scripts needed to transform original XML StackExchange raw data into time-stamped record of interactions of a given type. Data Preparation Pipeline explains the run order and the ouput of the scripts.

  • src/dynamical_reputation.py is the main module for estimating dynamical reputation in StackExchange communities. src/calculate_dynamical_reputation.ipynb shows usage of calculating dynamical reputation.

  • src/calculate_core_periphery.ipynb is a notebook for calculating core-periphery structure (we use Bayesian Core-Periphery Stochastic Block Models, while src/core_periphery_functions.py contains functions for transforming data into appropriate input and saving results into hdf5 format.

  • src/data_processing.ipynb is script for calculating the evolution of dynamical reputation, network and core-periphery properties. The results are stored in data/processed data, so they can be directly used for plotting figures

  • Figures.ipynb is notebook for plotting results. Script src/drawing_functions.py holds different drawing functions.

About

Comparison between closed and active Stack Exchange communities; dynamic of the users' reputation; the influence of the core users on the community sustainability

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages