The Hidden Universe of Data-Analysis
Current Supplementary Materials
Executive Report - describing the full study
Nate Breznau
Eike Mark Rinke
Alexander Wuttke
Hung H.V. Nguyen
Participant co-researchers:
Muna Adem, Jule Adriaans, Amalia Alvarez-Benjumea, Henrik Andersen, Daniel Auer, Flavio Azevedo, Oke Bahnsen, Dave Balzer, Paul C. Bauer, Gerrit Bauer, Markus Baumann, Sharon Baute, Verena Benoit, Julian Bernauer, Carl Berning, Anna Berthold, Felix S. Bethke, Thomas Biegert, Katharina Blinzler, Johannes N. Blumenberg, Licia Bobzien, Andrea Bohman, Thijs Bol, Amie Bostic, Zuzanna Brzozowska, Katharina Burgdorf, Kaspar Burger, Kathrin Busch, Juan Carlos Castillo, Nathan Chan, Pablo Christmann, Roxanne Connelly, Christian Czymara, Elena Damian, Alejandro Ecker, Achim Edelmann, Maureen A. Eger, Simon Ellerbrock, Anna Forke, Andrea Forster, Chris Gaasendam, Konstantin Gavras, Vernon Gayle, Theresa Gessler, Timo Gnambs, Amélie Godefroidt, Alexander Greinert, Max Grömping, Martin Groß, Stefan Gruber, Tobias Gummer, Andreas Hadjar, Jan Paul Heisig, Sebastian Hellmeier, Stefanie Heyne, Magdalena Hirsch, Mikael Hjerm, Oshrat Hochman, Jan H. Höffler, Andreas Hövermann, Sophia Hunger, Christian Hunkler, Nora Huth, Zsofia Ignacz, Laura Jacobs, Jannes Jacobsen, Bastian Jaeger, Sebastian Jungkunz, Nils Jungmann, Mathias Kauff, Manuel Kleinert, Julia Klinger, Jan-Philipp Kolb, Marta Kołczyńska, John Kuk, Katharina Kunißen, Dafina Kurti, Philipp Lersch, Lea-Maria Löbel, Philipp Lutscher, Matthias Mader, Joan Madia, Natalia Malancu, Luis Maldonado, Helge Marahrens, Nicole Martin, Paul Martinez, Jochen Mayerl, Oscar J. Mayorga, Patricia McManus, Kyle McWagner, Cecil Meeusen, Daniel Meierrieks, Jonathan Mellon, Friedolin Merhout, Samuel Merk, Daniel Meyer, Jonathan Mijs, Cristobal Moya, Marcel Neunhoeffer, Daniel Nüst, Olav Nygård, Fabian Ochsenfeld, Gunnar Otte, Anna Pechenkina, Christopher Prosser, Louis Raes, Kevin Ralston, Miguel Ramos, Frank Reichert, Leticia Rettore Micheli, Arne Roets, Jonathan Rogers, Guido Ropers, Robin Samuel, Gregor Sand, Constanza Sanhueza Petrarca, Ariela Schachter, Merlin Schaeffer, David Schieferdecker, Elmar Schlueter, Katja Schmidt, Regine Schmidt, Alexander Schmidt-Catran, Claudia Schmiedeberg, Jürgen Schneider, Martijn Schoonvelde, Julia Schulte-Cloos, Sandy Schumann, Reinhard Schunck, Jürgen Schupp, Julian Seuring, Henning Silber, Willem Sleegers, Nico Sonntag, Alexander Staudt, Nadia Steiber, Nils Steiner, Sebastian Sternberg, Dieter Stiers, Dragana Stojmenovska, Nora Storz, Erich Striessnig, Anne-Kathrin Stroppe, Janna Teltemann, Andrey Tibajev, Brian Tung, Giacomo Vagni, Jasper Van Assche, Meta van der Linden, Jolanda van der Noll, Arno Van Hootegem, Stefan Vogtenhuber, Bogdan Voicu, Fieke Wagemans, Nadja Wehl, Hannah Werner, Brenton Wiernik, Fabian Winter, Christof Wolf, Nan Zhang, Conrad Ziller, Björn Zakula, Stefan Zins and Tomasz Żółtak

This is the repository for the preparation and analysis of data obtained from the Crowdsourced Replication Initiative (Breznau, Rinke and Wuttke et al. 2018), used as the basis for the paper *Observing Many Researchers Using the Same Data and Hypothesis Reveals a Hidden Universe of Uncertainty*.
Recent studies in which many researchers independently tested the same hypothesis using the same data have reported tremendous variation in results across scientific disciplines. This variability must derive from differences in each research process, so observing those differences should reduce the implied uncertainty. We tested this assumption in a controlled study involving 73 researchers and research teams. Taking all research steps as predictors explains at most 2.6% of the total effect-size variance and 10% of the deviance in subjective conclusions. Expertise, prior beliefs and attitudes of the researchers explain even less. Ultimately, each model was unique, and as a whole this study provides evidence of a vast universe of research-design variability that is normally hidden from view in the presentation, consumption, and perhaps even creation of scientific results.
The workflow is provided in a literate-programming format, R Markdown notebooks (`.Rmd`), and is split across a number of files as described below.
Next to the `.Rmd` files there are also `.html` files of the same name. The latter contain HTML renderings of the notebooks with the created figures and tables, so that non-R users can view the workflow results more easily in any regular browser. For example, the file `01_CRI_Descriptives.Rmd` has a corresponding `01_CRI_Descriptives.html` file in the same folder for easy viewing without the need to run any R code.
Paths in the notebooks are handled with the `here` package, and all paths are relative to the project's root directory (where this README.md file is located).
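For example, a minimal sketch of how a notebook can load the main analysis file with `here` (the file `data/cri.csv` is described in the data section below):

```r
# Load the 'here' package; paths then resolve relative to the project root,
# not the working directory of the individual notebook
library(here)

# Read the main analysis file from the 'data' folder
cri <- read.csv(here("data", "cri.csv"))
```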
You can open an interactive environment to explore and execute the analysis yourself, based on Binder (Project Jupyter, 2018):
The runtime environment created for the Binder uses an MRAN snapshot of 2020-03-29 (see the file `.binder/runtime.txt`) and installs all required R packages listed in the file `.binder/install.R`.
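As a rough illustration only (the authoritative package list is in `.binder/install.R`; the packages named here are assumptions based on this README), an MRAN-pinned install script might look like:

```r
# Hypothetical sketch of .binder/install.R. With the MRAN snapshot pinned
# in .binder/runtime.txt, install.packages() resolves package versions
# as they were on 2020-03-29.
install.packages(c(
  "here",      # project-root-relative paths (used by the notebooks)
  "rmarkdown", # rendering the .Rmd notebooks
  "shiny"      # the interactive specification-curve app
))
```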
The workflow includes a Shiny app that allows users to interact with the results using specification curves.
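For instance, assuming the app lives in a sub-folder such as `shiny/` (a placeholder name, not confirmed by this README), it could be launched locally with:

```r
# Launch the specification-curve app locally; replace "shiny" with the
# actual app directory in this repository
library(shiny)
runApp("shiny")
```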
We collected the code from all 73 teams and cleaned it for public sharing. This involved qualitative identification of model specifications, ensuring replicability, extracting Average Marginal Effects (AMEs) and redacting any identifying features. The resulting code files are compiled by software type in the sub-folders of this project, ordered by team ID number (in the folder `team_code`, with sub-folders `team_code_SPSS`, `team_code_Stata`, `team_code_Mplus` and `team_code_R`). The code in the `team_code_R` folder imports the results from all of the other code to compile a final joined dataset of effect sizes and confidence-interval measures, as sketched below.
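As a hedged illustration of that joining step (the file locations, file format and column layout are assumptions, not the repository's actual code), per-team result files could be stacked like this:

```r
# Hypothetical sketch: gather per-team result files and stack them into
# one effect-size data frame. The real import is done by the scripts in
# team_code/team_code_R.
library(here)

result_files <- list.files(
  here("team_code"),        # search the team code folder...
  pattern    = "\\.csv$",   # ...for CSV result files (assumed format)
  recursive  = TRUE,
  full.names = TRUE
)

# Read each file and bind the rows into a single dataset
all_results <- do.call(rbind, lapply(result_files, read.csv))
```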
Users should be aware that the main data files include team zero: the results and model specifications from the study by Brady and Finnigan (2014) that provided the launching point for the CRI. Team zero is dropped from our main analyses but provides a point of comparison.
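A minimal sketch of dropping team zero before analysis (the column name `team_id` is a hypothetical stand-in for the actual team identifier):

```r
# Keep team zero in the raw data for comparison, but exclude it from the
# main analysis sample; "team_id" is an assumed column name
cri_main <- subset(cri, team_id != 0)
```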
Prior to our main analyses we import data from the Participant Survey, including the subjective voting on model quality and the voting during the post-result deliberation. The code for these steps (scripts 001-003) is contained in the folder `data_prep`. It is not necessary to run these scripts, as their output is already saved in the `data` folder.
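If you do want to re-run a preparation step, each notebook can be rendered directly; a sketch using the standard `rmarkdown` API (the path assumes `data_prep` sits at the project root; adjust if it is nested under `code`):

```r
# Re-render one preparation notebook; its output is written alongside the
# notebook. Running this is optional, since the prepared data is already
# saved in the 'data' folder.
rmarkdown::render(here::here("data_prep", "002_CRI_Data_Prep.Rmd"))
```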
Our primary analyses and results are in the `code` folder. Many of the results in this folder depend on data preparation done in the `data_prep` folder.
All of the following are located in the `code` folder or its sub-folders.
| Filename | Location | Description | Output |
|---|---|---|---|
| `001_CRI_Prep_Subj_Votes.Rmd` | `data_prep` | Compile peer ranking of models | FigS4 |
| `002_CRI_Data_Prep.Rmd` | `data_prep` | Primary data cleaning and merging; measurement of researcher characteristics | TblS1; TblS3; FigS3; FigS3_fit_stats |
| `003_CRI_Multiverse_Simulation.Rmd` | `data_prep` | Sets up multiverse data | |
| `01_CRI_Descriptives.Rmd` | `code` | Descriptive statistics; codebook of 107 model design steps | FigS5; FigS10 |
| `02_CRI_Common_Specifications.Rmd` | `code` | Identifying (dis)similarities across models | TblS4 |
| `03_CRI_Spec_Analysis.Rmd` | `code` | Plotting specification curves | Fig1; FigS6; FigS7; FigS8; FigS9 |
| `04_CRI_Main_Analyses.Rmd` | `code` | Main regression models explaining outcome variance within and between teams | Fig3; TblS5; TblS6 (see bottom of TblS5); TblS7 |
| `05_CRI_Main_Analyses_Variance_Function.Rmd` | `code` | Variance function regressions to explain variation in variance by team | Fig2; FigS11; FigS12; FigS13; TblS11 |
| `06_CRI_Multiverse.Rmd` | `code` | Function to test all possible combinations of submitted model specifications to explain variance | TblS8; TblS10 |
| `07_CRI_DVspecific_Analyses.Rmd` | `code` | Re-running the main models separately by dependent variable (6 ISSP survey questions) | TblS9 |
The following script runs all notebook files in order, to check that there are no code issues:

```r
source("all.R")
```
The data preparation code is in the sub-folder `data_prep`. After running the data preparation files, all data files ready for the data analysis are in the `data` folder. There are numerous data files because the different participants' code often requires special individual input files to run properly. The data files needed to reproduce all of the data analysis are:
| Filename | Description | Source |
|---|---|---|
| **MAIN FILES** | Used in Main Analyses 01-07 | |
| `cri.csv` | Main data-analysis file, model and team levels. All specifications coded by the PIs, team test results and researcher characteristics in numeric format | Worked up in `code/data_prep` |
| `cri_str.csv` | A string-format-only version of `cri.csv` | Worked up in `code/data_prep` |
| `cri_team.csv` | A version of `cri_str.csv` aggregated to team-level means (N = 89 because 16 teams conducted independent hypothesis tests for the 'stock' and 'flow' immigration measures) | Worked up in `code/data_prep` |
| `popdf_out.Rdata` | The peer-review/deliberation scoring of model specifications as ranked by all participants, excepting non-response | Generated in the sub-folder `CRI/data_prep` |
| **SUB-FILES** | Used in preparation of data or app | |
| `Research Design Votes.xlsx` | Based on participants' pre-registered designs, plus a cursory review of all research designs. Not a fully accurate portrayal of the final research designs because (a) the broad range of specifications is not reported in basic research designs and (b) the participants often deviated from their proposed designs, if only slightly | A copy of the actual template (a Google Sheet) used to create the peer-review voting system in the Participant Survey |
| `cri_shiny.csv` | The model-level data needed to run the Shiny app | Generated in `code/data_prep` |
| `cri_shiny_team.csv` | The team-level data needed to run the Shiny app | Generated in `code/data_prep` |
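For example, a quick sanity check that the prepared files load as expected (file names from the table above; everything else is illustrative):

```r
library(here)

# Model-level and team-level analysis files from the 'data' folder
cri      <- read.csv(here("data", "cri.csv"))
cri_team <- read.csv(here("data", "cri_team.csv"))

# cri_team should have 89 rows: one per team, with 16 teams split across
# the 'stock' and 'flow' immigration measures
nrow(cri_team)
```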
Install `repo2docker` and then run:

```bash
repo2docker --editable .
```
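The `--editable` flag mounts the repository into the container, so edits made from inside the container are reflected in your local copy (see the repo2docker documentation for details).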