This repository contains the scripts used to collect the data for the paper "A Survey of Open Source Software Repositories in the US Department of Energy's National Laboratories," published in Computing in Science & Engineering (2024).
This work was exploratory in nature, and at first we did not expect a peer-reviewed publication to come out of this curiosity-driven research. Consequently, the scripts here are "research-ware" and are decidedly not polished. We publish them for the sake of transparency and reproducibility.
We are planning to write a more professional framework (e.g., with documentation and clear APIs) that generalizes this work to other platforms (e.g., GitLab and PyPI) and improves reproducibility going forward. Please contact sam@cs.uoregon.edu with questions.
Due to space limitations, the paper does not report all of the data we scraped. In particular, you may encounter data labeled "BigQuery" in this repository. This refers to the public GitHub dataset hosted on Google BigQuery, which we also mined but whose results we did not include in the paper.