Cite: Haghish, E. F. (2020). Developing, maintaining, and hosting Stata statistical software on GitHub. The Stata Journal, 20(4), 931-951.
`github` is a Stata module for searching and installing Stata packages from GitHub, including previous releases of a package. It combines features of several Stata commands, such as `search`, `findit`, and `ssc`, but is made for managing Stata packages hosted on GitHub. In addition, the package provides new features for version control that are not available to Stata users elsewhere (for example, the package allows installing older releases of a Stata package to reproduce an analysis carried out with older software).
NEWS: Introducing the `gitget` command. `gitget` further simplifies installing and updating a package. It provides a database of all existing Stata packages on GitHub, which lets you install any package that is hosted on GitHub and known to `gitget` simply by typing:
gitget packagename
The `gitget` command relies on a complete list of Stata packages on GitHub to identify the URL of a project. This list is created programmatically using a search program that detects Stata packages. First, all of the Stata repositories are identified and stored in repolist.dta. Next, the installable packages are extracted and stored in gitget.dta, which is also installed with the package on your machine. The complete list of the archive is also available for preview within this repository.
Table of Contents
- Introduction
- Installation
- Syntax
- Subcommands
- Building package installation files automatically
- List of Stata Packages Recognized by the `gitget` command
In recent years, Stata users have been increasingly using GitHub for sharing Stata code and datasets, developing statistical software, and hosting Stata packages, as shown in the figure below. This is indeed good news for the Stata community and anyone promoting research transparency! The `github` module facilitates searching, installing, managing, and even building Stata packages. Furthermore, it allows installing either the development version or a stable release of a package. It also allows specifying package dependencies that will be installed automatically, making GitHub even more appealing for Stata users and developers. Here, I quickly show you how to install and use the `github` package.
You can install the latest version of the `github` command by executing the following code:
net install github, from("https://haghish.github.io/github/")
The general syntax of the package can be summarized as:
github [subcommand] [...]
where [subcommand] can be:
Subcommand | Description |
---|---|
`search` | finds a Stata package on GitHub |
`findfile` | searches for a file among all Stata packages on GitHub |
`install` | installs a package from GitHub |
`list` | provides information about packages installed with `github` |
`query` | lists all previous releases of a package |
`check` | tests whether a repository is installable |
`uninstall` | removes a package from Stata |
`update` | updates a package from GitHub |
`version` | returns the version of an installed package |
and [...] is either username/repository or packagename, depending on the specified subcommand.
To install a package, all you need is the GitHub username and the name of the repository. The combination of username and repository name, separated by a slash, provides the needed URL to the repository. For example, to install the MarkDoc package, which is hosted on https://github.com/haghish/markdoc, it is enough to type:
github install haghish/markdoc [, stable version("")]
The `github` package includes a database of the complete list of Stata packages hosted on GitHub. Therefore, you can also install a package just by specifying the package name. The `gitget` command, which is a wrapper for `github install`, can install or update Stata packages from GitHub given only the package name:
gitget packagename [, stable version("")]
For example, if you wish to install the `markdoc` package, typing `gitget markdoc` is as good as typing `github install haghish/markdoc`. If you wish to inspect the list of Stata packages hosted on GitHub, see the gitget.dta data set.
Options | Description |
---|---|
`stable` | installs the latest stable release version of the package |
`version(str)` | installs the specified release version |
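The gitget.dta data set mentioned above can be opened from within Stata. The following is a minimal sketch, assuming the file was installed somewhere along your ado-path (the exact location and the variable names are not documented here, so the sketch only describes whatever it finds). It uses Stata's built-in `findfile` command, which is unrelated to the `github findfile` subcommand.

```stata
* Locate gitget.dta along the ado-path (assumption: it was installed there)
findfile gitget.dta

* findfile stores the full path in r(fn); load the data set from that path
use "`r(fn)'", clear

* Show the variables and the number of listed packages
describe
```

If `findfile` reports that the file is not found, the data set was installed elsewhere and its path must be supplied by hand.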
You can search GitHub for Stata packages using one or more keywords. This is similar to Stata's `search` or `findit` commands, but limited to GitHub packages:
github search weaver
Searching the GitHub API effectively is very important. For this, the package includes a search GUI that shows the syntax you can use to narrow down your search or expand it to include other sources. The search command also analyzes the release dates of packages available through the `net` command, which is a very useful feature. To launch the GUI, type:
db github
For example, if you use the `github search` command to search for the `markdoc` package, you get the following output:
`github` has a built-in database that keeps track of the packages installed on your machine and tells you which versions are installed. The version is taken from the unique release tags of the package, specified by the developer. You can `list` the installed packages and get helpful information about them. This command also notifies you if an update is available for any of your GitHub packages. For example, in the output below, we know that there are updates available for two of our packages. We can also click on the `(update)` link to update the package to the latest release.
. github list
Not all packages are installable. Stata repositories must have toc and pkg files in order to be installable. You can check whether a package is installable using the `check` subcommand.
github check haghish/markdoc
This is a rather important point because the `github search` command, which is used for searching Stata packages on GitHub, tends to dismiss Stata repositories that are not installable. In other words, if your repository does not include these files, it will not be considered a Stata package unless you specify the `all` option in your search (in the search GUI, check the "show GitHub repositories that are not installable" option). However, the `github` package also includes a GUI for building these files, so you can easily create them for your repository (see below).
To uninstall a package, use the `uninstall` subcommand, followed by the package name. For example:
github uninstall markdoc
GitHub allows archiving an unlimited number of package versions. The `github` command has an option for specifying the package version, which makes it possible to install previous versions of a package. For example, to install an older version of the MarkDoc package, say 3.8.0, you can type:
github install haghish/markdoc , version("3.8.0")
But where can you see the package versions? GitHub has a release tab that lists all of the previous releases of the software (see, for example, the previous releases of MarkDoc). The good news is that `github` has a subcommand that lists all of the previous releases in the Stata Results window and lets you install any of them (as well as the package dependencies for that particular version, if specified) with a single mouse click or programmatically. To do so, type:
github query username/repository
For example, to list MarkDoc's previous releases, type:
. github query haghish/markdoc
----------------------------------------
Version Release Date Install
----------------------------------------
3.8.5 2016-10-16 Install
3.8.4 2016-10-13 Install
3.8.3 2016-10-03 Install
3.8.2 2016-10-01 Install
3.8.1 2016-09-29 Install
3.8.0 2016-09-24 Install
3.7.9 2016-09-20 Install
3.7.8 2016-09-19 Install
3.7.7 2016-09-18 Install
3.7.6 2016-09-13 Install
3.7.5 2016-09-08 Install
3.7.4 2016-09-07 Install
3.7.3 2016-09-06 Install
3.7.2 2016-09-05 Install
3.7.0 2016-08-23 Install
3.6.9 2016-08-16 Install
3.6.7 2016-02-27 Install
----------------------------------------
When writing an analysis with dynamic documentation software, such as MarkDoc, you should report the versions of the packages that you are using in your analysis. You can obtain the version of an installed package programmatically using the `version` subcommand, followed by the package name:
. github version markdoc
3.8.0
This command has no other use, because the `github list` command already shows the versions of the installed packages and also checks whether newer versions are available.
Some packages rely on other packages. The `github` command allows you to install the package dependencies with or without a specific version. To do so:
- create a file named `dependency.do` and include it in the repository. This file is not meant to be installed in the PLUS directory, so it should not be mentioned in the pkg file when you are building the package (see below)
- include the code for installing the package dependencies in this do-file. If the packages are hosted on GitHub, use the `github` command to install them, optionally specifying the required version

The `github` command looks for `dependency.do` after installing the package and, if it finds the file in the repository, executes it.
For example, the MarkDoc package has a `dependency.do` file that can serve as an example of how the dependency file should be created. Naturally, the `dependency.do` file is only executed by the `github install` command.
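As a rough illustration of the steps above, a `dependency.do` file might look like the sketch below. The dependencies named here (`haghish/weaver` on GitHub and `estout` on SSC) are chosen only as examples and are not taken from MarkDoc's actual file.

```stata
* dependency.do: executed by -github install- after the package is installed
* Illustrative sketch only; the dependencies below are example choices

* A dependency hosted on GitHub, installed at its latest stable release
github install haghish/weaver, stable

* A dependency hosted on SSC, installed with Stata's built-in -ssc- command
ssc install estout, replace
```

To pin a GitHub dependency to a particular release, replace `stable` with the `version()` option and a release tag reported by `github query`.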
The `github` package includes a database of all files that are installed by packages hosted on GitHub. This is an important feature for developers, because it helps them avoid creating a file with a name already used by another developer. Packages with clashing file names cannot be installed together on the same system (the same problem exists for program names). The `github` package allows developers to check whether a particular name has been used in a package before.
The `findfile` subcommand searches the `githubfiles.dta` data set (installed with the package) and, if a match is found, points to the repository that includes the file. For example, searching for files that have the keyword "dy" will result in:
. github findfile dy
---------------------------------
Searching githubfiles database
---------------------------------
dynpandoc.ado
dynpandoc.sthlp
dynamicPage1.mata
---------------------------------
Each file name links to its repository when you click on it in the Results window.
Imagine you have created an ado-file and Stata help files. How do you make your repository installable? You would need to create a stata.toc file and a packagename.pkg file manually, specifying the required information, the files that should be installed, and so on. The `github` package introduces the `make` GUI, which generates the package installation files for you using a strict layout. You can just select the files that you wish to install, specify the required information, and have your toc and pkg files ready. Then, as soon as you copy these files to your repository, it will be installable!
Change the working directory to the repository path and then run the GUI, typing:
db make
Enter the required information and select the files that should be installed. Press OK, and enjoy!
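For orientation, the two installation files follow Stata's standard `net` file format. The sketch below shows what a minimal pair might look like for a hypothetical package called `mypackage`; the package name, description, author, and date are placeholders, and the exact layout written by the `make` GUI may differ.

A minimal stata.toc, where `v` gives the file format version, `d` a description line, and `p` a package entry:

```
v 3
d Materials by Author Name
p mypackage A short description of the package
```

A minimal mypackage.pkg, where each `f` line names a file to install:

```
v 3
d mypackage: a short description of the package
d Distribution-Date: 20200101
f mypackage.ado
f mypackage.sthlp
```

Once both files are in the repository, `github check username/repository` can be used to confirm that the repository is recognized as installable.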
The `gitget` data set is downloaded along with the `github` package and is updated monthly. The complete list of `gitget` packages is available in this repository.
E. F. Haghish
Center for Medical Biometry and Medical Informatics
University of Freiburg, Germany
@Haghish