README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# btmembers

### An R Package to Import Data on All Members of the Bundestag since 1949

<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
<!-- badges: end -->

**(EN)**

The German Bundestag distributes [data on all members of the parliament since 1949](https://www.bundestag.de/services/opendata/) in the form of an XML file. These data, however, can be difficult to work with because XML files store information in an arbitrary number of dimensions. btmembers downloads the file "Stammdaten aller Abgeordneten seit 1949 im XML-Format" from the Bundestag website and converts it into either (a) four data frames nested in a list (retaining all the information of the original XML file) or (b) a single, condensed data frame, where the units of analysis are the terms served by a member (member-terms).

**Not an R user?** You can also download the various datasets as [CSV](csv/) (encoding: UTF-8) or [Excel](excel/) files.

A **codebook** for the four data frames is available [here](codebook/codebook.pdf).


**(DE)**

Der Deutsche Bundestag verteilt [Stammdaten aller Abgeordneten seit 1949](https://www.bundestag.de/services/opendata/) in Form einer XML-Datei. Die Arbeit mit diesen Daten kann jedoch schwierig sein, da XML-Dateien Informationen in beliebig vielen Dimensionen speichern. btmembers lädt die Datei "Stammdaten aller Abgeordneten seit 1949 im XML-Format" von der Website des Bundestages herunter und konvertiert sie entweder (a) in vier _data frames_, die zu einer Liste gruppiert sind (wobei alle Informationen der ursprünglichen XML-Datei erhalten bleiben), oder (b) in einen einzigen, komprimierten _data frame_, bei dem die Analyseeinheiten die jeweiliegen Wahlperioden der einzelnen Abgeordneten sind (Abgeordneten-Wahlperioden). 

**Sie verwenden R nicht?** Sie können die verschiedenen Datensätze auch als [CSV-Dateien](csv/) (Zeichenkodierung: UTF-8) oder als [Excel-Dateien](excel/) herunterladen.

Ein **Codebook** für die vier Datensätze finden Sie [hier](codebook/codebook.pdf).

## Installation

You can install btmembers from GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("jolyphil/btmembers")
```
## Usage

### Importing a list with all the information from the Bundestag

btmembers exports a single function: `import_members()`. By default, this function returns a list containing four data frames (`namen`, `bio`, `wp`, and `inst`), which together preserve all the information contained in the XML file provided by the Bundestag. 

```{r import-list}
library(btmembers)
members <- import_members()
summary(members)
```


#### Data frame `namen`

The data frame `namen` contains data on names of members of the Bundestag. Each row represents a name. Members can have multiple names (N~names~ > N~members~). 

```{r namen}
head(members$namen)
```


#### Data frame `bio`

The data frame `bio` contains biographical information on members. Each row represents a biographical entry. There is one entry by member (N~bio~ = N~members~). 

```{r bio}
head(members$bio)
```


#### Data frame `wp`

The data frame `wp` contains data on parliamentary terms served by members. Each row represents a member-term. Many members have served multiple terms (N~terms~ > N~members~).

```{r wp}
head(members$wp)
```


#### Data frame `inst`

The data frame `inst` contains records of functions occupied by members inside institutions of the Bundestag. Each row represents a member-term-function. Members might have had multiple functions during the same term (N~functions~ > N~terms~ > N~members~).

```{r inst}
head(members$inst)
```


### Importing a condensed data frame

Instead of importing a list with all the information from the Bundestag, `import_members()` can also load a condensed data frame. Each row corresponds to a member-term. Most of the information contained in the original data is preserved, except only the most recent name of the member is retained and functions are removed. A new column named `fraktion` is added to the data. `fraktion` is a recoded variable and refers to the faction the member spent most time in during a given parliamentary term.

```{r import-df}
members_df <- import_members(condensed_df = TRUE)
head(members_df[c("nachname", "vorname", "wp", "fraktion")])
```


### Joining data frames

You can join data frames by `id` (the members' identification numbers) or `id` _and_ `wp` (the parliamentary terms).

```{r join, message = FALSE, warning=FALSE}
library(dplyr)

members$namen %>%
  group_by(id) %>%
  slice_max(historie_von) %>% # keep most recent name
  ungroup() %>%
  left_join(members$bio, by = "id") %>%
  left_join(members$wp, by = "id") %>%
  left_join(members$inst, by = c("id", "wp")) %>%
  select(nachname, vorname, wp, ins_lang, mdbins_von, mdbins_bis) %>%
  head()
```


## Citation

To cite the package 'btmembers' in publications use:

> Joly, P. (2023). _btmembers: Import Data on All Members of the Bundestag since 1949_. R package version 0.2.3. https://github.com/jolyphil/btmembers

The package should be cited with [the original source](https://www.bundestag.de/services/opendata).