
R scripts for accessing Form 8871 and Form 8872 data on 527 Political Action Committees.


Nonprofit-Open-Data-Collective/irs-527-political-action-committee-disclosures


irs-527-political-action-committee-disclosures

R scripts for accessing Form 8871 and Form 8872 data on 527 Political Action Committees and producing the following datasets.

POL-ORGS-FM-8871.csv  >>  organization details
POL-ORGS-SCHED-D.csv  >>  director and officer information
POL-ORGS-SCHED-E.csv  >>  election authority IDs 
POL-ORGS-SCHED-R.csv  >>  related entities records
------------
POL-ORGS-FM-8872.csv  >>  required annual disclosures 
POL-ORGS-SCHED-A.csv  >>  individual donations
POL-ORGS-SCHED-B.csv  >>  organizational donations 

Overview of Filing Requirements

https://www.irs.gov/charities-non-profits/political-organizations/political-organization-filing-and-disclosure

Download POFD Data (Political Organization Filing Disclosures)

The links above connect to the online database of public filings by Section 527 political organizations. It contains all electronic filings made by 527 organizations. Paper filings go back to January 2012.

https://forms.irs.gov/app/pod/dataDownload/dataDownload

This file is updated every Sunday at 1:00AM.

Download links:

OR separately by org type:

Data Overview

The data is stored in ASCII pipe-delimited format:

H|20230513|0311|B|
F|20230513|0311|387364|
1|8871|9661837|1|0|0|824170729|Jones for County Council|319 15th St||Columbus|IN|47201||no@email|20180126|David A. Jones|319 15th St||Columbus|IN|47201||David A Jones|319 15th St||Columbus|IN|47201||319 15th St||Columbus|IN|47201||0||0|This is the campaign committee to elect David Jones to the Bartholomew County Council.  |20180126|2018-01-26 19:52:00|1|1
D|9661837|152013|Jones for County Council|824170729|David A Jones|Candidate|319 15th St||Columbus|IN|47201||
E|9662148|21805|82-4204086|FL|
R|9662240|72908|Hospital and Healthsystem Assoc  of PA Political Action Comm HAPAC|232125904|Hospital and Healthsystem Assoc of Pennsylvania|Connected|30 North Third Street|Suite 600|Harrisburg|PA|17101||
2|8872|426|20001128|20001231|0|0|1|0|Holland and Knight CCE|912063482|315 South Calhoun Street|Suite 600|Tallahassee|FL|32301||pgreene@hklaw.com|19791017|Patricia B. Greene|315 South Calhoun Street|Suite 600|Tallahassee|FL|32301|1872|Patricia B. Greene|315 South Calhoun Street|Suite 600|Tallahassee|FL|32301|1872|315 South Calhoun Street|Suite 600|Tallahassee|FL|32301||4|||20001107|GA|0|0|0|0|2001-01-30 17:38:50|
A|487|861|Lawyers for Louisiana|720856107|Ronnie G. Penton|209 Hoppen Pl.||Bogalusa|LA|70427|3827|Self employed|100|Attorney|1200||
B|481|20123|IMPACT|592217012|Thomas Howell Ferguson, PA|PO Drawer 14569||Tallahassee|FL|32317|4569|Not Applicable|785|Not Applicable|||

TABLES

Data File Metadata:

  • H: FILE HEADER for each data release
  • F: Every file will contain a Footer Record.

Form 8871 Tables

  • 1: The 8871 Form Record Header contains the main form data. All EAINs, Officers/Directors and Related Entities for the form will follow the ‘1’ Form Record in the E, D, and R records.
  • D: If Director and Officer records exist for an 8871 Form, each record will be printed out in a “D” record. The “D” records for a given 8871 Form will follow its related “1” Record. There may be several “D” records for any one “1” record.
  • E: If the PAC has provided an Election Authority Identification Number (EAIN) on the 8871 Form, each record will be printed out in an “E” record. The “E” records for a given 8871 Form will follow its related “1” Record. There may be several “E” records for any one “1” record.
  • R: If Related Entities records exist for an 8871 Form, each record will be printed out in an “R” record. The “R” records for a given 8871 Form will follow its related “1” Record. There may be several “R” records for any one “1” record.

Form 8872 Tables

  • 2: The Form Record Header contains the main form data. All Schedule As and Bs for that form will follow the ‘2’ Form Record in ‘A’ and ‘B’ Records.
  • A: If a Schedule A exists for an 8872 Form, each Schedule A record will be printed out in an “A” record. The “A” records for a given 8872 Form will follow its related “2” Record. There may be several “A” records for any one “2” record.
  • B: If a Schedule B exists for an 8872 Form, each Schedule B record will be printed out in a “B” record. The “B” records for a given 8872 Form will follow its related “2” Record. There may be several “B” records for any one “2” record.
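
Because child records carry the Form ID of their parent, the schedule tables can be joined back to the form table. The following is a minimal sketch using the "1" and "D" sample lines from the Data Overview (the "1" record is shortened for illustration). It assumes the Form ID is the third field of a "1" record and the second field of a "D" record, which is what the sample lines suggest; the column names OFFICER_NAME and TITLE are hypothetical labels, not names from the file layout.

```r
# Sample lines from the Data Overview; the "1" record is truncated for brevity
form.line <- "1|8871|9661837|1|0|0|824170729|Jones for County Council|"
d.line    <- "D|9661837|152013|Jones for County Council|824170729|David A Jones|Candidate|319 15th St||Columbus|IN|47201||"

form.fields <- strsplit( form.line, "\\|" )[[1]]
d.fields    <- strsplit( d.line, "\\|" )[[1]]

# Assumed positions: Form ID is field 3 of a "1" record, field 2 of a "D" record
forms <- data.frame( FORM_ID=form.fields[3], ORG_NAME=form.fields[8] )

# OFFICER_NAME and TITLE are hypothetical column names for fields 6 and 7
officers <- data.frame( FORM_ID=d.fields[2], 
                        OFFICER_NAME=d.fields[6], 
                        TITLE=d.fields[7] )

# attach each officer record to its parent form
linked <- merge( forms, officers, by="FORM_ID" )
linked
```

The same pattern should apply to the E and R records for Form 8871, and to the A and B records for Form 8872, once the field positions are confirmed against PolOrgsFileLayout.doc.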

DATA DICTIONARY

USAGE

Demo of the database build functions using the sample data subset "HMDataFile.txt".

## SOURCE PARSING FUNCTIONS

source( "https://raw.githubusercontent.com/Nonprofit-Open-Data-Collective/irs-527-political-action-committee-disclosures/main/parse-pol-org-disclosures.R" )


##  READ HMDataFile.txt COPY FROM GITHUB

URL <- "https://github.com/Nonprofit-Open-Data-Collective/irs-527-political-action-committee-disclosures/raw/main/HMDataFile.txt"
d <- get_form_x( file.name=URL, form.type="A" )


##  LOAD FILE ONCE, THEN PASS TO BUILD FUNCTIONS
##     Avoids repeating data load when
##     creating multiple tables

txt <- read_textfile(URL)
d <- get_form_x( x=txt, form.type="A" )


##  BUILD ALL TABLES AND 
##  WRITE THEM TO FILE

build_all( URL )                 # using HMDataFile.txt demo dataset above
build_all( "FullDataFile.txt" )  # from the full data file download

Data Parsing Details

Some demo scripts showing how to parse the ASCII data with R and what types of problems (data corruption) you can expect. Approximately 0.1% of the data has been corrupted by special characters breaking the ASCII format.

The examples use the "SUBSET ORGS H-M" data from above, which is "HMDataFile.txt" once unzipped.

The letter at the start of each line designates the form type in the file. For example, in this file:

| FORM | Freq |
|---|---|
| A | 214901 |
| B | 121371 |
| D | 26121 |
| 1 | 11075 |
| 2 | 8238 |
| R | 3475 |
| E | 2183 |

Note that numbers will change depending on when you download the data - these are cumulative files.

The form types and their corresponding variable names are found in the PolOrgsFileLayout.doc.

See an R FUNCTION that generalizes the steps below.

library( dplyr )
library( knitr )
library( stringr )
library( pander )

# read local: 
# fileName <- "HMDataFile.txt"
# con <- file( fileName, open="r" )

url <- "https://raw.githubusercontent.com/Nonprofit-Open-Data-Collective/irs-527-political-action-committee-disclosures/main/HMDataFile.txt"
con <- file( url, open="r" )
line <- readLines(con)
close(con)

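# first and second character of each line:
# a valid record starts with a record-type character followed by a pipe
f1 <- substr( line, 1, 1 )
f2 <- substr( line, 2, 2 )

head( line[ f2 == "|" ], 25 )   # lines that look like valid records
head( line[ f2 != "|" ], 25 )   # lines that do not (likely broken records)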

# CHECK PROBLEM LINES
# which( line == "questions, hold candidates accountable, elect" )
# line[ 21765:21776 ]

#  [4] "D|9627494|109594|IBEW 481 Legislative Campaign Acct|273917449|Steve Menser|Chairman|1828 N Meridian St|Suite 205|Indianapolis|IN|46202||"                                                                                                                             
#  [5] "1|8871|9627554|0|1|1|454540511|Higher Heights Political Fund|100 Church Street|S820|New York|NY|10007||no@email|20120215|Lora Haggard|29 Briarwood Drive||Ringgold|GA|30736||Hasoni Pratt|100 Church Street|S820|New York|NY|10007||100 Church Street|S820|New York|NY|10007||0||0|The purpose of the organization shall be to influence and"                                                                                                                                                                                               
#  [6] "impact elections by mobilizing Black women to raise"                                                                                                                                                                                                                 
#  [7] "questions, hold candidates accountable, elect"                                                                                                                                                                                                                       
#  [8] "more Black women to public office, and support"                                                                                                                                                                                                                       
#  [9] "candidates committed to advance policies"                                                                                                                                                                                                                             
# [10] "that affect Black women.|20140711|2014-07-11 23:31:50|1|1"                                                                                                                                                                                                           
# [11] "D|9627554|109602|Higher Heights Political Fund|454540511|Hasoni Pratts|Treasurer|100 Church Street|S820|New York|NY|10007||"                                                                                                                                         
# [12] "1|8871|9627597|0|1|0|464431052|iVote Fund|722 12th Street NW|Third Floor|Washington|DC|20005||no@email|20140106|Ellen Kurz|722 12th Street NW|Third Floor|Washington|DC|20005||Ellen Kurz|722 12th Street NW|Third Floor|Washington|DC|20005||722 12th Street NW|Third Floor|Washington|DC|20005||0||0|To educate and active the general public regarding the importance of voting and the role of Secretaries of State in the voting process and to make independent expenditures in support of or opposition to candidates for Secretary of State|20140711|2014-07-14 01:02:55|0|0"

It appears that the organization in entry [5] included line breaks (or some other special character) in its mission statement, and the IRS did not validate that field (check for special characters before ASCII conversion). As a result, the record is split across entries [5] through [10].

These cases need to be fixed by hand (or clever programming). We will ignore them for now.
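
One possible programmatic fix is sketched below, under a strong assumption: every genuine record starts with a known record-type character followed by a pipe, and any other line is a continuation of the record before it. The function name rejoin_broken_lines() is hypothetical and is not part of the repository's parsing scripts. The heuristic will misfire if a continuation line happens to start with something like "A|", so the results should still be spot-checked.

```r
# Hypothetical helper: glue continuation lines back onto the record they belong to.
# Assumption: a real record starts with a record-type character plus a pipe.
rejoin_broken_lines <- function( line, 
                                 types=c("H","F","1","2","A","B","D","E","R") ) {
  pattern  <- paste0( "^[", paste( types, collapse="" ), "]\\|" )
  is.start <- grepl( pattern, line )
  is.start[1] <- TRUE              # treat the first line as a record start
  group <- cumsum( is.start )      # continuation lines share the id of the prior start
  # collapse each group into one line, joining the pieces with a space
  unname( vapply( split( line, group ), paste, character(1), collapse=" " ) )
}

# toy input modeled on the broken record shown above
x <- c( "D|1|a|", 
        "1|8871|2|purpose starts", 
        "and continues", 
        "here|20140711|1|1", 
        "D|2|b|" )

rejoin_broken_lines( x )
```

Joining with a space is a design choice; the original line breaks could instead be dropped or replaced with another separator.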

### FIX MISSING FIRST LETTER
###   Add a space if line begins with pipe

line[ f1 == "|" ] <- paste0( " ", line[ f1 == "|" ] )
f1 <- substr( line, 1, 1 )
f2 <- substr( line, 2, 2 )

valid.f <- f1[ f2 == "|" ]

unique( valid.f )
# [1] "H" "1" "R" "D" "E" "." " " "2" "A" "B" "F"

valid.f[ valid.f == "" ]  <- "empty"
valid.f[ valid.f == " " ] <- "space"

table(valid.f) %>% 
  sort( decreasing=T ) %>% 
  as.data.frame() 

table(valid.f) %>% 
  sort( decreasing=T ) %>% 
  as.data.frame() %>% 
  kable()

Types of Forms Available - see PolOrgsFileLayout.doc for definitions:

| valid.f | Freq |
|---|---|
| A | 214901 |
| B | 121371 |
| D | 26121 |
| 1 | 11075 |
| 2 | 8238 |
| R | 3475 |
| E | 2183 |
| space | 2 |
| . | 1 |
| F | 1 |
| H | 1 |

Get Schedule A records:

line.a <- line[ f1 == "A" & f2 == "|" ]

head( line.a )

list.a <- 
  line.a %>% 
  strsplit( "\\|" )

# should have 17 variables: 

sapply( list.a, length ) %>% table()
#     13     15     17 
#      7     14 214880

list.length <- 
  sapply( list.a, length )

check.these <- list.length != 17
list.a[ check.these ]

which( list.length != 17 )
line.a[ 69284 ]

# check the original entries for context: 

grep( "A\\|9591920\\|1810036", line, value=F ) # first part of problematic case

# line[ 161973:161976 ]
#
# [1] "A|9591920|1810035|Hospital and Healthsystem Assoc  of PA Political Action Comm HAPAC|232125904|Mr. Thomas V. Whalen Jr.|609K Springhouse Road||Allentown|PA|18104|4692|Lehigh Valley Hospital|200|NA|200|20091019|"
# [2] "A|9591920|1810036|Hospital and Healthsystem Assoc  of PA Political Action Comm HAPAC|232125904|Dr. Elliot J. Sussman MD|PO Box 689||Allentown|PA|18105|1556|Lehigh Valley Hospital & Health"                       
# [3] "            Network, Inc.|500|President & CEO|775|20091019|"                                                                                                                                                       
# [4] "A|9591920|1810037|Hospital and Healthsystem Assoc  of PA Political Action Comm HAPAC|232125904|Michael C Mullane|14 Hamlet Hill Road||Baltimore|MD|21210|1501|Temple University Hospital|200|Sr. VP|200|20091019|" 

# RECORD 2 WAS SPLIT ACROSS TWO ROWS - NEEDS MANUAL FIXING

Get variable names from the PolOrgsFileLayout.doc file by copying the tables into Excel, then keeping only the rows that are not pipe delimiters.

This is the Schedule A table, for example (FORMTYPE, ORDER, and VARNAME were added):

| IRS.Field.Name | Size | Format | FORMTYPE | ORDER | VARNAME |
|---|---|---|---|---|---|
| Record Type | 1 | Alphanumeric | A | 1 | YES |
| Pipe Delimiter | 1 | Character | A | 2 | NO |
| Form ID Number | Up to 38 digits | Numeric | A | 3 | YES |
| Pipe Delimiter | 1 | Character | A | 4 | NO |
| SCHED A ID | Up to 38 digits | Numeric | A | 5 | YES |
| Pipe Delimiter | 1 | Character | A | 6 | NO |
| ORG NAME | 70 | AlphaNumeric | A | 7 | YES |
| Pipe Delimiter | 1 | Character | A | 8 | NO |
| EIN | 9 | AlphaNumeric | A | 9 | YES |
| Pipe Delimiter | 1 | Character | A | 10 | NO |
| CONTRIBUTOR NAME | 50 | AlphaNumeric | A | 11 | YES |
| Pipe Delimiter | 1 | Character | A | 12 | NO |
| CONTRIBUTOR ADDRESS 1 | 50 | AlphaNumeric | A | 13 | YES |
| Pipe Delimiter | 1 | Character | A | 14 | NO |
| CONTRIBUTOR ADDRESS 2 | 50 | AlphaNumeric | A | 15 | YES |
| Pipe Delimiter | 1 | Character | A | 16 | NO |
| CONTRIBUTOR ADDRESS CITY | 50 | AlphaNumeric | A | 17 | YES |
| Pipe Delimiter | 1 | Character | A | 18 | NO |
| CONTRIBUTOR ADDRESS STATE | 2 | AlphaNumeric | A | 19 | YES |
| Pipe Delimiter | 1 | Character | A | 20 | NO |
| CONTRIBUTOR ADDRESS ZIP CODE | 5 | AlphaNumeric | A | 21 | YES |
| Pipe Delimiter | 1 | Character | A | 22 | NO |
| CONTRIBUTOR ADDRESS ZIP EXT | 4 | AlphaNumeric | A | 23 | YES |
| Pipe Delimiter | 1 | Character | A | 24 | NO |
| CONTRIBUTOR EMPLOYER | 70 | AlphaNumeric | A | 25 | YES |
| Pipe Delimiter | 1 | Character | A | 26 | NO |
| CONTRIBUTION AMOUNT | 17 | AlphaNumeric | A | 27 | YES |
| Pipe Delimiter | 1 | Character | A | 28 | NO |
| CONTRIBUTOR OCCUPATION | 70 | AlphaNumeric | A | 29 | YES |
| Pipe Delimiter | 1 | Character | A | 30 | NO |
| AGG CONTRIBUTION YTD | 17 | AlphaNumeric | A | 31 | YES |
| Pipe Delimiter | 1 | Character | A | 32 | NO |
| CONTRIBUTION DATE | 8 | AlphaNumeric | A | 33 | YES |
| Pipe Delimiter | 1 | Character | A | 34 | NO |

Variable names shortened:

varnames <- 
  c("TYPE", "FORM_ID", "SCHED_A_ID", "ORG_NAME", 
    "EIN", "CONT_NAME", "CONT_ADDRESS_1", "CONT_ADDRESS_2", 
    "CONT_CITY", "CONT_STATE", "CONT_ZIP", 
    "CONT_ZIP_EXT", "CONT_EMPLOYER", "CONT_AMOUNT", 
    "CONT_OCCUPATION", "CONT_AMOUNT_YTD", "CONT_DATE" )

Then create the table with all of the Schedule A lines that are not corrupted.

these.ok <- list.length == 17
line.a.ok <- line.a[ these.ok ]

df <-
  line.a.ok %>% 
  strsplit( "\\|" ) %>%
  do.call( rbind, . ) %>%
  data.frame()

varnames <- 
  c("TYPE", "FORM_ID", "SCHED_A_ID", "ORG_NAME", 
    "EIN", "CONT_NAME", "CONT_ADDRESS_1", "CONT_ADDRESS_2", 
    "CONT_CITY", "CONT_STATE", "CONT_ZIP", 
    "CONT_ZIP_EXT", "CONT_EMPLOYER", "CONT_AMOUNT", 
    "CONT_OCCUPATION", "CONT_AMOUNT_YTD", "CONT_DATE" )

names(df) <- varnames
head( df ) %>% pander( style="rmarkdown" )

| TYPE | FORM_ID | SCHED_A_ID |
|---|---|---|
| A | 9555268 | 38296 |
| A | 9555268 | 38297 |
| A | 9555268 | 38298 |
| A | 9555268 | 38299 |
| A | 9555268 | 38300 |
| A | 9555268 | 38301 |

Table: Table continues below

| ORG_NAME | EIN |
|---|---|
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |
| HAWAII STATE TEACHERS ASSOCIATION POLITICAL ACTION COMMITTEE | 521073928 |

Table: Table continues below

| CONT_NAME | CONT_ADDRESS_1 | CONT_ADDRESS_2 |
|---|---|---|
| National Education Association | 1201 16th Street, NW | |
| Hawaii State Teachers Association | 1200 ALA KAPUNA STREET | |
| Hawaii State Teachers Association | 1200 ALA KAPUNA STREET | |
| Hawaii State Teachers Association | 1200 ALA KAPUNA STREET | |
| Hawaii State Teachers Association | 1200 ALA KAPUNA STREET | |
| Hawaii State Teachers Association | 1200 ALA KAPUNA STREET | |

Table: Table continues below

| CONT_CITY | CONT_STATE | CONT_ZIP | CONT_ZIP_EXT | CONT_EMPLOYER |
|---|---|---|---|---|
| Washington | DC | 20036 | 3290 | n/a |
| HONOLULU | HI | 96819 | | n/a |
| HONOLULU | HI | 96819 | | n/a |
| HONOLULU | HI | 96819 | | n/a |
| HONOLULU | HI | 96819 | | n/a |
| HONOLULU | HI | 96819 | | n/a |

Table: Table continues below

| CONT_AMOUNT | CONT_OCCUPATION | CONT_AMOUNT_YTD | CONT_DATE |
|---|---|---|---|
| 5000 | n/a | 5000 | 20030214 |
| 15083 | n/a | 92298 | 20030630 |
| 15301 | n/a | 77215 | 20030531 |
| 15291 | n/a | 61914 | 20030430 |
| 15320 | n/a | 46623 | 20030331 |
| 15289 | n/a | 31303 | 20030228 |
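
Every column produced by strsplit() is text, so amounts and dates likely need converting before analysis. A minimal sketch on a toy data frame (df.demo is a hypothetical name, with values copied from the first two rows of the output), assuming contribution dates are always in YYYYMMDD form as they appear there:

```r
# toy rows copied from the Schedule A output; all columns start as text
df.demo <- data.frame(
  CONT_AMOUNT     = c( "5000", "15083" ),
  CONT_AMOUNT_YTD = c( "5000", "92298" ),
  CONT_DATE       = c( "20030214", "20030630" ),
  stringsAsFactors = FALSE )

# as.character() guards against factor columns in R versions before 4.0
df.demo$CONT_AMOUNT     <- as.numeric( as.character( df.demo$CONT_AMOUNT ) )
df.demo$CONT_AMOUNT_YTD <- as.numeric( as.character( df.demo$CONT_AMOUNT_YTD ) )

# assumed date format: YYYYMMDD
df.demo$CONT_DATE <- as.Date( as.character( df.demo$CONT_DATE ), format="%Y%m%d" )

str( df.demo )
```

Values that do not fit the expected format become NA, which is a convenient way to flag further corrupted records.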
