TSA Complaint Counts

In its FOIA Electronic Reading Room, the US Transportation Security Administration (TSA) publishes semi-regular reports on the monthly numbers of traveler complaints by airport, category, and subcategory.

Unfortunately, the agency posts these data only as PDFs (example here) rather than as machine-readable data files, and does so at unpredictable intervals. Because of the idiosyncratic way the records are provided, some additional effort is needed to get the data into a format that can be easily analyzed.

This repository, created by the Data Liberation Project and volunteers:

Fetches new PDFs as they become available
Parses the raw data from these PDFs
Converts that data into CSV files
Standardizes the results

Data guidance

The Data Liberation Project recommends using the files in output/03-standardized/, which contain monthly complaint counts for January 2015 – March 2024 and are divided into three levels of granularity:

output/03-standardized/complaints-by-airport.csv: Overall complaint count by airport and month.
output/03-standardized/complaints-by-category.csv: Complaint count by airport, month, and complaint category.
output/03-standardized/complaints-by-subcategory.csv: Complaint count by airport, month, complaint category, and complaint subcategory.

These CSV files use the following fields, where applicable:

Column	Example Value	Description
pdf_report_date	`2019-12`	The month (YYYY-MM) of the PDF from which this row is sourced.
airport	`ABE`	The airport's three-letter code. Note: Null/blank values appear to represent complaints not associated with any airport in particular. They are not grand totals.
category	`Mishandling of Passenger Property`	The complaint category.
subcategory	`Mishandling of Passenger Property - Damaged/Missing Items--Checked Bag`	The complaint subcategory.
year_month	`2016-01`	The complaint month.
count	`1`	The number of complaints matching that airport, category, and subcategory for that month.
clean_cat	`Mishandling of Passenger Property`	The standardized/cleaned category label; see below for details.
clean_subcat	`Damaged/Missing Items--Checked Baggage`	The standardized/cleaned subcategory label; see below for details.
clean_cat_status	`original`	See below for details.
clean_subcat_status	`imputed`	See below for details.
is_category_prefix_removed	`True`	Whether the standardization process trimmed off the (redundant) category label from the subcategory label.
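
As a rough illustration of how these fields might be consumed, the sketch below sums count per year_month using only Python's standard library. The sample rows are hypothetical (not real TSA counts), and the column set shown is an assumption about which fields appear in complaints-by-airport.csv.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the documented column names; not real TSA counts.
SAMPLE_CSV = """pdf_report_date,airport,year_month,count
2019-12,ABE,2016-01,1
2019-12,,2016-01,5
2019-12,ABE,2016-02,2
"""


def monthly_totals(csv_file):
    """Sum `count` per `year_month`, including rows with a blank airport."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["year_month"]] += int(row["count"])
    return dict(totals)


totals = monthly_totals(io.StringIO(SAMPLE_CSV))
print(totals)
```

To run the same function against the real data, pass it a file opened with open("output/03-standardized/complaints-by-airport.csv", newline="") instead of the in-memory sample.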

Caveat: Comparisons over time

In March 2024, a TSA spokesperson provided comments to FedScoop reporter Rebecca Heilweil indicating that (at least) some of the increase in complaints over time can be attributed to the agency making it easier to submit PreCheck complaints:

The spokesperson said that changes to several platforms and customer service tools are responsible for the rise in complaints. In May 2021, the agency created a new TSA PreCheck webform that saw complaints increase around 79% in the following four months. That August, the agency deployed messaging enhancements that, in combination with the new online form, saw complaints grow by 62% in the subsequent four months. (Switching to Salesforce for the TSA Contact Center at the end of 2020 also meant that the airport field in the data started to populate).

Caveat: Ambiguous subcategories

Due to how the TSA formats its PDFs, subcategories are sometimes rendered ambiguously.

For instance, the PDFs sometimes list a Mishandling of Passenger Property subcategory that has been truncated to Damaged/Missing Items--C. This name is ambiguous because it could represent either of the following (both of which appear elsewhere, in non-truncated entries):

Damaged/Missing Items--Carry-on Luggage
Damaged/Missing Items--Checked Baggage

Similar types of ambiguous subcategories appear under the following three categories:

Mishandling of Passenger Property: all subcategories beginning with “Damaged/Missing Items”
Property - Special Handling: all subcategories
Expedited Passenger Screening Program: all subcategories

For this reason, subcategory totals in these categories may not be definitive.

Subcategories impacted by ambiguous values are marked with an asterisk. For example, the ambiguous subcategory *Damaged Items and the related subcategories *Damaged Items - Carry-On and *Damaged Items - Checked are all marked with asterisks to warn of potentially inaccurate totals.
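
When analyzing subcategory totals, it can help to separate the asterisk-marked labels from the rest. The following is a minimal sketch that assumes the asterisk is the first character of the cleaned subcategory label, as in the examples above; the function names are hypothetical and not part of the repository.

```python
def is_flagged_ambiguous(label):
    """Return True if a label carries the leading-asterisk warning marker."""
    return label.startswith("*")


def split_by_flag(labels):
    """Partition labels into (flagged, unflagged) lists, preserving order."""
    flagged = [label for label in labels if is_flagged_ambiguous(label)]
    unflagged = [label for label in labels if not is_flagged_ambiguous(label)]
    return flagged, unflagged


labels = ["*Damaged Items", "*Damaged Items - Carry-On", "Non-Flyer"]
flagged, unflagged = split_by_flag(labels)
print(flagged)
print(unflagged)
```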

Caveat: Null Values

By airport: Preliminary analysis of output/03-standardized/complaints-by-airport.csv shows that nearly half of all complaints have a null value for airport. The reason for these null values is currently unknown, but they seem to indicate complaints that are not associated with any particular airport. Notably, the proportion of null-airport complaints declines to about a quarter of all complaints by 2022.

By subcategory: Records having a subcategory of Expedited Passenger Screening Program are cleaned to render simply *. This is because the subcategory is simply a repetition of the category with no further detail provided.
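
The by-airport proportion described above can be recomputed per year with a short script. This sketch assumes blank airports appear as empty strings in the CSV and uses hypothetical sample rows, so the printed shares are illustrative only.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; a blank `airport` stands in for a null value.
SAMPLE_CSV = """airport,year_month,count
,2016-01,6
ABE,2016-01,4
,2022-03,1
ABE,2022-03,3
"""


def null_airport_share_by_year(csv_file):
    """Return {year: share of complaints whose airport is blank}."""
    null_counts = defaultdict(int)
    all_counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        year = row["year_month"][:4]  # "YYYY-MM" -> "YYYY"
        n = int(row["count"])
        all_counts[year] += n
        if not row["airport"].strip():
            null_counts[year] += n
    return {year: null_counts[year] / all_counts[year] for year in all_counts}


shares = null_airport_share_by_year(io.StringIO(SAMPLE_CSV))
print(shares)
```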

Data cleaning

In the CSV files in output/03-standardized, the fields clean_cat_status and clean_subcat_status track how data is cleaned/standardized, based on assumptions about how the data was likely truncated in the TSA's PDF complaint report outputs. There are four possible values:

Status	Description	Change Made
`original`	Not truncated	Original category retained
`imputed`	Only one possible truncation	Category imputed to likely value
`ambiguous`	Two or more truncations possible	Original category retained
`missing`	No corresponding value in the Data Liberation Project's lookup tables	Category left blank

Imputation criteria

For imputed categories and subcategories, we have reasonable certainty of what was truncated. For example:

Additional Information Required/Insufficient Inf

... is likely truncated from (and thus transformed to):

Additional Information Required/Insufficient Information
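
The four statuses can be illustrated with a simple prefix-matching rule. This is only a sketch of the idea under the assumption that truncation cuts labels off at the end; the repository itself relies on lookup tables, and the function name and the list of known labels here are hypothetical.

```python
# Full labels mentioned in this README, used here as the "known" set.
KNOWN_LABELS = [
    "Additional Information Required/Insufficient Information",
    "Damaged/Missing Items--Carry-on Luggage",
    "Damaged/Missing Items--Checked Baggage",
]


def classify_label(raw, known_labels):
    """Return (cleaned_label, status) for a possibly truncated label."""
    if raw in known_labels:
        return raw, "original"
    # A truncated label is a strict prefix of one or more full labels.
    candidates = [full for full in known_labels if full.startswith(raw)]
    if len(candidates) == 1:
        return candidates[0], "imputed"
    if len(candidates) >= 2:
        return raw, "ambiguous"  # keep the original text
    return "", "missing"  # no match: leave blank


for raw in [
    "Additional Information Required/Insufficient Inf",
    "Damaged/Missing Items--C",
    "Damaged/Missing Items--Checked Baggage",
    "Some Unlisted Label",
]:
    print(raw, "->", classify_label(raw, KNOWN_LABELS))
```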

Trimmed attributes

Many TSA complaint subcategories repeat the category within the subcategory. The standardization process trims out these repeated categories. For example, the following record:

Category	Subcategory
Advanced Imaging Technology (AIT)	Advanced Imaging Technology (AIT) - Non-Flyer

...is transformed to:

Category	Subcategory
Advanced Imaging Technology (AIT)	Non-Flyer

These trimmed subcategories are flagged as True in the column is_category_prefix_removed.
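
A minimal sketch of this trimming step follows. It assumes the category and subcategory are joined by " - ", as in the example above; the function name is hypothetical, and the actual script may handle additional separators or edge cases.

```python
def trim_category_prefix(category, subcategory, separator=" - "):
    """Strip a leading "<category><separator>" from the subcategory.

    Returns (subcategory, was_trimmed); the flag mirrors the
    is_category_prefix_removed column.
    """
    prefix = category + separator
    if subcategory.startswith(prefix):
        return subcategory[len(prefix):], True
    return subcategory, False


trimmed = trim_category_prefix(
    "Advanced Imaging Technology (AIT)",
    "Advanced Imaging Technology (AIT) - Non-Flyer",
)
print(trimmed)
```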

Data Cleaning Lookup Tables

The values of clean_cat and clean_cat_status are determined for each record in output/03-standardized/complaints-by-category.csv using the lookup table lookups/lkp_cleaner_categories.csv based on matching values of Category.

Similarly, the values of clean_cat, clean_cat_status, clean_subcat, and clean_subcat_status are determined for each record in output/03-standardized/complaints-by-subcategory.csv using the lookup table lookups/lkp_cleaner_subcategories.csv based on matching values of Category and Subcategory.
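
Conceptually, applying a lookup table amounts to a keyed match on the raw labels. The sketch below uses an in-memory dictionary in place of lookups/lkp_cleaner_subcategories.csv; the lookup's structure and the function name are assumptions for illustration, since the lookup files' column layout is not described here.

```python
# Hypothetical stand-in for the subcategory lookup table,
# keyed on the raw (category, subcategory) pair.
LOOKUP = {
    (
        "Mishandling of Passenger Property",
        "Mishandling of Passenger Property - Damaged/Missing Items--Checked Bag",
    ): {
        "clean_cat": "Mishandling of Passenger Property",
        "clean_subcat": "Damaged/Missing Items--Checked Baggage",
        "clean_subcat_status": "imputed",
    },
}


def apply_lookup(row, lookup):
    """Return a copy of `row` with cleaned fields filled from `lookup`."""
    cleaned = dict(row)
    match = lookup.get((row["category"], row["subcategory"]))
    if match is None:
        # No corresponding lookup entry: leave blank and mark as missing.
        cleaned.update(clean_subcat="", clean_subcat_status="missing")
    else:
        cleaned.update(match)
    return cleaned


row = {
    "category": "Mishandling of Passenger Property",
    "subcategory": "Mishandling of Passenger Property - Damaged/Missing Items--Checked Bag",
}
result = apply_lookup(row, LOOKUP)
print(result["clean_subcat"], result["clean_subcat_status"])
```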

Scripts

The repository's pipeline consists of the following scripts:

scripts/00-scrape.py: Scrapes the TSA FOIA electronic reading room for all relevant PDFs and downloads them to the pdfs/ directory.
scripts/01-parse.py: Parses each of the PDFs, extracting the structured data and saving it to the output/01-parsed/ directory. For each PDF, the output includes two CSV files: one containing one row per fiscal year total (as stated literally in the PDF), and one containing one row per monthly total.
scripts/02-combine.py: Combines the monthly-total CSVs into a single, deduplicated CSV (since each report covers multiple years, and the coverage periods overlap across reports), saved to output/02-combined/. Also saves three additional subsets, each focused on a single level of granularity (airport, airport-category, and airport-category-subcategory).
- In instances where the same month's data is available in multiple PDF reports, we use the counts from the most recent report. A given entry's count rarely changes over time; when it does, it's typically by small amounts.
scripts/03-standardize.py: Standardizes the category and subcategory labels, since the reports' PDF layouts often trim the full description, and saves the results to output/03-standardized/. The script also trims redundant category prefixes from subcategory descriptions. See the Data guidance section above for interpreting the results.
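
The deduplication rule described for scripts/02-combine.py (prefer the most recent report when a month appears in several PDFs) can be sketched as follows. This is not the script's actual code; the function name and sample rows are hypothetical, and the key columns are an assumption based on the fields documented above.

```python
def keep_latest_report(rows):
    """Keep one row per (airport, category, subcategory, year_month),
    preferring the row with the latest pdf_report_date."""
    latest = {}
    for row in rows:
        key = (row["airport"], row["category"], row["subcategory"], row["year_month"])
        current = latest.get(key)
        # "YYYY-MM" strings sort chronologically, so string comparison works.
        if current is None or row["pdf_report_date"] > current["pdf_report_date"]:
            latest[key] = row
    return list(latest.values())


# Hypothetical rows: the first two describe the same month in different reports.
base = {
    "airport": "ABE",
    "category": "Mishandling of Passenger Property",
    "subcategory": "Damaged/Missing Items--Checked Baggage",
}
rows = [
    {**base, "pdf_report_date": "2019-02", "year_month": "2016-01", "count": "1"},
    {**base, "pdf_report_date": "2019-12", "year_month": "2016-01", "count": "2"},
    {**base, "pdf_report_date": "2019-02", "year_month": "2016-02", "count": "3"},
]
deduped = keep_latest_report(rows)
print([(r["year_month"], r["pdf_report_date"], r["count"]) for r in deduped])
```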

Repository structure

.
├── .github
│   └── workflows
│       └── run.yml
├── .gitignore
├── Makefile
├── README.md
├── lookups
│   ├── lkp_cleaner_categories.csv
│   └── lkp_cleaner_subcategories.csv
├── output
│   ├── 01-parsed
│   │   ├── report-2019-02-fiscal-year.csv
│   │   ├── report-2019-02-month.csv
│   │   ├── …
│   │   ├── report-2023-10-fiscal-year.csv
│   │   └── report-2023-10-month.csv
│   ├── 02-combined
│   │   ├── complaints-by-airport-raw.csv
│   │   ├── complaints-all-by-month.csv
│   │   ├── complaints-by-category-raw.csv
│   │   └── complaints-by-subcategory-raw.csv
│   ├── 03-standardized
│   │   ├── complaints-by-airport.csv
│   │   ├── complaints-by-category.csv
│   │   └── complaints-by-subcategory.csv
│   └── misc
│       └── report-disagreements.csv
├── pdfs
│   ├── tsa-contact-center-traveler-complaints-report-2019-02.pdf
│   ├── …
│   └── tsa-contact-center-traveler-complaints-report-2023-10.pdf
├── requirements.in
├── requirements.txt
└── scripts
    ├── 00-scrape.py
    ├── 01-parse.py
    ├── 02-combine.py
    └── 03-standardize.py

Credits

This data pipeline has been developed by Jake Zucker, Rob Reid, Emily Keller-O'Donnell, Asako Mikami, and Jeremy Singer-Vine, collaborating through the Data Liberation Project.

Licensing

This repository's code is available under the MIT License terms. The PDFs in the pdfs/ directory are public domain. All other data files are available under Creative Commons' CC BY-SA 4.0 license terms.
