Big City Bias: Evaluating the Impact of Metropolitan Size on Computational Job Market Abilities of Language Models
Large language models (LLMs) have emerged as a useful technology for job matching, for both candidates and employers. Job matching is often based on a particular geographic location, such as a city or region. However, LLMs have known biases, commonly derived from their training data. In this work, we aim to quantify the metropolitan size bias encoded within large language models by evaluating zero-shot salary, employer presence, and commute duration predictions in 384 of the United States’ metropolitan regions. Across all benchmarks, we observe correlations between metropolitan population and prediction accuracy, with the 10 smallest metropolitan regions showing upwards of 300% worse benchmark performance than the 10 largest.
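One way to make the size comparison above concrete is sketched below. It is a minimal illustration, not the code in src/Analysis.ts: the MetroResult shape, the use of a Pearson coefficient, and the reading of "300% worse" as a mean-error ratio of 4 between the 10 smallest and 10 largest metros are all assumptions.

```typescript
// Hypothetical record: one row per metro with its population and the model's
// prediction error on a benchmark (e.g. absolute salary error).
interface MetroResult {
  metro: string;
  population: number;
  error: number;
}

// Pearson correlation coefficient between two equal-length samples.
function pearson(xs: number[], ys: number[]): number {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("need two equal-length samples with at least 2 points");
  }
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Mean error of the k least populous metros divided by that of the k most
// populous. Under this reading, a ratio of 4 means "300% worse".
function smallVsLargeErrorRatio(results: MetroResult[], k = 10): number {
  const sorted = [...results].sort((a, b) => a.population - b.population);
  const meanError = (rs: MetroResult[]) =>
    rs.reduce((sum, r) => sum + r.error, 0) / rs.length;
  return meanError(sorted.slice(0, k)) / meanError(sorted.slice(-k));
}
```

Because metro populations are heavily skewed, correlating error against the logarithm of population may be a more informative variant; the sketch keeps the raw values for simplicity.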
Note: macOS is the preferred development environment.
- Node.js (install with: brew install node)
- npm: included with the Node.js installation
- Python 3 (PDF visualizations only)
- A web browser (in-browser visualizations only)
Note: Run all commands from the project root.
Install dependencies:
npm install
Render in-browser visualizations:
npm run viz
Render PDF visualizations:
python3 viz/data.py && python3 viz/correlation.py
To re-run evaluations, configure the .env file as follows:
CACHE="FALSE"
OPENAI_API_KEY="<YOUR_KEY_HERE>"
REPLICATE_API_KEY="<YOUR_KEY_HERE>"
You can then run:
npm run eval:viz
Note: You must provide your own salary data in data/metro/us_metro_salary.csv, because the salary data used in this work is not publicly released.
Note: A complete evaluation fires thousands of completion requests to each model. Use at your own (financial) risk!
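The .env settings above can be illustrated with a small dependency-free sketch. This is not how the project necessarily loads its configuration (it may use a package such as dotenv); the parsing rules, the interpretation that any CACHE value other than "FALSE" means "reuse cached outputs", and the helper names parseDotEnv, shouldUseCache, missingKeys, and cacheFileFor are all hypothetical. The cache file name mirrors the evaluation-gpt-3.5-turbo.json example in /cache.

```typescript
// Hypothetical minimal .env parser: KEY="value" or KEY=value per line,
// blank lines and '#' comments ignored.
function parseDotEnv(content: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const q = value[0];
    if (value.length >= 2 && (q === '"' || q === "'") && value[value.length - 1] === q) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

// Assumption: only CACHE="FALSE" forces fresh API calls.
function shouldUseCache(env: Record<string, string>): boolean {
  return (env["CACHE"] ?? "TRUE").toUpperCase() !== "FALSE";
}

// Keys that are absent or still hold the README placeholder.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "REPLICATE_API_KEY"];
function missingKeys(env: Record<string, string>): string[] {
  return REQUIRED_KEYS.filter((k) => !env[k] || env[k] === "<YOUR_KEY_HERE>");
}

// Assumed per-model cache file path, following the /cache naming example.
function cacheFileFor(model: string): string {
  return `cache/evaluation-${model}.json`;
}
```

Checking for leftover placeholder keys before starting a run is a cheap guard, given that a complete evaluation issues thousands of paid requests.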
- /cache => contains cached evaluation outputs, with one file per model (e.g. evaluation-gpt-3.5-turbo.json)
- /data/metro => contains CSV data sources for evaluations (e.g. us_metro_commute.csv)
- /src/index.ts => root evaluation file; everything starts here
- /src/Analysis.ts => statistical analysis of the evaluation output. Writes to viz/data.json for visualization.
- /viz => contains browser and PDF visualization files. All charts render data from viz/data.json.
TODO: Describe visualizations
- data/metro/us_metro_commute.csv: Commute origin, destination, and duration data obtained via the Google Maps Platform. See util/createCommuteDataset.ts to learn more.
- data/metro/us_metro_employers.csv: People Data Labs, "Top Employers by US Metro"
- data/metro/us_metro_population.csv: U.S. Census Bureau, "Annual Resident Population Estimates for Metropolitan and Micropolitan Statistical Areas and Their Geographic Components for the United States: April 1, 2020 to July 1, 2022 (CBSA-EST2022)"
- data/metro/us_metro_salary.csv: Proprietary and confidential to Indeed.com. Data is redacted from public release.
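The commute dataset is described as holding origin, destination, and duration columns. A minimal reader for a file of that shape is sketched below. It assumes a header row containing those three column names, a numeric duration in unspecified units (util/createCommuteDataset.ts is the authority), and double-quoted fields where values contain commas, as metro names such as "Austin, TX" do. The names CommuteRow, splitCsvLine, and parseCommuteCsv are hypothetical, and the splitter does not handle line breaks inside quoted fields.

```typescript
interface CommuteRow {
  origin: string;
  destination: string;
  duration: number; // units assumed; see util/createCommuteDataset.ts
}

// Split one CSV line, honoring double-quoted fields and "" escapes.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Column positions are looked up from the header rather than assumed.
function parseCommuteCsv(text: string): CommuteRow[] {
  const lines = text.split(/\r?\n/).filter((l) => l.trim() !== "");
  if (lines.length === 0) return [];
  const header = splitCsvLine(lines[0]).map((h) => h.trim());
  const col = (name: string): number => {
    const i = header.indexOf(name);
    if (i === -1) throw new Error(`missing column: ${name}`);
    return i;
  };
  const [o, d, t] = [col("origin"), col("destination"), col("duration")];
  return lines.slice(1).map((line) => {
    const f = splitCsvLine(line);
    return { origin: f[o], destination: f[d], duration: Number(f[t]) };
  });
}
```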