Occupation coding datasets

  1. GenEasy: 500 synthetic job listings generated with GPT-4, each linked to one of a selected set of ESCO occupation codes.
  2. GenHard: Same as GenEasy, but with job titles that diverge from the textual descriptors of their respective ESCO codes.
  3. Real_indeed: 100 real job listings sourced from Indeed and annotated manually.

Each dataset has columns for ID, job title, description, and label, and may include additional supplementary columns.
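
As a rough illustration of the column layout described above, the following sketch reads one dataset and checks that the expected columns are present. It assumes the data is stored as CSV and that the headers are named `id`, `title`, `description`, and `label`; the README does not state the file format or the exact header names, so adjust both to match the actual files. The helper `load_listings` and the sample row are hypothetical and not part of this repository.

```python
import csv
import io

# Assumed header names; the README lists the fields but not their exact spelling.
REQUIRED_COLUMNS = ("id", "title", "description", "label")


def load_listings(handle, required=REQUIRED_COLUMNS):
    """Read job listings from an open CSV handle and verify the expected columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Supplementary columns are kept as-is in each row dictionary.
    return list(reader)


# Made-up placeholder row for illustration only (not taken from the datasets).
sample = io.StringIO(
    "id,title,description,label,source\n"
    "1,Example title,Example description,EXAMPLE_CODE,demo\n"
)
rows = load_listings(sample)
print(len(rows), rows[0]["label"])
```

To read a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` points to whichever dataset file you downloaded.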