A Python script that splits a combined PDF, such as an Oxford HR application pack, into one PDF per applicant.
Requirements:
- Python 3.7+
- PyPDF2
- click
Clone this repository:
git clone https://github.com/synthetic-society/corehr-pdf-split.git
cd corehr-pdf-split
Install dependencies with pixi:
pixi install
Alternatively, install the two dependencies with pip or your preferred Python package manager:
pip install PyPDF2 click
If you install this way, drop the "pixi run" prefix from the commands below.
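If you install without pixi, a small stdlib-only check like the one below can confirm that the environment meets the requirements listed above (Python 3.7+, PyPDF2, click). This is a hypothetical helper, not part of the repository; the name missing_requirements is illustrative.

```python
import importlib.util
import sys

# Requirements from this README: Python 3.7+ plus the PyPDF2 and click packages.
REQUIRED_MODULES = ("PyPDF2", "click")


def missing_requirements(modules=REQUIRED_MODULES):
    """Return a list of problems; an empty list means everything was found."""
    problems = []
    if sys.version_info < (3, 7):
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"Python 3.7+ required, found {found}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    print("\n".join(issues) if issues else "All requirements found.")
```

Save it as a file and run it with the same interpreter you will use for main.py; find_spec only looks at that interpreter's installed packages.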
Run the script from the command line:
pixi run python main.py --input-pdf <path_to_input_pdf> --output-dir <path_to_output_directory>
For example:
pixi run python main.py --input-pdf applicationspack.pdf --output-dir output
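If you have several application packs, the documented command can be run once per file. The sketch below assumes it is executed from the repository root (so main.py resolves) and that pixi is on your PATH; build_split_command, split_all and the packs folder are hypothetical names, not part of the project.

```python
import subprocess
from pathlib import Path


def build_split_command(input_pdf, output_dir):
    """Build the documented command line for a single application pack."""
    return [
        "pixi", "run", "python", "main.py",
        "--input-pdf", str(input_pdf),
        "--output-dir", str(output_dir),
    ]


def split_all(pack_dir, output_root):
    """Run the splitter once per *.pdf in pack_dir, one output folder per pack."""
    for pdf in sorted(Path(pack_dir).glob("*.pdf")):
        # e.g. packs/applicationspack.pdf -> output/applicationspack/
        cmd = build_split_command(pdf, Path(output_root) / pdf.stem)
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(cmd, check=True)
```

For example, split_all("packs", "output") would write the applicants from each pack into its own subfolder of output, keeping results from different packs apart.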
This processes applicationspack.pdf and saves each individual application in the output directory, which is created if it does not already exist. Each applicant's PDF is named using the format:
LastName,FirstName [ApplicantID].pdf
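To make the naming scheme and the create-if-missing behaviour concrete, here is a minimal sketch. It is not the code in main.py: applicant_filename and applicant_pdf_path are hypothetical helpers, and replacing path separators in names is an added assumption so that a name cannot create subfolders.

```python
from pathlib import Path


def applicant_filename(last_name, first_name, applicant_id):
    """Format 'LastName,FirstName [ApplicantID].pdf' (hypothetical helper)."""
    # Assumption: replace path separators so a name cannot create subfolders.
    last, first, app_id = (
        str(part).replace("/", "_").replace("\\", "_")
        for part in (last_name, first_name, applicant_id)
    )
    return f"{last},{first} [{app_id}].pdf"


def applicant_pdf_path(output_dir, last_name, first_name, applicant_id):
    """Return the full output path, creating output_dir if it is missing."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out / applicant_filename(last_name, first_name, applicant_id)
```

With illustrative values, applicant_filename("Smith", "Jane", "12345") gives "Smith,Jane [12345].pdf". Because the applicant ID is part of the name, two applicants who share a name still get distinct files.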
This project is available under the MIT License.
Contributions, issues, and feature requests are welcome.