This project aims to detect identifying entities in documents from AP-HP's Clinical Data Warehouse:
| Label          | Description                                                    |
|----------------|----------------------------------------------------------------|
| ADRESSE        | Street address, e.g. 33 boulevard de Picpus                    |
| DATE           | Any absolute date other than a birthdate                       |
| DATE_NAISSANCE | Birthdate                                                      |
| HOPITAL        | Hospital name, e.g. Hôpital Rothschild                         |
| IPP            | Internal AP-HP identifier for patients, displayed as a number  |
| MAIL           | Email address                                                  |
| NDA            | Internal AP-HP identifier for visits, displayed as a number    |
| NOM            | Any last name (patients, doctors, third parties)               |
| PRENOM         | Any first name (patients, doctors, etc.)                       |
| SECU           | Social security number                                         |
| TEL            | Any phone number                                               |
| VILLE          | Any city                                                       |
| ZIP            | Any zip code                                                   |
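To make the label set concrete, here is a minimal sketch of how detected entities could be masked in a document. It assumes entities are represented as `(start, end, label)` character spans and replaced by `<LABEL>` placeholders; this representation, the functions `find_emails` and `mask_entities`, and the email regex are hypothetical illustrations, not EDS-Pseudo's API or detection method.

```python
import re
from typing import List, Tuple

# Label set taken from the table of identifying entities.
LABELS = {
    "ADRESSE", "DATE", "DATE_NAISSANCE", "HOPITAL", "IPP", "MAIL",
    "NDA", "NOM", "PRENOM", "SECU", "TEL", "VILLE", "ZIP",
}

# Assumed representation: an entity is a (start, end, label) character span.
Span = Tuple[int, int, str]

# Toy email pattern for illustration only; not the project's detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_emails(text: str) -> List[Span]:
    """Return MAIL spans found by a simple regular expression (hypothetical helper)."""
    return [(m.start(), m.end(), "MAIL") for m in EMAIL_RE.finditer(text)]


def mask_entities(text: str, spans: List[Span]) -> str:
    """Replace each span with a <LABEL> placeholder (hypothetical helper)."""
    for _, _, label in spans:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label}")
    pieces = []
    cursor = 0
    # Walk spans left to right, copying the text between them.
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans")
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


text = "Patient Jean Dupont, contact: jean.dupont@example.com"
spans = [(8, 12, "PRENOM"), (13, 19, "NOM")] + find_emails(text)
print(mask_entities(text, spans))  # Patient <PRENOM> <NOM>, contact: <MAIL>
```

Masking with label placeholders is only one option; a pseudonymization step could instead substitute plausible surrogate values for each label.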
Please find our arXiv preprint at the following link: https://arxiv.org/pdf/2303.13451.pdf.
If you use EDS-Pseudo, please cite us as below:

```bibtex
@article{tannier2023development,
  title={Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse},
  author={Tannier, Xavier and Wajsb{\"u}rt, Perceval and Calliger, Alice and Dura, Basile and Mouchet, Alexandre and Hilka, Martin and Bey, Romain},
  journal={arXiv preprint arXiv:2303.13451},
  year={2023}
}
```
Visit the documentation for more information!
We would like to thank Assistance Publique – Hôpitaux de Paris and the AP-HP Foundation for funding this project.