Clean personally identifiable information from dirty dirty text using spaCy.

LeapBeyond/scrubadub_spacy

scrubadub_spacy

scrubadub removes personally identifiable information from text. scrubadub_spacy is an extension that uses spaCy NLP models to remove personal information from text.

This package contains two extra detectors:

  • scrubadub_spacy.detectors.SpacyEntityDetector - A detector that uses the spaCy NER model to find locations, names, dates and other entities.
  • scrubadub_spacy.detectors.SpacyNameDetector - A detector that uses the spaCy NER model and context words to find names in text.
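
As a rough illustration of how one of these detectors might be used, the sketch below adds the entity detector to a scrubadub Scrubber and cleans a string. It assumes that both packages and a spaCy model are installed, and that the class is exported as `SpacyEntityDetector`. The helper name `scrub_text` is hypothetical and not part of either package. See the documentation for the authoritative usage.

```python
def scrub_text(text):
    """Hypothetical helper: clean `text` using the spaCy entity detector."""
    # Imported inside the function so this sketch can be loaded even when
    # the packages are not installed; calling it still requires them.
    import scrubadub
    import scrubadub_spacy

    scrubber = scrubadub.Scrubber()
    # Assumption: add_detector accepts the detector class itself.
    scrubber.add_detector(scrubadub_spacy.detectors.SpacyEntityDetector)
    return scrubber.clean(text)
```

Calling `scrub_text("My name is Alex and I work in London.")` should return the text with detected entities replaced by placeholders. The exact output depends on the spaCy model and the detector configuration.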

For more information on how to use this package see the scrubadub spacy documentation and the scrubadub repository.

New maintainers

LeapBeyond are excited to be supporting scrubadub with ongoing maintenance and development. Thanks to all of the contributors who made this package a success, but especially @deanmalmgren, IDEO and Datascope.