Develop a PII Removal Module for a Mental Health Care WhatsApp Chatbot #5
-
Hey Anjineyulu, thanks for starting this discussion. Students are expected to try different methods and come up with the best method they can. Feel free to ask any questions about this in the discussion.
You can start from this dataset: https://drive.google.com/file/d/11wZFqqdgM3r3idPWuXPUdeqMrSHtVs-m/view?usp=sharing
It contains 4434 generated texts with their corresponding annotated labels in this format:
Description: document (str): ID of the essay
Note: (source: https://www.kaggle.com/datasets/alejopaullier/pii-external-dataset)
-
@anjineyulutv Can we explore differential privacy and homomorphic encryption?
-
Hey everyone, I did a bit of research on this and, as mentioned above, found that Presidio has a lot of functionality that we can use and adapt for this. Presidio ships with a set of built-in entities, and we can add our own entities, rules, and models. The code below is a brief example of how we can do this.
Points to note:
Code: please find below the sample code. The Jupyter notebook for the same is https://colab.research.google.com/drive/18qzbCMq8S55g4QMGLf1qMgksXyVkTflu?usp=sharing
The final result will be:
Do let me know your thoughts on this.
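For anyone who cannot open the notebook, here is a minimal sketch of the idea described above: Presidio's built-in recognizers plus one custom rule. This is not the notebook's code. The `CASE_ID` entity, its `CASE-123456` format, and the function name `redact_with_presidio` are hypothetical, chosen only for illustration. It assumes `presidio-analyzer`, `presidio-anonymizer`, and a spaCy English model are installed.

```python
import re

# Hypothetical custom entity: an internal case reference such as "CASE-123456".
# Compiling it here fails fast if the pattern is malformed.
CASE_ID_PATTERN = re.compile(r"\bCASE-\d{6}\b")


def redact_with_presidio(text: str) -> str:
    """Detect PII with Presidio's built-in recognizers plus one custom rule."""
    # Imported lazily because Presidio and its spaCy model are heavy,
    # optional dependencies (assumed to be installed).
    from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer
    from presidio_anonymizer import AnonymizerEngine

    # A rule-based recognizer for the hypothetical CASE_ID entity.
    case_recognizer = PatternRecognizer(
        supported_entity="CASE_ID",
        patterns=[
            Pattern(name="case_id", regex=CASE_ID_PATTERN.pattern, score=0.8)
        ],
    )

    # Register the custom recognizer alongside the built-in ones.
    analyzer = AnalyzerEngine()
    analyzer.registry.add_recognizer(case_recognizer)

    # Find PII spans, then replace them using the anonymizer's defaults.
    results = analyzer.analyze(text=text, language="en")
    return AnonymizerEngine().anonymize(text=text, analyzer_results=results).text
```

Calling `redact_with_presidio("I'm Priya, my reference is CASE-123456")` should return the message with detected spans replaced by entity placeholders. Which spans get detected depends on the installed model, so it is worth checking the output on the dataset shared above.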
-
Here's a task for students to create a mental health care WhatsApp chatbot that removes personally identifiable information (PII):
Task: Develop a PII Removal Module for a Mental Health Care WhatsApp Chatbot
Objective:
Create a module for a WhatsApp chatbot that can identify and remove personally identifiable information (PII) from user messages before processing them. This will help protect user privacy and ensure compliance with data protection regulations.
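As a starting point, the objective can be sketched with the standard library alone. The patterns below are illustrative assumptions, not a complete or locale-aware rule set, and the names `PII_PATTERNS` and `redact_pii` are hypothetical. Regex on its own will not catch names or addresses, which is where NER models or a PII library come in.

```python
import re

# Illustrative patterns only (assumptions); real deployments need
# locale-specific rules and a model for names, addresses, and so on.
PII_PATTERNS = {
    "URL": re.compile(r"https?://\S+"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)"),
}


def redact_pii(message: str) -> str:
    """Replace each matched PII span with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        message = pattern.sub(f"<{label}>", message)
    return message


if __name__ == "__main__":
    print(redact_pii("Reach me at jane.doe@example.com or +91 98765 43210."))
```

Running the redaction before the message reaches any downstream processing or logging means the rest of the chatbot only ever sees the placeholder text.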
Requirements:
Suggested libraries and techniques:
Natural Language Processing (NLP) libraries:
Regular expressions (regex):
Machine learning techniques:
PII-specific libraries:
Text preprocessing techniques:
Data structures:
Testing frameworks:
Here's a basic outline to get students started:
By working on this task, students will gain experience in NLP, regex, data privacy, and building practical applications for mental health care. They'll also learn about the importance of protecting user data in sensitive applications.