From 22dd9c5b2e75cc6bd29dddf9861012de6c0f7bab Mon Sep 17 00:00:00 2001
From: Sid Mohan
Date: Tue, 14 May 2024 09:36:24 -0700
Subject: [PATCH] updated README

---
 README.md                                 | 94 ++++++++++++++++-------
 examples/b43b72693226422bac5f083c941ecfdb |  1 -
 examples/getting_started.ipynb            | 49 +-----------
 3 files changed, 68 insertions(+), 76 deletions(-)
 delete mode 160000 examples/b43b72693226422bac5f083c941ecfdb

diff --git a/README.md b/README.md
index a3758d9f..e0cedf85 100644
--- a/README.md
+++ b/README.md
@@ -54,7 +54,7 @@ pip install datafog
 
 ### Usage
 
-The [Getting Started notebook](/datafog-python/examples/getting_started.ipynb) features a standalone Colab notebook that lets you get up and running in no time.
+The [Getting Started notebook](/datafog-python/examples/getting_started.ipynb) features a standalone Colab notebook.
 
 #### Text PII Annotation
 
@@ -63,54 +63,90 @@ To annotate PII in a given text, lets start with a set of clinical notes:
 
 ```
 !git clone https://gist.github.com/b43b72693226422bac5f083c941ecfdb.git
-```
+# Define the directory path
+folder_path = 'clinical_notes/'
+
+# List all files in the directory
+file_list = os.listdir(folder_path)
+text_files = sorted([file for file in file_list if file.endswith('.txt')])
 
-```python
-from datafog import TextPIIAnnotator
+with open(os.path.join(folder_path, text_files[0]), 'r') as file:
+    clinical_note = file.read()
 
-text = "John Doe lives at 1234 Elm St, Springfield."
-text_annotator = TextPIIAnnotator()
-annotated_text = text_annotator.run(text)
-print(annotated_text)
+display(Markdown(clinical_note))
+```
+which looks like this:
 ```
 
-This will output the annotated text with PII labeled, such as `{"LOC": ["Springfield"]}`.
+**Date:** April 10, 2024
 
-#### Image Text Extraction and Annotation
+**Patient:** Emily Johnson, 35 years old
 
-To extract text from an image and perform PII annotation, you can use the `DataFog` class:
+**MRN:** 00987654
 
-```python
-from datafog import DataFog
+**Chief Complaint:** "I've been experiencing severe back pain and numbness in my legs."
 
-image_url = "https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg"
-datafog = DataFog()
-annotated_text = await datafog.run_ocr_pipeline([image_url])
-print(annotated_text)
-```
+**History of Present Illness:** The patient is a 35-year-old who presents with a 2-month history of worsening back pain, numbness in both legs, and occasional tingling sensations. The patient reports working as a freelance writer and has been experiencing increased stress due to tight deadlines and financial struggles.
+
+**Past Medical History:** Hypothyroidism
 
-This will download the image, extract the text using OCR, and annotate any PII found in the extracted text.
+**Social History:**
+The patient shares a small apartment with two roommates and relies on public transportation. They mention feeling overwhelmed with work and personal responsibilities, often sacrificing sleep to meet deadlines. The patient expresses concern over the high cost of healthcare and the need for affordable medication options.
 
-#### Text Processing
+**Review of Systems:** Denies fever, chest pain, or shortness of breath. Reports occasional headaches.
 
-To process and annotate text using the DataFog pipeline, you can use the `DataFog` class:
+**Physical Examination:**
+- General: Appears tired but is alert and oriented.
+- Vitals: BP 128/80, HR 72, Temp 98.6°F, Resp 14/min
 
-```python
-from datafog import DataFog
+**Assessment/Plan:**
+- Continue to monitor blood pressure and thyroid function.
+- Discuss affordable medication options with a pharmacist.
+- Refer to a social worker to address housing concerns and access to healthcare services.
+- Encourage the patient to engage with community support groups for social support.
+- Schedule a follow-up appointment in 4 weeks or sooner if symptoms worsen.
+
+**Comments:** The patient's health concerns are compounded by socioeconomic factors, including employment status, housing stability, and access to healthcare. Addressing these social determinants of health is crucial for improving the patient's overall well-being.
 
-text = ["Tokyo is the capital of Japan"]
-datafog = DataFog()
-annotated_text = await datafog.run_text_pipeline(text)
-print(annotated_text)
 ```
 
-This will process the given text and annotate entities such as person names and locations.
+we can then set up our pipeline to accept these files
+
+```
+async def run_text_pipeline_demo():
+    results = await datafog.run_text_pipeline(texts)
+    print("Text Pipeline Results:", results)
+    return results
+
+
+texts = [clinical_note]
+loop = asyncio.get_event_loop()
+results = loop.run_until_complete(run_text_pipeline_demo())
+```
 
-For more detailed usage and examples, please refer to the API documentation. Note: The DataFog library uses asynchronous programming, so make sure to use the `async`/`await` syntax when calling the appropriate methods.
+#### OCR PII Annotation
+
+Let's use a image (which could easily be a converted or scanned PDF)
+
+![Executive Email](https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg)
+
+```
+datafog = DataFog(operations='extract_text')
+url_list = ['https://pbs.twimg.com/media/GM3-wpeWkAAP-cX.jpg']
+
+async def run_ocr_pipeline_demo():
+    results = await datafog.run_ocr_pipeline(url_list)
+    print("OCR Pipeline Results:", results)
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(run_ocr_pipeline_demo())
+
+```
+You'll notice that we use async functions liberally throughout the SDK - given the nature of the functions we're providing and the extension of DataFog into API/other formats, this allows the functions to be more easily adapted for those uses.
 
 ## Contributing
 
diff --git a/examples/b43b72693226422bac5f083c941ecfdb b/examples/b43b72693226422bac5f083c941ecfdb
deleted file mode 160000
index b819affb..00000000
--- a/examples/b43b72693226422bac5f083c941ecfdb
+++ /dev/null
@@ -1 +0,0 @@
-Subproject commit b819affb51796204d59987893f89dee18428ed5d
diff --git a/examples/getting_started.ipynb b/examples/getting_started.ipynb
index 9749f341..478e7e96 100644
--- a/examples/getting_started.ipynb
+++ b/examples/getting_started.ipynb
@@ -220,55 +220,12 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 13,
+   "execution_count": null,
    "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/markdown": [
-       "\n",
-       "**Date:** April 10, 2024\n",
-       "\n",
-       "**Patient:** Emily Johnson, 35 years old\n",
-       "\n",
-       "**MRN:** 00987654\n",
-       "\n",
-       "**Chief Complaint:** \"I've been experiencing severe back pain and numbness in my legs.\"\n",
-       "\n",
-       "**History of Present Illness:** The patient is a 35-year-old who presents with a 2-month history of worsening back pain, numbness in both legs, and occasional tingling sensations. The patient reports working as a freelance writer and has been experiencing increased stress due to tight deadlines and financial struggles.\n",
-       "\n",
-       "**Past Medical History:** Hypothyroidism\n",
-       "\n",
-       "**Social History:**\n",
-       "The patient shares a small apartment with two roommates and relies on public transportation. They mention feeling overwhelmed with work and personal responsibilities, often sacrificing sleep to meet deadlines. The patient expresses concern over the high cost of healthcare and the need for affordable medication options.\n",
-       "\n",
-       "**Review of Systems:** Denies fever, chest pain, or shortness of breath. Reports occasional headaches.\n",
-       "\n",
-       "**Physical Examination:**\n",
-       "- General: Appears tired but is alert and oriented.\n",
-       "- Vitals: BP 128/80, HR 72, Temp 98.6°F, Resp 14/min\n",
-       "\n",
-       "**Assessment/Plan:**\n",
-       "- Continue to monitor blood pressure and thyroid function.\n",
-       "- Discuss affordable medication options with a pharmacist.\n",
-       "- Refer to a social worker to address housing concerns and access to healthcare services.\n",
-       "- Encourage the patient to engage with community support groups for social support.\n",
-       "- Schedule a follow-up appointment in 4 weeks or sooner if symptoms worsen.\n",
-       "\n",
-       "**Comments:** The patient's health concerns are compounded by socioeconomic factors, including employment status, housing stability, and access to healthcare. Addressing these social determinants of health is crucial for improving the patient's overall well-being.\n",
-       "\n"
-      ],
-      "text/plain": [
-       ""
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
+   "outputs": [],
    "source": [
     "# Define the directory path\n",
-    "folder_path = 'b43b72693226422bac5f083c941ecfdb/'\n",
+    "folder_path = 'clinical_notes/'\n",
     "\n",
     "# List all files in the directory\n",
     "file_list = os.listdir(folder_path)\n",