asyml · Leolty · Jul 8, 2022 · Jul 8, 2022 · Jul 8, 2022 · Jul 8, 2022
diff --git a/examples/clinical_pipeline/README.md b/examples/clinical_pipeline/README.md
@@ -0,0 +1,144 @@
+## A Clinical Information Processing Example
+
+This example shows how we can construct a project to make ForteHealth and Stave work side by side.
+
+## Install extra dependencies
+
+To install the latest code directly from source,
+
+```bash
+pip install git+https://git@github.com/asyml/forte-wrappers#egg=forte.elastic\&subdirectory=src/elastic
+pip install git+https://git@github.com/asyml/forte-wrappers#egg=forte.spacy\&subdirectory=src/spacy
+pip install git+https://git@github.com/asyml/forte-wrappers#egg=forte.spacy\&subdirectory=src/nltk
+pip install git+https://github.com/asyml/ForteHealth.git
+pip install git+https://github.com/astml/stave.git
+```
+
+To install from PyPI,
+```bash
+pip install forte.elastic
+pip install forte.spacy
+pip install forte.nltk
+pip install forte.health
+pip install stave
+```
+
+## Downloading the models
+
+This example includes the following six functions:
+1. Sentence Segementation
+2. Tokenization
+3. Pos Tag
+4. Bio Named Entity Recognition
+5. Nagation Context Analysis
+6. ICD Coding Detection
+
+Before running the pipeline, we need to download the some of the models
+
+```bash
+python ./download_models.py 
+```
+
+**Note**: The above script will save the model in `resources/NCBI-disease`. Use `--path` option to save the model into a different directory.
+
+## Set up the configuration
+Before run Elastic Searcher and Stave, we need to ensure that the current configuration is compatible with the environment of our computer.
+
+Please check and change the following configurations in `clinical_config.yml`:
+
+1. Ensure `LastUtteranceSearcher.stave_db_path`(line 16) is the correct path -> `$Home/.stave`, e.g.,  `"/home/name/.stave"`
+2. Ensure `Stave.username`(line 26) and `Stave.pw`(line 27) is `"admin"` and `"admin"`.
+
+
+
+
+## Prepare elastic searcher
+Download corresponding elasticsearch archive from https://www.elastic.co/downloads/past-releases/elasticsearch-7-17-2, unzip it and run `elasticsearch-7-17-2/bin/elasticsearch` to start the service. 
+
+Run the following to check if elasticsearch is running properly:
+```bash
+curl -XGET localhost:9200/_cluster/health?pretty
+```
+
+Make sure you create index 'elastic_indexer' in the cluster before working with this example, you can run the following command:
+```bash
+curl -X PUT localhost:9200/elastic_indexer
+```
+
+You can also follow the online blog for more information:
+
+https://www.elastic.co/guide/en/elasticsearch/reference/current/starting-elasticsearch.html
+
+## Generate ontologies and config
+We also need to generate the Stave configuration file.
+
+See the three json files in the directory:
+1. `clinical_pipeline_ontology.json`: this file defines the ontology we define in clinical domain.
+2. `stave_chat_config.json`: this file defines the configuration of the chatbox in stave.
+3. `stave_chat_config.json`: this file defines the configuration of the defualt nlp interface in stave.
+
+Now run
+```bash
+python generate_stave_config.py
+```
+
+This command can merge clinical ontologies and base ontologies, and generate the configuration of the stave project. If it executes successfully, you can see two new json files in current directory: `defualt_onto_project.json` and `chat_project.json`.
+
+## Run indexer and Stave
+First, you should start an Elastic Indexer backend.
+
+Then, to start the Stave server that our pipeline will connect to for visualization purposes, run
+```bash
+stave -s start -o -l -n 8899
+```
+
+Here, you need to make sure `LastUtteranceSearcher.url` and `Stave.url` in `clinical_config.yml` are both `"http://localhost:8899"`. Or you can change the port 8899 to any port you like.
+
+
+
+## Run demo pipeline
+
+Now, open a new terminal, other than the one running stave server. You can run the following command to parse some files and index them.
+```bash
+python clinical_processing_pipeline.py path_to_mimiciii/1.4/NOTEEVENTS.csv.gz path_to_mimiciii_output 100 1
+```
+
+The last argument, `use_mimiciii_reader` is whether to use the `Mimic3DischargeNoteReader()`. If you set the argument to `1`, you will need to make sure the input data is mimic iii dataset, else `0` for `PlainTextReader()`.
+
+If you do not have the mimic iii datasets and just want to test the function, you can run the following command to test the function with the given sample data:
+
+```bash
+python clinical_processing_pipeline.py sample_data/ path_to_sample_output/ -1 0
+```
+
+If we just need to check the remote pipeline connection to Stave. 
+
+You can mask out Line 76 to Line 118 in `clinical_processing_pipeline.py`.
+
+Hence, if you just wish to run the demo pipeline with existing database entries, and wish to just connect with Stave for visualization, You can mask out Line 74 to Line 118 in `clinical_processing_pipeline.py` and run this command:
+
+```bash
+python clinical_processing_pipeline.py ./ ./ 100 0
+```
+
+Here, we also write out the raw data pack to `/path_to_sample_output`, and only
+index the first 100 notes. Remove the `100` parameter to index all documents.
+
+## Visualization
+
+You can go ahead and open `http://localhost:8899` on your browser to access Stave UI.
+Next, you will see 2 projects, named as `clinical_pipeline_base` and `clinical_pipeline_chat` by default.
+
+<img width="1437" alt="image" src="https://user-images.githubusercontent.com/14886942/174163073-d9c86f57-76c9-46f4-ade0-c0a81d7d71a6.png">
+
+Click on `clinical_pipeline_chat` and then the document that resides within to go to the chatbot/search UI. Enter the keywords you want to search for in the elasticsearch indices. The pipeline would then return a bunch of documents that match your keywords. Click on those document links to access the Annotation Viewer UI for those documents.
+
+<img width="1437" alt="image" src="https://user-images.githubusercontent.com/14886942/174163371-f6e56a25-7b51-48d9-969a-f8d7140c8c89.png">
+
+<img width="1431" alt="image" src="https://user-images.githubusercontent.com/14886942/174163438-eef1c877-38f5-43e8-b792-9de33a467b33.png">
+
+
+## Add the output data 
+We write out the raw data pack to  `/path_to_sample_output`, so you can see many json files in the directory.
+
+Click on `clinical_pipeline_base` and add the json file to the documents. Click on those document links to access the Annotation Viewer UI for those documents.
diff --git a/examples/clinical_pipeline/chat_query_document.json b/examples/clinical_pipeline/chat_query_document.json
@@ -0,0 +1,44 @@
+{
+  "name": "query_chatbot.json",
+  "project_id": 99,
+  "textPack": {
+    "py/object": "forte.data.data_pack.DataPack",
+    "py/state": {
+      "creation_records": {},
+      "field_records": {},
+      "links": [],
+      "groups": [],
+      "meta": {
+        "py/object": "forte.data.data_pack.Meta",
+        "py/state": {
+          "pack_name": "query_chatbot",
+          "_pack_id": 3,
+          "language": "eng",
+          "span_unit": "character"
+        }
+      },
+      "_text": "Welcome! Please type in a query to retrieve relevant clinical reports.",
+      "annotations": [
+        {
+          "py/object": "ft.onto.base_ontology.Utterance",
+          "py/state": {
+            "_span": {
+              "py/object": "forte.data.span.Span",
+              "begin": 0,
+              "end": 70
+            },
+            "_tid": 0,
+            "speaker": "ai"
+          }
+        }
+      ],
+      "generics": [],
+      "replace_back_operations": [],
+      "processed_original_spans": [],
+      "orig_text_len": 70,
+      "serialization": {
+        "next_id": 1
+      }
+    }
+  }
+}
diff --git a/examples/clinical_pipeline/clinical_config.yml b/examples/clinical_pipeline/clinical_config.yml
@@ -0,0 +1,31 @@
+BERTTokenizer:
+  model_path: "resources/NCBI-disease"
+
+Spacy:
+  processors: ["sentence", "tokenize", "pos", "dep", "ner", "umls_link"]
+  medical_onto_type: "ftx.medical.clinical_ontology.MedicalEntityMention"
+  umls_onto_type: "ftx.medical.clinical_ontology.UMLSConceptLink"
+  lang: "en_ner_bc5cdr_md"
+
+BioBERTNERPredictor:
+  model_path: "resources/NCBI-disease"
+  ner_type: "DISEASE"
+  ignore_labels: ["O"]
+
+LastUtteranceSearcher:
+  stave_db_path: "//home//name//.stave//db.sqlite3"
+  url: "http://localhost:8899"
+
+Remote:
+  port: 8008
+  input_format: "DataPack"
+  service_name: "Medical_Chatbot"
+
+Stave:
+  url: "http://localhost:8899"
+  username: admin
+  pw: admin
+
+viewer_project_json: "default_onto_project.json"
+chat_project_json: "chat_project.json"
+chat_document_json: "chat_query_document.json"
diff --git a/examples/clinical_pipeline/clinical_pipeline_ontology.json b/examples/clinical_pipeline/clinical_pipeline_ontology.json
@@ -0,0 +1,49 @@
+{
+    "name": "clinical_pipeline_ontology",
+    "definitions": [
+        {
+            "entry_name": "ftx.medical.clinical_ontology.NegationContext",
+            "parent_entry": "forte.data.ontology.top.Annotation",
+            "description": "A span based annotation `NegationContext`, used to represent the negation context of a named entity.",
+            "attributes": [
+                {
+                    "name": "polarity",
+                    "type": "bool"
+                }
+            ]
+        },
+        {
+            "entry_name": "ftx.medical.clinical_ontology.MedicalEntityMention",
+            "parent_entry": "ft.onto.base_ontology.EntityMention",
+            "description": "A span based annotation class MedicalEntityMention, used to represent an Entity Mention in medical domain",
+            "attributes": [
+                {
+                    "name": "umls_link",
+                    "type": "str"
+                },
+                {
+                    "name": "umls_entities",
+                    "type": "List",
+                    "item_type": "ftx.medical.clinical_ontology.UMLSConceptLink"
+                }
+            ]
+        },
+        {
+            "entry_name": "ftx.medical.clinical_ontology.MedicalArticle",
+            "parent_entry": "forte.data.ontology.top.Annotation",
+            "description": "An annotation based representation for the whole medical text chunk/document",
+            "attributes": [
+                {
+                    "name": "icd_version",
+                    "type": "int",
+                    "description": "The version of ICD-Coding being used."
+                },
+                {
+                    "name": "icd_code",
+                    "type": "str",
+                    "description": "The ICD code assigned to current medical article."
+                }
+            ]
+        }
+    ]
+}