Cardiological Examinations Graph

This repository contains the final project of the Big Data Analytics and Business Intelligence course (AY 20/21) at the University of Naples Federico II.

Assignment

A dataset containing medical information about different patients is provided. Each patient's record includes their examinations, with the related anamnesis and diagnosis, written in Italian. The aim of the project is to build a Named Entity Recognition (NER) system capable of extracting diseases and symptoms from textual inputs. Labeled data are provided to train the model. Once the NER system is developed, a graph-based database must be implemented, integrating the patients' information (examinations, diseases, symptoms, etc.).
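To make the extraction step concrete, here is a minimal sketch of how token-level NER output could be grouped into disease and symptom mentions. It assumes the model emits IOB-style tags; the tag names (`B-DISEASE`, `I-SYMPTOM`, etc.), the function `decode_iob`, and the sample sentence are illustrative assumptions, not taken from the project's code or dataset.

```python
from typing import List, Tuple


def decode_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level IOB tags into (entity_text, entity_type) pairs.

    Tag names are hypothetical; an I- tag that does not continue an open
    entity of the same type is treated like an O tag and dropped.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts: flush the one that was open, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # Outside any entity (or an inconsistent I- tag): close the open one.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Made-up Italian sentence with hand-written tags, for illustration only.
tokens = ["Paziente", "con", "dolore", "toracico", "e", "ipertensione", "arteriosa"]
tags = ["O", "O", "B-SYMPTOM", "I-SYMPTOM", "O", "B-DISEASE", "I-DISEASE"]
entities = decode_iob(tokens, tags)
```

The resulting pairs are the kind of structured output that can later be attached to an examination in the graph database.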

Project

The project, developed by a team of three, is structured as follows:

The preprocessing directory contains the data cleaning and data preparation steps, both performed with PySpark in a distributed environment provided by Databricks.
The ner-system directory contains the training and inference code of the NER system, implemented with Spark NLP by John Snow Labs. The system is based on a version of BERT pre-trained on an Italian dataset and was fine-tuned for 10 epochs on GPUs in Google Colab. However, due to the scarcity of labeled data, the model achieved only limited performance.
The graph-based-database directory contains the code used to populate a graph-based database built with Neo4j. The database stores all the relevant patient information, connecting each patient with their examinations, diseases, symptoms, drugs and doctors.
The data-visualization directory contains:
- Three dashboards developed in Microsoft Power BI, which show, respectively, general information about doctors, the clinical situation of a given patient, and correlations among diseases, symptoms and drugs.
- A perspective implemented in Neo4j Bloom, which lets the user explore the graph without knowing the Cypher query language.
The documentation directory contains exhaustive documentation of the project, written in Italian.
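As an illustration of how extracted entities could be linked to patients and examinations in Neo4j, the sketch below builds parameterized Cypher `MERGE` statements. It is not the repository's actual population code: the node labels (`Patient`, `Examination`, `Disease`, `Symptom`), the relationship types (`UNDERWENT`, `DIAGNOSED_WITH`, `REPORTS`), the property names, and the function `build_examination_queries` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from NER entity type to (node label, relationship type).
# Cypher cannot parameterize labels or relationship types, so they are
# interpolated from this fixed whitelist; values always go through parameters.
ENTITY_SCHEMA: Dict[str, Tuple[str, str]] = {
    "DISEASE": ("Disease", "DIAGNOSED_WITH"),
    "SYMPTOM": ("Symptom", "REPORTS"),
}


def build_examination_queries(
    patient_id: str,
    exam_id: str,
    entities: List[Tuple[str, str]],
) -> List[Tuple[str, Dict[str, str]]]:
    """Return (cypher, parameters) pairs linking a patient, an examination
    and the entities extracted from that examination's text."""
    queries: List[Tuple[str, Dict[str, str]]] = [(
        "MERGE (p:Patient {id: $patient_id}) "
        "MERGE (e:Examination {id: $exam_id}) "
        "MERGE (p)-[:UNDERWENT]->(e)",
        {"patient_id": patient_id, "exam_id": exam_id},
    )]
    for name, entity_type in entities:
        label, relationship = ENTITY_SCHEMA[entity_type]
        queries.append((
            f"MATCH (e:Examination {{id: $exam_id}}) "
            f"MERGE (n:{label} {{name: $name}}) "
            f"MERGE (e)-[:{relationship}]->(n)",
            # Lower-casing is a simple normalization so that repeated
            # mentions of the same entity merge into one node.
            {"exam_id": exam_id, "name": name.lower()},
        ))
    return queries


# Made-up identifiers and entities, for illustration only.
queries = build_examination_queries(
    "P001",
    "E042",
    [("Ipertensione arteriosa", "DISEASE"), ("dolore toracico", "SYMPTOM")],
)
```

Using `MERGE` rather than `CREATE` keeps the load idempotent, so the same disease or symptom mentioned in several examinations maps to a single shared node. With the official Neo4j Python driver, each pair could then be executed inside a session, for example with `session.run(cypher, parameters)`.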

fabiod20/big-data-analytics-and-business-intelligence
