Name	Name	Last commit message	Last commit date
parent directory ..
readme.md	readme.md

purpose
developer machine
install neo4j & mysql
load mysql data and pull into neo4j
pending parts

purpose

this page describes how to setup the local and cloud environments as summarized here

developer machine

pre-requisites - developer machine

local machine
- ensure you have mac/linux/WSL as your local development environment
- you must have admin permissions on your local machine
installed software
- you must have python 3.8 or later and git 2.25 or later installed on your machine
- you must have Docker desktop installed on your machine

python virtual environment

all packages needed for running this code can be deployed inside a virtual environment
these are the steps
- first clone this repo locally
- then open the folder
  - cd KnowledgeGraphAI
then create a Python virtual environment
- python3 -m venv python_venv
then activate the env
- on windows - .\python_venv\Scripts\activate
- on mac - source python_venv/bin/activate
- and now install all dependencies
  - pip install -r requirements.txt

vscode devcontainer

if you use vscode devcontainers like me, then you dont have to do anything noted above in the python virtual env section
i have setup the repo nicely to be launch-ready the moment you download it
launch vscode, and then open the command palette (ctrl+shift+p), and then select "Remote-Containers: Open Folder in Container"
navigate to the cloned repository folder and you are done...all requirements will be automatically installed
you can also make modifications to these 2 files as needed
- devcontainer.json
- postStart.sh

streamlit - developer machine

to visualize the file arrival stats using Streamlit, follow these steps:
- setup your repo by downloading it and installing the required packages as shown in the python virtual environment or the vscode devcontainers sections above
- now open this repository folder inside your terminal and run the Streamlit app:
  - streamlit run welcome.py
- Streamlit will start a local web server, and you will see a URL in the terminal (usually http://localhost:8501)
- open this URL in your web browser to access the interactive charts and visualizations To set up MySQL and Neo4j on your Mac using Docker Compose and enable data extraction from MySQL to Neo4j, follow these detailed steps. We'll also include verification steps to ensure the deployment is progressing as expected.

install neo4j & mysql

setup configurations

create a docker-compose.yml file to define the services for mysql and neo4j
place the file at the root of the repo here

version: "3.8"

services:
  mysql:
    image: mysql:latest
    container_name: mysql-container
    environment:
      MYSQL_ROOT_PASSWORD: rootpassword
      MYSQL_DATABASE: mydatabase
      MYSQL_USER: user
      MYSQL_PASSWORD: password
    ports:
      - "3306:3306"
    volumes:
      - mysql-data:/var/lib/mysql

  neo4j:
    image: neo4j:latest
    container_name: neo4j-container
    environment:
      NEO4J_AUTH: neo4j/password
    ports:
      - "7474:7474"
      - "7687:7687"
    volumes:
      - neo4j-data:/data
      - ./plugins:/var/lib/neo4j/plugins
      - ./conf:/var/lib/neo4j/conf
      - ./conf/apoc.conf:/var/lib/neo4j/conf/apoc.conf

volumes:
  mysql-data:
  neo4j-data:

create apoc.conf that will later allow me to connect from neo4j to mysql
notice the last line in the conf file - if you want to connect to a different database then you need to add those db details

apoc.import.file.enabled=true
apoc.export.file.enabled=true
apoc.import.file.use_neo4j_config=true
apoc.import.file.use_system_temp_directory=false
apoc.import.file.directories=/import,/plugins
apoc.load.jdbc.allowlist=jdbc:mysql://mysql-container:3306
apoc.jdbc.mysql.url=jdbc:mysql://mysql-container:3306/sakila?user=user&password=password

create neo4j.conf for neo4j specific configurations

dbms.default_listen_address=0.0.0.0
dbms.default_advertised_address=localhost
dbms.security.auth_enabled=true

start compose services

run the docker compose file to start mysql and neo4j services

docker-compose up -d

check if the containers are deployed

docker ps

sample output when above command is run

CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS          PORTS                                                      NAMES
15c08cb08a7c   mysql:latest   "docker-entrypoint.s…"   13 minutes ago   Up 10 minutes   0.0.0.0:3306->3306/tcp, 33060/tcp                          mysql-container
ef3fbeb764ba   neo4j:latest   "tini -g -- /startup…"   13 minutes ago   Up 8 minutes    0.0.0.0:7474->7474/tcp, 7473/tcp, 0.0.0.0:7687->7687/tcp   neo4j-container

verify mysql setup

connect to mysql container:

docker exec -it mysql-container mysql -u root -p

enter the password rootpassword and verify you can log in to mysql
verify mysql database

SHOW DATABASES;

ensure mydatabase is listed as an output for command above
select the database

USE mydatabase;

create table and insert data

CREATE TABLE test (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO test VALUES (1, 'TestName');

exit the container by typing exit at the mysql prompt or by pressing Ctrl+D
restart the container:

docker-compose restart mysql

connect to the mysql container again

docker exec -it mysql-container mysql -u root -p

verify if the table created earlier has persisted

SELECT * FROM test;

verify neo4j setup

open your web browser and navigate to http://localhost:7474
log in using the username neo4j and password password
create a node and check if it persists after restarting the neo4j container.

CREATE (n:TestNode {name: 'TestNodeName'});
MATCH (n:TestNode) RETURN n;

restart the container:

docker-compose restart mysql

go back to neo4j in the browser and run the query

MATCH (n:TestNode) RETURN n;

verify if the node created earlier has persisted

verify mysql & neo4j connectivity

intro
- this section shows how to confirm if mysql and neo4j are able to talk to each other
mysql driver
- download MySQL JDBC driver from here
- from the drop-down select the option "Platform Independent" if you are on Mac
- place the jar inside the plugins directory
apoc driver
- download the apoc driver from here
- ensure that you download the apoc version that matches your neo4j instance version as mentioned here
- for example, i first found out my neo4j version number using
  - docker ps - copy the id of the neo4j container
  - docker exec <container_id_or_name> neo4j --version
- the output i got was 5.20.0
- so then i downloaded the version 5.19.0 from here
- place the downloaded jar inside the plugins directory
load drivers
- bounce the containers
  - docker-compose down
  - docker-compose up -d
- check if containers are running again
  - docker ps
verify connectivity
- open neo4j inside browser
  - http://localhost:7474/browser/
- run these queries

CALL apoc.load.driver('com.mysql.cj.jdbc.Driver');

CALL apoc.load.jdbc('jdbc:mysql://user:password@mysql-container:3306/mydatabase', 'SELECT * FROM test') YIELD row
RETURN row;

check response
- you should see the record you created earlier inside mysql

{
	"id": 1,
	"name": "TestName"
}

load mysql data and pull into neo4j

load sample mysql data

purpose for loading sample data
- this section deals with loading some sample data inside the mysql database setup in the previous sections
- this data will then be imported inside neo4j using the connection setup in the previous section
- then we will be able to verify if changes in mysql are replicated inside neo4j and then is our AI able to RAG this data
about the sample database
- the Sakila database simulates a DVD rental store, including information about movies, actors, and customers
- it contains tables for films, actors, and film categories, as well as rental transactions and payment records
- customer and staff details, along with store locations, are also included
- the database tracks which films are rented, their rental history, and associated payments
- it serves as a practical example for learning SQL queries and database management
download and unpack
- download the database files from here
- unzip the file and place the artefacts inside the plugins folder
- copy these files inside the mysql container
  - docker cp ./sample_data/sakila-schema.sql mysql-container:/sakila-schema.sql
  - docker cp ./sample_data/sakila-data.sql mysql-container:/sakila-data.sql
load the database
- log in to the mysql container and load the sakila database
  - docker exec -it mysql-container mysql -u root -p
- use the password - rootpassword
- load the sakila schema
  - SOURCE /sakila-schema.sql;
- load the sakila data
  - SOURCE /sakila-data.sql;
verify the data load

SHOW DATABASES;
USE sakila;
SHOW TABLES;
SELECT * FROM actor LIMIT 10;

grant permissions to neo4j
- in the next section i will be pulling data from mysql into neo4j using cypher queries
- before doing this, i need to grant necessary privileges to the user

GRANT ALL PRIVILEGES ON sakila.* TO 'user'@'%';
FLUSH PRIVILEGES;

verify the privileges are set correctly
- SHOW GRANTS FOR 'user'@'%';
bounce
- bounce the containers
  - docker-compose down
  - docker-compose up -d
- check if containers are running again
  - docker ps

pull mysql data into neo4j

purpose
- in the previous sections i have setup the mysql and neo4j containers using docker-compose and also loaded some sample data inside mysql
- in this section i want to pull the mysql data inside neo4j and setup a permenant connection so that any future changes to mysql data are replicated into neo4j
create apoc procedures to load data from mysql
- open the Neo4j browser at http://localhost:7474/browser/ and log in with neo4j & password
- then run the following cypher queries to create procedures to load data from the mysql sakila database
- note that all sakila entities will be loaded to neo4j from mysql as shown here

load sakila data into neo4j

ensures the MySQL driver is loaded

CALL apoc.load.driver('com.mysql.cj.jdbc.Driver');

this query fetches and merges actor data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM actor') YIELD row
MERGE (a:Actor {actor_id: row.actor_id})
SET a.first_name = row.first_name, a.last_name = row.last_name, a.last_update = row.last_update

this query establishes the relationship between films and categories

CALL apoc.load.jdbc('mysql', 'SELECT * FROM film_category') YIELD row
MERGE (f:Film {film_id: row.film_id})
MERGE (c:Category {category_id: row.category_id})
MERGE (f)-[:BELONGS_TO]->(c);

this query fetches and merges category data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM category') YIELD row
MERGE (c:Category {category_id: row.category_id})
SET c.name = row.name, c.last_update = row.last_update;

this query fetches and merges film data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM film') YIELD row
MERGE (f:Film {film_id: row.film_id})
SET f.title = row.title, f.description = row.description, f.release_year = row.release_year,
		f.language_id = row.language_id, f.original_language_id = row.original_language_id,
		f.rental_duration = row.rental_duration, f.rental_rate = row.rental_rate,
		f.length = row.length, f.replacement_cost = row.replacement_cost,
		f.rating = row.rating, f.special_features = row.special_features,
		f.last_update = row.last_update;

this query establishes the relationship between actors and films

CALL apoc.load.jdbc('mysql', 'SELECT * FROM film_actor') YIELD row
MERGE (a:Actor {actor_id: row.actor_id})
MERGE (f:Film {film_id: row.film_id})
MERGE (a)-[:ACTED_IN]->(f);

this query fetches and merges language data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM language') YIELD row
MERGE (l:Language {language_id: row.language_id})
SET l.name = row.name, l.last_update = row.last_update;

this query fetches and merges customer data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM customer') YIELD row
MERGE (c:Customer {customer_id: row.customer_id})
SET c.store_id = row.store_id, c.first_name = row.first_name, c.last_name = row.last_name,
		c.email = row.email, c.address_id = row.address_id, c.active = row.active,
		c.create_date = row.create_date, c.last_update = row.last_update;

this query fetches and merges address data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM address') YIELD row
MERGE (a:Address {address_id: row.address_id})
SET a.address = row.address, a.address2 = row.address2, a.district = row.district,
		a.city_id = row.city_id, a.postal_code = row.postal_code, a.phone = row.phone,
		a.last_update = row.last_update;

this query fetches and merges city data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM city') YIELD row
MERGE (c:City {city_id: row.city_id})
SET c.city = row.city, c.country_id = row.country_id, c.last_update = row.last_update;

this query fetches and merges country data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM country') YIELD row
MERGE (c:Country {country_id: row.country_id})
SET c.country = row.country, c.last_update = row.last_update;

this query fetches and merges rental data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM rental') YIELD row
MERGE (r:Rental {rental_id: row.rental_id})
SET r.rental_date = row.rental_date, r.inventory_id = row.inventory_id, r.customer_id = row.customer_id,
		r.return_date = row.return_date, r.staff_id = row.staff_id, r.last_update = row.last_update;

this query fetches and merges inventory data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM inventory') YIELD row
MERGE (i:Inventory {inventory_id: row.inventory_id})
SET i.film_id = row.film_id, i.store_id = row.store_id, i.last_update = row.last_update;

this query fetches and merges payment data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM payment') YIELD row
MERGE (p:Payment {payment_id: row.payment_id})
SET p.customer_id = row.customer_id, p.staff_id = row.staff_id, p.rental_id = row.rental_id,
		p.amount = row.amount, p.payment_date = row.payment_date, p.last_update = row.last_update;

this query fetches and merges staff data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM staff') YIELD row
MERGE (s:Staff {staff_id: row.staff_id})
SET s.first_name = row.first_name, s.last_name = row.last_name, s.address_id = row.address_id,
		s.email = row.email, s.store_id = row.store_id, s.active = row.active, s.username = row.username,
		s.password = row.password, s.last_update = row.last_update;

this query fetches and merges store data

CALL apoc.load.jdbc('mysql', 'SELECT * FROM store') YIELD row
MERGE (s:Store {store_id: row.store_id})
SET s.manager_staff_id = row.manager_staff_id, s.address_id = row.address_id, s.last_update = row.last_update;

establish relationship inside neo4j sakila

this query establishes the relationship between films and languages

CALL apoc.load.jdbc('mysql', 'SELECT film_id, language_id FROM film') YIELD row
MATCH (f:Film {film_id: row.film_id})
MATCH (l:Language {language_id: row.language_id})
MERGE (f)-[:IN_LANGUAGE]->(l);

this query establishes the relationship between customers and addresses

CALL apoc.load.jdbc('mysql', 'SELECT customer_id, address_id FROM customer') YIELD row
MATCH (c:Customer {customer_id: row.customer_id})
MATCH (a:Address {address_id: row.address_id})
MERGE (c)-[:LIVES_AT]->(a);

this query establishes the relationship between addresses and cities

CALL apoc.load.jdbc('mysql', 'SELECT address_id, city_id FROM address') YIELD row
MATCH (a:Address {address_id: row.address_id})
MATCH (c:City {city_id: row.city_id})
MERGE (a)-[:IN_CITY]->(c);

this query establishes the relationship between cities and countries

CALL apoc.load.jdbc('mysql', 'SELECT city_id, country_id FROM city') YIELD row
MATCH (c:City {city_id: row.city_id})
MATCH (co:Country {country_id: row.country_id})
MERGE (c)-[:IN_COUNTRY]->(co);

this query establishes the relationship between rentals and customers

CALL apoc.load.jdbc('mysql', 'SELECT rental_id, customer_id FROM rental') YIELD row
MATCH (r:Rental {rental_id: row.rental_id})
MATCH (c:Customer {customer_id: row.customer_id})
MERGE (r)-[:RENTED_BY]->(c);

this query establishes the relationship between rentals and inventories

CALL apoc.load.jdbc('mysql', 'SELECT rental_id, inventory_id FROM rental') YIELD row
MATCH (r:Rental {rental_id: row.rental_id})
MATCH (i:Inventory {inventory_id: row.inventory_id})
MERGE (r)-[:CONTAINS]->(i);

this query establishes the relationship between payments and customers

CALL apoc.load.jdbc('mysql', 'SELECT payment_id, customer_id FROM payment') YIELD row
MATCH (p:Payment {payment_id: row.payment_id})
MATCH (c:Customer {customer_id: row.customer_id})
MERGE (p)-[:MADE_BY]->(c);

this query establishes the relationship between payments and rentals

CALL apoc.load.jdbc('mysql', 'SELECT payment_id, rental_id FROM payment') YIELD row
MATCH (p:Payment {payment_id: row.payment_id})
MATCH (r:Rental {rental_id: row.rental_id})
MERGE (p)-[:FOR_RENTAL]->(r);

this query establishes the relationship between staff and addresses

CALL apoc.load.jdbc('mysql', 'SELECT s.staff_id, a.address_id FROM staff s JOIN store st ON s.store_id = st.store_id JOIN address a ON st.address_id = a.address_id') YIELD row
MATCH (s:Staff {staff_id: row.staff_id})
MATCH (a:Address {address_id: row.address_id})
MERGE (s)-[:LIVES_AT]->(a);

this query establishes the relationship between stores and addresses

CALL apoc.load.jdbc('mysql', 'SELECT store_id, address_id FROM store') YIELD row
MATCH (s:Store {store_id: row.store_id})
MATCH (a:Address {address_id: row.address_id})
MERGE (s)-[:LOCATED_AT]->(a);

this query establishes the relationship between stores and managers

CALL apoc.load.jdbc('mysql', 'SELECT store_id, manager_staff_id FROM store') YIELD row
MATCH (s:Store {store_id: row.store_id})
MATCH (m:Staff {staff_id: row.manager_staff_id})
MERGE (s)-[:MANAGED_BY]->(m);

explore loaded sakila data in neo4j

purpose
- previous sections have shown how to
  - setup a neo4j and mysql containers using docker-compose
  - connect them using mysql.jar and apoc.jar
  - load sakila data into mysql
  - fetch sakila data from mysql into neo4j
- this section looks at how to explore the pulled sakila data inside neo4j
complete schema and data
- this query will render the complete schema and the loaded data along with their relationships in neo4j inside the graph window
- the query matches and returns all nodes and relationships in the neo4j database
- to fetch more than 300 records, update the values in the "Graph visualization" section of the "settings" tab inside neo4j browser sidebar

MATCH (n)-[r]->(m)
RETURN n, r, m
LIMIT 10 // modify the limit based on how many records you would like to fetch

count the number of films an actor has acted in

MATCH (a:Actor {actor_id: 1})-[:ACTED_IN]->(f:Film)
RETURN a.first_name, a.last_name, COUNT(f) AS number_of_films

distribution of films across categories

MATCH (f:Film)-[:BELONGS_TO]->(c:Category)
RETURN c.name AS category, COUNT(f) AS number_of_films
ORDER BY number_of_films DESC

connection between films and their languages

MATCH (f:Film)-[:IN_LANGUAGE]->(l:Language)
RETURN f.title, l.name AS language
LIMIT 10;

geographical distribution of stores, customers, and rentals

MATCH (s:Store)-[:LOCATED_AT]->(a:Address)-[:IN_CITY]->(c:City)-[:IN_COUNTRY]->(co:Country)
RETURN s.store_id, a.address, c.city, co.country

network of actors based on films they have acted in together

MATCH (a1:Actor)-[:ACTED_IN]->(f:Film)<-[:ACTED_IN]-(a2:Actor)
RETURN a1.first_name AS actor1_first_name, a1.last_name AS actor1_last_name, a2.first_name AS actor2_first_name, a2.last_name AS actor2_last_name, f.title AS film_title
LIMIT 10;

this query will display actors and the films they have acted in, highlighting the "ACTED_IN" relationship

MATCH (a:Actor)-[:ACTED_IN]->(f:Film)
RETURN a, f
LIMIT 100

this query shows the "BELONGS_TO" relationship between films and categories

MATCH (f:Film)-[:BELONGS_TO]->(c:Category)
RETURN f, c
LIMIT 100

this query highlights the "IN_LANGUAGE" relationship between films and languages

MATCH (f:Film)-[:IN_LANGUAGE]->(l:Language)
RETURN f, l
LIMIT 100

this query visualizes the "LOCATED_AT" relationship for stores, and their locations down to the country level

MATCH (s:Store)-[:LOCATED_AT]->(a:Address)-[:IN_CITY]->(c:City)-[:IN_COUNTRY]->(co:Country)
RETURN s, a, c, co
LIMIT 100

this query shows the network of actors who have co-starred in films, highlighting the connections between them through shared films

MATCH (a1:Actor)-[:ACTED_IN]->(f:Film)<-[:ACTED_IN]-(a2:Actor)
RETURN a1, f, a2
LIMIT 10

pending parts

user flow

this is the overall journey for the user
- the user will ask questions from streamlit frontend to the LLM
- the LLM reaches out to neo4j and formulates queries that will secure the necessary data
- neo4j answers with the desired answer and LLM will provide the answer to the frontend
see below the 3 major sections that are still pending to be developed
- streamlit frontend
- integrate LLM with LangChain
- implement query handling

streamlit frontend

streamlit app will act as a frontend for the GenAI application
it will connect the neo4j and LLM parts to give the ability to the user to
- create UI components to interact with the neo4j database
- integrate neo4j queries into the streamlit app to display data

integrate LLM with LangChain

use LangChain to connect the streamlit app with Llama (or OpenAI, Gemini)
configure the LLMGraphTransformer to process and respond to user queries
ensure secure API connections and handle authentication

implement query handling

allow users to input queries in natural language
use LLMs to interpret queries and convert them into cypher queries for neo4j
fetch the results from neo4j and display them in the streamlit app

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

setup

setup

readme.md

purpose

developer machine

pre-requisites - developer machine

python virtual environment

vscode devcontainer

streamlit - developer machine

install neo4j & mysql

setup configurations

start compose services

verify mysql setup

verify neo4j setup

verify mysql & neo4j connectivity

load mysql data and pull into neo4j

load sample mysql data

pull mysql data into neo4j

load sakila data into neo4j

establish relationship inside neo4j sakila

explore loaded sakila data in neo4j

pending parts

user flow

streamlit frontend

integrate LLM with LangChain

implement query handling

Files

setup

Directory actions

More options

Directory actions

More options

Latest commit

History

setup

Folders and files

parent directory

readme.md

purpose

developer machine

pre-requisites - developer machine

python virtual environment

vscode devcontainer

streamlit - developer machine

install neo4j & mysql

setup configurations

start compose services

verify mysql setup

verify neo4j setup

verify mysql & neo4j connectivity

load mysql data and pull into neo4j

load sample mysql data

pull mysql data into neo4j

load sakila data into neo4j

establish relationship inside neo4j sakila

explore loaded sakila data in neo4j

pending parts

user flow

streamlit frontend

integrate LLM with LangChain

implement query handling