Chatbots aren’t new, but better chatbots are finally here thanks to AI services. Searching documentation is greatly enhanced with vector search instead of traditional keyword searches. And while training models on unstructured data may seem daunting, using RAG (Retrieval Augmented Generation) we can create a savvier chatbot that’s efficiently trained with new information.
This solution shows how to train a chatbot on a rich knowledge base built from several types of documents, including PDF files and data stored in database tables. The tutorial shows how to transform raw documents containing unstructured data into structured data, store them in an OCI Object Storage bucket, and use advanced AI models to generate contextual responses to natural language queries. Best of all, it’s highly modular, so you can use a variety of models.
- Getting Access to Generative AI Agents
- Create an OCI Object Storage Bucket and Knowledge Repository.
- Create Knowledge Base
- Create Data Source
- Start Ingestion Job
- Create Generative AI Agent
- Training PDFs and Chat Interface
- How do we get data from the Oracle database, generate a PDF and train it?
- Uploading the PDF into OCI Object storage
- Training new documents in the knowledge repository
- How to upload .CSV file to object storage using dbms_cloud.export_data
- Creating Web User interface with Streamlit & Python
- Combining Internal Knowledge Base search with Oracle Generative AI LLM and Oracle Database
- Conclusion
You can get access to Generative AI Agents resources with OCI Identity and Access Management (IAM) policies.
By default, only users in the Administrators group have access to all OCI resources, including Generative AI Agents resources. If you're a member of another group, ask your administrator to assign you the least privileges that are required to perform your responsibilities by reviewing the following sections.
For this, our tenancy needs to be subscribed to Oracle Generative AI Agents (Beta). If you do not find the service in your other cloud regions, please check the Chicago region.
Architecture
Log in to cloud.oracle.com and, from the top-left navigation menu, select Storage and then Buckets.
Click on Create Bucket
Provide a bucket name, choose the Standard storage tier, leave the other options at their defaults, and click Create.
Upload PDF files to this object storage bucket.
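The upload can also be scripted instead of done through the console. The following is a minimal sketch, assuming the OCI Python SDK (`oci`) is installed and an API key is configured in `~/.oci/config`. The helper names `plan_uploads` and `upload_all`, the folder `docs`, and the bucket name `my-knowledge-bucket` are hypothetical placeholders, not part of any Oracle tooling.

```python
from pathlib import Path

# File types this article lists as supported by Generative AI Agents.
SUPPORTED = {".pdf", ".txt"}


def plan_uploads(folder):
    """Return (local_path, object_name) pairs for supported files in folder."""
    root = Path(folder)
    return [
        (path, path.name)
        for path in sorted(root.iterdir())
        if path.is_file() and path.suffix.lower() in SUPPORTED
    ]


def upload_all(client, namespace, bucket, plan):
    """Upload each planned file and return the uploaded object names.

    client is expected to be an oci.object_storage.ObjectStorageClient,
    but any object with a compatible put_object method works.
    """
    uploaded = []
    for path, name in plan:
        with open(path, "rb") as fh:
            client.put_object(namespace, bucket, name, fh)
        uploaded.append(name)
    return uploaded


def main(folder="docs", bucket="my-knowledge-bucket"):  # placeholder names
    import oci  # imported lazily so the helpers above work without the SDK

    config = oci.config.from_file()  # reads ~/.oci/config, DEFAULT profile
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    print(upload_all(client, namespace, bucket, plan_uploads(folder)))
```

Calling `main()` with your own folder and bucket name uploads every PDF and txt file in that folder and skips everything else.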
Under the Top Navigation Menu, select Analytics and AI > AI Services > Generative AI Agents (new Beta)
Click on Knowledge Bases and Create a Knowledge Base
Under Create knowledge base, provide a name, select the compartment, and select Object storage. You can also choose OCI OpenSearch; however, this article covers Object storage only.
Click on the create Data Source button
Provide the data source name; the type will be Object storage. Select the bucket in the compartment, then select either everything in the bucket or only the required PDFs.
Check the option to automatically start the ingestion job for the above data sources. Note that you can add multiple data sources here. Click the Create button.
From the left navigation, click on Create agent
Provide the agent name, select the knowledge base, and provide a chat welcome message.
Click on the Create button
Now, our chat agent is ready and trained on internal PDF data from OCI object storage.
From the left navigation, click on the chat button, and now you are all set to ask questions or have a conversation with the chatbot.
I have trained my chatbot on the following PDF: Breast cancer facts & figures.
Let's ask some questions. We will be greeted by a welcome message.
Chatbot: Hi user, I am your friend Ask AI how can i help you today?
User: what is Breast cancer?
User: How is breast cancer diagnosed?
Let us now check another PDF on COVID-19 Corona Virus FAQs
Disclaimer: I have provided the source to download these PDFs, and I don't own the content on any of these PDF files. They are just used for demonstration and training purposes only for this article.
User: What is coronavirus and COVID-19?
You can click on View citation if you would like to know which PDF was referenced to answer this chat.
User: How to prevent covid-19 infection?
We can quickly spin up an Oracle APEX instance, or use an existing one.
Create an Oracle APEX page and create a report using the example SQL query shown below.
select ID, CATEGORY, STATES_NAME_EN,
REGION_EN, NAME_EN, SHORT_DESCRIPTION_EN
from UNESCO_SITES where rownum < 100
You can download the UNESCO CSV file to create the table from my GitHub repo (PDF file).
Run the Oracle APEX page with Interactive report as shown below.
Download the PDF report as shown. Alternatively, you can write a PL/SQL procedure to do the same, that is, to create a PDF and upload it to Object Storage.
The PDF will be as shown below
You can also use Oracle APEX to create a connection to OCI and upload the PDF directly to OCI Object Storage; this is not covered in this article.
Please refer to my LiveLabs on AI for Healthcare to see how to upload PDF files into Object Storage using a PL/SQL procedure.
Upload the new PDFs into the Object Storage bucket (the one selected for the data source).
Click on the data source and create a new ingestion job.
Provide the job name and click the Create button.
The chatbot is now trained on the new PDF, so let's search again.
User: describe Kakadu National Park
User: please tell me about Monasteries of Haghpat and Sanahin
User: where is Quebrada de Humachuaca
This looks great: the search results draw on our whole repository, which consists of scanned PDF files and data from Oracle Database tables exported using Oracle APEX.
Important Note (reference)
- PDF and txt files are the only supported bucket objects in Generative AI Agents.
- If your data is not ready, you can point the data source to empty folders in a bucket and later, populate the folders with data. After you populate the folders with data, you can ingest the data into the data source.
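Because txt files are also accepted, table data can be exported as plain text rather than as a PDF. The following is a minimal sketch of that idea, not the method used in this article. It assumes the `python-oracledb` driver and direct connectivity to the database (an Autonomous Database may additionally need wallet or TLS settings); `rows_to_text` and `export_sites` are hypothetical helper names. The resulting file can then be uploaded to the bucket and ingested like any other document.

```python
def rows_to_text(cursor):
    """Render the rows of an executed DB-API cursor as 'COLUMN: value' blocks.

    Each row becomes one block, and blocks are separated by a blank line.
    """
    columns = [d[0] for d in cursor.description]
    blocks = []
    for row in cursor:
        lines = [
            f"{col}: {'' if val is None else val}"
            for col, val in zip(columns, row)
        ]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


def export_sites(user, password, dsn, out_path="unesco_sites.txt"):
    """Write the article's UNESCO_SITES sample query to a text file."""
    import oracledb  # imported lazily; only needed for a real export

    sql = (
        "select NAME_EN, SHORT_DESCRIPTION_EN, CATEGORY "
        "from UNESCO_SITES where rownum < 100"
    )
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            text = rows_to_text(cur)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(text)
    return out_path
```

Keeping the formatter separate from the connection code means it can be tried against any DB-API cursor before pointing it at the real table.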
This section is purely optional; you can skip it.
What if you want to create a .CSV file from a SQL query and directly upload the file into OCI Object storage using PL/SQL?
For example, suppose we want to create a file from the following SQL:
select NAME_EN, SHORT_DESCRIPTION_EN, CATEGORY from DEMOUSER.UNESCO_SITES where rownum < 100
DBMS_CLOUD.EXPORT_DATA is a handy procedure to know and can be very helpful.
Log in to SQL Developer Web as the ADMIN user and grant the following privileges, assuming DEMOUSER owns the table from which you want to create .csv files.
-- Login as ADMIN User
grant execute on DBMS_CLOUD to DEMOUSER;
grant execute on DBMS_CLOUD_AI to DEMOUSER;
Create Credential with OCI API key using DBMS_CLOUD.CREATE_CREDENTIAL
-- replace the values based on your OCI cloud tenancy and User settings
BEGIN
DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => '<credential-name>',
user_ocid => '<replace with your OCI user OCID>',
tenancy_ocid => '<replace with your OCI tenancy OCID>',
private_key => '<replace with your OCI private key>',
fingerprint => '<replace with your fingerprint>'
);
END;
/
From PL/SQL, create a .csv and directly upload the file to OCI Object storage using DBMS_CLOUD.EXPORT_DATA
-- Replace tenancy namespace, bucket name and file name as per your requirements
BEGIN
DBMS_CLOUD.EXPORT_DATA (
credential_name => '<credential-name>',
file_uri_list => 'https://objectstorage.<region-identifier>.oraclecloud.com/n/<tenancy-namespace>/b/<bucket-name>/o/<file-name>.csv',
format => '{"type":"CSV","delimiter":",","maxfilesize":536870912,"header":true,"compression":null,"escape":"true","quote":"\""}',
query => 'select NAME_EN, SHORT_DESCRIPTION_EN,CATEGORY from DEMOUSER.UNESCO_SITES where rownum < 100');
END;
/
This not only creates a .csv file but also uploads it to the OCI Object Storage bucket.
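The `file_uri_list` and `format` arguments are easy to mistype by hand. The following is a minimal sketch of building them in Python and calling the same procedure through bind variables. It assumes a `python-oracledb` cursor connected as a user with execute rights on DBMS_CLOUD; the helper names `object_uri`, `csv_format` and `export_csv` are hypothetical, and the format options are a subset of those used in the PL/SQL block.

```python
import json
from urllib.parse import quote


def object_uri(region, namespace, bucket, object_name):
    """Build the native Object Storage URI in the shape used by file_uri_list."""
    return (
        f"https://objectstorage.{region}.oraclecloud.com"
        f"/n/{namespace}/b/{bucket}/o/{quote(object_name)}"
    )


def csv_format(max_file_size=536870912):
    """Return the format argument as a JSON string (subset of the article's options)."""
    return json.dumps({
        "type": "CSV",
        "delimiter": ",",
        "maxfilesize": max_file_size,
        "header": True,
        "quote": '"',
    })


# Anonymous PL/SQL block; values are passed as binds rather than string-built.
EXPORT_BLOCK = """
BEGIN
  DBMS_CLOUD.EXPORT_DATA(
    credential_name => :cred,
    file_uri_list   => :uri,
    format          => :fmt,
    query           => :qry);
END;"""


def export_csv(cursor, credential_name, uri, query):
    """Run DBMS_CLOUD.EXPORT_DATA on an open cursor with the given settings."""
    cursor.execute(
        EXPORT_BLOCK,
        cred=credential_name,
        uri=uri,
        fmt=csv_format(),
        qry=query,
    )
```

Generating the JSON with `json.dumps` avoids hand-escaping the quote character inside the PL/SQL string literal.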
Download the source code zip from Oracle Generative AI Playground and extract the genai_playground-main.zip file.
Edit secrets.toml so that it looks as shown below, changing the values to match your tenancy.
# secrets.toml file
endpoint = "https://agent-runtime.generativeai.us-chicago-1.oci.oraclecloud.com"
agent_endpoint_id = "ocid1.genaiagentendpoint.oc1.us-chicago-1.<your-agent-ocid>"
llm_endpoint = "https://inference.generativeai.us-chicago-1.oci.oraclecloud.com"
compartment_id = "<your-compartment-ocid>"
logo = "Oracle.png"
user_avatar = ":material/record_voice_over:"
assisstant_avatar = "o.png"
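A leftover placeholder in this file typically only shows up as an error once the app makes its first call. The following is a minimal sketch of a sanity check, assuming the settings are available as a mapping (for example Streamlit's `st.secrets`, or a plain dict); `check_secrets` and `REQUIRED_KEYS` are hypothetical names and are not part of the playground code.

```python
# Keys from the secrets.toml above that the service calls depend on.
REQUIRED_KEYS = ("endpoint", "agent_endpoint_id", "llm_endpoint", "compartment_id")


def check_secrets(secrets):
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    for key in REQUIRED_KEYS:
        value = str(secrets.get(key, "")).strip()
        if not value:
            problems.append(f"{key} is missing")
        elif "<" in value or ">" in value:
            # Angle brackets mean a template value was never filled in.
            problems.append(f"{key} still contains a placeholder")
    for key in ("endpoint", "llm_endpoint"):
        value = str(secrets.get(key, ""))
        if value and not value.startswith("https://"):
            problems.append(f"{key} should be an https URL")
    return problems
```

For example, `check_secrets({"endpoint": "https://example"})` reports the three other keys as missing.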
Upgrade pip if required (I am running macOS with python3.10).
[notice] A new release of pip is available: 24.1.1 -> 24.1.2
[notice] To update, run: pip install --upgrade pip
madhusudhanrao@MadhuMac genai_playground-main % pip install --upgrade pip
Install the dependencies listed in requirements.txt
pip install -r requirements.txt
Install the OCI Command Line Interface - OCI CLI (Official Guide)
You will also need to complete the OCI CLI configuration task; please refer to this article if required.
Run the application
madhusudhanrao@MadhuMac genai_playground-main % streamlit run Home.py
Open network port 8501 if you are running this on an external cloud server
sudo iptables -I INPUT 6 -m state --state NEW -p tcp --dport 8501 -j ACCEPT
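To confirm the port is reachable from another machine, a quick TCP check can help. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name. A False result may also mean that the cloud network's own ingress rules block the port, not only the host firewall.

```python
import socket


def port_is_open(host, port=8501, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

Run it from your laptop against the server's public IP address while the Streamlit app is running.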
Another example
To create a dark theme UI, update config.toml
[theme]
primaryColor="white" #ocean #2C5967
backgroundColor="black" #neutral 1 #F5F4F2
secondaryBackgroundColor="#DFDCD8" #neutral 2
textColor="white" #obark #312D2A
font="sans serif"
and run the server again
streamlit run Home.py