An application to write and run SQL queries, returning answers to natural language questions, using langchain and open source LLM models through HuggingFace.

BrettlyCD/text-to-sql


Text-to-SQL Copilot

Text-to-SQL Copilot is a tool for users who see SQL databases as a barrier to actionable insights. Taking your natural language question as input, it uses a generative text model to write a SQL statement based on your data model, then runs that statement on your database and analyses the results. It does all of this at no cost by using the HuggingFace Inference API.
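To make the flow above concrete, here is a minimal sketch of the question-to-SQL-to-result loop using only Python's standard library. It is not the project's actual code: `answer_question` and `fake_generate_sql` are hypothetical names, the model call is replaced by a stub that returns a fixed query, and the `singer` table is made-up example data.

```python
import sqlite3
from typing import Callable, List, Tuple


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str, str], str],
) -> List[Tuple]:
    """Describe the data model, ask a generator for SQL, then run it."""
    # Collect the CREATE TABLE statements so the generator can see the schema.
    schema = "\n".join(
        row[0]
        for row in conn.execute(
            "SELECT sql FROM sqlite_master "
            "WHERE type = 'table' AND sql IS NOT NULL"
        )
    )
    sql = generate_sql(question, schema)
    return conn.execute(sql).fetchall()


def fake_generate_sql(question: str, schema: str) -> str:
    # Hypothetical stand-in for the language model: always returns one query.
    return "SELECT COUNT(*) FROM singer"


# Made-up example database, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO singer (name) VALUES (?)", [("Ann",), ("Bo",)])

rows = answer_question("How many singers are there?", conn, fake_generate_sql)
print(rows)
```

In the real application the stub would be replaced by a call to the hosted model, and the returned rows would be passed on for analysis.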

(Image: copilot_demo)

Setup

Dataset

This was built specifically on the Spider dataset. Follow these steps to recreate it:

  1. Download the data from this Google Drive
  2. Unzip the file
  3. Save the root 'spider' folder under the src/data/raw/ directory

Setup Process

This application pulls the schema information from the SQLite database files and uses a locally stored Chroma vector database to identify which schema to use when answering a question. Run the following commands to compile the database info and build the vector database:

pip3 install -r requirements.txt
python3 setup.py

This will take about 10-15 minutes to fully run.
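As a rough illustration of what pulling schema information from a SQLite file can look like, the sketch below reads table and column names with the standard `sqlite3` module. It is an assumption about the general approach, not a copy of what `setup.py` does: `describe_schema` and `schema_to_text` are hypothetical helpers, and the two tables are made-up examples rather than Spider data.

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [(column_name, declared_type), ...]}."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [(col[1], col[2]) for col in columns]
    return schema


def schema_to_text(schema):
    """Flatten the schema into one line per table, suitable for embedding."""
    lines = []
    for table, columns in schema.items():
        column_text = ", ".join(f"{name} {col_type}" for name, col_type in columns)
        lines.append(f"{table}({column_text})")
    return "\n".join(lines)


# Made-up example database, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER)")

schema = describe_schema(conn)
schema_text = schema_to_text(schema)
print(schema_text)
```

A text description like `schema_text` is the kind of document that could be stored in a vector database, so that a question can later be matched to the most relevant schema.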

HuggingFace API Token

Currently, this project relies on Google's flan-t5-xxl language model, which is accessed for free through the HuggingFace Inference API. To use it, you need to create an API token and save it in a .env file in the root of the repo:

touch .env

Open the .env file and enter your HuggingFace API token:

(Image: env_example, showing the expected contents of the .env file)
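For readers unfamiliar with .env files, the sketch below shows one way a `KEY=value` file can be read in Python using only the standard library. The variable name `HUGGINGFACEHUB_API_TOKEN` is an assumption (check the example image for the name this project expects), `load_env_file` is a hypothetical helper, and the token value is a placeholder. The project itself may rely on a package such as python-dotenv instead.

```python
import os
import tempfile


def load_env_file(path):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop optional surrounding quotes around the value.
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstrate with a throwaway file so no real .env file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_path = os.path.join(tmp_dir, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# HuggingFace credentials\n")
        handle.write('HUGGINGFACEHUB_API_TOKEN="hf_placeholder"\n')  # assumed name
    env_values = load_env_file(env_path)

token = env_values.get("HUGGINGFACEHUB_API_TOKEN")
print("token loaded:", token is not None)
```

Keep the real .env file out of version control so the token is not published with the repo.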

Using SQL Copilot

Navigate to the src/app directory and start the program with the following command:

python3 main.py

Then input your question - happy SQL-ing!

Citation

Chase, H. (2022). LangChain [Computer software]. https://github.com/hwchase17/langchain

Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., ... & Radev, D. (2018). Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
