Text-to-SQL Copilot

Text-to-SQL Copilot is a tool for users who see SQL databases as a barrier to actionable insights. It takes your natural language question as input, uses a generative text model to write a SQL statement based on your data model, runs that statement on your database, and analyses the results. It does all of this at no cost using the HuggingFace Inference API.
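As a rough illustration of that flow, the sketch below takes a question, asks a generator function for a SQL statement, and runs the statement on a SQLite database. The names `answer_question` and `fake_generate_sql` are hypothetical, and the stub stands in for the generative model; this is not the project's actual code.

```python
import sqlite3
from typing import Callable, List, Tuple


def answer_question(
    question: str,
    conn: sqlite3.Connection,
    generate_sql: Callable[[str], str],
) -> Tuple[str, List[tuple]]:
    """Generate a SQL statement for the question, run it, and return both."""
    sql = generate_sql(question)
    rows = conn.execute(sql).fetchall()
    return sql, rows


def fake_generate_sql(question: str) -> str:
    """Hypothetical stand-in for the model call; always returns the same query."""
    return "SELECT COUNT(*) FROM singer"


# Small in-memory database so the example runs without any files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO singer (name) VALUES (?)", [("Ann",), ("Bo",)])

sql, rows = answer_question("How many singers are there?", conn, fake_generate_sql)
print(sql)
print(rows)
```

Passing the generator in as a function keeps the database step independent of whichever model produces the SQL.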

Setup

Dataset

This project was built specifically on the Spider dataset. Follow these steps to recreate it:

Download the data from this Google Drive
Unzip the file
Save the root 'spider' folder under the src/data/raw/ directory

Setup Process

This application pulls the schema information from the SQLite database files and uses a locally stored Chroma vector database to identify which schema to use to answer a question. Run the following commands to compile the database info and build the vector database:

pip3 install -r requirements.txt

python3 setup.py

This will take about 10-15 minutes to fully run.
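To make the idea behind the setup step more concrete, here is a minimal sketch of reading schema text out of SQLite databases and choosing the schema that best matches a question. It is an assumption-laden stand-in: the matching uses a plain bag-of-words cosine similarity rather than Chroma embeddings, and `read_schema`, `tokenize`, `cosine`, and `pick_schema` are hypothetical names, not functions from this repo.

```python
import re
import sqlite3
from collections import Counter
from math import sqrt


def read_schema(conn: sqlite3.Connection) -> str:
    """Return the CREATE TABLE statements stored in a SQLite database."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens (underscores split words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_schema(question: str, schemas: dict) -> str:
    """Return the name of the schema most similar to the question.

    Simple stand-in for a vector database lookup, not the Chroma API.
    """
    q = tokenize(question)
    return max(schemas, key=lambda name: cosine(q, tokenize(schemas[name])))


# Two tiny in-memory databases, made up for illustration.
concert_conn = sqlite3.connect(":memory:")
concert_conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")

pets_conn = sqlite3.connect(":memory:")
pets_conn.execute("CREATE TABLE pets (pet_id INTEGER, pet_type TEXT, weight REAL)")

schemas = {
    "concert": read_schema(concert_conn),
    "pets": read_schema(pets_conn),
}
best = pick_schema("What is the weight of each pet?", schemas)
print(best)
```

A real embedding model would also match questions that share meaning but no words with the schema, which this word-overlap version cannot do.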

HuggingFace API Token

Currently, this project relies on the Google flan-t5-xxl language model, which is accessed for free through the HuggingFace Inference API. To use it, you need to create an API token and save it in a .env file in the root of the repo:

touch .env

Open the .env file and enter your HuggingFace API token.
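The variable name the project expects in the .env file is not shown here, so the sketch below only illustrates how KEY=VALUE lines in such a file can be read. `read_env_file` and the key `EXAMPLE_API_TOKEN` are hypothetical placeholders, and whether the project parses the file this way or through a package such as python-dotenv is an assumption.

```python
import tempfile
from pathlib import Path


def read_env_file(path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demo with a temporary file; EXAMPLE_API_TOKEN is a made-up key name.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text('# local secrets\nEXAMPLE_API_TOKEN="placeholder-value"\n')
    loaded = read_env_file(env_path)

print(loaded)
```

Keeping the token in a .env file, rather than in the source, makes it easier to avoid committing it to the repo.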

Using SQL Copilot

Navigate to the src/app directory and start the program with the following command:

python3 main.py

Then input your question - happy SQL-ing!

Citation

Chase, H. (2022). LangChain [Computer software]. https://github.com/hwchase17/langchain

Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., ... & Radev, D. (2018). Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887.
