Project Proposal
This project builds a natural language interface for relational databases, following the ideas of Fei Li (2014), Constructing an Interactive Natural Language Interface for Relational Databases. We focus only on selecting data; statements that modify the database (insert, add, drop) are out of scope.
This project is meaningful because SQL is not easy for everyone to learn, so such an interface would let a much wider audience query relational databases. People have worked on this problem for decades, and the difficulty lies more in understanding natural language than in generating SQL: SQL has a well-defined grammar, whereas the tricky ambiguities all lie in natural language. Thankfully, high-accuracy dependency parsers are now available, which makes a natural language interface to databases (NLIDB) feasible. We are both interested in natural language processing and its application to databases, which motivates this course project.
We intend to follow the paper by Fei Li closely. The main steps of translating a natural language sentence into an SQL query are as follows:
- Parse the natural language using a dependency parser (Stanford NLP Parser).
- Map the parse tree nodes to SQL keywords and table and attribute names.
- Adjust the structure of the parse tree to make it follow the structure of an SQL query. The result of this procedure is then called a query tree.
- Translate the query tree to an SQL query.
- During steps 2 and 3, the system interacts with the user, who chooses the desired mapping or structure from a ranked list of candidates.
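The mapping and translation steps above can be illustrated with a minimal Java sketch. It is not the method from Fei Li's paper: it works on a flat list of words instead of a dependency parse tree, skips the structure adjustment and the user interaction, and uses a hard-coded toy lexicon. The class name `NlidbSketch`, the `NodeType` values, and the schema names (`publication.title`, `publication.year`) are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class NlidbSketch {
    // Kinds of SQL components a word can map to (hypothetical names).
    enum NodeType { SELECT, TABLE, ATTRIBUTE, OPERATOR, VALUE }

    // A word of the sentence together with the SQL text it maps to.
    static final class MappedNode {
        final NodeType type;
        final String sqlText;
        MappedNode(NodeType type, String sqlText) {
            this.type = type;
            this.sqlText = sqlText;
        }
    }

    // Toy lexicon standing in for step 2; the schema names are assumptions.
    static final Map<String, MappedNode> LEXICON = new HashMap<>();
    static {
        LEXICON.put("return", new MappedNode(NodeType.SELECT, "SELECT"));
        LEXICON.put("titles", new MappedNode(NodeType.ATTRIBUTE, "publication.title"));
        LEXICON.put("year", new MappedNode(NodeType.ATTRIBUTE, "publication.year"));
        LEXICON.put("publications", new MappedNode(NodeType.TABLE, "publication"));
        LEXICON.put("after", new MappedNode(NodeType.OPERATOR, ">"));
    }

    // Step 2 (simplified): map each word to an SQL component.
    // Words with no mapping are dropped; numbers become values.
    static List<MappedNode> mapNodes(String sentence) {
        List<MappedNode> nodes = new ArrayList<>();
        for (String word : sentence.trim().toLowerCase().split("\\s+")) {
            if (LEXICON.containsKey(word)) {
                nodes.add(LEXICON.get(word));
            } else if (word.matches("\\d+")) {
                nodes.add(new MappedNode(NodeType.VALUE, word));
            }
        }
        return nodes;
    }

    // Step 4 (simplified): assemble a SELECT statement from the mapped nodes.
    // An attribute directly followed by an operator and a value is a condition;
    // any other attribute goes into the select list.
    static String translate(List<MappedNode> nodes) {
        List<String> select = new ArrayList<>();
        Set<String> from = new LinkedHashSet<>();
        List<String> where = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            MappedNode node = nodes.get(i);
            if (node.type == NodeType.TABLE) {
                from.add(node.sqlText);
            } else if (node.type == NodeType.ATTRIBUTE) {
                boolean isCondition = i + 2 < nodes.size()
                        && nodes.get(i + 1).type == NodeType.OPERATOR
                        && nodes.get(i + 2).type == NodeType.VALUE;
                if (isCondition) {
                    where.add(node.sqlText + " " + nodes.get(i + 1).sqlText
                            + " " + nodes.get(i + 2).sqlText);
                    i += 2; // skip the operator and value just consumed
                } else {
                    select.add(node.sqlText);
                }
            }
        }
        String sql = "SELECT " + String.join(", ", select)
                + " FROM " + String.join(", ", from);
        if (!where.isEmpty()) {
            sql += " WHERE " + String.join(" AND ", where);
        }
        return sql;
    }

    static String toSql(String sentence) {
        return translate(mapNodes(sentence));
    }

    public static void main(String[] args) {
        System.out.println(toSql("Return titles of publications with year after 2000"));
    }
}
```

In the real system, the flat word list would be replaced by the dependency parse tree, and an ambiguous or missing mapping would trigger a question to the user rather than being dropped silently.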
Steps 2 - 4 are the most complicated parts, because we need to implement the grammatical rules that a parse tree must satisfy before it can be translated into an SQL query, which requires a deep understanding of SQL. Each member of our group will focus on one of steps 2 - 4. The system will also have a user interface written in Java.
We plan to test on the Microsoft Academic Search database, as in Fei Li (2014), but to use PostgreSQL as our database system (since we learned to use it in CS516).
Fei Li (2014) already reports satisfactory results, with a higher success rate than the current Microsoft Academic Search website. As we proceed, we may be able to improve on Fei Li's approach. The possible improvements we can think of so far are listed below, though we probably won't have time to implement them all before the course ends:
- Make the grammar rules configurable. (Use a text file to define the rules, instead of hard coding them.)
- Make the system trainable on paired natural language questions and SQL queries.
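As a rough idea of what configurable rules might look like, the sketch below loads word-to-SQL mappings from the lines of a text file instead of hard-coding them. The rule format (`word -> TYPE sqlText`, with `#` for comments) and the class name `RuleFileSketch` are assumptions made for this example; the proposal does not define a file format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RuleFileSketch {
    // Parses lines of the hypothetical form "word -> TYPE sqlText".
    // Blank lines and lines starting with '#' are ignored.
    // Returns word -> {TYPE, sqlText}, keeping the order of the file.
    static Map<String, String[]> loadRules(List<String> lines) {
        Map<String, String[]> rules = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            // Split on the first "->" only, so the SQL text may contain '>'.
            String[] sides = line.split("->", 2);
            if (sides.length != 2) {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            String[] target = sides[1].trim().split("\\s+", 2);
            if (target.length != 2) {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            rules.put(sides[0].trim().toLowerCase(),
                    new String[] { target[0], target[1] });
        }
        return rules;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines`, so that changing a mapping means editing the text file rather than recompiling the system.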
Possible obstacles we might encounter are:
- The system is quite complicated. Apart from the readily available Stanford NLP Parser, we need to write the user interface, the parse tree node mapper, the parse tree structure adjuster, and the query tree translator ourselves.
- We are not very familiar with NLP and have no experience in translating natural language into SQL, so we will need to study the relevant background on our own.