a typescript query builder for arangodb's ArangoSearch
See working demo at hp4k1h5.github.io.
- aqlQueryBuilder.js v0.1.9
query
, andcollection
types renamedQuery
andCollection
- bump deps
This package is tested against the following versions:
- arangojs: 8.1.0
- arangodb: stable 3.10.0
ArangoSearch provides a low-level API for interacting with Arango Search Views through the Arango Query Language (AQL). This library aims to provide a query parser and AQL query builder to enable full boolean search operations across all available Arango Search View capabilities, including, PHRASE
and TOKENS
operations. With minimal syntax overhead the user can generate multi-lingual and language-specific, complex phrase, (proximity... TBD) and tokenized search terms.
For example, passing a search phrase like:
some +words -not +"phrase search" -"not these" ?"could have"`
to buildAQL()
's query
parameter, will
produce a query like the following (see this example and
those found in tests/
:
FOR doc IN view
SEARCH
(PHRASE(doc.text, "phrase search", analyzer)) AND MIN_MATCH(
ANALYZER(
TOKENS("words", analyzer)
ALL IN doc.text, analyzer),
1) OR ((PHRASE(doc.text, "phrase search", analyzer)) AND MIN_MATCH(
ANALYZER(
TOKENS("words", analyzer)
ALL IN doc.text, analyzer),
1) AND (PHRASE(doc.text, "could have", analyzer)) OR MIN_MATCH(
ANALYZER(
TOKENS(other, analyzer)
ANY IN doc.text, analyzer),
1))
AND
NOT (PHRASE(doc.text, "not these", analyzer))
AND MIN_MATCH(
ANALYZER(
TOKENS("nor", analyzer)
NONE IN doc.text, analyzer),
1)
OPTIONS {"collections": ["col"]}
SORT TFIDF(doc) DESC
LIMIT 0, 1
RETURN doc`
See default query syntax and this schematic example for more details.
If multiple collections are passed, the query is replicated across all passed collections, see examples in 'tests/cols.ts'. In the future this will also accommodate multiple key searches.
- running generated AQL queries will require a running ArangoDB v3.8+ instance. This package is tested against arangodb v3.8.4 π₯
-
run
yarn add @hp4k1h5/aqlquerybuilder.js
ornpm install --save @hp4k1h5/aqlquerybuilder.js
in a directory containing apackage.json
file.
or
clone this repository in your node compatible project. And runyarn
from inside the directory. -
import/require the exported functions
// use either
import {buildAQL, parseQuery} from '@hp4k1h5/aqlquerybuilder.js'
// or
const {buildAQL} = require('@hp4k1h5/aqlquerybuilder.js')
This has been tested for
- β node v17.3.0
- β typescript v4.9.4
for up-to-date documentation, run yarn doc && serve docs/
from the project directory root.
AQLqueryBuilder aims to provide cross-collection and cross-language, multi-key boolean search capabilities to the library's user.
Please ensure that the data you are trying to query against is indexed by an ArangoSearch View. The query will target all combinations of provided collections, analyzers and keys an simultaneously. This allows for granular multi-language search. This library is primarily built for academic and textual searches, and is ideally suited for documents like books, articles, and other text-heavy media.
AQLqueryBuilder works best as a document query tool. Leveraging ArangoSearch's built-in language stemming analyzers allows for complex search phrases to be run against any number of language-specific collections simultaneously.
Example:
import {buildAQL} from '@hp4k1h5/aqlquerybuilder.js'
const queryObject = {
"view": "the_arango-search_view-name",
"collections": [{
"name": "collection_name",
"analyzer": "analyzer_name",
"keys": ["text"]
}],
"terms": "+'query string' -for +parseQuery ?to parse"
}
const limit = {start:0, count: 20}
const aqlQuery = buildAQL(queryObject, limit)
// ... const cursor = await db.query(aqlQuery)
// ... const cursor = await db.query(buildAQL(queryObject, {start:20, count:20})
Generate documentation with yarn doc && serve docs/
or see more examples in tests/search.ts
buildAQL
accepts two parameters: query
and limit
query
is an object with the following keys:
β’ view: string (required): the name of the ArangoSearch view the query will be run against
β’ collections (required): an array of objects of the indexed collections to query. Objects have the following shape:
{
"name": "collection-name",
"analyzer": "analyzer_name",
"keys": ["text", "summary", "notes"]
}
keys
are optional, though if key names are provided to query.key
, and not all those keys are indexed by the collection, it is advisable to explicitly list only those keys on documents in that collection that are indexed by the analyzer. If a collection is indexed by multiple analyzers please list each collection-analyzer combination along with their relevant keys, unless a unified set of keys is provided to query.key
.
β’ terms (required): either an array of term
interfaces or a query string
to be parsed by parseQuery()
.
- term (optional): a JSON object representing the search.
- val string (required): a search term or phrase. For PHRASE
searches, include one "quoted phrase" per val. For TOKENS you may combine
multiple terms under a single analyzer or use one
term
per token. - op string (required): boolean operator; one of
+ ? -
, representingAND OR NOT
respectively - type string (required): one of
"phr"
for PHRASES or"tok"
for TOKENS.
- val string (required): a search term or phrase. For PHRASE
searches, include one "quoted phrase" per val. For TOKENS you may combine
multiple terms under a single analyzer or use one
β’ key (optional | default: "text"): the name of the Arango document key to search within.
β’ filters (optional): a list of filter interfaces. See arango FILTER operations for more details. All Arango operators Filters have the following shape:
{
"field": "the name of the field, i.e. Arango document key to filter on",
"op": "one of: == != < > <= >= IN NOT IN LIKE NOT LIKE =~ !~ ",
"val": "the "
}
an object with the following keys:
start
(integer) 0-based result pagination lower bound.count
(integer) total number of results to return.
to bring back up to the first 10 results
{"start":0, "count":10}
and the next page would be
{"start":10, "count":10}
{
"view": "the_arango-search_view-name",
"collections": [
{
"name":"collection_name",
"analyzer": "analyzer_name",
"keys": ["text", "summary", "notes"]
},
{
"name":"collection_es",
"analyzer": "analyzer_es",
"keys": ["texto", "resumen", "notas"]
}
],
"terms": "either a +query ?\"string for parseQuery to parse\" texto en espaΓ±ol",
"terms": [
{"type": "phr", "op": "?", "val": "\"!!! OR a list of query objects\""},
{"type": "phr", "op": "?", "val": "optional phrases"},
{"type": "tok", "op": "-", "val": "excluded tokens"},
{"type": "tok", "op": "+", "val": "mandatory tokens"}
],
"filters": [
{
"field": "field_name",
"op": ">",
"val": 0
}
]
}
Boolean operators form the basis of mathematical sets and database logic. They connect your search words together to either narrow or broaden your set of results. The three basic boolean operators are: AND, OR, and NOT.
from mit's Database Search Tips
- Mandatory terms and phrases. All results MUST INCLUDE these terms and phrases.
- Optional terms and phrases. If there are ANDS or NOTS, these serve as match score "boosters". If there are no ANDS or NOTS, ORS become required in results.
- Search results MUST NOT INCLUDE these terms and phrases. If a result that would otherwise have matched, contains one or more terms or phrases, it will not be included in the result set. If there are no required or optional terms, all results that do NOT match these terms will be returned.
for more information on boolean search logic see above
The default syntax accepted by buildAQL()
's query paramter key terms
when passing in a string, instead of a term
interface compatible array is as follows:
- Everything inside single or double quotes is considered a
PHRASE
- Everything else is considered a word to be analyzed by
TOKENS
- Every individual search word and quoted phrase may be optionally prefixed
by one of the following symbols
+ ? -
, or the plus-sign, the question-mark, and the minus-sign. If a word has no operator prefix, it is considered optional and is counted as anOR
.
Please see tests/parse.ts for more examples.
input one +two -"buckle my shoe"
and parseQuery()
will interpret that query string as follows:
ANDS | ORS | NOTS | |
---|---|---|---|
PHRASE | "buckle my shoe" | ||
TOKENS | two | one |
The generated AQL query, when run, will bring back only results that contain "two", that do not contain the phrase "buckle my shoe", and that optionally contain "one". In this case, documents that contain "one" will be likely to score higher than those that do not.
When the above phrase one +two -"buckle my shoe"
is run against the following documents:
ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ
β Document A β β Document B β β Document C β
β ---------- β β ---------- β β ---------- β
β β β three four β β one β
β one two β β β β β
β β β and two β β β
β buckle my shoeβ β β β β
ββββββββββββββββββββ ββββββββββββββββββββ ββββββββββββββββββββ
only Document B is returned;
Document A is excluded by the phrase "buckle my shoe"
Document C does not contain the mandatory word "two"
please see bugs
please see CONTRIBUTING