This repository contains the code to enable semantic search on the Voxel51 documentation from Python or the command line. The search is powered by FiftyOne, OpenAI's text-embedding-ada-002 model, and Qdrant vector search.
- 2021-06-14: The fiftyone-docs-search package has been updated in the following ways:
  - FiftyOne documentation embeddings have been updated to FiftyOne 0.21.0.
  - Splitting of documents is simplified and more robust. LangChain splitters are used in conjunction with our custom Markdown parsing.
  - The block_type argument has been removed to make search results more robust.
- Clone the repository:
git clone https://github.com/voxel51/fiftyone-docs-search
cd fiftyone-docs-search
- Install the package:
pip install -e .
- Register your OpenAI API key (create one if you don't have one yet):
export OPENAI_API_KEY=XXXXXXXX
- Launch a Qdrant server:
docker pull qdrant/qdrant
docker run -d -p 6333:6333 qdrant/qdrant
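Before running a search, it can help to confirm that both prerequisites above are in place. The following is a minimal sketch using only the Python standard library. It assumes Qdrant is listening on localhost:6333, as in the docker run command above. The helper names openai_key_is_set and qdrant_is_reachable are hypothetical and not part of fiftyone-docs-search. A successful TCP connection only shows that something is listening on the port, not that it is Qdrant.

```python
import os
import socket


def openai_key_is_set(env_var: str = "OPENAI_API_KEY") -> bool:
    """Hypothetical helper: True if the API key variable is set and non-blank."""
    return bool(os.environ.get(env_var, "").strip())


def qdrant_is_reachable(host: str = "localhost", port: int = 6333, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port can be opened.

    Assumes Qdrant's default port 6333, as published by the docker run command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unresolvable host, ...
        return False


if __name__ == "__main__":
    print("OPENAI_API_KEY set:", openai_key_is_set())
    print("Qdrant reachable on localhost:6333:", qdrant_is_reachable())
```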
The fiftyone-docs-search package provides a command line interface for searching the Voxel51 documentation. To use it, run:
fiftyone-docs-search query <query>
where <query> is the search query. For example:
fiftyone-docs-search query "how to load a dataset"
The following flags can give you control over the search behavior:
- --num_results: the number of results to return
- --open_url: whether to open the top result in your browser
- --score: whether to return the score of each result
- --doc_types: the types of docs to search over (e.g., "tutorials", "api", "guides")
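If you want to drive the CLI from a script, you can assemble the command as an argument list. This is a sketch under assumptions. The flag names come from the list above, but how each flag takes its value is a guess: here --num_results takes an integer, --open_url and --score are bare switches, and --doc_types takes space-separated values. Check fiftyone-docs-search --help before relying on it. The helpers build_query_command and run_query are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_query_command(
    query: str,
    num_results: Optional[int] = None,
    open_url: bool = False,
    score: bool = False,
    doc_types: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper that builds an argv list for the search CLI.

    ASSUMPTION: the value/switch syntax below is a guess; confirm it with
    `fiftyone-docs-search --help`.
    """
    cmd = ["fiftyone-docs-search", "query", query]
    if num_results is not None:
        cmd += ["--num_results", str(num_results)]
    if open_url:
        cmd.append("--open_url")
    if score:
        cmd.append("--score")
    if doc_types:
        cmd += ["--doc_types", *doc_types]
    return cmd


def run_query(query: str, **kwargs) -> str:
    """Run the CLI and return its stdout (requires the package to be installed)."""
    if shutil.which("fiftyone-docs-search") is None:
        raise RuntimeError("fiftyone-docs-search is not on PATH")
    cmd = build_query_command(query, **kwargs)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Passing the query as a single list element, rather than through a shell string, avoids quoting problems with multi-word queries.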
You can also use the --help flag to see all available options:
fiftyone-docs-search --help
If you find fiftyone-docs-search query cumbersome, you can alias the command by adding the following to your ~/.bashrc or ~/.zshrc file:
alias fosearch='fiftyone-docs-search query'
The fiftyone-docs-search package also provides a Python API for searching the Voxel51 documentation. To use it, run:
from fiftyone.docs_search import FiftyOneDocsSearch
fods = FiftyOneDocsSearch()
results = fods("how to load a dataset")
You can set defaults for the search behavior by passing arguments to the constructor:
fods = FiftyOneDocsSearch(
    num_results=5,
    open_url=True,
    score=True,
    doc_types=["tutorials", "api", "guides"],
)
For any individual search, you can override these defaults by passing the corresponding arguments to the call.
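The defaults-then-overrides behavior can be pictured as a dictionary merge in which per-call arguments win. The class below is a hypothetical stand-in that only illustrates this behavior. It is not FiftyOneDocsSearch, and whether the real class merges arguments this way internally is an assumption.

```python
from typing import Any, Dict


class DefaultsThenOverrides:
    """Hypothetical stand-in: constructor defaults, overridden per call.

    This is NOT FiftyOneDocsSearch; it only mimics the documented behavior.
    """

    def __init__(self, **defaults: Any) -> None:
        self.defaults: Dict[str, Any] = dict(defaults)

    def __call__(self, query: str, **overrides: Any) -> Dict[str, Any]:
        # Later keys win, so per-call arguments take precedence over defaults.
        # The stored defaults are left unchanged for subsequent calls.
        settings = {**self.defaults, **overrides}
        return {"query": query, **settings}
```

With the real API, the equivalent would presumably be fods("how to load a dataset", num_results=2), assuming the call accepts the same keyword arguments as the constructor.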
The fiftyone-docs-search package is versioned to match the version of the Voxel51 FiftyOne documentation that it is searching. For example, the v0.20.1 version of the fiftyone-docs-search package is designed to search the v0.20.1 version of the Voxel51 FiftyOne documentation.
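Because the package and documentation versions are meant to match, you may want to compare what you have installed. This minimal sketch reads installed versions with importlib.metadata and compares major.minor.patch. It assumes the distribution names are fiftyone-docs-search and fiftyone, and that you want to search the docs for the FiftyOne version you have installed. Pre-release and dev suffixes are deliberately ignored. The helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Leading "major.minor.patch", with an optional "v" prefix; suffixes are ignored.
_CORE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_core(version: str) -> Optional[Tuple[int, ...]]:
    """Extract (major, minor, patch) as ints, or None if it doesn't parse."""
    m = _CORE.match(version.strip())
    return tuple(int(g) for g in m.groups()) if m else None


def same_release(a: str, b: str) -> bool:
    """True if both versions parse and share major.minor.patch."""
    ca, cb = release_core(a), release_core(b)
    return ca is not None and ca == cb


if __name__ == "__main__":
    # ASSUMPTION: these are the distribution names used by pip.
    docs_search = installed_version("fiftyone-docs-search")
    fiftyone = installed_version("fiftyone")
    print("fiftyone-docs-search:", docs_search, "| fiftyone:", fiftyone)
    if docs_search and fiftyone and not same_release(docs_search, fiftyone):
        print("Versions differ; results may not match your FiftyOne version.")
```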
By default, if you do not have a Qdrant collection instantiated yet when you run a search, the fiftyone-docs-search package will automatically download a JSON file containing a vector index of the latest version of the Voxel51 FiftyOne documentation.
You can also build the index yourself from a local copy of the Voxel51 FiftyOne documentation. To do so, first clone the FiftyOne repo if you haven't already:
git clone https://github.com/voxel51/fiftyone
Then install FiftyOne, as described in FiftyOne's detailed installation instructions.
From the root of the FiftyOne repo, build a local version of the docs by running:
bash docs/generate_docs.bash
Then, set the FIFTYONE_DIR environment variable to the path of your local FiftyOne repo. For example, if you cloned the repo to ~/fiftyone, you would run:
export FIFTYONE_DIR=~/fiftyone
Finally, run the following command to build the index:
fiftyone-docs-search create
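Since the create command relies on FIFTYONE_DIR, a quick sanity check on that variable can catch a wrong path early. This is a hypothetical helper, not part of fiftyone-docs-search. It treats the presence of docs/generate_docs.bash (the build script used above) as a heuristic for a FiftyOne checkout, and it does not verify that the docs were actually built.

```python
import os
from pathlib import Path


def resolve_fiftyone_dir(env_var: str = "FIFTYONE_DIR") -> Path:
    """Hypothetical helper: return FIFTYONE_DIR as an absolute Path.

    Raises RuntimeError if the variable is unset or does not look like a
    FiftyOne checkout (heuristic: docs/generate_docs.bash must exist).
    """
    raw = os.environ.get(env_var)
    if not raw:
        raise RuntimeError(f"{env_var} is not set")
    # expanduser() handles a quoted "~/fiftyone" that the shell did not expand.
    path = Path(raw).expanduser().resolve()
    if not (path / "docs" / "generate_docs.bash").is_file():
        raise RuntimeError(f"{path} does not look like a FiftyOne checkout")
    return path
```

For example, call resolve_fiftyone_dir() in a script before invoking fiftyone-docs-search create; it raises RuntimeError if the variable is unset or points somewhere unexpected.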
If you would like to save the Qdrant index to JSON, you can run:
fiftyone-docs-search save -o <path to JSON file>
Contributions are welcome!
If you've made it this far, we'd greatly appreciate it if you'd take a moment to check out FiftyOne and give us a star!
FiftyOne is an open source library for building high-quality datasets and computer vision models. It's the engine that powers this project.
Thanks for visiting! 😊
If you want to join a fast-growing community of engineers, researchers, and practitioners who love computer vision, join the FiftyOne Slack community! 🚀🚀🚀