Weaviate
Graph-based storage system developed by SeMI Technologies. Supports semantic search and automatic classification.
Supports graph data structure for complex queries and relationships. Strong natural language understanding (NLU) capabilities for exceptional semantic search performance.
Knowledge graphs, semantic search.
2019
Mature with several production-level customers
Comprehensive documentation, good community support
High QPS support depending on configuration and optimization
8/10
Yes
Yes
Pinecone
Highly scalable vector search service provided by Pinecone Systems Inc., focused on supporting vector and metadata search.
Highly scalable service design, sub-second level vector search, especially suitable for applications requiring precise similarity matching and ranking.
Recommendation systems, personalized search.
2021
Emerging and rapidly developing
User-friendly, clear documentation
Designed for high performance, strong scalability
9/10
Yes
No
Pgvector on PostgreSQL
PostgreSQL extension supported by Supabase, allows vector search on the widely used database system.
As an extension of PostgreSQL, enables existing PostgreSQL users to easily integrate vector search, seamlessly combining relational data and vector search.
Integration into existing database applications.
2021
Newer, for PostgreSQL users
Integrates with PostgreSQL, moderate documentation
Performance dependent on PostgreSQL configuration
6/10
Yes (via cloud-provided PostgreSQL services)
Yes
Milvus
Open-source vector database developed by Zilliz, scalable, supports massive vector searches.
Open-source and scalable, specially optimized for real-time vector search of massive data, supports efficient retrieval of hundreds of millions of vectors, suitable for bioinformatics, image retrieval, etc.
Bioinformatics, image retrieval.
2019
Mature, active community
Easy to learn, rich documentation
High performance, optimized for real-time search of massive data
9/10
Yes
Yes
Qdrant
Open-source database supporting vector and metadata filtering, developed by the Qdrant team.
Offers filtering queries for vectors and metadata, supports complex condition combinations, widely applied in multimodal search and recommendation systems.
Multimodal search, recommendation systems.
2020
Newer but rapidly developing
Friendly API, documentation in progress
Performance optimized, supports high concurrency queries
8/10
Possible (via Kubernetes, etc.)
Yes
Chroma
Vector database focused on image search, specific development team information not detailed.
Focuses on vectorization and search of image content, optimized for image similarity search performance, suitable for visual content retrieval and management systems.
Image retrieval, visual search.
Not detailed
Information not detailed
Information not detailed
Optimization for image search might improve QPS in specific scenarios
Efficient combination of time-series data analysis and vector search.
Financial services, real-time monitoring.
Not detailed
KDB+ is a long-standing stable product
Friendly for KDB+ users, detailed documentation
Excellent performance in time-series data
8/10
Possible (needs self-configured cloud deployment)
No (base KDB+)
Elasticsearch
Full-text search engine, supports basic vector search, developed by Elastic N.V.
Mature search engine based on Lucene, provides powerful full-text search and data analysis capabilities, widely used in log analysis, full-text search, etc.
Log analysis, full-text search.
2010
Very mature, widely used in production
Very comprehensive documentation, strong community support
Mature and stable, but may not be as effective in pure vector search as specialized vector databases
7/10
Yes
Yes
Hyperspace
Index system extension for Apache Spark, developed by Microsoft.
Provides indexing for Apache Spark, significantly improving Spark's data processing and query performance, suitable for big data analysis and processing.
Accelerates Spark queries, performance gains depend on specific use cases
7/10
Possible (needs self-configured cloud deployment)
Yes
DataStax Astra/Apache Cassandra
Distributed NoSQL database. Apache Cassandra is maintained by the Apache Software Foundation; DataStax Astra is a cloud service based on Cassandra.
High availability and scalability, suitable for scenarios requiring large-scale data distributed storage and access, like online applications, big data platforms, etc.
Large-scale data management, real-time applications.
2008 (Cassandra)
Very mature, widely used
Learning curve for new users, comprehensive documentation
High scalability and fault tolerance
7/10
Yes
Yes (Apache Cassandra)
Vectara
Serverless, AI-driven search service provided by Vectara.
Provides a serverless AI-driven search service, simplifying deployment and management of search systems, making it convenient in enterprise search and content discovery.
Enterprise search, content discovery.
2021
Emerging, with growth potential
User-friendly interface, documentation in progress
Serverless architecture may provide high performance
8/10
Yes
?
CrateDB
Distributed SQL database, supports spatio-temporal data, developed by Crate.io.
Optimized for IoT and big data real-time analytics processing, supports SQL queries, suitable for applications needing fast data analysis and processing.
IoT, real-time analysis.
2014
Mature, focused on real-time analytics
Easy to learn, comprehensive documentation
Optimized for real-time analysis of IoT and big data
7/10
Yes
Yes
TimescaleDB
A PostgreSQL extension for time-series data, developed by Timescale, Inc.
Extends PostgreSQL to optimize the processing of time-series data, offers efficient data compression and partitioning, suitable for monitoring, IoT, etc.
Time series analysis, monitoring.
2017
Relatively mature, focused on time-series data
PostgreSQL compatible, easy to migrate and learn
Simplifies the management and analysis of time-series data
7/10
Yes (via cloud providers)
Yes
Deep Lake
A data lake for deep learning developed by Activeloop, optimized for AI workloads.
Designed for large-scale data sets, optimized for AI and machine learning workloads, supports complex data analysis and processing.
Vespa
Large-scale real-time computing engine developed by Yahoo, supports real-time indexing and querying of dynamic data.
Supports real-time indexing and querying of large-scale data and instant deployment of machine learning models, suitable for dynamic content recommendation and advertising systems.
Learning curve present but comprehensive documentation
Designed for real-time processing of large-scale data
9/10
Yes (self-configuration required)
Yes
Faiss
A high-efficiency similarity search library developed by Facebook AI Research, designed for deep learning applications.
Developed by Facebook AI Research, optimized for efficient similarity search in large datasets, especially suitable for deep learning applications.
Similarity item retrieval, deep learning.
2017
Endorsed by Facebook AI Research, good stability
Detailed API, supports multiple languages
Excellent performance in similarity search libraries
9/10
Possible (self-configured cloud deployment required)
Yes
Marqo
An open-source vector database for tensor search, developed by Marqo.
Simplifies the indexing and searching process of deep learning models, supports searching of various data types, making AI model and data retrieval more accessible and efficient.
Semantic search, document retrieval.
2022
Newer, growing community
Documentation improving, friendly API
Optimized for deep learning model search
8/10
Yes (can be deployed in cloud environments)
Yes
Google Vector Search
Vector search service provided by Google Cloud, utilizing Google's AI technology.
Utilizes Google Cloud infrastructure to provide high-performance vector search services, integrating advanced AI features, suitable for scenarios requiring large-scale, high-performance search solutions.
Integrated with Google Cloud services, comprehensive documentation
Utilizes Google infrastructure for excellent performance
9/10
Yes
No
Neo4jvector
Vector search extension for Neo4j, developed by Neo4j.
Combines the capabilities of Neo4j graph database with vector search functionality, enabling the use of vector similarity in graph queries, suitable for applications requiring both graph data and vector search.
Vector search in graph databases
Not detailed
Emerging, focused on graph database extension
Requires understanding of Neo4j, documentation improving
Combines the advantages of graph databases and vector search, performance rating depends on specific use cases
8/10
Possible (self-configured cloud deployment required)
Yes (base Neo4j)
Azure Cognitive Search
Search service on the Microsoft Azure platform
Integrates various Azure AI tools, offering rich data processing and search capabilities, suitable for scenarios requiring intelligent search solutions.
AI-enhanced search
Not detailed
Mature, Microsoft Azure product
Easy to use, integrates with Azure services
Integrates various Azure AI tools, enhancing search intelligence, excellent performance
8/10
Yes
No
Baidu Vector DB
Developed by Baidu, designed for AI applications
A high-performance vector retrieval system designed for AI and machine learning applications, supports efficient processing of large-scale data, suitable for AI application scenarios.
Vector retrieval
Not detailed
Baidu background, high stability expected
Requires adaptation to Baidu ecosystem, documentation primarily in Chinese
High performance, suitable for large-scale AI scenarios, performance depends on specific use cases
8/10
Yes
No
Redis
Open-source, supports multiple data structures, maintained by Redis Labs
Offers multiple data structure support and high-speed caching, suitable as a real-time application's data storage and processing backend.
High-performance key-value storage
2009
Very mature, widely used
Easy to use, rich documentation
Fast read/write, supports a rich set of data types, exceptionally high performance
9/10
Yes (via cloud providers)
Yes
DynamoDB
Fully managed database service provided by Amazon Web Services
Offers a highly scalable and fully managed NoSQL database solution, suitable for applications requiring high availability and large-scale data processing.
NoSQL database service
2012
Very mature, widely used
AWS integration, comprehensive documentation
High scalability, supports key-value and document data, high performance
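Several entries above (Pinecone and Qdrant, for example) describe combining vector similarity with metadata filtering. As a rough illustration of that idea, and not of any product's API, here is a minimal brute-force sketch in Python. The names `cosine_similarity`, `filtered_search`, and the sample `ITEMS` are hypothetical and chosen only for this example; production systems typically rely on approximate indexes rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def filtered_search(query, items, top_k=2, **required):
    """Keep items whose metadata matches every required key/value,
    then rank the survivors by cosine similarity to the query."""
    candidates = [
        item for item in items
        if all(item["metadata"].get(k) == v for k, v in required.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query, item["vector"]),
        reverse=True,
    )
    return [item["id"] for item in ranked[:top_k]]


# Hypothetical sample data: 2-dimensional embeddings with one metadata field.
ITEMS = [
    {"id": "a", "vector": [1.0, 0.0], "metadata": {"kind": "image"}},
    {"id": "b", "vector": [0.9, 0.1], "metadata": {"kind": "text"}},
    {"id": "c", "vector": [0.0, 1.0], "metadata": {"kind": "image"}},
    {"id": "d", "vector": [0.7, 0.7], "metadata": {"kind": "image"}},
]

results = filtered_search([1.0, 0.0], ITEMS, top_k=2, kind="image")
print(results)
```

Filtering before ranking keeps the example short; whether a given database filters before, during, or after the vector search is an implementation detail this sketch does not try to model.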
Sharing Research on Vector Databases
1. 65 Types of Vector Databases on the Market
Currently, not all information has been filled in. We welcome your help to provide more details.
Download or Edit Online: (https://docs.google.com/spreadsheets/d/18iQl6hUkXny7daCj3xO7u_OmWW2yZJtg/edit?usp=sharing&ouid=105396890618967809956&rtpof=true&sd=true)
2. Research Summary on Top 20 Vector Databases
Download or Edit Online (中文/English) : https://docs.google.com/spreadsheets/d/12Q9LRGrlLuyZC52m9nxNghT58Jdyzigv/edit?usp=sharing&ouid=105396890618967809956&rtpof=true&sd=true
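For readers who want to shortlist candidates from the summary, the sketch below shows one possible way to filter and rank rows by their rating. It assumes that the last two Yes/No values of each row mean cloud availability and open source, which is an interpretation and not something the summary states; `ROWS`, `parse_rating`, and `shortlist` are hypothetical names, and only a few rows are copied in.

```python
# A few rows copied from the summary: (name, rating, cloud, open source).
# The meaning of the last two columns is an assumption made for this sketch.
ROWS = [
    ("Pinecone", "9/10", "Yes", "No"),
    ("Pgvector on PostgreSQL", "6/10", "Yes (via cloud-provided PostgreSQL services)", "Yes"),
    ("Milvus", "9/10", "Yes", "Yes"),
    ("Qdrant", "8/10", "Possible (via Kubernetes, etc.)", "Yes"),
    ("Elasticsearch", "7/10", "Yes", "Yes"),
]


def parse_rating(text):
    """Turn a rating such as '8/10' into a float between 0 and 1."""
    score, _, scale = text.partition("/")
    return int(score) / int(scale)


def shortlist(rows, min_rating=0.8, open_source_only=True):
    """Return (name, rating) pairs that pass the filters, best rating first."""
    picked = []
    for name, rating, _cloud, open_source in rows:
        if open_source_only and not open_source.startswith("Yes"):
            continue
        value = parse_rating(rating)
        if value >= min_rating:
            picked.append((name, value))
    # sorted() is stable, so rows with equal ratings keep their table order.
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


names = [name for name, _ in shortlist(ROWS)]
print(names)
```

The ratings are the summary's own subjective scores, so a shortlist like this is only a starting point for a closer evaluation.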
Reference:
https://lakefs.io/blog/12-vector-databases-2023/
https://archive.is/cPx8g
https://superlinked.com/vector-db-comparison/
https://www.vecdbs.com/
https://milvus.io/docs/index.md
https://milvus.io/docs/v2.2.x/install-pymilvus.md