Do I need a vector database for RAG?

Not necessarily. For small knowledge bases (under 100K chunks), pgvector or even in-memory search works fine. Dedicated vector databases become necessary at larger scale.

Which vector database is fastest?

Depends on your workload. Qdrant and Weaviate are among the fastest open-source options. Pinecone is fast with zero management overhead. Benchmark with your specific data and query patterns.

How much does a vector database cost?

pgvector is free (uses your existing Postgres). Managed services range from free tiers to hundreds/month depending on vector count and query volume. Self-hosted open-source costs are infrastructure only.

Vector Databases Explained Without the Hype

Vector Databases Explained Without the Hype covers essential concepts for understanding modern AI development
Practical implementation requires attention to infrastructure, data quality, and evaluation
The technology is evolving rapidly, making continuous learning essential
Start with small, well-defined use cases before scaling to production
Combining multiple approaches often yields better results than any single technique

What Is a Vector Database and Why Do You Need One?

A vector database is a specialized data store designed to handle vector embeddings — mathematical representations of data in high-dimensional space. Unlike traditional databases that search for exact matches or range queries, vector databases excel at similarity search: finding items that are conceptually close to a query, even when they share no exact keywords.

Vector databases have become essential infrastructure for AI applications. Every time you use semantic search, ask a chatbot a question, or get a recommendation, there’s a good chance a vector database is doing the heavy lifting behind the scenes.

How Do Vector Databases Store and Index Vectors?

The core challenge in vector databases is the “curse of dimensionality.” As the number of dimensions increases, traditional indexing methods break down. A vector embedding for text might have 768 or 1536 dimensions — far too many for conventional B-tree or hash indexes.

Vector databases solve this with Approximate Nearest Neighbor (ANN) algorithms. The most popular approaches include:

HNSW (Hierarchical Navigable Small World): Builds a multi-layer graph structure that enables roughly logarithmic-time search in practice. It offers excellent recall but uses significant memory.
IVF (Inverted File Index): Clusters vectors into groups, then only searches the most relevant clusters. More memory-efficient than HNSW but potentially slower.
PQ (Product Quantization): Compresses vectors by splitting them into sub-vectors and quantizing each one. Dramatically reduces memory usage at the cost of some accuracy.

Each algorithm makes different trade-offs between speed, memory, and accuracy. Most production vector databases support multiple index types so you can choose based on your workload.
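To make the product quantization idea concrete, here is a minimal sketch that encodes a 4-dimensional vector as two small integer codes. The codebooks are hand-written for illustration (an assumption; real systems learn them from the data, typically with k-means), and the function names are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pq_encode(vector, codebooks):
    """Replace each sub-vector with the index of its nearest codebook entry."""
    sub_len = len(vector) // len(codebooks)
    codes = []
    for m, codebook in enumerate(codebooks):
        sub = vector[m * sub_len:(m + 1) * sub_len]
        nearest = min(range(len(codebook)),
                      key=lambda j: squared_l2(sub, codebook[j]))
        codes.append(nearest)
    return codes

def pq_decode(codes, codebooks):
    """Rebuild an approximation of the vector from its codes."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx

# Hand-written toy codebooks: one per sub-vector, two entries each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # covers dimensions 0-1
    [[0.0, 1.0], [1.0, 0.0]],  # covers dimensions 2-3
]
vector = [0.9, 1.1, 0.8, 0.1]
codes = pq_encode(vector, codebooks)   # two small integers instead of four floats
approx = pq_decode(codes, codebooks)   # lossy reconstruction
```

The stored form is the list of codes, which is where the memory saving comes from; the decoded vector is close to the original but not identical, which is the accuracy cost.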

What’s the Difference Between a Vector Database and a Vector Index Library?

This is a common point of confusion. Libraries like FAISS (Facebook AI Similarity Search) provide low-level vector indexing operations. Vector databases like Pinecone, Weaviate, Qdrant, and Chroma use the same kinds of index, whether by wrapping an existing library or by shipping their own implementation, and add full database features: CRUD operations, filtering, metadata storage, replication, and backup.

For production applications, a vector database is almost always the right choice. It handles the operational complexity of keeping indexes fresh, managing concurrent queries, and scaling across multiple machines. Vector index libraries are better suited for batch processing and research workloads.

How Are Vector Databases Used in Production?

The most common production use case is RAG (Retrieval-Augmented Generation) pipelines. When a user asks a question, the system:

Embeds the question into a vector
Searches the vector database for similar document chunks
Retrieves the top-k results
Passes them to an LLM as context
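The four steps above can be sketched in a few lines. This is a toy under stated assumptions: `embed` is a stand-in that counts words from a tiny fixed vocabulary (a real pipeline would call an embedding model), the search is a plain sort rather than a vector database query, and the final LLM call is left out, so the sketch stops at building the prompt.

```python
import math

VOCAB = ["vector", "database", "index", "postgres", "cat"]  # toy vocabulary

def embed(text):
    """Stand-in for an embedding model: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k):
    """Steps 1-3: embed the question, rank chunks by similarity, keep top-k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)),
                    reverse=True)
    return ranked[:k]

def build_prompt(question, context_chunks):
    """Step 4: place the retrieved chunks in the prompt as context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

chunks = [
    "pgvector adds a vector index to postgres",
    "the cat sat on the mat",
    "a vector database stores embeddings",
]
question = "which postgres extension adds a vector index"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

In a real system, the chunk embeddings would be computed once at ingestion time and stored, and `retrieve` would become a query against the vector database.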

Beyond RAG, vector databases power recommendation engines (“find products similar to this one”), image search (“find visually similar images”), anomaly detection (“find data points that don’t fit”), and drug discovery (“find molecular structures similar to known compounds”).

The ecosystem has matured rapidly. Open-source options like Chroma and Qdrant offer local-first development. Managed services like Pinecone and Weaviate Cloud handle infrastructure at scale. The right choice depends on your throughput requirements, latency targets, and operational capabilities.

Vector databases have become one of the most hyped infrastructure categories in AI, with dozens of startups and established companies competing for attention. But beneath the marketing, the core concept is straightforward, and understanding it helps you make better decisions about whether and how to use one.

How Do Vector Databases Actually Work Under the Hood?

The fundamental operation in a vector database is approximate nearest neighbor (ANN) search. A vector embedding is a list of hundreds or thousands of floating-point numbers representing the semantic meaning of a piece of content. When you store embeddings, the database indexes them so that it can quickly find the ones most similar to a query vector. The key word is approximate: exact nearest neighbor search has to compare the query against every stored vector, which becomes too slow at scale, so vector databases typically trade a small amount of accuracy for large performance gains.

The most common indexing algorithm is HNSW (Hierarchical Navigable Small World graphs). HNSW builds a multi-layered graph structure where each layer is a sparser approximation of the layer below it. Search starts at the top layer, which holds only a small subset of the vectors and so covers the dataset at coarse granularity, and progressively descends to lower layers for finer-grained search around promising regions. In practice this gives roughly logarithmic search complexity, which is what turns a brute-force scan of millions of vectors into a lookup that finishes in milliseconds.
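The core move inside each HNSW layer is a greedy walk over a proximity graph. The sketch below shows that walk on a single hand-built layer only. It is not a full HNSW implementation: a real one repeats the walk layer by layer and keeps a list of candidates rather than a single current node. The graph and the function name are illustrative assumptions.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def greedy_search(query, vectors, neighbors, entry_point):
    """Move to whichever neighbor is closest to the query until none is closer."""
    current = entry_point
    while True:
        best = min(neighbors[current],
                   key=lambda n: squared_l2(query, vectors[n]))
        if squared_l2(query, vectors[best]) >= squared_l2(query, vectors[current]):
            return current  # no neighbor improves on the current node
        current = best

# Five points on a line, each linked to its immediate neighbors.
vectors = [[0.0], [1.0], [2.0], [3.0], [4.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

found = greedy_search([3.2], vectors, neighbors, entry_point=0)
```

The walk only touches nodes along its path rather than every stored vector, which is where the speed comes from. It can also stop at a local minimum on a poorly connected graph, which is one source of the "approximate" in ANN.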

Other indexing approaches include IVF (Inverted File Index), which partitions the vector space into clusters and only searches the most relevant clusters, and product quantization, which compresses vectors to reduce memory usage. The choice of algorithm depends on your specific trade-offs: HNSW offers the best search speed at the cost of higher memory usage, IVF is more memory-efficient but slower, and product quantization works well for extremely large datasets where memory is the primary constraint.

When Should You Choose a Dedicated Vector Database Over pgvector?

The most practical question for most teams is not which vector database vendor to choose but whether they need a dedicated vector database at all. pgvector, the vector similarity search extension for PostgreSQL, handles most workloads up to about one million vectors with perfectly acceptable performance. It integrates directly with your existing database, eliminating the operational overhead of running a separate system.

You should consider a dedicated vector database — options like Qdrant, Weaviate, Pinecone, or Milvus — when your use case hits specific thresholds. The most common triggers are: vector count exceeding one million, requiring sub-10 millisecond query latency at high throughput, needing advanced filtering capabilities that combine vector similarity with metadata filters, or deploying at a scale where the PostgreSQL query planner cannot optimize vector searches effectively.

How Should Teams Evaluate Vector Database Options?

Evaluation should start with your workload profile, not vendor features. How many vectors do you have now and how many will you have in six months? What latency requirements does your application need? Do you need hybrid search combining vector similarity with keyword matching? What is your budget for infrastructure?

Run your actual data through proof-of-concept tests rather than relying on published benchmarks. ANN benchmark results are useful for understanding algorithmic trade-offs but rarely reflect your specific data distribution, query patterns, and latency requirements. The vector database that performs best on benchmarks may not be the right choice for your workload.
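One number worth measuring in such a proof of concept is recall: how many of the true nearest neighbors the approximate index actually returns. A minimal sketch, assuming you already have the exact top-k (from a brute-force scan) and the approximate top-k for the same query; the ID lists here are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

exact = [7, 3, 9, 1, 4]    # ground truth from an exhaustive scan
approx = [7, 9, 3, 8, 4]   # what the ANN index returned

recall_5 = recall_at_k(approx, exact, k=5)
recall_3 = recall_at_k(approx, exact, k=3)
```

Averaging this over a sample of real queries, alongside latency measurements, gives a comparison grounded in your own data rather than in a published benchmark.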

For most teams starting with vector search, the recommended path is: start with pgvector for simplicity and low operational overhead. Monitor performance as your data grows. When you hit concrete performance bottlenecks, evaluate dedicated solutions against your specific workload. Most teams will not outgrow pgvector until they reach millions of vectors.

What Vector Databases Actually Do

A vector database stores and efficiently searches high-dimensional vectors. That is it. The vectors themselves are typically embeddings: numerical representations of text, images, or other data produced by machine learning models. The database's job is to find which stored vectors are most similar to a query vector.

Traditional databases excel at exact matches: find the row where id equals 42, or where name equals “Smith.” Vector databases excel at similarity matches: find the ten vectors most similar to this query vector, where similarity is measured by cosine distance, dot product, or Euclidean distance in high-dimensional space.
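The three similarity measures mentioned above are short enough to write out. A plain-Python sketch on two-dimensional toy vectors (real embeddings have hundreds of dimensions, and a database would use optimized native code):

```python
import math

def dot(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0 for the same direction, 1 for orthogonal."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [1.0, 0.0]
same_direction = [2.0, 0.0]   # points the same way, but twice as long
orthogonal = [0.0, 3.0]
```

Cosine distance ignores vector length, so `same_direction` is a perfect match for `query` under that metric even though its Euclidean distance is not zero. Which metric is appropriate depends on how the embedding model was trained.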

Why They Exist

The naive approach to similarity search — comparing your query vector against every stored vector — works fine for small datasets. With a thousand vectors of 1536 dimensions, brute-force search takes milliseconds. But with a million vectors, it takes seconds. With a billion vectors, it is impractical.
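The naive approach is simple enough to write in full. The sketch below uses random toy data with 8 dimensions and 1,000 vectors; both sizes are assumptions chosen so it runs quickly.

```python
import heapq
import random

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_top_k(query, vectors, k):
    """Exact search: score every stored vector, keep the k closest."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nsmallest(k, scored)]

random.seed(0)
dims = 8
vectors = [[random.random() for _ in range(dims)] for _ in range(1000)]

query = vectors[42]                       # reuse a stored vector as the query
nearest = brute_force_top_k(query, vectors, k=3)
```

Every query performs one distance computation per stored vector, so the work grows linearly with both the number of vectors and their dimensionality. That linear growth is what stops being acceptable at large scale.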

Vector databases use approximate nearest neighbor (ANN) algorithms to make similarity search fast at scale. They trade a small amount of accuracy (you might miss the absolute nearest neighbor occasionally) for dramatic speed improvements (milliseconds instead of seconds for million-vector datasets).

The two dominant ANN algorithms are:

HNSW (Hierarchical Navigable Small World): Builds a graph structure where similar vectors are connected. Search navigates this graph from a fixed entry point in the top layer toward the query vector. Fast queries, higher memory usage, good for datasets that fit in RAM.

IVF (Inverted File Index): Clusters vectors into groups and only searches the most relevant clusters for each query. Lower memory usage, slightly slower queries, better for very large datasets.
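A minimal IVF sketch shows both the speed-up and the approximation. The centroids here are fixed by hand (an assumption; real indexes learn them, typically with k-means), and `nprobe`, the number of clusters to search, is the usual name for that setting but is used here only as a local parameter.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign each vector to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: squared_l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Search only the nprobe clusters whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: squared_l2(query, centroids[c]))
    candidates = [i for c in ranked[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: squared_l2(query, vectors[i]))
    return candidates[:k]

centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [0.3, 0.1], [9.8, 10.1], [10.2, 9.9], [5.1, 5.1]]
lists = build_ivf(vectors, centroids)

query = [4.9, 4.9]
one_cluster = ivf_search(query, vectors, centroids, lists, k=1, nprobe=1)
two_clusters = ivf_search(query, vectors, centroids, lists, k=1, nprobe=2)
```

The query sits near a cluster boundary: searching one cluster misses the true nearest neighbor (vector 4), while searching both finds it. Raising `nprobe` buys accuracy at the cost of scanning more vectors.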

When You Need One

You need a vector database when:

You are building RAG and need to retrieve relevant documents from a large knowledge base
You are building semantic search (finding similar items by meaning rather than keywords)
You are building recommendation systems based on content similarity
You have more than roughly 100,000 vectors and need sub-second query times

You probably do not need a dedicated vector database when:

Your dataset is small (under 100K vectors) — pgvector or even brute-force search works fine
You only need keyword search — traditional full-text search is simpler and often better
You are prototyping — start with the simplest option and add complexity when scale demands it

The Options

pgvector (PostgreSQL Extension)

Adds vector similarity search to your existing PostgreSQL database. No new infrastructure, familiar SQL interface, supports both exact and approximate search.
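As a rough illustration of that SQL interface, the sketch below builds the statements as Python strings. The table name `chunks`, the 3-dimensional column, and the `%s` placeholder style (as used by psycopg-style drivers) are assumptions for the example. Executing the statements needs a PostgreSQL server with pgvector installed and a database driver, neither of which is shown, and the HNSW index type needs a pgvector version that supports it.

```python
def to_vector_literal(values):
    """Format a Python list as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

# Hypothetical schema: one row per document chunk.
CREATE_TABLE = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    body text,
    embedding vector(3)
);
"""

# Optional approximate index; without it the query below is an exact scan.
CREATE_INDEX = "CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);"

# <=> is pgvector's cosine distance operator.
NEAREST = "SELECT id, body FROM chunks ORDER BY embedding <=> %s::vector LIMIT 5;"

params = (to_vector_literal([0.1, 0.2, 0.3]),)
```

Because this is ordinary SQL, the same query can carry a `WHERE` clause or a join against other tables.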

Best for: teams already using PostgreSQL, datasets under a few million vectors, applications where you need to combine vector search with traditional SQL queries (filtering by metadata, joining with other tables).

Limitations: performance degrades at very large scale compared to purpose-built solutions, limited to what PostgreSQL can handle in terms of concurrent queries.

Pinecone (Managed Service)

Fully managed vector database with a simple API. No infrastructure to manage, automatic scaling, built-in filtering and metadata support.

Best for: teams that want zero infrastructure management, applications that need to scale without engineering effort, rapid prototyping that might grow to production.

Limitations: vendor lock-in, cost can be high at scale, less control over performance tuning.

Weaviate (Open Source)

Feature-rich open-source vector database with built-in vectorization, hybrid search (vector + keyword), and a GraphQL API.

Best for: teams that want open-source with rich features, applications needing hybrid search, teams comfortable managing their own infrastructure.

Limitations: more complex to operate than managed services, resource-intensive for large deployments.

Qdrant (Open Source)

High-performance open-source vector database written in Rust. Focuses on speed and efficiency with a clean REST API.

Best for: performance-sensitive applications, teams that want open-source with excellent performance characteristics, deployments where resource efficiency matters.

Limitations: smaller ecosystem than some competitors, fewer built-in integrations.

Chroma (Open Source)

Lightweight, developer-friendly vector database designed for AI applications. Runs in-process or as a server.

Best for: local development and prototyping, small to medium datasets, developers who want the simplest possible setup.

Limitations: not designed for very large scale, fewer production features than mature alternatives.

Choosing the Right Option

The decision tree is simpler than vendors want you to believe:

Already using PostgreSQL and dataset under 5M vectors? Use pgvector.
Want zero infrastructure management and budget allows? Use Pinecone.
Need open-source with rich features? Use Weaviate or Qdrant.
Just prototyping? Use Chroma or pgvector.
Need maximum performance at massive scale? Evaluate Qdrant, Weaviate, and Milvus with your specific workload.
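Written as code, the decision tree above is a short chain of checks evaluated in the order listed. The function and parameter names are hypothetical, and the thresholds are the rough ones from the list, not hard limits.

```python
def suggest_option(uses_postgres=False, vector_count=0, wants_managed=False,
                   wants_open_source=False, prototyping=False):
    """Walk the checklist top to bottom and return the first match."""
    if uses_postgres and vector_count < 5_000_000:
        return "pgvector"
    if wants_managed:
        return "Pinecone"
    if wants_open_source:
        return "Weaviate or Qdrant"
    if prototyping:
        return "Chroma or pgvector"
    # Nothing simpler fits: benchmark the dedicated options on your own workload.
    return "evaluate Qdrant, Weaviate, and Milvus"

small_team = suggest_option(uses_postgres=True, vector_count=200_000)
no_ops_team = suggest_option(wants_managed=True, vector_count=20_000_000)
```

The ordering matters: an existing PostgreSQL deployment under the size threshold short-circuits every other consideration.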

Performance Considerations

The factors that most affect vector database performance:

Dimensionality: Higher-dimensional vectors (1536 vs 384) require more storage and make search slower. Use the minimum dimensions that maintain quality for your use case.
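The storage side of this is simple arithmetic. A sketch assuming 4-byte float32 values and counting only the raw vectors (index structures and metadata add overhead on top):

```python
def raw_vector_bytes(count, dims, bytes_per_value=4):
    """Bytes needed to hold the raw vectors alone, assuming float32 by default."""
    return count * dims * bytes_per_value

large = raw_vector_bytes(1_000_000, 1536)   # about 6.1 GB
small = raw_vector_bytes(1_000_000, 384)    # about 1.5 GB
ratio = large / small
```

For the same number of vectors, 1536 dimensions takes four times the space of 384, and each distance computation also touches four times as many values.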

Index type: HNSW is faster for queries but uses more memory. IVF uses less memory but queries are slightly slower. Choose based on your memory budget and latency requirements.

Filtering: If you need to filter results by metadata (only search documents from a specific category), the database's ability to combine vector search with metadata filtering matters enormously. Some databases filter after the vector search (fast, but the top results may be filtered away, leaving fewer matches than you asked for); others filter before or during the search (every result matches the filter, but the query can be slower).
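A toy comparison of the two filtering orders on four items makes the difference visible. Exact sorting stands in for the vector index here (an assumption for clarity; a real database would use its ANN index), and the item layout and function names are hypothetical.

```python
def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def post_filter_search(query, items, k, category):
    """Take the k nearest overall, then drop those that fail the filter."""
    nearest = sorted(items, key=lambda it: squared_l2(query, it["vec"]))[:k]
    return [it["id"] for it in nearest if it["category"] == category]

def pre_filter_search(query, items, k, category):
    """Keep only matching items, then take the k nearest among them."""
    allowed = [it for it in items if it["category"] == category]
    allowed.sort(key=lambda it: squared_l2(query, it["vec"]))
    return [it["id"] for it in allowed[:k]]

items = [
    {"id": "a", "vec": [0.0, 0.0], "category": "news"},
    {"id": "b", "vec": [0.1, 0.0], "category": "news"},
    {"id": "c", "vec": [0.5, 0.5], "category": "docs"},
    {"id": "d", "vec": [2.0, 2.0], "category": "docs"},
]
query = [0.0, 0.1]

after_search = post_filter_search(query, items, k=2, category="docs")
before_search = pre_filter_search(query, items, k=2, category="docs")
```

Here the two nearest items overall are both in the wrong category, so filtering after the search returns nothing, while filtering first returns two matching items at the cost of scoring the whole filtered subset.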

Update frequency: If your data changes frequently, consider how the database handles index updates. Some require periodic re-indexing, others handle real-time updates efficiently.

Vector databases enable fast similarity search over high-dimensional embeddings using ANN algorithms
pgvector is sufficient for most teams starting out — no new infrastructure needed
You probably do not need a dedicated vector database until you exceed 100K-1M vectors
Choose based on scale, infrastructure preferences, and whether you need managed vs self-hosted
Dimensionality, index type, and filtering strategy are the key performance levers

The vector database market is noisy, but the underlying technology is well-understood. Start with the simplest option that meets your current needs, and migrate to something more specialized only when you have concrete evidence that your current solution is insufficient.

Vector databases use approximate nearest neighbor search (ANN), not exact matching — a speed vs accuracy trade-off
HNSW offers the best search speed; IVF is more memory-efficient; product quantization works for massive datasets
Start with pgvector for simplicity — most teams won’t need a dedicated database until >1M vectors
Evaluate with your actual data, not published benchmarks
Performance bottlenecks (latency, throughput, filtering complexity) drive the migration decision
