Vector Search
Vector search (also called vector similarity search or semantic search) is the technique of finding information by comparing embedding vectors rather than matching keywords. It represents a fundamental shift in how discovery works—from lexical matching to semantic understanding.
Traditional search engines match query terms against document terms. Vector search converts both queries and documents into high-dimensional vectors and finds the nearest neighbors in that space. A search for "affordable electric cars" would find content about "budget EVs" or "low-cost battery vehicles" even if those exact words never appear—because the concepts are close in embedding space.
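The "nearest in embedding space" idea above usually means cosine similarity: the angle between two vectors, ignoring their length. A minimal sketch, using made-up 4-dimensional toy vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the vectors' magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
query         = [0.9, 0.1, 0.3, 0.0]   # "affordable electric cars"
budget_evs    = [0.8, 0.2, 0.4, 0.1]   # "budget EVs"
pasta_recipes = [0.0, 0.9, 0.0, 0.8]   # unrelated document

print(cosine_similarity(query, budget_evs))    # high: concepts are close
print(cosine_similarity(query, pasta_recipes)) # low: concepts are far apart
```

Because "budget EVs" and "affordable electric cars" map to nearby vectors, the similarity score is high even though the strings share no words.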
Vector Databases
The infrastructure supporting vector search has matured into a distinct category: the vector database. These systems are purpose-built to store, index, and query billions of high-dimensional vectors with millisecond latency. Unlike traditional databases optimized for exact-match lookups or range queries, vector databases use approximate nearest neighbor (ANN) techniques—HNSW graphs, IVF partitioning, product quantization—to find semantically similar items without scanning every record.
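The IVF idea can be sketched in a few lines: partition vectors into buckets around centroids, then at query time scan only the buckets nearest the query. This is a toy illustration with hard-coded 2-D centroids; real engines learn centroids with k-means and typically combine IVF with quantization and much larger probe counts:

```python
import math

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Inverted file (IVF): bucket every vector under its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))
        lists[nearest].append(vid)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe=1):
    """Approximate search: scan only the nprobe nearest buckets, not all vectors."""
    probed = sorted(range(len(centroids)), key=lambda i: dist(query, centroids[i]))[:nprobe]
    candidates = [vid for c in probed for vid in lists[c]]
    return min(candidates, key=lambda vid: dist(query, vectors[vid]))

# Toy 2-D dataset with two obvious clusters and hypothetical pre-computed centroids.
vectors = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
centroids = [[0.15, 0.15], [0.85, 0.85]]
lists = build_ivf(vectors, centroids)
print(ivf_search([0.88, 0.85], vectors, centroids, lists))  # → 2
```

The speed/accuracy trade-off is visible in `nprobe`: probing one bucket skips most of the data but can miss a neighbor that fell into an unprobed bucket; probing every bucket degenerates into exact search.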
The landscape includes both purpose-built vector databases and traditional databases that have added vector capabilities:
- Purpose-built: Pinecone (fully managed, serverless), Weaviate (open-source, hybrid search), Qdrant (Rust-based, high performance), Milvus (CNCF graduated, distributed), and Chroma (lightweight, developer-friendly)
- Vector extensions to existing databases: PostgreSQL via pgvector, MongoDB Atlas Vector Search, Elasticsearch dense vector fields, and Redis Vector Similarity Search
The choice between a purpose-built database and an extension depends on scale and architecture. Purpose-built vector databases typically deliver higher query throughput at billion-vector scale, while extensions let teams add semantic search without introducing a new system into their stack.
The Embeddings Foundation
Vector search is only as good as its embeddings—the numerical representations that capture semantic meaning. Modern embedding models from OpenAI, Cohere, Google, and open-source projects like Sentence Transformers convert text, images, audio, and code into dense vectors (typically 768–3072 dimensions). The quality of these embeddings has improved dramatically: state-of-the-art models now capture nuanced relationships between concepts, including negation, analogy, and domain-specific jargon.
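One practical detail worth knowing: many embedding models ship vectors that are (or can be) L2-normalized to unit length, and for unit vectors the plain dot product equals cosine similarity, letting indexes use the cheaper operation. A minimal sketch with toy 2-D vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length (L2 norm = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = normalize([3.0, 4.0])
b = normalize([4.0, 3.0])
# For unit-length vectors, the dot product IS the cosine similarity,
# since the norms in the denominator are both 1.
print(dot(a, b))
```

This is why vector databases often ask which distance metric to index under: with normalized embeddings, inner product, cosine, and (monotonically) Euclidean distance all rank neighbors the same way.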
Embeddings also couple vector search tightly to LLMs, creating a reinforcing loop: better language models produce better embeddings, which produce better retrieval, which in turn grounds LLM responses more reliably via RAG.
Applications and the Agentic Web
Vector search is the enabling technology behind Retrieval-Augmented Generation (RAG)—the dominant architecture for grounding LLM responses in specific knowledge bases. It powers AI search engines like Perplexity, product recommendation systems, content similarity matching, and the GEO landscape where AI systems discover and cite relevant content. As the agentic web matures, vector search becomes the "memory layer" that gives AI agents access to relevant context from massive data stores.
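The RAG retrieval step described above reduces to: embed the query, rank documents by similarity, and paste the top hits into the prompt. In this sketch, `toy_embed` is a hashed bag-of-words stand-in for a real embedding model (which is what a production system would call instead), so it illustrates the plumbing rather than true semantic matching:

```python
import math
from collections import Counter

def toy_embed(text, dims=64):
    """Toy stand-in for an embedding model: hashed bag-of-words, unit-normalized."""
    vec = [0.0] * dims
    for word, count in Counter(text.lower().split()).items():
        vec[hash(word) % dims] += count
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def top_k(query, corpus, k=2):
    """Retrieve the k documents whose embeddings are nearest the query's."""
    q = toy_embed(query)
    scored = sorted(corpus, key=lambda d: -sum(a * b for a, b in zip(q, toy_embed(d))))
    return scored[:k]

corpus = [
    "Budget EVs are affordable electric vehicles",
    "Slow-cooker pasta recipes",
    "Affordable electric cars compared",
]

# Retrieve context, then ground the LLM prompt in it.
context = top_k("affordable electric cars", corpus, k=2)
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)
```

Swapping `toy_embed` for a real model and `top_k` for a vector database query yields the standard RAG pipeline; the agent-memory use case is the same loop run against much larger stores.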