vector-embeddings

google

MUVERA: Making multi-vector retrieval as fast as single-vector search

MUVERA is a retrieval algorithm that reduces multi-vector retrieval to single-vector Maximum Inner Product Search (MIPS). It maps each multi-vector set to a Fixed Dimensional Encoding (FDE), which keeps accuracy close to that of models like ColBERT while running at the speed and scale of conventional single-vector search infrastructure. Because the FDEs are ordinary vectors, retrieval over massive datasets can use highly optimized geometric search techniques that were previously incompatible with multi-vector similarity measures.

## The Limitations of Multi-Vector Retrieval

Traditional models use a single embedding for an entire document. Multi-vector models generate an embedding for every token, which captures more semantic detail but adds significant overhead.

* Multi-vector representations greatly increase the number of embeddings, requiring more storage and processing power.
* Similarity is typically calculated using "Chamfer matching," a non-linear operation: for each query token, take the maximum similarity to any document token, then sum these maxima over all query tokens.
* Because Chamfer similarity is not a single dot product, it cannot directly use sublinear search algorithms and often requires expensive exhaustive comparisons.

## Fixed Dimensional Encodings (FDEs)

The core innovation of MUVERA is reducing each multi-vector set to a single fixed-length vector whose inner products approximate the multi-vector similarity.

* FDEs are single vectors designed so that their inner product closely approximates the original multi-vector Chamfer similarity.
* The transformation is "data-oblivious": the mapping does not need to be trained on, or adjusted for, specific datasets or changes in data distribution.
* Because every set of vectors is compressed into a fixed-length format, multi-vector data points can be stored and queried using existing single-vector indexing structures.

## The MUVERA Retrieval Pipeline

The algorithm is a multi-stage process that balances speed and precision through a retrieve-and-rerank architecture.

* **FDE Generation:** Query and document multi-vector sets are mapped into FDEs to capture essential similarity information.
* **MIPS-based Retrieval:** A standard MIPS solver indexes the document FDEs and rapidly identifies a set of likely candidates for a given query.
* **Re-ranking:** The initial candidates are re-scored with the original, exact Chamfer similarity so that the final ranking reflects the true multi-vector score.

MUVERA provides a practical framework for scaling high-accuracy multi-vector models to massive datasets without the latency penalty of exhaustive Chamfer scoring. Because it connects multi-vector semantic models to existing optimized search infrastructure, it fits into modern information retrieval systems with little change to the serving stack.
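
The pipeline described above can be illustrated with a small pure-Python sketch. It is not the reference implementation. It assumes a simplified FDE construction: random hyperplanes (SimHash) split the space into buckets, query vectors are summed per bucket, and document vectors are averaged per bucket. Refinements such as repetitions, inner projections, and filling empty buckets are left out. All function and parameter names (`chamfer`, `make_planes`, `fde`, `retrieve_and_rerank`, `num_candidates`) are hypothetical, and a brute-force scan stands in for a real MIPS solver.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def chamfer(query, doc):
    """Exact Chamfer similarity: for each query vector, take the best
    inner product against any document vector, then sum over the query."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def make_planes(num_planes, dim, seed=0):
    """Random hyperplanes drawn without looking at any data
    (this is what makes the mapping data-oblivious)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]


def bucket(vec, planes):
    """SimHash bucket id: one sign bit per hyperplane."""
    idx = 0
    for plane in planes:
        idx = (idx << 1) | (1 if dot(vec, plane) > 0 else 0)
    return idx


def fde(vectors, planes, is_query):
    """Simplified FDE (assumption): one dim-sized block per bucket.
    Query blocks hold the sum of the vectors in the bucket,
    document blocks hold their mean. Empty buckets stay zero."""
    dim = len(vectors[0])
    num_buckets = 1 << len(planes)
    sums = [[0.0] * dim for _ in range(num_buckets)]
    counts = [0] * num_buckets
    for v in vectors:
        b = bucket(v, planes)
        counts[b] += 1
        for i, x in enumerate(v):
            sums[b][i] += x
    out = []
    for b in range(num_buckets):
        scale = 1.0 if is_query or counts[b] == 0 else 1.0 / counts[b]
        out.extend(x * scale for x in sums[b])
    return out


def retrieve_and_rerank(query, docs, planes, num_candidates, top_k):
    """Stage 1: rank documents by FDE inner product (brute-force MIPS stand-in).
    Stage 2: re-rank the candidates with exact Chamfer similarity."""
    q_fde = fde(query, planes, is_query=True)
    # A real system would compute and index document FDEs once, offline.
    doc_fdes = [fde(d, planes, is_query=False) for d in docs]
    by_fde = sorted(range(len(docs)),
                    key=lambda i: dot(q_fde, doc_fdes[i]),
                    reverse=True)
    candidates = by_fde[:num_candidates]
    reranked = sorted(candidates,
                      key=lambda i: chamfer(query, docs[i]),
                      reverse=True)
    return reranked[:top_k]
```

In this sketch, `num_candidates` controls the trade-off the pipeline relies on: a larger candidate set makes the first stage less likely to miss a relevant document, at the cost of more exact Chamfer evaluations in the re-ranking stage. Setting it to the number of documents degenerates to exhaustive Chamfer search.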