line Jul 23, 2025

Milvus: Building a

vector-db high-availability embedding milvus real-time-recommendation approximate-nearest-neighbor chaos-testing

LINE VOOM transitioned its recommendation system from a batch-based offline process to a real-time infrastructure to solve critical content freshness issues. By adopting Milvus, an open-source vector database, the team enabled the immediate indexing and searching of new video content as soon as it is uploaded. This implementation ensures that time-sensitive posts are recommended to users without the previous 24-hour delay, significantly enhancing user engagement.

Limitations of the Legacy Recommendation System

The original system relied on daily offline batch processing for embedding generation and similarity searches.
New content, such as holiday greetings or trending sports clips, suffered from a "lack of immediacy," often taking up to a full day to appear in user feeds.
To improve user experience, the team needed to shift from offline candidate pools to an online system capable of real-time Approximate Nearest Neighbor (ANN) searches.
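The difference between a daily candidate pool and an online index can be illustrated with a minimal sketch. The class below is a hypothetical in-memory stand-in, not Milvus: it runs an exact cosine-similarity scan, which is the result an ANN index approximates at much lower cost. The item names and three-dimensional embeddings are invented for illustration.

```python
import math


class InMemoryVectorIndex:
    """Toy stand-in for an online vector index (exact search, not ANN)."""

    def __init__(self):
        self._vectors = {}  # item_id -> unit-normalized embedding

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("zero vector cannot be normalized")
        return [x / norm for x in vec]

    def insert(self, item_id, embedding):
        # The item becomes searchable as soon as this call returns,
        # with no wait for a nightly batch job.
        self._vectors[item_id] = self._normalize(embedding)

    def search(self, query, top_k=3):
        # Cosine similarity reduces to a dot product on unit vectors.
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in self._vectors.items()
        ]
        scored.sort(reverse=True)
        return [item_id for _, item_id in scored[:top_k]]


index = InMemoryVectorIndex()
index.insert("old_clip", [1.0, 0.0, 0.0])
index.insert("cooking_video", [0.0, 1.0, 0.0])

user_embedding = [0.9, 0.1, 0.0]
before = index.search(user_embedding, top_k=1)

# A new video is uploaded and indexed immediately.
index.insert("fresh_sports_clip", [0.9, 0.1, 0.0])
after = index.search(user_embedding, top_k=1)

print("before upload:", before)
print("after upload:", after)
```

A full scan like this grows linearly with the number of items, which is why a production system would rely on approximate index structures instead.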

Selecting Milvus as the Vector Database

The team evaluated Milvus and Qdrant based on performance, open-source status, and on-premise compatibility.
Milvus was selected for its stronger benchmark results: it handled 2,406 requests per second against Qdrant's 326, at lower query latency (1 ms vs. 4 ms).
Key architectural advantages of Milvus included the separation of storage and computing, support for both stream and batch inserts, and a diverse range of supported in-memory index types.
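To put the reported benchmark figures in proportion, the ratios can be worked out directly. This is simple arithmetic on the numbers quoted above; the variable names are illustrative, and the figures say nothing about hardware or dataset, which the summary does not state.

```python
# Figures as reported in the evaluation above.
milvus_rps, qdrant_rps = 2406, 326
milvus_latency_ms, qdrant_latency_ms = 1, 4

throughput_ratio = milvus_rps / qdrant_rps
latency_ratio = qdrant_latency_ms / milvus_latency_ms

print(f"Throughput: Milvus handled {throughput_ratio:.1f}x the requests per second")
print(f"Latency: Qdrant's query latency was {latency_ratio:.0f}x higher")
```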

Reliability Verification via Chaos Testing

Given the complexity of Milvus clusters, the team performed chaos testing by intentionally injecting failures like pod kills and scaling events.
Tests revealed critical vulnerabilities: killing QueryCoord caused the collection to be released and searches to fail, while losing etcd quorum caused total metadata loss.
These findings highlighted the need for robust high-availability (HA) configurations to prevent service interruptions during component failures.
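The quorum finding can be reasoned about with a small simulation. etcd is built on Raft, where a cluster of n members needs a majority (n // 2 + 1) to stay writable. The sketch below is not the team's chaos tooling: it assumes a three-member cluster (the source does not state the size) and models only whether quorum survives a given number of pod kills, not the metadata loss itself.

```python
def quorum_size(members: int) -> int:
    """Majority needed by a Raft-based cluster of the given size."""
    return members // 2 + 1


def has_quorum(total_members: int, alive_members: int) -> bool:
    return alive_members >= quorum_size(total_members)


def run_pod_kill_experiment(total_members: int):
    """Kill 0..n members in turn and record whether quorum survives."""
    results = []
    for killed in range(total_members + 1):
        alive = total_members - killed
        results.append((killed, has_quorum(total_members, alive)))
    return results


# Assumed cluster size for illustration only.
results = run_pod_kill_experiment(3)
for killed, ok in results:
    status = "quorum held" if ok else "QUORUM LOST"
    print(f"killed {killed} of 3 members: {status}")
```

Under this assumption the cluster tolerates one lost member but not two, which is why a chaos test that kills several pods at once can surface failures that single-pod kills never do.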

High Availability (HA) Implementation Strategies

Collection-Level HA: To prevent search failures during coordinator issues, the team implemented a dual-writing system where embeddings are recorded in two separate collections simultaneously.
Alias Switching: Client applications use an "alias" to reference collections; if the primary collection becomes unavailable, the system instantly switches the alias to the backup collection to minimize downtime.
Coordinator-Level HA: To eliminate single points of failure, coordinators (such as IndexCoord) were configured in Active-Standby mode, so a backup is always ready to take over management tasks.
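The dual-write and alias-switching strategies can be sketched together. Everything below is a hypothetical in-memory model, not the pymilvus API and not the team's implementation: `FakeCluster`, `FakeCollection`, `dual_write`, `failover_if_needed`, and the collection names are all invented for illustration. Milvus does provide collection aliases, but the calls here only imitate the idea.

```python
class FakeCollection:
    """Hypothetical stand-in for a vector collection."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.available = True

    def insert(self, item_id, embedding):
        if not self.available:
            raise RuntimeError(f"{self.name} is unavailable")
        self.rows[item_id] = embedding

    def contains(self, item_id):
        if not self.available:
            raise RuntimeError(f"{self.name} is unavailable")
        return item_id in self.rows


class FakeCluster:
    """Hypothetical cluster holding collections and alias mappings."""

    def __init__(self):
        self.collections = {}
        self.aliases = {}

    def create_collection(self, name):
        self.collections[name] = FakeCollection(name)

    def set_alias(self, alias, collection_name):
        self.aliases[alias] = collection_name

    def resolve(self, alias):
        return self.collections[self.aliases[alias]]


def dual_write(cluster, names, item_id, embedding):
    """Write to every collection; return the names that failed."""
    failed = []
    for name in names:
        try:
            cluster.collections[name].insert(item_id, embedding)
        except RuntimeError:
            failed.append(name)
    return failed


def failover_if_needed(cluster, alias, backup_name):
    """Repoint the alias at the backup if its current target is down."""
    if not cluster.resolve(alias).available:
        cluster.set_alias(alias, backup_name)
        return True
    return False


cluster = FakeCluster()
cluster.create_collection("videos_a")
cluster.create_collection("videos_b")
cluster.set_alias("videos", "videos_a")  # clients only ever see "videos"

dual_write(cluster, ["videos_a", "videos_b"], "clip_1", [0.1, 0.2, 0.3])

# Simulate the primary collection being released after a coordinator failure.
cluster.collections["videos_a"].available = False
switched = failover_if_needed(cluster, "videos", "videos_b")
found = cluster.resolve("videos").contains("clip_1")
print("switched:", switched, "| clip_1 still searchable:", found)
```

Because clients address the alias rather than a concrete collection, the switch requires no client-side change. The cost of the design is that every embedding is stored and indexed twice.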

Deploying a large-scale real-time recommendation engine successfully calls for two things: a vector database that decouples storage from compute, and multi-layered high-availability strategies, such as dual-collection writing and active-standby coordinators, to keep production stable.