Milvus: Building a
LINE VOOM transitioned its recommendation system from a batch-based offline process to a real-time infrastructure to solve critical content freshness issues. By adopting Milvus, an open-source vector database, the team enabled the immediate indexing and searching of new video content as soon as it is uploaded. This implementation ensures that time-sensitive posts are recommended to users without the previous 24-hour delay, significantly enhancing user engagement.
Limitations of the Legacy Recommendation System
- The original system relied on daily offline batch processing for embedding generation and similarity searches.
- New content, such as holiday greetings or trending sports clips, suffered from a "lack of immediacy," often taking up to a full day to appear in user feeds.
- To improve user experience, the team needed to shift from offline candidate pools to an online system capable of real-time Approximate Nearest Neighbor (ANN) searches.
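The difference between a daily offline candidate pool and an online search can be sketched with a toy in-memory index. This is only an illustration under stated assumptions: `InMemoryVectorIndex`, the post IDs, and the three-dimensional embeddings are all hypothetical, and the search is an exact brute-force cosine scan rather than the approximate (ANN) index a vector database would use.

```python
import math


class InMemoryVectorIndex:
    """Toy stand-in for an online vector index (exact search, not ANN)."""

    def __init__(self):
        self._vectors = {}  # post_id -> unit-normalized embedding

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("vector must be non-zero")
        return [x / norm for x in vector]

    def insert(self, post_id, embedding):
        # Normalizing at insert time reduces cosine similarity to a dot product.
        self._vectors[post_id] = self._normalize(embedding)

    def search(self, query, k=1):
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), post_id)
            for post_id, vec in self._vectors.items()
        ]
        scored.sort(reverse=True)  # highest similarity first
        return [post_id for _, post_id in scored[:k]]


index = InMemoryVectorIndex()
index.insert("cooking_clip", [0.9, 0.1, 0.0])
index.insert("travel_vlog", [0.1, 0.9, 0.1])

user_vector = [0.0, 0.2, 0.9]  # hypothetical user-interest embedding
before = index.search(user_vector, k=1)

# A freshly uploaded post is searchable right after insertion,
# with no wait for the next daily batch job.
index.insert("live_sports_clip", [0.0, 0.1, 0.95])
after = index.search(user_vector, k=1)

print(before, after)
```

In a batch design, the new clip would only enter the candidate pool after the next scheduled run; here it becomes the top result for the very next query.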
Selecting Milvus as the Vector Database
- The team evaluated Milvus and Qdrant based on performance, open-source status, and on-premise compatibility.
- Milvus was selected due to its superior performance, handling 2,406 requests per second compared to Qdrant's 326 (roughly 7.4 times the throughput), with lower query latency (1 ms vs. 4 ms).
- Key architectural advantages of Milvus included the separation of storage and computing, support for both stream and batch inserts, and a diverse range of supported in-memory index types.
Reliability Verification via Chaos Testing
- Given the complexity of Milvus clusters, the team performed chaos testing by intentionally injecting failures like pod kills and scaling events.
- Tests revealed critical vulnerabilities: killing the Querycoord led to collection release and search failure, while losing the Etcd quorum caused total metadata loss.
- These findings highlighted the need for robust high-availability (HA) configurations to prevent service interruptions during component failures.
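The shape of such a chaos suite can be sketched as a loop that injects one fault at a time into a fresh environment and then probes search availability. The sketch below is a toy model, not Milvus and not the team's actual test harness: `ToyCluster`, the fault functions, the three-member Etcd setup, and the collection name `posts` are all assumptions, and the model encodes only the two failure modes reported above (with metadata loss simplified to a failed search).

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model (NOT Milvus) encoding only the two reported failure modes."""
    querycoord_alive: bool = True
    etcd_members_alive: int = 3   # assumed cluster size
    etcd_members_total: int = 3
    loaded_collections: set = field(default_factory=lambda: {"posts"})

    def has_etcd_quorum(self):
        # A quorum is a strict majority of the members.
        return self.etcd_members_alive > self.etcd_members_total // 2

    def search(self, collection):
        # Search needs reachable metadata and a loaded collection.
        return self.has_etcd_quorum() and collection in self.loaded_collections


def kill_querycoord(cluster):
    cluster.querycoord_alive = False
    cluster.loaded_collections.clear()  # models the observed collection release


def kill_one_etcd_member(cluster):
    cluster.etcd_members_alive -= 1  # 2 of 3 left: quorum holds


def kill_two_etcd_members(cluster):
    cluster.etcd_members_alive -= 2  # 1 of 3 left: quorum lost


def run_chaos_suite(faults):
    """Inject each fault into a fresh cluster and record whether search works."""
    report = {}
    for fault in faults:
        cluster = ToyCluster()
        fault(cluster)
        report[fault.__name__] = cluster.search("posts")
    return report


report = run_chaos_suite(
    [kill_querycoord, kill_one_etcd_member, kill_two_etcd_members]
)
for name, ok in report.items():
    print(f"{name}: search {'OK' if ok else 'FAILED'}")
```

Starting each fault from a fresh cluster keeps the results independent, so a failure can be attributed to a single injected fault.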
High Availability (HA) Implementation Strategies
- Collection-Level HA: To prevent search failures during coordinator issues, the team implemented a dual-writing system where embeddings are recorded in two separate collections simultaneously.
- Alias Switching: Client applications use an "alias" to reference collections; if the primary collection becomes unavailable, the system instantly switches the alias to the backup collection to minimize downtime.
- Coordinator-Level HA: To eliminate single points of failure, coordinators (such as Indexcoord) were configured in an Active-Standby mode, ensuring a backup is always ready to take over management tasks.
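The dual-writing and alias-switching strategies described above can be sketched together in a few lines. This is a minimal in-memory illustration, not the team's implementation and not the Milvus client API: `ToyCollection`, `DualWriteStore`, and the collection names `posts_a` and `posts_b` are hypothetical, and the alias is modeled as a plain attribute that names the collection currently serving reads.

```python
class ToyCollection:
    """Hypothetical stand-in for one vector collection."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.rows = {}

    def insert(self, post_id, embedding):
        if not self.available:
            raise RuntimeError(f"{self.name} is unavailable")
        self.rows[post_id] = embedding

    def count(self):
        if not self.available:
            raise RuntimeError(f"{self.name} is unavailable")
        return len(self.rows)


class DualWriteStore:
    """Writes go to both collections; reads go through a switchable alias."""

    def __init__(self, primary, backup):
        self.collections = {primary.name: primary, backup.name: backup}
        self.backup_name = backup.name
        self.alias_target = primary.name  # the alias initially points at the primary

    def write(self, post_id, embedding):
        # Dual write: record every embedding in both collections.
        # (A real system would also need to backfill a collection that was down.)
        for collection in self.collections.values():
            if collection.available:
                collection.insert(post_id, embedding)

    def read_count(self):
        try:
            return self.collections[self.alias_target].count()
        except RuntimeError:
            # Primary is down: repoint the alias at the backup and retry once.
            self.alias_target = self.backup_name
            return self.collections[self.alias_target].count()


primary = ToyCollection("posts_a")
backup = ToyCollection("posts_b")
store = DualWriteStore(primary, backup)

store.write("post_1", [0.1, 0.2])
store.write("post_2", [0.3, 0.4])

primary.available = False       # simulate the primary being released
served = store.read_count()     # transparently served by the backup
print(served, store.alias_target)
```

Because clients only ever reference the alias, the failover requires no client-side configuration change; the cost is doubled write volume and storage.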
To successfully deploy a large-scale real-time recommendation engine, it is critical to select a vector database that decouples storage from compute and to implement multi-layered high-availability strategies, such as dual-collection writing and active-standby coordinators, to ensure production stability.