LINE VOOM transitioned its recommendation system from a batch-based offline process to real-time infrastructure to solve critical content-freshness issues. By adopting Milvus, an open-source vector database, the team made new video content indexable and searchable the moment it is uploaded. This ensures that time-sensitive posts are recommended to users without the previous 24-hour delay, significantly improving engagement.

Limitations of the Legacy Recommendation System

  • The original system relied on daily offline batch processing for embedding generation and similarity searches.
  • New content, such as holiday greetings or trending sports clips, suffered from a "lack of immediacy," often taking up to a full day to appear in user feeds.
  • To improve user experience, the team needed to shift from offline candidate pools to an online system capable of real-time Approximate Nearest Neighbor (ANN) searches.
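The core behavioral difference the team was after — content becoming searchable the instant it is inserted, with no batch window — can be illustrated with a deliberately simplified in-memory sketch. Brute-force cosine similarity stands in for Milvus's ANN indexes here, and all names (`RealTimeIndex`, the post IDs) are hypothetical:

```python
import math

class RealTimeIndex:
    """Toy stand-in for an online vector index: items become
    searchable immediately after insert, with no daily batch delay."""

    def __init__(self):
        self._vectors = {}  # post_id -> embedding

    def insert(self, post_id, embedding):
        self._vectors[post_id] = embedding  # visible to search at once

    def search(self, query, top_k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb)

        scored = sorted(self._vectors.items(),
                        key=lambda kv: cosine(query, kv[1]),
                        reverse=True)
        return [post_id for post_id, _ in scored[:top_k]]

index = RealTimeIndex()
index.insert("older_post", [0.1, 0.9])
index.insert("holiday_greeting", [0.9, 0.1])  # just uploaded
print(index.search([1.0, 0.0], top_k=1))      # ['holiday_greeting']
```

In the legacy setup, `holiday_greeting` would only enter the candidate pool after the next day's batch run; in the online setup it is a search candidate immediately.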

Selecting Milvus as the Vector Database

  • The team evaluated Milvus and Qdrant based on performance, open-source status, and on-premise compatibility.
  • Milvus was selected for its superior performance: it handled 2,406 requests per second versus Qdrant's 326, with lower query latency (1 ms vs. 4 ms).
  • Key architectural advantages of Milvus included the separation of storage and computing, support for both stream and batch inserts, and a diverse range of supported in-memory index types.
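To make the "diverse range of in-memory index types" concrete, the following dicts show index parameters in the shape Milvus's `create_index()` accepts. `HNSW` and `IVF_FLAT` are real Milvus index types and the parameter names are real, but the specific values and the `pick_index` helper are illustrative, not tuned or taken from the article:

```python
# Index parameters in the shape Milvus's create_index() accepts.
HNSW_PARAMS = {
    "index_type": "HNSW",      # graph-based in-memory index, low query latency
    "metric_type": "IP",       # inner-product similarity
    "params": {"M": 16, "efConstruction": 200},  # illustrative values
}

IVF_FLAT_PARAMS = {
    "index_type": "IVF_FLAT",  # cluster-based index, cheaper to build
    "metric_type": "IP",
    "params": {"nlist": 1024},
}

def pick_index(latency_critical: bool) -> dict:
    """Hypothetical helper: prefer HNSW when query latency dominates."""
    return HNSW_PARAMS if latency_critical else IVF_FLAT_PARAMS

print(pick_index(True)["index_type"])  # HNSW
```

For a real-time recommendation feed, a graph-based index such as HNSW is the typical choice because per-query latency matters more than index build cost.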

Reliability Verification via Chaos Testing

  • Given the complexity of Milvus clusters, the team performed chaos testing by intentionally injecting failures like pod kills and scaling events.
  • Tests revealed critical vulnerabilities: killing the QueryCoord led to collections being released and searches failing, while losing the etcd quorum caused total metadata loss.
  • These findings highlighted the need for robust high-availability (HA) configurations to prevent service interruptions during component failures.
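The failure modes uncovered by chaos testing can be sketched with a toy cluster model. Component names mirror Milvus (QueryCoord, etcd), but the behavior is a drastic simplification for illustration only; real chaos testing injects faults into actual pods:

```python
class MiniCluster:
    """Toy model of a Milvus deployment for illustrating chaos tests."""

    def __init__(self):
        # A healthy cluster: coordinator up, 3-replica etcd ensemble.
        self.alive = {"querycoord": True, "etcd_replicas": 3}

    def kill(self, component):
        """Inject a failure, as a pod-kill chaos experiment would."""
        if component == "etcd":
            self.alive["etcd_replicas"] -= 1  # lose one etcd replica
        else:
            self.alive[component] = False

    def search(self):
        if not self.alive["querycoord"]:
            raise RuntimeError("collection released: search unavailable")
        if self.alive["etcd_replicas"] < 2:  # quorum of 3 requires 2
            raise RuntimeError("metadata unavailable: etcd quorum lost")
        return "ok"

cluster = MiniCluster()
assert cluster.search() == "ok"
cluster.kill("querycoord")  # chaos injection
try:
    cluster.search()
except RuntimeError as err:
    print(err)  # collection released: search unavailable
```

The value of such tests is exactly this mapping from "which component died" to "what the user sees", which then drives the HA design below.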

High Availability (HA) Implementation Strategies

  • Collection-Level HA: To prevent search failures during coordinator issues, the team implemented a dual-writing system where embeddings are recorded in two separate collections simultaneously.
  • Alias Switching: Client applications use an "alias" to reference collections; if the primary collection becomes unavailable, the system instantly switches the alias to the backup collection to minimize downtime.
  • Coordinator-Level HA: To eliminate single points of failure, coordinators (such as IndexCoord) were configured in an Active-Standby mode, ensuring a backup is always ready to take over management tasks.
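The dual-write and alias-switch pattern can be sketched as follows. Milvus does expose collection aliases (via pymilvus's `utility.create_alias` / `utility.alter_alias`), but since those calls need a live server, this sketch simulates the aliasing with a plain dict; all class and collection names are hypothetical:

```python
class AliasedStore:
    """Toy sketch of dual-collection writes with alias-based failover."""

    def __init__(self):
        self.collections = {"posts_a": {}, "posts_b": {}}
        self.down = set()
        self.alias = {"posts": "posts_a"}  # clients only ever see "posts"

    def insert(self, post_id, embedding):
        # Dual write: record the embedding in both collections at once.
        for coll in self.collections.values():
            coll[post_id] = embedding

    def search(self, post_id):
        target = self.alias["posts"]  # resolve alias at query time
        if target in self.down:
            raise RuntimeError(f"{target} unavailable")
        return self.collections[target].get(post_id)

    def fail_over(self, broken, backup):
        self.down.add(broken)
        self.alias["posts"] = backup  # instant switch, no client change

store = AliasedStore()
store.insert("clip_1", [0.2, 0.8])
store.fail_over("posts_a", "posts_b")
print(store.search("clip_1"))  # [0.2, 0.8] — served from the backup
```

Because clients resolve the alias at query time, failover requires no client redeploy: flipping the alias is the entire recovery step, which is what keeps downtime minimal.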

To successfully deploy a large-scale real-time recommendation engine, it is critical to select a vector database that decouples storage from compute and to implement multi-layered high-availability strategies, such as dual-collection writing and active-standby coordinators, to ensure production stability.