embedding

woowahan

Enhancing the “Frequently

Baedal Minjok (Baemin) has significantly improved its cart recommendation system by transitioning from a basic Item2Vec model to a two-stage architecture that combines graph-based embeddings with Transformer sequence modeling. This evolution addresses the "substitutability bias" and lack of sequential context found in the previous method, allowing the system to understand the specific intent behind a user's shopping journey. By moving beyond simple item similarity, the new model identifies cross-selling opportunities that align with the logical flow of a customer's purchase behavior.

### Limitations of the Item2Vec Approach

* **Substitutability Bias:** The original Item2Vec model, based on the Skip-gram architecture, tended to map items from the same category to nearby points in the embedding space. This resulted in recommending alternative brands of the same product (e.g., suggesting another brand of milk) rather than complementary goods (e.g., cereal or bread).
* **Loss of Sequential Context:** Because Item2Vec treats a basket of goods as a "bag of words," it ignores the order in which items are added. This prevents the model from distinguishing between different user intents, such as a user starting with meat to grill versus a user starting with ingredients for a stew.
* **Failure in Cross-Selling:** The primary goal of cart recommendations is to encourage cross-selling, but the reliance on embedding similarity alone limited the diversity of suggestions, often trapping users within a single product category.

### Stage 1: Graph-Based Product and Category Embeddings

* **Node2Vec Implementation:** To combat data sparsity and the "long-tail" problem where many items have low purchase frequency, the team used Node2Vec. This method uses random walks over the graph to generate node sequences, which help the model learn structural relationships even when direct transaction data is thin.
* **Heterogeneous Graph Construction:** The graph consists of both "Item Nodes" and "Category Nodes." Connecting items to their respective categories allows the system to generate initial vectors for new or low-volume products that lack sufficient historical purchase data.
* **Association Rule Weighting:** Rather than using simple co-occurrence counts for edge weights, the team applied Association Rules. This ensures that weights reflect the actual strength of the complementary relationship, preventing popular "mega-hit" items from dominating all recommendation results.

### Stage 2: Transformer-Based Sequence Recommendation

* **Capturing Purchase Context:** The second stage employs a Transformer model to analyze the sequence of items currently in the user's cart. This architecture is well suited to modeling how the meaning of an item changes based on what preceded it.
* **Next Item Prediction:** Using the pre-trained embeddings from Stage 1 as inputs, the Transformer predicts the most likely "next item" a user will add. This allows the system to provide dynamic recommendations that evolve as the user continues to shop.
* **Integration of Category Data:** By feeding both item-level and category-level embeddings into the Transformer, the model maintains a high level of accuracy even when a user interacts with niche products, as the category context provides a fallback for the recommendation logic.

### Practical Conclusion

For production-scale recommendation systems, relying solely on item similarity often leads to redundant suggestions that do not drive incremental sales. By decoupling the learning of structural relationships (via graphs) from the learning of temporal intent (via Transformers), engineers can build a system that is robust against data sparsity while remaining highly sensitive to the immediate context of a user's session. This two-stage approach is recommended for e-commerce environments where cross-category discovery is a key business metric.
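
The Stage 1 graph construction can be illustrated with a minimal sketch. The post does not say which association-rule measure Baemin used, so this example assumes lift; the baskets, category map, `category_weight`, and all function names are hypothetical. The walk is a plain first-order weighted random walk, which corresponds to Node2Vec with its return and in-out parameters both set to 1; the resulting walks would then be passed to a Skip-gram trainer, which is omitted here.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: purchase baskets and an item -> category map.
BASKETS = [
    ["milk", "cereal"],
    ["milk", "cereal", "water"],
    ["milk", "bread", "water"],
    ["pork", "lettuce", "water"],
    ["pork", "lettuce"],
    ["water"],
]
CATEGORY = {
    "milk": "dairy", "cereal": "breakfast", "bread": "bakery",
    "pork": "meat", "lettuce": "vegetable", "water": "beverage",
}


def lift_weights(baskets):
    """Return {(a, b): lift} for every unordered item pair that co-occurs."""
    n = len(baskets)
    item_count, pair_count = Counter(), Counter()
    for basket in baskets:
        items = sorted(set(basket))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    # lift(a, b) = P(a, b) / (P(a) * P(b))
    return {
        (a, b): count * n / (item_count[a] * item_count[b])
        for (a, b), count in pair_count.items()
    }


def build_graph(baskets, category, category_weight=1.0):
    """Build an undirected graph with item nodes and 'cat:'-prefixed category nodes."""
    graph = defaultdict(dict)
    for (a, b), weight in lift_weights(baskets).items():
        graph[a][b] = graph[b][a] = weight
    # Item-category edges give rarely purchased items a path to the rest of the graph.
    for item, cat in category.items():
        node = f"cat:{cat}"
        graph[item][node] = graph[node][item] = category_weight
    return graph


def random_walk(graph, start, length, rng):
    """First-order weighted random walk (Node2Vec with p = q = 1)."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph[walk[-1]]
        if not neighbors:
            break
        nodes = list(neighbors)
        weights = [neighbors[node] for node in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


graph = build_graph(BASKETS, CATEGORY)
walk = random_walk(graph, "milk", 6, random.Random(0))
```

In this toy data, milk appears with cereal and with water twice each, so raw co-occurrence counts cannot separate them. Lift gives the milk-cereal edge a weight of 2.0 and the milk-water edge a weight of 1.0, because water appears in most baskets regardless of what else is bought.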

line

Milvus: Building a

LINE VOOM transitioned its recommendation system from a batch-based offline process to a real-time infrastructure to solve critical content freshness issues. By adopting Milvus, an open-source vector database, the team enabled the immediate indexing and searching of new video content as soon as it is uploaded. This implementation ensures that time-sensitive posts are recommended to users without the previous 24-hour delay, significantly enhancing user engagement.

### Limitations of the Legacy Recommendation System

* The original system relied on daily offline batch processing for embedding generation and similarity searches.
* New content, such as holiday greetings or trending sports clips, suffered from a "lack of immediacy," often taking up to a full day to appear in user feeds.
* To improve user experience, the team needed to shift from offline candidate pools to an online system capable of real-time Approximate Nearest Neighbor (ANN) searches.

### Selecting Milvus as the Vector Database

* The team evaluated Milvus and Qdrant based on performance, open-source status, and on-premise compatibility.
* Milvus was selected due to its superior performance, handling 2,406 requests per second compared to Qdrant's 326, with lower query latency (1 ms vs. 4 ms).
* Key architectural advantages of Milvus included the separation of storage and computing, support for both stream and batch inserts, and a diverse range of supported in-memory index types.

### Reliability Verification via Chaos Testing

* Given the complexity of Milvus clusters, the team performed chaos testing by intentionally injecting failures like pod kills and scaling events.
* Tests revealed critical vulnerabilities: killing the `Querycoord` led to collection release and search failure, while losing the `Etcd` quorum caused total metadata loss.
* These findings highlighted the need for robust high-availability (HA) configurations to prevent service interruptions during component failures.

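
As background for the `Etcd` quorum finding: etcd replicates its data with the Raft consensus protocol, which needs a majority of members to be reachable. The short sketch below works through that arithmetic. The cluster sizes are illustrative assumptions; the post does not state how many etcd members the team ran.

```python
def quorum_size(members: int) -> int:
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - quorum_size(members)


# Assumed cluster sizes, for illustration only.
for n in (1, 3, 5):
    print(f"members={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
# members=1 quorum=1 tolerated_failures=0
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

A pod-kill experiment that removes more members than the tolerated number breaks quorum, which is the condition the chaos test above exercised.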

### High Availability (HA) Implementation Strategies

* **Collection-Level HA:** To prevent search failures during coordinator issues, the team implemented a dual-writing system where embeddings are recorded in two separate collections simultaneously.
* **Alias Switching:** Client applications use an "alias" to reference collections; if the primary collection becomes unavailable, the system instantly switches the alias to the backup collection to minimize downtime.
* **Coordinator-Level HA:** To eliminate single points of failure, coordinators (such as `Indexcoord`) were configured in an Active-Standby mode, ensuring a backup is always ready to take over management tasks.

To successfully deploy a large-scale real-time recommendation engine, it is critical to select a vector database that decouples storage from compute and to implement multi-layered high-availability strategies, such as dual-collection writing and active-standby coordinators, to ensure production stability.
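
The dual-write and alias-switching pattern can be sketched with a small in-memory simulation. This is not the Milvus API and not LINE's implementation: `Collection`, `AliasedStore`, the collection names, and the brute-force cosine search are hypothetical stand-ins chosen so the example runs without a database.

```python
import math


class Collection:
    """Toy in-memory stand-in for a vector collection (not the Milvus API)."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.vectors = {}

    def insert(self, key, vector):
        self.vectors[key] = vector

    def search(self, query, top_k):
        if not self.available:
            raise RuntimeError(f"collection {self.name} is unavailable")

        def cosine(vector):
            dot = sum(a * b for a, b in zip(query, vector))
            norm = math.sqrt(sum(a * a for a in query)) * math.sqrt(sum(b * b for b in vector))
            return dot / norm if norm else 0.0

        # Brute-force ranking; a real vector database would use an ANN index.
        ranked = sorted(self.vectors, key=lambda k: cosine(self.vectors[k]), reverse=True)
        return ranked[:top_k]


class AliasedStore:
    """Writes to both collections; reads go through a switchable alias."""

    def __init__(self, primary, backup):
        self.collections = [primary, backup]
        self.alias = primary  # clients only ever address the alias

    def insert(self, key, vector):
        # Dual write: every embedding is recorded in both collections.
        for collection in self.collections:
            collection.insert(key, vector)

    def search(self, query, top_k=2):
        try:
            return self.alias.search(query, top_k)
        except RuntimeError:
            # Point the alias at the other collection and retry once.
            self.alias = next(c for c in self.collections if c is not self.alias)
            return self.alias.search(query, top_k)


store = AliasedStore(Collection("voom_a"), Collection("voom_b"))
store.insert("clip1", [1.0, 0.0])
store.insert("clip2", [0.0, 1.0])
store.insert("clip3", [0.9, 0.1])

before = store.search([1.0, 0.0])
store.collections[0].available = False  # simulate the primary being released
after = store.search([1.0, 0.0])
```

Because both collections receive every write, the search returns the same results before and after the simulated failure, and callers never need to know which collection the alias currently points to.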