data-migration

2 posts

line

Integration of LINE App’s Multi-party Chat Features

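This article was originally published on the pre-merger blog (first published February 24, 2022) and has been migrated to the current blog; its content reflects the time of original publication. LINE supports not only one-on-one chats but also multi-party chats. However, LINE had two multi-party chat features that were developed for different purposes: "multi-person chats" and "groups." A multi-person chat (Room) was designed for temporary conversations. When creating a multi-person chat, there is no need to give the room a name, and a friend to a multi-person chat…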

toss

From Legacy Payment Ledger to Scalable System

Toss Payments successfully modernized a 20-year-old legacy payment ledger by transitioning to a decoupled, MySQL-based architecture designed for high scalability and consistency. By implementing strategies like INSERT-only immutability and event-driven domain isolation, they overcame structural limitations such as the inability to handle split payments. Ultimately, the project demonstrates that robust system design must be paired with resilient operational recovery mechanisms to manage the complexities of large-scale financial migrations.

### Legacy Ledger Challenges

* **Inconsistent Schemas:** Different payment methods used entirely different table structures; for instance, a table named `REFUND` unexpectedly contained only account transfer data rather than all refund types.
* **Domain Coupling:** Multiple domains (settlement, accounting, and payments) shared the same tables and columns, meaning a single schema change required impact analysis across several teams.
* **Structural Limits:** A rigid 1:1 relationship between a payment and its method prevented the implementation of modern features like split payments or "Dutch pay" models.

### New Ledger Architecture

* **Data Immutability:** The system shifted from updating existing rows to an **INSERT-only** principle, ensuring a reliable audit trail and preventing database deadlocks.
* **Event-Driven Decoupling:** Instead of direct database access, the system uses Kafka to publish payment events, allowing independent domains to consume data without tight coupling.
* **Payment-Approval Separation:** By separating the "Payment" (the transaction intent) from the "Approval" (the specific financial method), the system now supports multiple payment methods per transaction.

### Safe Migration and Data Integrity

* **Asynchronous Mirroring:** To maintain zero downtime, data was initially written to the legacy system and then asynchronously loaded into the new MySQL ledger.
* **Resource Tuning:** Developers used dedicated migration servers within the same AWS Availability Zone to minimize latency and implemented **Bulk Inserts** to handle hundreds of millions of rows efficiently.
* **Verification Batches:** A separate batch process ran every five minutes against a Read-Only (RO) database to identify and correct any data gaps caused by asynchronous processing failures.

### Operational Resilience and Incident Response

* **Query Optimization:** During a load spike, the MySQL optimizer chose "Full Scans" over indexes; the team resolved this by implementing SQL hints and utilizing a 5-version Docker image history for rapid rollbacks.
* **Network Cancellation:** To handle timeouts between Toss and external card issuers, the system uses specific logic to automatically send cancellation requests and synchronize states.
* **Timeout Standardization:** Discrepancies between microservices were resolved by calculating the maximum processing time of approval servers and aligning all upstream timeout settings to prevent merchant response mismatches.
* **Reliable Event Delivery:** While using the **Outbox pattern** for events, the team added log-based recovery (Elasticsearch and local disk) and idempotency keys in event headers to handle both missing and duplicate messages.

For organizations tackling significant technical debt, this transition highlights that initial design is only half the battle. True system reliability comes from building "self-healing" structures (such as automated correction batches and standardized timeout chains) that can survive the unpredictable nature of live production environments.
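
The INSERT-only principle and the Payment-Approval separation described in the summary above can be illustrated with a minimal sketch. This is not Toss Payments' actual schema or code: the table names (`payment`, `approval_event`), the column names, and the helper functions are all hypothetical, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a server. The idea shown is that a cancellation is recorded as a new row rather than an update, and that one payment can carry several approvals (for example, card plus points).

```python
import sqlite3

# Hypothetical schema for illustration only; names are assumptions,
# not the schema used by Toss Payments. SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE payment (
    payment_id   TEXT PRIMARY KEY,
    order_amount INTEGER NOT NULL
);
CREATE TABLE approval_event (
    event_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    payment_id TEXT NOT NULL REFERENCES payment(payment_id),
    method     TEXT NOT NULL,
    kind       TEXT NOT NULL CHECK (kind IN ('APPROVE', 'CANCEL')),
    amount     INTEGER NOT NULL CHECK (amount > 0)
);
"""


def open_ledger() -> sqlite3.Connection:
    """Create an in-memory ledger with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_payment(conn: sqlite3.Connection, payment_id: str, order_amount: int) -> None:
    """Record the payment intent once; it is never modified afterwards."""
    conn.execute(
        "INSERT INTO payment (payment_id, order_amount) VALUES (?, ?)",
        (payment_id, order_amount),
    )


def record(conn: sqlite3.Connection, payment_id: str, method: str, kind: str, amount: int) -> None:
    """Append one approval or cancellation row; no UPDATE or DELETE is issued."""
    conn.execute(
        "INSERT INTO approval_event (payment_id, method, kind, amount) VALUES (?, ?, ?, ?)",
        (payment_id, method, kind, amount),
    )


def balances(conn: sqlite3.Connection, payment_id: str) -> dict:
    """Derive the net approved amount per method from the appended rows."""
    rows = conn.execute(
        """
        SELECT method,
               SUM(CASE kind WHEN 'APPROVE' THEN amount ELSE -amount END)
        FROM approval_event
        WHERE payment_id = ?
        GROUP BY method
        ORDER BY method
        """,
        (payment_id,),
    ).fetchall()
    return dict(rows)


# Split payment: one payment, two approvals, then a partial cancellation.
conn = open_ledger()
create_payment(conn, "pay-1", 10_000)
record(conn, "pay-1", "CARD", "APPROVE", 7_000)
record(conn, "pay-1", "POINT", "APPROVE", 3_000)
record(conn, "pay-1", "POINT", "CANCEL", 1_000)
print(balances(conn, "pay-1"))  # {'CARD': 7000, 'POINT': 2000}
```

Because the current state is computed from the event rows instead of being stored in a mutable column, every historical step stays visible for audit, at the cost of an aggregation on read.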