Replacing the Payment System DB Handling (opens in new tab)
The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools. ### Protocol Selection and Implementation - The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing. - Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol. - Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate. ### Keyspace Architecture and Data Routing - The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data. - The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform. - The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history. - VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead. ### MySQL Compatibility and Transaction Logic - Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`. - Advanced features like Two-Phase Commit (2PC) are available for handling distributed transactions across multiple shards. - Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility. - Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces and specific unsupported SQL cases documented in the Vitess core. ### Automated Operations and Monitoring - The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or replication stops. - Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000). - Real-time alerts are routed through Slack and email, using `tablet_alias` to specifically identify which MySQL node or VTTablet is experiencing issues. - A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time. For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.