Replacing the Payment System DB Handling
The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools.
Protocol Selection and Implementation
- The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing.
- Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol.
- Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate.
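To make "manual mapping" concrete, the sketch below shows the kind of glue code an application ends up writing when a client hands back only column names and raw row values. It is written in Python for brevity rather than the team's Java, and `CoinBalance`, `map_row`, and the column names are hypothetical illustrations, not part of any Vitess client.

```python
from dataclasses import dataclass, fields


@dataclass
class CoinBalance:
    """Hypothetical domain object; the field names are assumptions."""
    user_id: int
    balance: int


def map_row(cls, columns, row):
    """Build a dataclass instance from parallel lists of column names and values.

    Raises KeyError if the result set lacks a column the dataclass needs.
    """
    by_name = dict(zip(columns, row))
    return cls(**{f.name: by_name[f.name] for f in fields(cls)})


# Column order in the result set does not have to match field order.
balance = map_row(CoinBalance, ["balance", "user_id"], (500, 7))
```

With a standard MySQL driver, this step is usually delegated to whatever data-access layer the application already uses, which is presumably part of what made the MySQL protocol less cumbersome.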
Keyspace Architecture and Data Routing
- The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data.
- The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform.
- The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history.
- VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead.
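The routing idea above can be sketched as follows: hash the sharding key into a 64-bit keyspace ID, then pick the shard whose key range contains it. This is a minimal illustration, not Vitess's implementation. SHA-256 is a stand-in chosen for convenience (Vitess's built-in `hash` Vindex uses a different function), and `keyspace_id`, `make_shards`, and `route` are hypothetical names. Only the `-80` / `80-` style of shard naming follows Vitess conventions.

```python
import hashlib


def keyspace_id(sharding_key: int) -> int:
    """Map a sharding key to a 64-bit keyspace ID (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(str(sharding_key).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def _label(bound: int) -> str:
    """Hex label for a range boundary, with trailing zero bytes trimmed."""
    text = f"{bound:016x}"
    while len(text) > 2 and text.endswith("00"):
        text = text[:-2]
    return text


def make_shards(n: int) -> list[tuple[str, int, int]]:
    """Split the 64-bit keyspace into n equal ranges [start, end)."""
    bounds = [(2**64) * i // n for i in range(n + 1)]
    shards = []
    for i in range(n):
        start = "" if i == 0 else _label(bounds[i])
        end = "" if i == n - 1 else _label(bounds[i + 1])
        shards.append((f"{start}-{end}", bounds[i], bounds[i + 1]))
    return shards


def route(sharding_key: int, shards: list[tuple[str, int, int]]) -> str:
    """Return the name of the shard whose range contains the key's keyspace ID."""
    kid = keyspace_id(sharding_key)
    for name, start, end in shards:
        if start <= kid < end:
            return name
    raise ValueError("keyspace ID outside all shard ranges")


shards = make_shards(4)
target = route(12345, shards)  # the same key always maps to the same shard
```

Because the shard is a pure function of the sharding key, a query that carries the key can be sent to exactly one shard, while a query without it has to be scattered to all of them.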
MySQL Compatibility and Transaction Logic
- Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`.
- Advanced features like Two-Phase Commit (2PC) are available for handling distributed transactions across multiple shards.
- Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility.
- Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces and specific unsupported SQL cases documented in the Vitess core.
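For readers unfamiliar with 2PC, the following is a textbook sketch of the protocol's two phases, assuming in-memory participants. It is not how Vitess implements 2PC: it omits the durable logging and coordinator-failure recovery a real implementation needs, and `Participant` and `two_phase_commit` are hypothetical names.

```python
class Participant:
    """Hypothetical shard-local resource taking part in a distributed transaction."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: promise to commit if asked, or vote to abort.
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit on all participants, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare():
            prepared.append(p)
        else:
            # Any "no" vote aborts everyone that already prepared.
            for q in prepared:
                q.rollback()
            return False
    # Phase 2: every participant voted yes, so the commit is applied everywhere.
    for p in participants:
        p.commit()
    return True


ok = two_phase_commit([Participant("shard-a"), Participant("shard-b")])
```

The extra prepare round is what buys atomicity across shards, and it is also why 2PC costs more than a best-effort multi-shard commit.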
Automated Operations and Monitoring
- The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or replication stops.
- Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000).
- Real-time alerts are routed through Slack and email, using `tablet_alias` to specifically identify which MySQL node or VTTablet is experiencing issues.
- A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time.
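As an illustration of keying alerts on `tablet_alias`, the sketch below groups alerts by that label and formats one message per tablet. The alert dictionary shape, the alert names, and the message format are assumptions made for the example, not the team's actual configuration. The alias format assumed here is `<cell>-<uid>` with a zero-padded numeric uid, such as `zone1-0000000100`.

```python
from collections import defaultdict


def parse_tablet_alias(alias: str) -> tuple[str, int]:
    """Split an alias of the assumed form '<cell>-<uid>' into its parts."""
    cell, sep, uid = alias.rpartition("-")
    if not sep or not cell or not uid.isdigit():
        raise ValueError(f"unexpected tablet alias: {alias!r}")
    return cell, int(uid)


def group_alerts_by_tablet(alerts: list[dict]) -> dict[str, list[str]]:
    """Collect alert names per tablet_alias label (hypothetical alert shape)."""
    grouped = defaultdict(list)
    for alert in alerts:
        labels = alert.get("labels", {})
        alias = labels.get("tablet_alias", "unknown")
        grouped[alias].append(labels.get("alertname", "unnamed"))
    return dict(grouped)


def format_message(alias: str, names: list[str]) -> str:
    """Render one notification line for a tablet and its active alerts."""
    try:
        cell, uid = parse_tablet_alias(alias)
        where = f"tablet {alias} (cell={cell}, uid={uid})"
    except ValueError:
        where = f"tablet {alias}"
    return f"[vitess] {where}: {', '.join(sorted(names))}"


alerts = [
    {"labels": {"alertname": "ReplicationStopped", "tablet_alias": "zone1-0000000100"}},
    {"labels": {"alertname": "PrimaryUnreachable", "tablet_alias": "zone1-0000000100"}},
]
messages = [format_message(a, n) for a, n in group_alerts_by_tablet(alerts).items()]
```

Grouping by alias means an operator receives one message per affected tablet rather than one per firing rule.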
For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.