The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools.
### Protocol Selection and Implementation
- The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing.
- Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol.
- Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate.
### Keyspace Architecture and Data Routing
- The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data.
- The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform.
- The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history.
- VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead.
### MySQL Compatibility and Transaction Logic
- Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`.
- Advanced features like Two-Phase Commit (2PC) are available for handling distributed transactions across multiple shards.
- Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility.
- Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces and specific unsupported SQL cases documented in the Vitess core.
### Automated Operations and Monitoring
- The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or replication stops.
- Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000).
- Real-time alerts are routed through Slack and email, using `tablet_alias` to specifically identify which MySQL node or VTTablet is experiencing issues.
- A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time.
For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.