Replacing the Payment System DB Handling

The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools.

Protocol Selection and Implementation

  • The team initially attempted to use the gRPC protocol but encountered "http2: frame too large" errors and significant CPU overhead during performance testing.
  • Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol.
  • Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate.
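
As a rough illustration of the last point, an application can reach VTGate with the same kind of connection settings it would use for a plain MySQL server. The sketch below only builds those settings and does not open a connection. The host name, port 15306, user, and keyspace name "service" are placeholder assumptions, not values from the LINE deployment; the "keyspace@tablet_type" database name follows Vitess's convention for targeting a keyspace and tablet type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VTGateTarget:
    """Connection settings for a MySQL-protocol driver pointed at VTGate."""

    host: str = "vtgate.example.internal"  # hypothetical host name
    port: int = 15306                      # assumed VTGate MySQL port
    user: str = "billing_app"              # hypothetical application user
    keyspace: str = "service"              # hypothetical keyspace name
    tablet_type: str = "primary"           # e.g. "primary" or "replica"

    def database(self) -> str:
        # Vitess lets the database name carry the tablet type: keyspace@type.
        return f"{self.keyspace}@{self.tablet_type}"

    def connect_kwargs(self, password: str) -> dict:
        # Keyword arguments in the style accepted by common MySQL drivers.
        return {
            "host": self.host,
            "port": self.port,
            "user": self.user,
            "password": password,
            "database": self.database(),
        }


if __name__ == "__main__":
    print(VTGateTarget(tablet_type="replica").database())
```

The resulting dictionary could be passed to a standard MySQL driver (for example `pymysql.connect(**kwargs)`), which is the practical benefit of the MySQL protocol: no Vitess-specific client or manual result mapping is required.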

Keyspace Architecture and Data Routing

  • The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data.
  • The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform.
  • The Service Keyspace is partitioned into N shards using a hash-based Vindex, which distributes coin balances and transaction history across them.
  • VTGate automatically routes queries to the correct shard by analyzing the sharding key in the WHERE clause or INSERT statement, minimizing cross-shard overhead.
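
The interaction between the two keyspaces can be sketched in a few lines. This is a simplified model, not Vitess code: `GlobalSequence` stands in for a sequence table in the unsharded keyspace, and MD5 is used as a stand-in hash because Vitess's hash Vindex uses its own function. The shard names follow Vitess's key-range naming (for example "-80" and "80-"); the class and function names are hypothetical.

```python
import hashlib
import itertools


class GlobalSequence:
    """Stand-in for a sequence table in the unsharded Global Keyspace."""

    def __init__(self, start: int = 1):
        self._counter = itertools.count(start)

    def next_id(self) -> int:
        # Each call hands out a new, strictly increasing identifier.
        return next(self._counter)


def shard_names(n: int) -> list:
    """Key-range names for n equal shards, e.g. ['-80', '80-'] for n=2."""
    if n < 1 or 256 % n != 0:
        raise ValueError("this sketch requires n to divide 256")
    step = 256 // n
    bounds = [""] + [format(i * step, "02x") for i in range(1, n)] + [""]
    return [f"{bounds[i]}-{bounds[i + 1]}" for i in range(n)]


def keyspace_id(sharding_key: int) -> bytes:
    """Map a sharding key to an 8-byte keyspace ID (MD5 is illustrative only)."""
    return hashlib.md5(sharding_key.to_bytes(8, "big")).digest()[:8]


def route(sharding_key: int, n: int) -> str:
    """Pick the shard whose key range contains the key's keyspace ID."""
    # Ranges split on the first byte, so that byte alone selects the shard.
    first_byte = keyspace_id(sharding_key)[0]
    return shard_names(n)[first_byte // (256 // n)]


if __name__ == "__main__":
    seq = GlobalSequence()
    for _ in range(3):
        key = seq.next_id()
        print(key, "->", route(key, 4))
```

Because the shard is derived purely from the sharding key, a query that includes the key can be sent to exactly one shard, which is the property VTGate relies on to avoid scatter queries.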

MySQL Compatibility and Transaction Logic

  • Vitess preserves MySQL's default REPEATABLE READ isolation for single-shard transactions, while multi-shard transactions effectively get READ COMMITTED semantics.
  • Advanced features like Two-Phase Commit (2PC) are available for handling distributed transactions across multiple shards.
  • Query execution plans are analyzed using VEXPLAIN and VTEXPLAIN, often managed through the VTAdmin web interface for better visibility.
  • Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces and specific unsupported SQL cases documented in the Vitess core.
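
To make the Two-Phase Commit idea concrete, the following is a generic, in-memory sketch of the protocol applied to a coin transfer between two shards. It is not Vitess's implementation and omits durability and crash recovery; the `ShardParticipant` class, shard names, and balances are hypothetical.

```python
class ShardParticipant:
    """One shard holding a coin balance and taking part in a 2PC round."""

    def __init__(self, name: str, balance: int):
        self.name = name
        self.balance = balance
        self._pending = None  # change staged during the prepare phase

    def prepare(self, delta: int) -> bool:
        # Vote "no" if applying the change would overdraw the balance.
        if self.balance + delta < 0:
            return False
        self._pending = delta
        return True

    def commit(self) -> None:
        if self._pending is not None:
            self.balance += self._pending
            self._pending = None

    def rollback(self) -> None:
        self._pending = None


def two_phase_commit(changes) -> bool:
    """changes: list of (participant, delta). Returns True if committed."""
    prepared = []
    # Phase 1: every participant must vote yes.
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            for p in prepared:
                p.rollback()
            return False
    # Phase 2: all votes were yes, so apply the staged changes everywhere.
    for participant, _ in changes:
        participant.commit()
    return True


if __name__ == "__main__":
    a = ShardParticipant("-80", 100)
    b = ShardParticipant("80-", 0)
    print(two_phase_commit([(a, -30), (b, 30)]), a.balance, b.balance)
```

The point of the two phases is that either both shards apply the change or neither does, which a plain multi-shard transaction without 2PC cannot guarantee if one shard fails mid-commit.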

Automated Operations and Monitoring

  • The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or replication stops.
  • Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000).
  • Real-time alerts are routed through Slack and email, using tablet_alias to specifically identify which MySQL node or VTTablet is experiencing issues.
  • A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time.
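
The role of tablet_alias in alerting can be illustrated with a small sketch that reads Prometheus text-format samples and builds one alert line per affected tablet. The metric name `example_replication_stopped`, the aliases, and the message format are all hypothetical; in a real deployment the rule would more likely live in Prometheus alerting configuration than in application code.

```python
import re

# Matches lines such as: metric_name{label="value",...} 1
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list:
    """Parse labeled samples from Prometheus text exposition format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match["labels"]))
            samples.append((match["name"], labels, float(match["value"])))
    return samples


def build_alerts(text: str, metric: str, threshold: float = 0.0) -> list:
    """Return one alert message per tablet whose sample exceeds threshold."""
    return [
        f"[ALERT] {metric} on tablet "
        f"{labels.get('tablet_alias', 'unknown')} = {value:g}"
        for name, labels, value in parse_samples(text)
        if name == metric and value > threshold
    ]


if __name__ == "__main__":
    scraped = (
        "# HELP example_replication_stopped hypothetical metric\n"
        'example_replication_stopped{tablet_alias="zone1-0000000101"} 1\n'
        'example_replication_stopped{tablet_alias="zone1-0000000102"} 0\n'
    )
    for alert in build_alerts(scraped, "example_replication_stopped"):
        print(alert)
```

Carrying the tablet_alias label through to the message is what lets an operator go straight from a Slack or email alert to the specific VTTablet and MySQL node involved.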

For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.