vitess

2 posts

line

Replacing the Payment System DB Handling (opens in new tab)

The LINE Billing Platform successfully migrated its large-scale payment database from Nbase-T to Vitess to handle high-traffic global transactions. While initially exploring gRPC for its performance reputation, the team transitioned to the MySQL protocol to ensure stability and reduce CPU overhead within their Java-based environment. This implementation demonstrates how Vitess can manage complex sharding requirements while maintaining high availability through automated recovery tools. ### Protocol Selection and Implementation - The team initially attempted to use the gRPC protocol but encountered `http2: frame too large` errors and significant CPU overhead during performance testing. - Manual mapping of query results to Java objects proved cumbersome with the Vitess gRPC client, leading to a shift toward the more mature and recommended MySQL protocol. - Using the MySQL protocol allowed the team to leverage standard database drivers while benefiting from Vitess's routing capabilities via VTGate. ### Keyspace Architecture and Data Routing - The system utilizes a dual-keyspace strategy: a "Global Keyspace" for unsharded metadata and a "Service Keyspace" for sharded transaction data. - The Global Keyspace manages sharding keys using a "sequence" table type to ensure unique, auto-incrementing identifiers across the platform. - The Service Keyspace is partitioned into $N$ shards using a hash-based Vindex, which distributes coin balances and transaction history. - VTGate automatically routes queries to the correct shard by analyzing the sharding key in the `WHERE` clause or `INSERT` statement, minimizing cross-shard overhead. ### MySQL Compatibility and Transaction Logic - Vitess maintains `REPEATABLE READ` isolation for single-shard transactions, while multi-shard transactions default to `READ COMMITTED`. - Advanced features like Two-Phase Commit (2PC) are available for handling distributed transactions across multiple shards. - Query execution plans are analyzed using `VEXPLAIN` and `VTEXPLAIN`, often managed through the VTAdmin web interface for better visibility. - Certain limitations apply, such as temporary tables only being supported in unsharded keyspaces and specific unsupported SQL cases documented in the Vitess core. ### Automated Operations and Monitoring - The team employs VTOrc (based on Orchestrator) to automatically detect and repair database failures, such as unreachable primaries or replication stops. - Monitoring is centralized via Prometheus, which scrapes metrics from VTOrc, VTGate, and VTTablet components at dedicated ports (e.g., 16000). - Real-time alerts are routed through Slack and email, using `tablet_alias` to specifically identify which MySQL node or VTTablet is experiencing issues. - A web-based recovery dashboard provides a history of automated fixes, allowing operators to track the health of the cluster over time. For organizations migrating high-traffic legacy systems to a cloud-native sharding solution, prioritizing the MySQL protocol over gRPC is recommended for better compatibility with existing application frameworks and reduced operational complexity.

line

Replacing the Database of a Payment System (opens in new tab)

The LINE Billing Platform team recently migrated its core payment database from Nbase-T to Vitess to address rising licensing costs while maintaining the high availability required for financial transactions. After a rigorous Proof of Concept (PoC) evaluating Apache ShardingSphere, TiDB, and Vitess, the team selected Vitess for its mature sharding capabilities and its ability to provide a stable, scalable environment on bare-metal infrastructure. This migration ensures the platform can handle large-scale traffic efficiently without the financial burden of proprietary license fees. ### Evaluation of Alternative Sharding Solutions Before settling on Vitess, the team analyzed other prominent distributed database technologies to determine their fit for a high-stakes payment system: * **Apache ShardingSphere:** While it offers flexible Proxy and JDBC layers, it was excluded because it requires significant manual effort for data resharding and rebalancing. The management overhead for implementing shard-key logic across various components (API, batch, admin) was deemed too high. * **TiDB:** This MySQL-compatible distributed database uses a decoupled architecture consisting of TiDB (SQL layer), PD (metadata management), and TiKV (row-based storage). Its primary advantage is automatic rebalancing and the lack of a required shard key, which significantly reduces DBA operational costs. * **Nbase-T:** The legacy system provided the highest performance efficiency per resource unit; however, the shift from a free to a paid licensing model necessitated the move to an open-source alternative. ### Vitess Architecture and Core Components Vitess was chosen for its proven track record at companies like YouTube and GitHub, offering a robust abstraction layer that makes a clustered database appear as a single instance to the application. The system relies on several specialized components: * **VTGate:** A proxy server that routes queries to the correct VTTablet, manages distributed transactions, and hides the physical topology of the database from the application. * **VTTablet:** A sidecar process running alongside each MySQL instance that manages query execution, data replication, and connection pooling. * **VTorc and Topology Server:** High availability is managed by VTorc (an automated failover tool), while metadata regarding shard locations and node status is synchronized via a topology server using ZooKeeper or etcd. ### PoC Performance and Environment Setup The team conducted performance testing by simulating real payment API scenarios (a mix of reads and writes) on standardized hardware (8vCPU, 16GB RAM). * **Comparison Metrics:** The tests focused on Transactions Per Second (TPS) and resource utilization as thread counts increased. * **Infrastructure Strategy:** Because payment systems cannot tolerate even brief failover delays, the team opted for a bare-metal deployment rather than a containerized one to ensure maximum stability and performance. * **Resource Efficiency:** While Nbase-T showed the best raw efficiency, Vitess demonstrated the necessary scalability and management features required to replace the legacy system effectively within the new cost constraints. ### Practical Recommendation For organizations managing critical core systems that require horizontal scaling without proprietary lock-in, Vitess is a highly recommended solution. While it requires a deep understanding of its various components (like VTGate and VTTablet) and careful configuration of its topology server, the trade-off is a mature, cloud-native-ready architecture that supports massive scale and automated failover on both bare-metal and cloud environments.