Automating RDS Postgres to Aurora Postgres Migration. Ram Srivasta Kannan, Wale Akintayo, Jay Bharadwaj, John Crimmins, Shengwei Wang, Zhitao Zhu. Introduction: In 2024, the Online Data Stores team at Netflix conducted a comprehensive review of the relational databa…
AWS has expanded the capabilities of Amazon S3 Tables by introducing Intelligent-Tiering for automated cost optimization and cross-region replication for enhanced data availability. These updates address the operational overhead of managing large-scale Apache Iceberg datasets by automating storage lifecycle management and simplifying the architecture required for global data distribution. By integrating these features, organizations can reduce storage costs without manual intervention while ensuring consistent data access across multiple AWS Regions and accounts.
### Cost Optimization with S3 Tables Intelligent-Tiering
This feature automatically shifts data between storage tiers based on access frequency to maximize cost efficiency without impacting application performance.
* The system uses three low-latency tiers: Frequent Access, Infrequent Access (40% lower cost than Frequent Access), and Archive Instant Access (68% lower cost than Infrequent Access).
* Transitions are automatic: data moves to Infrequent Access after 30 consecutive days without access and to Archive Instant Access after 90 days without access.
* Automated table maintenance tasks, such as compaction and snapshot expiration, are optimized to skip colder files; for example, compaction only processes data in the Frequent Access tier to minimize unnecessary compute and storage costs.
* Users can set Intelligent-Tiering as the default storage class at the table bucket level with the AWS CLI command `put-table-bucket-storage-class` and check the current setting with `get-table-bucket-storage-class`.
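The tiering rules above can be made concrete with a small model. This is a conceptual sketch only: S3 Tables performs the transitions itself, and the tier constants and helper functions here are hypothetical. It encodes the 30/90-day thresholds, the quoted relative prices, and the rule that compaction only processes Frequent Access data.

```python
# Conceptual model only: S3 Tables performs these transitions itself.
# Thresholds and relative prices follow the figures quoted above; the
# tier names and helper functions are hypothetical.

# Relative storage price per tier, normalized so Frequent Access = 1.0.
RELATIVE_COST = {
    "FREQUENT_ACCESS": 1.0,
    "INFREQUENT_ACCESS": 1.0 * (1 - 0.40),                    # 40% below Frequent
    "ARCHIVE_INSTANT_ACCESS": 1.0 * (1 - 0.40) * (1 - 0.68),  # 68% below Infrequent
}


def tier_for(days_since_access: int) -> str:
    """Return the tier an object would sit in after N days without access."""
    if days_since_access >= 90:
        return "ARCHIVE_INSTANT_ACCESS"
    if days_since_access >= 30:
        return "INFREQUENT_ACCESS"
    return "FREQUENT_ACCESS"


def compaction_candidates(files: dict[str, int]) -> list[str]:
    """Compaction only touches files still in Frequent Access, so cold files are skipped."""
    return sorted(
        name for name, idle_days in files.items()
        if tier_for(idle_days) == "FREQUENT_ACCESS"
    )
```

Under this model, a file idle for 120 days is stored at roughly 19% of its Frequent Access price (0.6 * 0.32 = 0.192), and only files touched within the last 30 days are handed to compaction.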
### Cross-Region and Cross-Account Replication
New replication support allows users to maintain synchronized, read-only replicas of their S3 Tables across different geographic locations and ownership boundaries.
* Replication maintains chronological consistency and preserves parent-child snapshot relationships, ensuring that replicas remain identical to the source for query purposes.
* Replica tables are typically updated within minutes of changes to the source table and support independent encryption and retention policies to meet specific regional compliance requirements.
* The service eliminates the need for complex, custom-built architectures to track metadata transformations or manually sync objects between Iceberg tables.
* This functionality is primarily designed to reduce query latency for geographically distributed teams and provide robust data protection for disaster recovery scenarios.
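Preserving parent-child snapshot relationships means a replica must never expose a snapshot before its parent, even if changes arrive out of order. The sketch below illustrates that ordering guarantee on a read-only replica. It is not the S3 Tables implementation: the `Snapshot` and `ReadOnlyReplica` classes are hypothetical, and the sketch assumes a linear snapshot history with no branches.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]  # None for the table's first snapshot


@dataclass
class ReadOnlyReplica:
    """Applies snapshots strictly in parent-before-child order; exposes no write API."""
    applied: list[int] = field(default_factory=list)
    # Snapshots waiting for their parent, keyed by parent_id (linear history assumed).
    _pending: dict[Optional[int], Snapshot] = field(default_factory=dict, repr=False)

    def receive(self, snap: Snapshot) -> None:
        if snap.snapshot_id in self.applied:
            return  # ignore redelivery of an already-applied snapshot
        self._pending[snap.parent_id] = snap
        self._drain()

    def _drain(self) -> None:
        # Apply every buffered snapshot whose parent is the current head.
        head = self.applied[-1] if self.applied else None
        while head in self._pending:
            snap = self._pending.pop(head)
            self.applied.append(snap.snapshot_id)
            head = snap.snapshot_id
```

If snapshots 2 and 3 arrive before snapshot 1, the replica buffers them and exposes nothing until 1 lands, then applies 1, 2, 3 in order, so queries never see a child without its parent.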
### Practical Implementation
To get the most from these features, organizations should consider setting Intelligent-Tiering as the default storage class at the table bucket level, so that cold data in new tables moves to cheaper tiers automatically without per-table configuration. For global operations, placing read-only replicas in the Regions closest to end users can reduce query latency for analytics tools such as Amazon Athena and Amazon SageMaker.
Naver Pay successfully transitioned its core database replication system from a legacy tool to "ergate," a high-performance CDC (Change Data Capture) solution built on Apache Flink and Spring. This strategic overhaul was designed to improve maintainability for backend developers while resolving rigid schema dependencies that previously caused operational bottlenecks. By leveraging a modern stream-processing architecture, the system now manages massive transaction volumes with sub-second latency and enhanced reliability.
### Limitations of the Legacy System
* **Maintenance Barriers:** The previous tool, mig-data, was written in pure Java by database core specialists, making it difficult for standard backend developers to maintain or extend.
* **Strict Schema Dependency:** Developers were forced to follow a rigid DDL execution order (Target DB before Source DB) to avoid replication halts, complicating database operations.
* **Blocking Failures:** Because the legacy system prioritized bi-directional data integrity, a single failed record could stall the entire replication pipeline for a specific shard.
* **Operational Risk:** Recovery procedures were manual and restricted to a small group of specialized personnel, increasing the time-to-recovery during outages.
### Technical Architecture and Stack
* **Apache Flink (LTS 2.0.0):** Selected for its high availability, low latency, and native Kafka integration, allowing the team to focus on replication logic rather than infrastructure.
* **Kubernetes Session Mode:** Used to manage 12 concurrent jobs (6 replication, 6 verification) through a single Job Manager endpoint for streamlined monitoring and deployment.
* **Hybrid Framework Approach:** The team isolated high-speed replication logic within Flink while using Spring (Kotlin) for complex recovery modules to leverage developer familiarity.
* **Data Pipeline:** The system captures MySQL binlogs via `nbase-cdc`, publishes them to Kafka, and uses Flink `jdbc-sink` jobs to apply changes to Target DBs (nBase-T and Oracle).
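The last stage of this pipeline turns each binlog-derived Kafka record into a statement against the target database. The sketch below shows that translation under stated assumptions: the `ChangeEvent` shape and helper names are hypothetical (the actual `nbase-cdc` record format is not described here), and SQLite stands in for nBase-T and Oracle, so `INSERT OR REPLACE` plays the role an Oracle `MERGE` would.

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of a binlog-derived CDC record read from Kafka."""
    op: str                          # "INSERT", "UPDATE" or "DELETE"
    table: str                       # assumed to come from trusted config
    key: dict[str, Any]              # primary-key columns
    row: Optional[dict[str, Any]]    # full after-image; None for deletes


def to_statement(ev: ChangeEvent) -> tuple[str, list[Any]]:
    """Translate one change event into a parameterized SQL statement."""
    if ev.op == "DELETE":
        where = " AND ".join(f"{c} = ?" for c in ev.key)
        return f"DELETE FROM {ev.table} WHERE {where}", list(ev.key.values())
    if ev.op in ("INSERT", "UPDATE") and ev.row is not None:
        cols = ", ".join(ev.row)
        marks = ", ".join("?" for _ in ev.row)
        # Upsert keeps replays idempotent if Kafka redelivers a record.
        return (f"INSERT OR REPLACE INTO {ev.table} ({cols}) VALUES ({marks})",
                list(ev.row.values()))
    raise ValueError(f"unsupported event: {ev.op}")


def apply_batch(conn: sqlite3.Connection, events: list[ChangeEvent]) -> None:
    """Apply events in Kafka order inside one transaction (commit or roll back together)."""
    with conn:
        for ev in events:
            sql, params = to_statement(ev)
            conn.execute(sql, params)
```

Writing full after-images as upserts, rather than column-level diffs, means reprocessing a Kafka offset range after a crash converges on the same target state.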
### Three-Tier Operational Model: Replication, Verification, and Recovery
* **Real-time Replication:** Processes incoming Kafka records and appends custom metadata columns (`ergate_yn`, `rpc_time`) to track the replication source and original commit time.
* **Delayed Verification:** A dedicated "verifier" Flink job consumes the same Kafka topic with a 2-minute delay to check Target DB consistency against the source record.
* **Secondary Logic:** To prevent false positives from rapid updates, the verifier performs a live re-query of the Source DB if a mismatch is initially detected.
* **Multi-Stage Recovery:**
* **Automatic Short-term:** Retries transient failures after 5 minutes.
* **Automatic Long-term:** Uses batch processes to resolve persistent discrepancies.
* **Manual:** Provides an admin interface for developers to trigger targeted reconciliations via API.
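The verification and recovery flow can be sketched as a pure function over injected readers, which keeps the logic testable without a database. This is an illustrative sketch, not ergate's code: `verify`, `plan_recovery`, the status strings, and the single short-term attempt are assumptions. The 2-minute delay, the live re-query of the source, the 5-minute short-term retry, and the `ergate_yn`/`rpc_time` metadata columns come from the description above. Manual recovery through the admin API is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Optional

VERIFY_DELAY_S = 120        # verifier lags replication by 2 minutes
SHORT_TERM_RETRY_S = 300    # short-term recovery retries after 5 minutes
METADATA_COLUMNS = {"ergate_yn", "rpc_time"}  # added on the target by replication

Row = Optional[dict]


@dataclass(frozen=True)
class Verdict:
    key: int
    status: str  # "OK", "RESOLVED_ON_REQUERY" or "MISMATCH"


def business_view(row: Row) -> Row:
    """Drop replication metadata so source and target rows are comparable."""
    if row is None:
        return None
    return {k: v for k, v in row.items() if k not in METADATA_COLUMNS}


def verify(key: int, source_image: Row, event_ts: float, now: float,
           read_target: Callable[[int], Row],
           read_source: Callable[[int], Row]) -> Optional[Verdict]:
    if now - event_ts < VERIFY_DELAY_S:
        return None  # not due yet; the replication write may still be in flight
    target = business_view(read_target(key))
    if target == business_view(source_image):
        return Verdict(key, "OK")
    # The source row may have changed again after this Kafka record was
    # produced, so compare against the live source before flagging it.
    if target == business_view(read_source(key)):
        return Verdict(key, "RESOLVED_ON_REQUERY")
    return Verdict(key, "MISMATCH")


def plan_recovery(verdict: Verdict, short_term_attempts: int,
                  max_short_term: int = 1) -> tuple[str, int]:
    """Route a verdict to (action, delay_seconds)."""
    if verdict.status != "MISMATCH":
        return ("NONE", 0)
    if short_term_attempts < max_short_term:
        return ("SHORT_TERM_RETRY", SHORT_TERM_RETRY_S)
    return ("LONG_TERM_BATCH", 0)
```

The re-query step is what separates a genuine divergence from a row that was simply updated twice in quick succession: only rows that disagree with both the recorded image and the live source reach recovery.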
### Improvements in Schema Management and Performance
* **DDL Independence:** By implementing query and schema caching, ergate allows Source and Target tables to be updated in any order without halting the pipeline.
* **Performance Scaling:** The new system is designed to handle 10x the current peak QPS, ensuring stability even during high-traffic events like major sales or promotions.
* **Metadata Tracking:** The inclusion of specific replication identifiers allows for clear distinction between automated replication and manual force-sync actions during troubleshooting.
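One way schema and query caching can decouple DDL order is for the sink to write only the columns that exist on both sides, refreshing its cached view of the target schema when a record carries columns it has not seen. The sketch below is a hypothetical illustration of that idea, not ergate's implementation: `SchemaCachingSink` is an invented name, SQLite's `PRAGMA table_info` stands in for the target's catalog, and dropped target columns are not handled.

```python
import sqlite3
from typing import Any


class SchemaCachingSink:
    """Writes rows using a cached target schema so DDL order does not matter."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self._columns: dict[str, list[str]] = {}                # table -> target columns
        self._queries: dict[tuple[str, tuple[str, ...]], str] = {}  # cached SQL text

    def _target_columns(self, table: str, refresh: bool = False) -> list[str]:
        if refresh or table not in self._columns:
            info = self.conn.execute(f"PRAGMA table_info({table})").fetchall()
            self._columns[table] = [r[1] for r in info]  # r[1] is the column name
        return self._columns[table]

    def _query(self, table: str, cols: tuple[str, ...]) -> str:
        key = (table, cols)
        if key not in self._queries:
            self._queries[key] = (
                f"INSERT OR REPLACE INTO {table} ({', '.join(cols)}) "
                f"VALUES ({', '.join('?' for _ in cols)})"
            )
        return self._queries[key]

    def write(self, table: str, row: dict[str, Any]) -> tuple[str, ...]:
        cached = self._target_columns(table)
        if not set(row) <= set(cached):
            # Unknown columns: the target may have been altered since caching.
            cached = self._target_columns(table, refresh=True)
        # Source-only columns are skipped instead of halting the pipeline;
        # target-only columns keep their defaults.
        cols = tuple(c for c in row if c in cached)
        with self.conn:
            self.conn.execute(self._query(table, cols), [row[c] for c in cols])
        return cols
```

With this approach a column added on the source first is skipped until the target catches up, and a column added on the target first simply keeps its default, so neither order stalls replication. A production version would also rate-limit schema refreshes.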
The ergate project demonstrates that a hybrid architecture—combining the high-throughput processing of Apache Flink with the robust logic handling of Spring—is highly effective for mission-critical financial systems. Organizations managing large-scale data replication should consider decoupling complex recovery logic from the main processing stream to ensure both performance and developer productivity.
Netflix has developed a distributed Write-Ahead Log (WAL) abstraction to address critical data challenges such as accidental corruption, system entropy, and the complexities of cross-region replication. By decoupling data mutation from immediate persistence and providing a unified API, this system ensures strong durability and eventual consistency across diverse storage engines. The WAL acts as a resilient buffer that powers high-leverage features like secondary indexing and delayed retry queues while maintaining the massive scale required for global operations.
### The Role of the WAL Abstraction
* The system serves as a centralized mechanism to capture data changes and reliably deliver them to downstream consumers, mitigating the risk of data loss during administrative errors or database corruption.
* It provides a simplified `WriteToLog` gRPC endpoint that abstracts underlying infrastructure, allowing developers to focus on data logic rather than the specifics of the storage layer.
* By acting as a durable intermediary, it prevents permanent data loss during incidents where primary datastores fail or require schema changes that might otherwise lead to corruption.
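The core contract of a WAL is an append that returns a position and a read that replays from any position. The in-memory sketch below illustrates that contract. It is a toy stand-in, not Netflix's API: `InMemoryWAL`, `write_to_log`, and the namespace-to-backend mapping are hypothetical, and the real `WriteToLog` endpoint is a gRPC service backed by durable storage.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class LogEntry:
    offset: int
    namespace: str
    payload: dict[str, Any]


class InMemoryWAL:
    """Toy stand-in for a WAL service; real namespaces would be backed by Kafka, SQS, etc."""

    def __init__(self, namespaces: dict[str, str]):
        self.backends = dict(namespaces)  # namespace -> configured backend name
        self._logs: dict[str, list[LogEntry]] = defaultdict(list)

    def write_to_log(self, namespace: str, payload: dict[str, Any]) -> int:
        """Append a mutation and return its offset; callers never touch the backend."""
        if namespace not in self.backends:
            raise KeyError(f"unknown namespace: {namespace}")
        log = self._logs[namespace]
        entry = LogEntry(len(log), namespace, dict(payload))  # copy to keep the log immutable
        log.append(entry)
        return entry.offset

    def read(self, namespace: str, from_offset: int = 0) -> list[LogEntry]:
        """Replay entries from an offset, e.g. to rebuild a corrupted datastore."""
        return self._logs[namespace][from_offset:]
```

Because every mutation is captured before it reaches the primary store, a store that is corrupted or mid-migration can be rebuilt by replaying `read(namespace, 0)` into a fresh copy.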
### Flexible Personas and Namespaces
* The architecture utilizes "namespaces" to define logical separation, allowing different services to configure specific storage backends like Kafka or SQS based on their needs.
* The "Delayed Queues" persona leverages SQS to provide a scalable way to retry failed messages in real-time pipelines without sacrificing overall system throughput.
* The system can be configured for "Cross-Region Replication," enabling high availability and disaster recovery for storage engines that do not natively support multi-region data transfer.
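The "Delayed Queues" persona lets a real-time pipeline hand off a failing message instead of blocking on it. The sketch below models that behavior locally with a priority queue and exponential backoff. It is an assumption-laden illustration: `DelayedRetryQueue`, the backoff formula, and the dead-letter list are hypothetical, and in an SQS-backed setup the visibility delay would more likely come from SQS's own per-message `DelaySeconds` setting.

```python
import heapq
import itertools
from typing import Any, Callable


class DelayedRetryQueue:
    """Toy delay queue: a failed message becomes visible again only after a backoff."""

    def __init__(self, base_delay_s: float = 5.0, max_attempts: int = 3):
        self.base_delay_s = base_delay_s
        self.max_attempts = max_attempts
        self._heap: list[tuple[float, int, int, Any]] = []  # (visible_at, seq, attempt, msg)
        self._seq = itertools.count()  # tie-breaker so messages are never compared
        self.dead_letters: list[Any] = []

    def schedule(self, msg: Any, attempt: int, now: float) -> None:
        if attempt >= self.max_attempts:
            self.dead_letters.append(msg)  # give up; park for inspection
            return
        delay = self.base_delay_s * (2 ** attempt)  # exponential backoff
        heapq.heappush(self._heap, (now + delay, next(self._seq), attempt, msg))

    def poll(self, now: float, handler: Callable[[Any], bool]) -> int:
        """Deliver every message whose delay has elapsed; return how many were delivered."""
        delivered = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, attempt, msg = heapq.heappop(self._heap)
            delivered += 1
            if not handler(msg):
                self.schedule(msg, attempt + 1, now)
        return delivered
```

The main pipeline calls `schedule(msg, 0, now)` on a failure and moves on, so one bad record costs a queue entry rather than stalling throughput.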
### Solving System Entropy and Consistency
* The WAL addresses the "dual-write" problem, where updates to primary stores (such as Cassandra) and search indices (such as Elasticsearch) can diverge over time, leading to data inconsistency.
* It facilitates reliable secondary indexing for NoSQL databases by managing updates to multiple partitions as a coordinated sequence of events.
* The platform mitigates operational risks, such as Out-of-Memory (OOM) errors on Key-Value nodes caused by bulk deletes, by staging and throttling mutations through the log.
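Replacing dual writes with a single ordered log means each downstream store (for example a primary table and a search index) consumes the same mutations through its own cursor, and a failure in one store never causes the other to skip data. The sketch below illustrates that pattern plus per-tick throttling. It is hypothetical: `LogFanOut`, the mutation dictionaries, and the `max_per_tick` budget are illustrative names, not Netflix's implementation.

```python
from typing import Any, Callable

Mutation = dict[str, Any]


class LogFanOut:
    """Apply one ordered mutation log to several stores, each with its own cursor."""

    def __init__(self, log: list[Mutation],
                 sinks: dict[str, Callable[[Mutation], None]], max_per_tick: int):
        self.log = log
        self.sinks = sinks
        self.offsets = {name: 0 for name in sinks}  # per-sink progress through the log
        self.max_per_tick = max_per_tick

    def tick(self) -> None:
        for name, apply in self.sinks.items():
            budget = self.max_per_tick  # throttle: bounded work per sink per tick
            while budget and self.offsets[name] < len(self.log):
                try:
                    apply(self.log[self.offsets[name]])
                except Exception:
                    break  # keep the cursor; retry this mutation next tick
                self.offsets[name] += 1
                budget -= 1
```

Because the cursor only advances after a successful apply, a flaky index lags temporarily and then converges with the primary store, and the per-tick budget spreads a bulk delete over many ticks instead of hitting a node all at once.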
Organizations operating at scale should adopt a WAL-centric architecture to simplify the management of heterogeneous data stores and enhance system resilience. By centralizing the mutation log, teams can implement complex features like Change Data Capture (CDC) and cross-region failover through a single, consistent interface rather than building bespoke solutions for every service.