spring-framework

3 posts

daangn

The Journey of Karrot Pay

Daangn Pay’s backend evolution demonstrates how software architecture must shift from a focus on development speed to a focus on long-term sustainability as a service grows. Over four years, the platform transitioned from a simple layered structure to a complex monorepo powered by Hexagonal and Clean Architecture principles to manage increasing domain complexity. This journey highlights that technical debt is often the price of early success, but structural refactoring is essential to support organizational scaling and maintain code quality.

## Early Speed with Layered Architecture

* The initial system was built using a standard Controller-Service-Repository pattern to meet the urgent deadline for obtaining an electronic financial business license.
* This simple structure allowed for rapid development and the successful launch of core remittance and wallet features.
* As the service expanded to include promotions, billing, and points, the "Service" layer became overloaded with cross-cutting concerns like validation and permissions.
* The lack of strict boundaries led to circular dependencies and "spaghetti code," making the system fragile and difficult to test or refactor.

## Decoupling Logic via Hexagonal Architecture

* To address the tight coupling between business logic and infrastructure, the team adopted a Hexagonal (Ports and Adapters) approach.
* The system was divided into three distinct modules: `domain` (pure POJO rules), `usecase` (orchestration of scenarios), and `adapter` (external implementations like DBs and APIs); a sketch of this split follows at the end of this summary.
* This separation ensured that core business logic remained independent of the Spring Framework or specific database technologies.
* While this solved dependency issues and improved reusability across REST APIs and batch jobs, it introduced significant boilerplate code and the complexity of mapping between different data models (e.g., domain entities vs. persistence entities).

## Scaling to a Monorepo and Clean Architecture

* As Daangn Pay grew from a single project into dozens of services handled by multiple teams, a Monorepo structure was implemented using Gradle multi-projects.
* The architecture evolved to separate "Domain" modules (pure business logic) from "Service" modules (the actual runnable applications like API servers or workers).
* An "Internal-First" policy was adopted, where modules are private by default and can only be accessed through explicitly defined public APIs to prevent accidental cross-domain contamination (see the build script sketch at the end of this summary).
* This setup currently manages over 30 services, providing a balance between code sharing and strict boundary enforcement between domains like Money, Billing, and Points.

The evolution of Daangn Pay’s architecture serves as a practical reminder that there is no "perfect" architecture from the start; rather, the best design is one that adapts to the current size of the organization and the complexity of the business. Engineers should prioritize flexibility and structural constraints that guide developers toward correct patterns, ensuring the codebase remains manageable even as the team and service scale.
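The post does not reproduce its code, but the `domain`/`usecase`/`adapter` split can be illustrated with a minimal Kotlin sketch. All names here (`Wallet`, `WalletPort`, `SendMoneyUseCase`, `InMemoryWalletAdapter`) are illustrative assumptions, not Daangn Pay's actual types, and everything is shown in one file for brevity; in the real layout each block would live in its own Gradle module, with only the adapter touching Spring.

```kotlin
import org.springframework.stereotype.Repository

// domain module: pure Kotlin business rules, no framework or persistence dependencies
data class Wallet(val id: Long, val balance: Long) {
    fun withdraw(amount: Long): Wallet {
        require(amount in 1..balance) { "insufficient balance" }
        return copy(balance = balance - amount)
    }

    fun deposit(amount: Long): Wallet = copy(balance = balance + amount)
}

// usecase module: orchestrates a scenario through an outbound port (interface only)
interface WalletPort {
    fun load(id: Long): Wallet?
    fun save(wallet: Wallet)
}

class SendMoneyUseCase(private val wallets: WalletPort) {
    fun send(fromId: Long, toId: Long, amount: Long) {
        val from = requireNotNull(wallets.load(fromId)) { "unknown wallet $fromId" }
        val to = requireNotNull(wallets.load(toId)) { "unknown wallet $toId" }
        wallets.save(from.withdraw(amount))
        wallets.save(to.deposit(amount))
    }
}

// adapter module: the only layer that knows about Spring or a datastore
@Repository
class InMemoryWalletAdapter : WalletPort {
    private val store = mutableMapOf<Long, Wallet>()
    override fun load(id: Long): Wallet? = store[id]
    override fun save(wallet: Wallet) { store[wallet.id] = wallet }
}
```

One common way to approximate the "Internal-First" rule in a Gradle multi-project build is the `java-library` plugin's `api`/`implementation` split; the module paths below are made up and the article does not say this is how Daangn Pay enforces the policy.

```kotlin
// domain/money/build.gradle.kts (hypothetical module; assumes the java-library and
// Kotlin plugins are applied, e.g. via a shared convention plugin).
// Only dependencies declared with `api` are visible to modules that depend on
// :domain:money; `implementation` dependencies stay internal to this module and
// cannot leak across domain boundaries.
dependencies {
    api(project(":domain:money-api"))
    implementation(project(":domain:money-internal"))
}
```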

naver

@RequestCache: Developing a Custom

The development of `@RequestCache` addresses the performance degradation and network overhead caused by redundant external API calls or repetitive computations within a single HTTP request. By implementing a custom Spring-based annotation, developers can ensure that specific data is fetched only once per request and shared across different service layers. This approach provides a more elegant and maintainable solution than manual parameter passing or struggling with the limitations of global caching strategies.

### Addressing Redundant Operations in Web Services

* Modern web architectures often involve multiple internal services (e.g., Order, Payment, and Notification) that independently request the same data, such as a user profile.
* These redundant calls increase response times, put unnecessary load on external servers, and waste system resources.
* `@RequestCache` provides a declarative way to cache method results within the scope of a single HTTP request, ensuring the actual logic or API call is executed only once.

### Limitations of Manual Data Passing

* The common alternative of passing response objects as method parameters leads to "parameter drilling," where intermediate service layers must accept data they do not use just to pass it to a deeper layer.
* When the Strategy pattern is used, adding a new data dependency to the strategy interface forces every implementation to change, even those that have no use for the new parameter, which violates clean architecture principles.
* Manual passing makes method signatures brittle and increases the complexity of refactoring as the call stack grows.

### The TTL Dilemma in Traditional Caching

* Using Redis or a local cache with Time-To-Live (TTL) settings is often insufficient for request-level isolation.
* If the TTL is set too short, the cache might expire before a long-running request finishes, leading to the very redundant calls the system was trying to avoid.
* If the TTL is too long, the cache persists across different HTTP requests, which is logically incorrect for data that should be fresh for every new user interaction.

### Leveraging Spring’s Request Scope and Proxy Mechanism

* The implementation utilizes Spring’s `@RequestScope` to manage the cache lifecycle, ensuring that data is automatically cleared when the request ends.
* Under the hood, `@RequestScope` injects a singleton proxy that delegates each call to the instance bound to the current thread via `RequestContextHolder`.
* The cache relies on `RequestAttributes` held in `ThreadLocal` storage, which guarantees isolation between concurrent requests.
* Lifecycle management is handled by Spring’s `FrameworkServlet`, which prevents memory leaks by automatically cleaning up request attributes after the response is sent (a minimal sketch of this mechanism follows at the end of this summary).

For applications dealing with deep call stacks or complex service interactions, a request-scoped caching annotation provides a robust way to optimize performance without sacrificing code readability. This mechanism is particularly recommended when the same data is needed across unrelated service boundaries within a single transaction, ensuring consistency and efficiency throughout the request lifecycle.
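The summary above describes the mechanism without code. A minimal Kotlin sketch in the same spirit, combining a `@RequestScope` store with a small Spring AOP aspect, could look like the following. The package and class names (`com.example.requestcache`, `RequestCacheStore`, `RequestCacheAspect`, `UserProfileClient`) are assumptions for illustration, AOP auto-proxying (e.g., `spring-boot-starter-aop`) is assumed to be enabled, and Naver's actual `@RequestCache` implementation may differ in its details.

```kotlin
package com.example.requestcache

import org.aspectj.lang.ProceedingJoinPoint
import org.aspectj.lang.annotation.Around
import org.aspectj.lang.annotation.Aspect
import org.springframework.stereotype.Component
import org.springframework.stereotype.Service
import org.springframework.web.context.annotation.RequestScope

// Marks methods whose result should be computed once and reused for the rest of the HTTP request.
@Target(AnnotationTarget.FUNCTION)
@Retention(AnnotationRetention.RUNTIME)
annotation class RequestCache

// Request-scoped store. Spring injects a singleton proxy that resolves the real,
// per-request instance through RequestContextHolder's ThreadLocal, so entries never
// leak across concurrent requests and are discarded when the request completes.
@Component
@RequestScope
open class RequestCacheStore {
    private val entries = mutableMapOf<String, Any?>()

    open fun getOrLoad(key: String, loader: () -> Any?): Any? {
        if (!entries.containsKey(key)) entries[key] = loader()  // null results are cached too
        return entries[key]
    }
}

// Aspect that short-circuits repeated calls to @RequestCache methods within one request.
@Aspect
@Component
class RequestCacheAspect(private val store: RequestCacheStore) {

    @Around("@annotation(com.example.requestcache.RequestCache)")
    fun cachePerRequest(joinPoint: ProceedingJoinPoint): Any? {
        // Key the cache on the method signature plus its arguments.
        val key = joinPoint.signature.toLongString() + joinPoint.args.contentToString()
        return store.getOrLoad(key) { joinPoint.proceed() }
    }
}

// Example caller: however many services ask for the same profile during one request,
// the lookup behind loadFromRemote() runs only once.
data class UserProfile(val id: Long, val nickname: String)

@Service
open class UserProfileClient {
    @RequestCache
    open fun fetchProfile(userId: Long): UserProfile = loadFromRemote(userId)

    private fun loadFromRemote(userId: Long): UserProfile =
        UserProfile(userId, "user-$userId")  // stand-in for an external API call
}
```

Because the store is request-scoped, Spring's `FrameworkServlet` disposes of it when the response is sent, so there is no TTL to tune and no risk of data bleeding into the next request.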

naver

Replacing a DB CDC replication tool that processes

Naver Pay successfully transitioned its core database replication system from a legacy tool to "ergate," a high-performance CDC (Change Data Capture) solution built on Apache Flink and Spring. This strategic overhaul was designed to improve maintainability for backend developers while resolving rigid schema dependencies that previously caused operational bottlenecks. By leveraging a modern stream-processing architecture, the system now manages massive transaction volumes with sub-second latency and enhanced reliability.

### Limitations of the Legacy System

* **Maintenance Barriers:** The previous tool, mig-data, was written in pure Java by database core specialists, making it difficult for standard backend developers to maintain or extend.
* **Strict Schema Dependency:** Developers were forced to follow a rigid DDL execution order (Target DB before Source DB) to avoid replication halts, complicating database operations.
* **Blocking Failures:** Because the legacy system prioritized bi-directional data integrity, a single failed record could stall the entire replication pipeline for a specific shard.
* **Operational Risk:** Recovery procedures were manual and restricted to a small group of specialized personnel, increasing the time-to-recovery during outages.

### Technical Architecture and Stack

* **Apache Flink (LTS 2.0.0):** Selected for its high availability, low latency, and native Kafka integration, allowing the team to focus on replication logic rather than infrastructure.
* **Kubernetes Session Mode:** Used to manage 12 concurrent jobs (6 replication, 6 verification) through a single JobManager endpoint for streamlined monitoring and deployment.
* **Hybrid Framework Approach:** The team isolated high-speed replication logic within Flink while using Spring (Kotlin) for complex recovery modules to leverage developer familiarity.
* **Data Pipeline:** The system captures MySQL binlogs via `nbase-cdc`, publishes them to Kafka, and uses Flink `jdbc-sink` jobs to apply changes to Target DBs (nBase-T and Oracle).

### Three-Tier Operational Model: Replication, Verification, and Recovery

* **Real-time Replication:** Processes incoming Kafka records and appends custom metadata columns (`ergate_yn`, `rpc_time`) to track the replication source and original commit time.
* **Delayed Verification:** A dedicated "verifier" Flink job consumes the same Kafka topic with a 2-minute delay to check Target DB consistency against the source record.
* **Secondary Check:** To prevent false positives caused by rapid successive updates, the verifier performs a live re-query of the Source DB whenever a mismatch is initially detected (see the sketch below).
* **Multi-Stage Recovery:**
  * **Automatic Short-term:** Retries transient failures after 5 minutes.
  * **Automatic Long-term:** Uses batch processes to resolve persistent discrepancies.
  * **Manual:** Provides an admin interface for developers to trigger targeted reconciliations via API.

### Improvements in Schema Management and Performance

* **DDL Independence:** By implementing query and schema caching, ergate allows Source and Target tables to be updated in any order without halting the pipeline.
* **Performance Scaling:** The new system is designed to handle 10x the current peak QPS, ensuring stability even during high-traffic events like major sales or promotions.
* **Metadata Tracking:** The inclusion of specific replication identifiers allows for clear distinction between automated replication and manual force-sync actions during troubleshooting.
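The verification rule described above can be sketched independently of Flink. The types below (`RowSnapshot`, `RowReader`, `DelayedVerifier`) are illustrative assumptions; in ergate the equivalent logic runs inside the delayed "verifier" Flink job, but the decision flow matches the one the post describes: compare against the Target DB first, then re-query the Source DB before declaring a real mismatch.

```kotlin
// Snapshot of one row as seen by the verifier; rpc_time is the original commit-time
// metadata that ergate appends during replication.
data class RowSnapshot(
    val primaryKey: String,
    val columns: Map<String, Any?>,
    val rpcTime: Long,
)

// Abstraction over a live lookup against either the Source DB or the Target DB.
interface RowReader {
    fun read(table: String, primaryKey: String): RowSnapshot?
}

enum class VerificationResult { CONSISTENT, MISSING_IN_TARGET, RECOVERABLE_MISMATCH }

class DelayedVerifier(
    private val source: RowReader,  // live connection to the Source DB
    private val target: RowReader,  // live connection to the Target DB
) {
    // Called for each Kafka change record roughly two minutes after the original event.
    fun verify(table: String, changedRow: RowSnapshot): VerificationResult {
        val targetRow = target.read(table, changedRow.primaryKey)
            ?: return VerificationResult.MISSING_IN_TARGET

        if (targetRow.columns == changedRow.columns) return VerificationResult.CONSISTENT

        // Secondary check: the row may simply have been updated again after this event,
        // so re-query the Source DB before flagging a real inconsistency.
        val latestSourceRow = source.read(table, changedRow.primaryKey)
        return if (latestSourceRow != null && latestSourceRow.columns == targetRow.columns) {
            VerificationResult.CONSISTENT          // target already reflects a newer source state
        } else {
            VerificationResult.RECOVERABLE_MISMATCH  // hand off to the recovery pipeline
        }
    }
}
```

A `RECOVERABLE_MISMATCH` result would then feed the multi-stage recovery path described above: a short-term retry, a long-term batch reconciliation, or a manual reconciliation triggered through the admin API.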
The ergate project demonstrates that a hybrid architecture—combining the high-throughput processing of Apache Flink with the robust logic handling of Spring—is highly effective for mission-critical financial systems. Organizations managing large-scale data replication should consider decoupling complex recovery logic from the main processing stream to ensure both performance and developer productivity.