performance-optimization

3 posts

naver

@RequestCache: Developing a Custom

The development of `@RequestCache` addresses the performance degradation and network overhead caused by redundant external API calls or repeated computations within a single HTTP request. By implementing a custom Spring-based annotation, developers can ensure that specific data is fetched only once per request and shared across different service layers. This approach is easier to maintain than manual parameter passing and avoids the limitations of global caching strategies.

### Addressing Redundant Operations in Web Services

* Modern web architectures often involve multiple internal services (e.g., Order, Payment, and Notification) that independently request the same data, such as a user profile.
* These redundant calls increase response times, put unnecessary load on external servers, and waste system resources.
* `@RequestCache` provides a declarative way to cache method results within the scope of a single HTTP request, ensuring the actual logic or API call is executed only once.

### Limitations of Manual Data Passing

* The common alternative of passing response objects as method parameters leads to "parameter drilling," where intermediate service layers must accept data they do not use just to pass it to a deeper layer.
* With the Strategy pattern, adding a new data dependency to an interface forces every implementation to change, even those that have no use for the new parameter, which violates clean architecture principles.
* Manual passing makes method signatures brittle and makes refactoring harder as the call stack grows.

### The TTL Dilemma in Traditional Caching

* Using Redis or a local cache with Time-To-Live (TTL) settings is often insufficient for request-level isolation.
* If the TTL is too short, the cache might expire before a long-running request finishes, leading to the very redundant calls the system was trying to avoid.
* If the TTL is too long, the cache persists across different HTTP requests, which is logically incorrect for data that should be fresh for every new user interaction.

### Leveraging Spring’s Request Scope and Proxy Mechanism

* The implementation uses Spring’s `@RequestScope` to manage the cache lifecycle, so cached data is discarded automatically when the request ends.
* Under the hood, `@RequestScope` injects a scoped proxy into singleton beans. The proxy delegates each call to the instance bound to the current request, which it looks up through `RequestContextHolder` for the current thread.
* The cache is stored in the request's `RequestAttributes`, which `RequestContextHolder` keeps in `ThreadLocal` storage, isolating concurrent requests from one another.
* Lifecycle management is handled by Spring’s `FrameworkServlet`, which prevents memory leaks by cleaning up request attributes once request processing completes.

For applications dealing with deep call stacks or complex service interactions, a request-scoped caching annotation provides a robust way to optimize performance without sacrificing code readability. This mechanism is particularly recommended when the same data is needed across unrelated service boundaries within a single request, ensuring consistency and efficiency throughout the request lifecycle.
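
The request-scoped caching idea summarized above can be illustrated without Spring. The following is a minimal plain-Java sketch, not the post's actual implementation: the class `RequestCacheSketch` and its methods `beginRequest`, `endRequest`, `getOrLoad`, and `simulate` are hypothetical names, and a `ThreadLocal` map stands in for the request attributes that Spring binds to the request thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical stand-in for a request-scoped cache; all names here are illustrative. */
final class RequestCacheSketch {
    // One map per thread, mirroring how request attributes are bound to the request thread.
    private static final ThreadLocal<Map<String, Object>> CACHE = new ThreadLocal<>();

    /** Call when a request starts (a servlet filter could do this in a real application). */
    static void beginRequest() {
        CACHE.set(new HashMap<>());
    }

    /** Call in a finally block when the request ends so nothing leaks into the next request. */
    static void endRequest() {
        CACHE.remove();
    }

    /** Returns the cached value for the key, running the loader at most once per request. */
    @SuppressWarnings("unchecked")
    static <T> T getOrLoad(String key, Supplier<T> loader) {
        Map<String, Object> cache = CACHE.get();
        if (cache == null) {
            return loader.get(); // outside a request there is nothing to cache into
        }
        if (!cache.containsKey(key)) {
            cache.put(key, loader.get()); // null results are cached too
        }
        return (T) cache.get(key);
    }

    /** Simulates two services reading the same profile per request; returns the remote call count. */
    static int simulate(int requests) {
        AtomicInteger remoteCalls = new AtomicInteger();
        Supplier<String> fetchProfile = () -> "profile#" + remoteCalls.incrementAndGet();
        for (int i = 0; i < requests; i++) {
            beginRequest();
            try {
                getOrLoad("user:42", fetchProfile); // e.g. the order service
                getOrLoad("user:42", fetchProfile); // e.g. the payment service
            } finally {
                endRequest();
            }
        }
        return remoteCalls.get();
    }
}
```

In this sketch the loader runs once per simulated request even though two callers ask for the same key, and clearing the map in `finally` plays the role that automatic request-attribute cleanup plays in Spring.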

naver

Beyond the Side Effects of API-

JVM applications often suffer from initial latency spikes because the Just-In-Time (JIT) compiler requires a "warm-up" period to compile frequently executed code into optimized native machine code. While traditional strategies rely on simulated API calls to trigger this optimization, these methods often introduce side effects like data pollution, log noise, and increased maintenance overhead. The approach described here advocates a library-centric warm-up that targets core execution paths and dependencies directly, aiming for high performance from the first real request without the risks of full-scale API simulation.

### Limitations of Traditional API-Based Warm-up

* **Data and State Pollution:** Simulated API calls can inadvertently trigger database writes, send notifications, or pollute analytics data, requiring complex logic to bypass these side effects.
* **Maintenance Burden:** As business logic and API signatures change, developers must constantly update the warm-up scripts or "dummy" requests to match the current application state.
* **Operational Risk:** Relying on external dependencies or complex internal services during the warm-up phase can lead to deployment failures if the mock environment is not perfectly aligned with production.

### The Library-Centric Warm-up Strategy

* **Targeted Optimization:** Instead of hitting the entry-point controllers, the focus shifts to warming up heavy third-party libraries and internal utility classes (e.g., JSON parsers, encryption modules, and DB drivers).
* **Internal Execution Path:** By directly invoking methods within the application's service or infrastructure layer during the startup phase, the JIT compiler can reach "Tier 4" (C2) optimization for critical code blocks.
* **Decoupled Logic:** Because the warm-up targets underlying libraries rather than specific business endpoints, the logic remains stable even when the high-level API changes.
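
The library-centric strategy above can be sketched in plain Java. This is an illustrative sketch under stated assumptions rather than the post's code: the class `LibraryWarmup`, its methods, and the choice of SHA-256 and Base64 as "heavy" library paths are all hypothetical, and the iteration count a real service needs to reach C2 compilation depends on the JVM and its flags.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical warm-up runner that exercises library code paths directly at startup. */
final class LibraryWarmup {

    /** Runs each task the given number of times; returns how many tasks completed without error. */
    static int warmUp(Map<String, Runnable> tasks, int iterations) {
        int completed = 0;
        for (Map.Entry<String, Runnable> task : tasks.entrySet()) {
            try {
                for (int i = 0; i < iterations; i++) {
                    task.getValue().run();
                }
                completed++;
            } catch (RuntimeException e) {
                // A failed warm-up should not block startup, so report it and move on.
                System.err.println("warm-up skipped for " + task.getKey() + ": " + e);
            }
        }
        return completed;
    }

    /** Example tasks: side-effect-free library calls, with no controllers or database involved. */
    static Map<String, Runnable> defaultTasks() {
        byte[] payload = "warm-up payload".getBytes(StandardCharsets.UTF_8);
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        tasks.put("sha256", () -> sha256(payload));
        tasks.put("base64", () -> Base64.getDecoder().decode(Base64.getEncoder().encode(payload)));
        return tasks;
    }

    static byte[] sha256(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }
}
```

A startup hook could call `LibraryWarmup.warmUp(LibraryWarmup.defaultTasks(), 20_000)` before the instance reports itself ready; the value `20_000` is an assumed starting point to tune, not a documented threshold.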
### Implementation and Performance Verification

* **Reflection and Hooks:** The implementation uses application startup hooks to execute intensive code paths, ensuring the JVM is "hot" before the load balancer begins directing traffic to the instance.
* **JIT Compilation Monitoring:** Success is measured by tracking the number of JIT-compiled methods and the time taken to reach a stable state, specifically targeting the reduction of "cold" execution time.
* **Latency Improvements:** Empirical data shows a significant reduction in P99 latency during the first few minutes of deployment, as the most CPU-intensive library functions are already pre-optimized.

### Advantages and Practical Constraints

* **Safer Deployments:** Removing the need for simulated network requests makes the deployment process more robust and prevents accidental side effects in downstream systems.
* **Granular Control:** Developers can selectively warm up only the most performance-sensitive parts of the application, saving startup time compared to a full-system simulation.
* **Incomplete Path Coverage:** A primary limitation is that library-only warming may miss specific branch optimizations that occur only during full end-to-end request processing.

To achieve the best balance between safety and performance, engineering teams should prioritize warming up shared infrastructure libraries and high-overhead utilities. While it may not cover 100% of the application's execution paths, a library-based approach provides a more maintainable and lower-risk foundation for JVM performance tuning than traditional request-based methods.
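
One way to observe JIT activity around a warm-up phase is the standard `java.lang.management.CompilationMXBean`. The sketch below is an assumption about how such monitoring could look, not the post's tooling: `JitProbe` and its methods are hypothetical names, and the bean reports accumulated compilation time only, so counting compiled methods would need other tools such as JVM logging flags.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical probe that reads the JVM's accumulated JIT compilation time. */
final class JitProbe {

    /** Returns approximate total JIT compilation time in milliseconds, or -1 if unavailable. */
    static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        // The bean is null when the JVM has no compilation system (e.g. interpreter-only mode).
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return bean.getTotalCompilationTime();
    }

    /** Returns the JIT compilation time spent while the workload ran, or -1 if unavailable. */
    static long compilationMillisDuring(Runnable workload) {
        long before = totalCompilationMillis();
        workload.run();
        long after = totalCompilationMillis();
        return (before < 0 || after < 0) ? -1 : after - before;
    }
}
```

Wrapping the warm-up phase in `compilationMillisDuring` and then sampling again under real traffic gives a rough signal: if little additional compilation time accrues after warm-up, most hot paths were probably compiled before the instance took requests.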

netflix

100X Faster: How We Supercharged Netflix Maestro’s Workflow Engine | by Netflix Technology Blog | Netflix TechBlog

Netflix has significantly optimized Maestro, its horizontally scalable workflow orchestrator, to meet the evolving demands of low-latency use cases like live events, advertising, and gaming. By redesigning the core engine to move from a polling-based architecture to a high-performance event-driven model, the team achieved a 100x increase in speed. This evolution reduced workflow overhead from several seconds to mere milliseconds, drastically improving developer productivity and system efficiency.

### Limitations of the Legacy Architecture

The original Maestro architecture was built on a three-layer system that, while scalable, introduced significant latency during execution.

* **Polling Latency:** The internal flow engine relied on calling execution functions at set intervals, creating a "speedbump" where tasks waited seconds to be picked up by workers.
* **Execution Overhead:** The process of translating complex workflow graphs into parallel flows and sequentially chained tasks added internal processing time that hindered sub-hourly and ad-hoc workloads.
* **Concurrency Issues:** A lack of strong guarantees from the internal flow engine occasionally led to race conditions, where a single step might be executed by multiple workers simultaneously.

### Transitioning to an Event-Driven Engine

To meet the most demanding user needs, Netflix replaced the traditional flow engine with a custom, high-performance execution model.

* **Direct Dispatching:** The engine moved away from periodic polling in favor of an event-driven mechanism that triggers state transitions instantly.
* **State Machine Optimization:** The new design manages the lifecycle of workflows and steps through a more streamlined state machine, ensuring faster transitions between "start," "restart," "stop," and "pause" actions.
* **Reduced Data Latency:** The team optimized data access patterns for internal state storage, reducing the time required to write Maestro data to the database during high-volume executions.

### Scalability and Functional Improvements

The redesign not only improved speed but also strengthened the engine's ability to handle massive, complex data pipelines.

* **Isolation Layers:** The engine maintains strict isolation between the Maestro step runtime (integrated with Spark and Trino) and the underlying execution logic.
* **Support for Heterogeneous Workflows:** The supercharged engine continues to support massive workflows with hundreds of thousands of jobs while providing the low latency required for iterative development cycles.
* **Reliability Guarantees:** By moving to a more robust internal event bus, the system eliminated the race conditions found in the previous distributed job queue implementation.

For organizations managing large-scale data or ML workflows, moving toward an event-driven orchestration model is essential for supporting sub-hourly execution and low-latency ad-hoc queries. These performance improvements are now available in the Maestro open-source project for wider community adoption.
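
The difference between interval polling and event-driven dispatch can be shown with a small Java sketch. It is a generic illustration using `java.util.concurrent`, not Maestro's engine: `EventDrivenDispatchSketch` and `dispatch` are hypothetical names, and a single in-memory queue stands in for whatever event bus a real orchestrator uses.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical worker that reacts to step events as they arrive instead of polling on a timer. */
final class EventDrivenDispatchSketch {
    // Unique sentinel compared by reference, so no real step id can collide with it.
    private static final String STOP = new String("stop");

    /** Publishes each step id as an event and returns the ids in the order the worker ran them. */
    static List<String> dispatch(List<String> stepIds) {
        BlockingQueue<String> events = new LinkedBlockingQueue<>();
        List<String> executed = new CopyOnWriteArrayList<>();

        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    // take() blocks until an event exists, so there is no polling interval to wait out.
                    String stepId = events.take();
                    if (stepId == STOP) {
                        return;
                    }
                    executed.add(stepId); // stands in for running the step's state transition
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (String stepId : stepIds) {
                events.put(stepId);
            }
            events.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while dispatching", e);
        }
        return executed;
    }
}
```

Because `take()` removes an event atomically, each event in this sketch is handed to exactly one consumer, which is one common way to avoid the double-execution race described for polling-based pickup.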