daangn

Redux for the Server: Developing Ventyd

Traditional CRUD-based architectures often struggle to meet complex backend requirements such as audit logging, version history, and state rollbacks. To address these challenges, Daangn’s Frontend Core team developed **Ventyd**, an open-source TypeScript library that implements event sourcing on the server using patterns familiar to Redux users. By shifting the focus from storing "current state" to storing a "history of events," developers can build more traceable and resilient systems.

### Limitations of Traditional CRUD

* Standard CRUD (Create, Read, Update, Delete) patterns only record the final state of data, losing the context of "why" or "how" a change occurred.
* Implementing complex features like approval workflows or history tracking usually requires manual table management, such as adding `status` columns or creating separate history tables.
* Rollback logic in CRUD is often fragile and requires complex custom code to revert data to a previous specific state.

### The Event Sourcing Philosophy

* Instead of overwriting rows in a database, event sourcing records every discrete action (e.g., "Post Created," "Post Approved," "Profile Updated") as an immutable sequence.
* The system provides a built-in audit log, ensuring every change is attributed to a specific user, time, and reason.
* State can be reconstructed for any point in time by "replaying" events, enabling seamless "time travel" and easier debugging.
* It allows for deeper business insights by providing a full narrative of data changes rather than just a snapshot.

### Redux as a Server-Side Blueprint

* The library leverages the familiarity of Redux to bridge the gap between frontend and backend engineering.
* Just as Redux uses **Actions** and **Reducers** to manage state in the browser, event sourcing uses **Events** and **Reducers** to manage state in the database.
* The primary difference is persistence: Redux manages state in memory, while Ventyd persists the event stream to a database for permanent storage.

### Technical Implementation with Ventyd

* **Type-Safe Schemas**: Developers use `defineSchema` to define the shape of both the events and the resulting state, ensuring strict TypeScript validation.
* **Validation Library Support**: Ventyd is flexible, supporting various validation libraries including Valibot, Zod, TypeBox, and ArkType.
* **Reducer Logic**: The `defineReducer` function centralizes how the state evolves based on incoming events, making state transitions predictable and easy to test.
* **Database Agnostic**: The library is designed to be flexible regarding the underlying storage, allowing it to integrate with different database systems.

Ventyd offers a robust path for teams needing more than what basic CRUD can provide, particularly for internal tools requiring high accountability. By adopting this event-driven approach, developers can simplify the implementation of complex business logic while maintaining a clear, type-safe history of every action within their system.
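To make the replay idea concrete, here is a minimal, self-contained sketch of the pattern the post describes. It is written in Kotlin for illustration (Ventyd itself is TypeScript), and every name in it (`PostEvent`, `reduce`, `replay`) is hypothetical rather than Ventyd's actual API: state is never stored directly, only derived by folding a reducer over the persisted event stream.

```kotlin
import java.time.Instant

// Events are immutable facts; these names are illustrative, not Ventyd's API.
sealed interface PostEvent { val occurredAt: Instant }
data class PostCreated(val title: String, val author: String, override val occurredAt: Instant) : PostEvent
data class PostApproved(val approvedBy: String, override val occurredAt: Instant) : PostEvent

// "Current state" is never written to storage; it is derived from the stream.
data class PostState(val title: String = "", val author: String = "", val approved: Boolean = false)

// A Redux-style reducer: (state, event) -> new state.
fun reduce(state: PostState, event: PostEvent): PostState = when (event) {
    is PostCreated -> state.copy(title = event.title, author = event.author)
    is PostApproved -> state.copy(approved = true)
}

// Replaying the persisted stream reconstructs the state at any point in time.
fun replay(events: List<PostEvent>): PostState = events.fold(PostState(), ::reduce)

fun main() {
    val stream = listOf(
        PostCreated("Hello", "alice", Instant.parse("2025-01-01T00:00:00Z")),
        PostApproved("bob", Instant.parse("2025-01-02T00:00:00Z")),
    )
    println(replay(stream))         // state with the full history applied
    println(replay(stream.take(1))) // "time travel": state before approval
}
```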

line

Code Quality Improvement Techniques Part 30

Code quality often suffers when functions share implicit dependencies, where the correct behavior of one relies on the state or validation provided by another. This "invisible" connection creates fragile code that is prone to runtime errors and logic mismatches during refactoring or feature expansion. To solve this, developers should consolidate related logic or make dependencies explicit to ensure consistency and safety.

## Problems with Implicit Function Dependencies

When logic is split across separate functions—such as one for validation (`isContentValid`) and another for processing (`getMessageText`)—developers often rely on undocumented preconditions.

* **Fragile Runtime Safety:** In the provided example, `getMessageText` throws a runtime error if called on invalid data, assuming the caller has already checked `isContentValid`.
* **Maintenance Burden:** When new data types (e.g., a new message type) are added, developers must remember to update both functions to keep them in sync, increasing the risk of "forgotten" updates.
* **Hidden Logic Flow:** Callers might not realize the two functions are linked, leading to improper usage where the transformation function is called without the necessary prior validation.

## Consolidating Logic for Single-Source Truth

The most effective way to eliminate implicit dependencies is to merge filtering and transformation into a single function. This ensures that the code cannot reach a processing state without passing through the necessary logic.

* **Nullable Returns:** By changing the transformation function to return a nullable type (`String?`), the function can signal that a piece of data is "invalid" or "empty" directly through its return value.
* **Simplified Caller Logic:** The UI layer no longer needs to call two separate functions; it simply checks if the result of the transformation is null to determine visibility.
* **Elimination of Redundant Branches:** This approach reduces the number of `when` or `if-else` blocks that need to be maintained across the codebase.

## Establishing Explicit Consistency

In scenarios where separate functions for validation and transformation are required for clarity or architectural reasons, the validation logic should be defined in terms of the transformation.

* **Dependent Validation:** Instead of writing a separate `when` block for `isContentValid`, the function should simply check if `getMessageText` returns a non-null value.
* **Guaranteed Synchronization:** This structure makes the relationship between the two functions explicit and guarantees that if a message is deemed "valid," it will always produce a valid text output.
* **Improved Documentation:** Defining functions this way serves as self-documenting code, showing future developers exactly how the two operations are linked.

When functions share a "red thread" of logic, they should either be merged or structured so that one acts as the source of truth for the other. By removing the need for callers to remember implicit preconditions, you reduce the surface area for bugs and make the codebase significantly easier to extend.
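A minimal Kotlin sketch of both remedies, using a hypothetical message model (the article's original code is not reproduced here): the consolidated `getMessageText` signals invalid content through a nullable return, and `isContentValid` is defined in terms of it so the two functions can never drift apart.

```kotlin
// Hypothetical message model standing in for the article's example.
sealed interface MessageContent
data class TextContent(val text: String) : MessageContent
data class ImageContent(val caption: String?) : MessageContent
object DeletedContent : MessageContent

// Consolidated version: validation and transformation live in one place.
// A null return means "nothing to display", so callers cannot forget a
// precondition; they just check the result for null.
fun getMessageText(content: MessageContent): String? = when (content) {
    is TextContent -> content.text.takeIf { it.isNotBlank() }
    is ImageContent -> content.caption?.takeIf { it.isNotBlank() }
    DeletedContent -> null
}

// If a separate validity check is still required, define it in terms of the
// transformation so "valid" always implies a non-null text output.
fun isContentValid(content: MessageContent): Boolean = getMessageText(content) != null
```

The UI layer then renders the message only when `getMessageText` returns non-null, replacing the two-call (validate, then transform) sequence with a single safe lookup.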

naver

Smart Store Center's Oracle-to-MySQL Migration

Smart Store Center successfully migrated its legacy platform from Oracle to MySQL to overcome performance instability caused by resource contention and to reduce high licensing costs. By implementing a "dual write" strategy, the team achieved a zero-downtime transition while maintaining the ability to roll back immediately without data loss. This technical journey highlights the use of proxy data sources and transaction synchronization to ensure data integrity across disparate database environments.

## Zero-Downtime Migration via Dual Writing

* The migration strategy relied on "dual writing," where all Create, Update, and Delete (CUD) operations are performed on both the legacy Oracle and the new MySQL databases.
* In the pre-migration phase, Oracle served as the primary source for all traffic while MySQL recorded writes in the background to build a synchronized state.
* Once data was fully migrated and verified, the primary traffic was shifted to MySQL, with background writes continuing to Oracle to allow for an instantaneous rollback if performance issues occurred.
* This approach decoupled the database switch from application deployment, providing a safety net against critical failures that a simple redeploy could not fix.

## Technical Implementation for JPA

* To capture and replicate queries, the team utilized the `datasource-proxy` library, which allowed them to intercept Oracle queries and execute them against a separate MySQL DataSource.
* To prevent MySQL write failures from impacting the primary Oracle transactions, writes to the secondary database were managed using `TransactionSynchronizationManager`.
* By executing MySQL queries during the `afterCommit` phase, the team ensured that the primary service remained stable even if the secondary database encountered errors or performance bottlenecks.
* The transition required modifying JPA Entity configurations, such as changing primary key generation from Oracle Sequences to MySQL’s `IDENTITY` (auto-increment) and adjusting `columnDefinition` for types like `text`, `longtext`, and `decimal`.

## Centralized MyBatis Strategy

* To avoid modifying thousands of business logic points in a 10-year-old codebase, the team sought a way to implement dual writing for MyBatis at the architectural level.
* The implementation focused on the MyBatis `Configuration` and `MappedStatement` objects to capture SQL execution without requiring manual updates to individual repository interfaces.
* This centralized approach maintained the purity of the business logic and ensured that the dual-write logic could be easily removed once the migration was fully stabilized.

For organizations managing large-scale legacy migrations, the dual-write pattern combined with asynchronous transaction synchronization is a highly recommended safety mechanism. Prioritizing the isolation of secondary database failures ensures that the user experience remains unaffected while technical validation is performed in real-time.
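The post's actual code is not reproduced above, but the interception pattern it names can be sketched with the real `datasource-proxy` and Spring APIs it mentions. The sketch below is deliberately simplified: it replays raw SQL text and skips the parameter binding and Oracle-to-MySQL dialect translation a production pipeline would need.

```kotlin
import net.ttddyy.dsproxy.ExecutionInfo
import net.ttddyy.dsproxy.QueryInfo
import net.ttddyy.dsproxy.listener.QueryExecutionListener
import net.ttddyy.dsproxy.support.ProxyDataSourceBuilder
import org.springframework.jdbc.core.JdbcTemplate
import org.springframework.transaction.support.TransactionSynchronization
import org.springframework.transaction.support.TransactionSynchronizationManager
import javax.sql.DataSource

// Wrap the primary (Oracle) DataSource so every executed statement is observable.
fun dualWriteDataSource(oracle: DataSource, mysqlJdbc: JdbcTemplate): DataSource =
    ProxyDataSourceBuilder.create(oracle)
        .name("oracle-primary")
        .listener(object : QueryExecutionListener {
            override fun beforeQuery(execInfo: ExecutionInfo, queryInfoList: List<QueryInfo>) = Unit

            override fun afterQuery(execInfo: ExecutionInfo, queryInfoList: List<QueryInfo>) {
                if (!TransactionSynchronizationManager.isSynchronizationActive()) return
                // Replicate only CUD statements; reads stay on the primary.
                val statements = queryInfoList.map { it.query }
                    .filterNot { it.trimStart().startsWith("select", ignoreCase = true) }
                if (statements.isEmpty()) return
                // Replay only after the Oracle transaction commits, so a MySQL
                // failure can never roll back or delay the primary write.
                TransactionSynchronizationManager.registerSynchronization(
                    object : TransactionSynchronization {
                        override fun afterCommit() {
                            statements.forEach { sql ->
                                // Simplification: real code must translate the Oracle
                                // dialect and bind parameters before executing.
                                runCatching { mysqlJdbc.execute(sql) }
                                    .onFailure { /* log and reconcile later; never propagate */ }
                            }
                        }
                    }
                )
            }
        })
        .build()
```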

toss

Will developers be replaced by AI?

The current AI hype cycle is a significant economic bubble where massive infrastructure investments of $560 billion far outweigh the modest $35 billion in generated revenue. However, drawing parallels to the 1995 dot-com era, the author argues that while short-term expectations are overblown, the long-term transformation of the developer role is inevitable. The conclusion is that developers won't be replaced but will instead evolve into "Code Creative Directors" who manage AI through the lens of technical abstraction and delegation.

### The Economic Bubble and Amara’s Law

* The industry is experiencing a 16:1 imbalance between AI investment and revenue, with 95% of generative AI implementations reportedly failing to deliver clear efficiency improvements.
* Amara’s Law suggests that we are overestimating AI's short-term impact while potentially underestimating its long-term necessity.
* Much of the current "AI-driven" job market contraction is actually a result of companies cutting personnel costs to fund expensive GPU infrastructure and AI research.

### Jevons Paradox and the Evolution of Roles

* Jevons Paradox indicates that as the "cost" of producing code drops due to AI efficiency, the total demand for software and the complexity of systems will paradoxically increase.
* The developer’s identity is shifting from "code producer" to "system architect," focusing on agent orchestration, result verification, and high-level design.
* AI functions as a "power tool" similar to game engines, allowing small teams to achieve professional-grade output while amplifying the capabilities of senior engineers.

### Delegation as a Form of Abstraction

* Delegating a task to AI is an act of "work abstraction," which involves choosing which low-level details a developer can afford to ignore.
* The technical boundary of what is "hard to delegate" is constantly shifting; for example, a complex RAG (Retrieval-Augmented Generation) pipeline built for GPT-4 might become obsolete with the release of a more capable model like GPT-5.
* The focus for developers must shift from "what is easy to delegate" to "what *should* be delegated," distinguishing between routine boilerplate and critical human judgment.

### The Risks of Premature Abstraction

* Abstraction does not eliminate complexity; it simply moves it into the future. If the underlying assumptions of an AI-generated system change, the abstraction "leaks" or breaks.
* Sudden shifts in scaling (traffic surges), regulation (GDPR updates), or security (zero-day vulnerabilities) expose the limitations of AI-delegated work, requiring senior intervention.
* Poorly managed AI delegation can lead to "abstraction debt," where the cost of fixing a broken AI-generated system exceeds the cost of having written it manually from the start.

To thrive in this environment, developers should embrace AI not as a replacement, but as a layer of abstraction. Success requires mastering the ability to define clear boundaries for AI—delegating routine CRUD operations and boilerplate while retaining human control over architecture, security, and complex business logic.

aws

Announcing Amazon EC2 G7e instances accelerated by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs | AWS News Blog

Amazon has announced the general availability of EC2 G7e instances, a new hardware tier powered by NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs designed for generative AI and high-end graphics. These instances deliver up to 2.3 times the inference performance of their G6e predecessors while providing significant upgrades to memory and bandwidth. This launch aims to provide a cost-effective solution for running medium-sized AI models and complex spatial computing workloads at scale.

**Blackwell GPU and Memory Advancements**

* The G7e instances feature NVIDIA RTX PRO 6000 Blackwell GPUs, which provide twice the memory and 1.85 times the memory bandwidth of the G6e generation.
* Each GPU provides 96 GB of memory, allowing users to run medium-sized models—such as those with up to 70 billion parameters—on a single GPU using FP8 precision.
* The architecture is optimized for both spatial computing and scientific workloads, offering the highest graphics performance currently available in the EC2 portfolio.

**High-Speed Connectivity and Multi-GPU Scaling**

* To support large-scale models, G7e instances utilize NVIDIA GPUDirect P2P, enabling direct communication between GPUs over PCIe interconnects with minimal latency.
* These instances offer four times the inter-GPU bandwidth compared to the L40s GPUs found in G6e instances, facilitating more efficient data transfer in multi-GPU configurations.
* Total GPU memory can scale up to 768 GB within a single node, supporting massive inference tasks across eight interconnected GPUs.

**Networking and Storage Performance**

* G7e instances provide up to 1,600 Gbps of network bandwidth, a four-fold increase over previous generations, making them suitable for small-scale multi-node clusters.
* Support for NVIDIA GPUDirect Remote Direct Memory Access (RDMA) via Elastic Fabric Adapter (EFA) reduces latency for remote GPU-to-GPU communication.
* The instances support GPUDirect Storage with Amazon FSx for Lustre, achieving throughput speeds up to 1.2 Tbps to ensure rapid model loading and data processing.

**System Specifications and Configurations**

* Under the hood, G7e instances are powered by Intel Emerald Rapids processors and support up to 192 vCPUs and 2,048 GiB of system memory.
* Local storage options include up to 15.2 TB of NVMe SSD capacity to handle high-speed data caching and local processing.
* The instance family ranges from the g7e.2xlarge (1 GPU, 8 vCPUs) to the g7e.48xlarge (8 GPUs, 192 vCPUs).

For developers ready to transition to Blackwell-based architecture, these instances are accessible through AWS Deep Learning AMIs (DLAMI). They represent a major step forward for organizations needing to balance the high memory requirements of modern LLMs with the cost efficiencies of the G-series instance family.
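As a back-of-envelope check (not from the announcement) of the single-GPU claim: FP8 stores one byte per parameter, so a 70B-parameter model's weights fit within a 96 GB GPU with headroom left over.

```latex
\underbrace{70 \times 10^{9}\ \text{parameters} \times 1\ \text{byte/parameter}}_{\text{FP8 weights}}
= 70\ \text{GB} \;<\; 96\ \text{GB per GPU}
```

The remaining ~26 GB must still hold the KV cache and activations, which is why this sizing works for inference on a single GPU but not for training.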

aws

AWS Weekly Roundup: Kiro CLI latest features, AWS European Sovereign Cloud, EC2 X8i instances, and more (January 19, 2026) | AWS News Blog

The January 19, 2026, AWS Weekly Roundup highlights significant advancements in sovereign cloud infrastructure and the general availability of high-performance, memory-optimized compute instances. The update also emphasizes the maturing ecosystem of AI agents, focusing on enhanced developer tooling and streamlined deployment workflows for agentic applications. These releases collectively aim to satisfy stringent regulatory requirements in Europe while pushing the boundaries of enterprise performance and automated productivity.

## Developer Tooling and Kiro CLI Enhancements

* New granular controls for web fetch URLs allow developers to use allowlists and blocklists to strictly govern which external resources an agent can access.
* The update introduces custom keyboard shortcuts to facilitate seamless switching between multiple specialized agents within a single session.
* Enhanced diff views provide clearer visibility into changes, improving the debugging and auditing process for automated workflows.

## AWS European Sovereign Cloud General Availability

* Following its initial 2023 announcement, this independent cloud infrastructure is now generally available to all customers.
* The environment is purpose-built to meet the most rigorous sovereignty and data residency requirements for European organizations.
* It offers a comprehensive set of AWS services within a framework that ensures operational independence and localized data handling.

## High-Performance Computing with EC2 X8i Instances

* The memory-optimized X8i instances, powered by custom Intel Xeon 6 processors, have moved from preview to general availability.
* These instances feature a sustained all-core turbo frequency of 3.9 GHz, which is currently exclusive to the AWS platform.
* The hardware is SAP certified and engineered to provide the highest memory bandwidth and performance for memory-intensive enterprise workloads compared to other Intel-based cloud offerings.

## Agentic AI and Productivity Updates

* Amazon Quick Suite continues to expand as a workplace "agentic teammate," designed to synthesize research and execute actions based on organizational insights.
* New technical guidance has been released regarding the deployment of AI agents on Amazon Bedrock AgentCore.
* The integration of GitHub Actions is now supported to automate the deployment and lifecycle management of these AI agents, bridging the gap between traditional DevOps and agentic AI development.

These updates signal a strategic shift toward highly specialized infrastructure, both in terms of regulatory compliance with the Sovereign Cloud and raw performance with the X8i instances. Organizations looking to scale their AI operations should prioritize the new deployment patterns for Bedrock AgentCore to ensure a robust CI/CD pipeline for their autonomous agents.

toss

Toss Income QA Platform

Toss's QA team developed an internal "QA Platform" to solve the high barrier to entry associated with using Swagger for manual testing and data setup. By transforming complex, multi-step API calls into a simple, button-based GUI, the team successfully empowered non-QA members to perform self-verification. This shift effectively moved quality assurance from a final-stage bottleneck to a continuous, integrated part of the development process, significantly increasing product delivery speed.

### Lowering the Barrier to Test APIs

* Existing Swagger documentation was functionally complete but difficult for developers or planners to use due to the need for manual JSON editing and sequential API execution.
* The QA Platform does not create new APIs; instead, it provides a GUI layer over existing Swagger Test APIs to make them accessible without technical documentation.
* The system offers two distinct interfaces: "Normal Mode" for simplified, one-click testing and "Swagger Mode" for granular control over request bodies and parameters.

### From Manual Clicks to Automation and Management

* Phase 1 focused on visual accessibility, allowing users to trigger complex data states via buttons rather than manual API orchestration.
* Phase 2 integrates existing automation scripts into the platform, removing the need for local environment setups and allowing anyone to execute automated test suites.
* The final phase aims to transition into a comprehensive Test Management System (TMS) tailored to the team's specific workflow, reducing reliance on third-party external tools.

### Redefining Quality as a Design Choice

* By reducing the time and mental effort required to run a test, verification became a frequent, daily habit for the entire product team rather than a chore for the QA department.
* Lowering the "cost" of testing replaced guesswork with data-driven confidence, allowing the team to move faster during development.
* This initiative reflects a philosophical shift where quality is no longer viewed as a final checklist item but as a core structural element designed into the development lifecycle.

The primary takeaway for engineering teams is that the speed of a product is often limited by the friction of its testing process. By building internal tools that democratize testing capabilities—making them available to anyone regardless of their technical role—organizations can eliminate verification delays and foster a culture where quality is a shared responsibility.

toss

How I Tore Down Our Legacy

Toss Payments modernized its inherited legacy infrastructure by building an OpenStack-based private cloud to operate alongside public cloud providers in an Active-Active hybrid configuration. By overcoming extreme technical debt—including servers burdened with nearly 2,000 manual routing entries—the team achieved a cloud-agnostic deployment environment that ensures high availability and cost efficiency. The transformation demonstrates how a small team can successfully implement complex open-source infrastructure through automation and the rigorous technical internalization of Cluster API and OpenStack.

### The Challenge of Legacy Networking

- The inherited infrastructure relied on server-side routing rather than network equipment, meaning every server carried its own routing table.
- Some legacy servers contained 1,997 individual routing entries, making manual management nearly impossible and preventing efficient scaling.
- Initial attempts to solve this via public cloud (AWS) faced limitations, including rising costs due to exchange rates, lack of deep visibility for troubleshooting, and difficulties in disaster recovery (DR) configuration between public and on-premise environments.

### Scaling OpenStack with a Two-Person Team

- Despite having only two engineers with no prior OpenStack experience, the team chose the open-source platform to maintain 100% control over the infrastructure.
- The team internalized the technology by installing three different versions of OpenStack dozens of times and simulating various failure scenarios.
- Automation was prioritized using Ansible and Terraform to manage the lifecycle of VMs and load balancers, enabling new instance creation in under 10 seconds.
- Deep technical tuning was applied, such as modifying the source code of the Octavia load balancer to output custom log formats required for their specific monitoring needs.

### High Availability and Monitoring Strategy

- To ensure reliability, the team built three independent OpenStack clusters operating in an Active-Active configuration.
- This architecture allows for immediate traffic redirection if a specific cluster fails, minimizing the impact on service availability.
- A comprehensive monitoring stack was implemented using Zabbix, Prometheus, Mimir, and Grafana to collect and visualize every essential metric across the private cloud.

### Managing Kubernetes with Cluster API

- To replicate the convenience of Public Cloud PaaS (like EKS), the team implemented Cluster API to manage the Kubernetes lifecycle.
- Cluster API treats Kubernetes clusters themselves as resources within a management cluster, allowing for standardized and rapid deployment across the private environment.
- This approach ensures that developers can deploy applications without needing to distinguish between the underlying cloud providers, fulfilling the goal of "cloud-agnostic" infrastructure.

### Practical Recommendation

For organizations dealing with massive technical debt or high public cloud costs, the Toss Payments model suggests that a "Private-First" hybrid approach is viable even with limited headcount. The key is to avoid proprietary black-box solutions and instead invest in the technical internalization of open-source tools like OpenStack and Cluster API, backed by an "infrastructure-as-code" philosophy to ensure scalability and reliability.

naver

Analysis of Naver Integrated Search Performance

The integration of AI Briefing (AIB) into Naver Search has led to a noticeable increase in Largest Contentful Paint (LCP) values, with p95 metrics rising to approximately 3.1 seconds. This shift is primarily driven by the architectural mismatch between traditional performance metrics and the dynamic, streaming nature of AI chat interfaces. The analysis concludes that while AIB appears to degrade performance on paper, the delay is largely a result of how browsers measure rendering in incremental UI patterns.

### Impact of AIB on Search Performance

* Since the introduction of AIB’s chat-based UI in July 2025, LCP p95 has moved beyond the 2.5-second target, showing a direct correlation with AIB traffic volume.
* The performance degradation is characterized by a "tail" effect, where a higher percentage of users fall into slower LCP buckets despite stable server response times.
* Unlike Google’s AI Overview, which renders in larger blocks, Naver’s AIB uses word-by-word animations and frequent UI updates that place a heavier burden on the browser's rendering engine.

### Client-Side Rendering Bottlenecks

* Performance profiling indicates that the delay is localized to the client-side rendering phase rather than the network or server.
* Initial rendering includes a skeleton UI period of roughly 900ms, followed by sequential text animations that push the final paint time back.
* Comparative data shows that when AIB is the LCP candidate, the p75 value reaches 4.5 seconds—significantly slower than other heavy components like map modules.

### Structural Misalignment with LCP Measurement

* **DOM Reconstruction:** After text animations finish, AIB rebuilds the DOM to enable citation highlighting and hover interactions, which triggers Chromium to update the LCP timestamp to this much later point.
* **Candidate Fragmentation:** Streaming text at the word level prevents the browser from identifying a single large text block; instead, small, insignificant fragments are often incorrectly selected as the LCP candidate.
* **Paint Invalidation:** Chromium’s rendering pipeline treats every new word in a streaming response as a layer update, causing repeated paint invalidations that push the `renderTime` forward frame-by-frame until the entire message is complete.

### New Metrics for AI-Driven Interfaces

* To more accurately reflect user experience, Naver is shifting toward Time to First Token (TTFT) as a primary metric for AIB, focusing on how quickly the first meaningful response appears.
* Standard LCP remains a valid quality indicator for static search results, but it is no longer treated as a universal benchmark for interactive AI components.
* Future performance management will involve more granular distribution analysis and "predictive" performance modeling rather than simply optimizing for a single threshold like the 2.5-second LCP mark.

To effectively manage performance in the era of generative AI, organizations should move away from relying solely on LCP for streaming interfaces. Implementing TTFT as a complementary metric provides a better representation of perceived speed, while optimizing the timing of DOM reconstructions can prevent unnecessary measurement delays in Chromium-based browsers.

line

Code Quality Improvement Techniques Part 29

Complexity in software often arises from "Gordian Variables," where tangled data dependencies make the logic flow difficult to trace and maintain. By identifying and designing an ideal intermediate data structure, developers can decouple these dependencies and simplify complex operations. This approach replaces convoluted conditional checks with a clean, structured data flow that highlights the core business logic.

## The Complexity of Tangled Dependencies

Synchronizing remote data with local storage often leads to fragmented logic when the relationship between data IDs and objects is not properly managed.

* Initial implementations frequently use set operations like `subtract` on ID lists to determine which items to create, update, or delete.
* This approach forces the program to re-access original data sets multiple times, creating a disconnected flow between identifying a change and executing it.
* Dependency entanglements often necessitate "impossible" runtime error handling (e.g., `error("This must not happen")`) because the compiler cannot guarantee data presence within maps during the update phase.
* Inconsistent processing patterns emerge, where "add" and "update" logic might follow one sequence while "delete" logic follows an entirely different one.

## Designing Around Intermediate Data Structures

To untangle complex flows, developers should work backward from an ideal data representation that categorizes all possible states—additions, updates, and deletions.

* The first step involves creating lookup maps for both remote and local entries to provide O(1) access to data objects.
* A unified collection of all unique IDs from both sources serves as the foundation for a single, comprehensive transformation pass.
* A specialized utility function, such as `partitionByNullity`, can transform a sequence of data pairs (`Pair<Remote?, Local?>`) into three distinct, non-nullable lists.
* This transformation results in a `Triple` containing `createdEntries`, `updatedEntries` (as pairs), and `deletedEntries`, effectively separating data preparation from business execution.

## Improved Synchronization Flow

Restructuring the function around categorized lists allows the primary synchronization logic to remain concise and readable.

* The synchronization function becomes a sequence of two phases: data categorization followed by execution loops.
* By using the `partitionByNullity` pattern, the code eliminates the need for manual null checks or "impossible" error branches during the update process.
* The final implementation highlights the most important part of the code—the `forEach` blocks for adding, updating, and deleting—by removing the noise of ID-based lookups and set mathematics.

When faced with complex data dependencies, prioritize the creation of a clean intermediate data structure rather than optimizing individual logical branches. Designing a data flow that naturally represents the different states of your business logic will result in more robust, self-documenting, and maintainable code.
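A minimal Kotlin sketch of the pattern, with hypothetical remote/local entry types: `partitionByNullity` is reconstructed here from the signature the article describes (pairs of `Pair<Remote?, Local?>` in, a `Triple` of created/updated/deleted lists out), not copied from the article's code.

```kotlin
// Partition (remote, local) pairs into three non-nullable lists in one pass.
fun <R : Any, L : Any> partitionByNullity(
    pairs: List<Pair<R?, L?>>,
): Triple<List<R>, List<Pair<R, L>>, List<L>> {
    val created = mutableListOf<R>()          // remote only -> create locally
    val updated = mutableListOf<Pair<R, L>>() // both present -> update
    val deleted = mutableListOf<L>()          // local only -> delete
    for ((remote, local) in pairs) {
        when {
            remote != null && local != null -> updated += remote to local
            remote != null -> created += remote
            local != null -> deleted += local
        }
    }
    return Triple(created, updated, deleted)
}

// Usage sketch: the sync function becomes categorization followed by
// execution loops, with no "impossible" null branches left to handle.
fun <R : Any, L : Any> sync(
    remoteById: Map<String, R>,
    localById: Map<String, L>,
    add: (R) -> Unit,
    update: (R, L) -> Unit,
    delete: (L) -> Unit,
) {
    val allIds = remoteById.keys + localById.keys
    val (createdEntries, updatedEntries, deletedEntries) =
        partitionByNullity(allIds.map { remoteById[it] to localById[it] })
    createdEntries.forEach(add)
    updatedEntries.forEach { (remote, local) -> update(remote, local) }
    deletedEntries.forEach(delete)
}
```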

aws

Amazon EC2 X8i instances powered by custom Intel Xeon 6 processors are generally available for memory-intensive workloads | AWS News Blog

Amazon has announced the general availability of EC2 X8i instances, specifically engineered for memory-intensive workloads such as SAP HANA, large-scale databases, and data analytics. Powered by custom Intel Xeon 6 processors with a 3.9 GHz all-core turbo frequency, these instances provide a significant performance leap over the previous X2i generation. By offering up to 6 TB of memory and substantial improvements in throughput, X8i instances represent the highest-performing Intel-based memory-optimized option in the AWS cloud.

### Performance Enhancements and Processor Architecture

* **Custom Silicon:** The instances utilize custom Intel Xeon 6 processors available exclusively on AWS, delivering the fastest memory bandwidth among comparable Intel cloud processors.
* **Memory and Bandwidth:** X8i provides 1.5 times more memory capacity (up to 6 TB) and 3.4 times more memory bandwidth compared to previous-generation X2i instances.
* **Workload Benchmarks:** Real-world performance gains include a 50% increase in SAP Application Performance Standard (SAPS), 47% faster PostgreSQL performance, 88% faster Memcached performance, and a 46% boost in AI inference.

### Scalable Instance Sizes and Throughput

* **Flexible Sizing:** The instances are available in 14 sizes, including new larger formats such as the 48xlarge, 64xlarge, and 96xlarge.
* **Bare Metal Options:** Two bare metal sizes (metal-48xl and metal-96xl) are available for workloads requiring direct access to physical hardware resources.
* **Networking and Storage:** The architecture supports up to 100 Gbps of network bandwidth with Elastic Fabric Adapter (EFA) support and up to 80 Gbps of Amazon EBS throughput.
* **Bandwidth Control:** Support for Instance Bandwidth Configuration (IBC) allows users to customize the allocation of performance between networking and EBS to suit specific application needs.

### Cost Efficiency and Use Cases

* **Licensing Optimization:** In preview testing, customers like Orion reduced SQL Server licensing costs by 50% by maintaining performance thresholds with fewer active cores compared to older instance types.
* **Enterprise Applications:** The instances are SAP-certified, making them ideal for RISE with SAP and other high-demand ERP environments.
* **Broad Utility:** Beyond databases, the instances are optimized for Electronic Design Automation (EDA) and complex data analytics that require massive memory footprints.

For organizations managing massive datasets or expensive licensed database software, migrating to X8i instances offers a clear path to both performance optimization and infrastructure cost reduction. These instances are currently available in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) regions through On-Demand, Spot, and Reserved purchasing models.

toss

Creating the New Face of Toss

Toss redesigned its brand persona graphics to transition from simple, child-like icons to more professional and inclusive human figures that better represent the brand's identity. This update aims to project a more trustworthy and intelligent image while ensuring the visual language is prepared for a global, multi-cultural audience. By balancing iconic simplicity with diverse representation, the new design system maintains brand consistency across various screen sizes and service contexts.

### Refining Proportions for Professionalism

* The team adjusted the vertical facial ratio to move away from a "child-like" impression, finding a balance that suggests maturity and intelligence without losing the icon's friendly nature.
* The placement of the eyes, nose, and mouth was meticulously tuned to maintain an iconic look while increasing the perceived level of trust.
* Structural improvements were made to the body, specifically refining the curves where the neck and shoulders meet to eliminate the unnatural "blocky" feel of previous versions.
* A short turtleneck was selected as the default attire to provide a clean, professional, and sophisticated look that works across different UI environments.

### Achieving Gender-Neutral Hairstyles

* The design team aimed for "neutrality" in hair design to prevent the characters from being categorized into specific gender roles.
* Several iterations were tested, including high-density detailed styles (which were too complex) and simple line-separated styles (which lacked visual density when scaled up).
* The final selection focuses on a clean silhouette that follows the head line while adding enough volume to ensure the graphic feels complete and high-quality at any size.

### Implementing Universal Skin Tones and Diversity

* To support Toss's expansion into global markets, the team moved away from a single skin tone that could be interpreted as a specific race.
* While a "neutral yellow" (similar to standard emojis) was considered, it was ultimately rejected because it felt inconsistent and jarring when displayed in larger formats within the app.
* Instead of a single "neutral" color, the team defined a palette of five distinct skin tones based on universal emoji standards.
* New guidelines were established to mix these different skin tones in scenes with multiple characters, fostering a sense of inclusivity and representation that reflects a diverse user base.

The evolution of the Toss persona illustrates that as a service grows, its visual language must move beyond simple aesthetics to address broader values like trust and inclusivity. Moving forward, the design system will continue to expand to ensure that no user feels excluded by age, gender, or race.

daangn

The Journey of Karrot Pay

Daangn Pay’s backend evolution demonstrates how software architecture must shift from a focus on development speed to a focus on long-term sustainability as a service grows. Over four years, the platform transitioned from a simple layered structure to a complex monorepo powered by Hexagonal and Clean Architecture principles to manage increasing domain complexity. This journey highlights that technical debt is often the price of early success, but structural refactoring is essential to support organizational scaling and maintain code quality.

## Early Speed with Layered Architecture

* The initial system was built using a standard Controller-Service-Repository pattern to meet the urgent deadline for obtaining an electronic financial business license.
* This simple structure allowed for rapid development and the successful launch of core remittance and wallet features.
* As the service expanded to include promotions, billing, and points, the "Service" layer became overloaded with cross-cutting concerns like validation and permissions.
* The lack of strict boundaries led to circular dependencies and "spaghetti code," making the system fragile and difficult to test or refactor.

## Decoupling Logic via Hexagonal Architecture

* To address the tight coupling between business logic and infrastructure, the team adopted a Hexagonal (Ports and Adapters) approach.
* The system was divided into three distinct modules: `domain` (pure POJO rules), `usecase` (orchestration of scenarios), and `adapter` (external implementations like DBs and APIs).
* This separation ensured that core business logic remained independent of the Spring Framework or specific database technologies.
* While this solved dependency issues and improved reusability across REST APIs and batch jobs, it introduced significant boilerplate code and the complexity of mapping between different data models (e.g., domain entities vs. persistence entities).

## Scaling to a Monorepo and Clean Architecture

* As Daangn Pay grew from a single project into dozens of services handled by multiple teams, a Monorepo structure was implemented using Gradle multi-projects.
* The architecture evolved to separate "Domain" modules (pure business logic) from "Service" modules (the actual runnable applications like API servers or workers).
* An "Internal-First" policy was adopted, where modules are private by default and can only be accessed through explicitly defined public APIs to prevent accidental cross-domain contamination.
* This setup currently manages over 30 services, providing a balance between code sharing and strict boundary enforcement between domains like Money, Billing, and Points.

The evolution of Daangn Pay’s architecture serves as a practical reminder that there is no "perfect" architecture from the start; rather, the best design is one that adapts to the current size of the organization and the complexity of the business. Engineers should prioritize flexibility and structural constraints that guide developers toward correct patterns, ensuring the codebase remains manageable even as the team and service scale.
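A compact Kotlin illustration of the `domain` / `usecase` / `adapter` split described above. All names here (`Wallet`, `RemitMoneyUseCase`, `InMemoryWalletRepository`) are invented for the sketch, not taken from Daangn Pay's codebase: the point is that the domain and usecase layers depend only on a port interface, while the adapter supplies the infrastructure-specific implementation.

```kotlin
// --- domain module: pure business rules, no framework or DB imports ---
data class Wallet(val id: String, val balance: Long) {
    fun withdraw(amount: Long): Wallet {
        require(amount in 1..balance) { "insufficient balance" }
        return copy(balance = balance - amount)
    }
}

// Port: an interface owned by the core; infrastructure implements it.
interface WalletRepository {
    fun find(id: String): Wallet?
    fun save(wallet: Wallet)
}

// --- usecase module: orchestrates a scenario through ports only ---
// (Transaction management is omitted for brevity; a real remittance
// would run both saves atomically.)
class RemitMoneyUseCase(private val wallets: WalletRepository) {
    fun remit(fromId: String, toId: String, amount: Long) {
        val from = requireNotNull(wallets.find(fromId)) { "unknown wallet $fromId" }
        val to = requireNotNull(wallets.find(toId)) { "unknown wallet $toId" }
        wallets.save(from.withdraw(amount))
        wallets.save(to.copy(balance = to.balance + amount))
    }
}

// --- adapter module: a concrete implementation of the port ---
// In production this would be a JPA or JDBC adapter; an in-memory map is
// enough to show that usecases can be tested without any infrastructure.
class InMemoryWalletRepository : WalletRepository {
    private val store = mutableMapOf<String, Wallet>()
    override fun find(id: String) = store[id]
    override fun save(wallet: Wallet) { store[wallet.id] = wallet }
}
```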

aws

Opening the AWS European Sovereign Cloud | AWS News Blog

AWS has officially launched the AWS European Sovereign Cloud, a specialized infrastructure designed to meet the rigorous data residency and operational autonomy requirements of European public sector organizations and highly regulated industries. This new offering provides a fully featured cloud environment that is physically and logically separate from existing AWS Regions, ensuring all data and metadata remain entirely within the European Union. By bridging the gap between legacy on-premises security and modern cloud innovation, AWS enables sensitive workloads to operate under strict European jurisdiction and independent governance.

**Strategic Independence and Operational Control**

Organizations in the EU often face complex regulatory hurdles that prevent them from using standard public cloud offerings, frequently forcing them to remain on aging on-premises hardware. The AWS European Sovereign Cloud addresses these challenges through:

* **Independent Operations:** The infrastructure is operated independently from other AWS Regions, providing a distinct management layer specific to the EU.
* **Enhanced Sovereignty Controls:** Robust technical controls and legal protections are integrated to ensure that data remains under European jurisdiction.
* **Governance Autonomy:** The cloud is built to provide European entities with full control over their data residency and operational transparency.

**Independent Infrastructure and Regional Presence**

The architecture is designed for high availability and resilience, ensuring that mission-critical services remain functional regardless of external connectivity.

* **Initial Region:** The first region is now generally available in Brandenburg, Germany, serving as the primary hub for the sovereign infrastructure.
* **Redundancy:** The infrastructure utilizes multiple Availability Zones with redundant power and networking to maintain continuous operation.
* **Isolated Connectivity:** The design allows the cloud to continue operating even if connectivity to the rest of the global AWS network is interrupted.

**Expansion and Hybrid Deployment Options**

To support the diverse needs of EU member states, AWS is expanding the footprint of this sovereign infrastructure through localized hardware and edge services.

* **Sovereign Local Zones:** Future expansion plans include new Local Zones in Belgium, the Netherlands, and Portugal to provide low-latency access within specific borders.
* **Hybrid Integration:** Customers can extend sovereign infrastructure to their own data centers using AWS Outposts or AWS Dedicated Local Zones.
* **Advanced Capabilities:** The platform supports specialized workloads through AWS AI Factories, allowing regulated industries to leverage artificial intelligence within a sovereign boundary.

For European organizations navigating strict compliance landscapes, the AWS European Sovereign Cloud provides a viable path to digital transformation. Decision-makers should evaluate their current on-premises or restricted cloud environments to determine how these new sovereign regions and local zones can fulfill upcoming data residency mandates while providing access to advanced cloud-native services.

kakao

Kanana-2 Development Story (1)

Kakao has introduced Kanana-2, a series of language models utilizing a Mixture of Experts (MoE) architecture to achieve high intelligence while maintaining low inference costs. To support the stable pre-training of their largest 155B parameter model, the team implemented advanced technical stacks including the Muon optimizer and MuonClip to prevent training instabilities. These developments reflect a strategic focus on balancing large-scale performance with "high-efficiency, low-cost" engineering.

### MoE Architecture and Scaling Strategy

* Kanana-2 models, such as the 32B version, activate only 3B parameters during inference to maximize computational efficiency without sacrificing the intelligence of a larger model.
* The team is currently training a massive 155B parameter version (Kanana-2-155b-a17b) using FP8 training infrastructure, MuonClip, and Hyperparameter Transfer to ensure stable convergence.
* Custom-developed MoE kernels were integrated to reduce memory usage and increase training speed, resulting in a highly stable Loss Curve even during constant learning rate phases.

### A Controlled Testbed for Mid- and Post-Training

* The Kanana-2-30b-a3b-base-2601 model was intentionally released without synthetic reasoning data to serve as a "clean" base for research.
* This model allows researchers to investigate phenomena like "Reasoning Trace Distribution Mismatch" and "Spurious Rewards" by providing a baseline unaffected by post-training interventions.
* By offering a high-quality Korean base model, Kakao aims to support the local AI community in conducting more rigorous experiments on mathematical and logical reasoning.

### Optimization with Muon and Polar Express

* Kakao shifted from the industry-standard AdamW optimizer to Muon, which updates parameters by orthogonalizing gradients rather than performing element-wise updates.
* To achieve more accurate orthogonalization, they implemented the Polar Express iterative algorithm instead of the standard Newton-Schulz method, aiming to reduce noise in weight updates during the latter stages of large-scale training.
* The optimization process also involved detailed adjustments to RMSNorm parameterization and learning rate (LR) management to ensure the model scales effectively.

### Training Stability via MuonClip

* To address potential "logit explosion" in large-scale models, the team utilized MuonClip, a technique that clips attention logits to maintain stability.
* Because standard Flash Attention stores Max Logit values only on-chip, the team modified the Flash Attention kernels to extract and return these values for monitoring and clipping purposes.
* Stress tests conducted with high learning rates proved that MuonClip prevents training divergence and maintains performance levels even when the model is pushed to its limits.

The development of Kanana-2 demonstrates that scaling to hundreds of billions of parameters requires more than just data; it necessitates deep architectural optimizations and custom kernel engineering. For organizations looking to train large-scale MoE models, adopting sophisticated orthogonalization optimizers and logit clipping mechanisms is highly recommended to ensure predictable and stable model convergence.
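For reference, the two techniques the post names can be written compactly. The iteration coefficients and the clipping exponent vary by implementation (Polar Express tunes the coefficients per step), so treat this as a sketch of the published formulations rather than Kanana-2's exact recipe.

```latex
% Muon's core step: replace the (momentum-smoothed) gradient G with its
% orthogonal factor. If G = U \Sigma V^\top is the SVD, the update direction
% is U V^\top, approximated without an SVD by odd-polynomial iterations:
X_0 = \frac{G}{\|G\|_F}, \qquad
X_{k+1} = a_k X_k + b_k \,(X_k X_k^\top) X_k + c_k \,(X_k X_k^\top)^2 X_k
\;\longrightarrow\; U V^\top .

% MuonClip (qk-clip) bounds attention logits: when the observed maximum logit
% s_max = \max_{i,j} q_i^\top k_j exceeds a threshold \tau, the query and key
% projections are rescaled so subsequent logits stay below \tau:
\gamma = \frac{\tau}{s_{\max}}, \qquad
W_q \leftarrow \gamma^{\alpha} W_q, \quad
W_k \leftarrow \gamma^{\,1-\alpha} W_k .
```

This also explains the kernel work the post describes: applying the clip requires observing $s_{\max}$, which standard Flash Attention keeps on-chip, hence the modified kernels that surface the max-logit values.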