aws

8 posts

aws

AWS Weekly Roundup: Kiro CLI latest features, AWS European Sovereign Cloud, EC2 X8i instances, and more (January 19, 2026) | AWS News Blog

The January 19, 2026, AWS Weekly Roundup highlights significant advancements in sovereign cloud infrastructure and the general availability of high-performance, memory-optimized compute instances. The update also emphasizes the maturing ecosystem of AI agents, focusing on enhanced developer tooling and streamlined deployment workflows for agentic applications. These releases collectively aim to satisfy stringent regulatory requirements in Europe while pushing the boundaries of enterprise performance and automated productivity.

## Developer Tooling and Kiro CLI Enhancements

* New granular controls for web fetch URLs allow developers to use allowlists and blocklists to strictly govern which external resources an agent can access (see the conceptual sketch after this summary).
* The update introduces custom keyboard shortcuts to facilitate seamless switching between multiple specialized agents within a single session.
* Enhanced diff views provide clearer visibility into changes, improving the debugging and auditing process for automated workflows.

## AWS European Sovereign Cloud General Availability

* Following its initial 2023 announcement, this independent cloud infrastructure is now generally available to all customers.
* The environment is purpose-built to meet the most rigorous sovereignty and data residency requirements for European organizations.
* It offers a comprehensive set of AWS services within a framework that ensures operational independence and localized data handling.

## High-Performance Computing with EC2 X8i Instances

* The memory-optimized X8i instances, powered by custom Intel Xeon 6 processors, have moved from preview to general availability.
* These instances feature a sustained all-core turbo frequency of 3.9 GHz, which is currently exclusive to the AWS platform.
* The hardware is SAP certified and engineered to provide the highest memory bandwidth and performance for memory-intensive enterprise workloads compared to other Intel-based cloud offerings.

## Agentic AI and Productivity Updates

* Amazon Quick Suite continues to expand as a workplace "agentic teammate," designed to synthesize research and execute actions based on organizational insights.
* New technical guidance has been released regarding the deployment of AI agents on Amazon Bedrock AgentCore.
* The integration of GitHub Actions is now supported to automate the deployment and lifecycle management of these AI agents, bridging the gap between traditional DevOps and agentic AI development.

These updates signal a strategic shift toward highly specialized infrastructure, both in terms of regulatory compliance with the Sovereign Cloud and raw performance with the X8i instances. Organizations looking to scale their AI operations should prioritize the new deployment patterns for Bedrock AgentCore to ensure a robust CI/CD pipeline for their autonomous agents.
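
The roundup mentions the new allowlist/blocklist controls but not Kiro CLI's configuration syntax, so the following is only a conceptual Python sketch of how such URL governance could be enforced; the rule lists, hostnames, and the `is_fetch_allowed` helper are hypothetical and do not reflect Kiro's actual format or API.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Hypothetical rule sets for illustration; Kiro CLI's real configuration format is not shown here.
ALLOWLIST = ["docs.aws.amazon.com", "*.example.com"]
BLOCKLIST = ["*.internal.example.com"]

def is_fetch_allowed(url: str) -> bool:
    """Return True if an agent may fetch this URL: the blocklist wins, then the allowlist must match."""
    host = urlparse(url).hostname or ""
    if any(fnmatch(host, pattern) for pattern in BLOCKLIST):
        return False
    return any(fnmatch(host, pattern) for pattern in ALLOWLIST)

print(is_fetch_allowed("https://docs.aws.amazon.com/ec2/"))       # True: explicitly allowlisted
print(is_fetch_allowed("https://api.internal.example.com/data"))  # False: blocklist takes precedence
print(is_fetch_allowed("https://unknown-site.org/page"))          # False: not on the allowlist
```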

toss

Managing Thousands of API/Batch Servers

Toss Payments manages thousands of API and batch server configurations that handle trillions of won in transactions, where a single typo in a JVM setting can lead to massive financial infrastructure failure. To mitigate the risks associated with manual "copy-paste" workflows and configuration duplication, the team developed a sophisticated system that treats configuration as code. By implementing layered architectures and dynamic templates, they created a testable, unified environment capable of managing complex hybrid cloud setups with minimal human error.

## Overlay Architecture for Hierarchical Control

* The team implemented a layered configuration system consisting of `global`, `cluster`, `phase`, and `application` levels.
* Settings are resolved by priority, where lower-level layers override higher-level defaults, allowing servers to inherit common settings while maintaining specific overrides.
* This structure allows the team to control environment-specific behaviors, such as disabling canary deployments in development environments, from a single centralized directory.
* The directory structure maps files 1:1 to their respective layers, ensuring that naming conventions drive the CI/CD application process.

## Solving Duplication with Template Patterns

* Standard YAML overlays often fail when dealing with long strings or arrays, such as `JVM_OPTION`, because changing a single value usually requires redefining the entire block.
* To prevent the proliferation of nearly identical environment variables, the team introduced a template pattern using placeholders like `{{MAX_HEAP}}` (see the sketch after this summary).
* Developers can modify specific parameters at the application layer while the core string remains defined at the global layer, significantly reducing the risk of typos.
* This approach ensures that critical settings, like G1GC parameters or heap region sizes, remain consistent across the infrastructure unless explicitly changed.

## Dynamic and Conditional Configuration Logic

* The system allows for "evolutionary" configurations where Python scripts can be injected to generate dynamic values, such as random JMX ports or data fetched from remote APIs.
* Advanced conditional logic was added to handle complex deployment scenarios, enabling environment variables to change their values automatically based on the target cluster name (e.g., different profiles for AWS vs. IDC).
* By treating configuration as a living codebase, the team can adapt to new infrastructure requirements without abandoning their core architectural principles.

## Reliable Batch Processing through Simplicity

* For batch operations handling massive settlement volumes, the team prioritized "appropriate technology" and simplicity to minimize failure points.
* They chose Jenkins for its low learning curve and reliability, despite its lack of native GitOps support.
* To address inconsistencies in manual UI entries and varying Java versions across machines, they standardized the batch infrastructure to ensure that high-stakes financial calculations are executed in a controlled, predictable environment.

The most effective way to manage large-scale infrastructure is to transition from static, duplicated configuration files to a dynamic, code-centric system. By combining an overlay architecture for hierarchy and a template pattern for granular changes, organizations can achieve the flexibility needed for hybrid clouds while maintaining the strict safety standards required for financial systems.
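
The post describes the overlay layers and the `{{MAX_HEAP}}`-style placeholders without showing Toss's implementation, so here is a minimal Python sketch under those assumptions: the layer contents, the merge helper, and the placeholder regex are illustrative, not the actual tooling.

```python
import re

# Layers in ascending priority, as described in the post:
# global < cluster < phase < application (later, more specific layers override earlier ones).
GLOBAL = {
    "JVM_OPTION": "-Xmx{{MAX_HEAP}} -XX:+UseG1GC -XX:G1HeapRegionSize={{REGION_SIZE}}",
    "MAX_HEAP": "4g",
    "REGION_SIZE": "16m",
    "CANARY_ENABLED": "true",
}
CLUSTER = {}                          # nothing overridden at the cluster layer in this example
PHASE = {"CANARY_ENABLED": "false"}   # e.g. canary deployments disabled in development
APPLICATION = {"MAX_HEAP": "12g"}     # the app overrides one parameter, not the whole JVM string

def resolve(*layers: dict) -> dict:
    """Merge layers by priority, then expand {{PLACEHOLDER}} references against the merged values."""
    merged: dict = {}
    for layer in layers:
        merged.update(layer)          # later (more specific) layers win

    def expand(value: str) -> str:
        return re.sub(r"\{\{(\w+)\}\}", lambda m: merged.get(m.group(1), m.group(0)), value)

    return {key: expand(value) for key, value in merged.items()}

config = resolve(GLOBAL, CLUSTER, PHASE, APPLICATION)
print(config["JVM_OPTION"])      # -Xmx12g -XX:+UseG1GC -XX:G1HeapRegionSize=16m
print(config["CANARY_ENABLED"])  # false
```

Because only `MAX_HEAP` is redefined at the application layer, the shared G1GC flags stay identical everywhere, which is the duplication problem the template pattern is meant to solve.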

netflix

Netflix Live Origin. Xiaomei Liu, Joseph Lynch, Chris Newton | by Netflix Technology Blog | Dec, 2025 | Netflix TechBlog

The Netflix Live Origin is a specialized, multi-tenant microservice designed to bridge the gap between cloud-based live streaming pipelines and the Open Connect content delivery network. By operating as an intelligent broker, it manages content selection across redundant regional pipelines to ensure that only valid, high-quality segments are distributed to client devices. This architecture allows Netflix to achieve high resilience and stream integrity through server-side failover and deterministic segment selection.

### Multi-Pipeline and Multi-Region Awareness

* The origin server mitigates common live streaming defects, such as missing segments, timing discontinuities, and short segments that are missing video or audio samples.
* It leverages independent, redundant streaming pipelines across different AWS regions to ensure high availability; if one pipeline fails or produces a defective segment, the origin selects a valid candidate from an alternate path.
* Implementation of epoch locking at the cloud encoder level allows the origin to interchangeably select segments from various pipelines.
* The system uses lightweight media inspection at the packager level to generate metadata, which the origin then uses to perform deterministic candidate selection.

### Stream Distribution and Protocol Integration

* The service operates on AWS EC2 instances and utilizes standard HTTP protocol features for communication.
* Upstream packagers use HTTP PUT requests to push segments into storage at specific URLs, while the downstream Open Connect network retrieves them via GET requests.
* The architecture is optimized for a manifest design that uses segment templates and constant segment durations, which reduces the need for frequent manifest refreshes.

### Open Connect Streaming Optimization

* While Netflix’s Open Connect Appliances (OCAs) were originally optimized for VOD, the Live Origin extends nginx proxy-caching functionality to meet live-specific requirements.
* OCAs are provided with Live Event Configuration data, including Availability Start Times and initial segment numbers, to determine the legitimate range of segments for an event (a sketch of this arithmetic follows this summary).
* This predictive modeling allows the CDN to reject requests for objects outside the valid range immediately, reducing unnecessary traffic and load on the origin.

By decoupling the live streaming pipeline from the distribution network through this specialized origin layer, Netflix can maintain a high level of fault tolerance and stream stability. This approach minimizes client-side complexity by handling failovers and segment selection on the server side, ensuring a seamless experience for viewers of live events.
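
The summary notes that OCAs bound the legitimate segment range using the Availability Start Time, the initial segment number, and the constant segment duration from the manifest template. As a rough sketch of that arithmetic (the configuration values and the rejection helper below are assumptions, not Netflix's implementation):

```python
# Hypothetical live event configuration values; the real Live Event Configuration schema is not shown here.
AVAILABILITY_START_TIME = 1_765_000_000.0  # event start, Unix epoch seconds
SEGMENT_DURATION_S = 2.0                   # constant segment duration from the manifest template
START_NUMBER = 1                           # first segment number of the event

def valid_segment_range(now: float) -> range:
    """Segment numbers that can legitimately exist at wall-clock time `now`."""
    elapsed = max(0.0, now - AVAILABILITY_START_TIME)
    latest = START_NUMBER + int(elapsed // SEGMENT_DURATION_S)
    return range(START_NUMBER, latest + 1)

def should_serve(segment_number: int, now: float) -> bool:
    """Reject requests for segments outside the valid range before they reach the origin."""
    return segment_number in valid_segment_range(now)

# Sixty seconds into the event, only segments 1..31 can exist yet.
print(should_serve(25, AVAILABILITY_START_TIME + 60))   # True
print(should_serve(500, AVAILABILITY_START_TIME + 60))  # False, rejected at the edge
```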

aws

AWS Weekly Roundup: AWS re:Invent keynote recap, on-demand videos, and more (December 8, 2025) | AWS News Blog

The December 8, 2025, AWS Weekly Roundup recaps the major themes from AWS re:Invent, signaling a significant industry transition from AI assistants to autonomous AI agents. While technical innovation in infrastructure remains a priority, the event underscored that developers remain at the heart of the AWS mission, empowered by new tools to automate complex tasks using natural language. This shift represents a "renaissance" in cloud computing, where purpose-built infrastructure is now designed to support the non-deterministic nature of agentic workloads.

## Community Recognition and the Now Go Build Award

* Raphael Francis Quisumbing (Rafi) from the Philippines was honored with the Now Go Build Award, presented by Werner Vogels.
* A veteran of the ecosystem, Quisumbing has served as an AWS Hero since 2015 and has co-led the AWS User Group Philippines for over a decade.
* The recognition emphasizes AWS's continued focus on community dedication and the role of individual builders in empowering regional developer ecosystems.

## The Evolution from AI Assistants to Agents

* AWS CEO Matt Garman identified AI agents as the next major inflection point for the industry, moving beyond simple chat interfaces to systems that perform tasks and automate workflows.
* Dr. Swami Sivasubramanian highlighted a paradigm shift where natural language serves as the primary interface for describing complex goals.
* These agents are designed to autonomously generate plans, write necessary code, and call various tools to execute complete solutions without constant human intervention.
* AWS is prioritizing the development of production-ready infrastructure that is secure and scalable specifically to handle the "non-deterministic" behavior of these AI agents.

## Core Infrastructure and the Developer Renaissance

* Despite the focus on AI, AWS reaffirmed that its core mission remains the "freedom to invent," keeping developers central to its 20-year strategy.
* Leaders Peter DeSantis and Dave Brown reinforced that foundational attributes (security, availability, and performance) remain the non-negotiable pillars of the AWS cloud.
* The integration of AI agents is framed as a way to finally realize material business returns on AI investments by moving from experimental use cases to automated business logic.

To maximize the value of these updates, organizations should begin evaluating how to transition from simple LLM implementations to agentic frameworks that can execute end-to-end business processes. Reviewing the on-demand keynote sessions from re:Invent 2025 is recommended for technical teams looking to implement the latest secure, agent-ready infrastructure.

aws

Introducing checkpointless and elastic training on Amazon SageMaker HyperPod | AWS News Blog

Amazon SageMaker HyperPod has introduced checkpointless and elastic training features to accelerate AI model development by minimizing infrastructure-related downtime. These advancements replace traditional, slow checkpoint-restart cycles with peer-to-peer state recovery and enable training workloads to scale dynamically based on available compute capacity. By decoupling training progress from static hardware configurations, organizations can significantly reduce model time-to-market while maximizing cluster utilization.

**Checkpointless Training and Rapid State Recovery**

* Replaces the traditional five-stage recovery process (including job termination, network setup, and checkpoint retrieval), which can often take up to an hour on self-managed clusters.
* Utilizes peer-to-peer state replication and in-process recovery to allow healthy nodes to restore the model state instantly without restarting the entire job (a conceptual sketch follows this summary).
* Incorporates technical optimizations such as collective communications initialization and memory-mapped data loading to enable efficient data caching.
* Reduces recovery downtime by over 80% based on internal studies of clusters with up to 2,000 GPUs, and was a core technology used in the development of Amazon Nova models.

**Elastic Training and Automated Cluster Scaling**

* Allows AI workloads to automatically expand to use idle cluster capacity as it becomes available and contract when resources are needed for higher-priority tasks.
* Reduces the need for manual intervention, saving hours of engineering time previously spent reconfiguring training jobs to match fluctuating compute availability.
* Optimizes total cost of ownership by ensuring that training momentum continues even as inference volumes peak and pull resources away from the training pool.
* Orchestrates these transitions seamlessly through the HyperPod training operator, ensuring that model development is not disrupted by infrastructure changes.

For teams managing large-scale AI workloads, adopting these features can reclaim significant development time and lower operational costs by preventing idle cluster periods. Organizations scaling to thousands of accelerators should prioritize checkpointless training to mitigate the impact of hardware faults and maintain continuous training momentum.
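
HyperPod's peer-to-peer recovery is a managed capability and the announcement does not expose its internals, so the following is purely a conceptual Python sketch of the idea: each node also holds a replica of a neighbor's state, so a replacement node can restore from a healthy peer's memory instead of re-reading a checkpoint from storage. The ring-replication scheme and all names here are assumptions for illustration, not HyperPod's actual mechanism.

```python
# Conceptual illustration only: ring-style peer replication of training state.
class Node:
    def __init__(self, rank: int):
        self.rank = rank
        self.state = {"step": 0, "weights": f"shard-{rank}"}
        self.peer_replica = None  # copy of the next rank's state

def replicate(nodes: list[Node]) -> None:
    """After a training step, every node keeps a copy of its ring neighbor's state in memory."""
    for i, node in enumerate(nodes):
        neighbor = nodes[(i + 1) % len(nodes)]
        node.peer_replica = dict(neighbor.state)

def recover(nodes: list[Node], failed_rank: int) -> Node:
    """Rebuild a failed node from the peer holding its replica, with no checkpoint read from storage."""
    holder = nodes[(failed_rank - 1) % len(nodes)]
    replacement = Node(failed_rank)
    replacement.state = dict(holder.peer_replica)
    return replacement

nodes = [Node(rank) for rank in range(4)]
for node in nodes:
    node.state["step"] = 100       # pretend training has advanced to step 100
replicate(nodes)
nodes[2] = recover(nodes, failed_rank=2)
print(nodes[2].state)              # {'step': 100, 'weights': 'shard-2'}, restored from a peer
```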

aws

Amazon OpenSearch Service improves vector database performance and cost with GPU acceleration and auto-optimization | AWS News Blog

Amazon OpenSearch Service has introduced serverless GPU acceleration and auto-optimization features designed to enhance the performance and cost-efficiency of large-scale vector databases. These updates allow users to build vector indexes up to ten times faster at a quarter of the traditional indexing cost, enabling the creation of billion-scale databases in under an hour. By automating complex tuning processes, OpenSearch Service simplifies the deployment of generative AI and high-speed search applications.

### GPU Acceleration for Rapid Indexing

The new serverless GPU acceleration streamlines the creation of vector data structures by offloading intensive workloads to specialized hardware.

* **Performance Gains:** Indexing speed is increased by 10x compared to non-GPU configurations, significantly reducing the time-to-market for data-heavy applications.
* **Cost Efficiency:** Indexing costs are reduced to approximately 25% of standard costs, and users only pay for active processing through OpenSearch Compute Units (OCU) rather than idle instance time.
* **Serverless Management:** There is no need to provision or manage GPU instances manually; OpenSearch Service automatically detects acceleration opportunities and isolates workloads within the user's Amazon VPC.
* **Operational Scope:** Acceleration is automatically applied to both initial indexing and subsequent force-merge operations.

### Automated Vector Index Optimization

Auto-optimization removes the requirement for deep vector expertise by automatically balancing competing performance metrics.

* **Simplified Tuning:** The system replaces manual index tuning, which can traditionally take weeks, with automated configurations.
* **Resource Balancing:** The tool finds the optimal trade-off between search latency, search quality (recall rates), and memory requirements.
* **Improved Accuracy:** Users can achieve higher recall rates and better cost savings compared to using default, unoptimized index configurations.

### Configuration and Integration

These features can be integrated into new or existing OpenSearch Service domains and Serverless collections through the AWS Console or CLI.

* **CLI Activation:** Users can enable acceleration on existing domains using the `update-domain-config` command with the `--aiml-options` flag set to enable `ServerlessVectorAcceleration`.
* **Index Settings:** To leverage GPU processing, users must create a vector index with specific settings, notably setting `index.knn.remote_index_build.enabled` to `true` (see the sketch after this summary).
* **Supported Workloads:** The service supports standard OpenSearch operations, including the Bulk API for adding vector data and text embeddings.

For organizations managing large-scale vector workloads for RAG (Retrieval-Augmented Generation) or semantic search, enabling GPU acceleration is a highly recommended step to reduce operational overhead. Developers should transition existing indexes to include the `remote_index_build` setting to take immediate advantage of the improved speed and reduced OCU pricing.
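
As a minimal sketch of the index setting called out above, the snippet below creates a k-NN index with `index.knn.remote_index_build.enabled` set to `true` using the opensearch-py client; the endpoint, authentication, index name, field name, and vector dimension are placeholders, and the exact settings and mapping options should be confirmed against the OpenSearch Service documentation.

```python
from opensearchpy import OpenSearch

# Placeholder endpoint and auth; in practice point this at your domain and use SigV4 or basic auth.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

index_body = {
    "settings": {
        "index": {
            "knn": True,
            # Setting named in the announcement for GPU-accelerated (remote) vector index builds.
            "knn.remote_index_build.enabled": True,
        }
    },
    "mappings": {
        "properties": {
            # Hypothetical embedding field: 768-dimensional vectors stored as knn_vector.
            "embedding": {"type": "knn_vector", "dimension": 768}
        }
    },
}

client.indices.create(index="product-embeddings", body=index_body)
# Vectors are then ingested through the standard Bulk API, e.g. client.bulk(body=bulk_payload).
```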

coupang

Optimizing operational costs through cloud

Coupang’s Finance and Engineering teams collaborated to optimize cloud expenditures by focusing on resource efficiency and the company's "Hate Waste" leadership principle. Through a dedicated optimization project team and the implementation of data-driven analytics, the company successfully reduced on-demand costs by millions of dollars without compromising business growth. This initiative transformed cloud management from a reactive expense into a proactive engineering culture centered on financial accountability and technical efficiency.

### Forming the Optimization Project Team

* A specialized team consisting of Cloud Infrastructure Engineers and Technical Program Managers (TPMs) was established to bridge the gap between finance and engineering.
* The project team focused on educating domain teams about the variable cost model of cloud services, moving away from a fixed-cost mindset.
* Technical experts helped domain teams identify opportunities to use cost-efficient technologies, such as ARM-based AWS Graviton processors and AWS Spot Instances for data processing.
* The initiative established clear ownership, ensuring that each domain team understood and managed their specific cloud resource usage.

### Analytics and Dashboards for Visibility

* Engineers developed custom dashboards using Amazon Athena to process Amazon CloudWatch data, providing deep insights into resource performance.
* The team utilized AWS Cost & Usage Reports (CUR) within internal Business Intelligence (BI) tools to provide granular visibility into spending patterns.
* Finance teams worked alongside engineers to align technical roadmaps with monthly and quarterly budget goals, making cost management a shared responsibility.

### Strategies for Usage and Cost Reduction

* **Spend Less (Usage Reduction):** Coupang implemented automation to ensure that non-production environment resources were only active when needed, resulting in a 25% cost saving for those environments (a sketch of one such scheduler follows this summary).
* **Pay Less (Right-sizing):** The team analyzed usage patterns to manually identify and decommission unused EC2 resources across all domain teams.
* **Instance and Storage Optimization:** The project prioritized migrating workloads to the latest instance generations and optimizing Amazon S3 storage structures to reduce costs for data at rest.

To achieve sustainable cloud efficiency, organizations should move beyond simple monitoring and foster an engineering culture where resource management is a core technical discipline. Prioritizing automated resource scheduling and adopting modern, high-efficiency hardware like Graviton instances are essential steps for any large-scale cloud operation looking to maximize its return on investment.
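
The post describes the outcome of the automation (non-production resources run only when needed) rather than the tooling behind it, so here is a minimal boto3 sketch of one common way to implement such a scheduler, stopping tagged non-production EC2 instances outside working hours; the tag convention and region are assumptions, not Coupang's actual setup.

```python
import boto3

# Assumed convention: non-production instances carry an Environment=nonprod tag.
ec2 = boto3.client("ec2", region_name="ap-northeast-2")

def stop_nonprod_instances() -> list[str]:
    """Stop running non-production instances, e.g. invoked by a scheduled job each evening."""
    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Environment", "Values": ["nonprod"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]
    instance_ids = [
        instance["InstanceId"]
        for reservation in reservations
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids

if __name__ == "__main__":
    print("Stopped:", stop_nonprod_instances())
```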

coupang

Cloud expenditure optimization for cost efficiency | by Coupang Engineering | Coupang Engineering Blog | Medium

Coupang addressed rising cloud costs by establishing a cross-functional Central team to bridge the gap between engineering usage and financial accountability. Through a data-driven approach involving custom analytics and automated resource management, the company successfully reduced on-demand expenditure by millions of dollars. This initiative demonstrates that aligning technical infrastructure with financial governance is essential for maintaining growth without unnecessary waste.

**The Central Team and Data-Driven Governance**

* Coupang formed a specialized Central team consisting of infrastructure engineers and technical program managers to identify efficiency opportunities across the organization.
* The team developed custom BI dashboards utilizing Amazon CloudWatch, AWS Cost and Usage Reports (CUR), and Amazon Athena to provide domain teams with actionable insights into their spending (see the query sketch after this summary).
* The finance department partnered with engineering to enforce strict budget compliance, ensuring that domain teams managed their resources within assigned monthly and quarterly limits.

**Strategies for Spending and Paying Less**

* The company implemented "Spending Less" strategies by automating the launch of resources in non-production environments only when needed, resulting in a 25% cost reduction for those areas.
* "Paying Less" initiatives focused on rightsizing, where the Central team worked with domain owners to manually identify and eliminate unutilized or underutilized EC2 resources.
* Workloads were migrated to more efficient hardware and pricing models, specifically leveraging ARM-based AWS Graviton processors and AWS Spot Instances for data processing and storage.

**Targeted Infrastructure Optimization**

* Engineering teams focused on instance generation alignment, ensuring that services were running on the most cost-effective hardware generations available.
* Storage costs were reduced by optimizing Amazon S3 structures at rest, improving how data is organized and stored.
* The team refined Amazon EMR (Elastic MapReduce) configurations to enhance processing efficiency, significantly lowering the cost of large-scale data analysis.

To achieve sustainable cloud efficiency, engineering organizations should move beyond viewing cloud costs as a purely financial concern and instead treat resource management as a core technical metric. By integrating financial accountability directly into the engineering workflow through shared analytics and automated resource controls, companies can foster a culture of efficiency that supports long-term scalability.
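
The post mentions querying CUR data with Amazon Athena to power the BI dashboards but does not show the queries, so the following is a hedged sketch of a per-service monthly cost rollup; the database, table, S3 output location, region, and partition values are placeholders, and the column names assume the standard CUR-on-Athena schema rather than Coupang's internal setup.

```python
import boto3

athena = boto3.client("athena", region_name="ap-northeast-2")

# Placeholder names; point these at your own CUR database/table and query-results bucket.
QUERY = """
SELECT line_item_product_code AS service,
       ROUND(SUM(line_item_unblended_cost), 2) AS monthly_cost
FROM cur_database.cur_table
WHERE year = '2025' AND month = '11'
GROUP BY line_item_product_code
ORDER BY monthly_cost DESC
LIMIT 20
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "cur_database"},
    ResultConfiguration={"OutputLocation": "s3://my-query-results-bucket/cur/"},
)
print("Query execution id:", response["QueryExecutionId"])
```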