Customize your AWS Management Console experience with visual settings including account color, region and service visibility In August 2025, we introduced AWS User Experience Customization (UXC) capability to tailor user interfaces (UIs) to meet your specific needs and complete…
20 years in the AWS Cloud – how time flies! AWS has reached its 20th anniversary! With a steady pace of innovation, AWS has grown to offer over 240 comprehensive cloud services and continues to launch thousands of new features annually for millions of customers. During this time…
Our First 2026 Heroes Cohort Is Here! We’re thrilled to celebrate three exceptional developer community leaders as AWS Heroes. These individuals represent the heart of what makes the AWS community so vibrant. In addition to sharing technical knowledge, they build connections, fo…
Mount Mayhem at Netflix: Scaling Containers on Modern CPUs Authors: Harshad Sane, Andrew Halaney Imagine this — you click play on Netflix on a Friday night and behind the scenes hundreds of containers spring to action in a few seconds to answer your call. At Ne…
Safeguarding Dynamic Configuration Changes at Scale How Airbnb ships dynamic config changes safely and reliably By Cosmo Qiu, Bo Teng, Siyuan Zhou, Ankur Soni, Willis Harvey Dynamic configuration is a core infrastructure capability in modern systems. It allows…
Announcing Amazon SageMaker Inference for custom Amazon Nova models Since we launched Amazon Nova customization in Amazon SageMaker AI at AWS NY Summit 2025, customers have been asking for the same capabilities with Amazon Nova as they do when they customize open weights models…
AWS Weekly Roundup: Claude Opus 4.6 in Amazon Bedrock, AWS Builder ID Sign in with Apple, and more (February 9, 2026) Here are the notable launches and updates from last week that can help you build, scale, and innovate on AWS. Last week’s launches Here are the launches that got…
Toss Payments transformed its security infrastructure from a vulnerable, single-layered legacy system into a robust "Defense in Depth" architecture spanning hybrid IDC and AWS environments. By integrating advanced perimeter defense, internal server monitoring, and container runtime security, the team established a comprehensive framework that prioritizes visibility and continuous verification. This four-year journey demonstrates that modern security requires moving beyond simple boundary protection toward a proactive, multi-layered strategy that assumes breaches can occur.
### Perimeter Defense and SSL/TLS Visibility
* Addressed the critical visibility gap in legacy systems by implementing dedicated SSL/TLS decryption tools, allowing the team to analyze encrypted traffic for hidden malicious payloads.
* Established a hybrid security architecture using a combination of physical DDoS protection, IPS, and WAF in IDC environments, complemented by AWS WAF and the machine learning-based Amazon GuardDuty in the cloud.
* Developed a collaborative merchant response process that moves beyond simple IP blocking; the system automatically detects malicious traffic from partners and provides them with detailed vulnerability reports and remediation guides (e.g., specific SQL injection points).
### Internal Network Security and "Assume Breach" Monitoring
* Implemented **Wazuh**, an open-source security platform, in IDC environments to monitor lateral movement, collect centralized logs, and perform file integrity checks across diverse operating systems.
* Leveraged **Amazon GuardDuty** for intelligent threat detection in the cloud, focusing on malware scanning for EC2 instances and monitoring for suspicious process activities.
* Established automated detection for privilege escalation and unauthorized access to sensitive system files, such as tracking instances where root privileges are obtained to modify the `/etc/passwd` file.
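The file integrity checks described above can be sketched with a simple hash-baseline approach. This is a minimal illustration of the technique, not Wazuh's actual implementation; the monitored paths are examples:

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_baseline(paths):
    """Record a trusted digest for each monitored file (e.g. /etc/passwd)."""
    return {p: file_digest(p) for p in paths}


def detect_changes(baseline):
    """Re-hash each monitored file and report any drift from the baseline."""
    alerts = []
    for path, trusted in baseline.items():
        if file_digest(path) != trusted:
            alerts.append(f"integrity violation: {path}")
    return alerts
```

In practice a tool like Wazuh layers scheduling, centralized log shipping, and alert rules on top of this basic hash-and-compare loop.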
### Container Runtime Security as the Final Defense
* Adopted **Falco**, a CNCF-hosted runtime security tool, to protect Kubernetes environments by monitoring system calls (syscalls) in real time.
* Configured specific security rules to detect "container escape" attempts, unauthorized access to sensitive files like `/etc/shadow`, and the execution of new or suspicious binaries within running containers.
* Integrated **Falcosidekick** to manage security events efficiently, ensuring that anomalous behaviors at the container level are instantly routed to the security team for response.
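A rule along the lines described might look like the following. This is a sketch using Falco's public rule syntax and default macros (`open_read`, `container`); the rule name and output string are illustrative, not Toss Payments' actual rules:

```yaml
- rule: Read Sensitive Shadow File
  desc: Detect any process reading /etc/shadow inside a container
  condition: >
    open_read and container and fd.name=/etc/shadow
  output: >
    Sensitive file opened in container
    (user=%user.name command=%proc.cmdline file=%fd.name container=%container.name)
  priority: WARNING
  tags: [filesystem, container]
```

Matching events would then be forwarded by Falcosidekick to the team's alerting channels.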
### Zero Trust and Continuous Verification
* Shifted toward a Zero Trust model for the internal work network to ensure that all users and devices are continuously verified regardless of their location.
* Focused on implementing dynamic access control and the principle of least privilege to minimize the potential impact of credential theft or device compromise.
Organizations operating in hybrid cloud environments should move away from relying on a single perimeter and instead adopt a multi-layered defense strategy. True security resilience is achieved by gaining deep visibility into encrypted traffic and maintaining granular monitoring at the server and container levels to intercept threats that inevitably bypass initial defenses.
AWS Weekly Roundup: Amazon EC2 G7e instances, Amazon Corretto updates, and more (January 26, 2026) Hey! It’s my first post for 2026, and I’m writing to you while watching our driveway getting dug out. I hope wherever you are you are safe and warm and your data is still flowing!…
The January 19, 2026, AWS Weekly Roundup highlights significant advancements in sovereign cloud infrastructure and the general availability of high-performance, memory-optimized compute instances. The update also emphasizes the maturing ecosystem of AI agents, focusing on enhanced developer tooling and streamlined deployment workflows for agentic applications. These releases collectively aim to satisfy stringent regulatory requirements in Europe while pushing the boundaries of enterprise performance and automated productivity.
## Developer Tooling and Kiro CLI Enhancements
* New granular controls for web fetch URLs allow developers to use allowlists and blocklists to strictly govern which external resources an agent can access.
* The update introduces custom keyboard shortcuts to facilitate seamless switching between multiple specialized agents within a single session.
* Enhanced diff views provide clearer visibility into changes, improving the debugging and auditing process for automated workflows.
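The allowlist/blocklist governance described above amounts to a hostname check before any fetch. A minimal sketch of the idea follows — this is a generic illustration, not Kiro CLI's actual configuration format or API:

```python
from urllib.parse import urlparse


def is_fetch_allowed(url: str, allowlist: set, blocklist: set) -> bool:
    """Decide whether an agent may fetch `url`.

    Blocklist wins over allowlist; an empty allowlist means any host
    that is not blocklisted is permitted.
    """
    host = urlparse(url).hostname or ""
    if host in blocklist:
        return False
    if allowlist and host not in allowlist:
        return False
    return True
```

The blocklist-first ordering is the conservative choice: an entry on both lists is always denied.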
## AWS European Sovereign Cloud General Availability
* Following its initial 2023 announcement, this independent cloud infrastructure is now generally available to all customers.
* The environment is purpose-built to meet the most rigorous sovereignty and data residency requirements for European organizations.
* It offers a comprehensive set of AWS services within a framework that ensures operational independence and localized data handling.
## High-Performance Computing with EC2 X8i Instances
* The memory-optimized X8i instances, powered by custom Intel Xeon 6 processors, have moved from preview to general availability.
* These instances feature a sustained all-core turbo frequency of 3.9 GHz, which is currently exclusive to the AWS platform.
* The hardware is SAP certified and engineered to provide the highest memory bandwidth and performance for memory-intensive enterprise workloads compared to other Intel-based cloud offerings.
## Agentic AI and Productivity Updates
* Amazon Quick Suite continues to expand as a workplace "agentic teammate," designed to synthesize research and execute actions based on organizational insights.
* New technical guidance has been released regarding the deployment of AI agents on Amazon Bedrock AgentCore.
* The integration of GitHub Actions is now supported to automate the deployment and lifecycle management of these AI agents, bridging the gap between traditional DevOps and agentic AI development.
These updates signal a strategic shift toward highly specialized infrastructure, both in terms of regulatory compliance with the Sovereign Cloud and raw performance with the X8i instances. Organizations looking to scale their AI operations should prioritize the new deployment patterns for Bedrock AgentCore to ensure a robust CI/CD pipeline for their autonomous agents.
Toss Payments manages thousands of API and batch server configurations that handle trillions of won in transactions, where a single typo in a JVM setting can lead to massive financial infrastructure failure. To solve the risks associated with manual "copy-paste" workflows and configuration duplication, the team developed a sophisticated system that treats configuration as code. By implementing layered architectures and dynamic templates, they created a testable, unified environment capable of managing complex hybrid cloud setups with minimal human error.
## Overlay Architecture for Hierarchical Control
* The team implemented a layered configuration system consisting of `global`, `cluster`, `phase`, and `application` levels.
* Settings are resolved by priority, where lower-level layers override higher-level defaults, allowing servers to inherit common settings while maintaining specific overrides.
* This structure allows the team to control environment-specific behaviors, such as disabling canary deployments in development environments, from a single centralized directory.
* The directory structure maps files 1:1 to their respective layers, ensuring that naming conventions drive the CI/CD application process.
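The priority resolution described above can be sketched as a dictionary merge from lowest to highest priority. The layer names follow the article; the setting keys are invented for illustration:

```python
# Low -> high priority, matching the overlay hierarchy from the article.
LAYER_ORDER = ["global", "cluster", "phase", "application"]


def resolve_config(layers: dict) -> dict:
    """Merge layered settings; more specific layers override broader defaults."""
    resolved = {}
    for name in LAYER_ORDER:
        resolved.update(layers.get(name, {}))
    return resolved


layers = {
    "global": {"canary": True, "max_heap": "4g"},
    "phase": {"canary": False},           # e.g. disable canary in dev
    "application": {"max_heap": "8g"},    # app-specific override
}
# resolve_config(layers) -> {"canary": False, "max_heap": "8g"}
```

Each server thus inherits the global defaults and sees only the deltas its cluster, phase, or application layer declares.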
## Solving Duplication with Template Patterns
* Standard YAML overlays often fail when dealing with long strings or arrays, such as `JVM_OPTION`, because changing a single value usually requires redefining the entire block.
* To prevent the proliferation of nearly identical environment variables, the team introduced a template pattern using placeholders like `{{MAX_HEAP}}`.
* Developers can modify specific parameters at the application layer while the core string remains defined at the global layer, significantly reducing the risk of typos.
* This approach ensures that critical settings, like G1GC parameters or heap region sizes, remain consistent across the infrastructure unless explicitly changed.
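The template pattern can be sketched as simple placeholder substitution. `{{MAX_HEAP}}` comes from the article; the template string and the `REGION_SIZE` parameter are illustrative:

```python
import re

# Defined once at the global layer; applications supply only the parameters.
GLOBAL_JVM_TEMPLATE = (
    "-Xmx{{MAX_HEAP}} -Xms{{MAX_HEAP}} "
    "-XX:+UseG1GC -XX:G1HeapRegionSize={{REGION_SIZE}}"
)


def render(template: str, params: dict) -> str:
    """Replace {{NAME}} placeholders; an unresolved placeholder is an error."""
    def sub(match):
        key = match.group(1)
        if key not in params:
            raise KeyError(f"missing template parameter: {key}")
        return params[key]
    return re.sub(r"\{\{(\w+)\}\}", sub, template)
```

Failing loudly on an unresolved placeholder is what turns a silent typo into a CI/CD-time error rather than a production incident.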
## Dynamic and Conditional Configuration Logic
* The system allows for "evolutionary" configurations where Python scripts can be injected to generate dynamic values, such as random JMX ports or data fetched from remote APIs.
* Advanced conditional logic was added to handle complex deployment scenarios, enabling environment variables to change their values automatically based on the target cluster name (e.g., different profiles for AWS vs. IDC).
* By treating configuration as a living codebase, the team can adapt to new infrastructure requirements without abandoning their core architectural principles.
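The dynamic and conditional behaviors above can be sketched as small functions evaluated at render time. The cluster-name prefixes, profile values, and port range here are invented for illustration:

```python
import random


def resolve_profile(cluster: str) -> dict:
    """Pick environment values based on the target cluster name."""
    if cluster.startswith("aws-"):
        return {"SPRING_PROFILES_ACTIVE": "aws", "METRICS_ENDPOINT": "cloudwatch"}
    if cluster.startswith("idc-"):
        return {"SPRING_PROFILES_ACTIVE": "idc", "METRICS_ENDPOINT": "prometheus"}
    raise ValueError(f"unknown cluster naming scheme: {cluster}")


def dynamic_jmx_port(low: int = 20000, high: int = 29999) -> int:
    """Generate a random JMX port at config-render time."""
    return random.randint(low, high)
```

The key point is that these values are computed when the configuration is rendered, so the same source files serve both AWS and IDC targets.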
## Reliable Batch Processing through Simplicity
* For batch operations handling massive settlement volumes, the team prioritized "appropriate technology" and simplicity to minimize failure points.
* They chose Jenkins for its low learning curve and reliability, despite its lack of native GitOps support.
* To address inconsistencies in manual UI entries and varying Java versions across machines, they standardized the batch infrastructure to ensure that high-stakes financial calculations are executed in a controlled, predictable environment.
The most effective way to manage large-scale infrastructure is to transition from static, duplicated configuration files to a dynamic, code-centric system. By combining an overlay architecture for hierarchy and a template pattern for granular changes, organizations can achieve the flexibility needed for hybrid clouds while maintaining the strict safety standards required for financial systems.
The Netflix Live Origin is a specialized, multi-tenant microservice designed to bridge the gap between cloud-based live streaming pipelines and the Open Connect content delivery network. By operating as an intelligent broker, it manages content selection across redundant regional pipelines to ensure that only valid, high-quality segments are distributed to client devices. This architecture allows Netflix to achieve high resilience and stream integrity through server-side failover and deterministic segment selection.
### Multi-Pipeline and Multi-Region Awareness
* The origin server mitigates common live streaming defects, such as missing segments, timing discontinuities, and short segments containing missing video or audio samples.
* It leverages independent, redundant streaming pipelines across different AWS regions to ensure high availability; if one pipeline fails or produces a defective segment, the origin selects a valid candidate from an alternate path.
* Implementation of epoch locking at the cloud encoder level allows the origin to interchangeably select segments from various pipelines.
* The system uses lightweight media inspection at the packager level to generate metadata, which the origin then uses to perform deterministic candidate selection.
### Stream Distribution and Protocol Integration
* The service operates on AWS EC2 instances and utilizes standard HTTP protocol features for communication.
* Upstream packagers use HTTP PUT requests to push segments into storage at specific URLs, while the downstream Open Connect network retrieves them via GET requests.
* The architecture is optimized for a manifest design that uses segment templates and constant segment durations, which reduces the need for frequent manifest refreshes.
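With constant segment durations and a template-based manifest, a client can derive the current live-edge segment number arithmetically instead of re-fetching the manifest. A sketch of the standard DASH-style calculation (parameter names are illustrative, not Netflix's API):

```python
def live_edge_segment(now: float, availability_start: float,
                      segment_duration: float, start_number: int = 1) -> int:
    """Latest fully published segment number at wall-clock time `now`.

    Segment k (counting from start_number) covers the interval
    [availability_start + (k - start_number) * d, ... + d) and is
    complete only once that interval has fully elapsed.
    """
    elapsed = now - availability_start
    if elapsed < segment_duration:
        raise ValueError("no complete segment available yet")
    return start_number + int(elapsed // segment_duration) - 1
```

Because the mapping from time to segment number is deterministic, manifest refreshes are needed only when the event configuration itself changes.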
### Open Connect Streaming Optimization
* While Netflix’s Open Connect Appliances (OCAs) were originally optimized for VOD, the Live Origin extends nginx proxy-caching functionality to meet live-specific requirements.
* OCAs are provided with Live Event Configuration data, including Availability Start Times and initial segment numbers, to determine the legitimate range of segments for an event.
* This predictive modeling allows the CDN to reject requests for objects outside the valid range immediately, reducing unnecessary traffic and load on the origin.
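The range check the OCAs perform from the Live Event Configuration can be sketched as follows — a simplified model assuming constant segment durations, with invented parameter names:

```python
def is_valid_request(seg_number: int, now: float, availability_start: float,
                     segment_duration: float, first_segment: int) -> bool:
    """Accept only segment numbers in the legitimate [first, live-edge] range.

    Requests outside this window can be rejected at the CDN edge without
    ever reaching the origin.
    """
    if now < availability_start + segment_duration:
        return False  # nothing has been fully published yet
    live_edge = first_segment + int((now - availability_start) // segment_duration) - 1
    return first_segment <= seg_number <= live_edge
```

Rejecting out-of-range requests at the edge is what shields the origin from speculative or malformed segment fetches during a live event.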
By decoupling the live streaming pipeline from the distribution network through this specialized origin layer, Netflix can maintain a high level of fault tolerance and stream stability. This approach minimizes client-side complexity by handling failovers and segment selection on the server side, ensuring a seamless experience for viewers of live events.
The December 8, 2025, AWS Weekly Roundup recaps the major themes from AWS re:Invent, signaling a significant industry transition from AI assistants to autonomous AI agents. While technical innovation in infrastructure remains a priority, the event underscored that developers remain at the heart of the AWS mission, empowered by new tools to automate complex tasks using natural language. This shift represents a "renaissance" in cloud computing, where purpose-built infrastructure is now designed to support the non-deterministic nature of agentic workloads.
## Community Recognition and the Now Go Build Award
* Raphael Francis Quisumbing (Rafi) from the Philippines was honored with the Now Go Build Award, presented by Werner Vogels.
* A veteran of the ecosystem, Quisumbing has served as an AWS Hero since 2015 and has co-led the AWS User Group Philippines for over a decade.
* The recognition emphasizes AWS's continued focus on community dedication and the role of individual builders in empowering regional developer ecosystems.
## The Evolution from AI Assistants to Agents
* AWS CEO Matt Garman identified AI agents as the next major inflection point for the industry, moving beyond simple chat interfaces to systems that perform tasks and automate workflows.
* Dr. Swami Sivasubramanian highlighted a paradigm shift where natural language serves as the primary interface for describing complex goals.
* These agents are designed to autonomously generate plans, write necessary code, and call various tools to execute complete solutions without constant human intervention.
* AWS is prioritizing the development of production-ready infrastructure that is secure and scalable specifically to handle the "non-deterministic" behavior of these AI agents.
## Core Infrastructure and the Developer Renaissance
* Despite the focus on AI, AWS reaffirmed that its core mission remains the "freedom to invent," keeping developers central to its 20-year strategy.
* Leaders Peter DeSantis and Dave Brown reinforced that foundational attributes—security, availability, and performance—remain the non-negotiable pillars of the AWS cloud.
* The integration of AI agents is framed as a way to finally realize material business returns on AI investments by moving from experimental use cases to automated business logic.
To maximize the value of these updates, organizations should begin evaluating how to transition from simple LLM implementations to agentic frameworks that can execute end-to-end business processes. Reviewing the on-demand keynote sessions from re:Invent 2025 is recommended for technical teams looking to implement the latest secure, agent-ready infrastructure.
Amazon SageMaker HyperPod has introduced checkpointless and elastic training features to accelerate AI model development by minimizing infrastructure-related downtime. These advancements replace traditional, slow checkpoint-restart cycles with peer-to-peer state recovery and enable training workloads to scale dynamically based on available compute capacity. By decoupling training progress from static hardware configurations, organizations can significantly reduce model time-to-market while maximizing cluster utilization.
## Checkpointless Training and Rapid State Recovery
* Replaces the traditional five-stage recovery process—including job termination, network setup, and checkpoint retrieval—which can often take up to an hour on self-managed clusters.
* Utilizes peer-to-peer state replication and in-process recovery to allow healthy nodes to restore the model state instantly without restarting the entire job.
* Incorporates technical optimizations such as collective communications initialization and memory-mapped data loading to enable efficient data caching.
* Reduces recovery downtime by over 80% based on internal studies of clusters with up to 2,000 GPUs, and was a core technology used in the development of Amazon Nova models.
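HyperPod's actual recovery mechanism is proprietary, but the contrast with checkpoint-restart can be illustrated with a toy in-memory model: a failed worker restores state from a healthy peer's replica rather than terminating the job and reloading a disk checkpoint. All names here are invented for illustration:

```python
import copy


class Replica:
    """Toy data-parallel worker holding a full copy of the model state."""
    def __init__(self, state: dict):
        self.state = dict(state)
        self.healthy = True


def recover_from_peer(replicas: list, failed_index: int) -> None:
    """Restore a failed worker from a healthy peer's in-memory state.

    This stands in for peer-to-peer state replication; a checkpoint-based
    scheme would instead stop every worker and reload from storage.
    """
    donor = next(r for i, r in enumerate(replicas)
                 if r.healthy and i != failed_index)
    replicas[failed_index].state = copy.deepcopy(donor.state)
    replicas[failed_index].healthy = True
```

The point of the sketch is the recovery path, not the transport: healthy peers already hold the state, so recovery cost no longer scales with checkpoint size or storage latency.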
## Elastic Training and Automated Cluster Scaling
* Allows AI workloads to automatically expand to use idle cluster capacity as it becomes available and contract when resources are needed for higher-priority tasks.
* Reduces the need for manual intervention, saving hours of engineering time previously spent reconfiguring training jobs to match fluctuating compute availability.
* Optimizes total cost of ownership by ensuring that training momentum continues even as inference volumes peak and pull resources away from the training pool.
* Orchestrates these transitions seamlessly through the HyperPod training operator, ensuring that model development is not disrupted by infrastructure changes.
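The expand/contract decision described above reduces to a small resizing rule. This is a generic sketch of the idea, not the HyperPod training operator's actual logic:

```python
def target_world_size(current: int, idle_nodes: int,
                      reclaim_request: int, min_nodes: int = 1) -> int:
    """Next training job size after a capacity change.

    Expand into idle cluster capacity; contract when higher-priority work
    (e.g. peak inference) reclaims nodes; never drop below the minimum
    viable job size.
    """
    return max(current + idle_nodes - reclaim_request, min_nodes)
```

The orchestrator's job is then to apply such transitions without losing training progress, which is where the checkpointless state recovery comes in.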
For teams managing large-scale AI workloads, adopting these features can reclaim significant development time and lower operational costs by preventing idle cluster periods. Organizations scaling to thousands of accelerators should prioritize checkpointless training to mitigate the impact of hardware faults and maintain continuous training momentum.
AWS has expanded its flexible pricing model to include managed database services with the launch of Database Savings Plans, offering up to 35% cost reduction for consistent usage. By committing to a specific hourly spend over a one-year term, customers can maintain cost efficiency across multiple accounts, resource types, and AWS Regions. This initiative simplifies financial management for organizations running diverse data-driven and AI applications while providing the agility to modernize architectures without losing discounted rates.
### Flexibility and Modernization Support
* The plan allows customers to switch between different database engines and deployment types, such as moving from provisioned instances to serverless options, without affecting their savings.
* Usage is portable across AWS Regions, enabling global organizations to shift workloads as business needs evolve while retaining their commitment benefits.
* The model supports ongoing cost optimization by automatically applying discounts to new instance types, sizes, or eligible database offerings as they become available.
### Service Coverage and Tiered Discounts
* Database Savings Plans cover a wide array of services, including Amazon Aurora, RDS, DynamoDB, ElastiCache, DocumentDB, Neptune, Keyspaces, Timestream, and AWS DMS.
* Serverless deployments offer the most significant savings, providing up to 35% off standard on-demand rates.
* Provisioned instances across supported services deliver discounts of up to 20%.
* Specific workloads for Amazon DynamoDB and Amazon Keyspaces receive tailored rates, with up to 18% savings for on-demand throughput and up to 12% for provisioned capacity.
### Implementation and Cost Management
* Customers can purchase and manage these plans through the AWS Billing and Cost Management Console or via the AWS CLI.
* Discounts are applied automatically on an hourly basis to all eligible usage; any consumption exceeding the hourly commitment is billed at the standard on-demand rate.
* Integrated cost management tools allow users to analyze their coverage and utilization, ensuring spend remains predictable even as application usage patterns fluctuate.
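The hourly mechanics above can be sketched with a simplified billing model: the committed spend covers usage at the discounted rate, and anything beyond that is billed on demand. This is an illustration of the general Savings Plans mechanism, not AWS's exact billing logic:

```python
def hourly_charge(usage_on_demand: float, commitment: float,
                  discount: float) -> float:
    """Charge for one hour under a savings plan.

    `usage_on_demand` is the hour's usage valued at on-demand rates;
    `discount` is the plan's rate reduction (e.g. 0.35 for 35%).
    The commitment is paid whether or not it is fully used.
    """
    discounted_value = usage_on_demand * (1 - discount)
    if discounted_value <= commitment:
        # All usage fits under the commitment; the commitment is still owed.
        return commitment
    # The commitment covers commitment / (1 - discount) worth of
    # on-demand usage; the rest is billed at on-demand rates.
    covered_on_demand = commitment / (1 - discount)
    return commitment + (usage_on_demand - covered_on_demand)
```

For example, a $10/hour commitment at a 35% discount covers about $15.38 of on-demand-valued usage each hour before overage billing begins, which is why sizing the commitment from historical usage in Cost Explorer matters.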
For organizations with stable or growing database requirements, Database Savings Plans offer a low-risk path to reducing operational expenses. Customers should utilize the AWS Cost Explorer to analyze their historical usage and determine an appropriate hourly commitment level to maximize their return on investment over a one-year term.