aws

27 posts

toss

From Perimeter Security to Zero (opens in new tab)

Toss Payments transformed its security infrastructure from a vulnerable, single-layered legacy system into a robust "Defense in Depth" architecture spanning hybrid IDC and AWS environments. By integrating advanced perimeter defense, internal server monitoring, and container runtime security, the team established a comprehensive framework that prioritizes visibility and continuous verification. This four-year journey demonstrates that modern security requires moving beyond simple boundary protection toward a proactive, multi-layered strategy that assumes breaches can occur. ### Perimeter Defense and SSL/TLS Visibility * Addressed the critical visibility gap in legacy systems by implementing dedicated SSL/TLS decryption tools, allowing the team to analyze encrypted traffic for hidden malicious payloads. * Established a hybrid security architecture using a combination of physical DDoS protection, IPS, and WAF in IDC environments, complemented by AWS WAF and AI-based GuardDuty in the cloud. * Developed a collaborative merchant response process that moves beyond simple IP blocking; the system automatically detects malicious traffic from partners and provides them with detailed vulnerability reports and remediation guides (e.g., specific SQL injection points). ### Internal Network Security and "Assume Breach" Monitoring * Implemented **Wazuh**, an open-source security platform, in IDC environments to monitor lateral movement, collect centralized logs, and perform file integrity checks across diverse operating systems. * Leveraged **AWS GuardDuty** for intelligent threat detection in the cloud, focusing on malware scanning for EC2 instances and monitoring for suspicious process activities. * Established automated detection for privilege escalation and unauthorized access to sensitive system files, such as tracking instances where root privileges are obtained to modify the `/etc/passwd` file. ### Container Runtime Security as the Final Defense * Adopted **Falco**, a CNCF-hosted runtime security tool, to protect Kubernetes environments by monitoring system calls (syscalls) in real-time. * Configured specific security rules to detect "container escape" attempts, unauthorized access to sensitive files like `/etc/shadow`, and the execution of new or suspicious binaries within running containers. * Integrated **Falco Sidekick** to manage security events efficiently, ensuring that anomalous behaviors at the container level are instantly routed to the security team for response. ### Zero Trust and Continuous Verification * Shifted toward a Zero Trust model for the internal work network to ensure that all users and devices are continuously verified regardless of their location. * Focused on implementing dynamic access control and the principle of least privilege to minimize the potential impact of credential theft or device compromise. Organizations operating in hybrid cloud environments should move away from relying on a single perimeter and instead adopt a multi-layered defense strategy. True security resilience is achieved by gaining deep visibility into encrypted traffic and maintaining granular monitoring at the server and container levels to intercept threats that inevitably bypass initial defenses.

aws

AWS Weekly Roundup: Kiro CLI latest features, AWS European Sovereign Cloud, EC2 X8i instances, and more (January 19, 2026) (opens in new tab)

The January 19, 2026, AWS Weekly Roundup highlights significant advancements in sovereign cloud infrastructure and the general availability of high-performance, memory-optimized compute instances. The update also emphasizes the maturing ecosystem of AI agents, focusing on enhanced developer tooling and streamlined deployment workflows for agentic applications. These releases collectively aim to satisfy stringent regulatory requirements in Europe while pushing the boundaries of enterprise performance and automated productivity. ## Developer Tooling and Kiro CLI Enhancements * New granular controls for web fetch URLs allow developers to use allowlists and blocklists to strictly govern which external resources an agent can access. * The update introduces custom keyboard shortcuts to facilitate seamless switching between multiple specialized agents within a single session. * Enhanced diff views provide clearer visibility into changes, improving the debugging and auditing process for automated workflows. ## AWS European Sovereign Cloud General Availability * Following its initial 2023 announcement, this independent cloud infrastructure is now generally available to all customers. * The environment is purpose-built to meet the most rigorous sovereignty and data residency requirements for European organizations. * It offers a comprehensive set of AWS services within a framework that ensures operational independence and localized data handling. ## High-Performance Computing with EC2 X8i Instances * The memory-optimized X8i instances, powered by custom Intel Xeon 6 processors, have moved from preview to general availability. * These instances feature a sustained all-core turbo frequency of 3.9 GHz, which is currently exclusive to the AWS platform. * The hardware is SAP certified and engineered to provide the highest memory bandwidth and performance for memory-intensive enterprise workloads compared to other Intel-based cloud offerings. ## Agentic AI and Productivity Updates * Amazon Quick Suite continues to expand as a workplace "agentic teammate," designed to synthesize research and execute actions based on organizational insights. * New technical guidance has been released regarding the deployment of AI agents on Amazon Bedrock AgentCore. * The integration of GitHub Actions is now supported to automate the deployment and lifecycle management of these AI agents, bridging the gap between traditional DevOps and agentic AI development. These updates signal a strategic shift toward highly specialized infrastructure, both in terms of regulatory compliance with the Sovereign Cloud and raw performance with the X8i instances. Organizations looking to scale their AI operations should prioritize the new deployment patterns for Bedrock AgentCore to ensure a robust CI/CD pipeline for their autonomous agents.

toss

Managing Thousands of API/ (opens in new tab)

Toss Payments manages thousands of API and batch server configurations that handle trillions of won in transactions, where a single typo in a JVM setting can lead to massive financial infrastructure failure. To solve the risks associated with manual "copy-paste" workflows and configuration duplication, the team developed a sophisticated system that treats configuration as code. By implementing layered architectures and dynamic templates, they created a testable, unified environment capable of managing complex hybrid cloud setups with minimal human error. ## Overlay Architecture for Hierarchical Control * The team implemented a layered configuration system consisting of `global`, `cluster`, `phase`, and `application` levels. * Settings are resolved by priority, where lower-level layers override higher-level defaults, allowing servers to inherit common settings while maintaining specific overrides. * This structure allows the team to control environment-specific behaviors, such as disabling canary deployments in development environments, from a single centralized directory. * The directory structure maps files 1:1 to their respective layers, ensuring that naming conventions drive the CI/CD application process. ## Solving Duplication with Template Patterns * Standard YAML overlays often fail when dealing with long strings or arrays, such as `JVM_OPTION`, because changing a single value usually requires redefining the entire block. * To prevent the proliferation of nearly identical environment variables, the team introduced a template pattern using placeholders like `{{MAX_HEAP}}`. * Developers can modify specific parameters at the application layer while the core string remains defined at the global layer, significantly reducing the risk of typos. * This approach ensures that critical settings, like G1GC parameters or heap region sizes, remain consistent across the infrastructure unless explicitly changed. ## Dynamic and Conditional Configuration Logic * The system allows for "evolutionary" configurations where Python scripts can be injected to generate dynamic values, such as random JMX ports or data fetched from remote APIs. * Advanced conditional logic was added to handle complex deployment scenarios, enabling environment variables to change their values automatically based on the target cluster name (e.g., different profiles for AWS vs. IDC). * By treating configuration as a living codebase, the team can adapt to new infrastructure requirements without abandoning their core architectural principles. ## Reliable Batch Processing through Simplicity * For batch operations handling massive settlement volumes, the team prioritized "appropriate technology" and simplicity to minimize failure points. * They chose Jenkins for its low learning curve and reliability, despite its lack of native GitOps support. * To address inconsistencies in manual UI entries and varying Java versions across machines, they standardized the batch infrastructure to ensure that high-stakes financial calculations are executed in a controlled, predictable environment. The most effective way to manage large-scale infrastructure is to transition from static, duplicated configuration files to a dynamic, code-centric system. By combining an overlay architecture for hierarchy and a template pattern for granular changes, organizations can achieve the flexibility needed for hybrid clouds while maintaining the strict safety standards required for financial systems.