google

Dynamic surface codes open new avenues for quantum error correction

Google Research has demonstrated the operation of dynamic surface codes for quantum error correction, marking a significant shift from traditional static circuit architectures. By alternating between different circuit constructions and re-tiling "detecting regions" in each cycle, these dynamic circuits offer greater flexibility to avoid hardware defects and suppress correlated errors. Experimental results on the Willow processor show that these methods can match the performance of static codes while significantly simplifying the physical design and fabrication of quantum chips.

## Error Triangulation via Dynamic Detecting Regions

Quantum error correction (QEC) functions by localizing physical errors within specific "detecting regions" over multiple cycles to prevent them from affecting logical information. While standard surface codes use a static, square tiling for these regions, dynamic codes periodically change the tiling pattern.

* Dynamic circuits allow the system to "deform" the detecting regions in spacetime, providing multiple perspectives to triangulate errors.
* This approach enables the use of different gate types and connectivity layouts that are not possible with fixed, repetitive cycles.
* The flexibility of dynamic re-tiling allows the system to sidestep common superconducting qubit issues such as "dropouts" (failed qubits or couplers) and leakage out of the computational subspace.

## Quantum Error Correction on Hexagonal Lattices

Traditional square lattices require each physical qubit to connect to four neighbors, which creates significant overhead in wiring and coupler density. Dynamic circuits enable the use of a hexagonal lattice, where each qubit only requires three couplers.

* The hexagonal code alternates between two distinct cycle types, utilizing one of the three couplers twice per cycle to maintain error detection capabilities.
* Testing on the Willow processor showed that scaling the hexagonal code from distance 3 to 5 improved the logical error rate by a factor of 2.15, matching the performance of standard static circuits.
* Reducing coupler density simplifies the optimization of qubit and gate frequencies, leading to a 15% improvement in simulated error suppression compared to four-coupler designs.

## Walking Circuits to Mitigate Leakage

Superconducting qubits are prone to "leakage," where a qubit exits its intended computational states (0 and 1) into a higher energy state (2). In static circuits, repeated measurements on the same physical qubits can cause these leakage errors to accumulate and spread.

* "Walking" circuits solve this by shifting the roles of data and measurement qubits across the lattice in each cycle.
* By constantly moving the location where errors are measured, the circuit effectively "flushes out" leakage and other correlated errors before they can damage logical information.
* Experiments confirmed that walking circuits achieve error suppression equivalent to static circuits while offering a more robust defense against long-term error correlations.

## Flexibility with iSWAP Entangling Gates

Most superconducting quantum processors are optimized for Controlled-Z (CZ) gates, but dynamic circuits show that QEC can be effectively implemented using alternative gates like iSWAP.

* The research team demonstrated a dynamic surface code that utilizes iSWAP gates, which are native to many quantum hardware architectures.
* This flexibility ensures that QEC is not tethered to a specific gate set, allowing hardware designers to choose entangling operations that offer the highest physical fidelity for their specific device.

The move toward dynamic surface codes suggests a future where quantum processors are more resilient to manufacturing imperfections. By adopting hexagonal layouts and walking circuits, developers can reduce hardware complexity and mitigate physical noise, providing a more scalable path toward fault-tolerant quantum computing.
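
To make the distance-3 to distance-5 figure for the hexagonal code concrete, the sketch below treats the reported factor of 2.15 as an error suppression factor. It assumes the commonly used scaling model in which the logical error rate is proportional to `lam ** (-(d + 1) / 2)`, so each step of two in code distance divides the error rate by `lam`. The function name and the starting error rate `EPS_D3` are hypothetical placeholders, not values from the experiment, and extrapolating beyond distance 5 is an assumption rather than a measured result.

```python
def project_logical_error(eps_d3: float, lam: float, distance: int) -> float:
    """Extrapolate a logical error rate from distance 3 to a larger odd distance.

    Assumes eps_d is proportional to lam ** (-(d + 1) / 2), so every increase
    of two in distance divides the error rate by lam.
    """
    if distance < 3 or distance % 2 == 0:
        raise ValueError("distance must be an odd integer >= 3")
    steps = (distance - 3) // 2  # number of distance-2 increments above d = 3
    return eps_d3 / lam ** steps


LAMBDA_HEX = 2.15  # suppression factor quoted for the hexagonal code
EPS_D3 = 1e-2      # hypothetical distance-3 error rate, for illustration only

for d in (3, 5, 7, 9):
    print(d, project_logical_error(EPS_D3, LAMBDA_HEX, d))
```

Under this model, the practical value of a suppression factor above 1 is that every additional step in distance compounds the improvement, which is why matching the static code's factor matters more than the absolute error rate at any one distance.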

google

Hard-braking events as indicators of road segment crash risk

Google Research has established a statistically significant correlation between hard-braking events (HBEs) collected via Android Auto and actual road crash rates. By utilizing HBEs as a "leading" indicator rather than relying on sparse, lagging historical crash data, researchers can proactively identify high-risk road segments with much greater speed and spatial granularity. This validation suggests that connected vehicle data can serve as a scalable proxy for traditional safety assessments.

### Data Density and Scalability

* HBEs, defined as forward deceleration stronger than 3 m/s² (an acceleration below -3 m/s²), provide a signal that is 18 times denser than reported crash data.
* While crashes are statistically rare and can take years to provide a valid safety profile for a specific road segment, HBEs offer a continuous stream of information.
* This high density allows for the creation of a comprehensive "safety map" that includes local and arterial roads where crash reporting is often inconsistent or sparse.

### Statistical Validation of HBEs

* Researchers employed negative binomial regression models to analyze 10 years of public crash data from California and Virginia alongside anonymized HBE data.
* The models controlled for confounding factors such as traffic volume, segment length, road type (local, arterial, highway), and infrastructure dynamics like slope and lane changes.
* The results confirmed a consistent positive association between HBE frequency and crash rates across all road types, supporting HBEs as a surrogate for crash risk in both states studied.

### High-Risk Identification Case Study

* An analysis of a freeway merge connecting Highway 101 and Highway 880 in California served as a practical validation of the metric.
* This specific segment was found to have an HBE rate 70 times higher than the state average, correlating with a historical record of one crash every six weeks.
* The HBE signal successfully flagged this location as being in the top 1% of high-risk segments without needing years of collision reports to confirm the danger, demonstrating its utility in identifying "black spots" early.

### Real-World Application and Road Management

* Validating HBEs transforms raw sensor data into a trusted tool for urban planners and road authorities to perform network-wide safety assessments.
* This approach allows for proactive infrastructure interventions, such as adjusting signage or merge patterns, before fatalities or injuries occur.
* The findings support the integration of connected vehicle insights into platforms like Google Maps to help authorities manage road safety more dynamically.
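
The 3 m/s² threshold used to define an HBE can be illustrated with a small detector over timestamped speed readings. This is a minimal sketch, not the pipeline described in the research: the `SpeedSample` type, the function name, and the use of simple consecutive-sample differences are all assumptions, and a production system would likely smooth sensor noise and merge adjacent flagged samples into a single event.

```python
from dataclasses import dataclass

HBE_THRESHOLD_MPS2 = 3.0  # deceleration threshold taken from the definition above


@dataclass(frozen=True)
class SpeedSample:
    """Hypothetical input record: one speed reading at a point in time."""
    t: float      # timestamp in seconds
    speed: float  # forward speed in metres per second


def hard_braking_events(samples, threshold=HBE_THRESHOLD_MPS2):
    """Return timestamps where deceleration since the previous sample exceeds threshold."""
    events = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order readings
        decel = (prev.speed - cur.speed) / dt  # positive when slowing down
        if decel > threshold:
            events.append(cur.t)
    return events


trace = [SpeedSample(0, 20), SpeedSample(1, 19), SpeedSample(2, 15), SpeedSample(3, 15)]
print(hard_braking_events(trace))
```

Counting such events per road segment, then normalizing by traffic volume and segment length, would give the kind of rate that the regression models compare against crash counts.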

google

NeuralGCM harnesses AI to better simulate long-range global precipitation

NeuralGCM represents a significant evolution in atmospheric modeling by combining traditional fluid dynamics with neural networks to solve the long-standing challenge of simulating global precipitation. By training the AI component directly on high-quality NASA satellite observations rather than biased reanalysis data, the model achieves unprecedented accuracy in predicting daily weather cycles and extreme rainfall events. This hybrid approach offers a faster, more precise tool for both medium-range weather forecasting and multi-decadal climate projections.

## The Limitations of Cloud Parameterization

* Precipitation is driven by cloud processes occurring at scales as small as 100 meters, which is far below the kilometer-scale resolution of global weather models.
* Traditional models rely on "parameterizations," or mathematical approximations, to estimate how these small-scale events affect the larger atmosphere.
* Because these approximations are often simplified, traditional models struggle to accurately capture the complexity of water droplet formation and ice crystal growth, leading to errors in long-term forecasts.

## Training on Direct Satellite Observations

* Unlike previous AI models trained on "reanalyses" (essentially simulations used to fill observational gaps), NeuralGCM is trained on NASA satellite-based precipitation data spanning 2001 to 2018.
* The model utilizes a differentiable dynamical core, an architecture that allows the neural network to learn the effects of small-scale events directly from physical observations.
* By bypassing the weaknesses inherent in reanalysis data, the model effectively creates a machine-learned parameterization that is more faithful to real-world cloud physics.

## Performance in Weather and Climate Benchmarks

* At a resolution of 280 km, NeuralGCM outperforms leading operational models in medium-range forecasts (up to 15 days) and matches the precision of sophisticated multi-decadal climate models.
* The model shows a marked improvement in capturing precipitation extremes, particularly for the top 0.1% of rainfall events.
* Evaluation through WeatherBench 2 demonstrates that NeuralGCM accurately reproduces the diurnal (daily) weather cycle, a metric where traditional physics-based models frequently fall short.

NeuralGCM provides a highly efficient and accessible framework for researchers and city planners who need to simulate long-range climate scenarios, such as 100-year storms or seasonal agricultural cycles. Its ability to maintain physical consistency while leveraging the speed of AI makes it a powerful candidate for the next generation of global atmospheric modeling.

toss

Managing Thousands of API/

Toss Payments manages thousands of API and batch server configurations that handle trillions of won in transactions, where a single typo in a JVM setting can lead to massive financial infrastructure failure. To solve the risks associated with manual "copy-paste" workflows and configuration duplication, the team developed a sophisticated system that treats configuration as code. By implementing layered architectures and dynamic templates, they created a testable, unified environment capable of managing complex hybrid cloud setups with minimal human error.

## Overlay Architecture for Hierarchical Control

* The team implemented a layered configuration system consisting of `global`, `cluster`, `phase`, and `application` levels.
* Settings are resolved by priority, where lower-level layers override higher-level defaults, allowing servers to inherit common settings while maintaining specific overrides.
* This structure allows the team to control environment-specific behaviors, such as disabling canary deployments in development environments, from a single centralized directory.
* The directory structure maps files 1:1 to their respective layers, ensuring that naming conventions drive the CI/CD application process.

## Solving Duplication with Template Patterns

* Standard YAML overlays often fail when dealing with long strings or arrays, such as `JVM_OPTION`, because changing a single value usually requires redefining the entire block.
* To prevent the proliferation of nearly identical environment variables, the team introduced a template pattern using placeholders like `{{MAX_HEAP}}`.
* Developers can modify specific parameters at the application layer while the core string remains defined at the global layer, significantly reducing the risk of typos.
* This approach ensures that critical settings, like G1GC parameters or heap region sizes, remain consistent across the infrastructure unless explicitly changed.

## Dynamic and Conditional Configuration Logic

* The system allows for "evolutionary" configurations where Python scripts can be injected to generate dynamic values, such as random JMX ports or data fetched from remote APIs.
* Advanced conditional logic was added to handle complex deployment scenarios, enabling environment variables to change their values automatically based on the target cluster name (e.g., different profiles for AWS vs. IDC).
* By treating configuration as a living codebase, the team can adapt to new infrastructure requirements without abandoning their core architectural principles.

## Reliable Batch Processing through Simplicity

* For batch operations handling massive settlement volumes, the team prioritized "appropriate technology" and simplicity to minimize failure points.
* They chose Jenkins for its low learning curve and reliability, despite its lack of native GitOps support.
* To address inconsistencies in manual UI entries and varying Java versions across machines, they standardized the batch infrastructure to ensure that high-stakes financial calculations are executed in a controlled, predictable environment.

The most effective way to manage large-scale infrastructure is to transition from static, duplicated configuration files to a dynamic, code-centric system. By combining an overlay architecture for hierarchy and a template pattern for granular changes, organizations can achieve the flexibility needed for hybrid clouds while maintaining the strict safety standards required for financial systems.
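
The overlay and template ideas described above combine naturally, and a short sketch may help show how. The code below is an illustration under stated assumptions, not Toss's implementation: the functions `resolve` and `render`, the in-memory dictionary layout, and the sample values are hypothetical, while the layer names, `JVM_OPTION`, and the `{{MAX_HEAP}}` placeholder come from the summary.

```python
import re

# Layers ordered from lowest to highest priority, as named in the summary.
LAYER_ORDER = ["global", "cluster", "phase", "application"]

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def resolve(layers: dict) -> dict:
    """Merge layer dictionaries so that more specific layers override defaults."""
    merged = {}
    for name in LAYER_ORDER:
        merged.update(layers.get(name, {}))
    return merged


def render(template: str, values: dict) -> str:
    """Substitute {{NAME}} placeholders, failing loudly on a missing value."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key}")
        return str(values[key])
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical sample data: the long string lives once at the global layer,
# and the application layer overrides only the heap size.
layers = {
    "global": {"JVM_OPTION": "-Xmx{{MAX_HEAP}} -XX:+UseG1GC", "MAX_HEAP": "2g"},
    "application": {"MAX_HEAP": "4g"},
}
config = resolve(layers)
print(render(config["JVM_OPTION"], config))  # -Xmx4g -XX:+UseG1GC
```

Raising on an unknown placeholder, rather than leaving it in the string, is one way to turn a typo into a failed build instead of a misconfigured server.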

daangn

Karrot’s User Behavior

Daangn transitioned its user behavior log management from a manual, code-based Git workflow to a centralized UI platform called Event Center to improve data consistency and operational efficiency. By automating schema creation and enforcing standardized naming conventions, the platform reduced the technical barriers for developers and analysts while ensuring high data quality for downstream analysis. This transition has streamlined the entire data lifecycle, from collection in the mobile app to structured storage in BigQuery.

### Challenges of Code-Based Schema Management

Prior to Event Center, Daangn managed its event schemas (definitions that describe the ownership, domain, and custom parameters of a log) using Git and manual JSON files. This approach created several bottlenecks for the engineering team:

* **High Entry Barrier**: Users were required to write complex Spark `StructType` JSON files, which involved managing nested structures and specific metadata fields like `nullable` and `type`.
* **Inconsistent Naming**: Without a central enforcement mechanism, event names followed different patterns (e.g., `item_click` vs. `click_item`), making it difficult for analysts to discover relevant data.
* **Operational Friction**: Every schema change required a Pull Request (PR), manual review by the data team, and a series of CI checks, leading to slow iteration cycles and frequent communication overhead.

### The User Behavior Log Pipeline

To support data-driven decision-making, Daangn employs a robust pipeline that processes millions of events daily through several critical stages:

* **Collection and Validation**: Events are sent from the mobile SDK to an event server, which performs initial validation before passing data to GCP Pub/Sub.
* **Streaming Processing**: GCP Dataflow handles real-time deduplication, field validation, and data transformation (flattening) to prepare logs for storage.
* **Storage and Accessibility**: Data is stored in Google Cloud Storage and BigQuery, where custom parameters defined in the schema are automatically expanded into searchable columns, removing the need for complex JSON parsing in SQL.

### Standardizing Discovery via Event Center

The Event Center platform was designed to transform log management into a user-friendly, UI-driven experience while maintaining technical rigor.

* **Standardized Naming Conventions**: The platform enforces a strict "Action-Object-Service" naming rule, ensuring that all events are categorized logically across the entire organization.
* **Recursive Schema Builder**: To handle the complexity of nested JSON data, the team built a UI component that uses a recursive tree structure, allowing users to define deep data hierarchies without writing code.
* **Centralized Dictionary**: The platform serves as a "single source of truth" where any employee can search for events, view their descriptions, and identify the team responsible for specific data points.

### Technical Implementation and Integration

The system architecture was built to bridge the gap between a modern web UI and the existing Git-based infrastructure.

* **Tech Stack**: The backend is powered by Go (Gin framework) and PostgreSQL (GORM), while the frontend utilizes React, TypeScript, and TanStack Query for state management.
* **Automated Git Sync**: When a user saves a schema in Event Center, the system automatically triggers a GitHub Action that generates the necessary JSON files and pushes them to the repository, maintaining the codebase as the ultimate source of truth while abstracting the complexity.
* **Real-time Validation**: The UI provides immediate feedback on data types and naming errors, preventing invalid schemas from reaching the production pipeline.

Implementing a dedicated log management platform like Event Center is highly recommended for organizations scaling their data operations. Moving away from manual file management to a UI-based system not only reduces the risk of human error but also democratizes data access by allowing non-engineers to define and discover the logs they need for analysis.
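
The step from a tree-shaped schema definition to a Spark `StructType` JSON file is a natural fit for recursion, which is presumably why a recursive builder suits this problem. The sketch below is an assumption about how such a conversion could look, not Event Center's code: the compact input notation (a dict for a nested struct, a one-element list for an array, a string for a primitive type name) and both function names are hypothetical. The output layout follows the JSON form Spark uses for schemas, with `name`, `type`, `nullable`, and `metadata` on each field.

```python
import json


def to_spark_type(spec):
    """Convert a compact schema spec into Spark's JSON schema representation."""
    if isinstance(spec, dict):
        # A dict describes a nested struct: recurse into every child field.
        return {
            "type": "struct",
            "fields": [to_spark_field(name, child) for name, child in spec.items()],
        }
    if isinstance(spec, list):
        # A one-element list describes an array of that element type.
        (element,) = spec
        return {"type": "array", "elementType": to_spark_type(element), "containsNull": True}
    return spec  # a primitive type name such as "string" or "long"


def to_spark_field(name, spec, nullable=True):
    """Wrap a type in the per-field envelope that a StructType JSON file expects."""
    return {"name": name, "type": to_spark_type(spec), "nullable": nullable, "metadata": {}}


# Hypothetical custom parameters for an event.
params = {"item_id": "long", "seller": {"region": "string"}, "tags": ["string"]}
print(json.dumps(to_spark_type(params), indent=2))
```

Generating this JSON from a UI-side tree means users never have to hand-write the `nullable` and `metadata` boilerplate that made the Git workflow error-prone.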

toss

Rethinking Design Systems

Toss Design System (TDS) argues that as organizations scale, design systems often become a source of friction rather than efficiency, leading teams to bypass them through "forking" or "detaching" components. To prevent this, TDS treats the design system as a product that must adapt to user demand rather than a set of rigid constraints to be enforced. By shifting from a philosophy of control to one of flexible expansion, they ensure that the system remains a helpful tool rather than an obstacle.

### The Limits of Control and System Fragmentation

* When a design system is too rigid, product teams often fork packages to make minor adjustments, which breaks the link to central updates and creates UI inconsistencies.
* Treating "system bypasses" as user errors is ineffective; instead, they should be viewed as unmet needs in the system's "supply."
* The goal of a modern design system should be to reduce the reasons to bypass the system by providing natural extension points.

### Comparing Flat and Compound API Patterns

* **Flat Pattern:** These components hide internal structures and use props to manage variations (e.g., `title`, `description`). While easy to use, they suffer from "prop bloat" as more edge cases are added, making long-term maintenance difficult.
* **Compound Pattern:** This approach provides sub-components (e.g., `Card.Header`, `Card.Body`) for the user to assemble manually. This offers high flexibility for unexpected layouts but increases the learning curve and the amount of boilerplate code required.

### The Hybrid API Strategy

* TDS employs a hybrid approach, offering both Flat APIs for common, simple use cases and Compound APIs for complex, customized needs.
* Developers can choose a `FlatCard` for speed or a `Compound Card` when they need to inject custom elements like badges or unique button placements.
* To avoid the burden of maintaining two separate codebases, TDS uses a "primitive" layer where the Flat API is simply a pre-assembled version of the Compound components.

Design systems should function as guardrails that guide developers toward consistency, rather than fences that stop them from solving product-specific problems. By providing flexible architecture that supports exceptions, a system can maintain its relevance and ensure that teams stay within the ecosystem even as their requirements evolve.
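
The layering behind the hybrid strategy, where the Flat API is a pre-assembled version of the Compound parts, can be shown without any UI framework. The sketch below uses plain Python functions that return markup strings purely to illustrate the dependency direction. Every name in it is hypothetical, the real TDS components are not written this way, and the HTML tags are placeholders.

```python
def card_header(text: str) -> str:
    """Compound part: the header slot."""
    return f"<header>{text}</header>"


def card_body(text: str) -> str:
    """Compound part: the body slot."""
    return f"<section>{text}</section>"


def card(*children: str) -> str:
    """Compound container: callers decide which parts go inside, and in what order."""
    return "<article>" + "".join(children) + "</article>"


def flat_card(title: str, description: str) -> str:
    """Flat API: a fixed assembly of the compound parts, with no logic of its own."""
    return card(card_header(title), card_body(description))


# Common case: two props, no assembly required.
print(flat_card("Transfer", "Send money instantly"))

# Custom case: the same parts, plus an element the flat API never anticipated.
print(card(card_header("Transfer"), "<span>NEW</span>", card_body("Send money instantly")))
```

Because `flat_card` contains no rendering logic of its own, a fix to a compound part reaches both APIs at once, which is the maintenance property the summary attributes to the primitive layer.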

line

Code Quality Improvement Techniques Part

LY Corporation’s technical review highlights that making a class open for inheritance imposes a "tax" on its internal constraints, particularly immutability. While developers often use inheritance to create specialized versions of a class, doing so with immutable types can allow subclasses to inadvertently or intentionally break the parent class's guarantees. To ensure strict data integrity, the post concludes that classes intended to be immutable should be made final or designed around read-only interfaces rather than open for extension.

### The Risks of Open Immutable Classes

* Kotlin developers often wrap `IntArray` in an `ImmutableIntList` to avoid the overhead of boxed types while ensuring the collection remains unchangeable.
* If `ImmutableIntList` is marked as `open`, a developer might create a `MutableIntList` subclass that adds a `set` method to modify the internal `protected valueArray`, violating the "Immutable" contract of the parent type.
* Even if the internal state is `private`, a subclass can override the `get` method to return dynamic or state-dependent values, effectively breaking the expectation that the data remains constant.
* These issues demonstrate that any class with a "fundamental" name should be carefully guarded against unexpected inheritance in different modules or packages.

### Establishing Safe Inheritance Hierarchies

* Mutable objects should not inherit from immutable objects, as this inherently violates the immutability constraints established by the parent.
* Conversely, immutable objects should not inherit from mutable ones; this often leads to runtime errors (such as `UnsupportedOperationException`) when a user attempts to call modification methods like `add` or `set` on an immutable instance.
* The most effective design pattern is to use a "read-only" (unmodifiable) interface as a common parent, similar to how Kotlin distinguishes between `List` and `MutableList`.
* In this structure, mutable classes can inherit from the read-only parent without issue (adding new methods), and immutable classes can inherit from the read-only parent while adding stricter internal constraints.

To maintain high code quality and prevent logic errors, developers should default to making classes final when immutability is a core requirement. If shared functionality is needed across different types of lists, utilize composition or a shared read-only interface to ensure that the "immutable" label remains a truthful guarantee.
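
The `get` override problem is not specific to Kotlin, so the sketch below restates it in Python for illustration. This is a translation under assumptions rather than the post's code: the class names are hypothetical, and because Python has no `open` or `final` keyword enforced at runtime, `__init_subclass__` is used here as a stand-in for a final class.

```python
from typing import Sequence


class OpenImmutableIntList:
    """Immutable by intent, but nothing stops a subclass from changing behavior."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = tuple(values)

    def get(self, index: int) -> int:
        return self._values[index]


class DriftingIntList(OpenImmutableIntList):
    """Never touches the stored values, yet get() no longer returns a constant."""

    def __init__(self, values: Sequence[int]) -> None:
        super().__init__(values)
        self.calls = 0

    def get(self, index: int) -> int:
        self.calls += 1  # state-dependent result breaks the parent's promise
        return super().get(index) + self.calls


class FinalImmutableIntList:
    """Same storage, but any attempt to subclass fails at class-definition time."""

    def __init__(self, values: Sequence[int]) -> None:
        self._values = tuple(values)

    def __init_subclass__(cls, **kwargs) -> None:
        raise TypeError("FinalImmutableIntList cannot be subclassed")

    def get(self, index: int) -> int:
        return self._values[index]


drifting = DriftingIntList([10])
print(drifting.get(0), drifting.get(0))  # two reads of one index, two different answers
```

Any function that accepts `OpenImmutableIntList` can be handed a `DriftingIntList`, so the type name alone cannot guarantee constant data; closing the class is what makes the name trustworthy.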

datadog

Hardening eBPF for runtime security: Lessons from Datadog Workload Protection | Datadog

Scaling real-time file monitoring across high-traffic environments requires a strategy to process billions of kernel events without exhausting system resources. By leveraging eBPF, organizations can move filtering logic directly into the Linux kernel, drastically reducing the overhead associated with traditional userspace monitoring tools. This approach enables precise observability of file system activity while maintaining the performance necessary for large-scale production workloads.

### Limitations of Traditional Monitoring Tools

* Conventional tools like `auditd` often struggle with performance bottlenecks because they require every event to be copied from the kernel to userspace for evaluation.
* Standard APIs like `fanotify` and `inotify` lack the granularity needed for complex filtering, often resulting in "event storms" during high I/O operations.
* The high frequency of context switching between kernel and userspace when processing billions of events per minute can lead to significant CPU spikes and system instability.

### Architecture of eBPF-Based File Monitoring

* The system hooks into the Virtual File System (VFS) layer using `kprobes` and `tracepoints` to capture actions such as `vfs_read`, `vfs_write`, and `vfs_open`.
* LSM (Linux Security Module) hooks are utilized for security-focused monitoring, providing a stable interface that is less prone to kernel version changes than raw kprobes.
* By executing C-like code within the kernel’s sandboxed environment, the system can inspect file paths and process IDs (PIDs) instantly upon event creation.

### In-Kernel Filtering and Data Management

* High-performance eBPF maps, specifically `BPF_MAP_TYPE_HASH` and `BPF_MAP_TYPE_LPM_TRIE`, are used to store allowlists and denylists for specific directories and file extensions.
* The system implements prefix matching to ignore high-volume, low-value paths like `/proc`, `/sys`, or temporary build directories, discarding these events before they ever leave the kernel.
* To minimize memory contention, per-CPU maps are employed, allowing the eBPF programs to aggregate data locally on each core without the need for expensive global locks.

### Efficient Data Transmission with Ring Buffers

* The implementation utilizes the BPF ring buffer (`BPF_MAP_TYPE_RINGBUF`) rather than the older perf event array (`BPF_MAP_TYPE_PERF_EVENT_ARRAY`) to handle data transfer to userspace.
* Ring buffers provide a shared memory space between the kernel and userspace, offering better memory efficiency and guaranteeing event ordering.
* By only pushing "filtered" events, which represent a tiny fraction of the billions of raw kernel events, the system prevents userspace consumers from becoming overwhelmed.

For organizations operating at massive scale, moving from reactive userspace logging to proactive kernel-level filtering is essential. Implementing an eBPF-based monitoring stack allows for deep visibility into file system changes with minimal performance impact, making it the recommended standard for modern, high-throughput cloud environments.
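
The prefix-matching decision described under in-kernel filtering can be modeled in a few lines. The sketch below is a userspace simulation of the filtering logic only; it is not eBPF code and does not reflect how Datadog encodes keys for `BPF_MAP_TYPE_LPM_TRIE`. The function names, the event dictionary shape, and the `/tmp/build` entry are hypothetical, while `/proc` and `/sys` come from the summary.

```python
# Paths whose events should be dropped before reaching the consumer.
IGNORED_PREFIXES = ("/proc", "/sys", "/tmp/build")  # "/tmp/build" is a made-up example


def is_ignored(path: str, prefixes=IGNORED_PREFIXES) -> bool:
    """Return True if path equals an ignored prefix or lies beneath it.

    Matching is done on whole path components, so "/process" is not
    treated as being under "/proc".
    """
    return any(path == p or path.startswith(p + "/") for p in prefixes)


def filter_events(events):
    """Yield only the events worth forwarding, mimicking an early in-kernel discard."""
    for event in events:
        if not is_ignored(event["path"]):
            yield event


raw = [
    {"pid": 1, "path": "/proc/1/status"},
    {"pid": 2, "path": "/etc/passwd"},
    {"pid": 3, "path": "/sys/kernel/debug"},
    {"pid": 4, "path": "/process/data"},
]
print([e["path"] for e in filter_events(raw)])  # ['/etc/passwd', '/process/data']
```

The benefit in the real system comes from where this check runs: performing it at the hook means discarded events are never copied to userspace at all, whereas this model only reproduces which events survive.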

aws

Happy New Year! AWS Weekly Roundup: 10,000 AIdeas Competition, Amazon EC2, Amazon ECS Managed Instances and more (January 5, 2026)

The first AWS Weekly Roundup of 2026 highlights a strategic focus on community-driven AI innovation and significant performance upgrades to the EC2 instance lineup. By combining high-stakes competitions like the 10,000 AIdeas challenge with technical releases such as Graviton4-powered instances, AWS is positioning itself to lead in both "Agentic AI" development and high-performance cloud infrastructure.

**AI Innovation and Professional Mentorship**

* The "Become a Solutions Architect" (BeSA) program is launching a new six-week cohort on February 21, 2026, specifically focused on Agentic AI on AWS.
* The Global 10,000 AIdeas Competition offers a $250,000 prize pool and recognition at re:Invent 2026, with a submission deadline of January 21, 2026.
* Competition participants are required to utilize the "Kiro" development tool and must ensure their applications remain within AWS Free Tier limits.

**Next-Generation EC2 Instances and Hardware**

* New M8gn and M8gb instances utilize AWS Graviton4 processors, providing a 30% compute performance boost over the previous Graviton3 generation.
* The M8gn variant features 6th generation AWS Nitro Cards, delivering up to 600 Gbps of network bandwidth, the highest available for network-optimized instances.
* The M8gb variant is optimized for storage-heavy workloads, offering up to 150 Gbps of dedicated Amazon EBS bandwidth.

**Resilience Testing and Governance**

* AWS Direct Connect now integrates with the AWS Fault Injection Service (FIS), allowing engineers to simulate Border Gateway Protocol (BGP) failovers to validate redundant pathing.
* AWS Control Tower has expanded its governance capabilities by supporting 176 additional Security Hub controls within the Control Catalog.
* These controls address a broad spectrum of requirements across security, cost optimization, operations, and data durability.

**Hybrid Cloud and Windows Support**

* Amazon ECS Managed Instances now support Windows Server for on-premises and remote environment management.
* The service uses AWS Systems Manager (SSM) to register external instances, which can then be managed as part of an ECS cluster using Windows-based ECS-optimized AMIs.

Developers and infrastructure architects should prioritize the January 21 deadline for AI project submissions while evaluating the M8gn instances for high-throughput networking requirements. Additionally, organizations running hybrid Windows workloads should explore the new ECS Managed Instances support to unify their container orchestration across on-premises and cloud environments.