Cloudflare Email Service: now in public beta. Ready for your agents 2026-04-16 Thomas Gauvin Eric Falcão Email is the most accessible interface in the world. It is ubiquitous. There’s no need for a custom chat application, no custom SDK for each channel. Everyone already has an…
Expanded capabilities bring enterprise knowledge, visual creation, and interactive learning directly into user workflows Box: Connects document repositories directly into Go, allowing users to create new Box documents in the right folders and pull information from existing Box f…
Cloudflare outage on February 20, 2026 2026-02-21 David Tuber Dzevad Trumic On February 20, 2026, at 17:48 UTC, Cloudflare experienced a service outage when a subset of customers who use Cloudflare’s Bring Your Own IP (BYOIP) service saw their routes to the Internet withdrawn vi…
Pierre Gimalac Over the past few years, the Datadog Agent's artifact size has grown significantly, from 428 MiB in version 7.16.0 to a peak of 1.22 GiB in version 7.60.0 on Linux. That growth reflected years of new capabilities, broader integrations, and support for more environ…
Inside the feature store powering real-time AI in Dropbox Dash Dropbox Dash uses AI to understand questions about your files, work chats, and company content, bringing everything together in one place for deeper, more focused work. With tens of thousands of potential work docume…
Yevgeniy Miretskiy Sesh Nalla Arun Parthiban Alp Keles At Datadog, cost-aware engineering is more than a principle; it’s a performance challenge at scale. We've published how we saved $17 million by rethinking our infrastructure, and we’ve built Cloud Cost Management to help cus…
Nayef Ghattas When Go 1.24 was released in early 2025, we were eager to roll it out across our services. The headline feature—the new Swiss Tables map implementation—promised reduced CPU and memory overhead. Our story begins while the new version was being rolled out internally.…
Nayef Ghattas In Part 1: How we tracked down a Go 1.24 memory regression across hundreds of pods, we shared how upgrading to Go 1.24 introduced a subtle runtime regression that increased physical memory usage (RSS) across Datadog services. We worked with the Go community to iden…
Artem Krylysov May Lee Datadog collects billions of events from millions of hosts every minute and that number keeps growing and fast. Our data volumes grew 30x between 2017 and 2022. On top of that, the kind of queries we receive from our users has changed significantly. Why? B…
Remy Mathieu Our aspiration for the Datadog Agent is for it to process the maximum amount of data, very quickly, with as low of a CPU as possible. Striking this balance between performance and efficiency is an ongoing challenge for us. We are constantly searching for ways to opt…
Datadog engineers recently optimized the Datadog Agent's metric processing pipeline to achieve higher throughput and lower CPU overhead. By identifying that metric context generation—the process of creating unique keys for metrics—was a primary bottleneck, they implemented a series of algorithmic changes and Go runtime optimizations. These improvements allow the Agent to process significantly more metrics using the same computational resources.
### Identifying Bottlenecks via CPU Profiling
* Developers utilized Go’s native profiling tools to capture CPU usage during high-volume metric ingestion via DogStatsD.
* Flamegraph analysis revealed that the `addSample` and `trackContext` functions were the most CPU-intensive components of the pipeline.
* The profiling data specifically pointed to tag sorting and deduplication as the underlying operations consuming the most processing time.
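The profiling step above can be approximated with the standard `runtime/pprof` package. This is a minimal sketch, not the Agent's actual instrumentation: `sortTags` and `profileWorkload` are hypothetical names, and the loop is a synthetic stand-in for real DogStatsD traffic.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"sort"
)

// sortTags is a hypothetical stand-in for the hot path under investigation.
// It returns a sorted copy so the caller's slice is left untouched.
func sortTags(tags []string) []string {
	out := append([]string(nil), tags...)
	sort.Strings(out)
	return out
}

// profileWorkload runs work under the CPU profiler and returns the raw
// profile bytes (the gzipped protobuf format that `go tool pprof` reads).
func profileWorkload(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	tags := []string{"env:prod", "service:web", "az:us-east-1", "env:prod"}
	prof, err := profileWorkload(func() {
		for i := 0; i < 200000; i++ {
			_ = sortTags(tags)
		}
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of CPU profile\n", len(prof))
}
```

In practice the profile would be written to a file and opened with `go tool pprof`, whose flame graph view makes functions dominated by sorting stand out.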
### The Challenges of Metric Context Generation
* The Agent must generate a hash key (the metric's context) for every metric it receives so that it can look the metric up in an in-memory hash table.
* To ensure the same metric always generates the same key, the original algorithm required sorting all tags and ensuring their uniqueness.
* The computational cost of sorting lists repeatedly for every incoming message created a performance ceiling for the entire metrics pipeline.
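A sort-then-hash key of the kind described above might look like the following sketch. The function name `contextKey` is hypothetical, and FNV-1a from the standard library stands in for the Agent's hash, since Murmur3 is not part of the Go standard library.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// contextKey sorts and deduplicates the tags, then hashes the metric name
// and each unique tag in order. Sorting makes the key independent of the
// order in which tags arrive; it is also the expensive step.
// Assumption: FNV-1a is a stand-in hash chosen only for self-containment.
func contextKey(name string, tags []string) uint64 {
	sorted := append([]string(nil), tags...) // copy: do not reorder the caller's slice
	sort.Strings(sorted)

	h := fnv.New64a()
	h.Write([]byte(name))
	prev := ""
	for i, t := range sorted {
		if i > 0 && t == prev {
			continue // duplicates are adjacent after sorting
		}
		h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
		h.Write([]byte(t))
		prev = t
	}
	return h.Sum64()
}

func main() {
	a := contextKey("http.requests", []string{"env:prod", "service:web", "env:prod"})
	b := contextKey("http.requests", []string{"service:web", "env:prod"})
	fmt.Println(a == b) // same tag set, different order and duplication
}
```

Every call pays for a sort over the tag list, which is the per-message cost the bullets above identify as the pipeline's ceiling.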
### Specialization and Runtime Optimization
* **Algorithmic Specialization:** The team implemented specialized sorting logic that adjusts based on the number of tags, optimizing the "hot path" for the most common metric structures.
* **Hashing Efficiency:** Micro-benchmarks identified Murmur3 as the most efficient hash implementation for balancing speed and collision resistance in this use case.
* **Leveraging Go Runtime:** The team transitioned from 128-bit hashes to 64-bit metric contexts. This change allowed the Agent to utilize Go's internal `mapassign_fast64` and `mapaccess2_fast64` functions, which provide optimized map operations for 64-bit keys.
### Redesigning for Performance
* The original design followed a rigid "hash metric name -> sort tags -> deduplicate tags -> iterative hash" workflow.
* Recognizing that sorting was the primary architectural bottleneck, the team moved toward a new design intended to minimize or eliminate the overhead of traditional list sorting during context generation.
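The summary above does not spell out the new design, so the following is only one possible way to avoid sorting, shown as an assumption rather than as the Agent's implementation: hash each tag on its own, skip tags whose hash has already been seen, and combine the rest with XOR, which is commutative and therefore order-independent. The names `hash64` and `contextKeyUnordered` are hypothetical, and FNV-1a again stands in for Murmur3.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash64 hashes a single string. Assumption: FNV-1a is used only to keep
// the example within the standard library.
func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// contextKeyUnordered builds a key without sorting. XOR is commutative, so
// tag order does not matter. Duplicates must be skipped explicitly because
// XOR-ing the same hash twice would cancel it out.
// Limitations of this sketch: two distinct tags with colliding hashes are
// treated as duplicates, and a tag identical to the metric name would
// cancel the name's hash.
func contextKeyUnordered(name string, tags []string) uint64 {
	key := hash64(name)
	seen := make(map[uint64]struct{}, len(tags))
	for _, t := range tags {
		h := hash64(t)
		if _, dup := seen[h]; dup {
			continue
		}
		seen[h] = struct{}{}
		key ^= h
	}
	return key
}

func main() {
	a := contextKeyUnordered("http.requests", []string{"env:prod", "service:web", "env:prod"})
	b := contextKeyUnordered("http.requests", []string{"service:web", "env:prod"})
	fmt.Println(a == b) // same tag set regardless of order or duplication
}
```

The trade-off is that the O(n log n) sort is replaced by one hash and one set lookup per tag, at the cost of weaker mixing than an ordered, iterative hash.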
To achieve similar performance gains in high-throughput Go applications, developers should profile under realistic load and look for runtime-specific optimizations, such as using 64-bit map keys so that the compiler can select the runtime's specialized map routines (`mapassign_fast64`, `mapaccess2_fast64`) instead of the generic ones.
Felix Geisendörfer Without a doubt, Go 1.18 is shaping up to be one of the most exciting releases since Go 1.0. You’ve probably heard about major features such as generics and fuzzing, but in this post, I’ll focus on profiling and highlight a few noteworthy improvements to look…
Massimiliano Pippi If you look at the new Datadog Agent, you might notice most of the codebase is written in Go, although the checks we use to gather metrics are still written in Python. This is possible because the Datadog Agent, a regular Go binary, embeds a CPython interprete…
Jason Moiron To commemorate the third annual GopherCon US in Denver this week, we're releasing cgo bindings to two compression libraries that we've been using in production at Datadog for a while now: czlib and zstd. czlib started as a fork of the vitess project's cgzip package.…