Remy Mathieu Our aspiration for the Datadog Agent is for it to process the maximum amount of data, very quickly, while using as little CPU as possible. Striking this balance between performance and efficiency is an ongoing challenge for us. We are constantly searching for ways to opt…
Datadog engineers recently optimized the Datadog Agent's metric processing pipeline to achieve higher throughput and lower CPU overhead. By identifying that metric context generation—the process of creating unique keys for metrics—was a primary bottleneck, they implemented a series of algorithmic changes and Go runtime optimizations. These improvements allow the Agent to process significantly more metrics using the same computational resources.
### Identifying Bottlenecks via CPU Profiling
* Developers utilized Go’s native profiling tools to capture CPU usage during high-volume metric ingestion via DogStatsD.
* Flamegraph analysis revealed that the `addSample` and `trackContext` functions were the most CPU-intensive components of the pipeline.
* The profiling data specifically pointed to tag sorting and deduplication as the underlying operations consuming the most processing time.
### The Challenges of Metric Context Generation
* The Agent must generate a unique hash (the context) for every metric it receives, and uses that hash as the key to look the metric up in an in-memory hash table.
* To ensure the same metric always generates the same key, the original algorithm required sorting all tags and ensuring their uniqueness.
* The computational cost of sorting lists repeatedly for every incoming message created a performance ceiling for the entire metrics pipeline.
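The sort-then-hash approach described above can be sketched as follows. This is a simplified illustration, not the Agent's code: the function name `contextKey` is hypothetical, the key is 64 bits wide for brevity, and FNV-1a from the standard library stands in for whichever hash the Agent used at the time.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// contextKey derives a key from a metric name and its tags by sorting the
// tags, skipping duplicates, and feeding everything into one hash.
// Assumption: FNV-1a is used only because it ships with the standard library.
func contextKey(name string, tags []string) uint64 {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), tags...)
	sort.Strings(sorted) // the expensive step, repeated for every sample

	h := fnv.New64a()
	h.Write([]byte(name))
	for i, tag := range sorted {
		if i > 0 && tag == sorted[i-1] {
			continue // duplicates are adjacent after sorting
		}
		h.Write([]byte{0}) // separator so "ab"+"c" differs from "a"+"bc"
		h.Write([]byte(tag))
	}
	return h.Sum64()
}
```

Because the tags are sorted and deduplicated first, `contextKey("requests", []string{"b", "a", "a"})` and `contextKey("requests", []string{"a", "b"})` produce the same key. The price is one sort per incoming sample.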
### Specialization and Runtime Optimization
* **Algorithmic Specialization:** The team implemented specialized sorting logic that adjusts based on the number of tags, optimizing the "hot path" for the most common metric structures.
* **Hashing Efficiency:** Micro-benchmarks identified Murmur3 as the most efficient hash implementation for balancing speed and collision resistance in this use case.
* **Leveraging Go Runtime:** The team transitioned from 128-bit hashes to 64-bit metric contexts. This change allowed the Agent to utilize Go's internal `mapassign_fast64` and `mapaccess2_fast64` functions, which provide optimized map operations for 64-bit keys.
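Two of these ideas can be sketched in a few lines. In the example below, `sortTags` special-cases very short tag lists before falling back to a general sort, and `contextStore` keys its map by `uint64`, which is the key shape for which the Go compiler can select the runtime's fast 64-bit map routines. All names are hypothetical, and the size thresholds are an assumption rather than the Agent's actual cut-offs.

```go
package main

import "sort"

// sortTags sorts tags in place, with cheap special cases for the shortest
// inputs. The thresholds here are illustrative assumptions.
func sortTags(tags []string) {
	switch len(tags) {
	case 0, 1:
		return // nothing to sort
	case 2:
		if tags[1] < tags[0] {
			tags[0], tags[1] = tags[1], tags[0] // one compare-and-swap
		}
	default:
		sort.Strings(tags)
	}
}

// contextStore counts samples per 64-bit context key.
type contextStore struct {
	samples map[uint64]int64
}

func newContextStore() *contextStore {
	return &contextStore{samples: make(map[uint64]int64)}
}

// record counts one sample for key and reports whether the key was already
// known. With a uint64 key type, the comma-ok lookup and the assignment are
// the kind of operations the compiler can route to the fast64 map helpers.
func (s *contextStore) record(key uint64) bool {
	_, known := s.samples[key]
	s.samples[key]++
	return known
}
```

A 128-bit key, for example a `[2]uint64` array, would still work as a map key but would go through the generic map routines instead.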
### Redesigning for Performance
* The original design followed a rigid "hash metric name -> sort tags -> deduplicate tags -> iterative hash" workflow.
* Recognizing that sorting was the primary architectural bottleneck, the team moved toward a new design intended to minimize or eliminate the overhead of traditional list sorting during context generation.
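The summary does not spell out the new design, so the following is only one possible way to avoid sorting, offered as an assumption: hash each tag independently and combine the results with XOR, which is order-independent, while a small set of already-seen tag hashes handles deduplication. The names `hashString` and `unorderedContextKey` are hypothetical, and FNV-1a again stands in for Murmur3 because it is available in the standard library.

```go
package main

import "hash/fnv"

// hashString returns the 64-bit FNV-1a hash of s.
func hashString(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

// unorderedContextKey builds a key without sorting the tags.
// XOR is commutative, so tag order does not affect the result. Duplicates
// must be skipped explicitly, because XOR-ing the same hash twice would
// cancel it out.
func unorderedContextKey(name string, tags []string) uint64 {
	key := hashString(name)
	seen := make(map[uint64]struct{}, len(tags))
	for _, tag := range tags {
		th := hashString(tag)
		if _, dup := seen[th]; dup {
			continue
		}
		seen[th] = struct{}{}
		key ^= th
	}
	return key
}
```

This replaces the O(n log n) sort with a single pass over the tags. One trade-off of plain XOR is that the metric name and a tag are interchangeable in the result, so a production version would likely mix the name hash differently from the tag hashes.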
To achieve similar performance gains in high-throughput Go applications, developers should profile their applications under realistic load and look for opportunities to leverage runtime-specific optimizations, such as using 64-bit map keys so that the compiler can select the runtime's specialized map functions.