
Performance improvements in the Datadog Agent metrics pipeline | Datadog

Datadog engineers recently optimized the Datadog Agent's metric processing pipeline to achieve higher throughput and lower CPU overhead. After identifying metric context generation (the process of creating a unique key for each metric) as a primary bottleneck, they implemented a series of algorithmic changes and Go runtime optimizations. These improvements allow the Agent to process significantly more metrics using the same computational resources.

Identifying Bottlenecks via CPU Profiling

  • Developers utilized Go’s native profiling tools to capture CPU usage during high-volume metric ingestion via DogStatsD.
  • Flamegraph analysis revealed that the addSample and trackContext functions were the most CPU-intensive components of the pipeline.
  • The profiling data specifically pointed to tag sorting and deduplication as the underlying operations consuming the most processing time.

The Challenges of Metric Context Generation

  • The Agent must generate a unique hash, called a context, for every metric it receives so the metric can be looked up in an in-memory hash table.
  • To ensure the same metric always generates the same key, the original algorithm required sorting all tags and ensuring their uniqueness.
  • The computational cost of sorting lists repeatedly for every incoming message created a performance ceiling for the entire metrics pipeline.
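The original scheme described above can be sketched as follows. This is a minimal illustration, assuming FNV-1a from the standard library as a stand-in for the Agent's actual hash function; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// contextKey sketches the original workflow: sort the tags, drop
// duplicates, then hash the metric name and tags into one 64-bit key.
// Sorting on every incoming sample is the cost the team measured.
func contextKey(name string, tags []string) uint64 {
	sorted := append([]string(nil), tags...) // copy to avoid mutating the caller's slice
	sort.Strings(sorted)

	h := fnv.New64a()
	h.Write([]byte(name))
	var prev string
	for i, t := range sorted {
		if i > 0 && t == prev { // skip duplicate tags
			continue
		}
		h.Write([]byte{0}) // separator so ("a","bc") hashes differently from ("ab","c")
		h.Write([]byte(t))
		prev = t
	}
	return h.Sum64()
}

func main() {
	a := contextKey("http.requests", []string{"env:prod", "region:us", "env:prod"})
	b := contextKey("http.requests", []string{"region:us", "env:prod"})
	fmt.Println(a == b) // same metric + same tag set -> same key, prints true
}
```

Because sorting and deduplicating run once per sample, their cost scales with ingestion volume, which is why they dominated the profile.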

Specialization and Runtime Optimization

  • Algorithmic Specialization: The team implemented specialized sorting logic that adjusts based on the number of tags, optimizing the "hot path" for the most common metric structures.
  • Hashing Efficiency: Micro-benchmarks identified Murmur3 as the most efficient hash implementation for balancing speed and collision resistance in this use case.
  • Leveraging Go Runtime: The team transitioned from 128-bit hashes to 64-bit metric contexts. This change allowed the Agent to utilize Go's internal mapassign_fast64 and mapaccess2_fast64 functions, which provide optimized map operations for 64-bit keys.
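The 64-bit key optimization above requires no special API calls: when a Go map's key type is a 64-bit integer, the compiler emits calls to the runtime's specialized `mapassign_fast64` and `mapaccess2_fast64` routines instead of the generic map functions. A minimal sketch, with an illustrative `metricState` struct and a hardcoded key standing in for the output of context hashing:

```go
package main

import "fmt"

// metricState is an illustrative per-context aggregate, not the
// Agent's actual data structure.
type metricState struct {
	count int64
	sum   float64
}

func main() {
	// map[uint64]... keys let the compiler use the runtime's
	// fast64-specialized map operations.
	byContext := make(map[uint64]*metricState)

	// In the Agent this key would come from context hashing;
	// hardcoded here for illustration.
	var key uint64 = 0x9e3779b97f4a7c15

	st, ok := byContext[key] // lowered to mapaccess2_fast64
	if !ok {
		st = &metricState{}
		byContext[key] = st // lowered to mapassign_fast64
	}
	st.count++
	st.sum += 42.0

	fmt.Println(byContext[key].count, byContext[key].sum) // prints: 1 42
}
```

The specialization can be confirmed by inspecting the compiled output with `go build -gcflags=-S` and looking for the `_fast64` symbols.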

Redesigning for Performance

  • The original design followed a rigid "hash metric name -> sort tags -> deduplicate tags -> iterative hash" workflow.
  • Recognizing that sorting was the primary architectural bottleneck, the team moved toward a new design intended to minimize or eliminate the overhead of traditional list sorting during context generation.
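The source does not spell out the new design, but one common way to remove sorting from key generation is to combine per-tag hashes with a commutative operation, so tag order no longer matters. The sketch below illustrates that general technique only; it is an assumption, not the Agent's actual redesign, and FNV-1a again stands in for the real hash.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashTag hashes a single tag; FNV-1a is a stdlib stand-in for the
// Agent's hash function.
func hashTag(tag string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(tag))
	return h.Sum64()
}

// contextKeyNoSort combines per-tag hashes with addition, which is
// commutative, so no sort is needed for order independence.
// Caveats: unlike sort+dedupe, duplicate tags still change the result,
// and additive combining is weaker against crafted collisions. This is
// an illustrative sketch of the sort-free idea, not Datadog's design.
func contextKeyNoSort(name string, tags []string) uint64 {
	key := hashTag(name)
	for _, t := range tags {
		key += hashTag(t)
	}
	return key
}

func main() {
	a := contextKeyNoSort("http.requests", []string{"env:prod", "region:us"})
	b := contextKeyNoSort("http.requests", []string{"region:us", "env:prod"})
	fmt.Println(a == b) // commutative combine -> order-independent, prints true
}
```

Trading the O(n log n) sort for a single O(n) pass per sample is the kind of change that lifts a throughput ceiling set by per-message sorting costs.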

To achieve similar performance gains in high-throughput Go applications, developers should profile their applications under realistic load and look for opportunities to leverage runtime-specific optimizations, such as using 64-bit map keys to trigger specialized compiler paths.