datadog

How we use Vale to improve our documentation editing process | Datadog

To manage a high volume of technical content across dozens of products, Datadog’s documentation team has automated its editorial process using the open-source linting tool Vale. By integrating these checks directly into their CI/CD pipeline via GitHub Actions, the team ensures prose consistency and clarity while significantly reducing the manual burden on technical writers. This "shift-left" approach empowers both internal and external contributors to identify and fix style issues independently before a formal human review begins.

### Scaling Documentation Workflows

* The Datadog documentation team operates at a 200:1 developer-to-writer ratio, managing over 1,400 contributors and 35 distinct products.
* In 2023 alone, the team merged over 20,000 pull requests covering 650 integrations, 400 security rules, and 65 API endpoints.
* On-call writers review an average of 40 pull requests per day, necessitating automation to handle triaging and style enforcement efficiently.

### Automated Prose Review with Vale

* Vale is implemented as a command-line tool and a GitHub Action that scans Markdown and HTML files for style violations.
* When a contributor opens a pull request, the linter provides automated comments in the "Files Changed" tab, flagging long sentences, wordy phrasing, or legacy formatting habits.
* This automation reduces the "mental toll" on writers by filtering out repetitive errors before they reach the human review stage.

### Codifying Style Guides into Rules

* The team transitioned from static editorial guidelines stored in Confluence and wikis to a codified repository called `datadog-vale`.
* Style rules are defined using Vale’s YAML specification, allowing the team to update global standards in a single location that is immediately active in the CI pipeline.
* Custom regular expressions are used to exclude specific content from validation, such as Hugo shortcodes or technical snippets that do not follow standard prose rules.

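
To make the exclusion step above concrete, here is a minimal Python sketch of masking non-prose spans before any style check runs. It is not Vale's implementation and not the regex used in `datadog-vale`; the patterns `SHORTCODE` and `INLINE_CODE` and the helper `strip_non_prose` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical pattern for Hugo shortcode tags such as {{< img src="a.png" >}}
# or {{% tab "Python" %}}. An illustration only, not the datadog-vale regex.
SHORTCODE = re.compile(r"\{\{[<%].*?[%>]\}\}", re.DOTALL)

# Inline code spans such as `kubectl get pods` are treated as non-prose too.
INLINE_CODE = re.compile(r"`[^`\n]+`")


def _blank(match: "re.Match[str]") -> str:
    """Replace every character except newlines with a space."""
    return re.sub(r"[^\n]", " ", match.group(0))


def strip_non_prose(text: str) -> str:
    """Mask shortcodes and inline code so prose rules never see them.

    Masking with spaces (instead of deleting) keeps line and column
    offsets aligned with the original file, which matters when a linter
    reports positions back to the author.
    """
    return INLINE_CODE.sub(_blank, SHORTCODE.sub(_blank, text))
```

A prose rule applied to the output of `strip_non_prose` would then skip a word like "simply" when it appears only inside a shortcode or a code span.
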
### Implementation of Specific Linting Rules

* **Jargon and Filler Words:** A `words.yml` file flags "cruft" such as "easily" or "simply" to maintain a professional, objective tone.
* **Oxford Comma Enforcement:** The `oxfordcomma.yml` rule uses regex to identify lists missing a serial comma and provides a suggestion to the author.
* **Latin Abbreviations:** The `abbreviations.yml` rule identifies terms like "e.g." or "i.e." and suggests plain English alternatives like "for example" or "that is."
* **Timelessness:** Rules flag words like "currently" or "now" to ensure documentation remains relevant without frequent updates.

By open-sourcing their Vale configurations, Datadog provides a framework for other organizations to automate their style guides and foster a more efficient, collaborative documentation culture. Teams looking to improve prose quality should consider adopting a similar "docs-as-code" approach to shift editorial effort toward the beginning of the contribution lifecycle.
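
The four rule types described above can be approximated in a few lines of Python. This is a sketch of the general idea only, assuming simple regex matching per line; it does not reproduce Vale's rule engine or the contents of Datadog's rule files. The word lists come from the examples in the text, and the `Finding` class, the `lint` function, and the serial-comma heuristic are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    rule: str     # name of the illustrative rule that fired
    line: int     # 1-based line number
    match: str    # the flagged text
    message: str  # suggestion shown to the author


# Word lists taken from the examples in the text; real rule files would hold more.
FILLER = ["easily", "simply"]
TIMELY = ["currently", "now"]
LATIN = {"e.g.": "for example", "i.e.": "that is"}

FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLER) + r")\b", re.IGNORECASE)
TIMELY_RE = re.compile(r"\b(?:" + "|".join(TIMELY) + r")\b", re.IGNORECASE)
# The abbreviations end in a period, so only the leading word boundary is anchored.
LATIN_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, LATIN)) + r")", re.IGNORECASE)
# Heuristic for "a, b and c" with no comma before the conjunction.
# It is prone to false positives (for example, "After the deploy, restart and verify").
OXFORD_RE = re.compile(r"(?:\w+,\s+)+\w+\s+(?:and|or)\s+\w+")


def lint(text: str) -> List[Finding]:
    """Run the four illustrative checks over each line of text."""
    findings: List[Finding] = []
    for number, line in enumerate(text.splitlines(), start=1):
        for m in FILLER_RE.finditer(line):
            findings.append(Finding("words", number, m.group(0),
                                    f"Consider removing '{m.group(0)}'."))
        for m in LATIN_RE.finditer(line):
            plain = LATIN[m.group(0).lower()]
            findings.append(Finding("abbreviations", number, m.group(0),
                                    f"Use '{plain}' instead of '{m.group(0)}'."))
        for m in TIMELY_RE.finditer(line):
            findings.append(Finding("timeless", number, m.group(0),
                                    f"Avoid time-bound word '{m.group(0)}'."))
        for m in OXFORD_RE.finditer(line):
            findings.append(Finding("oxfordcomma", number, m.group(0),
                                    "Add a comma before the final conjunction."))
    return findings
```

Calling `lint` on a draft returns one `Finding` per match, which a CI job could turn into review comments. Keeping each rule as data (a word list or a single pattern) mirrors the appeal of the codified approach: a style change is a one-line edit rather than a wiki update.
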

datadog

.NET Continuous Profiler: CPU and wall time profiling | Datadog

Datadog’s Continuous Profiler timeline view offers a granular look at application performance by mapping code execution directly to a temporal axis. This allows engineers to move beyond aggregate flame graphs to understand exactly when and why specific bottlenecks occur during a request’s lifecycle. By correlating traces with detailed profile data, teams can effectively isolate the root causes of latency spikes and resource exhaustion in live production environments.

### Bridging the Gap Between Tracing and Profiling

* While distributed tracing identifies which service or span is slow, profiling explains the "why" by showing execution at the method and line level.
* The timeline view integrates profile data with specific trace spans, allowing users to zoom into the exact millisecond a performance degradation began.
* By toggling between CPU time and wall time, developers can distinguish between active computation and passive waiting, providing a clearer picture of thread state.

### Visualizing CPU-Bound Inefficiencies

* The tool identifies "hot" methods that consume excessive CPU cycles, such as inefficient regular expressions, heavy JSON serialization, or intensive cryptographic operations.
* It detects transient CPU spikes that might be averaged out or hidden in traditional 60-second aggregate profiles.
* Engineers can correlate CPU usage with specific threads to identify background tasks or "noisy neighbor" processes that impact the responsiveness of the main application logic.

### Diagnosing Wall Time and Runtime Overhead

* Wall time analysis reveals where threads are blocked by external factors like I/O operations, database wait times, or mutex lock contention.
* The view surfaces runtime-specific issues such as Garbage Collection (GC) pauses and Safepoint intervals that halt execution across the entire virtual machine.
* This visibility is critical for troubleshooting synchronization issues where a thread is idle and waiting for a resource, a scenario that often causes high latency without showing up in CPU-only profiles.

To maintain high availability and performance, organizations should integrate continuous profiling into their standard troubleshooting workflows, enabling a seamless transition from detecting a slow trace to identifying the offending line of code or runtime event.
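
The difference between CPU time and wall time described above can be observed with a small Python sketch. It only illustrates the two clocks and says nothing about how the Datadog profiler collects samples; the helper `measure` and the workloads `cpu_bound` and `blocked` are hypothetical, and `time.sleep` stands in for an I/O or lock wait.

```python
import time


def measure(fn):
    """Return (cpu_seconds, wall_seconds) spent while running fn()."""
    cpu_start = time.process_time()   # CPU time of this process; excludes time asleep
    wall_start = time.perf_counter()  # monotonic wall-clock timer
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu, wall


def cpu_bound():
    """Active computation: CPU time and wall time grow together."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def blocked():
    """Passive waiting: wall time grows while CPU time stays near zero."""
    time.sleep(0.2)  # stand-in for I/O, a database wait, or lock contention


if __name__ == "__main__":
    for name, fn in (("cpu_bound", cpu_bound), ("blocked", blocked)):
        cpu, wall = measure(fn)
        print(f"{name}: cpu={cpu:.3f}s wall={wall:.3f}s")
```

For `blocked`, wall time is about the length of the sleep while CPU time stays close to zero, which is why a CPU-only profile would not show the wait. For `cpu_bound`, the two values are usually close to each other.
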