## How we migrated our acceptance tests to use Synthetic Monitoring | Datadog

Datadog’s Frontend Developer Experience team migrated their massive codebase from a fragile, custom Puppeteer-based acceptance testing framework to Datadog Synthetic Monitoring to address persistent flakiness and high maintenance overhead. By leveraging a record-and-replay approach and integrating it into their CI/CD pipelines via the `datadog-ci` tool, they reduced developer friction and improved testing reliability for over 300 engineers. The transition demonstrates how replacing manual browser scripting with specialized monitoring tools can significantly streamline high-scale frontend workflows.

### Limitations of Puppeteer-Based Testing

* Custom runners built on Puppeteer suffered from inherent flakiness because they relied on a complex chain of virtual graphics engines, browser manipulation, and network stability that frequently failed unexpectedly.
* Writing tests was unintuitive: engineers had to manually script interaction details, such as verifying that a button is present and enabled before clicking it, which became far more complex for custom elements like dropdowns.
* The testing infrastructure was slow and expensive, with CI jobs taking up to 35 minutes of machine time per commit to cover the application's 565 tests and 100,000 lines of test code.
* Maintenance was a constant burden: every product update required a corresponding manual update to the scripts, making upkeep as labor-intensive as writing new features.

### Adopting Synthetic Monitoring and Tooling

* The team moved to Synthetic Monitoring, which lets engineers record browser interactions directly rather than writing code, significantly lowering the barrier to entry for creating tests.
* To integrate these tests into the development lifecycle, the team developed `datadog-ci`, a CLI tool that triggers tests and polls result statuses directly from the CI environment.
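In practice, invoking the CLI from a CI job looks roughly like the following. This is a sketch based on `datadog-ci`'s public documentation, not the exact pipeline from the article; the credentials and config file name are placeholders.

```shell
# Install the CLI (a Node-based tool) -- usually baked into the CI image:
#   npm install -g @datadog/datadog-ci

# Placeholder credentials; real pipelines inject these as CI secrets.
export DATADOG_API_KEY="placeholder-api-key"
export DATADOG_APP_KEY="placeholder-app-key"

# Trigger every Synthetic test referenced by *.synthetics.json files in
# the repo and poll Datadog until results arrive; a failing test fails
# the CI job. Guarded so the snippet is safe to run without the CLI.
if command -v datadog-ci >/dev/null 2>&1; then
  datadog-ci synthetics run-tests --config datadog-ci.json
else
  echo "datadog-ci not installed; skipping trigger"
fi
```

Because the command polls for results rather than just firing tests, its exit status can gate a merge once the team is ready to make the job blocking.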
* The new system uses a dedicated file format (`.synthetics.json`) to identify tests within the codebase, allowing for configuration overrides and human-readable output in the build logs.
* The transition turned an internal need into a product improvement: the `datadog-ci` tool was generalized to help all Datadog users execute commands from within their CI/CD scripts.

### Strategies for High-Scale Migration and Adoption

* The team used comprehensive documentation and internal "frontend gatherings" to teach 300 engineers how to record tests and why the new system required less maintenance.
* To build developer trust, the team initially ran the new tests as non-blocking CI jobs, surfacing failures as PR comments rather than breaking builds.
* Migration was treated as a distributed effort: 565 individual tests were tracked in Jira and assigned to their respective product teams to ensure ownership and a steady pace.
* By progressively sunsetting the old platform as tests were migrated, the team managed a year-long transition without disrupting the daily output of 160 authors pushing 90 new PRs every day.

To migrate large-scale testing infrastructure successfully, organizations should prioritize developer trust by introducing new tools through non-blocking pipelines and providing comprehensive documentation. Moving from manual browser scripting to automated recording tools not only reduces technical debt but also empowers engineers to maintain high-quality codebases without the burden of managing complex testing infrastructure.
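As a concrete illustration of the `.synthetics.json` format mentioned above, a minimal file might look like the sketch below. The key names follow `datadog-ci`'s documented test-file shape, but the test ID and start URL are placeholders, not values from the article.

```json
{
  "tests": [
    {
      "id": "abc-def-ghi",
      "config": {
        "startUrl": "https://app.example.com/dashboard"
      }
    }
  ]
}
```

Checking files like this into the repo is what lets the CLI discover which recorded tests cover a given part of the codebase and apply per-branch overrides such as a different start URL.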