How we built a Ruby library that saves 50% in testing time | Datadog
Lengthy CI pipelines and flaky tests often hinder developer productivity by causing unnecessary wait times and costly infrastructure usage. To address this, Datadog developed a Ruby test impact analysis library that dynamically maps tests to specific source files, allowing the CI runner to skip tests unrelated to the latest code changes. By moving beyond standard coverage tools and utilizing low-level Ruby VM interpreter events, this solution significantly reduces testing time while maintaining high performance and correctness.
The Strategy of Test Impact Analysis
- Lengthy CI pipelines (often exceeding 20 minutes) increase the likelihood of intermittent "flaky" failures that are unrelated to current code changes.
- While parallelization can reduce time, it increases cloud computing costs and does not mitigate the flakiness of irrelevant tests.
- Test impact analysis generates a dynamic map between each test and the source files executed during its run; if a commit doesn't touch those files, the test is safely skipped.
- Success depends on three pillars: correctness (never skipping a necessary test), performance (low overhead), and seamlessness (no required code changes for the user).
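The skip decision described above can be sketched in a few lines of Ruby. This is a minimal illustration, not Datadog's implementation: the IMPACT_MAP contents, the file names, and the partition_tests helper are all hypothetical. It assumes the map was recorded on a previous run and that the changed files come from the commit's diff.

```ruby
require "set"

# Hypothetical impact map: test file => source files executed during its last run.
IMPACT_MAP = {
  "spec/cart_spec.rb"   => Set["lib/cart.rb", "lib/price.rb"],
  "spec/user_spec.rb"   => Set["lib/user.rb"],
  "spec/report_spec.rb" => Set["lib/report.rb", "lib/price.rb"]
}.freeze

# Splits tests into [to_run, to_skip] for a given list of changed files.
# A test is run when it has no recorded coverage, when it touched a changed
# file, or when the test file itself changed. Everything else is skipped.
def partition_tests(impact_map, all_tests, changed_files)
  changed = changed_files.to_set
  all_tests.partition do |test|
    files = impact_map[test]
    files.nil? || files.intersect?(changed) || changed.include?(test)
  end
end

all_tests = IMPACT_MAP.keys + ["spec/new_spec.rb"]
to_run, to_skip = partition_tests(IMPACT_MAP, all_tests, ["lib/price.rb"])
puts "run:  #{to_run.inspect}"
puts "skip: #{to_skip.inspect}"
```

Running a test that has no map entry is the conservative choice here: it favors correctness over savings, in line with the first pillar.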
Limitations of Standard Coverage Tools
- Ruby’s built-in Coverage module (enhanced in version 3.1 with resume/suspend methods) proved incompatible with existing total code coverage tools like simplecov.
- Initial prototypes using the Coverage module showed a performance overhead of 300%, making the test suite four times slower.
- The TracePoint API was also evaluated as an alternative to spy on code execution via the line event, but it still produced a significant median overhead of 200% to 400%.
- Benchmarks were conducted using the rubocop test suite, a "hard mode" scenario with 20,000+ tests, to ensure the tool could handle high-sensitivity environments.
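To make the TracePoint approach concrete, here is a minimal sketch of collecting executed files through the line event. The files_executed helper is a hypothetical name, not part of any library. Because the Ruby block fires for every executed line, this style of collection is the kind that carried the overhead reported above.

```ruby
require "set"

# Runs the given block with a TracePoint subscribed to :line events and
# returns the set of source file paths in which at least one line executed.
def files_executed
  files = Set.new
  trace = TracePoint.new(:line) { |tp| files << tp.path }
  # The trace is active only while the block passed to #enable runs.
  trace.enable { yield }
  files
end

executed = files_executed do
  total = [1, 2, 3].sum
  total * 2
end
puts executed.to_a
```

In a test framework, the block would be the body of a single test, and the returned set would become that test's entry in the impact map.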
Implementing a Custom C Extension
- To bypass the limitations of high-level APIs, developers utilized Ruby’s C extension capabilities to hook directly into the Virtual Machine.
- The library uses rb_add_event_hook2 and rb_thread_add_event_hook to subscribe to the RUBY_EVENT_LINE event at the interpreter level.
- The implementation involves a C-based dd_cov_start function that triggers when a test begins and a dd_cov_stop function to collect the results.
- During execution, the tool uses rb_sourcefile() to identify the current file and stores it in a Ruby hash only if the file is located within the project’s root directory.
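The start/stop lifecycle and the root-directory filter can be modeled in plain Ruby. This is only a sketch of the logic described above, not the library's code: the real work happens in C inside the event hook, and the FileCollector class, its method names, and the sample paths are hypothetical. The record method stands in for the hook callback and assumes Unix-style absolute paths.

```ruby
# Ruby model of the collection logic; the actual library implements this in C.
class FileCollector
  def initialize(root)
    # Append a trailing separator so that "/app" does not also match "/app2".
    @root = File.join(File.expand_path(root), "")
    @files = nil
  end

  # Called when a test begins: collection starts with an empty hash.
  def start
    @files = {}
  end

  # Stand-in for the line-event callback: record the file only while a test
  # is being collected and only if the file lives under the project root.
  def record(path)
    return if @files.nil? || path.nil?

    @files[path] = true if path.start_with?(@root)
  end

  # Called when the test ends: return the collected file paths and reset.
  def stop
    collected = @files || {}
    @files = nil
    collected.keys
  end
end

collector = FileCollector.new("/app")
collector.start
collector.record("/app/lib/cart.rb")
collector.record("/usr/lib/ruby/set.rb") # outside the root, ignored
collector.record("/app/lib/cart.rb")     # duplicate, stored once
p collector.stop
```

Using a hash keyed by path keeps repeated line events for the same file cheap, since each later hit is a single lookup that leaves the hash unchanged.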
For engineering teams struggling with bloated CI pipelines, adopting test impact analysis is a highly effective way to optimize resources. By utilizing tools like Datadog’s Intelligent Test Runner, which leverages low-level VM events for minimal overhead, teams can cut their testing time in half without sacrificing the reliability of their master branch.