Building better AI benchmarks: How many raters are enough?
March 31, 2026

Flip Korn and Chris Welty, Research Scientists, Google Research

We introduce an evaluation framework for ML models, based on “gold” ratings data, that optimizes the trade-off between the number of items an…