Google

Generating synthetic data with differentially private LLM inference

Researchers at Google have developed an inference-only method for generating differentially private (DP) synthetic data that avoids the high costs and data requirements associated with private fine-tuning. By prompting off-the-shelf large language models (LLMs) with sensitive examples in parallel and aggregating their outputs, the approach can generate thousands of high-quality synthetic data points while maintaining rigorous privacy guarantees. This method allows synthetic data to serve as a secure interface for model development, enabling teams to collaborate without requiring specialized knowledge of differential privacy.

Differentially Private Prediction and Aggregation

The core of this method relies on "private prediction," where privacy is applied to the model's output rather than the model itself.

  • Sensitive data points are distributed across multiple independent prompts, ensuring that no single individual's record can significantly influence the final output.
  • The LLM generates next-token predictions for each prompt in parallel, which are then aggregated to mask individual contributions.
  • The researchers designed a DP token sampling algorithm that treats the standard LLM "softmax" sampling process as an instance of the exponential mechanism, a standard DP primitive that selects an option from a set with probability weighted by a utility score, favoring high-scoring choices while limiting how much any one record can shift the outcome.
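As a rough illustration of the aggregate-then-sample step, and not the paper's actual algorithm, the sketch below clips and averages per-prompt next-token logits (each prompt holding a disjoint subset of sensitive examples) and then samples a token with softmax probabilities calibrated to the privacy parameter, which is the exponential-mechanism view described above. The function name, clipping scheme, and calibration are illustrative assumptions.

```python
import math
import random

def dp_sample_token(per_prompt_logits, epsilon, clip=1.0):
    """Sample one token via the exponential mechanism (illustrative sketch).

    per_prompt_logits: one logit vector per independent prompt, where each
    prompt contains a disjoint subset of the sensitive examples. Clipping
    each logit to [-clip, clip] bounds how much adding or removing one
    prompt can change the averaged score (the sensitivity).
    """
    n = len(per_prompt_logits)
    vocab = len(per_prompt_logits[0])
    # Clip and average per-prompt scores so no single prompt dominates.
    avg = [0.0] * vocab
    for logits in per_prompt_logits:
        for t, v in enumerate(logits):
            avg[t] += max(-clip, min(clip, v)) / n
    sensitivity = 2.0 * clip / n
    # Exponential mechanism == softmax sampling at a privacy-calibrated
    # temperature: P(t) is proportional to exp(epsilon * avg[t] / (2 * sensitivity)).
    scale = epsilon / (2.0 * sensitivity)
    m = max(avg)  # subtract the max for numerical stability
    weights = [math.exp(scale * (a - m)) for a in avg]
    r = random.random() * sum(weights)
    for t, w in enumerate(weights):
        r -= w
        if r <= 0:
            return t
    return vocab - 1
```

Note that a small epsilon flattens the sampling distribution (more privacy, noisier tokens), while a large epsilon concentrates it on the highest-scoring token.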

Enhancing Efficiency via KV Caching

Previous attempts at private prediction were computationally expensive because they required a fresh batch of sensitive examples for every single token generated.

  • A new privacy analysis allows the system to reuse a fixed batch of sensitive examples across an entire generation sequence.
  • By maintaining the same context for each generation step, the system becomes compatible with standard inference optimization techniques like KV (Key-Value) caching.
  • This improvement enables the generation of synthetic data at a scale two to three orders of magnitude larger than prior methods.
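The efficiency gain can be illustrated with a toy stand-in for an LLM serving stack (the class and method names here are invented for illustration, not the paper's implementation). Because the sensitive batch stays fixed across the whole sequence, each prompt's context only ever grows by the newly appended synthetic token, so the per-prompt prefix is encoded once and reused from the KV cache at every step.

```python
class ToyCachedModel:
    """Toy stand-in for an LLM that caches per-prompt key/value state."""

    def __init__(self):
        self.kv_cache = {}            # prompt_id -> tokens already encoded
        self.prefix_computations = 0  # counts tokens (re)encoded

    def next_token_logits(self, prompt_id, context):
        cached = self.kv_cache.get(prompt_id, [])
        # Only tokens not already in the cache must be encoded.
        new = context[len(cached):]
        self.prefix_computations += len(new)
        self.kv_cache[prompt_id] = list(context)
        return [0.0, 0.0]  # dummy logits over a 2-token vocabulary

def generate(model, sensitive_prompts, steps):
    """Generate `steps` tokens, reusing the SAME fixed batch every step."""
    generated = []
    for _ in range(steps):
        # Fixed batch + append-only context -> KV caches stay valid.
        for pid, prompt in enumerate(sensitive_prompts):
            model.next_token_logits(pid, prompt + generated)
        generated.append(0)  # DP aggregation/sampling elided in this sketch
    return generated
```

For example, with two prompts of length 5 and 10 generation steps, the cached loop encodes 28 tokens in total, whereas drawing a fresh batch every token (as in prior private-prediction schemes) would re-encode the full context each step, 190 tokens here, and the gap widens with sequence length.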

Optimizing Privacy Spend with Public Drafters

To preserve the "privacy budget"—the limited amount of information that can be released before privacy is compromised—the method introduces a public drafter model.

  • The drafter model predicts the next token based solely on previously generated synthetic text, without ever seeing the sensitive data.
  • Using the sparse vector technique, the system only consumes the privacy budget when the public drafter’s suggestion disagrees with the private aggregate of the sensitive data.
  • This is particularly useful for structured data, where the drafter can handle formatting and syntax tokens, saving the privacy budget for the actual content.
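A minimal sketch of the sparse-vector gating follows, with hypothetical callables for the public drafter, the private agreement score, and the private fallback sampler; none of these names or the specific noise scales come from the paper, and they follow the generic sparse vector technique pattern rather than the authors' exact analysis.

```python
import math
import random

def laplace(scale):
    # Inverse-CDF sampling of Laplace(0, scale) noise.
    u = random.random() - 0.5
    return -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))

def svt_generate(drafter_next, private_agree_frac, private_sample,
                 steps, eps_test, threshold=0.9):
    """Sparse-vector-style drafting (hypothetical interfaces).

    drafter_next(text)            -> token proposed from public text only
    private_agree_frac(text, tok) -> fraction of sensitive prompts whose
                                     aggregate agrees with `tok`
    private_sample(text)          -> DP token sampled from sensitive data
    """
    text, paid_steps = [], 0
    noisy_thresh = threshold + laplace(2.0 / eps_test)
    for _ in range(steps):
        proposal = drafter_next(text)
        # Noisy comparison keeps the agreement test itself private.
        score = private_agree_frac(text, proposal) + laplace(4.0 / eps_test)
        if score >= noisy_thresh:
            token = proposal              # agreement: no budget consumed
        else:
            token = private_sample(text)  # disagreement: spend budget
            paid_steps += 1
            noisy_thresh = threshold + laplace(2.0 / eps_test)
        text.append(token)
    return text, paid_steps
```

In the structured-data case from the bullet above, formatting tokens (braces, field names) are exactly where the drafter and the private aggregate tend to agree, so those steps are free and the budget concentrates on content tokens.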

By leveraging off-the-shelf models like Gemma, this approach provides a scalable way to transform sensitive datasets into useful synthetic versions. These synthetic datasets are high-quality enough to replace real data in downstream machine learning tasks, such as in-context learning or fine-tuning models like BERT, without the risk of leaking individual user information.