Researchers at Google have developed an inference-only method for generating differentially private (DP) synthetic data that avoids the high costs and data requirements associated with private fine-tuning. By prompting off-the-shelf large language models (LLMs) with sensitive examples in parallel and aggregating their outputs, the approach can generate thousands of high-quality synthetic data points while maintaining rigorous privacy guarantees. This method allows synthetic data to serve as a secure interface for model development, enabling teams to collaborate without requiring specialized knowledge of differential privacy.
## Differentially Private Prediction and Aggregation
The core of this method relies on "private prediction," where privacy is applied to the model's output rather than the model itself.
* Sensitive data points are distributed across multiple independent prompts, ensuring that no single individual's record can significantly influence the final output.
* The LLM generates next-token predictions for each prompt in parallel, which are then aggregated to mask individual contributions.
* The researchers designed a DP token-sampling algorithm that treats the standard LLM softmax sampling step as an instance of the exponential mechanism, a classical DP method for selecting a good option from a set while limiting how much the choice reveals about any one record.
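The aggregation-and-sampling idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual algorithm: the clipping bound, the use of mean token probability as the utility function, and the function name `dp_sample_token` are all assumptions made for the sketch. The key point it demonstrates is that sampling from a softmax of a bounded-sensitivity utility, scaled by the privacy parameter, is exactly the exponential mechanism.

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dp_sample_token(per_prompt_logits, epsilon, clip=1.0, rng=None):
    """Sample one next token with epsilon-DP via the exponential mechanism.

    per_prompt_logits: shape (n_prompts, vocab_size); each row is the LLM's
    next-token logits for one prompt containing sensitive examples.
    """
    rng = np.random.default_rng(rng)
    probs = softmax(np.asarray(per_prompt_logits, dtype=float))
    n_prompts = probs.shape[0]
    # Utility of a token = its mean probability across prompts, with each
    # prompt's contribution clipped so no single record dominates.
    utility = np.clip(probs, 0.0, clip).mean(axis=0)
    # Swapping one prompt changes any utility by at most clip / n_prompts.
    sensitivity = clip / n_prompts
    # Exponential mechanism: sampling proportional to exp(eps * u / (2 * Δ))
    # is epsilon-DP for a utility with sensitivity Δ.
    dist = softmax(epsilon * utility / (2.0 * sensitivity))
    return int(rng.choice(len(dist), p=dist))
```

Note how privacy improves as more prompts are aggregated: sensitivity shrinks as 1/n, so for a fixed epsilon the sampling distribution can stay closer to the model's true aggregate.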
## Enhancing Efficiency via KV Caching
Previous attempts at private prediction were computationally expensive because their privacy analysis required drawing a fresh batch of sensitive examples for every generated token.
* A new privacy analysis allows the system to reuse a fixed batch of sensitive examples across an entire generation sequence.
* By maintaining the same context for each generation step, the system becomes compatible with standard inference optimization techniques like KV (Key-Value) caching.
* This improvement enables the generation of synthetic data at a scale two to three orders of magnitude larger than prior methods.
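The generation loop described above can be sketched with a toy stand-in for the LLM. The `ToyCachedLM` class and the Gumbel-noise aggregation are illustrative assumptions, not the paper's method; what the sketch shows is the structural point of this section: the sensitive prompts are encoded ("prefilled") once, and every later step only appends the newly sampled token to each cached context, which is what makes the loop compatible with KV caching.

```python
import numpy as np

class ToyCachedLM:
    """Stand-in for an LLM with a KV cache: per-prompt state is built once,
    then extended incrementally, so each step costs O(1) new tokens."""
    def __init__(self, vocab_size, seed=0):
        rng = np.random.default_rng(seed)
        self.emb = rng.normal(size=(vocab_size, vocab_size))
        self.cache = None  # plays the role of the KV cache

    def prefill(self, prompt_tokens):
        # Encode the full sensitive prompt once.
        self.cache = self.emb[prompt_tokens].sum(axis=0)
        return self.cache.copy()

    def step(self, token):
        # Extend the cached state with one new token (a KV-cache append).
        self.cache = self.cache + self.emb[token]
        return self.cache.copy()  # stand-in "logits" for the next token

def generate_with_fixed_batch(models, prompts, n_tokens, epsilon_per_token, rng=None):
    """One fixed batch of sensitive prompts is reused for the whole sequence;
    Gumbel noise stands in for a per-token DP selection mechanism."""
    rng = np.random.default_rng(rng)
    logits = [m.prefill(p) for m, p in zip(models, prompts)]
    out = []
    for _ in range(n_tokens):
        avg = np.mean(logits, axis=0)
        noisy = avg + rng.gumbel(scale=1.0 / epsilon_per_token, size=avg.shape)
        tok = int(np.argmax(noisy))
        out.append(tok)
        # The SAME batch and the SAME cached contexts are extended.
        logits = [m.step(tok) for m in models]
    return out
```

In the earlier token-by-token schemes, this loop would instead have to fetch a fresh batch of sensitive examples and re-encode prompts from scratch at every iteration, which is what made them orders of magnitude more expensive.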
## Optimizing Privacy Spend with Public Drafters
To preserve the "privacy budget"—the limited amount of information that can be released before privacy is compromised—the method introduces a public drafter model.
* The drafter model predicts the next token based solely on previously generated synthetic text, without ever seeing the sensitive data.
* Using the sparse vector technique, the system only consumes the privacy budget when the public drafter’s suggestion disagrees with the private aggregate of the sensitive data.
* This is particularly useful for structured data, where the drafter can handle formatting and syntax tokens, saving the privacy budget for the actual content.
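The drafter-plus-sparse-vector idea can be sketched abstractly. The function names (`drafter_token_fn`, `private_token_fn`, `agree_fn`), the threshold value, and the specific noise scales are assumptions of this sketch rather than details from the paper; the sketch only illustrates the accounting pattern of the sparse vector technique: agreements with the public drafter pass a noisy threshold test and cost nothing extra, while each disagreement falls back to the private sampler and consumes budget.

```python
import numpy as np

def svt_generate(private_token_fn, drafter_token_fn, agree_fn, n_tokens,
                 eps_threshold, threshold=0.5, rng=None):
    """Sparse-vector-style generation loop (illustrative sketch).

    drafter_token_fn(synthetic) -> token proposed from synthetic text only.
    agree_fn(synthetic, draft)  -> agreement score in [0, 1] computed from
                                   the private aggregate (sensitivity <= 1).
    private_token_fn(synthetic) -> DP-sampled token; spends privacy budget.
    """
    rng = np.random.default_rng(rng)
    synthetic = []
    # Standard SVT noise scales: 2/eps on the threshold, 4/eps on each query.
    noisy_thresh = threshold + rng.laplace(scale=2.0 / eps_threshold)
    for _ in range(n_tokens):
        draft = drafter_token_fn(synthetic)      # never sees sensitive data
        score = agree_fn(synthetic, draft)
        noisy_score = score + rng.laplace(scale=4.0 / eps_threshold)
        if noisy_score >= noisy_thresh:
            synthetic.append(draft)              # accepted: no extra budget
        else:
            synthetic.append(private_token_fn(synthetic))  # spends budget
            # SVT resets its noisy threshold after each above-threshold event.
            noisy_thresh = threshold + rng.laplace(scale=2.0 / eps_threshold)
    return synthetic
```

For structured data such as JSON records, `agree_fn` would typically score near 1 on braces, quotes, and field names, so those tokens are emitted by the drafter for free and the budget is reserved for the sensitive field values.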
By leveraging off-the-shelf models like Gemma, this approach provides a scalable way to transform sensitive datasets into useful synthetic versions. These synthetic datasets are high-quality enough to replace real data in downstream machine learning tasks, such as in-context learning or fine-tuning models like BERT, without the risk of leaking individual user information.