Differential privacy on trust graphs
Researchers from Google have introduced Trust Graph Differential Privacy (TGDP), a framework that models privacy in terms of the varying trust relationships between users, represented as vertices in a graph. By letting users share data with trusted neighbors, who then aggregate and privatize the information, TGDP bridges the gap between the highly accurate central DP model and the high-privacy local DP model. This enables more practical and accurate data analysis when users hold nuanced privacy preferences rather than the all-or-nothing trust assumptions of the two classical models.
Defining Trust Graph DP
- The model represents users as vertices and mutual trust as edges, ensuring that a user’s data remains statistically indistinguishable to any party they do not trust.
- This guarantee holds even if non-trusted parties pool their data or collaborate with a user's trusted neighbors to attempt re-identification.
- TGDP serves as a mathematical interpolation: a "star graph" topology (every user trusts a single center) corresponds to the central DP model, while a graph with no edges at all corresponds to the local DP model, as sketched below.
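A minimal sketch of these two extreme topologies, assuming a plain adjacency-set representation of the trust graph (the representation and function names here are illustrative, not from the paper):

```python
def star_graph(n: int, center: int = 0) -> dict[int, set[int]]:
    """Star topology: every user trusts one central curator.
    Under TGDP this recovers the central DP model."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        if i != center:
            adj[i].add(center)
            adj[center].add(i)
    return adj


def empty_graph(n: int) -> dict[int, set[int]]:
    """No edges: no user trusts anyone else, recovering the local DP model."""
    return {i: set() for i in range(n)}
```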
Private Aggregation and Error Metrics
- The research evaluates TGDP through the fundamental task of private aggregation, where the goal is to estimate the sum of all users' private values ($\sum_i x_i$).
- Accuracy is quantified using mean-squared error, allowing researchers to establish theoretical upper and lower bounds for algorithm performance.
- These bounds demonstrate that the utility of a privacy-preserving algorithm is directly tied to the specific structure of the trust relationships within the network; the comparison below illustrates the two extremes.
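As a back-of-envelope illustration (our own accounting, not a bound stated in the paper), suppose each $x_i \in [0, 1]$ and $\varepsilon$-DP is achieved with Laplace noise of scale $1/\varepsilon$, which has variance $2/\varepsilon^2$. The mean-squared error of the noisy sum $\hat{S}$ then depends entirely on how many parties must add noise:

$$
\mathbb{E}\Big[\big(\hat{S} - \textstyle\sum_i x_i\big)^2\Big] =
\begin{cases}
2/\varepsilon^2 & \text{central model (star graph): one curator adds noise once,}\\
2n/\varepsilon^2 & \text{local model (no edges): each of the $n$ users adds noise.}
\end{cases}
$$

A good trust-graph algorithm should land between these extremes, with the graph structure determining where.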
The Dominating Set Algorithm
- The proposed algorithm utilizes the concept of a "dominating set"—a subset of users $T$ such that every user in the graph is either in $T$ or adjacent to someone in $T$.
- In this mechanism, each user sends their raw data to a trusted neighbor within the dominating set.
- The members of the dominating set aggregate the data they receive and add specific statistical noise to satisfy differential privacy before sharing the results.
- This method reduces the total noise required compared to the local model, as the number of noise-adding entities is limited to the size of the dominating set rather than the entire population; a runnable sketch follows this list.
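A self-contained sketch of the mechanism under illustrative assumptions: values lie in $[0, 1]$, the dominating set is found with a greedy heuristic (the summary does not specify how $T$ is chosen; minimum dominating set is NP-hard, and greedy gives a logarithmic approximation), and each dominator adds Laplace noise of scale $1/\varepsilon$. By the accounting above, the resulting mean-squared error is roughly $2|T|/\varepsilon^2$. All function names are ours.

```python
import math
import random


def greedy_dominating_set(adj: dict[int, set[int]]) -> set[int]:
    """Greedily pick the vertex covering the most still-uncovered
    vertices until every vertex is in the set or adjacent to it."""
    uncovered = set(adj)
    dominators: set[int] = set()
    while uncovered:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        dominators.add(best)
        uncovered -= {best} | adj[best]
    return dominators


def sample_laplace(scale: float) -> float:
    """Sample Laplace(0, scale) by the inverse-CDF transform."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def tgdp_sum(adj: dict[int, set[int]], values: dict[int, float],
             epsilon: float) -> float:
    """Dominating-set aggregation: each user forwards its raw value to a
    trusted dominator; each dominator releases its partial sum with
    Laplace noise (sensitivity 1 for values in [0, 1])."""
    dominators = greedy_dominating_set(adj)
    partial = {t: 0.0 for t in dominators}
    for v, x in values.items():
        # Route to the user itself if it dominates, else to any trusted
        # neighbor inside the dominating set (one must exist).
        target = v if v in dominators else next(t for t in adj[v] if t in dominators)
        partial[target] += x
    return sum(s + sample_laplace(1.0 / epsilon) for s in partial.values())


# Example: a path 0-1-2-3-4, where {1, 3} is a dominating set, so only
# two noise terms are added instead of five in the local model.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
values = {i: 1.0 for i in adj}
print(tgdp_sum(adj, values, epsilon=1.0))  # ~5.0 plus noise
```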
By leveraging existing trust networks, TGDP provides a rigorous way to optimize the trade-off between privacy and utility. This framework suggests that identifying small dominating sets within a community can significantly improve the accuracy of data analytics and machine learning without requiring a single, universally trusted central curator.