Google

Differential privacy on trust graphs

Researchers from Google have introduced Trust Graph Differential Privacy (TGDP), a framework that models privacy in terms of the varying trust relationships between users, represented as vertices in a graph. By letting users share data with trusted neighbors, who then aggregate and privatize the information, TGDP bridges the gap between the highly accurate central DP model and the high-privacy local DP model. This enables more practical and accurate data analysis in settings where users' trust is nuanced rather than all-or-nothing.

Defining Trust Graph DP

  • The model represents users as vertices and mutual trust as edges, guaranteeing that a user's data remains statistically indistinguishable from the perspective of any party that user does not trust.
  • This guarantee holds even if non-trusted parties pool their data or collaborate with a user's trusted neighbors to attempt re-identification.
  • TGDP serves as a mathematical interpolation between the two classical models: a "star graph" topology (every user trusts one central curator) corresponds to the central DP model, while an edgeless graph (no user trusts any other) corresponds to the local DP model; both extremes are sketched below.
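
A minimal sketch of these two extremes, assuming an undirected trust graph stored as adjacency sets; the helper names (`star_graph`, `edgeless_graph`, `untrusted_parties`) are illustrative and not from the paper:

```python
# Users are vertices; mutual trust is an undirected edge (illustrative sketch).

def star_graph(n: int) -> dict[int, set[int]]:
    """Every user trusts a single curator (vertex 0): the central-DP extreme."""
    g = {i: set() for i in range(n)}
    for i in range(1, n):
        g[0].add(i)
        g[i].add(0)
    return g

def edgeless_graph(n: int) -> dict[int, set[int]]:
    """No user trusts any other: the local-DP extreme."""
    return {i: set() for i in range(n)}

def untrusted_parties(g: dict[int, set[int]], u: int) -> set[int]:
    """The parties against whom u's data must stay statistically
    indistinguishable: everyone except u and u's trusted neighbors."""
    return set(g) - ({u} | g[u])
```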

Private Aggregation and Error Metrics

  • The research evaluates TGDP through the fundamental task of private aggregation, where the goal is to estimate the sum of all users' private values ($\sum_i x_i$).
  • Accuracy is quantified using mean-squared error (MSE), allowing researchers to establish theoretical upper and lower bounds for algorithm performance; a small simulation of this metric follows the list.
  • These bounds demonstrate that the utility of a privacy-preserving algorithm is directly tied to the specific structure of the trust relationships within the network.
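
To make the error metric concrete, here is a hedged simulation sketch (not the paper's analysis): k parties each add independent Laplace(1/ε) noise to a sum of bounded values, and we measure the empirical MSE; k = 1 mimics the central model and k = n the local model.

```python
import numpy as np

def noisy_sum_mse(x: np.ndarray, k: int, eps: float, trials: int = 10_000) -> float:
    """Empirical MSE of estimating sum(x) when k parties each add
    independent Laplace(1/eps) noise (illustrative, assumes values in [0, 1])."""
    rng = np.random.default_rng(seed=0)
    true_sum = x.sum()
    # Total noise per trial is the sum of k independent Laplace(1/eps) draws.
    noise = rng.laplace(loc=0.0, scale=1.0 / eps, size=(trials, k)).sum(axis=1)
    estimates = true_sum + noise
    return float(np.mean((estimates - true_sum) ** 2))

x = np.random.default_rng(seed=1).random(1000)  # 1000 private values in [0, 1)
print(noisy_sum_mse(x, k=1, eps=1.0))       # ~2/eps^2: one curator (central-like)
print(noisy_sum_mse(x, k=len(x), eps=1.0))  # ~2n/eps^2: everyone adds noise (local-like)
```

Since the variance of a Laplace(1/ε) draw is 2/ε², the MSE grows linearly with the number of noise-adding parties, which is exactly the lever the dominating-set algorithm below exploits.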

The Dominating Set Algorithm

  • The proposed algorithm utilizes the concept of a "dominating set"—a subset of users $T$ such that every user in the graph is either in $T$ or adjacent to someone in $T$.
  • In this mechanism, each user sends their raw data to a trusted neighbor within the dominating set.
  • The members of the dominating set aggregate the values they receive and add calibrated statistical noise to satisfy differential privacy before sharing the results.
  • This method reduces the total noise required compared to the local model, since the number of noise-adding entities is limited to the size of the dominating set rather than the entire population; a sketch of the mechanism follows this list.
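
Below is a minimal sketch of such a mechanism under simplifying assumptions: values bounded in [0, 1] (so each partial sum has sensitivity 1), ε-DP via Laplace(1/ε) noise per aggregator, and a standard greedy heuristic to find a small dominating set. The function names are illustrative, not the paper's protocol.

```python
import numpy as np

def greedy_dominating_set(g: dict[int, set[int]]) -> set[int]:
    """Greedy heuristic: repeatedly add the vertex that dominates the most
    not-yet-dominated vertices (a vertex dominates itself and its neighbors)."""
    undominated, dom = set(g), set()
    while undominated:
        v = max(g, key=lambda u: len(({u} | g[u]) & undominated))
        dom.add(v)
        undominated -= {v} | g[v]
    return dom

def dominating_set_sum(g: dict[int, set[int]], x: dict[int, float], eps: float) -> float:
    """Estimate sum(x) with one Laplace noise term per dominating-set member."""
    rng = np.random.default_rng(seed=0)
    dom = greedy_dominating_set(g)
    partial = {t: 0.0 for t in dom}
    for u, value in x.items():
        # Each user hands its raw value to a trusted neighbor in the dominating
        # set (or keeps it, if the user is a member itself).
        target = u if u in dom else next(iter(g[u] & dom))
        partial[target] += value
    # Only |dom| parties add noise, versus all n users in the local model.
    return sum(s + rng.laplace(loc=0.0, scale=1.0 / eps) for s in partial.values())
```

Because each aggregator's Laplace noise has variance 2/ε², the estimator's error scales with $|T|$ rather than with the total number of users, matching the intuition in the bullet above.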

By leveraging existing trust networks, TGDP provides a rigorous way to optimize the trade-off between privacy and utility. This framework suggests that identifying small dominating sets within a community can significantly improve the accuracy of data analytics and machine learning without requiring a single, universally trusted central curator.