data-aggregation

1 post

google

Differential privacy on trust graphs

Researchers from Google have introduced Trust Graph Differential Privacy (TGDP), a framework that models privacy based on varying trust relationships between users, represented as vertices in a graph. By allowing users to share data with trusted neighbors, who then aggregate and privatize the information, TGDP bridges the gap between the highly accurate central DP model and the high-privacy local DP model. This approach enables more practical and accurate data analysis in scenarios where users have nuanced privacy preferences rather than binary trust assumptions.

## Defining Trust Graph DP

* The model represents users as vertices and mutual trust as edges, ensuring that a user's data remains statistically indistinguishable to any party they do not trust (a candidate formalization appears at the end of this post).
* This guarantee holds even if non-trusted parties pool their data or collaborate with a user's trusted neighbors to attempt re-identification.
* TGDP serves as a mathematical interpolation: a "star graph" topology corresponds to the central DP model, while a graph with no edges corresponds to the local DP model (illustrated in the usage example below).

## Private Aggregation and Error Metrics

* The research evaluates TGDP through the fundamental task of private aggregation, where the goal is to estimate the sum of all users' private values, $\sum_i x_i$.
* Accuracy is quantified using mean-squared error, allowing the researchers to establish theoretical upper and lower bounds on algorithm performance.
* These bounds show that the utility of a privacy-preserving algorithm is directly tied to the structure of the trust relationships within the network.

## The Dominating Set Algorithm

* The proposed algorithm uses the concept of a "dominating set": a subset of users $T$ such that every user in the graph is either in $T$ or adjacent to someone in $T$.
* In this mechanism, each user sends their raw data to a trusted neighbor within the dominating set.
* The members of the dominating set aggregate the data they receive and add calibrated statistical noise to satisfy differential privacy before sharing the results.
* This method reduces the total noise required compared to the local model, since the number of noise-adding entities is limited to the size of the dominating set rather than the entire population (a runnable sketch follows below).

By leveraging existing trust networks, TGDP provides a rigorous way to optimize the trade-off between privacy and utility. The framework suggests that identifying small dominating sets within a community can significantly improve the accuracy of data analytics and machine learning without requiring a single, universally trusted central curator.
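The guarantee in the first section can be made precise. As a hedged sketch following the standard DP template (the paper's formal definition may differ in its treatment of coalitions and transcripts), a protocol satisfies $\varepsilon$-TGDP if, for every user $v$, every coalition $F$ of parties outside $v$'s closed neighborhood $\{v\} \cup N(v)$, every pair of inputs $x, x'$ differing only in $v$'s data, and every set $S$ of possible transcripts:

$$\Pr[\mathrm{View}_F(x) \in S] \le e^{\varepsilon} \cdot \Pr[\mathrm{View}_F(x') \in S],$$

where $\mathrm{View}_F$ is everything the coalition observes during the protocol. Setting the trust graph to a star or to an edgeless graph recovers the usual central and local DP definitions, which is the interpolation noted above.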
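The dominating-set mechanism is simple enough to sketch in code. The following Python snippet is a minimal illustration, not the paper's implementation: `greedy_dominating_set` and `tgdp_sum` are hypothetical helpers, values are assumed to lie in $[0, 1]$, and the Laplace calibration is the textbook choice for L1 sensitivity 1.

```python
import numpy as np

def greedy_dominating_set(adj):
    """Classic greedy heuristic: repeatedly pick the vertex whose closed
    neighborhood {v} | adj[v] covers the most still-uncovered vertices."""
    uncovered = set(adj)
    dominators = set()
    while uncovered:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        dominators.add(best)
        uncovered -= {best} | adj[best]
    return dominators

def tgdp_sum(adj, values, epsilon):
    """Estimate sum(values) under the dominating-set mechanism: each user
    forwards its raw value to a dominator it trusts (itself, if it is one),
    and each dominator adds Laplace noise to its partial sum before release."""
    dominators = greedy_dominating_set(adj)
    partial = dict.fromkeys(dominators, 0.0)
    for v, x in values.items():
        # The dominating-set property guarantees a trusted dominator exists.
        target = v if v in dominators else next(u for u in adj[v] if u in dominators)
        partial[target] += x
    # Each user's value (assumed in [0, 1]) enters exactly one partial sum,
    # so each partial sum has L1 sensitivity 1, and Laplace(1/epsilon) noise
    # makes it epsilon-DP toward everyone the dominator's senders don't trust.
    return sum(s + np.random.laplace(0.0, 1.0 / epsilon) for s in partial.values())
```

Under this calibration each dominator contributes one independent Laplace noise term with variance $2/\varepsilon^2$, so the estimator's mean-squared error is $2|T|/\varepsilon^2$. A local-model protocol, where all $n$ users add their own noise, would pay $2n/\varepsilon^2$ instead, which is exactly why a small dominating set translates into better accuracy.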
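The interpolation between the central and local models can also be checked concretely with the hypothetical helper above: on a star graph the greedy routine returns just the hub, so a single trusted curator adds all the noise, while on an edgeless graph every user must dominate itself and the protocol degenerates to the local model.

```python
# Star graph on 6 users: everyone trusts the hub (user 0), and the
# dominating set collapses to the hub alone: the central DP regime.
star = {0: {1, 2, 3, 4, 5}, **{i: {0} for i in range(1, 6)}}
print(greedy_dominating_set(star))       # {0}

# Edgeless graph: no one trusts anyone, so every user is its own
# dominator: the local DP regime, with n noise terms instead of one.
isolated = {i: set() for i in range(6)}
print(greedy_dominating_set(isolated))   # {0, 1, 2, 3, 4, 5}
```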