graph-theory

2 posts

google

Differential privacy on trust graphs

Researchers from Google have introduced Trust Graph Differential Privacy (TGDP), a framework that models privacy based on varying trust relationships between users, represented as vertices in a graph. By allowing users to share data with trusted neighbors who then aggregate and privatize the information, TGDP bridges the gap between the highly accurate central DP model and the high-privacy local DP model. This approach enables more practical and accurate data analysis in scenarios where users hold nuanced privacy preferences rather than binary trust assumptions.

## Defining Trust Graph DP

* The model represents users as vertices and mutual trust as edges, ensuring that a user's data remains statistically indistinguishable to any party they do not trust.
* This guarantee holds even if non-trusted parties pool their data or collaborate with a user's trusted neighbors to attempt re-identification.
* TGDP serves as a mathematical interpolation: a "star graph" topology corresponds to the central DP model, while a fully unconnected graph corresponds to the local DP model.

## Private Aggregation and Error Metrics

* The research evaluates TGDP through the fundamental task of private aggregation, where the goal is to estimate the sum of all users' private values ($\sum_i x_i$).
* Accuracy is quantified using mean-squared error, allowing the researchers to establish theoretical upper and lower bounds on algorithm performance.
* These bounds demonstrate that the utility of a privacy-preserving algorithm is directly tied to the structure of the trust relationships within the network.

## The Dominating Set Algorithm

* The proposed algorithm relies on a "dominating set": a subset of users $T$ such that every user in the graph is either in $T$ or adjacent to someone in $T$.
* In this mechanism, each user sends their raw data to a trusted neighbor within the dominating set.
* The members of the dominating set aggregate the data they receive and add statistical noise to satisfy differential privacy before sharing the results.
* This method reduces the total noise required compared to the local model, because the number of noise-adding entities is limited to the size of the dominating set rather than the entire population.

By leveraging existing trust networks, TGDP provides a rigorous way to optimize the trade-off between privacy and utility. The framework suggests that identifying small dominating sets within a community can significantly improve the accuracy of data analytics and machine learning without requiring a single, universally trusted central curator.
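The dominating-set mechanism described above can be sketched in a few lines of Python. This is a toy illustration, not Google's implementation: the greedy dominating-set construction, the function names, and the Laplace noise scale (assuming each value lies in $[0, 1]$, so each aggregator's sum has sensitivity 1) are all assumptions for the sake of the example.

```python
import random
from collections import defaultdict

def greedy_dominating_set(adj):
    """Greedily build a dominating set: repeatedly pick the vertex
    that covers the most currently uncovered vertices."""
    uncovered = set(adj)
    dom = set()
    while uncovered:
        best = max(adj, key=lambda v: len(({v} | adj[v]) & uncovered))
        dom.add(best)
        uncovered -= {best} | adj[best]
    return dom

def tgdp_sum(adj, values, epsilon):
    """Estimate sum(values) with the dominating-set mechanism: each
    user forwards raw data to a trusted aggregator in the set, and
    each aggregator adds Laplace noise before releasing its partial
    sum.  Total noise scales with the set size, not the population."""
    dom = greedy_dominating_set(adj)
    buckets = defaultdict(float)
    for v, x in values.items():
        # Users in the dominating set keep their own data; everyone
        # else sends it to some trusted neighbor inside the set.
        target = v if v in dom else next(u for u in adj[v] if u in dom)
        buckets[target] += x
    # Laplace(scale=1/epsilon) noise, sampled as a difference of two
    # exponentials (assumes values in [0, 1], i.e. sensitivity 1).
    return sum(s + random.expovariate(epsilon) - random.expovariate(epsilon)
               for s in buckets.values())

# Toy example: a star of trust around user 0; everyone holds value 1.0.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
est = tgdp_sum(adj, {v: 1.0 for v in adj}, epsilon=5.0)
```

On this star graph the greedy step selects `{0}` as the dominating set, so a single noise term is added, mirroring the central DP model; on an edgeless graph every user would be its own aggregator, recovering the local model.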

google

Load balancing with random job arrivals

Research from Google explores the competitive ratio of online load balancing when tasks arrive in a uniformly random order rather than an adversarial one. By analyzing a "tree balancing game" in which edges must be oriented to minimize node indegree, the authors show that random arrival orders still impose significant mathematical limitations on deterministic algorithms. The study concludes that no online algorithm can achieve a competitive ratio significantly better than $\sqrt{\log n}$, establishing new theoretical boundaries for efficient cluster management.

### The Online Load Balancing Challenge

* Modern cluster management systems, such as Google's Borg, must distribute hundreds of thousands of jobs across machines to maximize utilization and minimize the maximum load (the makespan).
* In the online version of the problem, jobs arrive one by one, and the system must assign each job immediately, without knowing what future jobs will look like.
* Traditionally, such algorithms are evaluated by "competitive analysis," which compares an online algorithm's performance against an optimal offline algorithm that has full knowledge of the job sequence.

### The Tree Balancing Game

* The problem is modeled as a game in which an adversary presents the edges of a tree (representing jobs and machines) one at a time.
* For every undirected edge $(u, v)$ presented, the algorithm must choose an orientation ($u \to v$ or $v \to u$), with the goal of minimizing the maximum number of edges pointing at any single node.
* Under a worst-case adversarial arrival order, it has been known since the 1990s that no deterministic algorithm can guarantee a maximum indegree below $\log n$, where $n$ is the number of nodes.

### Performance Under Random Arrival Orders

* The research specifically investigates "random order arrivals," where every permutation of the job sequence is equally likely, modeling a more natural setting than a malicious adversary.
* While it was previously believed that the simple greedy algorithm (assign each job to the machine with the currently lower load) performs better in this model, this research proves a new, stricter lower bound.
* The authors demonstrate that even with random arrivals, any online algorithm will incur a maximum load proportional to at least $\sqrt{\log n}$.
* For general load balancing scenarios beyond trees, the researchers established a lower bound of $\sqrt{\log \log n}$.

### Practical Implications

These findings suggest that while random job arrival offers a modest advantage over adversarial arrival, system designers cannot rely on randomness alone to eliminate load imbalances. Because the maximum load grows predictably toward the $\sqrt{\log n}$ limit, large-scale systems must be architected to absorb this inherent growth in resource pressure while maintaining high utilization and stability.
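The tree balancing game is easy to simulate. The sketch below is a toy experiment, not the paper's lower-bound construction: it plays the natural greedy strategy, orienting each arriving edge toward its currently less-loaded endpoint, on an assumed instance (a path graph) under a uniformly random arrival order.

```python
import random

def greedy_orient(edges, n):
    """Play the tree balancing game greedily: orient each arriving
    edge toward the endpoint with the smaller current indegree, and
    report the maximum indegree at the end."""
    indeg = [0] * n
    for u, v in edges:
        head = u if indeg[u] < indeg[v] else v
        indeg[head] += 1
    return max(indeg)

# Toy instance: a path on n nodes.  Offline, any tree can be oriented
# with maximum indegree 1 (point every edge away from a root), so the
# value returned here is also greedy's competitive ratio on this input.
n = 1024
edges = [(i, i + 1) for i in range(n - 1)]
random.shuffle(edges)  # uniformly random arrival order
print("greedy max indegree:", greedy_orient(edges, n))
```

Revealing the path's edges left to right lets greedy achieve indegree 1, matching the offline optimum; shuffling the arrival order typically forces a larger maximum indegree, which is the gap the random-order lower bound formalizes.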