Approximate Gradient Coding in Random and Adversarial Settings
Distributed algorithms are often beset by the straggler effect, where the slowest compute nodes in the system dictate the overall running time. Gradient coding uses coding-theoretic techniques to mitigate stragglers via algorithmic redundancy. Prior work has mainly focused on exact recovery of the desired output. However, slightly inexact solutions can be acceptable in applications that are robust to noise, such as model training via gradient-based algorithms. In this talk, we will present computationally simple gradient codes based on sparse random graphs that guarantee fast and approximately accurate distributed computation. We first present two gradient codes that tolerate large numbers of random and adversarial stragglers, respectively. We then show how to interpolate between these codes to obtain the stochastic block code (SBC), a gradient code based on the stochastic block model, and prove that SBCs are efficient, accurate, and resistant to polynomial-time adversaries.
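To make the setting concrete, here is a minimal sketch of approximate gradient coding under random stragglers. It is an illustration, not the construction from the talk: the assignment matrix, the Bernoulli straggler model, and the unweighted-averaging decoder (all hypothetical choices here) stand in for the sparse-random-graph codes being described. Each worker holds a random subset of data partitions, returns the sum of their gradients, and the master rescales the sum over responding workers to approximate the full gradient.

```python
import numpy as np

rng = np.random.default_rng(0)

n_workers, n_parts, d = 40, 40, 5
p = 0.3  # each worker holds each partition independently w.p. p (sparse random bipartite graph)

# per-partition gradients (random stand-ins for real gradient computation)
grads = rng.normal(size=(n_parts, d))
full_grad = grads.mean(axis=0)  # target: average gradient over all partitions

# assignment matrix: A[i, j] = True if worker i holds partition j
A = rng.random((n_workers, n_parts)) < p

# each worker returns the (unweighted) sum of gradients of its partitions
worker_msgs = A @ grads  # shape (n_workers, d)

# random stragglers: a subset of workers never respond
stragglers = rng.random(n_workers) < 0.25
alive = ~stragglers

# decode by averaging: each partition is counted about p * (#alive) times
# in expectation, so rescale the aggregate sum accordingly
est = worker_msgs[alive].sum(axis=0) / (p * alive.sum() * n_parts)

err = np.linalg.norm(est - full_grad)
print("approximation error:", err)
```

The estimate is unbiased over the randomness of the assignment, and its error shrinks as the graph gets denser or more workers respond; the adversarial-straggler and stochastic-block constructions in the talk replace the i.i.d. assignment with structured random graphs.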