Guest Editorial for Special Issue on Coded Computing

Computing is the next frontier for information theory. The goal of coded computing has been of intellectual interest since the days of von Neumann and Shannon. Von Neumann examined the issue in his 1956 paper "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components," which was in turn motivated by Shannon's 1948 paper and by the problem of understanding the reliability of seemingly noisy biological systems.

Quantization of Distributed Data for Learning

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates.
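
To make the bottleneck concrete, here is a minimal sketch of the kind of unbiased stochastic quantizer that such compression approaches apply to gradient updates before transmission; the function name, level count, and NumPy setup are illustrative assumptions, not the scheme studied in the paper.

```python
import numpy as np

def stochastic_quantize(g, levels=4, rng=None):
    """Unbiased stochastic quantization of a gradient onto a uniform grid
    of `levels` magnitude levels, scaled by the gradient's norm."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return np.zeros_like(g)
    scaled = np.abs(g) / norm * levels                     # magnitudes in [0, levels]
    lower = np.floor(scaled)
    q = lower + (rng.random(g.shape) < (scaled - lower))   # randomized rounding
    return np.sign(g) * q * norm / levels

# Each node would quantize its local gradient before sending it upstream.
rng = np.random.default_rng(0)
g = rng.normal(size=8)
print(stochastic_quantize(g, levels=4, rng=rng))
```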

Bivariate Polynomial Coding for Efficient Distributed Matrix Multiplication

Coded computing is an effective technique to mitigate stragglers in large-scale, distributed matrix multiplication. In particular, univariate polynomial codes have been shown to be effective in straggler mitigation by making the computation time depend only on the fastest workers. However, these schemes completely ignore the work done by the straggling workers, resulting in a waste of computational resources.
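
For reference, the following is a small sketch of the univariate polynomial-code baseline the abstract alludes to, with illustrative block counts and evaluation points; the paper's bivariate construction differs and additionally exploits partial work by stragglers.

```python
import numpy as np

# Split A into m row-blocks and B into n column-blocks; worker w evaluates
# A(x_w) = sum_i A_i x_w^i and B(x_w) = sum_j B_j x_w^(j*m), so its product is
# one evaluation of a degree-(m*n - 1) matrix polynomial whose coefficients are
# the blocks A_i B_j. Any m*n returned evaluations suffice to interpolate.
m, n, workers = 2, 2, 6                 # tolerates workers - m*n = 2 stragglers
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 4))
A_blocks = np.split(A, m, axis=0)
B_blocks = np.split(B, n, axis=1)

xs = np.arange(1, workers + 1, dtype=float)       # one evaluation point per worker
coded_products = [
    sum(Ai * x ** i for i, Ai in enumerate(A_blocks))
    @ sum(Bj * x ** (j * m) for j, Bj in enumerate(B_blocks))
    for x in xs
]

# Suppose the last two workers straggle: interpolate from the first m*n results.
fast = list(range(m * n))
V = np.vander(xs[fast], m * n, increasing=True)   # Vandermonde system of evaluations
coeffs = np.linalg.solve(V, np.stack([coded_products[w].ravel() for w in fast]))
C_blocks = coeffs.reshape(m * n, *coded_products[0].shape)
C = np.block([[C_blocks[i + j * m] for j in range(n)] for i in range(m)])
assert np.allclose(C, A @ B)
print("recovered A @ B from", m * n, "of", workers, "workers")
```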

Communication-Efficient and Byzantine-Robust Distributed Learning With Error Feedback

We develop a communication-efficient distributed learning algorithm that is robust against Byzantine worker machines. We propose and analyze a distributed gradient-descent algorithm that performs a simple thresholding based on gradient norms to mitigate Byzantine failures. We show that the (statistical) error rate of our algorithm matches that of Yin et al. (2018), which uses more complicated schemes (coordinate-wise median, trimmed mean).
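
A minimal sketch of norm-based thresholding of worker gradients, assuming a known bound on the Byzantine fraction; the function, its parameters, and the toy data are illustrative and do not reproduce the paper's exact rule or guarantees.

```python
import numpy as np

def norm_filtered_mean(gradients, beta=0.2):
    """Average the worker gradients after discarding the beta-fraction with
    the largest norms (a simple norm-thresholding rule)."""
    norms = np.linalg.norm(gradients, axis=1)
    keep = int(np.ceil((1 - beta) * len(gradients)))
    idx = np.argsort(norms)[:keep]            # keep the smallest-norm gradients
    return gradients[idx].mean(axis=0)

# Honest workers send noisy copies of the true gradient; Byzantine workers
# send large arbitrary vectors, which the thresholding step removes.
rng = np.random.default_rng(1)
true_grad = np.ones(5)
honest = true_grad + 0.1 * rng.normal(size=(8, 5))
byzantine = 100.0 * rng.normal(size=(2, 5))
grads = np.vstack([honest, byzantine])
print(norm_filtered_mean(grads, beta=0.2))
```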

Coded Sequential Matrix Multiplication for Straggler Mitigation

In this work, we consider a sequence of $J$ matrix multiplication jobs that needs to be distributed by a master across multiple worker nodes. For $i \in \{1,2,\ldots,J\}$, job-$i$ begins in round-$i$ and has to be completed by round-$(i+T)$. In order to provide resiliency against slow workers (stragglers), previous works focus on coding across workers, which is the special case of $T=0$. We propose here two schemes with $T > 0$, which allow for coding across workers as well as across time.
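
A toy simulation of the deadline structure, assuming an $(n,k)$ MDS-style code and workers that finish either on time or one round late; it only illustrates why $T > 0$ helps and ignores per-round worker contention, unlike the schemes proposed in the paper.

```python
import numpy as np

def completed_jobs(J, n, k, T, p=0.7, seed=0):
    """Count jobs recovered by their deadline: job i is issued in round i, each
    of its n coded tasks finishes in round i with probability p (otherwise one
    round late), and the job is decodable once any k coded tasks are done."""
    rng = np.random.default_rng(seed)
    done = 0
    for i in range(J):
        finish_round = i + rng.binomial(1, 1 - p, size=n)   # round i or i + 1
        if np.sum(finish_round <= i + T) >= k:
            done += 1
    return done

J, n, k = 100, 10, 8
print("T=0:", completed_jobs(J, n, k, T=0), "of", J, "jobs met their deadline")
print("T=1:", completed_jobs(J, n, k, T=1), "of", J, "jobs met their deadline")
```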

SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization

In this paper, we propose and analyze SQuARM-SGD, a communication-efficient algorithm for decentralized training of large-scale machine learning models over a network. In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors, regulated by a locally computable triggering criterion.
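
The sketch below illustrates a sparsify-then-quantize compression operator and a locally computable trigger of the kind described above; the operator, threshold, and parameter names are assumptions for illustration and not SQuARM-SGD's exact specification.

```python
import numpy as np

def topk_quantize(v, k=8, levels=4, rng=None):
    """Keep the k largest-magnitude entries of an update and stochastically
    quantize them (sketch of a sparsify-then-quantize compressor)."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    vals = v[idx]
    scale = np.max(np.abs(vals))
    if scale == 0.0:
        return out
    scaled = np.abs(vals) / scale * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(vals.shape) < (scaled - lower))
    out[idx] = np.sign(vals) * q * scale / levels
    return out

def should_send(update, last_sent, threshold=0.5):
    """Locally computable trigger: communicate only if the new update has
    drifted far enough from what the neighbors last received."""
    return np.linalg.norm(update - last_sent) > threshold

rng = np.random.default_rng(2)
update, last_sent = rng.normal(size=32), np.zeros(32)
if should_send(update, last_sent):                          # trigger fires here
    last_sent = last_sent + topk_quantize(update - last_sent, k=8, rng=rng)
print(np.count_nonzero(last_sent), "of", last_sent.size, "entries transmitted")
```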

Factored LT and Factored Raptor Codes for Large-Scale Distributed Matrix Multiplication

We propose two coding schemes for distributed matrix multiplication in the presence of stragglers. These coding schemes are adaptations of Luby Transform (LT) codes and Raptor codes to distributed matrix multiplication and are termed Factored LT (FLT) codes and Factored Raptor (FRT) codes. We show that all nodes in the Tanner graph of a randomly sampled code have a tree-like neighborhood with high probability. This ensures that the density evolution analysis gives a reasonable estimate of the average recovery threshold of FLT codes.
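
A sketch of LT-style encoding for a distributed product $AB$, assuming an ideal soliton degree distribution and row-block splitting of $A$ only; FLT/FRT codes factor the encoding over both input matrices and are analyzed with other degree distributions, so this is only meant to convey the flavor of the construction. Decoding proceeds by peeling, as in standard LT codes.

```python
import numpy as np

def ideal_soliton(k):
    """Ideal soliton degree distribution over {1, ..., k}."""
    p = np.array([1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)])
    return p / p.sum()

def lt_encode_blocks(A_blocks, num_workers, rng):
    """Each coded task is the sum of a random subset of row-blocks of A,
    with the subset size drawn from the soliton distribution."""
    k = len(A_blocks)
    probs = ideal_soliton(k)
    tasks = []
    for _ in range(num_workers):
        d = rng.choice(np.arange(1, k + 1), p=probs)        # coded degree
        subset = rng.choice(k, size=d, replace=False)       # which blocks to combine
        tasks.append((subset, sum(A_blocks[i] for i in subset)))
    return tasks

rng = np.random.default_rng(3)
A = rng.normal(size=(8, 4))
B = rng.normal(size=(4, 4))
A_blocks = np.split(A, 4, axis=0)
for subset, coded in lt_encode_blocks(A_blocks, num_workers=6, rng=rng):
    partial = coded @ B               # each worker returns one coded product block
    print("coded block from rows", sorted(int(i) for i in subset), "->", partial.shape)
```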

Compressing Gradients by Exploiting Temporal Correlation in Momentum-SGD

An increasing bottleneck in decentralized optimization is communication. Bigger models and growing datasets mean that decentralization of computation is important and that the amount of information exchanged is quickly growing. While compression techniques have been introduced to cope with the latter, none has considered leveraging the temporal correlations that exist in consecutive vector updates. An important example is distributed momentum-SGD where temporal correlation is enhanced by the low-pass-filtering effect of applying momentum.
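
A sketch of the underlying idea: because momentum low-pass-filters the gradients, consecutive update vectors are correlated, so a node can transmit a quantized residual against the previously sent vector instead of the vector itself. The quantizer, step loop, and parameters below are illustrative assumptions, not the paper's codec.

```python
import numpy as np

def quantize(v, levels=16, rng=None):
    """Simple unbiased stochastic quantizer applied to the residuals below."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.max(np.abs(v))
    if scale == 0.0:
        return np.zeros_like(v)
    scaled = np.abs(v) / scale * levels
    lower = np.floor(scaled)
    q = lower + (rng.random(v.shape) < (scaled - lower))
    return np.sign(v) * q * scale / levels

rng = np.random.default_rng(4)
beta, dim = 0.9, 16
momentum, last_sent = np.zeros(dim), np.zeros(dim)
for step in range(8):
    grad = rng.normal(size=dim)
    momentum = beta * momentum + grad
    residual = momentum - last_sent                         # typically smaller than the momentum itself
    last_sent = last_sent + quantize(residual, rng=rng)     # receiver tracks the same state
    print(f"step {step}: |momentum| = {np.linalg.norm(momentum):.2f}, "
          f"|residual| = {np.linalg.norm(residual):.2f}")
```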

Coded Computing via Binary Linear Codes: Designs and Performance Limits

We consider the problem of coded distributed computing where a large linear computational job, such as a matrix multiplication, is divided into $k$ smaller tasks, encoded using an $(n,k)$ linear code, and performed over $n$ distributed nodes. The goal is to reduce the average execution time of the computational job. We provide a connection between the problem of characterizing the average execution time of a coded distributed computing system and the problem of analyzing the error probability of codes of length $n$ used over erasure channels.
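
A toy Monte Carlo illustration of that connection, assuming a systematic random binary code and shifted-exponential worker times (both are assumptions for illustration, not the paper's constructions): the job completes at the first instant the finished workers' generator rows reach rank $k$ over GF(2), i.e., when the still-running workers can be treated as erasures.

```python
import numpy as np

def gf2_rank(M):
    """Rank of a binary matrix over GF(2), via pivot-per-bit elimination."""
    pivots = {}
    for row in M:
        x = int("".join(map(str, row)), 2)
        while x:
            hi = x.bit_length() - 1
            if hi not in pivots:
                pivots[hi] = x
                break
            x ^= pivots[hi]
    return len(pivots)

def avg_job_time(G, trials=2000, shift=1.0, rate=1.0, seed=0):
    """Average time until the completed workers' rows of G span all k tasks."""
    n, k = G.shape
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(trials):
        finish = shift + rng.exponential(1.0 / rate, size=n)   # worker finish times
        order = np.argsort(finish)
        for m in range(k, n + 1):                              # earliest decodable prefix
            if gf2_rank(G[order[:m]]) == k:
                times.append(finish[order[m - 1]])
                break
    return float(np.mean(times))

rng = np.random.default_rng(5)
n, k = 12, 8
G = np.vstack([np.eye(k, dtype=int),                  # systematic part guarantees rank k
               rng.integers(0, 2, size=(n - k, k))])  # random binary parity rows
print("estimated average execution time:", avg_job_time(G))
```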

Slow and Stale Gradients Can Win the Race

Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous methods can alleviate stragglers, but cause gradient staleness that can adversely affect the convergence error. In this work, we present a novel theoretical characterization of the speedup offered by asynchronous methods by analyzing the trade-off between the error in the trained model and the actual training runtime (wallclock time).
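
A toy illustration of the trade-off on a quadratic objective, assuming a fixed gradient staleness and i.i.d. shifted-exponential worker times; the paper's characterization covers general objectives and realistic runtime models, so treat this purely as a sketch of the two quantities being traded off.

```python
import numpy as np

rng = np.random.default_rng(6)
dim, workers, steps, lr = 10, 4, 200, 0.05
x_star = rng.normal(size=dim)

def grad(x):
    return x - x_star                              # gradient of 0.5 * ||x - x_star||^2

# Error side: asynchronous updates apply gradients computed on an iterate that
# is (workers - 1) steps stale.
history, x = [np.zeros(dim)], np.zeros(dim)
for t in range(steps):
    stale_x = history[max(0, t - (workers - 1))]   # parameters the gradient was computed at
    x = x - lr * (grad(stale_x) + 0.1 * rng.normal(size=dim))
    history.append(x)
print("async final error:", np.linalg.norm(x - x_star))

# Runtime side: synchronous SGD waits for the slowest worker every step, while
# an asynchronous step proceeds as soon as some worker returns (a simplification).
worker_times = 1.0 + rng.exponential(1.0, size=(steps, workers))
print("sync  wallclock:", worker_times.max(axis=1).sum())
print("async wallclock:", worker_times.min(axis=1).sum())
```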