Optimizing Distributed Training for Large and Noisy Data
College:
The Dorothy and George Hennings College of Science, Mathematics, and Technology
Major:
Computer Science
Faculty Research Advisor(s):
Yulia Kumar
Abstract:
The study explores the transformative impact of deep learning across application domains, underscoring the challenges posed by the scale of modern datasets and the complexity of neural networks (NNs). With datasets such as ImageNet-1K expanding to 180 GB and NNs encompassing billions of parameters, the computational demands of training large models are rapidly outpacing the growth predicted by Moore's law, driving a paradigm shift toward distributed training across multiple machines. A crucial aspect of this research is handling noise in training data, including noise deliberately introduced through augmentation techniques such as Gaussian noise injection and random cutout. The study focuses on the challenge of designing robust aggregation methods that can tolerate Byzantine faults and cope with nonlinear data augmentation. The researchers are investigating advanced quantization and compression techniques, including vector quantization, post-training quantization, low-rank approximation, quantization-aware training, and mixed-precision training, to improve the computational efficiency and potentially the accuracy of training large-scale NNs in distributed environments.
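
To illustrate the kind of noise the abstract refers to, the sketch below implements Gaussian noise injection and random cutout with NumPy. The function names, the default sigma and patch size, and the assumption that pixel values are scaled to [0, 1] are illustrative choices, not details taken from the study.

import numpy as np

def add_gaussian_noise(image: np.ndarray, sigma: float = 0.1, rng=None) -> np.ndarray:
    # Inject zero-mean Gaussian noise; sigma is an illustrative default.
    rng = rng or np.random.default_rng()
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)  # assumes pixels already scaled to [0, 1]

def random_cutout(image: np.ndarray, patch: int = 16, rng=None) -> np.ndarray:
    # Zero out a randomly placed square patch (random cutout augmentation).
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    y = rng.integers(0, max(1, h - patch))
    x = rng.integers(0, max(1, w - patch))
    out = image.copy()
    out[y:y + patch, x:x + patch, ...] = 0.0
    return out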
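
One widely used family of Byzantine-robust aggregation rules replaces the plain average of worker gradients with a coordinate-wise median, so that a minority of arbitrarily corrupted updates cannot dominate the result. The minimal sketch below assumes each worker sends a flattened gradient vector; it is a generic example of the technique, not the specific aggregation method developed in this research.

import numpy as np

def coordinate_wise_median(worker_grads: list[np.ndarray]) -> np.ndarray:
    # Stack per-worker gradient vectors into shape (num_workers, num_params)
    # and take the median over workers for each coordinate; unlike the mean,
    # the median cannot be pulled arbitrarily far by a few Byzantine updates.
    stacked = np.stack(worker_grads, axis=0)
    return np.median(stacked, axis=0)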
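
Of the precision-reduction techniques mentioned, mixed-precision training is among the most common in distributed settings: forward and backward passes run largely in float16 while master weights stay in float32, with loss scaling to prevent gradient underflow. The sketch below uses PyTorch's torch.cuda.amp utilities and assumes a CUDA-capable device; it is a generic illustration rather than the configuration used in the study.

import torch

def mixed_precision_step(model, optimizer, loss_fn, inputs, targets, scaler):
    # Forward pass under autocast: eligible ops run in float16, the rest in float32.
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    # Scale the loss so small float16 gradients do not underflow; the scaler
    # unscales them before the float32 optimizer update.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

# Usage: create scaler = torch.cuda.amp.GradScaler() once before the training loop.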