Seminars and Events
Distributed ML System for Large-scale Models: Dynamic Distributed Training and Scalable Federated Learning
Event Details
In modern AI, large-scale deep learning models have emerged as the core technology behind many important Internet businesses, such as Search/ADs/Recommendation System/CV/NLP. BERT, Vision Transformer, GPT-3, and Switch Transformer models scale up the model size to a billion or even trillion number of parameters, showing non-trivial accuracy improvement for nearly all learning tasks. Distributed training using cloud clusters is key to the successful training of such large-scale models in a timely manner. Developing more advanced distributed training systems and algorithms can either reduce the energy cost or enable us to train even larger models. Furthermore, it is also essential to develop disruptive learning paradigms like federated learning, which can not only protect the privacy of users but also distribute the burden of handling unprecedented big data and models. This talk will mainly focus on distributed ML systems for large-scale models: dynamic distributed training for the cloud cluster (https://DistML.ai) and scale federated learning for the edge devices (https://FedML.ai).
In the first part, I will introduce PipeTransformer, an automated elastic pipelining for distributed training of Transformer models (BERT and ViT). In PipeTransformer, we design an adaptive on the fly freeze algorithm that can identify and freeze some layers gradually during training, and an elastic pipelining system that can dynamically reduce GPU resources to train the remaining active layers, and also forks more pipelines on released GPU resources to enlarge the width of data parallelism. In the second part, I will talk about scalable federated learning towards training large-scale models on resource-constrained edge devices and FedML Ecosystem, which aims at ubiquitously distributed training at the edge for diverse AI applications such as CV NLP, GraphNN, and IoT.
Registration and Other Information
We will send you email announcements with details of the upcoming speakers.
Register in advance for this webinar: https://usc.zoom.us/webinar/register/WN__0VhakI6Q6i3JsasdmNWcA
After registering, you will receive an email confirmation containing information about joining the Zoom webinar.AI Seminar Series
The recording for this AI Seminar talk will be posted on our USC/ISI YouTube page within 1-2 business days: