Hi everyone,
We're super excited to host
Deepak Narayanan for
this week's MLSys Seminar (October 30th) at 10:30 am PT.
The talk details are as follows:
Bio:
Deepak is a Senior Applied Deep Learning Research Scientist in the ADLR
group at NVIDIA, where he builds software systems to more efficiently train and serve LLMs.
He received his Ph.D. in Computer Science from Stanford in September 2021, where he was advised by Prof. Matei Zaharia.
Title:
Training Large Language Models at Scale
Abstract:
Training LLMs efficiently is challenging for a few reasons: training can
require yottaFLOPs of compute, and accelerators have limited memory capacity, making it impossible to fit large models even on a multi-GPU server. Consequently, new model-parallelism methods such as tensor and pipeline parallelism have been proposed. Unfortunately,
naïve usage of these methods leads to scaling issues at thousands of GPUs. In this talk, I describe various systems innovations incorporated into Megatron-LM (https://github.com/nvidia/megatron-lm)
that allow us to run training iterations for models with up to a trillion parameters on thousands of GPUs.
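For anyone curious about the tensor parallelism mentioned in the abstract before the talk, here is a minimal single-process sketch of the idea using NumPy arrays to stand in for per-GPU shards. Names like `num_shards` are illustrative, not Megatron-LM's actual API.

```python
import numpy as np

# Tensor (intra-layer) parallelism sketch: split one linear layer's weight
# matrix column-wise across "devices", compute partial outputs on each
# shard, then concatenate (an all-gather on real hardware).

rng = np.random.default_rng(0)
num_shards = 4                      # pretend we have 4 GPUs
x = rng.standard_normal((8, 16))    # activations: batch x hidden
W = rng.standard_normal((16, 32))   # weight matrix of one linear layer

# Each shard holds a slice of W's output columns and computes its
# partial output independently of the other shards.
shards = np.split(W, num_shards, axis=1)
y_parallel = np.concatenate([x @ w for w in shards], axis=1)

# The sharded computation matches the unsharded matmul.
assert np.allclose(y_parallel, x @ W)
```

Pipeline parallelism is the complementary idea: instead of splitting within a layer, consecutive groups of layers are placed on different GPUs and microbatches flow through them in sequence.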
See everyone there!!
Best,
Simran