Meeting #149: [YSU AI Lab; Friday 15:30] GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity


Karen Hambardzumyan

Mar 9, 2023, 04:57
To Machine Learning Reading Group Yerevan
This Friday at 3:30 pm, Arto Maranjyan from YerevaNN will present his joint work with Mher Safaryan and Peter Richtarik, “GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity”. This is a theoretical work that develops a new local training method, GradSkip, which matches the state-of-the-art communication complexity of ProxSkip (the previous SotA) while reducing computational complexity.


Abstract: In this work, we study distributed optimization algorithms that reduce the high communication costs of synchronization by allowing clients to perform multiple local gradient steps in each communication round. Recently, Mishchenko et al. (2022) proposed a new type of local method, called ProxSkip, that enjoys an accelerated communication complexity without any data similarity condition. However, their method requires all clients to call local gradient oracles with the same frequency. Because of statistical heterogeneity, we argue that clients with well-conditioned local problems should compute their local gradients less frequently than clients with ill-conditioned local problems. Our first contribution is the extension of the original ProxSkip method to the setup where clients are allowed to perform a different number of local gradient steps in each communication round. We prove that our modified method, GradSkip, still converges linearly, has the same accelerated communication complexity, and the required frequency for local gradient computations is proportional to the local condition number. Next, we generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and considering a generic proximable regularizer. This generalization, GradSkip+, recovers several related methods in the literature. Finally, we present an empirical study to confirm our theoretical claims.
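
For those who want a feel for the mechanism before the talk, below is a minimal toy sketch (not the authors' code) of the ProxSkip baseline the abstract builds on: clients take local gradient steps corrected by control variates and only synchronize (average) with probability p per step. The quadratic problem, step size, and probability values are illustrative assumptions; GradSkip's per-client rule for skipping gradient computations, and GradSkip+'s compression operators, are what the paper itself develops.

# Toy ProxSkip-style consensus loop on heterogeneous quadratics (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 10, 5

# Local objectives f_i(x) = 0.5 * x^T A_i x - b_i^T x with different conditioning.
A = [np.diag(rng.uniform(1.0, 10.0, dim)) for _ in range(n_clients)]
b = [rng.normal(size=dim) for _ in range(n_clients)]

def grad(i, x):
    return A[i] @ x - b[i]

# Minimizer of sum_i f_i, used only to measure progress.
x_star = np.linalg.solve(sum(A), sum(b))

gamma = 0.1                        # step size ~ 1/L (L = 10 here)
p = 0.2                            # probability of communicating at a given step
x = np.zeros((n_clients, dim))     # local iterates
h = np.zeros((n_clients, dim))     # control variates correcting for heterogeneity

comms = 0
for t in range(2000):
    # Local gradient step, shifted by the control variate.
    x_hat = np.stack([x[i] - gamma * (grad(i, x[i]) - h[i]) for i in range(n_clients)])
    if rng.random() < p:
        # Communication round: averaging plays the role of the consensus prox.
        x_new = np.tile(x_hat.mean(axis=0), (n_clients, 1))
        comms += 1
    else:
        # Skip communication, keep purely local iterates.
        x_new = x_hat
    h = h + (p / gamma) * (x_new - x_hat)   # control-variate update
    x = x_new

print("communication rounds:", comms)
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - x_star))

In this baseline every client computes its gradient at every step; the talk's point is that clients with well-conditioned local problems can be allowed to skip many of those computations without hurting the communication complexity.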

Best,
Karen