
Efficient “curriculum-based” training: Theoretical modeling through synthetic testbeds


Hongyang Ryan Zhang

Apr 14, 2025, 11:17:18 AM
to NEU Machine Learning Announcements
Hi All,

I created this mailing list to circulate talks, seminars, events, job posts, etc. related to statistical or theoretical machine learning at Northeastern (mainly the Boston campus).
  • I will be posting every once in a while about talks that are relevant to folks in this area.
  • Currently the list is mainly Khoury-based; we're missing several colleagues in COE (I'll work on that over time) and possibly other parts of campus.
Below is a talk tomorrow by Abhishek Panigrahi (a fifth-year PhD student at Princeton), on April 15 at 2pm, in person at 177-2206 or on Zoom: https://northeastern.zoom.us/j/91092322522. Please feel free to share this talk with your students, and let me know if they'd like to attend in person and visit 177.

Thanks,
Hongyang

-----------------------------------------------------------------------------------------------------------------------------------------
Title: Efficient “curriculum-based” training: Theoretical modeling through synthetic testbeds


Abstract: In the current age of deep learning, more compute typically means better performance. However, alternative strategies have emerged for training smaller models more efficiently by introducing structured supervision during training. In this talk, I’ll explore how synthetic testbeds help uncover the effectiveness of such methods and reveal the role of curriculum in accelerating learning.

I will present two recent works. The first investigates progressive distillation, where student models learn not only from a final teacher checkpoint but also from its intermediate checkpoints. Using sparse parity as a testbed, we identify an implicit curriculum available only through these intermediate checkpoints—leading to both empirical speedup and provable sample complexity gains. We extend the underlying curriculum ideas to pre-training transformers on real-world datasets (Wikipedia and Books), where intermediate checkpoints are found to progressively capture longer-range context dependencies.
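
For readers who want something concrete, below is a minimal sketch of progressive distillation on sparse parity. The MLP architecture, the checkpoint schedule, and all hyperparameters are illustrative assumptions, not the configuration from the paper; the point is only the mechanic of distilling from a sequence of intermediate teacher checkpoints rather than from the final one alone.

```python
# Minimal, illustrative sketch of progressive distillation on sparse parity.
# Architectures, schedules, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sparse_parity_batch(n=50, k=4, batch=256):
    # Inputs are random +/-1 bits; the label is the parity of the first k coordinates.
    x = torch.randint(0, 2, (batch, n)).float() * 2 - 1
    y = (x[:, :k].prod(dim=1) > 0).long()  # 1 if an even number of -1s among the first k bits
    return x, y

def mlp(n=50, width=512):
    return nn.Sequential(nn.Linear(n, width), nn.ReLU(), nn.Linear(width, 2))

# Train a teacher and save intermediate checkpoints (intermediate -> final).
teacher = mlp()
teacher_opt = torch.optim.SGD(teacher.parameters(), lr=0.1)
checkpoints = []
for step in range(3000):
    x, y = sparse_parity_batch()
    loss = F.cross_entropy(teacher(x), y)
    teacher_opt.zero_grad(); loss.backward(); teacher_opt.step()
    if step % 1000 == 999:
        checkpoints.append([p.detach().clone() for p in teacher.parameters()])

# Progressive distillation: the student matches each checkpoint's soft labels in turn,
# rather than distilling only from the final teacher.
student = mlp(width=128)
student_opt = torch.optim.SGD(student.parameters(), lr=0.1)
for ckpt in checkpoints:
    with torch.no_grad():
        for p, saved in zip(teacher.parameters(), ckpt):
            p.copy_(saved)
    for _ in range(1000):
        x, _ = sparse_parity_batch()
        with torch.no_grad():
            target = F.softmax(teacher(x), dim=-1)
        log_probs = F.log_softmax(student(x), dim=-1)
        loss = F.kl_div(log_probs, target, reduction="batchmean")
        student_opt.zero_grad(); loss.backward(); student_opt.step()
```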

The second part focuses on context-enhanced learning, a gradient-based analog of in-context learning (ICL) where models are trained with extra contextual information provided in-context but removed at evaluation, with no gradient computations on this extra information.  In a multi-step reasoning task, we prove that context-enhanced learning can be exponentially more sample-efficient than standard training, provided the model is ICL-capable. We also experimentally demonstrate that it appears hard to detect or recover learning materials that were used in the context during training. This may have implications for data security as well as copyright.
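
As a rough illustration of the training/evaluation asymmetry described above, here is a minimal sketch: extra context tokens are present in the prompt during training but excluded from the loss, and are dropped entirely at evaluation. The tiny causal LM and the toy secret-shift task are assumptions chosen for brevity, not the multi-step reasoning setup studied in the paper.

```python
# Minimal, illustrative sketch of context-enhanced learning mechanics: "hint" tokens are
# visible in the prompt during training but contribute no loss, and are removed at
# evaluation. The model and the toy task below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, CTX_LEN, MAX_LEN = 32, 64, 4, 32

class TinyCausalLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok = nn.Embedding(VOCAB, DIM)
        self.pos = nn.Embedding(MAX_LEN, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, dim_feedforward=128, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, tokens):
        T = tokens.size(1)
        h = self.tok(tokens) + self.pos(torch.arange(T))
        causal = nn.Transformer.generate_square_subsequent_mask(T)
        return self.head(self.blocks(h, mask=causal))

def toy_batch(batch=64, q_len=6, secret=7):
    # Toy task: the answer is the question shifted by a secret key; during training the
    # key is spelled out as extra context, at evaluation it is withheld.
    question = torch.randint(0, VOCAB, (batch, q_len))
    answer = (question + secret) % VOCAB
    context = torch.full((batch, CTX_LEN), secret)
    return context, question, answer

model = TinyCausalLM()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)

for _ in range(500):
    context, question, answer = toy_batch()
    tokens = torch.cat([context, question, answer], dim=1)
    logits = model(tokens[:, :-1])
    targets = tokens[:, 1:].clone()
    # Supervise only the answer positions: the context shapes the forward pass,
    # but no loss terms (and hence no direct supervision) are computed on it.
    targets[:, : CTX_LEN + question.size(1) - 1] = -100
    loss = F.cross_entropy(logits.reshape(-1, VOCAB), targets.reshape(-1), ignore_index=-100)
    opt.zero_grad(); loss.backward(); opt.step()

# Evaluation: the context is dropped entirely; the model must have internalized the key.
with torch.no_grad():
    _, question, answer = toy_batch(batch=256)
    eval_tokens = torch.cat([question, answer], dim=1)
    preds = model(eval_tokens[:, :-1]).argmax(dim=-1)
    acc = (preds[:, question.size(1) - 1:] == answer).float().mean()  # accuracy on answer positions
    print(f"eval accuracy without context: {acc:.2f}")
```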

References

  1. Progressive distillation induces an implicit curriculum. ICLR’25 (Oral). Abhishek Panigrahi*, Bingbin Liu*, Sadhika Malladi, Andrej Risteski, Surbhi Goel

  2. On the Power of Context-Enhanced Learning in LLMs. In submission. Xingyu Zhu*, Abhishek Panigrahi*, Sanjeev Arora


Location: 177-2206 (Please send me a note if you don't have access to 177 and would like to be added to the visitor's list).

Schedule: 2pm - 3pm, April 15, 2025