Hello all,
Hope you are all safe and healthy. The Priberam Machine Learning Seminars will continue to take place remotely via Zoom on Tuesdays at 1 p.m.
Next Tuesday, April 28th, Gonçalo Correia, an IST / IT / DeepSPIN Ph.D. student, will present his work on "Adaptively Sparse Transformers" at 13:00 (Zoom link: https://zoom.us/j/94750567384 )
You can register for this event and keep track of future seminars at the link below:
https://www.eventbrite.pt/e/adaptively-sparse-transformers-tickets-103238900330
Best regards,
Zita Marinho,
Priberam Labs
http://labs.priberam.com/
Priberam is hiring!
If you are interested in working with us please consult the available positions at priberam.com/careers.
PRIBERAM SEMINARS -- Zoom 94750567384
__________________________________________________
Priberam Machine Learning Lunch Seminar
Speaker: Gonçalo Correia (IT / IST)
Venue: https://zoom.us/j/94750567384
Date: Tuesday, April 28th, 2020
Time: 13:00
Title:
Adaptively Sparse Transformers
Abstract:
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with α-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the α parameter -- which controls the shape and sparsity of α-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.
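To illustrate the contrast the abstract draws between dense softmax and a sparse alternative, here is a minimal NumPy sketch. It is not the speaker's implementation and does not learn α: it only compares softmax (the α = 1 case of α-entmax) with sparsemax (the α = 2 case) on a made-up score vector. The function names and the example scores are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    """Dense attention weights: every entry is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Sparse attention weights (alpha-entmax with alpha = 2).

    Projects the scores onto the probability simplex, so low-scoring
    entries can receive exactly zero weight.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # which sorted entries stay non-zero
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

# Hypothetical attention scores for four context words.
scores = np.array([1.0, 0.8, 0.1, -1.0])
dense = softmax(scores)
sparse = sparsemax(scores)
print("softmax  :", np.round(dense, 3))
print("sparsemax:", np.round(sparse, 3))
```

With these scores, softmax gives all four words some weight, while sparsemax assigns weight only to the two highest-scoring words and exactly zero to the rest. Both outputs sum to one.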
Short Bio:
Gonçalo is a Ph.D. student in the DeepSPIN Project, supervised by André Martins. He previously completed an MSc in Artificial Intelligence at the University of Edinburgh. Gonçalo's main interests in machine learning include generative models, attention mechanisms, and interpretable latent variable modeling.
More info: http://labs.priberam.com/Academia-Partnerships/Seminars.aspx
Eventbrite:
https://www.eventbrite.pt/e/adaptively-sparse-transformers-tickets-103238900330