Priberam Machine Learning Lunch Seminars (T11) - 5 - "Adaptively Sparse Transformers", Gonçalo Correia (IST/IT/DeepSpin)


Zita Marinho

Apr 22, 2020, 9:56:42 AM
to priberam_...@googlegroups.com, si...@omni.isr.ist.utl.pt, isr-...@isr.tecnico.ulisboa.pt

Hello all,


We hope you are all safe and healthy. The Priberam Machine Learning Seminars will continue to take place remotely via Zoom on Tuesdays at 1 p.m.


Next Tuesday, April 28th, Gonçalo Correia, an IST / IT / DeepSPIN Ph.D. student, will present his work on "Adaptively Sparse Transformers" at 13:00 (Zoom link: https://zoom.us/j/94750567384 ).


You can register for this event and keep track of future seminars at the link below:

https://www.eventbrite.pt/e/adaptively-sparse-transformers-tickets-103238900330

Food will not be provided, but feel free to eat at the same time :) Please note that the seminar is limited to 100 people and works on a first-come, first-served basis, so please try to be on time if you wish to attend.

Best regards,
Zita Marinho,

Priberam Labs
http://labs.priberam.com/


Priberam is hiring!

If you are interested in working with us, please consult the available positions at priberam.com/careers.


PRIBERAM SEMINARS  --  Zoom 94750567384
__________________________________________________

Priberam Machine Learning Lunch Seminar
Speaker:  Gonçalo Correia (IT / IST)
Venue: https://zoom.us/j/94750567384
Date: Tuesday, April 28th, 2020
Time: 13:00 

Title:

Adaptively Sparse Transformers


Abstract:

Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word relationships. However, with standard softmax attention, all attention heads are dense, assigning a non-zero weight to all context words. In this work, we introduce the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns. This sparsity is accomplished by replacing softmax with α-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight. Moreover, we derive a method to automatically learn the α parameter -- which controls the shape and sparsity of α-entmax -- allowing attention heads to choose between focused or spread-out behavior. Our adaptively sparse Transformer improves interpretability and head diversity when compared to softmax Transformers on machine translation datasets. Our quantitative and qualitative analysis shows that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers. Furthermore, at no cost in accuracy, sparsity in attention heads helps to uncover different head specializations.
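
For the curious, below is a minimal, illustrative NumPy sketch of how α-entmax can be computed by bisection on a threshold tau, with p_i = [(α-1) z_i - tau]_+^(1/(α-1)) normalized to sum to 1. The function name and bisection bounds here are our own choices for illustration; this is a toy version, not the authors' optimized implementation (see the official DeepSPIN "entmax" package for that).

import numpy as np

def entmax_bisect_np(z, alpha=1.5, n_iter=50):
    # Toy alpha-entmax: p_i = [(alpha-1)*z_i - tau]_+ ** (1/(alpha-1)),
    # with tau chosen by bisection so the probabilities sum to 1.
    # alpha -> 1 recovers softmax; alpha = 2 gives sparsemax.
    z = (alpha - 1.0) * np.asarray(z, dtype=float)
    lo, hi = z.max() - 1.0, z.max()  # tau is bracketed in this interval
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        p = np.clip(z - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
        if p.sum() < 1.0:
            hi = tau  # thresholded too much; search for a smaller tau
        else:
            lo = tau  # thresholded too little; search for a larger tau
    p = np.clip(z - tau, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])
print(entmax_bisect_np(scores, alpha=1.5))    # low-scoring entries get exactly zero weight
print(np.exp(scores) / np.exp(scores).sum())  # softmax keeps every weight strictly positive

Replacing softmax with such a mapping inside each attention head is what lets individual heads drop irrelevant context words entirely, which is the basis of the interpretability and head-diversity results described above.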

Short Bio:

Gonçalo is a Ph.D. student in the DeepSPIN Project, supervised by André Martins. He previously completed his MSc in Artificial Intelligence at the University of Edinburgh. Gonçalo's main interests in machine learning include generative models, attention mechanisms, and interpretable latent variable modeling.

More info: http://labs.priberam.com/Academia-Partnerships/Seminars.aspx


Eventbrite:

https://www.eventbrite.pt/e/adaptively-sparse-transformers-tickets-103238900330

