My group is holding a series of invited talks over the 2021 winter semester, given by current rising stars in Information Retrieval (IR) and Knowledge Organization (KO). The seminar website is at https://wing-nus.github.io/ir-seminar/. The final talk will be held tomorrow, 29 Dec, 10:00-11:00, and features Arman Cohan from AI2 discussing work on generalizing NLP beyond the sentence boundary. Please join us at the Zoom address below if you’re interested. The talk and slides may be made available on the website, but please do join us to avoid disappointment.
The talk is open to all, so feel free to circulate this to others you think might be interested.
WING-NUS IR-KO Seminar 2021 - Talk 5
Title: Beyond Sentences and Paragraphs: Towards Document-level and Multi-document Understanding
Date/Time: 29 Dec 2021, Wednesday, 10:00 AM to 11:00 AM
Venue: Join Zoom Meeting: http://bit.ly/knmnyn-zoom-nus
Zoom Room ID: 770 447 8736, PIN: 3244
Chaired by: A/P Min-Yen Kan, School of Computing
In this talk, I will describe a few of our recent works on developing Transformer-based models that target document-level and multi-document natural language tasks. I will first introduce Specter, a method for producing document representations using a Transformer model that incorporates document-level relatedness signals. I will then discuss Longformer, an efficient Transformer model that can process and contextualize information across inputs of several thousands of tokens. This is achieved by replacing the full self-attention mechanism in Transformers with sparse local and global attention patterns. I will then discuss two of our efforts in developing general language models for multi-document tasks. CDLM is an encoder-only model for multi-document tasks that uses multiple related documents during pretraining and pretrains a dynamic global attention for multi-document tasks. I will then briefly discuss our recent work on PRIMER, a general pre-trained model for multi-document summarization tasks. Finally, I will discuss some of our other efforts on creating challenging document-level benchmarks.
Arman Cohan is a Research Scientist at the Allen Institute for AI (AI2) and an Affiliate Assistant Professor at the University of Washington. His research focuses on developing natural language processing (NLP) models for document-level and multi-document understanding, natural language generation and summarization, as well as information discovery and filtering. He is also interested in applications of NLP in science and health domains. His research has been recognized with multiple awards, including a best paper award at EMNLP 2017, an honorable mention at COLING 2018, and the Harold N. Glassman Distinguished Doctoral Dissertation award in 2019.