Neural Network Architectures for Efficient and Robust NLP
NLP has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing “who” did “what” to “whom,” has seen a nearly 40% reduction in error over the past ten years, bringing it to useful accuracy. As a result, hordes of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically optimized for neither cross-domain robustness nor computational efficiency.
In this talk I will present two new methods to facilitate fast, accurate and robust NLP. First, I will describe Iterated Dilated Convolutional Neural Networks (ID-CNNs, EMNLP 2017), a faster alternative to bidirectional LSTMs for sequence labeling that, compared with traditional CNNs, has greater capacity for large context and structured prediction. Unlike LSTMs, whose sequential processing of a sentence of length N requires O(N) time even with GPU parallelism, ID-CNNs permit fixed-depth convolutions to run in parallel across entire documents. They embody a distinct combination of network structure, parameter sharing and training procedures that enables dramatic 14-20x test-time speedups while retaining accuracy comparable to the Bi-LSTM-CRF.
Second, I will present Linguistically-Informed Self-Attention (LISA, EMNLP 2018 Best Long Paper), a neural network model that combines multi-head self-attention with multi-task learning across dependency parsing, part-of-speech tagging, predicate detection and SRL. Unlike previous models, which require significant pre-processing to prepare syntactic features, LISA can incorporate syntax using only raw tokens as input, encoding the sequence once to simultaneously perform parsing, predicate detection and role labeling for all predicates. Syntax is incorporated through the attention mechanism by training one of the attention heads to attend to each token's syntactic parent.
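The idea of a syntactically-informed attention head can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not LISA's implementation: it scores token pairs with plain scaled dot-product attention (the model's actual scoring function may differ), the function names `syntactic_head`, `parent_loss` and `attend` are hypothetical, the convention that the root token points to itself is our own choice, and the 5-token parse is made up. Row i of the head's attention matrix is read as a distribution over candidate parents of token i; a cross-entropy loss on the gold parent index trains that head to behave like a dependency parser, while its output still flows onward like any other head's.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def syntactic_head(H, W_q, W_k):
    """Attention weights for one head over token states H (n x d).

    Row i is read as a distribution over candidate syntactic parents of token i.
    Assumption: plain scaled dot-product scores; LISA's actual scorer may differ.
    """
    Q = H @ W_q  # (n, d_k)
    K = H @ W_k  # (n, d_k)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (n, n)
    return softmax(scores, axis=-1)


def parent_loss(A, parents):
    """Mean negative log-likelihood of each token's gold parent under A.

    `parents[i]` is the index of token i's head; by our (assumed) convention
    the root token points to itself.
    """
    n = A.shape[0]
    return -np.mean(np.log(A[np.arange(n), parents] + 1e-12))


def attend(A, H, W_v):
    """Head output: each token's value-weighted mix of its (soft) parents."""
    return A @ (H @ W_v)


# Toy usage with random weights and a hypothetical 5-token parse.
rng = np.random.default_rng(0)
n, d, d_k = 5, 8, 4
H = rng.normal(size=(n, d))          # stand-in for encoder token states
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) for _ in range(3))
parents = np.array([1, 1, 1, 4, 1])  # token 1 is the root (points to itself)

A = syntactic_head(H, W_q, W_k)
loss = parent_loss(A, parents)       # one term of a multi-task objective
out = attend(A, H, W_v)              # consumed like any other head's output
print(A.shape, out.shape, round(float(loss), 3))
```

Because the parent loss is simply one more term in the training objective, a single encoder pass serves both parsing and the downstream tagging layers, which is what lets the model work from raw tokens without a separate syntactic pre-processing step.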
We show that incorporating linguistic structure in this way leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain, where LISA obtains a nearly 10% reduction in error while also providing speed advantages.
Emma Strubell is a final-year PhD candidate in the College of Information and Computer Sciences at UMass Amherst, advised by Andrew McCallum. Her research aims to provide fast, accurate, and robust natural language processing to the diverse academic and industrial investigators eager to draw insight and decision support from massive text data in many domains. Toward this end she works at the intersection of natural language understanding, machine learning, and deep learning methods cognizant of modern tensor processing hardware. She has applied her methods to scientific knowledge bases in collaboration with the Chan Zuckerberg Initiative, and to advanced materials synthesis in collaboration with faculty at MIT. Emma has interned as a research scientist at Amazon and Google and received the IBM PhD Fellowship Award. She is also an active advocate for women in computer science, serving as leader of the UMass CS Women’s group, where she co-organized cross-cultural peer mentoring, conference travel grants for women, and technical workshops, and won grants to support them. Her research has been recognized with best paper awards at ACL 2015 and EMNLP 2018.
Refreshments will be served at 4:00 in the 5th floor kitchen area.