(How) should we use domain knowledge in the era of deep learning? (A perspective from speech processing)
Deep neural networks are the new default machine learning approach in many domains, such as computer vision, speech processing, and natural language processing. Given sufficient data for a target task, end-to-end models can be learned with fairly simple, almost universal algorithms. Such models learn their own internal representations, which in many cases appear to be similar to human-engineered ones. This may lead us to wonder whether domain-specific techniques or domain knowledge are needed at all.
This talk will provide a perspective on these issues from the domain of speech processing. It will describe two lines of work that attempt to take advantage of domain knowledge without compromising the benefits of deep learning: (1) hierarchical multitask learning and (2) cross-modal representation learning. The main application will be speech recognition, but the techniques discussed are general.
Karen Livescu is an Associate Professor at TTI-Chicago. She completed her PhD and postdoc in electrical engineering and computer science at MIT and her Bachelor's degree in physics at Princeton University. Her main research interests are at the intersection of speech and language processing and machine learning. Her recent work includes multi-view representation learning, acoustic word embeddings, visually grounded speech modeling, and automatic sign language recognition. Her recent professional activities include serving as a member of the IEEE Spoken Language Technical Committee, an associate editor for IEEE Transactions on Audio, Speech, and Language Processing, a technical co-chair of ASRU 2015 and 2017, and a program co-chair of ICLR 2019.