Hi all, our next talk will be on Wednesday, 10/14, at 4pm in CEPSR 7LE4. We're hosting Tasha Nagamine, who will present her recent work on how DNNs represent different phonemes when used for speech recognition. The talk title and abstract follow; please forward to anyone interested. Thanks!
Exploring How Deep Neural Networks Form Phonemic Categories
Tasha Nagamine, Columbia University
10/14/15, 4pm, CEPSR 7LE4
Deep neural networks (DNNs) have become the dominant technique for acoustic-phonetic modeling due to their markedly improved performance over other models. Despite this, little is understood about the computation they implement in creating phonemic categories from highly variable acoustic signals. In this work, we analyzed a DNN trained for phoneme recognition and characterized its representational properties, both at the single-node and population level in each layer. At the single-node level, we found strong selectivity to distinct phonetic features in all layers. Node selectivity to specific manners and places of articulation appeared from the first hidden layer and became more explicit in deeper layers. Furthermore, we found that nodes with similar phonetic feature selectivity were differentially activated to different exemplars of these features. Thus, each node becomes tuned to a particular acoustic manifestation of the same feature, providing an effective representational basis for the formation of invariant phonemic categories. This study reveals that phonetic features organize the activations in different layers of a DNN, a result that mirrors the recent findings of feature encoding in the human auditory system. These insights may provide a better understanding of the limitations of current models, leading to new strategies to improve their performance.