Speaker: Caio Corro
Speaker bio: https://caio-corro.fr
Title: Learning a neural parser in a low-resource scenario with a structured latent variable model
Abstract: Discrete structures such as dependency trees are often used to inject prior linguistic knowledge into statistical models. Many systems are built on top of a pipeline that first predicts a linguistic structure (e.g., a syntactic or semantic representation) using a parser and then makes a task-specific prediction relying on this predicted structure (e.g., choosing a polarity label in sentiment analysis). Unfortunately, most parsers rely on large amounts of manually annotated training data, which is available only for a small fraction of languages and domains.
Therefore, it is appealing to rely on other forms of supervision to learn the parameters of the parser. On the one hand, raw text data is available in many languages. It can be used for semi-supervised learning to complement a small set of available annotated data. On the other hand, even when annotated data is not available, assuming structured representations of sentences can be beneficial, as it provides an inductive bias about the structure of the language. In this case, we want to induce task-specific structured representations of language in such a way as to benefit a given downstream task. In other words, an inductive bias is injected into the model (i.e. structures are good for natural languages), but no assumption is made about what the structures should contain: the parser is trained end-to-end while optimizing performance on the downstream task.
In practice, structures induced in this way tend not to resemble any accepted syntactic or semantic formalism, since the model is free to induce whichever structure is best suited to the particular downstream task.
In this talk, I will explain how both problems can be cast as learning the parameters of a statistical model with structured latent variables. During training, exact inference in these models requires marginalizing over the latent variables, which is intractable (e.g. summing over all dependency trees for a given sentence). Recently, differentiable Monte-Carlo estimation (i.e. the reparametrization trick) has been explored for training statistical models parametrized with neural networks. We follow this line of work and introduce a differentiable relaxation, which we use to approximate samples and compute gradients with respect to the parser parameters. Our method (Differentiable Perturb-and-Parse) relies on differentiable dynamic programming over stochastically perturbed arc weights. We show the effectiveness of our approach on several tasks and datasets.
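To give a rough feel for the idea of relaxing a sample drawn from perturbed arc weights, here is a minimal sketch in plain Python. It is not the method presented in the talk: the differentiable dynamic program is replaced by an independent softmax over candidate heads for each word, which does not guarantee a well-formed tree. The use of Gumbel noise, the `temperature` parameter, and the function names `sample_gumbel` and `relaxed_head_sample` are all assumptions made for illustration.

```python
import math
import random


def sample_gumbel(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse transform."""
    u = rng.random()
    # Clamp away from 0 and 1 so both logarithms stay finite.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def relaxed_head_sample(arc_weights, temperature=1.0, rng=None):
    """Return soft head assignments from stochastically perturbed arc weights.

    arc_weights[h][m] is the score of the arc from head h to modifier m,
    with index 0 playing the role of an artificial root (an assumption).
    The result soft[h][m] is a distribution over heads h for each word m >= 1.

    Simplification: each word picks its head independently through a softmax,
    standing in for the differentiable dynamic program; the output is
    therefore not constrained to be a tree.
    """
    rng = rng or random.Random()
    n = len(arc_weights)
    soft = [[0.0] * n for _ in range(n)]
    for m in range(1, n):  # the root (index 0) has no head
        heads = [h for h in range(n) if h != m]  # no self-loops
        # Perturb each arc weight with Gumbel noise, then scale by temperature.
        scores = [(arc_weights[h][m] + sample_gumbel(rng)) / temperature
                  for h in heads]
        # Numerically stable softmax over the candidate heads.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        for h, e in zip(heads, exps):
            soft[h][m] = e / total
    return soft


if __name__ == "__main__":
    # Hypothetical arc scores for a two-word sentence plus root.
    weights = [
        [0.0, 2.0, 0.5],
        [0.0, 0.0, 1.5],
        [0.0, 1.0, 0.0],
    ]
    for row in relaxed_head_sample(weights, temperature=0.5,
                                   rng=random.Random(0)):
        print([round(p, 3) for p in row])
```

In this sketch, lowering `temperature` pushes each column of the output toward a one-hot choice of head, while higher values give smoother assignments. In an automatic-differentiation framework the same computation would let gradients flow back to the arc weights, which is the role the relaxation plays in the abstract above.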