Friday, February 7: Audi Primadhanty and Pranava Swaroop Madhyastha - Probabilistic Inference for Weakly-Supervised Entity-Relation Extraction - Learning Word Embeddings for Language Modelling

Xavier Lluís

Feb 3, 2014, 2:15:02 PM
to semina...@googlegroups.com

Hello!

We are pleased to announce that two TALP seminars are scheduled for next Friday, February 7, given by Audi Primadhanty and Pranava Swaroop Madhyastha, in room S208 of the Omega Building at UPC's Campus Nord. These seminars will be a rehearsal for the defense of their Research Plan and Thesis Proposal.

The seminars will start at 12:00.

Here are the details of the seminars:

Title: Probabilistic Inference for Weakly-Supervised Entity-Relation Extraction
and
Learning Word Embeddings for Language Modelling
Speakers: Audi Primadhanty and Pranava Swaroop Madhyastha
Venue: Omega-S208, Campus Nord - UPC
Date: February 7, 2014
Time: 12:00 to 14:00
Abstract: Probabilistic Inference for Weakly-Supervised Entity-Relation Extraction

We investigate the task of extracting entities and relations from text documents given only a few examples of the desired entities and relations. The task is relevant for information extraction in new, open domains where annotated corpora are scarce or expensive to obtain. We begin with the task of named entity classification by proposing a probabilistic generative model that uses hidden states. The purpose of the hidden states is to capture commonalities of the contexts in which entities of different types appear. Our hope is that this model will be more robust when it comes to recognizing unseen entities.

Our aim is to further extend such techniques to extract relations in any domain, for specific target entities and relations in a large unlabeled corpus, requiring only a few examples for each entity and relation type.

Abstract: Learning Word Embeddings for Language Modelling

In Natural Language Processing, state-of-the-art systems for tasks such as parsing, semantic role labeling, and word-sense disambiguation make use of lexical features. Most of these systems are trained on annotated corpora, which are used to gather statistics about each lexical item and its linguistic relations. However, even in large annotated corpora, it is unlikely that each lexical item will be observed in the context of all its possible relations. In this setting, one would like to exploit a notion of word similarity and assume that similar words behave similarly.

The focus of this thesis proposal is to formulate statistical models that improve performance on linguistic prediction tasks by making use of distributional word space representations. In particular, we are interested in designing computationally efficient and robust learning algorithms for lexical embeddings that combine supervised training with unsupervised training on a large text corpus to induce a distributional representation. We present preliminary experiments that assess the usefulness of the proposed approach and serve as a proof of concept.

As a reminder, all the information about upcoming and past seminars (including slides) is available on the TALP seminars website.

See you soon!

Pranava and Xavi

Xavier Lluís

Feb 6, 2014, 6:53:50 AM
to semina...@googlegroups.com

Hello!

This is a reminder that two TALP seminars are scheduled for next Friday, February 7, given by Audi Primadhanty and Pranava Swaroop Madhyastha, in room S208 of the Omega Building at UPC's Campus Nord. These seminars will be a rehearsal for the defense of their Research Plan and Thesis Proposal.

The seminars will start at 12:00.

Here are the details of the seminars:

Title: Probabilistic Inference for Weakly-Supervised Entity-Relation Extraction
and
Learning Word Embeddings for Language Modelling
Speakers: Audi Primadhanty and Pranava Swaroop Madhyastha
Venue: Omega-S208, Campus Nord - UPC
Date: February 7, 2014
Time: 12:00 to 14:00
