| Title |
Query expansion for mixed-script information
retrieval |
| Speaker |
Parth Gupta
|
| Venue |
Omega-S208 Campus Nord - UPC |
| Date |
Tuesday, 8 September 2015 |
| Time |
12:30 - Presentation |
| Abstract |
For many languages that use non-Roman indigenous scripts
(e.g., Arabic, Greek and Indic languages), one can often find a
large amount of user-generated transliterated content on the Web
in the Roman script. Such content creates a monolingual or
multilingual space with more than one script, which is referred
to as the Mixed-Script space. IR in the mixed-script space is
challenging because queries written in either the native or the
Roman script need to be matched to documents written in both
scripts. Moreover, transliterated content features extensive
spelling variations. The talk will introduce the concept of
Mixed-Script IR and, through an analysis of Bing search engine
query logs, estimate the prevalence of this problem and thereby
establish its importance. The talk will also cover a principled,
deep-learning-based solution to the term modelling challenge, in
which Mixed-Script terms are modelled jointly through a deep
autoencoder.
|
| Bio |
Parth Gupta (http://www.dsic.upv.es/~pgupta/)
is a PhD student at the Technical University of Valencia (UPV),
Spain, and a researcher at the PRHLT Research Center. His research
area is at the intersection of Information Retrieval, Natural
Language Processing and Machine Learning. Most recently, he has
been working on deep learning architectures that learn abstract
representations of text and terms across languages and scripts.
The deep-autoencoder-based approach proposed in his PhD work won
the Transliterated Search shared task at FIRE 2013, organised by
Microsoft Research. Currently, he is an Applied Scientist Intern
at Microsoft Bing, working with the query formulation team in
London. He has published in reputed conferences and journals
such as SIGIR, COLING, ECIR, Knowledge-Based Systems and
Neurocomputing. In the past, he has
worked at FBK Research Center (Trento, Italy), the Institute for
Infocomm Research (Singapore) and Microsoft Research (India) as
a research intern. He is also associated with the open-source
search engine library Xapian, where he contributed the
learning-to-rank module as a Google Summer of Code student (GSoC
2011) and later served as a mentor (GSoC 2012, 2014). |