KBA participants:
The KBA Organizers are very pleased to announce that we will have four
excellent talks at TREC on Thursday 11/20 10:45 a.m.-12:30 p.m. in the
Green Auditorium at NIST.
Because this is a specialized audience, the speakers will focus on the
core issues in the task and on their diverse ideas and approaches.
Check out the abstracts below!
Talk 1: 10:45 - 11:10
Title: Use of Time-Aware Language Model in Entity Driven Filtering System
Presenter Name: Vincent Bouvier
Coauthor: Patrice Bellot
Presenter Affiliation: Kware / LSIS
Abstract: Tracking entities, so that new or important information about
those entities is caught, is a real challenge and has many applications
(e.g., information monitoring, marketing, ...). We are interested in how
to represent an entity profile to fulfill two purposes: 1. entity
detection and disambiguation, 2. novelty and interestedness detection. We
propose to use two language models as part of entity profiles. The
Reference Language Model (RLM) is mainly used for disambiguation. We also
formalize a Time-Aware Language Model, which is mainly used for novelty
detection. To rank documents, we propose to use a semi-supervised
classification approach which uses meta-features computed on documents
using entity profiles as well as time series.
Talk 2: 11:10 - 11:35
Title: Streaming Document Filtering using Distributed Non-Parametric
Representations
Presenter Name: Ignacio Cano
Coauthors: Sameer Singh and Carlos Guestrin
Presenter Affiliation: University of Washington
Abstract: When dealing with large, streaming corpora of text documents,
practitioners are often interested in identifying references to entities
of interest, and studying their prominence and topics over time. Current
tools are quite restrictive for this setting; they are unable to handle
streaming data, do not partition the references according to topics, and
do not identify the vitality of the references.
In this work, we propose a system that uses a flexible representation of
entity contexts that are updated in a streaming fashion. Each entity
context is represented by topic clusters that are estimated in a
non-parametric manner by assuming that the context of each entity in a
single document belongs to a single topic. To address the lexical sparsity
and generalize to unseen documents, each document is represented by its
mean word embedding, while each topic cluster is represented by the mean
embedding vector of the documents in the cluster. Further, we associate a
staleness measure with each topic cluster, dynamically estimating the
relevance of each entity based on document frequencies. We update the
topic identities, number of topics, and the staleness of topics in an
online fashion, observing only a single document at a time.
This combination of non-parametric clustering, staleness, and distributed
word embeddings provides an efficient yet accurate representation of
entity contexts that can be updated in a streaming manner.
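
As a rough illustration of the ideas in this abstract (not the authors'
implementation), the sketch below represents each document by its mean
word embedding, keeps one running-mean centroid per topic cluster, and
processes one document at a time. Several parts are assumptions made for
the example: the cosine-similarity threshold that decides when to open a
new cluster stands in for the non-parametric estimation, staleness is
simplified to "time since the cluster last received a document" rather
than the frequency-based measure the abstract describes, and all class
names, function names, and toy vectors are hypothetical.

```python
import math


def mean_embedding(tokens, vectors):
    """Average the embeddings of the tokens that have a known vector."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no usable tokens in this document
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TopicCluster:
    """One topic: the running mean of its document embeddings."""

    def __init__(self, vec, t):
        self.centroid = list(vec)
        self.count = 1
        self.last_seen = t

    def add(self, vec, t):
        self.count += 1
        # Incremental update of the mean, so no documents are stored.
        self.centroid = [c + (x - c) / self.count
                         for c, x in zip(self.centroid, vec)]
        self.last_seen = t

    def staleness(self, now):
        # Assumption: staleness is the time since the last update.
        return now - self.last_seen


class EntityContext:
    """Topic clusters for one entity, updated one document at a time."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical similarity cutoff
        self.clusters = []

    def observe(self, vec, t):
        if vec is None:
            return None
        best, best_sim = None, -1.0
        for cluster in self.clusters:
            sim = cosine(cluster.centroid, vec)
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is None or best_sim < self.threshold:
            # Nothing is close enough: open a new topic cluster.
            best = TopicCluster(vec, t)
            self.clusters.append(best)
        else:
            best.add(vec, t)
        return best


# Toy 2-d embeddings, made up for the example.
vectors = {"election": [1.0, 0.0], "vote": [0.9, 0.1], "concert": [0.0, 1.0]}
ctx = EntityContext(threshold=0.8)
ctx.observe(mean_embedding(["election", "vote"], vectors), t=0)
ctx.observe(mean_embedding(["concert"], vectors), t=1)
ctx.observe(mean_embedding(["vote"], vectors), t=5)
print(len(ctx.clusters), [c.staleness(now=5) for c in ctx.clusters])
```

The number of clusters is not fixed in advance: it grows whenever a
document is too dissimilar from every existing centroid, which is the
property the threshold is meant to mimic.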
From UW Task 3 submission: Our browser-based visualization demo provides
an easy-to-use interface that enables users to switch between multiple
entities of interest, select the time ranges to explore over, explore the
prominence of topics over time, and understand the topics using word
clouds.
Code Repository:
https://github.com/sameersingh/er-visualizer
Talk 3: 11:35 - 12:00
Title: BUPT_PRIS at TREC 2014 Knowledge Base Acceleration Track
Presenter Name: Yuanyuan Qi
Coauthors: Ye Xu and Dongxu Zhang
Presenter Affiliation: Pattern Recognition and Intelligent System Lab.,
Beijing University of Posts and Telecommunications
Abstract: We treat the Vital Filtering (VF) task more as a classification
task than as an Information Retrieval task, so we use three different
methods to classify the vital documents: 1. Support Vector Machine (SVM);
2. Random Forest (RF); 3. K-Nearest Neighbor (KNN). We submitted the
results generated by all three methods.
For the Streaming Slot Filling task, our system fills slots by employing
a pattern learning and matching method. Using KBP training data, we
learned patterns for the slots that are the same as those in TAC-KBP, and
then ran a bootstrapping method for only a single iteration to recall
more patterns and to adapt the patterns to the KBA corpus. We generated
slot answers by matching the patterns and ranking the candidate answers by
scoring them with the patterns they matched. For slot types that KBP does
not contain, we manually picked training seeds to use with the
bootstrapping method.
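
The matching-and-ranking step described above might look roughly like
the sketch below. It is only an illustration: the slot (an employer-style
slot), the three regular-expression patterns, and their weights are all
hypothetical, and the pattern learning and bootstrapping steps are not
shown. Each candidate answer is scored by summing the weights of the
patterns that extracted it.

```python
import re
from collections import defaultdict


def build_patterns(entity):
    """Return (regex, weight) pairs for a hypothetical employer slot."""
    e = re.escape(entity)
    # A capitalized name, possibly spanning several capitalized words.
    name = r"(?P<answer>[A-Z][\w&]*(?: [A-Z][\w&]*)*)"
    return [
        (re.compile(e + r" works (?:for|at) " + name), 0.9),
        (re.compile(e + r", (?:an? )?\w+ (?:at|of) " + name), 0.6),
        (re.compile(e + r" joined " + name), 0.7),
    ]


def fill_slot(entity, sentences):
    """Rank candidate answers by the summed weight of matching patterns."""
    scores = defaultdict(float)
    for pattern, weight in build_patterns(entity):
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                scores[match.group("answer")] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up sentences for the example.
sentences = [
    "Jane Doe works for Acme Corp in Boston.",
    "Last year Jane Doe joined Acme Corp.",
    "Jane Doe, a director at Globex, spoke today.",
]
ranked = fill_slot("Jane Doe", sentences)
for answer, score in ranked:
    print(answer, round(score, 2))
```

Summing weights rewards answers supported by several independent
patterns, which is one simple way to read "scoring them with the
patterns they matched".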
Talk 4: 12:00 - 12:25
Title: MSR KMG at TREC 2014 KBA Track Vital Filtering Task
Presenter Name: Chin-Yew Lin
Coauthors: Jingtian Jiang, Chin-Yew Lin, and Yong Rui
Presenter Affiliation: Microsoft Research
Abstract: Our strategy for vital filtering is to first retrieve as many
relevant documents as possible and then apply classification and ranking
methods to differentiate vital documents from the non-vital documents. We
first index the corpus and retrieve documents containing a given entity
name. This is also our baseline method. We then proposed many features and
performed classification and ranking for entity-document pairs. We mainly
leveraged four kinds of features in our classification and ranking models:
1) action patterns: the entity name and its associated verb in the
sentence mentioning the entity, 2) local profile of the entity mention,
e.g., the title, profession, and address appearing in the document, 3)
temporal burst of entity mentions, and 4) time range of the documents,
where earlier documents get a higher score than later documents. Our
experimental results showed that the features were quite powerful, and our
system significantly outperformed the baseline.
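
Features 3) and 4) can be illustrated with a small sketch. This is not
the authors' feature definition: the use of integer hour buckets, the
z-score as a burst measure, and the linear "earliness" score are all
assumptions chosen for the example, and the function names are
hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def burst_scores(mention_hours):
    """Map each hour bucket to the z-score of its mention count.

    mention_hours holds one integer hour per entity mention (assumed).
    """
    if not mention_hours:
        return {}
    counts = Counter(mention_hours)
    lo, hi = min(counts), max(counts)
    # Include hours with zero mentions so quiet periods lower the mean.
    series = [counts.get(h, 0) for h in range(lo, hi + 1)]
    mu, sigma = mean(series), pstdev(series)
    return {h: ((counts.get(h, 0) - mu) / sigma if sigma else 0.0)
            for h in range(lo, hi + 1)}


def earliness(doc_hour, start_hour, end_hour):
    """1.0 for the earliest document in the range, 0.0 for the latest."""
    span = end_hour - start_hour
    return 1.0 - (doc_hour - start_hour) / span if span else 1.0


# Made-up mention timestamps: hour 2 carries most of the mentions.
scores = burst_scores([0, 1, 2, 2, 2, 2, 3])
peak_hour = max(scores, key=scores.get)
print(peak_hour, earliness(5, 0, 10))
```

Both values could then be fed, alongside the other features, into
whatever classification or ranking model is used for entity-document
pairs.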
--
KBA Organizers