KBAers,
Please read and respond.
We will hold open discussion on the SSF target slots until next Friday
April 19th. Please give us reactions and any suggested changes.
If you want to see another slot type added, speak up now.
Tomorrow morning, the query entities and target slots will be posted to
the Active Participants section of
trec.nist.gov
Look for this tarball:
trec-kba-ccr-and-ssf-2013-04-08.d41d8cd98f00b204e9800998ecf8427e.tar.gz
Here is the README from that tarball.
Regards,
The KBA Organizers
KBA 2013
========
The list of KBA 2013 entities and target slots to fill is enclosed in
this tarball. The format is:
http://trec-kba.org/schemas/v1.1/filter-topics.json
Entities for both CCR and SSF
-----------------------------
Entities come from both Wikipedia and Twitter.
Since Twitter does not offer a name-expansion API, it is acceptable to
manually examine the Twitter profile page to identify alternate names
for these entities. This is still considered "run_type": "automatic",
because a human entering this entity as a query could easily be asked
to examine the Twitter profile page (and no other texts).
The training time range (TTR) runs through the end of February 2012.
Some entities have no vital examples in the TTR.
For example, some entities only appear in the corpus when they die in
the second half of 2012.
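
As a rough illustration of the TTR boundary, the sketch below tests whether
a document timestamp falls inside the training range. It assumes timestamps
are epoch seconds interpreted in UTC, which this README does not state; the
names TTR_END and in_training_range are hypothetical.

```python
from datetime import datetime, timezone

# First instant after the training time range (TTR), which runs through the
# end of February 2012. Treating the boundary as UTC is an assumption.
TTR_END = datetime(2012, 3, 1, tzinfo=timezone.utc)


def in_training_range(epoch_seconds):
    """Return True if the timestamp falls before the end of the TTR."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment < TTR_END
```

A system would typically tune only on documents for which this returns True
and treat everything later as evaluation data.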
You will notice that the proportion of judged documents increases
significantly some time after the TTR, and yet, the number of vitals
does not. This is because the spinn3r portion of the corpus has more
re-visits to certain URLs, so we enabled the assessors to assign
ratings to these pages in bulk. We also filtered out many of these
re-visits.
To optimize assessor resources in the first assessment period (March
2013), we did not judge every document for four of the larger new
entities, nor for any of the entities from KBA 2012. If time permits
during the June/July assessor window, we will conduct more vital
judging on pooled assertions from run submissions for all entities.
Streaming Slot Filling
----------------------
Each entity is one of these three types:
Facility (FAC)
Person (PER)
Organization (ORG)
The entity_type determines the target slots to fill:
type PER: Affiliate, Contact_Meet_PlaceTime, AwardsWon, DateOfDeath, CauseOfDeath, Titles, FounderOf, EmployeeOf
type FAC: Affiliate, Contact_Meet_Entity
type ORG: Affiliate, TopMembers, FoundedBy
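
The type-to-slot mapping above could be held in a small lookup table, as in
this minimal sketch. The slot names are copied from the list above; the
names TARGET_SLOTS and slots_for are hypothetical and not part of any KBA
tooling.

```python
# Slot inventory copied from the README list; the structure is illustrative.
TARGET_SLOTS = {
    "PER": ["Affiliate", "Contact_Meet_PlaceTime", "AwardsWon", "DateOfDeath",
            "CauseOfDeath", "Titles", "FounderOf", "EmployeeOf"],
    "FAC": ["Affiliate", "Contact_Meet_Entity"],
    "ORG": ["Affiliate", "TopMembers", "FoundedBy"],
}


def slots_for(entity_type):
    """Return a copy of the target slots for an entity_type (PER, FAC, ORG)."""
    if entity_type not in TARGET_SLOTS:
        raise ValueError("unknown entity_type: %r" % (entity_type,))
    return list(TARGET_SLOTS[entity_type])
```
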
Three of these slots require special explanation (below). The other
values are directly from TAC-KBP or ACE definitions:
http://www.nist.gov/tac/2012/KBP/task_guidelines/TAC_KBP_Slots_V2.4.pdf
http://projects.ldc.upenn.edu/ace/docs/English-Events-Guidelines_v5.4.3.pdf
Since this is *streaming* slot filling, we are only interested in new
slot values that were not substantiated earlier in the streamcorpus,
which ranges from October 2011 to February 2013. In examining the
Vital-rated documents identified by assessors, we observed that many
of the interesting events in the lives of these people and
organizations did not easily fit existing slots in KBP or ACE.
Rather than invent specific new slots, we propose these three
generalized slot classes.
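
To make the "new slot values only" requirement concrete, here is a minimal
sketch of a novelty filter over a chronologically ordered stream of
candidate fills. The function name new_slot_fills and the tuple layout are
hypothetical, and the normalization (collapsing whitespace, lowercasing) is
an assumption rather than a KBA rule; a real system would likely need
stronger equivalence matching.

```python
def new_slot_fills(stream):
    """Yield only the first occurrence of each (target_id, slot, value).

    `stream` is an iterable of (target_id, slot_name, slot_value) tuples in
    stream order. Values are compared after collapsing whitespace and
    lowercasing, which is an illustrative assumption.
    """
    seen = set()
    for target_id, slot_name, slot_value in stream:
        key = (target_id, slot_name, " ".join(slot_value.split()).lower())
        if key not in seen:
            seen.add(key)
            yield target_id, slot_name, slot_value
```

Because the function is a generator, it can run over an unbounded stream
while keeping only the set of fills already seen in memory.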
1) "Affiliate" is any type of relation that *directly* connects the
target entity to another entity of any type. This is the union of all
close relations, such as StudentOf, EmployeeOf, MemberOf, and their
inverses from ACE and KBP, and similar relations in which the relation
is of a simple unambiguous type. We propose to use this generalized
notion of "close" relation as the base class for all the PER-PER and
PER-FAC and PER-ORG relations. Instead of enumerating all possible
such close relations, we intend to allow KBA systems to generate many
such examples, which assessors will then judge in the post-hoc pooled
assessment scheduled for June 17-July 17 2013.
While this is somewhat in the spirit of open IE, our aim is
specifically to enable upstream filtering systems to down-select the
stream for feeding KB population systems operating with fixed
inventories of slot types, which will typically be more specific than
Affiliate.
Since little annotation has been performed for "Affiliate", we are
concerned that it could generate a lot of noise. We welcome feedback
and ideas on how to refine this.
Example 1: "Matthew DeLorenzo and Josiah Vega, both 14 years old and
students at Elysian Charter School, were honored Friday morning by
C-SPAN and received $1,500 as well as an iPod Touch after winning a
nationwide video contest."
target_id:
http://en.wikipedia.org/wiki/Elysian_Charter_School
Affiliate: "Matthew DeLorenzo"
Affiliate: "Josiah Vega"
NOT an Affiliate: "C-SPAN"
NOT an Affiliate: "iPod Touch"
Example 2: "Allen Curtis, administrator, Citrus Health and Rehab, and
Inverness Mayor Bob Plaisted, standing from left in rear, joined the
Joyner family in celebrating the 104th birthday of Edith Joyner on
Wednesday at Citrus Health and Rehab."
target_id:
https://twitter.com/bobplaisted
Affiliate: "Allen Curtis"
Affiliate: "the Joyner family"
Affiliate: "Edith Joyner"
NOT an Affiliate: "Citrus Health and Rehab"
That is, merely being present at a place once does not substantiate an
Affiliation. Affiliation is closer than Meeting(place-time).
Example 3: "Veteran songwriters and performers Ben Mason, Jeff
Severson and Jeff Smith will perform on Saturday, April 14 at 7:30 pm
at Creative Cauldron at ArtSpace, 410 S. Maple Avenue."
target_id:
http://en.wikipedia.org/wiki/Jeff_Severson
Affiliate: "Ben Mason"
Affiliate: "Jeff Severson"
Affiliate: "Jeff Smith"
NOT an Affiliate: "Creative Cauldron"
NOT an Affiliate: "ArtSpace"
Example 4: "Driftwagon has teamed up with Peruvian designer
Dunkelvolk."
target_id:
http://en.wikipedia.org/wiki/Dunkelvolk
Affiliate: "Driftwagon"
2) Contact_Meet_Entity is a catch-all slot for Facilities. It is a
superset that includes any event in which one or more entities (of any
type) are present at the target facility.
Example: "The Senior Wellness Coalition of Fargo-Moorhead will host a
wellness seminar from 1 to 3 p.m. March 28 at the Hjemkomst Center,
202 1st Ave."
target_id:
http://en.wikipedia.org/wiki/Hjemkomst_Center
Contact_Meet_Entity: "The Senior Wellness Coalition of Fargo-Moorhead"
3) Contact_Meet_PlaceTime is a catch-all slot for Persons. It is a
superset that includes any event in which the target entity is present at
a particular place at a particular time. For SSF, we want short
passages that a human or downstream algorithm can digest to generate
structured location/date-time values.
The place or time might not be specified in the text. When this happens,
just emit the best available string.
Example: "Lt. Gov. Drew Wrigley and Robert Wefald, a retired North Dakota
district judge and former state attorney general, unveiled the crest Friday
during a ceremony at the North Dakota Capitol."
target_id:
http://en.wikipedia.org/wiki/Hjemkomst_Center
Contact_Meet_PlaceTime: "Friday during a ceremony at the North Dakota Capitol"
Output Format
-------------
We have added an optional component to the run submission format
described here:
http://trec-kba.org/trec-kba-2013.shtml
Specifically:
ninth column: slot name from the TAC KBP slot ontology. Used in
SSF. Runs for CCR should use 'NULL' in this field. This field must
have a string from the list below. Optionally, this field may contain
a second string separated from the first by a colon ":", where the
second string is a system-selected name for a sub-type or variant of
the target slot. This will not be used in scoring and is provided
solely for the purpose of allowing systems to output more information
about the algorithm's perspective on the slot. This field must not
contain any spaces.
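
The rules for the ninth column can be checked mechanically before
submitting a run. The sketch below is one possible validator, not an
official tool: it assumes the allowed slot names are exactly those listed
per entity type earlier in this README, and it rejects a trailing colon
with an empty sub-type, which the description above does not address. The
names VALID_SLOTS and check_slot_field are hypothetical.

```python
# Assumed inventory: the slot names listed per entity type in this README.
VALID_SLOTS = {
    "Affiliate", "Contact_Meet_PlaceTime", "Contact_Meet_Entity",
    "AwardsWon", "DateOfDeath", "CauseOfDeath", "Titles", "FounderOf",
    "EmployeeOf", "TopMembers", "FoundedBy",
}


def check_slot_field(field, task="SSF"):
    """Validate a ninth-column value; return (slot_name, variant_or_None)."""
    if any(ch.isspace() for ch in field):
        raise ValueError("slot field must not contain spaces")
    if task == "CCR":
        if field != "NULL":
            raise ValueError("CCR runs should use NULL in this field")
        return ("NULL", None)
    # Split on the first colon: slot name, then optional system-chosen variant.
    slot, sep, variant = field.partition(":")
    if slot not in VALID_SLOTS:
        raise ValueError("unknown slot name: %r" % (slot,))
    if sep and not variant:
        # Assumption: an empty variant after the colon is treated as an error.
        raise ValueError("empty variant after ':'")
    return (slot, variant or None)
```

For example, check_slot_field("Affiliate:StudentOf") returns the slot name
together with the system-selected variant, and only the slot name would
matter for scoring.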