Finding information about people on the World Wide Web is one of the most common activities of Internet users. Person names, however, are highly ambiguous: in most cases, the results of a person name search are a mix of pages about different people who share the same name. The user is then forced either to add terms to the query (probably losing recall and focusing on a single aspect of the person) or to browse every document in order to pick out the information about the person he/she is actually looking for. In an ideal system, the user would simply type a person name and receive search results clustered according to the different people sharing that name.
In 2007, the Web People Search Task (Artiles et al. 2007) was the first competitive evaluation focused on this problem. The 16 participating systems received a set of web pages for a person name and had to cluster them according to the different people (entities) they referred to. This second evaluation provides a new testbed corpus, improved evaluation metrics, and an additional attribute extraction subtask.
In the clustering task, systems receive as input a set of web search results obtained by querying an (ambiguous) person name. The expected output is a clustering of the web pages, where each cluster is assumed to contain all (and only those) pages that refer to the same individual.
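To make the expected input and output concrete, here is a minimal sketch of a naive baseline. It assumes each search result is given as a page id mapped to its plain text, and it returns a list of clusters of page ids. The technique is a swapped-in illustration, not a method prescribed by the organizers: single-link clustering over word-overlap (Jaccard) similarity, with a hypothetical `threshold` parameter. As a simplifying assumption, every page is placed in exactly one cluster.

```python
import re
from itertools import combinations


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a page (a deliberately crude representation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(pages: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Hypothetical baseline: link every pair of pages whose similarity reaches
    `threshold` and return the connected components (single-link clustering)."""
    parent = {pid: pid for pid in pages}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    toks = {pid: tokens(text) for pid, text in pages.items()}
    for p, q in combinations(pages, 2):
        if jaccard(toks[p], toks[q]) >= threshold:
            parent[find(p)] = find(q)  # merge the two components

    clusters: dict[str, set[str]] = {}
    for pid in pages:
        clusters.setdefault(find(pid), set()).add(pid)
    return list(clusters.values())


if __name__ == "__main__":
    # Made-up example pages for the ambiguous name "John Smith".
    pages = {
        "p1": "John Smith professor of chemistry at Example University",
        "p2": "Prof. John Smith chemistry lab Example University",
        "p3": "John Smith guitarist tour dates album",
    }
    print(cluster_pages(pages))
```

On the made-up pages, the two chemistry pages share most of their words and end up together, while the guitarist page forms its own cluster. A real system would need much richer features than bag-of-words overlap, since unrelated people with the same name often share generic vocabulary.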
The attribute extraction subtask consists of extracting 18 kinds of "attribute values" for the target individuals whose names appear on each of the provided web pages. The organizers will distribute the target web pages in their original format (i.e., HTML), and participating systems have to extract attribute values from each page.
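Because pages arrive as raw HTML, a system first has to recover the visible text and then locate attribute values in it. The sketch below illustrates only that pipeline shape, using Python's standard `html.parser` and `re` modules. The two attribute labels (`email` and `url`) and their regular expressions are hypothetical stand-ins: the actual 18 attribute kinds are defined in the task guidelines and are not reproduced here.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth of currently open script/style elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


# Hypothetical patterns for two illustrative attribute kinds (not the official list).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "url": re.compile(r"https?://[^\s<>\"]+"),
}


def extract_attributes(html: str) -> list[tuple[str, str]]:
    """Return unique (attribute, value) pairs found in the page's visible text."""
    text = html_to_text(html)
    found: list[tuple[str, str]] = []
    for attr, pattern in PATTERNS.items():
        for value in pattern.findall(text):
            if (attr, value) not in found:
                found.append((attr, value))
    return found
```

Note that this sketch does not check whether a value actually belongs to the target individual; in the subtask as described, values must be attributed to the person whose name appears on the page, which requires more than pattern matching.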
The clustering and attribute extraction tasks will be treated as two separate subtasks, so a team can choose to participate in either one or both of them. The organizers will provide annotated data for developing/training systems. In a second stage, an unannotated corpus will be distributed, system outputs will be collected, and evaluation results will be returned to the participants. Each team can submit up to five runs. Every team is expected to write a paper describing its system and discussing the evaluation results.
Please send an email expressing your interest to the task organizers (weps-or...@lsi.uned.es).
Updated information about the task can be found at the WePS web site (http://nlp.uned.es/weps).