SIGIR'07 Workshop PAN
Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection
-- http://www.aisearch.de/pan-07 --
In conjunction with the 30th Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval, Amsterdam, 23-27 July 2007.
--------------------------------------------------------------------------
ABOUT THIS WORKSHOP:
The workshop brings together experts and prospective researchers around the
exciting and future-oriented topic of plagiarism analysis, authorship
identification, and high-similarity search. The topic is receiving increasing
attention, in part because information about nearly any subject can be found
on the World Wide Web. At first sight, plagiarism, authorship, and
near-duplicates may pose very different challenges; however, they are closely
related in several technical respects.
Plagiarism analysis is a collective term for computer-based methods that
identify plagiarism offenses. For text documents we distinguish between
corpus-based and intrinsic analysis: the former compares suspicious documents
against a set of potential original documents; the latter identifies
potentially plagiarized passages by analyzing the suspicious document for
changes in writing style.
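To make the idea of intrinsic analysis concrete, here is a minimal Python
sketch. It is only an illustration under strong assumptions, not a method
proposed by the workshop: the single style feature (average word length), the
chunk size, the z-score threshold, and all function names are hypothetical
choices made for this example.

```python
import statistics


def chunk_words(text, size=50):
    """Split text into consecutive chunks of `size` words (last may be shorter)."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def avg_word_length(words):
    """A deliberately simple style feature: mean number of characters per word."""
    return sum(len(w) for w in words) / len(words)


def style_outliers(text, size=50, threshold=2.0):
    """Return indices of chunks whose feature value deviates from the document
    mean by more than `threshold` standard deviations."""
    chunks = chunk_words(text, size)
    if len(chunks) < 3:
        return []  # too little text to estimate a document-wide style
    feats = [avg_word_length(c) for c in chunks]
    mean = statistics.mean(feats)
    sd = statistics.pstdev(feats)
    if sd == 0:
        return []  # perfectly uniform style, nothing stands out
    return [i for i, f in enumerate(feats) if abs(f - mean) / sd > threshold]
```

A practical system would presumably combine many style features and a more
careful outlier model; the sketch only shows that no reference corpus is
needed, since the suspicious document is compared against itself.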
Authorship identification divides into so-called attribution and verification
problems. In the authorship attribution problem, one is given examples of the
writing of a number of authors and is asked to determine which of them
authored given anonymous texts. In the authorship verification problem, one is
given examples of the writing of a single author and is asked to determine
whether given texts were written by this author. Authorship verification and
intrinsic plagiarism analysis represent two sides of the same coin.
Near-duplicate detection is mainly a problem of the World Wide Web: duplicate
Web pages increase the index storage space of search engines, slow down result
serving, and decrease retrieval precision. Near-duplicate detection relates
directly to plagiarism analysis: at the document level, near-duplicate
detection and plagiarism analysis are likewise two sides of the same coin. For
plagiarism analysis at the paragraph level, the same specialized document
models (e.g. shingling, fingerprinting, hashing) can be applied, where a key
problem is the selection of useful chunks from a document.
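As a rough illustration of the document models mentioned above, the following
Python sketch builds word shingles, selects a small fingerprint by keeping the
shingles with the smallest hash values, and compares two texts with the
Jaccard coefficient. The shingle length, the fingerprint size, the use of
SHA-256, and all function names are assumptions made for this example; other
chunk selection strategies are equally possible.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word k-grams) of text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(shingle_set, keep=4):
    """Chunk selection: keep the `keep` smallest shingle hash values."""
    hashed = sorted(
        int(hashlib.sha256(" ".join(s).encode("utf-8")).hexdigest(), 16)
        for s in shingle_set
    )
    return set(hashed[:keep])


def resemblance(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Calling resemblance(shingles(d1), shingles(d2)) compares full shingle sets,
while resemblance(fingerprint(shingles(d1)), fingerprint(shingles(d2))) gives
a cheaper estimate from the selected chunks only; a value near 1 suggests
near-duplicates.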
The development of new solutions for the outlined problems may benefit from
the combination of existing technologies, and in this sense the workshop
provides a platform that spans different views and approaches. Contributions
are welcome on topics including, but not restricted to, the following:
- retrieval models for plagiarism analysis, authorship identification, and
  style analysis
- software plagiarism, cross-language plagiarism, plagiarism in Web
  communities and social networks
- NLP technologies for authorship identification and style analysis
- knowledge-based methods for plagiarism analysis and authorship
  identification
- handling proper citation
- methods for identifying near-duplicate and versioned documents (for all
  kinds of contents, including text, source code, image, and music documents)
- shingling, fingerprinting, and similarity hashing
- hash-based search, high-dimensional search, approximate nearest neighbor
  search
- efficiency issues and performance tradeoffs
- tailored indexes for plagiarism analysis and near-duplicate detection
- plagiarism analysis and near-duplicate detection on the Web
- evaluation, building of test collections, experimental design and user
  studies
IMPORTANT DATES:
Deadline for paper submission   May 27, 2007
Notification to authors         June 24, 2007
Camera-ready copy due           July 1, 2007
Workshop opens                  July 27, 2007
Contributions will be peer-reviewed by experts from the related field.
WORKSHOP ORGANIZATION:
Benno Stein, Bauhaus University Weimar
Moshe Koppel, Bar-Ilan University, Israel
Efstathios Stamatatos, University of the Aegean
Contact: pan...@aisearch.de
URL: http://www.aisearch.de/pan-07
PROGRAM COMMITTEE:
Shlomo Argamon, Illinois Institute of Technology
Yaniv Bernstein, Google Switzerland
Dennis Fetterly, Microsoft Research
Graeme Hirst, University of Toronto
Timothy Hoad, Microsoft
Heiko Holzheuer, Lycos Europe
Jussi Karlgren, Swedish Institute of Computer Science
Hans Kleine Buening, University of Paderborn
Moshe Koppel, Bar-Ilan University, Israel
Hermann Maurer, University of Technology Graz
Sven Meyer zu Eissen, Bauhaus University Weimar
Efstathios Stamatatos, University of the Aegean
Benno Stein, Bauhaus University Weimar
Ozlem Uzuner, State University of New York
Debora Weber-Wulff, University of Applied Sciences Berlin
Justin Zobel, RMIT University