
CFP: SIGIR'07 Workshop PAN. Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection

st...@upb.de

Apr 17, 2007, 5:15:36 AM

1st CALL FOR PAPERS

SIGIR'07 Workshop PAN

Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection

-- http://www.aisearch.de/pan-07 --

In conjunction with the 30th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, Amsterdam,
23-27 July 2007.

--------------------------------------------------------------------------

ABOUT THIS WORKSHOP:

The workshop will bring together experts and prospective researchers
around the exciting and future-oriented topic of plagiarism analysis,
authorship identification, and high similarity search. This topic is
receiving increasing attention, partly because information about nearly
any subject can be found on the World Wide Web. At first sight,
plagiarism, authorship, and near-duplicates may seem to pose very
different challenges; however, they are closely related in several
technical respects.

Plagiarism analysis is a collective term for computer-based methods to
identify a plagiarism offense. For text documents, we distinguish
between corpus-based and intrinsic analysis: the former compares
suspicious documents against a set of potential original documents; the
latter identifies potentially plagiarized passages by analyzing the
suspicious document itself for changes in writing style.

Authorship identification divides into so-called attribution and
verification problems. In the authorship attribution problem, one is
given examples of the writing of a number of authors and is asked to
determine which of them authored given anonymous texts. In the
authorship verification problem, one is given examples of the writing
of a single author and is asked to determine whether given texts were
written by this author. Authorship verification and intrinsic
plagiarism analysis are two sides of the same coin.

Near-duplicate detection is mainly a problem of the World Wide Web:
duplicate Web pages increase the index storage space of search engines,
slow down result serving, and decrease retrieval precision.
Near-duplicate detection relates directly to plagiarism analysis: at the
document level, near-duplicate detection and plagiarism analysis are
also two sides of the same coin. For plagiarism analysis at the
paragraph level, the same specialized document models (e.g. shingling,
fingerprinting, hashing) can be applied; here, a key problem is the
selection of useful chunks from a document.
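As a rough illustration of the document models named above, the
following Python sketch computes word-level shingles (contiguous
w-grams, assuming w = 4), the exact Jaccard resemblance of two shingle
sets, and a MinHash fingerprint that estimates that resemblance from
fixed-length signatures. The function names, the shingle length, the
number of hash functions, and the use of salted BLAKE2b as the hash
family are illustrative assumptions, not methods prescribed by the
workshop.

```python
import hashlib
import re


def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous word w-grams (shingles) of a text."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < w:
        # Texts shorter than w words yield one shorter shingle, or none.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: set, b: set) -> float:
    """Exact Jaccard coefficient |A & B| / |A | B| of two shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set: set, num_hashes: int = 64) -> list[int]:
    """For each salted hash function, keep the minimum hash over all shingles."""
    if not shingle_set:
        raise ValueError("cannot fingerprint an empty shingle set")
    signature = []
    for seed in range(num_hashes):
        # A distinct salt per position simulates a distinct hash function
        # (an assumption made for this sketch).
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(" ".join(s).encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


def estimated_resemblance(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of signature positions on which the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog"
    doc2 = "the quick brown fox jumps over the lazy cat"
    s1, s2 = shingles(doc1), shingles(doc2)
    # 5 of the 7 distinct shingles are shared; prints 0.714
    print(round(resemblance(s1, s2), 3))
    # MinHash estimate of the same quantity (varies with the hash family)
    print(estimated_resemblance(minhash_signature(s1), minhash_signature(s2)))
```

Comparing two signatures costs time proportional to `num_hashes`
regardless of document length, which is what makes fixed-length
fingerprints attractive for large collections; larger signatures give
a less noisy estimate of the exact resemblance.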

The development of new solutions for the outlined problems may benefit
from the combination of existing technologies, and in this sense the
workshop provides a platform that spans different views and approaches.
Contributions are welcome on topics from the outlined field including,
but not restricted to, the following:

- retrieval models for plagiarism analysis, authorship identification,
  and style analysis
- software plagiarism, cross-language plagiarism, plagiarism in Web
  communities and social networks
- NLP technologies for authorship identification and style analysis
- knowledge-based methods for plagiarism analysis and authorship
  identification
- handling proper citation

- methods for identifying near-duplicate and versioned documents (for
  all kinds of contents, including text, source code, image, and music
  documents)
- shingling, fingerprinting, and similarity hashing
- hash-based search, high-dimensional search, approximate nearest
  neighbor search
- efficiency issues and performance tradeoffs

- tailored indexes for plagiarism analysis and near-duplicate detection
- plagiarism analysis and near-duplicate detection on the Web
- evaluation, building of test collections, experimental design, and
  user studies

IMPORTANT DATES:

Deadline for paper submission:  May 27, 2007
Notification to authors:        June 24, 2007
Camera-ready copy due:          July 1, 2007
Workshop opens:                 July 27, 2007

Contributions will be peer-reviewed by experts in the field.

WORKSHOP ORGANIZATION:

Benno Stein, Bauhaus University Weimar
Moshe Koppel, Bar-Ilan University, Israel
Efstathios Stamatatos, University of the Aegean

Contact: pan...@aisearch.de
URL: http://www.aisearch.de/pan-07

PROGRAM COMMITTEE:

Shlomo Argamon, Illinois Institute of Technology
Yaniv Bernstein, Google Switzerland
Dennis Fetterly, Microsoft Research
Graeme Hirst, University of Toronto
Timothy Hoad, Microsoft
Heiko Holzheuer, Lycos Europe
Jussi Karlgren, Swedish Institute of Computer Science
Hans Kleine Buening, University of Paderborn
Moshe Koppel, Bar-Ilan University, Israel
Hermann Maurer, University of Technology Graz
Sven Meyer zu Eissen, Bauhaus University Weimar
Efstathios Stamatatos, University of the Aegean
Benno Stein, Bauhaus University Weimar
Ozlem Uzuner, State University of New York
Debora Weber-Wulff, University of Applied Sciences Berlin
Justin Zobel, RMIT University

[ comp.ai is moderated ... your article may take a while to appear. ]