[Apologies for multiple postings]
-------------------------------------------------------------------------
Having addressed at the FIRE evaluation forum the problem of
plagiarism detection in SOurce COde (SOCO) in 2014, the same problem
from a Cross-Language perspective (CL-SOCO) in 2015, and Personality
Recognition in SOCO (PR-SOCO) in 2016, this year we will address the
problem of Authorship Identification of SOurce COde (AI-SOCO).
Website:
https://sites.google.com/view/ai-soco-2020/
To be organized at FIRE 2020 (http://fire.irsi.res.in/fire/2020/home)
10 - 13 December
Virtual Conference
-------------------------------------------------------------------------------
Task Description:
General authorship identification is essential for detecting the
deceptive misuse of others' content and for exposing the owners of
anonymous hurtful content. Both are done by revealing the author of
that content. Authorship Identification of SOurce COde (AI-SOCO)
focuses on uncovering the author of a given piece of code. This helps
address cheating in academic, workplace, and open source
environments. It can also help identify the authors of malware
worldwide.
The dataset is composed of source codes collected from the open
submissions in the Codeforces online judge. Codeforces hosts
competitive programming contests, each consisting of multiple problems
to be solved by the participants. A participant solves a problem by
writing a solution in any of the programming languages available on
the website and submitting it through the website. The result of a
submission can be correct (accepted) or incorrect (wrong answer, time
limit exceeded, etc.).
For our dataset, we selected 1,000 users and collected 100 source
codes from each one, for a total of 100,000 source codes. All
collected source codes are correct and written in C++. For each user,
every collected source code solves a different problem.
Given this pre-defined set of source codes and their authors, task
participants should build systems that can identify the author of any
new, previously unseen source code written by one of the authors in
that set.
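
For illustration only, the task can be read as a closed-set
classification problem: every test source code belongs to one of the
known authors. The sketch below is not the organizers' baseline. It
assumes character 3-gram counts as features and cosine similarity
against one aggregated profile per author; the function names, the
author IDs, and the toy C++ snippets are all made up for the example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(code, n=3):
    """Count overlapping character n-grams in one source file."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples, n=3):
    """Build one aggregated n-gram profile per author.

    samples: iterable of (author_id, source_code) pairs.
    """
    profiles = {}
    for author, code in samples:
        profiles.setdefault(author, Counter()).update(char_ngrams(code, n))
    return profiles


def predict(profiles, code, n=3):
    """Return the known author whose profile is closest to the new code."""
    query = char_ngrams(code, n)
    return max(profiles, key=lambda author: cosine(profiles[author], query))


# Toy, made-up data: two hypothetical authors with different habits.
toy_train = [
    ("author_a", "int main(){int n;cin>>n;cout<<n*2;}"),
    ("author_a", "int main(){int x;cin>>x;cout<<x+1;}"),
    ("author_b", "int main() {\n    long long n;\n"
                 "    scanf(\"%lld\", &n);\n    printf(\"%lld\\n\", n);\n}"),
    ("author_b", "int main() {\n    long long k;\n"
                 "    scanf(\"%lld\", &k);\n    printf(\"%lld\\n\", k * k);\n}"),
]
profiles = train(toy_train)
print(predict(profiles, "int main(){int y;cin>>y;cout<<y-3;}"))
```

Character n-grams are used here because they capture layout and naming
habits without parsing C++. A real submission would likely need
stronger features and a trained classifier to separate 1,000 authors.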
Full task description can be found at:
https://sites.google.com/view/ai-soco-2020/
------------
Timeline
------------
8th June – Open track websites
8th June – Training and development data release
31st July – Test data release
7th September – Run submission deadline
20th September – Results declared
31st October – Working notes and overview papers due (tentative)
10th-13th December – FIRE 2020
----------------
Organizers
----------------
Ali Fadel, Jordan University of Science and Technology, Jordan
Husam Musleh, Jordan University of Science and Technology, Jordan
Ibraheem Tuffaha, Jordan University of Science and Technology, Jordan
Mahmoud Al-Ayyoub, Jordan University of Science and Technology, Jordan
Yaser Jararweh, Duquesne University, USA
Elhadj Benkhelifa, Staffordshire University, UK
Paolo Rosso, Universitat Politècnica de València, Spain
For regular updates subscribe to our mailing list:
ai-soc...@googlegroups.com
Regards,
The organizers of the PAN AI-SOCO shared task @ FIRE 2020