Code-sniffer: A Python package to detect plagiarism in submitted assignments


Dilawar Singh

Feb 2, 2014, 10:11:23 AM
to wncc...@googlegroups.com

This is a poor man's MOSS. It can detect most kinds of cheating done in assignments. It has some success in detecting copying in PDF files, provided that the Python library `pdfminer` is able to convert the PDF to a text file.
 
Pypi : https://pypi.python.org/pypi/code-sniffer

github: www.github.com/dilawar/sniffer

You need to download the zip file from Moodle and unzip it into a directory. Under that directory, each student has their own directory, and student directories can be nested. For example, if the path of my assignment is /path/to/A and I have three students X, Y, and Z, then the layout looks like this:

     /path/to/A
     |- X
     |   |- file1.vhd
     |   |- test
     |       |- testbench.vhd
     |- Y
     |   |- file1.vhd
     |- Z
         |- file1.vhd
         |- testbench.vhd
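
To make the expected layout concrete, here is a minimal sketch of how such a tree could be walked to group files by student. It is not taken from the sniffer source; the function name `collect_submissions` and the `extensions` parameter are hypothetical and used only for illustration.

```python
import os
from collections import defaultdict


def collect_submissions(assignment_dir, extensions=(".vhd",)):
    """Map each top-level student directory to the files found under it.

    Hypothetical helper, not part of sniffer. `extensions` must be a tuple.
    Paths are returned relative to the student's own directory.
    """
    submissions = defaultdict(list)
    for student in sorted(os.listdir(assignment_dir)):
        student_dir = os.path.join(assignment_dir, student)
        if not os.path.isdir(student_dir):
            continue  # ignore stray files next to the student directories
        # os.walk descends into nested directories such as X/test
        for root, _dirs, files in os.walk(student_dir):
            for name in files:
                if name.lower().endswith(extensions):
                    path = os.path.join(root, name)
                    submissions[student].append(os.path.relpath(path, student_dir))
        submissions[student].sort()
    return dict(submissions)
```

Calling `collect_submissions("/path/to/A")` on the layout above would give one entry each for X, Y, and Z, with nested files such as `test/testbench.vhd` listed under X.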

My experience is that copying in coding assignments runs as high as 30-45%, and once caught, if you cry enough in front of the instructor, they let you go. (The reason I was told: who knows, the student might be thinking of committing suicide, so it is not good to be too strict.) Moreover, unlike many other universities, they don't like to make a big deal out of it, no matter how loudly they speak against it in public. Perhaps they have breathed the same air that their students are breathing now.

I developed this tool during my TAship for the VLSI Lab course. Over time, some empirical parameters have been tuned, and the algorithm gives good enough results. A Cython fork is under development to speed up the matching. Its default configuration file is `~/.config/sniffer/config`. If you put the config file somewhere else, you can use the `--config config-file` option with this tool. In the end, it generates text files with various levels of severity of matching and a graphviz file that gives an overview of how much cheating has taken place in the class. One example is shown in the figure from Assignment 04 of the VLSI Lab class in 2011. Each edge is a copy case. Each node is a student. The thicker the edge, the larger the copy.
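
As a rough illustration of the graph described above (one node per student, one edge per copy case, thicker edges for larger copies), the following sketch writes Graphviz DOT text from a list of matches. The function name `to_dot`, the `(student_a, student_b, score)` tuple format, and the score range of 0 to 1 are assumptions for this example; the file sniffer actually writes may look different. `penwidth` is the standard Graphviz attribute for line thickness.

```python
def to_dot(matches, max_penwidth=8.0):
    """Render (student_a, student_b, score) triples as an undirected DOT graph.

    Hypothetical helper: `score` is assumed to lie in [0, 1] and is mapped
    linearly onto an edge thickness between 1.0 and `max_penwidth`.
    """
    lines = ["graph copying {"]
    # One node per student that appears in at least one copy case
    students = sorted({s for a, b, _ in matches for s in (a, b)})
    for s in students:
        lines.append(f'    "{s}";')
    # One edge per copy case; thicker edge means a larger copy
    for a, b, score in matches:
        width = 1.0 + (max_penwidth - 1.0) * score
        lines.append(f'    "{a}" -- "{b}" [penwidth={width:.1f}, label="{score:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved to a `.dot` file and rendered with the usual Graphviz tools, for example `dot -Tpng`.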

The application does not do anything terribly smart, because most of the copying is not terribly smart either. It relies on the fact that programming languages impose a structure, and it is not possible to copy code while breaking that structure. So far, this application has not reported any false positives, although it can under-report, which can be verified manually by looking at the `severity-moderate.csv` file.
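
The post does not spell out the matching algorithm, so the following is not sniffer's method. It is only a small sketch of the general idea that language structure survives cosmetic edits: it strips VHDL `--` comments, lowercases the text (VHDL is case-insensitive), splits it into tokens, and compares the token sequences with the standard library's `difflib.SequenceMatcher`. The names `tokens` and `similarity` are made up for this example, and a `--` inside a string literal would be wrongly treated as a comment.

```python
import difflib
import re

# Identifiers, numbers, or any single non-space character (operators, punctuation)
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(source):
    """Drop VHDL line comments, lowercase, and split the code into tokens."""
    out = []
    for line in source.splitlines():
        code = line.split("--", 1)[0]  # simplification: ignores string literals
        out.extend(TOKEN.findall(code.lower()))
    return out


def similarity(a, b):
    """Return a ratio in [0, 1]; 1.0 means the token sequences are identical."""
    matcher = difflib.SequenceMatcher(None, tokens(a), tokens(b), autojunk=False)
    return matcher.ratio()
```

Changes to whitespace, letter case, and comments leave the score untouched, which is why this kind of cosmetic disguise does not hide a copy. Renaming identifiers would lower the score in this sketch; a real tool would need more than this.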

See the github page for more detail on how to use the application and for reporting bugs.

- Dilawar

Guna Prasaad

Feb 2, 2014, 10:14:29 AM
to wncc...@googlegroups.com

Potential disaster out here. It's interesting!!

Guna Prasaad
Third Year Undergraduate,
Computer Science & Engineering,
IIT Bombay

--
--
The website for the club is http://wncc-iitb.org/
To post to this group, send email to wncc...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "Web and Coding Club IIT Bombay" group.
To unsubscribe from this group and stop receiving emails from it, send an email to wncc_iitb+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.