text-match submissions

by Angus Wallace

Does anyone have suggestions for how to text-match student submissions against each other (e.g., as Turnitin does) to detect cheating?


Cheers, Angus

In reply to Angus Wallace

Re: text-match submissions

by Richard Lobb

We just use Moodle's Download responses option to get a .xlsx spreadsheet of all the submissions to a quiz, then convert this to a .csv (exporting directly to .csv from Moodle isn't safe: there are bugs in PHP's CSV encoder). A Python script then reads the .csv with the csv module and does an n^2 comparison of submissions, i.e. every submission is compared against every other one.
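A minimal sketch of that pipeline, assuming the spreadsheet has already been saved as CSV. The helper names `read_responses` and `flag_pairs` are hypothetical, and the column headers are parameters because the real headers in a Moodle responses export depend on the quiz.

```python
import csv
from itertools import combinations


def read_responses(lines, id_column, response_column):
    """Return (student, response) pairs from CSV text.

    `lines` can be an open file or any iterable of CSV lines. Blank
    responses are skipped so that unanswered questions don't all "match".
    """
    pairs = []
    for row in csv.DictReader(lines):
        response = row[response_column] or ""  # None if the row is short
        if response.strip():
            pairs.append((row[id_column], response))
    return pairs


def flag_pairs(responses, same):
    """Compare every submission with every other one (n*(n-1)/2 checks)
    and return the student pairs that the comparator `same` flags."""
    return [
        (name_a, name_b)
        for (name_a, a), (name_b, b) in combinations(responses, 2)
        if same(a, b)
    ]
```

When reading from a file, open it with `newline=""` as the csv module documentation recommends, so that responses containing line breaks are parsed correctly. Passing the comparator in as a function keeps the pairwise loop the same whichever comparison you use.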

Our three standard comparators:

  1. A simple equality comparison catches the really dumb cheats, who just submit their friend's code without modification.
  2. A comparator that collapses white space, deletes comments and blank lines, and replaces all words except keywords with a simple generic identifier like 'X' before doing an equality comparison. This catches people who try to tweak a friend's code (renaming variables, reformatting, adding comments) to avoid detection.
  3. A comparator that first strips comments and blank lines, then records a set of metrics (such as the counts of each of the most common keywords and the number of lines of code at each indentation level), and finally computes the Euclidean distance between the two programs in this high-dimensional metric space. This is a remarkably good detector of even quite sophisticated cheating, at least in Python.
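The three comparators might look something like the following minimal sketch, assuming Python submissions. The comment stripping is deliberately naive (a `#` inside a string literal is treated as the start of a comment), and since the exact metrics aren't listed above, the keyword list and indentation cap used for comparator (3) are illustrative assumptions.

```python
import keyword
import math
import re
from collections import Counter

WORD = re.compile(r"[A-Za-z_]\w*")

# Illustrative (assumed) choices for comparator (3); tune for your course.
COMMON_KEYWORDS = ("def", "return", "if", "elif", "else", "for", "while",
                   "in", "and", "or", "not", "import", "try", "except")
MAX_DEPTH = 6  # indentation levels at or beyond this are counted together


def code_lines(code):
    """Strip '#' comments (naively) and drop blank lines."""
    lines = []
    for line in code.splitlines():
        line = line.split("#", 1)[0].rstrip()
        if line.strip():
            lines.append(line)
    return lines


def identical(a, b):
    """Comparator (1): plain equality."""
    return a == b


def normalise(code):
    """Collapse white space and replace every non-keyword word with 'X'."""
    out = []
    for line in code_lines(code):
        line = " ".join(line.split())  # also removes indentation
        line = WORD.sub(
            lambda m: m.group() if keyword.iskeyword(m.group()) else "X", line
        )
        out.append(line)
    return "\n".join(out)


def same_after_normalising(a, b):
    """Comparator (2): equality after normalisation."""
    return normalise(a) == normalise(b)


def metrics(code, indent_width=4):
    """Keyword counts, lines per indentation level and total line count."""
    lines = code_lines(code)
    words = Counter(w for line in lines for w in WORD.findall(line))
    vector = [float(words[k]) for k in COMMON_KEYWORDS]
    per_depth = [0.0] * MAX_DEPTH
    for line in lines:
        expanded = line.expandtabs(indent_width)
        indent = len(expanded) - len(expanded.lstrip(" "))
        per_depth[min(indent // indent_width, MAX_DEPTH - 1)] += 1
    vector.extend(per_depth)
    vector.append(float(len(lines)))
    return vector


def metric_distance(a, b):
    """Comparator (3): Euclidean distance between metric vectors.
    A small distance is only a prompt for careful manual checking."""
    return math.dist(metrics(a), metrics(b))
```

For example, `def f(x):` plus an indented `return x + 1` and `def g(y):` plus an indented `return y   +   1  # increment` differ under (1) but both normalise to `def X(X):` followed by `return X + 1`, and their metric vectors are identical.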

All cheat detection is inherently unpleasant. Where possible I like to stick to (1), as for all but trivial programs the case is totally unarguable. Comparators (2) and (3) increase the risk of false positives, and (3) in particular requires careful manual checking.

Richard

In reply to Richard Lobb

Re: text-match submissions

by Bart Siniarski

Hi Richard, 

Your approach sounds very interesting. Would you be able to share the Python script with me? I would appreciate it a lot!

Thank you, 

Bart

In reply to Bart Siniarski

Re: text-match submissions

by Angus Wallace

Thanks Richard and Bart for the responses,

In the end I followed Richard's suggestion for downloading the data, and then used Moss (http://theory.stanford.edu/~aiken/moss/) to do the matching. I used a simple (and ugly!) Python routine to process the data:


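import mosspy

# questionFilenames (built earlier) maps each question number to a list
# of files, one per student response to that question.
userid = 1  # placeholder: use the Moss user ID emailed to you on registration
m = mosspy.Moss(userid, "matlab")
for question in range(11, 16):  # questions 11 to 15
    for submission in questionFilenames[question]:
        m.addFile(submission)

url = m.send()  # Submission Report URL
print(url)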
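The routine above assumes `questionFilenames` already maps each question number to a list of files, since Moss compares files rather than spreadsheet cells. One hypothetical way to build it from the CSV export is sketched below; the helper names, the `Response N` column headers, the output directory and the `.m` extension are all assumptions to adjust for your own quiz.

```python
import csv
import os


def plan_submission_files(rows, out_dir, question_columns, ext=".m"):
    """Map question number -> list of (path, code) pairs, one per non-blank
    response. Files are named by CSV row index, so keep the CSV to map
    them back to students (student names then never appear in filenames)."""
    plan = {}
    for question, column in question_columns.items():
        plan[question] = [
            (os.path.join(out_dir, f"q{question}_{i}{ext}"), row[column])
            for i, row in enumerate(rows)
            if (row[column] or "").strip()
        ]
    return plan


def write_submission_files(csv_path, out_dir, question_columns):
    """Write each response to its own file and return the
    question number -> list of paths mapping used with m.addFile()."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    plan = plan_submission_files(rows, out_dir, question_columns)
    os.makedirs(out_dir, exist_ok=True)
    for entries in plan.values():
        for path, code in entries:
            with open(path, "w", encoding="utf-8") as out:
                out.write(code)
    return {q: [path for path, _ in entries] for q, entries in plan.items()}
```

For example, `questionFilenames = write_submission_files("responses.csv", "moss_files", {q: f"Response {q}" for q in range(11, 16)})`, where the file name, directory and column headers are placeholders. Splitting the pure planning step from the file writing makes the naming logic easy to check before anything is sent to Moss.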

Cheers, Angus