Exporting responses for plagiarism checking

by Michael Fairbank -

I'm trying to extract CodeRunner student answers for plagiarism detection, but first I'm having trouble extracting the answers from Moodle. To export the answers for a quiz, I open the quiz, go to the quiz "settings:Results:Responses", and then choose "Download table data" as JSON.


Unfortunately the export loses all the \n newline characters from the student's code. It does the same whether I export to JSON or CSV. So a student's exported Python answer might look like this:

def evaluate_knapsack_fitness(chromosome, item_values, item_weights):          assert len(chromosome)==len(item_values)          assert len(chromosome)==len(item_weights)          total_value=0          total_weight=0          for i in range(len(chromosome)):              if chromosome[i]==1:                  total_weight+=item_weights[i]                  total_value+=item_values[i]          if total_weight>max_knapsack_weight:              return 0          else:              return total_value

It's not easy to restore the \n characters. I have a heuristic (replace any run of 6 consecutive spaces that is not preceded by a space with \n), but it doesn't work perfectly. For example, when a student has left some white space at the end of a program line, the heuristic gets it wrong. Is there a better solution? If not, could CodeRunner be modified to export answers with a "\n" or "<CR>" or something similar to mark each newline character explicitly? That assumes it is Moodle that forces CodeRunner to remove all the newline characters in the first place.
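
For anyone who wants to try it, the heuristic above can be sketched in Python roughly as follows. This is only an illustration, not the actual script: it assumes each newline was exported as exactly six spaces (which is what the sample answer suggests), and the name restore_newlines is made up for this example.

```python
import re

# Assumption: the exporter turned every "\n" into exactly six spaces.
# Match six spaces that are NOT preceded by another space, so that only
# the start of each run of spaces is treated as a lost newline.
NEWLINE_RUN = re.compile(r"(?<! ) {6}")


def restore_newlines(flattened: str) -> str:
    """Heuristically put back the newlines that the export removed."""
    return NEWLINE_RUN.sub("\n", flattened)


# Works when the original line had no trailing white space:
# six spaces for the newline plus four spaces of indentation.
good = "def f(x):" + " " * 6 + "    return x"
print(repr(restore_newlines(good)))

# Goes wrong when the original line ended in white space: the two
# trailing spaces are counted as part of the newline, so two of the
# newline's own spaces are left over as bogus indentation.
bad = "a = 1  " + " " * 6 + "b = 2"
print(repr(restore_newlines(bad)))
```

The second call shows the trailing-white-space failure: the restored second line starts with two spaces even though the original had no indentation, which is enough to make the Python unparseable.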

As my reconstructed answers have slightly incorrect spacing, the indentation is sometimes wrong, which means JPlag cannot parse the Python code correctly. My current workaround is to use MOSS, which appears to complain less about incorrect indentation. I'm still working on the script to parse the exported student answers into a format MOSS/JPlag can handle. It can join the answers students make in multiple quizzes, making the plagiarism detection for small programming exercises slightly more reliable. If anyone wants this script, let me know.

In reply to Michael Fairbank

Re: Exporting responses for plagiarism checking

by Anton Dil -
I had a similar problem when I started doing this for Java, but I think the CSV itself was okay. The problem was probably that I was implicitly stripping the newlines when reading lines from the CSV file. After switching to Apache's CSV library it worked okay.
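
A quick way to tell whether the newlines are being lost by the reader or are already missing from the file is to parse the export with a real CSV parser and look for fields that contain a newline. Below is a rough Python sketch using the standard csv module rather than Apache's library; the function name fields_with_newlines and the sample data are made up for illustration.

```python
import csv
import io


def fields_with_newlines(csv_text: str) -> int:
    """Count CSV fields that contain an embedded newline character."""
    # newline="" stops Python translating line endings, so the csv
    # module sees quoted multi-line fields exactly as they were written.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return sum(1 for row in reader for field in row if "\n" in field)


# Hypothetical export where the answer still has its newline.
intact = 'id,answer\r\n1,"int x;\nint y;"\r\n'
# Hypothetical export where the newline was replaced by spaces.
flattened = 'id,answer\r\n1,"int x;      int y;"\r\n'

print(fields_with_newlines(intact))
print(fields_with_newlines(flattened))
```

If the count is zero for a file that should contain multi-line answers, the newlines were lost before the file was written, not while reading it.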
In reply to Anton Dil

Re: Exporting responses for plagiarism checking

by Michael Fairbank -
Thanks. I've just re-checked the CSV export and studied its hexdump. There is definitely no indication of where a \n character ever was. I think for Java it's fine to reconstruct the newlines, because Java terminates each statement with a semicolon, and white space is flexible in Java anyway. So I think that's why you managed it successfully. But Python depends on the correct layout of white space, and has no semicolon hints to help deduce where the line breaks should appear.
In reply to Michael Fairbank

Re: Exporting responses for plagiarism checking

by Richard Lobb -
This looks like a Moodle 4 bug, probably in the table exporter. The newlines were all present in Moodle 3, but aren't there any more in Moodle 4. The CodeRunner script downloadquizattempts.php, which just loads the raw data from the quiz_attempt_step DB table and exports it using the Moodle SQL Table exporter, now suffers from the same problem. It didn't use to. I've checked that the newlines are still present in the DB table itself. I've also checked that an essay question using a raw text response field suffers from this problem too.