CodeRunner: An interesting bug (reposting after deleting earlier

Hi Tim (probably)

One of our staff members just came across a rather alarming, though fortunately very rare, bug. He found after his test that some students were exceeding the time limit for one of his questions, and felt that his limit had been too tight. So he tried experimentally regrading after raising the time limit (which was set by the prototype not the actual question) firstly to 30 secs and then to a much higher value, specifically 120 secs. But that latter value is higher than allowed by the Jobe server, which then gave an error on every submission during the regrade.

Now ... in a discussion some years back, we decided that if a submission results in a Jobe error the submission should be ignored because it's almost certainly not the student's fault (we were thinking of overloads). If I understand correctly, a regrade works by rerunning history, i.e. replaying all the question-attempt-step entries, rewriting history as it does so. The consequence then is that the sequence of recorded question-attempt-steps has lost all the actual submissions, leaving only the prechecks and the final end-of-quiz value. At least I think that's what has happened. Does it sound plausible?

The loss of data is not critical in this case because he regraded only a single student and knew all their submission history. But it could have been much more serious. Do you have any suggestions?

-- Richard

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Tim Hunt - jueves, 7 de abril de 2022, 01:25

Ouch! that is, indeed, a nasty one. I am glad (if that is the right word) that the context means that could have been worse.

I am sure your diagnosis is correct. Also, I am sure there a tracker issue about ... well not this exact issue, but similar ... and I just cannot find it.

I do recall the previous discussion. For regrading, it would be better to just throw an exception when we hit the error from the sandbox. Then the system would know it had gone wrong.

But, that would not work well during an attempt. If there was more than one question on the page, then we don't want an error with one question to lead to data for all the other questions being lost.

And, it is a feature of how the system is design that the code that runs during regrading is exactly the same as runs during the attempt.

Hmm. Looking at the code in qbehaviour_adaptive_adapted_for_coderunner, I can't see the behaviour you mention, with Discard. What am I missing?

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Richard Lobb - jueves, 7 de abril de 2022, 21:46

Thanks for helping out here Tim.

"I can't see the behaviour you mention, with Discard. What am I missing?" Yeah, I couldn't see it either. Immediately after I first posted the problem I started looking for the discard behaviour but couldn't find it, so I deleted the posting to give myself more time. But then realised I was too late and the email had already gone out, so I reposted it. I had vaguely hoped you'd know where the problem was because I had got a bit lost searching down through the inheritance hierarchy.

I've since managed to replicate the problem on my test machine and it's more subtle than I first assumed when my colleague told me about it.

For my test, as a student I submitted a couple of incorrect answers to a question, each preceded by a precheck, then submitted a correct answer. Here's the response history after that:

I then raised the time limit on that question to an illegally high value and regraded. After which the submission history was:

You'll see that the first submit has vanished and the second is in the state 'Incomplete'. [I assume the Incomplete state arises because I set the state in CodeRunner to $ignore which subsequently gets turned into Incomplete]

I then reset the question to a valid time limit and regraded again. The submission history then became:

And now the first two submits have vanished.

I am somewhat baffled at this stage. Do you have any idea what's happening? If not, I can go into serious debug mode but it will probably take me several hours and it's coming up bedtime here in NZ.

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Tim Hunt - jueves, 7 de abril de 2022, 22:48

I am not really sure (my brain is not working great - lack of sleep).

One guess to look at: could this be related to the checks to see if the response has changed? (is_same_response).

If I was going to debug this, I would want to see if I could reproduce it in one of the 'walkthrough' unit tests.

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Richard Lobb - domingo, 10 de abril de 2022, 21:43

Thanks Tim. I agree it would have been good to have a walkthrough unit test to use, but all my walkthroughs are testing a single question, not a whole quiz. I did try to modify one of the mod/quiz walkthroughs to use with CodeRunner but couldn't figure out how and gave up after an hour or so. Time is a bit too precious, sorry.

But I believe I understand exactly what is happening, although not necessarily what to do about it.

When I originally wrote the behaviour_adapted_for_coderunner class I mostly copied code from the adaptive class without, I confess, understanding quite a lot of it. After poring over it again I'm afraid I still don't understand the logic around invalid submissions.

The line

$status = $this->process_save($pendingstep);

returns question_attempt::DISCARD if the question attempt state is FINISHED or if the submission response is the same as in the previous recorded step.

The behaviour then checks for an incomplete (invalid) submission, which is kept if it follows a valid one or otherwise is kept or discarded according to the return value from process_save. This is the bit I don't understand.

If instead the response is complete, another check with is_same_response is done, but now with the previously saved try step rather than simply the previous step. If the responses are the same, the new step is discarded.

However, with CodeRunner things are even more complex. The problem is that is_same_response does not get given the information it needs to distinguish prechecks from checks. When I call it from within my behaviour I include a check on whether it's a precheck or a check so that two checks in a row, or two prchecks, are not considered to be the same response. [Edit: that should be only two checks in a row, or two prechecks, are considered to be the same response.] This all works fine normally. BUT, the extra logic I included to deal with invalid submissions (the sandbox is down, say) follows the above logic from the adaptive behaviour, which uses the (invalid) return value from process_save. That means that if there are two invalid submissions in a row with the same response, the second is discarded, even if it was a Check following a Precheck [Edit: or vice-versa]. BUG!

My apparently working fix (see the screen shots below) is to ignore the return value from process_save, and discard a submission only if there are two successive try submissions containing the same response and both are either checks or prechecks. If this is a sufficient condition for discarding, then I can simplify my logic quite a lot. But I'm not sure I understand the logic around discarding well enough. Are there other situations I need to handle?

Here are the screen shots of the response history following a complex series of prechecks and checks with a valid question, then the response history after breaking the question (setting the time limit to an out-of-bounds value, resulting in sandbox errors) and regrading, then the history after fixing the question and regrading yet again. The third one is the same as the first, so all is well again.

Before breaking the question

After breaking the question and regrading

After fixing the question and regrading again

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Tim Hunt - lunes, 11 de abril de 2022, 08:50

I am not sure why we are calling $this->process_save at the start of process_submit. I guess that once upon a time it seemed like a good way to avoid code duplication, but it sounds like that is not true any more, so perhaps we should inline that code, then look at the result, and start editing to simplify.

Bear in mind that the real entry-point into the behaviour for processing the student's submission is process_action. And for most behaviours (including CodeRunner's) it to do some specific checks on the 'action' variables like '-submit' and '-finish', and based on that, to dispatch to a more specific method like process_submit, process_finish, process_save. However, that is just a common pattern (that seemt to work well). There is no requriement to do that.

Continuing this collection of various different points...

The most important reason for the is_same_response / DISCARD behaviour is this:

The quiz lets the student navigate around the quiz using the navigation panel, or the prev/next button. Doing this processes all the data from the page (using process_save), in case the student changed any responses, but we only want to acutally bother writing a new step to the database if they acutally changed something.
Similarly, Moodle quiz lets you have more than one question on a page (even if that is an option we don't think is a good idea for CodeRunner). When one question on the page is submitted, then the responses for all the other questions on the page are offered up to be saved (again, it case the response was changed.)

So, that is what is going on. Is it ever appropriate to discard a check or a precheck? I guess that depends on whether we want to penalise students for submitting an identical response, or if we only want to re-do the grading after they have bothered to change something.

Now that I have writted all that out, I think I agree with your proposed fix.

I am not sure why you think you need to write your walkthrough test to use a quiz attempt. You can test regrading like this: https://github.com/moodleou/moodle-qtype_oumultiresponse/blob/main/tests/walkthrough_test.php#L506

I hope that is helpful.

Re: An interesting bug (reposting after deleting earlier - sorry about the spam)

de Richard Lobb - lunes, 11 de abril de 2022, 21:22

Many thanks Tim. That's certainly helpful and reassuring.

I've settled on a slightly safer fix. I'm just changed the code that deals with a sandbox error to mark the step as invalid and then keep it. It's known at that stage that the step is exactly not the same as the previous try (including the test for whether it's a check or precheck) or it would already have been discarded. Everything else is unchanged. I figure I can't break anything that way (a rash statement, I know).