Testing submissions stops when a test case fails to run

by Richard Lobb
There are two reasons for aborting testing when a test times out.

Firstly, I strongly prefer all-or-nothing grading, giving students the opportunity to fix their bugs rather than giving them part marks when some tests pass and some fail. See my old diatribe here. However, I accept that that's very much a personal view and CodeRunner should support different viewpoints where practicable.

The second reason is perhaps more compelling. At least in the courses I teach, by far the most common cause of a timeout is a broken while loop, such as one that fails to increment the loop control variable. Such code is doomed to time out on every test. If there's a 3 sec timeout and ten test cases, that's 30 sec of Jobe sandbox CPU time per submission. Not only is that a long time for a student to wait under test or exam conditions, but it also has the potential to overload the Jobe server, resulting in denial of service to all other users. Consider an exam with several hundred students, of whom 10 have such a bug. That's 10 × 30 sec = 5 minutes of lost CPU time on the Jobe server.

In summary: I don't wish to change the existing approach of aborting on a timeout.

However, there are workarounds.

The easiest workaround is to build your own timeout into your tests, making sure that your timeout expires well before the Jobe timeout. This is highly language-dependent, but in Python you might write a test in the following form:

from subprocess import run, TimeoutExpired
try:
    run( .... , timeout=2)
except TimeoutExpired:
    print("This test timed out")
You might want to hide the test infrastructure in either the global_extra field or the individual test.extra fields.
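
To make the pattern above concrete, here is a minimal runnable sketch. It assumes the code under test is Python and can be handed to a child interpreter as a string; the helper name run_with_timeout and the 2 sec default are my own choices for illustration, not part of CodeRunner. How the student's answer actually reaches your test code depends on your question type.

```python
import subprocess
import sys


def run_with_timeout(source, timeout_secs=2.0):
    """Run Python source in a child process (hypothetical helper).

    Returns the child's combined stdout and stderr, or a fixed message
    if the child is still running when timeout_secs expires.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_secs,  # must be well under the Jobe timeout
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return "This test timed out"
    return result.stdout + result.stderr


# A broken while loop: the loop control variable is never incremented.
BROKEN = "i = 0\nwhile i < 10:\n    pass\n"
print(run_with_timeout(BROKEN, timeout_secs=1.0))
print(run_with_timeout("print('hello')"))
```

Because the timeout is handled inside the test, the test itself finishes normally and prints its message, so Jobe never sees a timeout for that test.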

A better but vastly more complex solution is to write your own custom question type that uses a combinator template grader to run all the tests in whatever way you fancy. Our own in-house question types have a template parameter abortonerror that determines whether or not to continue testing after an error. An overall watchdog timer goes off before the Jobe server's timeout in order to prevent the tester from losing control if a total time budget is exceeded. With this approach you need to set a very long timeout - say 50 secs - for the total run, which again raises the spectre of a Jobe server overload.
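
The following is a rough sketch of that idea only, not our in-house code. It assumes Python tests run in child processes, and the names run_all_tests, per_test_timeout, total_budget and abort_on_error are hypothetical (the last one merely mirrors the abortonerror parameter mentioned above). A real combinator template grader would also have to format the results in the way CodeRunner expects, which is omitted here.

```python
import subprocess
import sys
import time


def run_all_tests(student_code, tests, per_test_timeout=2.0,
                  total_budget=10.0, abort_on_error=True):
    """Run each test in its own child process within an overall budget.

    tests is a list of (test_code, expected_output) pairs. Returns one
    (status, output) tuple per test; status is 'pass', 'fail',
    'timeout' or 'skipped'.
    """
    deadline = time.monotonic() + total_budget  # the watchdog deadline
    results = []
    aborted = False
    for test_code, expected in tests:
        remaining = deadline - time.monotonic()
        if aborted or remaining <= 0:
            results.append(("skipped", ""))
            continue
        try:
            proc = subprocess.run(
                [sys.executable, "-c", student_code + "\n" + test_code],
                capture_output=True,
                text=True,
                # Never let a single test run past the overall deadline.
                timeout=min(per_test_timeout, remaining),
            )
        except subprocess.TimeoutExpired:
            results.append(("timeout", ""))
            aborted = abort_on_error
            continue
        output = proc.stdout.strip()
        if proc.returncode == 0 and output == expected:
            results.append(("pass", output))
        else:
            results.append(("fail", output or proc.stderr.strip()))
            # In this sketch a crash aborts testing; a wrong answer does not.
            aborted = abort_on_error and proc.returncode != 0
    return results
```

With abort_on_error set to False, a test that times out is recorded and the remaining tests still run, until the total budget is used up. The total budget is what has to stay below the (long) Jobe timeout for the whole run.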