Continuing testing when a test case gives an error

by Sine Mete
Number of replies: 3

Hello, 

We use cpp_program for assignments, and we usually have more than 10 test cases. When a student's code fails a test case, for example with an infinite loop or output to stderr, testing is aborted. We want to grade test cases separately, so we want testing to continue. How can I do that, with or without a template? I have seen a template example that catches IOexception, but we can't catch every possible exception, and that approach doesn't handle infinite loops.

Thank you for your support.

Best,

In reply to Sine Mete

Re: Continuing testing when a test case gives an error

by Richard Lobb
I personally prefer not to continue after a serious error such as an infinite loop or a runtime exception. An infinite loop, for example, might take 5 to 10 seconds of Jobe server time before it times out. If that happens on every test, the job could take a minute or so of Jobe server CPU time, and too many such jobs could actually overload the server. It's also very annoying for students to have to wait for the entire run to complete before they can proceed. My preferred approach is to let students see the first error, fix it (for a penalty), and then continue.

However, that's just my take and several other people have asked this question, so there's clearly a need. 

CodeRunner treats any output to stderr as a runtime error and aborts testing. There's no way to switch off this behaviour, so you have to catch the exceptions or stderr output and redirect that output to stdout. With write-a-function questions in Python or other interpreted languages, this can be done fairly easily in the template, but with write-a-program questions, or with compiled languages like C/C++, you have to use the template as a control program that manages the whole compile-and-run process. If you want to handle infinite loops, you will also need some sort of watchdog timer to ensure that your control program gets back control before the entire job times out.
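For a write-a-function question in Python, a minimal sketch of the template-side idea might look like the following. It assumes the template has each test's code as a string; run_test_safely, test_code, namespace and the "***" message format are hypothetical names for illustration, not part of CodeRunner. It catches exceptions and sends anything written to stderr to stdout, so a failing test shows up as ordinary output instead of aborting the run. It cannot stop an infinite loop, which needs a separate process and a timeout.

```python
import contextlib
import sys


def run_test_safely(test_code: str, namespace: dict) -> None:
    """Run one test's code, reporting any error on stdout instead of stderr.

    Hypothetical helper: `test_code` stands for whatever the template
    substitutes for a test case; `namespace` holds the student's definitions.
    """
    # Send anything the test writes to stderr to stdout instead.
    with contextlib.redirect_stderr(sys.stdout):
        try:
            exec(test_code, namespace)
        except Exception as exc:
            # Report the exception as ordinary output so testing continues.
            print(f"*** {type(exc).__name__}: {exc}")
```

Note that catching Exception deliberately lets SystemExit and KeyboardInterrupt through; whether a template should also trap those is a judgment call.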

If you're asking write-a-program questions, you can follow the approach described here. You would need to change the compile command to one for C++, and change the line that runs the code so that it limits the execution time and redirects stderr to stdout. Something like the following (untested) could replace the two lines starting output = subprocess.check_output(...):

    # Run with a 2-second limit; capture stderr and print it to stdout so it doesn't abort testing
    run_result = subprocess.run(["./prog"], text=True, capture_output=True, timeout=2)
    print(run_result.stderr + "\n" + run_result.stdout)

You will also need to catch subprocess.TimeoutExpired exceptions.
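A sketch of the run step with the TimeoutExpired handling included might look like this. The function name run_program, its parameters and the "***" messages are hypothetical; only subprocess.run, its timeout argument and subprocess.TimeoutExpired are standard Python. When the timeout expires, subprocess.run kills the child process before raising, so the control program gets back control and can move on to the next test.

```python
import subprocess


def run_program(cmd: list[str], stdin_text: str = "", timeout: float = 2.0) -> str:
    """Run one test of the compiled program and return everything it printed.

    Hypothetical helper: stderr is appended to the returned text and a
    timeout is reported as text, so neither aborts the remaining tests.
    """
    try:
        result = subprocess.run(cmd, input=stdin_text, text=True,
                                capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Watchdog fired: the program was killed after `timeout` seconds.
        return f"*** Time limit exceeded ({timeout} s)"
    output = result.stdout
    if result.stderr:
        output += "\n" + result.stderr
    if result.returncode != 0:
        output += f"\n*** Program exited with status {result.returncode}"
    return output
```

In the template it would be called as, for example, print(run_program(["./prog"], test_input)), where test_input stands for whatever stdin the test case supplies (an assumption). This version prints stdout before stderr; either order works.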

If you're not a Python programmer, you can implement all this functionality in C++, but it's harder (for me, anyway).

Most of our custom question types at the University of Canterbury use scripted control of the sort described above. The initial complexity is worth it, because you then have complete control over the process and can, for example, run extra style checks on student-supplied code, ban undesired constructs, or catch specific errors.
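As an illustration of banning undesired constructs, a control script could scan the student's source before compiling it. This is a deliberately naive sketch: BANNED_PATTERNS and find_banned_constructs are hypothetical, and simple regular expressions will also match text inside comments and string literals.

```python
import re

# Hypothetical list of constructs an instructor might want to ban.
BANNED_PATTERNS = {
    r"\bgoto\b": "goto statements",
    r"\bsystem\s*\(": "calls to system()",
}


def find_banned_constructs(source: str) -> list[str]:
    """Return a description of each banned construct found in `source`."""
    # Naive textual check: comments and string literals are not skipped.
    return [desc for pattern, desc in BANNED_PATTERNS.items()
            if re.search(pattern, source)]
```

A template could print the returned descriptions and skip compiling when the list is non-empty.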

I can provide further help if necessary, but you might prefer not to have all this extra complexity.


In reply to Richard Lobb

Re: Continuing testing when a test case gives an error

by Sine Mete
Thanks a lot for your reply. We need effective grading, so we want to use this approach, but I'd be glad if you could provide further help, especially with compiling multiple files submitted as attachments. I normally compile multiple files using the "advanced customisation parameter", but with this approach should we change the line in the code you linked to something like "return_code = subprocess.call("gcc {0} -o prog prog.c file1.cpp file2.cpp".format(cflags).split())"?

In reply to Sine Mete

Re: Continuing testing when a test case gives an error

by Richard Lobb
If you're happy to compile all the C++ source files in the working directory to build your executable, you should be able to just do:

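    # shell=True lets the shell expand *.cpp; .returncode gives the int that subprocess.call used to return
    return_code = subprocess.run(f"g++ {cflags} -o prog *.cpp", shell=True).returncode

Is that all you need help with?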
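If you'd rather avoid shell=True, a sketch that expands *.cpp with Python's glob module and passes g++ an argument list could look like this. It assumes cflags is a list of flag strings (for example cflags.split() if the template holds them as one string); compile_command and compile_all_cpp are hypothetical names. If there are no .cpp files, g++ reports "no input files" and returns a nonzero code, so the usual return_code check still catches it.

```python
import glob
import subprocess


def compile_command(cflags: list[str], sources: list[str], output: str = "prog") -> list[str]:
    """Build the g++ command line as a list, so no shell is needed (hypothetical helper)."""
    return ["g++", *cflags, "-o", output, *sources]


def compile_all_cpp(cflags: list[str]) -> subprocess.CompletedProcess:
    """Compile every .cpp file in the working directory into ./prog."""
    # glob does the *.cpp expansion that the shell would otherwise do.
    sources = sorted(glob.glob("*.cpp"))
    return subprocess.run(compile_command(cflags, sources),
                          capture_output=True, text=True)
```

Usage: result = compile_all_cpp(cflags.split()), then check result.returncode and show result.stderr to the student if it is nonzero.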

By the way, the approach I've described uses a per-test template, which runs each test case separately. This means the same code gets compiled for every test, and C++ compiles are relatively expensive. This isn't likely to be a problem with assignment questions, but in a test or exam situation it's best avoided, as it puts unnecessary load on the Jobe servers. You can avoid the problem by using a combinator template that first compiles the code and then runs all the tests if and only if the compile succeeds. However, combinator templates are much harder to write, as you have to build your own test results table.
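The core of a combinator-style control script can be sketched as below: compile once, stop if the compile fails, otherwise run every test with its own timeout and collect one result row per test. The function grade, the (stdin, expected) test format and the row fields are assumptions for illustration, and turning the rows into the results table CodeRunner displays is not shown.

```python
import subprocess


def grade(compile_cmd: list[str], run_cmd: list[str],
          tests: list[tuple[str, str]], timeout: float = 2.0) -> tuple[str, list[dict]]:
    """Compile once, then run every test only if the compile succeeded.

    Hypothetical helper: `tests` is a list of (stdin, expected_output) pairs.
    Returns (compiler_errors, rows), with one dict per test in rows.
    """
    compiled = subprocess.run(compile_cmd, capture_output=True, text=True)
    if compiled.returncode != 0:
        # Compile failed: report the errors and run no tests at all.
        return compiled.stderr, []

    rows = []
    for stdin_text, expected in tests:
        try:
            result = subprocess.run(run_cmd, input=stdin_text, text=True,
                                    capture_output=True, timeout=timeout)
            # stderr is kept as ordinary output so it cannot abort the run.
            got = result.stdout + result.stderr
        except subprocess.TimeoutExpired:
            got = f"*** Time limit exceeded ({timeout} s)"
        rows.append({"input": stdin_text, "expected": expected, "got": got,
                     "passed": got.strip() == expected.strip()})
    return "", rows
```

With g++ this might be called as grade(["g++", "-o", "prog", "prog.cpp"], ["./prog"], tests). A timeout or crash in one test only marks that row as failed, and the code is compiled exactly once.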