CodeRunner: Testing for 4 possible outputs?

Sorry if this has been asked before!

I’m new to CodeRunner and am finding it a brilliant tool in the classroom!

I am planning to teach my students about lists and random number generators by having them create a Hogwarts House random generator in Python, and I was hoping to use CodeRunner to test it. However, I can’t work out how to test for four possible outputs: if written correctly, the student’s code will randomly choose one of the 4 houses and output it.
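For concreteness, here is a minimal sketch of the kind of program students might write for this exercise. It assumes the four house names are stored in a list and that `random.choice` picks one of them. The variable names are illustrative only, not part of any required specification.

```python
import random

# The four Hogwarts houses, stored in a list
houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# random.choice returns one element of the list, each equally likely
house = random.choice(houses)
print(house)
```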

I’ve experimented with writing my own template but couldn’t work out how to extract the students’ output. I could only get at their full code, which isn’t the route I’d ideally want to take. Perhaps a regex solution might be possible?

If anyone can help I’d really appreciate it!

Re: Testing for 4 possible outputs?

by Richard Lobb - Wednesday, 25 April 2018, 10:39 PM

Hi Estelle

Good to know you're finding CodeRunner useful.

An interesting question - I've never tried to grade a question in which randomness is specified!

Where possible I get students to write functions rather than whole programs. That way I can tell them exactly what data is available (i.e., the parameters) and what to print or (in this case) return. Then testing is straightforward and you avoid the hassles of input prompts. In this case, for example, the test code could call their function 1000 times and check how often each of the 4 houses appeared. [As an aside, I know that functions are generally not taught till some way into a course, but I find that if you give students the function header and possibly the required return statement at the end, they're happy enough to write the body, even if they don't really understand functions.]
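As a rough sketch of that function-based approach, suppose students are given the header of a hypothetical function `sort_into_house()` that must return one of the four house names. The test code below (all names and the threshold are illustrative assumptions) calls it 1000 times, tallies the results with `collections.Counter`, and checks that only valid houses appear and that every house turns up a reasonable number of times. In CodeRunner the student's function and the test code would be combined by the question template; here they simply sit in one file.

```python
import random
from collections import Counter

HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]

# Hypothetical student answer: the header is supplied, the student writes the body
def sort_into_house():
    return random.choice(HOUSES)

# Test code: call the student's function many times and tally the results
NUM_CALLS = 1000
counts = Counter(sort_into_house() for _ in range(NUM_CALLS))

invalid = [house for house in counts if house not in HOUSES]
if invalid:
    print("Invalid house(s) returned:", invalid)
elif len(counts) < len(HOUSES) or min(counts.values()) < NUM_CALLS // 8:
    # Each house is expected about 250 times; fewer than 125 suggests the
    # choice is not uniform (this threshold is an assumption, not a rule)
    print("Not all houses were chosen often enough:", dict(counts))
else:
    print("It worked!")
```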

However, I can appreciate that you want students to write the whole program and you might not want to introduce functions into the game. So ...

As you say, one way to check if their program generates one of the four required houses is to use a Regular Expression grader. But that has two major shortcomings. Firstly, a student's program that simply printed any one of the houses, with no randomisation, would pass. Secondly, displaying the regular expression you're looking for in the Expected column would just confuse them, so you'd have to hide that column (and perhaps add an Extra Template column with an explanation). Probably not a good approach.
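To illustrate both points, here is a sketch of the kind of pattern such a grader might use, written with Python's `re` module rather than CodeRunner's own grader, so it does not reproduce CodeRunner's exact matching rules. It accepts any single house name, which is precisely why it cannot tell a random program from one that always prints the same house.

```python
import re

# Accept exactly one house name, optionally followed by trailing whitespace
HOUSE_PATTERN = re.compile(r"(Gryffindor|Hufflepuff|Ravenclaw|Slytherin)\s*")

def looks_valid(output):
    """Return True if the whole output is a single house name."""
    return HOUSE_PATTERN.fullmatch(output) is not None

print(looks_valid("Ravenclaw\n"))   # True
print(looks_valid("Muggle\n"))      # False
# The shortcoming: a program that ALWAYS prints "Gryffindor" passes too,
# because a single run's output says nothing about randomness.
print(looks_valid("Gryffindor\n"))  # True
```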

So now indeed you need to write a template that runs their program (probably many times) and gathers all the output to grade it. For this sort of job I usually use the Python subprocess module. You might be able to get by with exec instead - it's much faster as it doesn't launch a new Python process each time - but it's harder to make exec robust against errors in the student's code. [Edit: actually, exec is a much better approach - see below.]

Here's an example of a template to check a program that is required to randomly print either "Great!" or "Rubbish!". Note that it's not a combinator template (so you have to uncheck that box) and the expected output should be set to "It worked!".

from subprocess import check_output, CalledProcessError

TOLERATED_DIFFERENCE = 30  # What's a good number?!

student_answer = """{{ STUDENT_ANSWER | e('py') }}"""
with open('prog.py', 'w') as stud_prog:
    stud_prog.write(student_answer)

num_great = 0
num_rubbish = 0
failed = False
try:
    for i_test in range(100):
        cmd = ['python3', 'prog.py']
        result = check_output(cmd, universal_newlines=True)
    
        if result == 'Great!\n':
            num_great += 1
        elif result == 'Rubbish!\n':
            num_rubbish += 1
        else:
            print("Sorry, but the output '{}' isn't valid".format(result))
            failed = True
            break
    
    if not failed:
        if abs(num_great - num_rubbish) < TOLERATED_DIFFERENCE:
            print("It worked!")
        else:
            print("Output was wrong.")
            print("You printed 'Great!' {} times and 'Rubbish!' {} times".format(num_great, num_rubbish))
except CalledProcessError as err:
    print("I couldn't run your code!")
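The "What's a good number?!" comment can be answered with a little binomial arithmetic, assuming a correct program prints each message with probability 1/2 on every run. If G is the number of "Great!" runs and R the number of "Rubbish!" runs out of n, the difference D that the template checks satisfies:

$$G \sim \mathrm{Binomial}\!\left(n, \tfrac{1}{2}\right), \qquad D = G - R = 2G - n$$
$$\operatorname{Var}(D) = 4\operatorname{Var}(G) = 4 \cdot n \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = n, \qquad \sigma_D = \sqrt{n}$$
$$n = 100: \quad \sigma_D = 10, \qquad 3\sigma_D = 30$$

So a tolerance of 30 for 100 runs is about three standard deviations, and a correct program should fail only a few times in a thousand (a normal-approximation estimate). As a rule of thumb, a tolerance of roughly 3√n is a reasonable starting point. For 1000 runs that is about 95; a tolerance well above that, such as 300, would also let through a noticeably biased program (say 60% "Great!", with an expected difference of 200).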

Richard

Re: Testing for 4 possible outputs?

by Richard Lobb - Friday, 27 April 2018, 9:24 PM

On reflection, using subprocess for a case like this is overkill and just too inefficient to tolerate. Using exec is a much better approach. It allows you to run the student's code up to about 100,000 times to give really good stats, rather than the 100 or so runs that are practical with subprocess. The example below runs it 1000 times, which should be plenty for a simple case like this.

If the student code raises an error within the exec, the exception is caught and its message is reported. Even if the code gets stuck in a loop and triggers a timeout, you still get to see the "time limit exceeded" error message.

import sys, io

TOLERATED_DIFFERENCE = 300  # What's a good number?!

prog = """{{ STUDENT_ANSWER | e('py') }}"""

num_great = 0
num_rubbish = 0
message = ''
saved_stdout = sys.stdout

try:
    for i_test in range(1000):
        sys.stdout = io.StringIO()
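        # Run the student code in a fresh namespace so its variables
        # can't overwrite the template's own (num_great, message, ...)
        exec(prog, {'__name__': '__main__'})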
        result = sys.stdout.getvalue().rstrip()
        if result == 'Great!':
            num_great += 1
        elif result == 'Rubbish!':
            num_rubbish += 1
        else:
            message = "Sorry, but the output '{}' isn't valid".format(result)
            break
    
    if not message:
        if abs(num_great - num_rubbish) < TOLERATED_DIFFERENCE:
            message = "It worked!"
        else:
            message = "Output was wrong.\n"
            message += "You printed 'Great!' {} times and 'Rubbish!' {} times".format(num_great, num_rubbish)
except Exception as err:
    message = "I couldn't run your code!\n"
    message += str(err)

sys.stdout = saved_stdout
print(message)
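Adapting the exec template to the original four-house question might look something like the sketch below, which has not been tried inside CodeRunner and rests on several assumptions. The house names, run count and strictness are illustrative. It swaps in `contextlib.redirect_stdout` (standard library) for the manual `sys.stdout` swap, so stdout is restored even if the student code raises, and it runs the code in a fresh namespace on each run. Instead of a single difference threshold, it checks that each house's count lies within a few standard deviations of the expected n/4, using σ = √(n·p·(1−p)) for a binomial count. The grading logic is wrapped in a hypothetical `grade` function so it can be tried outside CodeRunner.

```python
import contextlib
import io
import math

HOUSES = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
NUM_RUNS = 1000
NUM_SIGMAS = 4  # assumed strictness; larger means fewer false failures


def grade(prog):
    """Run the student program NUM_RUNS times and return a feedback message."""
    counts = {house: 0 for house in HOUSES}
    try:
        for _ in range(NUM_RUNS):
            buffer = io.StringIO()
            # redirect_stdout restores sys.stdout when the block exits,
            # even if exec raises an exception
            with contextlib.redirect_stdout(buffer):
                exec(prog, {"__name__": "__main__"})  # fresh namespace per run
            result = buffer.getvalue().rstrip()
            if result not in counts:
                return "Sorry, but the output '{}' isn't valid".format(result)
            counts[result] += 1
    except Exception as err:
        return "I couldn't run your code!\n" + str(err)

    # Each count is Binomial(NUM_RUNS, 1/4): mean n/4, sd sqrt(n * 1/4 * 3/4)
    p = 1 / len(HOUSES)
    expected = NUM_RUNS * p
    sigma = math.sqrt(NUM_RUNS * p * (1 - p))
    for house, count in counts.items():
        if abs(count - expected) > NUM_SIGMAS * sigma:
            return ("Output was wrong.\n'{}' was printed {} times "
                    "(expected about {:.0f}).".format(house, count, expected))
    return "It worked!"


# Assumed usage as the last line of a (non-combinator) CodeRunner template:
# print(grade("""{{ STUDENT_ANSWER | e('py') }}"""))
```

With 1000 runs, σ is about 13.7 per house, so four standard deviations allows each count to drift roughly 55 either side of 250 before the program is marked wrong.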