Hi Estelle
Good to know you're finding CodeRunner useful.
An interesting question - I've never tried to grade a question in which randomness is specified!
Where possible I get students to write functions rather than whole programs. That way I can tell them exactly what data is available (i.e., the parameters) and what to print or (in this case) return. Then testing is straightforward and you avoid the hassles of input prompts. In this case, for example, the test code could call their function 1000 times and check how often each of the 4 houses appeared. [As an aside, I know that functions are generally not taught till some way into a course, but I find that if you give students the function header and possibly the required return statement at the end, they're happy enough to write the body, even if they don't really understand functions.]
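To make that concrete, here's a rough sketch of what such test code might look like. The function name choose_house and the house names are just made-up examples for illustration, not from your actual question:

```python
import random

# Stand-in for the student's function: returns one of four houses at random.
# (The name and the house list are assumptions for this sketch.)
def choose_house():
    return random.choice(["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"])

# Test code: call the function 1000 times and tally how often each house appears.
counts = {}
for _ in range(1000):
    house = choose_house()
    counts[house] = counts.get(house, 0) + 1

# With 1000 uniform draws over 4 houses, each count should be near 250.
# The bounds here are deliberately loose so random variation doesn't fail a correct answer.
houses = ["Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin"]
if all(150 <= counts.get(h, 0) <= 350 for h in houses):
    print("It worked!")
else:
    print("Counts look wrong:", counts)
```

The tolerance bounds are a judgement call: too tight and a correct program occasionally fails by bad luck; too loose and a biased one slips through.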
However, I can appreciate that you want students to write the whole program and you might not want to introduce functions into the game. So ...
As you say, one way to check if their program generates one of the four required houses is to use a Regular Expression grader. But that has two major shortcomings. Firstly, a student's program that simply printed any one of the houses, with no randomisation, would pass. Secondly, displaying the regular expression you're looking for in the Expected column would just confuse them, so you'd have to hide that column (and perhaps add an Extra Template column with an explanation). Probably not a good approach.
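To see that first shortcoming concretely, here's a quick sketch (the house names and the pattern are just illustrative):

```python
import re

# A regular expression that accepts any one of the four houses
# (house names are assumptions for this sketch).
house_re = r"^(Gryffindor|Hufflepuff|Ravenclaw|Slytherin)\n?$"

# A "student program" that always prints the same house, with no
# randomisation at all, still matches the pattern every time:
output = "Gryffindor\n"
matched = re.match(house_re, output) is not None
print(matched)  # the deterministic program passes
```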
So now indeed you need to write a template that runs their program (probably many times) and gathers all the output to grade it. For this sort of job I usually use the Python subprocess module. You might be able to get by with exec instead - it's much faster as it doesn't launch a new Python each time - but it's harder to make exec robust against errors in the student's code. [Edit: actually, exec is a much better approach - see below.]
Here's an example of a template to check a program that is required to randomly print either "Great!" or "Rubbish!". Note that it's not a combinator template (so you have to uncheck that box) and the expected output should be set to "It worked!".
from subprocess import check_output, CalledProcessError

TOLERATED_DIFFERENCE = 30  # What's a good number?!

student_answer = """{{ STUDENT_ANSWER | e('py') }}"""
with open('prog.py', 'w') as stud_prog:
    stud_prog.write(student_answer)

num_great = 0
num_rubbish = 0
failed = False
try:
    for i_test in range(100):
        cmd = ['python3', 'prog.py']
        result = check_output(cmd, universal_newlines=True)
        if result == 'Great!\n':
            num_great += 1
        elif result == 'Rubbish!\n':
            num_rubbish += 1
        else:
            print("Sorry, but the output '{}' isn't valid".format(result))
            failed = True
            break
    if not failed:
        if abs(num_great - num_rubbish) < TOLERATED_DIFFERENCE:
            print("It worked!")
        else:
            print("Output was wrong.")
            print("You printed 'Great!' {} times and 'Rubbish!' {} times".format(num_great, num_rubbish))
except CalledProcessError as err:
    print("I couldn't run your code!")
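For what it's worth, here's a rough sketch of what the exec version mentioned in the edit above might look like. The student_answer string here is just a stand-in for the template variable, and redirect_stdout captures the printed output; a fresh globals dict on each run keeps state from leaking between runs:

```python
import io
from contextlib import redirect_stdout

TOLERATED_DIFFERENCE = 30

# Stand-in for {{ STUDENT_ANSWER | e('py') }} in the real template.
student_answer = """
import random
print(random.choice(["Great!", "Rubbish!"]))
"""

num_great = 0
num_rubbish = 0
failed = False
for i_test in range(100):
    buffer = io.StringIO()
    try:
        with redirect_stdout(buffer):
            # A fresh, empty globals dict each time isolates the runs.
            exec(student_answer, {})
    except Exception as err:
        print("I couldn't run your code!")
        failed = True
        break
    result = buffer.getvalue()
    if result == 'Great!\n':
        num_great += 1
    elif result == 'Rubbish!\n':
        num_rubbish += 1
    else:
        print("Sorry, but the output '{}' isn't valid".format(result))
        failed = True
        break

if not failed:
    if abs(num_great - num_rubbish) < TOLERATED_DIFFERENCE:
        print("It worked!")
    else:
        print("Output was wrong.")
        print("You printed 'Great!' {} times and 'Rubbish!' {} times".format(num_great, num_rubbish))
```

Because exec runs in the same Python process, this is far faster than launching python3 a hundred times, though you do need the try/except to guard against errors in the student's code.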
Richard