Checking student answers vs solution output, for randomization

by Forrest Stonedahl -
Number of replies: 6

I'm just getting started with CodeRunner, and I liked the idea of using some randomization in questions (as described in the docs here).  

However, this approach feels somewhat limited in what I will be able to randomize, since I need to be able to express the "expected output" as a Twig expression.

As a simple example, consider the prompt "Write a program that prints 'hello' repeatedly, {{NUM}} times" where {{NUM}} is a randomized parameter (say between 6 and 10).

For the solution code, I can fill in:

for i in range({{NUM}}):
    print('hello')

However, for the expected output, I believe I would need to use a {% for ... %} loop. While this seems possible in this case, the Twig templating language has its limitations, and it feels cumbersome to have to rewrite a generalized solution to the problem in both Python AND Twig.
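For instance (if I've understood the docs correctly), the "Expected output" field in this particular case would need Twig markup along the following lines, assuming Twig expansion is applied to the test-case fields (e.g. via the "Twig all" option). This is just a sketch -- I haven't checked the exact range syntax or the whitespace handling:

{% for i in 1..NUM %}hello
{% endfor %}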

What I would like to have happen is that the output from each student's code gets compared to the output from the (templated) instructor solution that I wrote in Python, for each of the specified input test cases... but I don't want to have to write anything in the "expected output" fields.

I believe this should be possible in CodeRunner by using a sufficiently advanced template grader. Suggestions for how to accomplish this would be welcome!

Also, I am curious about whether other question authors agree that having a grading system like this would be extremely helpful for writing randomized questions more quickly/easily.  If so, perhaps someone could create a new prototype that provides this capability out-of-the-box?

In reply to Forrest Stonedahl

Re: Checking student answers vs solution output, for randomization

by Richard Lobb -

Generating the expected output with Twig is usually possible but you're right that it can become tedious or even impracticable.

Yes, you can write a question type that uses the sample answer to check correctness on the Jobe server. But there are two shortcomings:

  1. You still need to manually generate the entries in the "For example" table, since this is displayed to the student before the code is submitted.
  2. The usefulness of both the validate-on-save capability and the bulk-tester script (see here) is compromised.

That said, we do sometimes use the sample answer to generate the expected output. The current version of our python3_cosc121 question type (attached in file uoc_prototype_python3_cosc121) has a "useanswerfortests" template parameter, documented as follows:

useanswerfortests: if true, a run with the sample answer precedes the run with the student answer, and the results from the sample answer are used for correctness testing, but only if no expected output is supplied. However, because this takes place at runtime, it doesn't work for "Use as example" tests, for which the expected output must be supplied by the question author.
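Template parameters are entered as JSON, so enabling it amounts to adding something like the following to the question's template parameters, alongside whatever other parameters the question type expects (check the attached prototype for the full set):

{"useanswerfortests": true}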

However, that question type is very complex, and while you're welcome to use it, I don't think you'll want to actually maintain it. There are probably over 1000 lines of Python buried in the template plus the support files, and it's in need of some serious refactoring.

I also attach a vastly simpler question type that uses a per-test-case template grader and runs the sample answer to generate the 'expected' field of the test case if and only if it's empty. The export file includes two simple tests, which is the full extent of the testing that the question type has had. It has not been used in production and probably has various deficiencies. Perhaps you could fix any you find and post back here?
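In outline -- and this is just a rough, untested sketch of the general approach, not the code in the attachment (the helper name run_program and the temporary file name are arbitrary) -- the Twig/Python template of such a per-test-case grader might look something like the following. A per-test-case template grader must print a JSON record containing at least a 'fraction' field:

import json
import subprocess
import sys

def run_program(source, stdin_text):
    """Write the given Python source to a file, run it with the same
       interpreter, and return its standard output (with stderr appended
       so that runtime errors are visible in the result table)."""
    with open('__prog.py', 'w') as outfile:
        outfile.write(source)
    result = subprocess.run([sys.executable, '__prog.py'],
                            input=stdin_text,
                            capture_output=True, text=True)
    output = result.stdout
    if result.stderr:
        output += '\n*** STDERR ***\n' + result.stderr
    return output

student_answer = """{{ STUDENT_ANSWER | e('py') }}"""
sample_answer = """{{ QUESTION.answer | e('py') }}"""
test_code = """{{ TEST.testcode | e('py') }}"""
stdin_text = """{{ TEST.stdin | e('py') }}"""
expected = """{{ TEST.expected | e('py') }}"""

# If the question author left the 'Expected output' field empty,
# generate it by running the sample answer on this test case.
if expected.strip() == '':
    expected = run_program(sample_answer + '\n' + test_code, stdin_text)

got = run_program(student_answer + '\n' + test_code, stdin_text)

# The grader's output: a JSON record with at least a 'fraction' field;
# 'got' and 'expected' fill in the result table shown to the student.
print(json.dumps({
    'fraction': 1.0 if got.rstrip() == expected.rstrip() else 0.0,
    'got': got,
    'expected': expected
}))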


In reply to Richard Lobb

Re: Checking student answers vs solution output, for randomization

by Forrest Stonedahl -
Thanks -- this looks very helpful.

Since we are just starting to implement CodeRunner in our courses (probably going into production next fall term), it could be a while before I get around to testing (and potentially improving on) the question type you provided.

In terms of installing this -- is it sufficient just to import the XML file that you attached into the question bank for some Moodle course? And then we could choose to turn it into a "new question type" as described here, to make it easier to create more questions that are all based on that pattern?

In reply to Forrest Stonedahl

Re: Checking student answers vs solution output, for randomization

by Richard Lobb -

Both XML files contain a prototype question, that is, a question that defines a new question type. So when you import an XML file into some course, or at the system level, the new question type appears immediately in the dropdown menu of CodeRunner question types - you don't have to modify anything. The file questions-TestAndDevel-Use answer for expected-20181230-1921.xml defines a new question type python3_test_with_answer, plus two sample questions of that type, while the file uoc_prototype_python3_cosc121.xml defines a question type python3_cosc121.

Please realise that the python3_test_with_answer question type has had close to zero testing. I intended it only as a starting point if you wished to develop your own question type of that sort. With only around 50 lines of code, it shouldn't be hard to develop further or maintain. The python3_cosc121 type is used in many hundreds of questions on our production server, but the combination of randomisation + useanswerfortests is relatively new and not extensively tested. It's a very complex question type and not suitable for maintenance by a newcomer to CodeRunner. Neither question type is part of the standard CodeRunner package, so they come with no guarantees and no official support.

Also, do note warning #1 in the section you reference on user-defined prototypes.

Richard


In reply to Richard Lobb

Re: Checking student answers vs solution output, for randomization

by Forrest Stonedahl -

Thanks -- to avoid issues in our first roll-out, we will probably stick primarily with the time-tested python3 and python3_with_input question types, but at some point I do plan to explore some of the more advanced randomization capabilities, building on the prototype you provided and doing additional testing.

We have had some issues with students plagiarizing each other's code for practice exercises in the past, and additional per-student randomization seems like one route to discourage that.

In reply to Forrest Stonedahl

Re: Checking student answers vs solution output, for randomization

by Richard Lobb -

Yep, I think holding off on randomisation until you've got the basics in place is an excellent plan. Your priorities will likely change once you have experience with CodeRunner. I'm guessing you'll decide some form of style checking (e.g. pylint) is a higher priority.

Even trivial randomisation, like varying the name of the function to be written and/or the order of its parameters, can be helpful in deterring mindless copying of code. We've introduced a lot of such randomisation into our question bank this year. However, I don't expect it to make much difference, if any, to outcomes. I think it's much more important to get across to students that copying code doesn't teach them to program, and that if they can't program they'll fail the course. Our tests and final exams are CodeRunner based, and we tell students that copying code is as smart as training for a marathon by having a friend tow them round the course on a bike.

In reply to Forrest Stonedahl

Re: Checking student answers vs solution output, for randomization

by Jenny Harlow -

Just adding to Richard's response to support his suggestions on students and copying/cheating. One can spend a lot of time trying to make it harder to cheat or copy -- it's a battle that basically never ends, and only some cheating is brain-dead copying anyway. I have done my share of that, and I eventually realised that it was making my attitude very negative and that I was not spending nearly as much time trying to make better questions and quizzes for the students who actually did want to learn.

I've found it much more effective all round to give a clear early message about the stupidity of copying, and to back that up early by analysing code submissions for copying and by having an early test on the basics that students just have to know. I can then spend my time helping the willing students and challenging the good ones, rather than trying to stop the others from shooting themselves in the foot.

In terms of analysing submissions for copying, all the submissions are available for you to download, and it seems to get around pretty quickly that I mean business when I do some simple checks using cheat-checking code that Richard developed and pull up those caught out. I really only focus on picking up the really dumb identical-code copiers, but -- as I warn the students with a big grin -- I can do pretty much any analysis I like and take as long as I like about it, and having nicely timestamped electronic evidence tends to make disciplinary processes quite straightforward if we choose to take things to that level...

So far I have not used randomisation, and while I am keen to bring it in, I have to say that that is more because I'm interested, and because it could help in making sets of "drill questions" for students practising for tests, than because of its possible value as a cheating deterrent.

Jenny