On "Specific Feedback"

On "Specific Feedback"

by Richard Lobb -

A user asks:

"How do you tend to give feedback for coderunner questions, do you just use the passing or failing of the tests as the feedback? I was confused when I saw that the Moodle quiz settings allow me to choose when students can see 'specific feedback', but then coderunner didn't seem to have any way to give specific feedback. For the time being I'm using {{ TEST.extra }} to give some feedback for certain tests, but this felt like a bit of a hack?"

From an implementation standpoint, the CodeRunner result table is the specific feedback and is enabled/disabled by the specific feedback checkboxes in the quiz review settings. However, this user obviously wants to deliver more feedback than that.

It is possible to deliver any sort of feedback you like by using a combinator template grader, which is responsible for constructing the entire specific feedback in HTML. This is very powerful and you can use it to include things like images and graphs in the feedback. I use a combinator template grader in nearly all my questions now. However, setting up a question type to use a combinator template grader is difficult, and if all you want to do is give a simple feedback message like "Well done!" it's far too heavyweight for the job.
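
For a flavour of what that involves: a combinator template grader finishes by printing a JSON record describing the result. A bare-bones sketch of the tail end of such a grader might look like the following (the real field set is much richer, and "all_tests_passed" is just a placeholder for whatever test logic the grader actually ran, so treat this as an illustration rather than a recipe):

import json

all_tests_passed = True    # placeholder for the grader's real test logic

mark = 1.0 if all_tests_passed else 0.0
feedback_html = "<p>Well done!</p>" if all_tests_passed else "<p>Check the type of your answer.</p>"

# The grader's output is a JSON record; "fraction" is the mark and
# "epiloguehtml" carries extra HTML feedback shown after the result table.
print(json.dumps({"fraction": mark, "epiloguehtml": feedback_html}))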

So ... what sort of feedback do users want to give students above and beyond the result table?

Richard

In reply to Richard Lobb

Re: On "Specific Feedback"

by Sam Fearn -
Hi,

I was the one who originally asked the question on GitHub, so I can expand on what I was thinking. For context, I'm using CodeRunner to teach an introductory course on Python 3 for first-year undergraduate students.

The way I am currently structuring my questions is to ask the students to write one (or more) named functions which return some particular required output. My test is then of the form `print(the_function(args))`, and the expected output is the string that this should produce. The feedback that students receive then takes two forms: whether or not they pass the tests, and some general feedback, shown regardless of the student's answer, which includes tips, useful built-in functions that could help, and links back to our teaching resources.

The most important bit of feedback is therefore the result table columns, which show the tests being run as well as the difference between the expected result and the actual output (and hence whether the tests pass).

In fact I have recently made the situation slightly worse than this: because of floating-point errors that can appear in some calculations, I added a formatting command to do some rounding, so the test is now of the form `print(formatting(function(args)))`. Since I didn't want the students to have to think about this formatting, I didn't want the table of tests to display it at all, so I use one of the other test fields (TEST.stdin, sorry) to hold `print(function(args))` and then use a customised Result Columns setting of ["Test", "stdin"]. Maybe there is a better way to do this?
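
To illustrate what I mean (all the names here are invented), the displayed version of the test held in TEST.stdin would be just `print(area_of_circle(2))`, while the code actually executed wraps the call in the rounding helper:

import math

def formatting(value, places=4):
    # Round so that tiny floating-point errors can't change the printed output.
    return round(value, places)

def area_of_circle(r):              # stand-in for the student's function
    return math.pi * r * r

print(formatting(area_of_circle(2)))    # expected output: 12.5664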

What I would like to be able to achieve is to provide some more specific feedback for each test. It seems to me that the TEST.extra field could potentially be used to display such feedback, but I would like to be able to set it dynamically based on, for instance, the output of some custom commands defined in the template.

As an example, let's say I ask for a basic function which doubles a number and returns it as a float. If one of my tests then calls this function with an integer input and the student has forgotten to type cast, they will obviously fail the test. I might then like to give a message like "Check the type of your answer" if and only if the student's function returns an int, not if they type cast but forget to double.
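
A sketch of the sort of per-test check I have in mind (the function name and messages are just invented for illustration):

def double_it(n):                   # a student submission that forgot the cast
    return 2 * n

result = double_it(3)
if not isinstance(result, (int, float)) or isinstance(result, bool):
    print("Your function should return a number.")
elif isinstance(result, int):
    print("Check the type of your answer (a float was expected).")
elif result != 6.0:
    print("Your function didn't double its argument correctly.")
else:
    print("OK")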

Best,

Sam
In reply to Sam Fearn

Re: On "Specific Feedback"

by Richard Lobb -

Thanks Sam. Some interesting questions.

Let me explain how we deal with some of the issues you raise and then we can discuss whether CodeRunner itself needs any extra functionality to provide better feedback.

Firstly, we often put comments within the test code to explain to the student just what we're testing for. For example, we might have a test like

# Check that function returns the
# correct type (should be bool not str)
print(type(is_allowed_to_drink(age)))

Since such tests tend to just clutter the result table for most students, whose functions do return the right type, we would probably set this test to Hide if succeed. That way, only those students whose functions fail the test will see it, and we hope the comment on the test is sufficient feedback for them to fix their code. Even without the comment, the test code itself certainly shows the error and some students might actually learn more without the comment, as they then have to figure out the test code itself.

In tests and assignments we usually set both the Hide if succeed and the Hide rest if fail checkboxes on all except the "For example" tests, so that students see either a green result table or a red one showing the example tests (which you would hope they had already tested with) and the first failing test.

When the test code really does get too complex to expose to students, e.g. with tkinter questions, I do tricks like you describe - the Test column of the result table describes what I'm testing for and the actual test code is in the (hidden) TEST.extra field, using a template that runs both the TEST.testcode and TEST.extra code for each test case. The output from the code in the TEST.extra field is then just something like "OK" or it is a specific message saying what failed. The TEST.expected field is then just set to "OK".
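
In outline, the per-test part of such a template is just the student answer followed by the two code fields, something like this minimal sketch (a real template needs more, e.g. exception handling):

{{ STUDENT_ANSWER }}

# The visible test code (possibly just comments for hidden tests)
{{ TEST.testcode }}

# The hidden checking code, which prints "OK" or a specific failure message
{{ TEST.extra }}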

Another trick I have is to use a template that first checks for the existence of a file _prefix.py among the support files; if this is found, its contents are inserted into the test run before the test code. That file can include quite complex test functions that you can call from the per-test-case test code. This saves you having to repeat the complex code for each test or customise the template to include it.
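
In Python terms the effect is roughly the following. This is a sketch of the preamble of the generated test program, not the template itself:

import os

# If the question author attached a _prefix.py support file, pull its
# contents into the test run before any of the per-test code runs.
if os.path.exists('_prefix.py'):
    with open('_prefix.py') as prefix_file:
        exec(prefix_file.read(), globals())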

Testing code with floating point numbers is always problematic. We teach the format method of a string quite early and use that in tests. For example

print("Average speed {:.2f}".format(student_answer))

You still need to ensure that your test data avoids situations where small errors might change the rounded output. In the example above, if the correct answer were 23.155, the computed answer might be printed as either 23.15 or 23.16.

A much better solution is used by my colleague Jenny Harlow in Matlab courses. She has a sophisticated combinator template grader that extracts the numbers from both the expected answer (generated by running the sample answer) and the student answer and compares them all to a given tolerance. With this approach you have complete control over the feedback but at the cost of a huge increase in complexity.
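
The core idea, stripped of all the grading machinery, is something like the following sketch (this is not Jenny's actual code):

import re
from math import isclose

def numbers_in(text):
    # Extract every numeric token from a block of output text.
    return [float(tok) for tok in re.findall(r'-?\d+\.?\d*(?:[eE][-+]?\d+)?', text)]

def outputs_match(expected, got, rel_tol=1e-4):
    expected_nums = numbers_in(expected)
    got_nums = numbers_in(got)
    return (len(expected_nums) == len(got_nums) and
            all(isclose(e, g, rel_tol=rel_tol)
                for e, g in zip(expected_nums, got_nums)))

print(outputs_match("Average speed 23.155", "Average speed 23.154999998"))   # True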

Perhaps those ideas will help with some of your problems?

Richard

In reply to Richard Lobb

Re: On "Specific Feedback"

by Sam Fearn -

Hi Richard,

Thanks for the reply. I think I was part way towards what I wanted to achieve, and just missing a few points.

"When the test code really does get too complex to expose to students, e.g. with tkinter questions, I do tricks like you describe - the Test column of the result table describes what I'm testing for and the actual test code is in the (hidden) TEST.extra field, using a template that runs both the TEST.testcode and TEST.extra code for each test case. The output from the code in the TEST.extra field is then just something like "OK" or it is a specific message saying what failed. The TEST.expected field is then just set to "OK"."

I don't quite understand this part. You talk about having the Test column of the results describe the test, but also about running both TEST.testcode and TEST.extra. What do you put in the TEST.testcode field in order to do this? If I just put a string obviously I get errors when this gets executed, do you just make it a comment?

"Another trick I have is to use a template that first checks for the existence of a file _prefix.py among the support files; if this is found, its contents are inserted into the test run before the test code."

For my Python questions, would you just do this with an import inside a try statement?

Edit: Apparently blockquote tags don't work?

In reply to Sam Fearn

Re: On "Specific Feedback"

by Richard Lobb -

"I don't quite understand this part. You talk about having the Test column of the results describe the test, but also about running both TEST.testcode and TEST.extra. What do you put in the TEST.testcode field in order to do this? If I just put a string obviously I get errors when this gets executed, do you just make it a comment?"

Sorry, that was a bit confusing. I have two main python question types: the standard one that's used for 95% of questions and a special one for tkinter. The standard one executes both TEST.testcode and TEST.extra, after the student's code. If I wish to hide test code with this question type, I put Python comments in testcode and the hidden executable stuff in extra and possibly in _prefix.py. The tkinter question type executes only the student code and TEST.extra, so in that case the testcode is pure English commentary on what the test is doing.
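
So a hidden test in the standard question type might be filled in roughly like this. It's an invented example: the checking code runs after the student's submission, so it can call their is_allowed_to_drink function directly.

# TEST.testcode (visible in the result table; purely descriptive):
# Check that is_allowed_to_drink returns a bool, not a str

# TEST.extra (hidden; run by the template after the student's code):
answer = is_allowed_to_drink(21)
if isinstance(answer, bool):
    print("OK")
else:
    print("Wrong type: expected bool, got " + type(answer).__name__)

# TEST.expected:
# OK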

"For my Python questions, would you just do this with an import inside a try statement?"

I actually build the program to be tested as a string consisting of any _prefix code, other prefix stuff, the student code and the testing code. Then I either exec that under control of a watchdog timer or run it in a subprocess. However, in simpler questions that aren't using a combinator template grader, you should be able to do as you suggest. Or read it into a string and exec it.
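
In outline, the build-and-run idea looks something like this. It's a much simplified sketch, nothing like the real grader, and all the code fragments are invented examples:

import os
import subprocess
import sys
import tempfile

prefix_code = "def formatting(x):\n    return round(x, 2)\n"
student_answer = "def double_it(n):\n    return float(2 * n)\n"
test_code = "print(formatting(double_it(3)))\n"

# Build the whole test program as a string, write it out, and run it
# in a subprocess with a time limit.
program = "\n".join([prefix_code, student_answer, test_code])
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as source:
    source.write(program)
    path = source.name
try:
    outcome = subprocess.run([sys.executable, path], capture_output=True,
                             text=True, timeout=5)     # crude watchdog timer
    print(outcome.stdout, end='')                      # prints 6.0
finally:
    os.unlink(path)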