CodeRunner: How to grade students properly?

This may be a bit off-topic for this forum, but we do use Python CodeRunner for our questions, and I hope that others here have given this particular topic some thought too!

We are running a programming course for mechanical engineers. The aim of the course is that students learn the basics of programming and can apply them to engineering problems. We really do aim for the basics: the pinnacle of the course is using numpy and matplotlib on easy problems.
The exam consists of two programming questions plus additional theory questions, with a 70%/30% split of the points. Each programming question is worth 35%, and typically students have to implement a function or a class, or write some code that does not involve writing a function at all (like a small script).

We saw that a lot of students already struggle to fully implement such "simple tasks" and thus we give partial points for each of the programming questions. Otherwise, the grades would probably be 80% failed and 20% best grade ;) Especially since they do not study computer science, we want to be somewhat forgiving.
However, those partial points are directly related to the test-cases of CodeRunner and are given manually. The reason for that is: 1) Testing code in-depth is really hard with CR in my opinion (i.e., testing if certain functions were (not) used or if something is hardcoded) and 2) even if we could test in that level of detail, giving points with a fine granularity is also very hard to do in CR (except for even more tests).
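
A static check of the kind mentioned in 1) (was a certain function used or not?) can be sketched with Python's standard ast module. This is only a minimal sketch under assumptions, not a CodeRunner feature: the helper names called_names and uses_loop and the sample student_code are made up for illustration, and the check does not catch aliased or indirect calls.

```python
import ast


def called_names(source):
    """Return the names of all functions/methods called in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # e.g. sum(...)
                names.add(func.id)
            elif isinstance(func, ast.Attribute):  # e.g. np.mean(...)
                names.add(func.attr)
    return names


def uses_loop(source):
    """Return True if the source contains a for or while statement."""
    return any(isinstance(node, (ast.For, ast.While))
               for node in ast.walk(ast.parse(source)))


# Hypothetical student submission, hard-coded here for the example.
student_code = """
import numpy as np

def average(values):
    return np.mean(values)
"""

print("calls mean:", "mean" in called_names(student_code))
print("uses a loop:", uses_loop(student_code))
```

Because the code is only parsed and never executed, such a check can run even when the submission would crash at runtime. A syntax error still makes ast.parse raise SyntaxError.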

But, as you can imagine, this brings trouble. Grading code is a PITA and often not that easy... Of course, I have read Appendix 2 and I totally agree with it. But to change the mode, we would need to move away from the type of questions we are using right now (solving a somewhat complex problem) towards many smaller questions (solving one concept at a time). Because the aim of our course is that they learn to write code for a complex problem, we also want to examine that. Thus, I would like to keep the questions as they are but change the grading scheme. But how?

I'm pretty sure I need a minimal set of requirements for the code, otherwise the submission is 0 points: For example, code submitted must be free from syntax-errors.
Next, the grading must be fair and comparable, i.e., solving the same thing with different code must give the same number of points.
However, everything else ... I'm totally unsure what the best way would be, and I see pros and cons for everything... So before I ramble even more about this, I would like to know the following things:

Are you also grading exam questions with partial points? If so, how do you do it? Simply sum up the working unit-tests?
If not, do you then have more, shorter programming questions? Does this work in such an exam setting, specifically for non-computer-scientists? In such a case, I would really be interested in what your exams look like!

So if you have some best-practice examples or other resources, I would be glad to know about them!

Re: How to grade students properly?

by Sebastian Bachmann - Wednesday, 18 February 2026, 4:46 AM

I thought more about it and also talked to some colleagues. We think that partial grading within a single question will either degrade into guessing or into all-or-nothing anyway.
Thus, this changes my question a bit.

Assume I want to give as an exam question the task to write a class. Say they should write the class BankAccount, which has an attribute account_number, an attribute amount and an attribute record. Next, they should implement the method __add__, which adds the given amount to the account and writes a line in the record string.
I would like to grade now three things: 1) the student can in general work with classes, 2) the student can write a constructor, 3) the student can write the __add__ method. Of course, one depends on the other. Without the class construct, you cannot have a constructor, without the constructor no method that accesses attributes...
So, to be able to grade these three things, I would need to set up three CodeRunner questions. However, for each question, I would need to give the environment that makes the rest of the code work. I.e., to be able to write the __add__ method, I would need to provide a working class. Hence, the question would read like "assume you have a class BankAccount with the given attributes [...], write the method __add__ that [...]".
Or how would you write the questions then?
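
For concreteness, the class described above might look roughly like the following. This is a sketch under assumptions the post does not state: the constructor signature, the format of the record line, and the decision that __add__ modifies the account and returns it are all invented for illustration.

```python
class BankAccount:
    """Illustrative version of the exam task; details are assumptions."""

    def __init__(self, account_number, amount=0.0):
        self.account_number = account_number
        self.amount = amount
        self.record = ""  # one line of text per transaction

    def __add__(self, deposit):
        # Assumed behaviour: update the balance in place, log the deposit,
        # and return the account so that `account = account + 50.0` works.
        self.amount += deposit
        self.record += f"deposit {deposit}\n"
        return self


account = BankAccount("AT-0001", 100.0)
account = account + 50.0
print(account.amount)
print(account.record, end="")
```

The three gradable parts (class construct, constructor, __add__) are visible as separate pieces here, which is what makes the dependency between them awkward to grade inside a single question.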

Re: How to grade students properly?

by Mike McDowell - Wednesday, 18 February 2026, 12:48 PM

I've tried a few different ways over the years (teaching high school computer science here) and find I modify my approach based on what the question is asking. I guess I'd say the types loosely look like this:

- Give detailed requirements for simple questions (for example function must take 3 args in the following order and return this). This seems to work best for basic skill building.

- Use a graduated testing approach:
--- test 1 checks to see if the class exists and if it has the attributes/methods required
--- test 2 checks for function (create an object and test the expected values / returned values)
--- test 3 checks say a helper function
--- test 4 would run the main function and test overall program operation
--- This helps "guide" students in creating their code so the parts are all verified as working before they try to mix it all together in main. I'll also use "hide rest if fail" for the early tests so they're forced to pass those before tackling main.

- Multiple questions
--- Note: It's critical you ensure the quiz Navigation method is 'Sequential'. This way they can NOT go back to a previous question once they've moved on
--- Question 1 tackles the class creation
--- Question 2 starts with the correct answer from question 1 given as pre-answer, then would ask them to implement __add__ and/or whatever else you need
--- Question 3 then starts with the correct answer from question 2 if necessary, etc.

- Insert student code into your hidden code
--- Create your prefab code in the test or global extra, for example; then in the template you would take the __add__ method they wrote in {{ STUDENT_ANSWER }} and insert it in your class
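
The graduated testing approach from the list above can be sketched in plain Python as follows. This is only an illustration outside CodeRunner: in a real question each check would be its own test case with "hide rest if fail" set, and the names check_structure, check_constructor, check_add and run_graduated, as well as the sample submission, are hypothetical.

```python
# Hypothetical student submission with a deliberate bug in __add__.
submission = '''
class BankAccount:
    def __init__(self, account_number, amount=0.0):
        self.account_number = account_number
        self.amount = amount
        self.record = ""

    def __add__(self, deposit):
        self.amount += deposit  # bug: nothing is written to the record
        return self
'''


def check_structure(ns):
    """Test 1: the class exists and defines the required method."""
    cls = ns.get("BankAccount")
    return isinstance(cls, type) and "__add__" in vars(cls)


def check_constructor(ns):
    """Test 2: a new object carries the expected attribute values."""
    acc = ns["BankAccount"]("X1", 10.0)
    return (acc.account_number, acc.amount, acc.record) == ("X1", 10.0, "")


def check_add(ns):
    """Test 3: adding updates the balance and writes to the record."""
    acc = ns["BankAccount"]("X1", 10.0)
    acc = acc + 5.0
    return acc.amount == 15.0 and acc.record != ""


def run_graduated(ns, checks):
    """Run checks in order and stop at the first failure."""
    passed = []
    for check in checks:
        try:
            ok = check(ns)
        except Exception:
            ok = False  # a crash counts as a failed check
        if not ok:
            break
        passed.append(check.__name__)
    return passed


namespace = {}
exec(submission, namespace)
passed = run_graduated(namespace, [check_structure, check_constructor, check_add])
print(passed)
```

With this submission the first two checks pass and the third fails, so the student would get credit for the class and the constructor but not for __add__.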

If you have other ideas or thoughts I'm all ears!
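
The last option in the list above (inserting student code into hidden code) can be imitated in plain Python like this. It is a rough sketch, not an actual CodeRunner template: student_answer is hard-coded here where a real template would receive the student's text through template expansion, and the prefab class and the record format are assumptions.

```python
import textwrap

# Stand-in for the student's answer: just the method, written at top level.
student_answer = '''
def __add__(self, deposit):
    self.amount += deposit
    self.record += f"deposit {deposit};"
    return self
'''

# Hidden prefab code supplied by the question author.
prefab = '''
class BankAccount:
    def __init__(self, account_number, amount=0.0):
        self.account_number = account_number
        self.amount = amount
        self.record = ""
'''

# Indent the student's method by one level so it lands inside the class body.
program = prefab + textwrap.indent(textwrap.dedent(student_answer), "    ")

namespace = {}
exec(program, namespace)

account = namespace["BankAccount"]("AT-0001", 100.0) + 25.0
print(account.amount, account.record)
```

The student is then graded only on the method, while the constructor is guaranteed to work.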

Re: How to grade students properly?

by Richard Lobb - Thursday, 19 February 2026, 4:12 PM

Some good topics for discussion here - thanks Sebastian.

There's certainly no right way to assess students. Every course has its own different priorities, constraints and pedagogy.

It's going to be very difficult to assess programming skills in just 30% of an exam, Sebastian. So you have my sympathies. We have 100% of a 3-hour exam in which to assess introductory programming, and even that is challenging. But I think moving to more smaller questions, as you propose in your second email, will help.

Mike's suggestion of using a quiz with sequential navigation is an interesting one. I've never tried that, though I prefer not to constrain students' navigation if possible.

To address that problem I think I'd probably ask three independent object-oriented questions rather than a chain of inter-dependent ones. For example:

- Write a simple class with specified attributes and no custom methods (essentially a data class).
- Write some completely different class that has more complex behaviour involving custom methods.
- Write a class that behaves as shown in the examples. [This sort of question requires students to infer attributes and methods from the observed behaviour and is for the A-level students only.]

As an example of the third question: "Write a class Whatsit that behaves as shown in the examples below", where one of the tests is

thing = Whatsit("Rover", 10.4, ['Woof'])
print(thing)
thing.teach('Bark')
print(thing)
thing.heavier_by(1.002)
thing.teach('Woof')
print(thing)
thing.unteach('Woof')
print(thing)
thing.unteach('Woof')
print(thing)
thing.unteach('Miaow')
print(thing)
thing.unteach('Bark')
print(thing)

and the expected output from that test is

Rover (10.4 kg) Woof
Rover (10.4 kg) Woof Bark
Rover (11.4 kg) Woof Bark Woof
Rover (11.4 kg) Bark Woof
Rover (11.4 kg) Bark
Error: Rover can't Miaow
Rover (11.4 kg) Bark
Rover (11.4 kg)
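
For readers who want to see what students would have to infer, one class consistent with the test and expected output above is sketched below. It is only one possible answer: the attribute names, the one-decimal weight format and the rule that unteach removes the first matching entry are inferred from this single example.

```python
class Whatsit:
    """One possible class matching the example behaviour above."""

    def __init__(self, name, weight, tricks):
        self.name = name
        self.weight = weight
        self.tricks = list(tricks)  # copy so the caller's list is untouched

    def teach(self, trick):
        self.tricks.append(trick)

    def unteach(self, trick):
        if trick in self.tricks:
            self.tricks.remove(trick)  # removes the first occurrence only
        else:
            print(f"Error: {self.name} can't {trick}")

    def heavier_by(self, extra):
        self.weight += extra

    def __str__(self):
        # Joining avoids a trailing space when there are no tricks left.
        return " ".join([f"{self.name} ({self.weight:.1f} kg)"] + self.tricks)


thing = Whatsit("Rover", 10.4, ['Woof'])
thing.teach('Bark')
thing.heavier_by(1.002)
print(thing)
```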

Of course, with three independent questions like this, students have more code to write in total. But you could largely resolve that by preloading the answer box with as much of the answer as you like (e.g. with functionality already examined in earlier questions). In our exams, students have an IDE with the usual on-line help (excluding AIs!), all their lecture notes, learning module summaries, etc. We think this makes for a more authentic assessment. In this context, preloading an answer box with some code that constructs an object isn't giving anything away.

Re: How to grade students properly?

by Sebastian Bachmann - Friday, 20 February 2026, 8:23 AM

Thank you both for your insights! I really like the idea of making the questions gradually harder. I also like the approach of sequential questions - but I'm not sure if the students like it ;)
In any case, I think we need to extend the time for the exam. With two (short) programming exercises in the exam, it was okay, but the time to read the questions will of course increase the more questions we add.
3 hours of exam are quite a luxury! We have two exams with 45 min each - mainly because we have 500+ students and not enough personnel, thus we need to have multiple slots per day.

We already allow the usage of an IDE, next to CodeRunner - however, and that is very interesting and a bit off-topic, a lot of students hardly ever use the features of the IDE! They only use it as a text editor. We see that they never test or even debug code in the IDE. We show the IDE in the lecture and also explain how to run stuff and debug it. So I don't know why they don't use it. I cannot tell for sure, but my feeling is that, due to the availability of LLMs, they don't do any real programming or debugging during the preparation for the exam... It was very strange to see that many can write down okay-ish code but then struggle very hard to find typos or other smaller bugs in their code - even though the IDE shows red signs and the error messages are very descriptive. Another possibility is that they struggle with time and would rather run the CodeRunner tests than debug in the IDE.
We have to think about the open-book approach. I understand and agree that it is a bit more realistic that way. I think that shifts the focus of the exam a bit, from remembering the syntax towards algorithmic thinking - which I would welcome! At the moment, we have settled for easier questions and also give hints if they require something that was only in a few of the examples during the lecture, but we allow no resources other than an IDE and the integrated Python help. I guess if the difficulty is increased a bit, one could also allow more resources.

Re: How to grade students properly?

by Mike McDowell - Saturday, 21 February 2026, 7:17 AM

Maybe this is best for another thread, but are you using an IDE alongside CodeRunner in Safe Exam Browser? Or only when working on open book assignments?

I've been kicking the idea around but wanted to make it available while assessing in safe lockdown browser.