CodeRunner: Assessing code quality

A user emailed me a question along the lines of "How do we go about assessing code quality in a CodeRunner question. I thought my answer might be of interest to others, so here is my (edited) response:

Assessing code style at the university of Canterbury

Your mention of metrics of structure, coupling, cohesion suggest a software engineering flavour, with reasonable size programs. I mostly teach introductory programming nowadays, where students are generally writing lots of smallish functions (at most 20 lines) introducing various aspects of the language (Python). They develop bigger programs in the assignments but in those we do most of the design and get the students to write the component functions, building in a series of questions towards the final product, in which they assemble all the components they've developed. Towards the end of the assignment they do have to break a few larger tasks into multiple functions but usually only 2 or 3. This style of assignment doesn't offer scope for quality metrics like coupling and cohesion. Instead we're concerned with much more rudimentary aspects like function length, depth of nesting, choice of identifiers and comments.

Some years ago we introduced pylint as a pre-processor for all our Python questions and this was a real game changer. Pylint is an industrial-strength Python style checker with a large set of style rules - see here. Prior to pylint we had some students writing monstrously complicated code with huge functions, despite all our style rules. That sort of nonsense stopped overnight. Code that isn't pylint compliant just doesn't get run, so long functions, deep nesting, excessively many branch points etc don't even make it to the start line.

As time has passed, however, pylint has kept adding more and more professional level rules while removing some of the ones that were more appropriate in introductory programming courses, such as preventing the use of global variables. So we also have our own style checker which does a parse of the submitted code and enforces a set of our own rules including ones that allow us to require or proscribe the use of certain programming constructs, as appropriate for the Python aspects we're teaching. We're now starting to wonder whether we should give up on pylint and just implement the rules that matter to us.

A big problem of course is that no automatic checker can measure the appropriateness of identifiers and comments ("docstrings" mostly, in Python). Pylint enforces the use of 3 or more characters for variable names except for special cases like i, j, s (generic string), c (generic character) etc. It also enforces the existence of docstrings for all functions and modules. These rules at least make students aware that these are things that "real programmers" care about, but it doesn't stop them using variable names like 'aaa' (when their attempt to use just 'a' is thwarted) etc. And a typical docstring is "asdfasdfa".

We used to attempt to deal with this by human style marking the final assignment but it is hugely time-consuming and contributes relatively little to their final grade. In our new first year engineering course this year, with over 1000 students in the first semester, we gave up on that because we simply didn't have the resources for such marking. I think that's a bit sad, but I've not seen any significant change in code quality as a result. Doing human-based tyle marking in the last assignment of the year is a bit of a waste of time, to be honest.

In our introductory C programming course we do something similar but more basic. Prior to running it, we parse students' C code in Python with pycparser, and check things like function lengths, number of parameters, function complexity. We also enforce a layout standard by passing their code through a layout program (astyle) and if their code changes in layout, we reject it.

I should have said that all the above checks (except human style marking) can be run by the student with a Precheck button before they submit their code for correctness marking. So it doesn't cost them any marks. Style isn't really something we mark, it's something we require as a starting point.

If you're interested in our question types that do all the above, they're available in a github repository here: https://github.com/trampgeek/ucquestiontypes. But be warned that the questions are appallingly complicated and not suitable for newcomers. You're totally welcome to try using them out of the box but you probably won't be able to bend them to your own uses.

Good luck!

Richard