CodeRunner Documentation (V3.0.0)

12 APPENDIX: How programming quizzes should work

Historical notes and a diatribe on the use of Adaptive Mode questions ...

The original pycode was inspired by CodingBat, a site where students submit Python or Java code that implements a simple function or method, e.g. a function that returns twice the square of its parameter plus 1. The student code is executed with a series of tests cases and results are displayed immediately after submission in a simple tabular form showing each test case, expected answer and actual answer. Rows where the answer computed by the student's code is correct receive a large green tick; incorrect rows receive a large red cross. The code is deemed correct only if all tests are ticked. If code is incorrect, students can simply correct it and resubmit.

CodingBat proves extraordinarily effective as a student training site. Even experienced programmers receive pleasure from the column of green ticks and all students are highly motivated to fix their code and retry if it fails one or more tests. Some key attributes of this success, to be incorporated into pycode, were:

  1. Instant feedback. The student pastes their code into the site, clicks submit, and almost immediately receives back their results.

  2. All-or-nothing correctness. If the student's code fails any test, it is wrong. Essentially (thinking in a quiz context) it earns zero marks. Code has to pass all tests to be deemed mark-worthy.

  3. Simplicity. The question statement should be simple. The solution should also be reasonably simple. The display of results is simple and the student knows immediately what test cases failed. There are no complex regular-expression failures for the students to puzzle over nor uncertainties over what the test data was.

  4. Rewarding green ticks. As noted above, the colour and display of a correct results table is highly satisfying and a strong motivation to succeed.

The first two of these requirements are particularly critical. While they can be accommodated within Moodle by using an adaptive quiz behaviour in conjunction with an all-or-nothing marking scheme, they are not how many people view a Moodle quiz. Quizzes are commonly marked only after submission of all questions, and there is usually a perception that part marks will be awarded for "partially correct" answers. However, awarding marks to a piece of code according to how many test cases it passes can give almost meaningless results. For example, a function that always just returns 0, or the empty list or equivalent, will usually pass several of the tests, but surely it shouldn't be given any marks? Seriously flawed code, for example a string tokenizing function that works only with alphabetic data, may get well over half marks if the question-setter was not expecting such flaws.

Accordingly, a key assumption underlying CodeRunner is that quizzes will always run in Moodle's adaptive mode, which displays results after each question is submitted, and allows resubmission for a penalty. The mark obtained in a programming-style quiz is thus determined by how many of the problems the student can solve in the given time, and how many submissions the student needs to make on each question.