Using the regular expression grader

by Richard Lobb -

A teacher asked me the following question:

I had this silly question in a tutorial for absolute beginners to get a feeling of the Pascal syntax:

Write a program to print a "visiting card" similar to this:

------------------------------
  25/D, Gamanarambha Veediya
  Kosmadalawa 12450
------------------------------
    • The output has four lines
    • Lines 1 and 4: any number of hyphens
    • Lines 2 and 3: any text, can't be empty
So the "answer" could be,
 
program printVisitingCard;
  begin
  WriteLn('-----------------------');
  WriteLn(' Gemunu Jayawarsiri');
  WriteLn('  24B, Talawatugoda Road');   
  WriteLn('-----------------------');
end.
So the output can not be taken literally, only the pattern should match. What is the way of doing this?
 
My response
 
It's not a silly question at all. Teachers learning to use CodeRunner often stumble over this sort of thing, as they're used to asking questions in which the output requirements are rather vaguely specified.
 

You can't use the default 'exact match' grader on questions of this sort. You must either use a regular expression grader or write your own custom grader that intercepts the output and prints helpful messages like "I expected 4 lines but got only 3". Custom graders are a lot more work and are not usually worth the effort except when you are creating a new question type that you wish to use repeatedly.

To specify the expected answer with a regular expression you must click the Customise checkbox and then select Regular Expression from the Grading dropdown. The enter the regular expression you want into the test case's Expected field. The in-line help explains a bit about the required syntax.

In the example above, a possible regular expression is:

\A-+\n.+\n.+\n-+\Z

Here, \A matches the start of the output and \Z matches the end. Note that there is no newline at the end of the expected output because CodeRunner strips all trailing white space from the output, as well as any trailing white space on each line.

BUT: I prefer to avoid questions that require a regular expression grader, as the students don't understand the result table "Expected" column, and the Show Differences button isn't available. It's nearly always easier to ask students to write programs that generate precisely-specified output. That might seem overly fussy, but programming is like that. The compilers/interpreters are fussy about requiring precise syntax in the program, so why not be equally fussy about requiring exact output?

Hiding the regular expression

As a final comment, if you do wish to use regular expression grading but don't want the students to see the regular expression you're using, you can hide the Expected column from the result table by setting a custom value for the Result columns field in the (customised) question authoring form. That means you won't confuse the students by showing them the regular expression. Instead, they will be confused when they get marked wrong but don't know why!

However, with small classes, and a teacher on hand to help, that might be quite acceptable.

The default value of the Result columns field is

[["Test", "testcode"],["Input", "stdin"], ["Expected", "expected"], ["Got", "got"]]

To hide the Expected column, set the field to

[["Test", "testcode"],["Input", "stdin"], ["Got", "got"]]

Note that empty columns are not displayed in the result table, so in fact only the Got column would appear in the above question, which would look like this:

image.png

Displaying images in the CodeRunner feedback

by Richard Lobb -

Sometimes we want to ask students to write programs that display images. For example "Write a program that uses matplotlib to display a graph showing two cycles of a sine wave". Or "Write a function that uses the GraphViz "dot" program to plot the Huffman coding tree for a given set of character frequencies". What a shame we can't ask such questions in CodeRunner.

Well, actually, we can! For example:

Example question showing matplotlib image in feedback

In this post I'll show how you can author questions that capture and display images generated by student code. The two key ideas are:

  1. CodeRunner's combinator template graders give you complete control of the feedback, including the ability to include arbitrary html.
  2. Data URIs allow you to embed the entire contents of an image within the URL (which of course is itself within the HTML).

Combinator Template Graders

If you click the Customise checkbox in the question authoring form, a Customisation section appears as shown in the image below. This lets you edit the template (more on this later) and to set the way in which the question is graded, via a Grading drop down menu. If you select Template grader instead of the default Exact match, as shown in the image below, the role of the template changes. It must now grade the student's submission, rather than just run it. 

Question author form showing Customisation panel

There are two different types of template grader: per-test template graders and combinator template graders. The former, which is what you must write if the Is combinator checkbox is not checked, has to deal with only a single test case. It essentially defines one row of the result table for each run. The latter (combinator template graders), which are what we're concerned with here, are what you must write if Is combinator is checked. Combinator template graders are considerably more powerful but also more difficult to write. They are responsible for grading all test cases and defining the entire result table (if there is to be one) plus two optional additional HTML feedback sections, one before the result table and one after. These are called the prologuehtml and the epiloguehtml respectively. The image(s) we wish to return from the run can be placed in one or other (or both) of those.

The output from a combinator template grader must be a JSON string. The string is documented in the on-line help for the Grading element of the form:

If the template is a combinator, the JSON string output by the template grader should again contain a 'fraction' field, this time for the total mark, and may contain zero or more of 'prologuehtml', 'testresults', 'epiloguehtml', 'columnformats' and 'showdifferences'. The 'prologuehtml' and 'epiloguehtml' fields are html that is displayed respectively before and after the (optional) result table. The 'testresults' field, if given, is a list of lists used to display some sort of result table.

You can read further details in the on-line help itself; for now I wish to focus just on the epiloguehtml which is an html string, normally appearing after the result table. For now we'll assume that the only feedback we wish to give the student is the image itself. We'll further assume that the "grader" will mark any image correct. The output from the template grader then needs to have only the fraction and the epiloguehtml fields. For example, if the latter is simply a cheery message, the output from the template grader could be something of the form

{
    fraction: 1.0,
    epiloguehtml: "<h2>Well done!</h2>"
}

To see such a combinator template grader in action, create a Python3 question, customise it, select a Template grader, leave Is combinator checked and set the template to the code

import json
print(json.dumps({"fraction": 1.0, "epiloguehtml": "<h2>Well done!</h2>"}))

If you test drive this question, submitting any answer at all (it doesn't need to be code), you should see something like the following:

Trivial combinator template grader output

Now we'll look into how to include an image in that feedback.

Data URIs

Data URIs, also known as Data URLs, are URIs where the data that the URI "defines" is encoded within the URI itself rather than being pointed to by a traditional URL. We wish to encode image data into the URI and since the image data is (usually) binary we need to first encode it in base64. If we confine our attention to either png or jpeg images, the format of our data URIs will be either:

  1. data:image/png;base64,<filedatabase64> or
  2. data:image/jpeg;base64,<filedatabase64>

where <filedatabase64> is the contents of the file in base64.

Knowing that, a Python function to build a Data URI from a given png or jpeg image file is the following:

import base64
def make_data_uri(filename):
    """Given a png or jpeg image filename (which must end in .png or .jpg/.jpeg 
       resp.) return a data URI as a UTF-8 string.
    """
    with open(filename, 'br') as fin:
        contents = fin.read()
    contents_b64 = base64.b64encode(contents).decode('utf8')
    if filename.endswith('.png'):
        return "data:image/png;base64,{}".format(contents_b64)
    elif filename.endswith('.jpeg') or filename.endswith('.jpg'):
        return "data:image/jpeg;base64,{}".format(contents_b64)
    else:
        raise Exception("Unknown file type passed to make_data_uri")

Running the student code safely

The biggest complication with template graders in general, and particularly with combinator template graders, is that they must output the specified JSON record defining the feedback. They must do this even if the student code is faulty, or throws an exception or segment faults or times out, etc. This means the code has to be run separately from the template itself in some sort of secondary sandbox with a watchdog timer.

In Python, the subprocess module is usually the best tool for the job, with a piece of code like the following:

import subprocess
output = ''
failed = False
try:
    outcome = subprocess.run(
        ['python3', '-c', student_code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout = 2, # 2 second timeout MUST BE LESS THAN JOBE TIMEOUT
        universal_newlines=True,
        check=True
    )
except subprocess.CalledProcessError as e:
    outcome = e
    output = "Task failed with return code = {}\n".format(outcome.returncode)
    failed = True
except subprocess.TimeoutExpired as e:
    outcome = e
    output = "Task timed out\n"
    failed = True
output += outcome.stdout
if outcome.stderr:
    output += "*** Error output ***\n"

Note that the watchdog timer (in this case the timeout parameter to subprocess.run) must be less than the Jobe timeout for the task, which for Python defaults to 3 seconds. Otherwise an endless loop in the student code will result in the entire run being aborted by Jobe and the required JSON grader output won't happen.

Generating the JSON output

Suppose we are expecting the student code to generate an image file called outputimage.png. Then, having run the student code in a subprocess as above, the template grader can finish by picking up that image, encoding it into a data URI, and inserting that data URI into the epiloguehtml field of the final JSON output. For example:

html = ''
if output:
    html += "<pre>{}</pre>".format(output)
if not failed and os.path.isfile('outputimage.png'):
    data_uri = make_data_uri('outputimage.png')
    html += """<img class="data-uri-example" title="Output image" src="{}" alt="outputimage.png">
    """.format(data_uri)
# Lastly print the JSON-encoded result required of a combinator grader
print(json.dumps({'epiloguehtml': html,
                  'fraction': 0.0 if failed else 1.0
}))

Note that any textual output from the program precedes the image in the displayed feedback.

Picking up matplotlib images

The example at the start of this post shows a Python matplotlib question. Normally the student code for such a question will not explicitly generate a file; rather, the image will be displayed on screen when the code is run. To capture the matplotlib output into an image file when running on Jobe, we must prefix the student code with a request to matplotlib to use a non-interactive backend such as Agg, which writes figures to a png file. We also have to add suffix code to call plt.savefig.

The attached file matplotlibimagedemo.xml is an export of the question shown at the start of this posting. Also attached in the file matplotlibimagecapturetemplate.py is the template code for that question.

Grading the output

The astute reader will have noticed that while this question produces nice-looking output, the student gets given full marks for any code at all that runs correctly, regardless of whether it produces an appropriate output image or not. So ... it's useless, right?

Well, maybe not totally useless. That particular question does let students experiment with matplotlib code and see the output images in, say, a tutorial context. But it's certainly useless from an assessment point of view.

However, all is not lost. If what you're trying to grade is the matplotlib image properties, you can pick up the current axes (after the student code is run) and check various attributes such the title, axis labels, data plotted, etc against the expected ones and issue part marks accordingly.

But ... this blog has gone on long enough, so I'll finish with an example Python function that prints out the properties of the current matplotlib axes object (assumed to be a bar chart). You can use that as a starting point for your own questions.

def print_plot_info():
    """Output key attributes of current plot, as defined by plt.gca()"""
    try:
        current_axes = plt.gca()
        subplot = current_axes.axes   # The actual plot, assuming just one subplot
        print("Plot title: '{}'".format(current_axes.title.get_text()))
        print("X-axis label: '{}'".format(subplot.get_xlabel()))
        print("Y-axis label: '{}'".format(subplot.get_ylabel()))
        print("Number of bars:", len(subplot.patches))
        x_tick_labels = [label.get_text() for label in subplot.get_xticklabels()]
        if all(label.strip() == '' for label in x_tick_labels):
            x_tick_labels = [str(pos) for pos in subplot.get_xticks()]
        print("\nX-axis ticks:\n" + ', '.join(x_tick_labels))
        y_tick_labels = [label.get_text() for label in subplot.get_yticklabels()]
        if all(label.strip() == '' for label in y_tick_labels):
            y_tick_labels = [str(pos) for pos in subplot.get_yticks()]
        print("\nY-axis ticks:\n" + ', '.join(y_tick_labels))
        bar_height_strings = [format(bar.get_bbox().ymax, '.0f') for bar in subplot.patches]
        print("\nBar heights:\n" + ', '.join(bar_height_strings))
    except Exception as exception:
        print("Failed to get plot info:", str(exception))

Happy authoring!

-- Richard