To share my approach: I use a pixel-by-pixel comparison. In the template customisation section, I add code something like this:

    import matplotlib.pyplot as plt
    from matplotlib.pyplot import imread
    from scipy.linalg import norm
    import numpy as np
    #from skimage.measure import compare_ssim as ssim

    def ans(y_vals):
        plt.plot(y_vals)
        return plt

    def save_plt(plt, fname):
        plt.savefig(fname)
        plt.close()

    def main(file1, file2):
        # read images as float arrays and rescale them to a common range
        img1 = normalize(imread(file1).astype(float))
        img2 = normalize(imread(file2).astype(float))
        #return ssim(img1, img2, multichannel=True)
        diff = img1 - img2
        # compare: the zero-norm counts the pixels that differ
        n_0 = norm(diff.ravel(), 0)
        return n_0 * 1.0 / img1.size

    def normalize(arr):
        # rescale pixel values to the range 0..255
        rng = arr.max() - arr.min()
        if rng == 0:
            rng = 1
        amin = arr.min()
        return (arr - amin) * 255 / rng

{{ STUDENT_ANSWER }}

The student has to return the plt object so that the image their function creates can be saved. Then I test the code like this:

    y_vals = [1, 4, 9, 16]
    plt1 = make_graph(y_vals)
    save_plt(plt1, "output.png")

    plt2 = ans(y_vals)
    save_plt(plt2, "ans.png")

    val = main("output.png", "ans.png")
    if val <= 0.0001:
        print("Pass!")
    else:
        print("Your graph is different to the expected output, difference: {:.4f}".format(val))

Here I can set a threshold value to adjust how much the images may differ. The example above draws only a simple line, so I expect very little difference and set a low threshold.
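To get a feel for what the score in main() means: the zero-norm counts the pixels that differ, so the returned value is the fraction of differing pixels. A minimal sketch with synthetic arrays (no image files involved; the array values here are made up for illustration):

```python
import numpy as np
from scipy.linalg import norm

# two 10x10 "images" that differ in exactly 3 pixels
a = np.zeros((10, 10))
b = a.copy()
b[0, :3] = 255.0

diff = a - b
n_0 = norm(diff.ravel(), 0)  # zero-norm: number of nonzero entries
score = n_0 / a.size
print(score)  # 0.03, i.e. 3% of pixels differ
```

So a threshold of 0.0001 on a typical savefig-sized image allows only a handful of stray pixels, which is why it works for near-identical line plots but is harsh on small layout shifts.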

However, I do have issues, as mentioned by Richard. For example, I ask students to add xticks, but sometimes their locations are shifted by one, causing a lot of pixel differences. In such cases I retrieve the xtick values and compare them with the expected values. In this way, each question can add additional checks to ensure the produced answer is correct. It works in simple cases, but it needs careful testing with test cases and anticipated student submissions to ensure students are marked correctly.
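One way to do that xtick comparison, as a sketch: check_xticks is a hypothetical helper of my own naming, and it assumes the grading code can inspect the pyplot state before the figure is closed:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, as on a grading server
import matplotlib.pyplot as plt
import numpy as np

def check_xticks(plt_obj, expected, tol=1e-6):
    """Return True if the current axes' xtick locations match the expected values."""
    actual = plt_obj.gca().get_xticks()
    return len(actual) == len(expected) and np.allclose(actual, expected, atol=tol)

# illustrative use with a plot like the earlier example
plt.plot([1, 4, 9, 16])
plt.xticks([0, 1, 2, 3])
ok = check_xticks(plt, [0, 1, 2, 3])
plt.close()
```

Comparing tick locations directly sidesteps the one-pixel-shift problem entirely, since a tick at the right data coordinate passes regardless of where it lands on the rendered canvas.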