To share my approach: I use a pixel-by-pixel comparison. In the customisation section of the question, I add code something like this:
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread
from scipy.linalg import norm
import numpy as np
#from skimage.measure import compare_ssim as ssim

def ans(y_vals):
    # reference answer: draw the expected graph and return the plt object
    plt.plot(y_vals)
    return plt

def save_plt(plt, fname):
    plt.savefig(fname)
    plt.close()

def main(file1, file2):
    # read both images as float arrays and normalise them to the range 0-255
    img1 = normalize(imread(file1).astype(float))
    img2 = normalize(imread(file2).astype(float))
    #return ssim(img1, img2, multichannel=True)
    diff = img1 - img2
    # count the entries that differ (zero-"norm") and return them as a fraction
    n_0 = norm(diff.ravel(), 0)
    return n_0 * 1.0 / img1.size

def normalize(arr):
    # rescale values to 0-255 so overall brightness differences don't matter
    rng = arr.max() - arr.min()
    if rng == 0:
        rng = 1
    amin = arr.min()
    return (arr - amin) * 255 / rng
{{ STUDENT_ANSWER }}
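For reference, a correct submission to this particular question, which gets substituted in at {{ STUDENT_ANSWER }}, might be something like the following sketch (make_graph is just the function name this example question asks for, matching the test code below):

def make_graph(y_vals):
    # plot the values and hand back the plt object so the template can save the figure
    plt.plot(y_vals)
    return plt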
The student will have to return the plt object so that the image they created with their function can be saved. Then I test the submission like this:
y_vals = [1, 4, 9, 16]
plt1 = make_graph(y_vals)
save_plt(plt1, "output.png")
plt2 = ans(y_vals)
save_plt(plt2, "ans.png")
val = main("output.png", "ans.png")  # fraction of pixel values that differ
if val <= 0.0001:
    print("Pass!")
else:
    print("Your graph is different to the expected output, difference: {:.4f}".format(val))
Here I can set a threshold value to control how much the student's image is allowed to differ from the expected one. The example above only draws a simple line, so I expect very little difference and use a low threshold.
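To make the threshold concrete: the value returned by main is the fraction of normalised array entries that differ between the two images. A minimal standalone sketch (not part of the template; the array sizes are arbitrary):

import numpy as np

# two 100x100 "images" that differ in 5 of their 10 000 entries
a = np.zeros((100, 100))
b = a.copy()
b[0, :5] = 255

frac = np.count_nonzero(a - b) / a.size
print(frac)  # 0.0005, which a threshold of 0.0001 would reject as too different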
However, I do have issues as mentioned by Richard. For example, I ask students to add xticks, but sometimes their locations are shifted by one, which causes a lot of pixel differences. In such cases I retrieve the xtick values from the student's plot and compare them with the expected values. In this way, each question can add additional checks to ensure the produced answer is correct. It works in simple cases, but it needs careful testing against test cases and anticipated student submissions to make sure students are marked correctly.
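For what it's worth, a rough sketch of that extra xtick check might look like the following. The expected_xticks values and the reuse of make_graph and y_vals from the earlier test are assumptions about one particular question, not part of the actual template, and the check has to run before save_plt closes the figure (or on a freshly created plot):

import numpy as np

y_vals = [1, 4, 9, 16]
expected_xticks = [0, 1, 2, 3]  # whatever the question specifies

plt_student = make_graph(y_vals)  # student's function, returns the plt object
actual_xticks = plt_student.gca().get_xticks()

if len(actual_xticks) == len(expected_xticks) and \
        np.allclose(np.sort(actual_xticks), np.sort(expected_xticks)):
    print("xticks match")
else:
    print("xticks differ:", actual_xticks)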