Testing C++, floating point, student answers, with tolerance

by Richard Lobb
I think you've started by copying the simple example of a template grader from the documentation, but the assumption there was that the student answer was just a simple bit of text to be graded, rather than a program to be run.

The Twig STUDENT_ANSWER variable is whatever the student enters into their answer box. So in your case it's a C++ program, which you first need to compile and run in order to get the output it generates.

Here's a simplistic per-test template grader that assumes the output from the student program is just a single floating-point number, which it compares to the expected value within a fixed absolute tolerance. I attach a question that uses this template; it asks students to write their own Newton root finder.
import subprocess, json

TOLERANCE = 0.0001

__student_answer__ = """{{ STUDENT_ANSWER | e('py') }}"""
test = json.loads("""{{TEST | json_encode }}""")

with open("prog.cpp", "w") as outfile:
    outfile.write(__student_answer__)
    
fraction = 0 # Pessimism
abort = False

compile_result = subprocess.run(
    ['g++', 'prog.cpp', '-o', 'prog'],
    capture_output=True,
    text=True,
)

if compile_result.returncode != 0:
    got = f"*** Compile error ***\n{compile_result.stderr}"
    abort = True
else:
    # Compile OK. Try running it.
    try:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so the run must sit inside the try block to be reported cleanly.
        result = subprocess.run(['./prog'], capture_output=True, text=True, check=True)
        got = result.stdout
        got_float = float(got)
        expected_float = float(test['expected'])
        if abs(got_float - expected_float) <= TOLERANCE:
            fraction = 1
    except ValueError:
        got = f"Output was '{got.rstrip()}'\nCannot be converted to a float.\n"
        got += "Further testing aborted."
        abort = True
    except Exception as e:
        got = f"Unexpected error: {e}"

result = {"fraction": fraction, "got": got, "abort": abort}
print(json.dumps(result))
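
The grader above uses a fixed absolute tolerance, which works for answers of roughly unit size but becomes too strict for large values and too loose for very small ones. If that matters for your questions, one option is Python's standard `math.isclose`, which accepts both an absolute and a relative tolerance. The sketch below is only an illustration; the helper name `floats_match` and its default values are my own assumptions, not part of CodeRunner.

```python
import math

def floats_match(got, expected, abs_tol=0.0001, rel_tol=0.0):
    """Return True if got is within tolerance of expected.

    Hypothetical helper. With rel_tol=0.0 this is the same test as
    abs(got - expected) <= abs_tol; a non-zero rel_tol also accepts
    differences up to rel_tol times the larger of the two magnitudes.
    """
    return math.isclose(got, expected, rel_tol=rel_tol, abs_tol=abs_tol)
```

In the template, the comparison `abs(got_float - expected_float) <= TOLERANCE` could then be replaced by a call such as `floats_match(got_float, expected_float)`.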

As the text of the attached question explains, this is inefficient as it does a separate compile-and-run for each test. You really should use a combinator template grader instead, iterating through all the test cases, but that does add extra complexity.
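
To give a feel for the extra complexity, here is a minimal sketch of the grading loop a combinator template grader might use after compiling the program once. It is not a complete template. The `run_program` callable and the plain list of test dictionaries are hypothetical stand-ins for a `subprocess.run` wrapper and the Twig `TESTCASES` list, and the `testresults` table layout (a header row followed by one row per test, with an `iscorrect` column) is written from memory, so check it against the CodeRunner documentation before relying on it.

```python
import json

TOLERANCE = 0.0001  # Same absolute tolerance as the per-test grader

def grade_all(tests, run_program, tolerance=TOLERANCE):
    """Grade every test case against a single compiled program.

    tests: list of dicts with 'stdin' and 'expected' keys (assumed shape).
    run_program: callable taking stdin text and returning stdout text
                 (hypothetical; a real grader would wrap subprocess.run).
    """
    rows = [["iscorrect", "Expected", "Got"]]  # Header row of the result table
    num_right = 0
    for test in tests:
        got = run_program(test["stdin"])
        try:
            ok = abs(float(got) - float(test["expected"])) <= tolerance
        except ValueError:
            # Output (or expected value) is not a valid float: mark as wrong
            ok = False
            got = f"Output was '{got.rstrip()}' (not a float)"
        num_right += ok
        rows.append([ok, test["expected"], got.rstrip()])
    fraction = num_right / len(tests) if tests else 0
    return {"fraction": fraction, "testresults": rows}

# Demonstration with canned outputs instead of a real compiled program
fake_outputs = {"1": "1.41421\n", "2": "oops\n"}
demo_tests = [
    {"stdin": "1", "expected": "1.4142"},
    {"stdin": "2", "expected": "2.0"},
]
print(json.dumps(grade_all(demo_tests, fake_outputs.get)))
```

In the demonstration the first test passes and the second fails, so the overall fraction is 0.5. A real combinator grader would also need to handle compile errors, runtime errors and timeouts before reaching this loop.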