Testing c++, floating point, student answers, with tolerance

by William Oquendo -
Number of replies: 1

Dear all,

I want to test some students' answers with a tolerance (e.g. 5% relative). Currently I have just been comparing exact answers but with fewer decimals, which is not ideal. I am pretty sure the combinator template grader is the answer, but I have not been able to implement it successfully. I have read:

and tried to implement it using the function at the end (it is in Python, but as far as I understand it can work with C++ programs because it just captures the expected output and the student's answer). But to be honest, I don't even know where to put the testing function, and it seems that LLMs are confusing old and new CodeRunner terms or just hallucinating fields where to put this, so I am stuck.

If someone has a clue about a simple implementation for C++ questions (I mention the language just in case it is important), or if there is an actual example question implementing this, I would be glad if you shared it with me.

Thanks in advance.


import json

# Get the student's output and the expected output
got = """{{ STUDENT_ANSWER | e('py') }}"""
expected = """{{ TEST.expected | e('py') }}"""

# Define the relative tolerance (5%)
tolerance = 0.05

try:
    # Convert the outputs to floating-point numbers
    got_float = float(got)
    expected_float = float(expected)

    # Compare the numbers with a relative tolerance
    if abs(got_float - expected_float) <= tolerance * abs(expected_float):
        # If the answer is correct, print a JSON object with a fraction of 1
        result = {"fraction": 1.0, "got": got}
    else:
        # If the answer is incorrect, print a JSON object with a fraction of 0
        result = {"fraction": 0.0, "got": got}

except ValueError:
    # If the output cannot be converted to a float, mark it as incorrect
    result = {"fraction": 0.0, "got": got}

# Print the JSON object to be processed by CodeRunner
print(json.dumps(result))
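
A side note on the comparison itself: Python's standard math.isclose already performs a relative-tolerance check, so a hand-rolled comparison like the one above could be written with it instead. A minimal sketch (the numbers here are only illustrative):

import math

tolerance = 0.05                 # 5% relative tolerance
expected_float = 3.14159         # value parsed from the expected output
got_float = 3.1                  # value parsed from the student's output

# math.isclose compares against the larger of the two magnitudes,
# so negative expected values are handled without extra abs() calls.
is_correct = math.isclose(got_float, expected_float, rel_tol=tolerance)
print(is_correct)                # True: |3.1 - 3.14159| is within 5% of 3.14159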
In reply to William Oquendo

Re: Testing c++, floating point, student answers, with tolerance

by Richard Lobb -
I think you've started by copying the simple example of a template grader from the documentation, but the assumption there was that the student answer was just a simple bit of text to be graded, rather than a program to be run.

The Twig STUDENT_ANSWER variable is whatever the student enters into their answer box. So in your case it's a C++ program, which you first need to compile and run in order to get the output it generates.

Here's a simplistic per-test template grader that assumes the output from the student program is just a single floating point number to be compared to the expected value within a certain tolerance. I attach a question that uses this template - one that asks students to write their own Newton root finder.
import subprocess, json

TOLERANCE = 0.0001

__student_answer__ = """{{ STUDENT_ANSWER | e('py') }}"""
test = json.loads("""{{TEST | json_encode }}""")

with open("prog.cpp", "w") as outfile:
    outfile.write(__student_answer__)
    
fraction = 0 # Pessimism
abort = False

compile_result = subprocess.run(
    ['g++', 'prog.cpp', '-o', 'prog'],
    capture_output=True,
    text=True,
)

if compile_result.returncode != 0:
    got = f"*** Compile error ***\n{compile_result.stderr}"
    abort = True
else:
    # Compile OK. Try running it.
    run_result = subprocess.run(['./prog'], capture_output=True, text=True)
    if run_result.returncode != 0:
        # A non-zero exit status means the program crashed or aborted.
        got = f"*** Runtime error ***\n{run_result.stderr}"
        abort = True
    else:
        got = run_result.stdout
        try:
            got_float = float(got)
            expected_float = float(test['expected'])
            if abs(got_float - expected_float) <= TOLERANCE:
                fraction = 1
        except ValueError:
            got = f"Output was '{got.rstrip()}'\nCannot be converted to a float.\n"
            got += "Further testing aborted."
            abort = True
        except Exception as e:
            got = f"Unexpected error: {e}"

result = {"fraction": fraction, "got": got, "abort": abort}
print(json.dumps(result))

As the text of the attached question explains, this is inefficient as it does a separate compile-and-run for each test. You really should use a combinator template grader instead, iterating through all the test cases, but that does add extra complexity.
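
For reference, here is a rough sketch of what such a combinator template grader might look like: compile once, run the program against every test case, and print a single JSON result at the end. The TESTCASES variable and the output fields used here (fraction, testresults, prologuehtml, and the iscorrect column header) follow my reading of the template-grader documentation, so check them against your CodeRunner version before relying on this.

import subprocess, json, html

TOLERANCE = 0.05   # 5% relative tolerance

__student_answer__ = """{{ STUDENT_ANSWER | e('py') }}"""
test_cases = json.loads("""{{ TESTCASES | json_encode }}""")

with open("prog.cpp", "w") as outfile:
    outfile.write(__student_answer__)

# Compile once for all tests.
compile_result = subprocess.run(['g++', 'prog.cpp', '-o', 'prog'],
                                capture_output=True, text=True)

if compile_result.returncode != 0:
    # Report the compile error once and give zero marks.
    outcome = {
        "fraction": 0,
        "prologuehtml": "<pre>*** Compile error ***\n"
                        + html.escape(compile_result.stderr) + "</pre>"
    }
else:
    results = [['iscorrect', 'Expected', 'Got']]   # Header row of the result table
    num_right = 0
    for test in test_cases:
        run = subprocess.run(['./prog'], input=test.get('stdin', ''),
                             capture_output=True, text=True)
        got = run.stdout
        try:
            got_float = float(got)
            expected_float = float(test['expected'])
            is_correct = (abs(got_float - expected_float)
                          <= TOLERANCE * abs(expected_float))
        except ValueError:
            is_correct = False
        if is_correct:
            num_right += 1
        results.append([is_correct, test['expected'].strip(), got.strip()])

    outcome = {
        "fraction": num_right / len(test_cases) if test_cases else 0,
        "testresults": results
    }

print(json.dumps(outcome))

Because the compile step happens only once, a compile error is reported via prologuehtml with zero marks rather than being repeated for every test, and the relative-tolerance comparison is done in Python, so the C++ program only needs to print its numeric result.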