GUI questions are hard.
We too have a final week on GUIs in our introductory programming course, taught using tkinter. We have a special question type, python3_tkinter, which is included in the CodeRunner release on GitHub here. I wouldn't say it's an unqualified success (tkinter isn't the greatest of starting points), but it goes OK with classes of up to six hundred students, without too many complaints. And it's certainly much better than trying to mark over five thousand questions by hand (600 x #questions_in_lab). We always set the penalty regime to zero to reduce student frustration. We also always have a question of that sort in the final exam (our exam is computer-based), and we make sure students know that if they think their answer is correct but is being marked wrong, they have to contact us after the exam and we'll check it manually. At most 1 or 2% do so, and perhaps a quarter of those have a genuine case. [For example, tkinter accepts any unique leading substring of a keyword parameter name, such as value= instead of values=. Our question type does not mimic the complete set of such weird behaviours.]
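To make the abbreviation quirk concrete, here is a rough sketch of how a mock widget could resolve a shortened keyword the way Tk does. This is not how our question type is implemented. The helper name resolve_option and the option set are made up for illustration, and the set is only a small subset of what a real ttk.Combobox accepts.

```python
def resolve_option(name, valid_options):
    """Return the full option name that `name` abbreviates.

    Hypothetical helper: accepts an exact name or any leading
    substring that matches exactly one valid option.
    """
    if name in valid_options:
        return name  # an exact match always wins
    matches = sorted(opt for opt in valid_options if opt.startswith(name))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"unknown option {name!r}")
    raise KeyError(f"ambiguous option {name!r}: could be {matches}")


# Illustrative subset of options only (an assumption, not the full list).
COMBOBOX_OPTIONS = {"values", "textvariable", "width", "state", "style"}

full_name = resolve_option("value", COMBOBOX_OPTIONS)  # resolves to "values"
```

A mock that checks keyword names strictly would reject value=, while one that runs each keyword through something like this would accept it. Ambiguous prefixes (st could be state or style here) still have to be rejected.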
I attach an example of a simple question using that question type.
However, we don't allow the use of the Canvas widget. I think it would be very difficult to support that in a way that allowed you to give the student useful feedback if their answer was wrong.