An example would be: given a non-deterministic finite automaton, construct a deterministic one that accepts the same language. The expected automaton is drawn as a labelled graph, just as it would be on paper. You can get ideas for automata/grammar questions from tools such as Exorciser https://www.swisseduc.ch/compscience/exorciser/index.html and JFLAP http://www.jflap.org/
The question author provides the correct answer (also by drawing a graph). A student's answer is compared against that. As you suggest, you can test all strings up to a certain length and/or some longer, random strings. If you find a string that is not correctly handled, it can be given to the student as feedback. For finite automata, there are decision procedures which can determine the correctness without testing any strings, but in practice testing is enough as there are usually short counterexamples and having these is useful for students to correct their answers. Also the testing method extends to more general automata such as pushdown automata or Turing machines whose equivalence is undecidable.
This method has been applied to other formalisms such as grammars or regular expressions (these don't need a graphical input). For grammars, you can generate all strings up to a certain length in the language generated by the student's grammar and compare them with the language generated by the correct grammar. The method can be applied to any formalism for which you can create a parser for the students' answers and an automated testing/verification procedure.