Example of Verify command in Pedal command line
Showing 13 changed files with 293 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
Testing Your Tests | ||
================== | ||
|
||
A major feature of Pedal is the ability to "test your tests".
You can create example student submissions, along with the output
you expect from each of them. You can then run your autograding
script on those submissions and check whether the feedback
you expect is actually generated.
|
||
The system is also capable of saving the output generated from a given | ||
submission, for future comparisons, as a way of "freezing" the expected | ||
output. This is basically the same as regression testing, but for | ||
autograding feedback! | ||
|
||
Verify Mode | ||
----------- | ||
|
||
The ``verify`` mode is a special mode of the ``pedal`` command line tool | ||
that allows you to run your autograding script on a submission, and | ||
compare the output to a "frozen" version of the output. You can | ||
run the command as follows: | ||
|
||
.. code-block:: console

    pedal verify grade_assignment.py submissions/

The names of the ``grade_assignment.py`` script and the ``submissions/`` folder
are completely under your control. The command simply expects the path to a
Pedal autograding script and a folder containing submissions.
The system will run the script on every Python file in the
submissions folder, and compare the resulting feedback to the output file
with the same name.
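The pairing described above can be pictured with a small sketch. This is not Pedal's implementation; ``pair_submissions`` is a hypothetical helper, and it assumes each expected-output file sits next to its submission with the ``.out`` extension (as in the ``verify_examples/`` folder used later).

```python
from pathlib import Path
import tempfile


def pair_submissions(folder):
    """Pair every .py file in `folder` with its same-named .out file.

    Hypothetical helper for illustration only. If the .out file does
    not exist yet, the second element of the pair is None.
    """
    pairs = []
    for submission in sorted(Path(folder).glob("*.py")):
        expected = submission.with_suffix(".out")
        pairs.append((submission, expected if expected.exists() else None))
    return pairs


# Demo with a throwaway folder standing in for ``submissions/``.
with tempfile.TemporaryDirectory() as folder:
    for name in ("correct.py", "correct.out", "extra_space.py"):
        (Path(folder) / name).write_text("")
    summary = [(py.name, out.name if out else None)
               for py, out in pair_submissions(folder)]

print(summary)
```

In this sketch, a submission without an output file is paired with ``None``, which corresponds to the situation where the output files still need to be created.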
|
||
If you have not yet created the output files, you can use the ``--create_output`` | ||
argument to generate them (note that this will delete existing output files): | ||
|
||
.. code-block:: console

    pedal verify grade_assignment.py submissions/ --create_output

This will run the autograding script on all of the submissions, and save
the feedback to the output files. The expected output is stored in easy-to-read
``.out`` files using a format similar to ``ini`` files. The results are reported
through the built-in ``unittest`` module, so each submission appears as a
passing or failing unit test.
|
||
Example | ||
------- | ||
|
||
Let's try an example of testing a simple autograding script. Let us say | ||
that the students were assigned the following prompt: | ||
|
||
The code below stores the string value ``"order out of"`` into the variable | ||
``a_phrase``. Using only string subscripting and the addition operator, | ||
print the phrase ``"out of order"``. | ||
|
||
And they had the following starting code: | ||
|
||
.. code-block:: python

    a_phrase = "order out of"
    print(a_phrase)

The autograding script might look like this:
|
||
.. code-block:: python

    from pedal import *
    ensure_literal("order out of")
    prevent_ast("Str", at_most=1)
    ensure_ast("Subscript", at_least=2)
    prevent_ast("Subscript", at_most=5)
    ensure_function_call("print")
    assert_not_output(student, "out of order ", exact_strings=True,
                      message="There's an extra space at the end!",
                      label="printing_extra_space")
    assert_output(student, "out of order", exact_strings=True)

This shows off a number of the features of Pedal:
|
||
* ``ensure_literal`` checks that the student has the string ``"order out of"`` in their code.
* ``ensure_ast`` and ``prevent_ast`` check that the student has at least 2, but no more than 5, subscripts.
* ``prevent_ast`` also stops them from adding any string literals to their code, besides the one given.
* ``ensure_function_call`` checks that they are using the ``print`` function.
* ``assert_not_output`` checks that they are not printing ``"out of order "`` with an extra space at the end,
  and supplies a custom message and label.
* ``assert_output`` checks that they are ultimately printing ``"out of order"``.
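To make the structural checks more concrete, here is a rough sketch of the kind of counting that ``ensure_ast`` and ``prevent_ast`` rely on, written with Python's standard ``ast`` module. It is only an illustration and not how Pedal is implemented; ``count_nodes`` is a hypothetical helper.

```python
import ast


def count_nodes(source):
    """Count subscripts and string literals in a piece of student code.

    Hypothetical helper: returns (subscript_count, string_literal_count).
    """
    tree = ast.parse(source)
    subscripts = 0
    string_literals = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.Subscript):
            subscripts += 1
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            string_literals += 1
    return subscripts, string_literals


correct = (
    'a_phrase = "order out of"\n'
    'print(a_phrase[6:] + a_phrase[5] + a_phrase[:5])\n'
)
cheating = (
    'a_phrase = "order out of"\n'
    'a_phrase\n'
    'print("out of order")\n'
)

print(count_nodes(correct))   # (3, 1): three subscripts, one string literal
print(count_nodes(cheating))  # (0, 2): no subscripts, two string literals
```

The first program falls inside the limits set by the script (2 to 5 subscripts, at most 1 string literal), while the second one breaks both.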
|
||
A correct solution to this problem is provided on the left, and the output file that would be generated
is on the right. The exact syntax of the output file is explained further below, but for now you can see that
the file lists the fields of the ``Feedback`` object generated by the autograding script, along with their values.
|
||
+-------------------------------------------------------+-------------------------------------------------------+ | ||
| **Correct.py** | **Expected Output** | | ||
+=======================================================+=======================================================+ | ||
| .. literalinclude:: verify_examples/correct.py | .. literalinclude:: verify_examples/correct.out | | ||
| | :language: ini | | ||
| | | | ||
+-------------------------------------------------------+-------------------------------------------------------+ | ||
|
||
This is a difficult problem for students, and they might make a number of mistakes. | ||
For example, they could forget to add the space in the middle of the phrase: | ||
|
||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
| **Incorrect: Missing Space** | **Expected Output** | | ||
+===========================================================+=======================================================+ | ||
| .. literalinclude:: verify_examples/missing_space.py | .. literalinclude:: verify_examples/missing_space.out | | ||
| | :language: ini | | ||
| | | | ||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
|
||
One of the most common mistakes (especially when ChatGPT is asked) is to ignore the
instructions and add a space as a string literal:
|
||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
| **Incorrect: Space Literal** | **Expected Output** | | ||
+===========================================================+=======================================================+ | ||
| .. literalinclude:: verify_examples/literal_space.py | .. literalinclude:: verify_examples/literal_space.out | | ||
| | :language: ini | | ||
| | | | ||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
|
||
Another common mistake is to add an extra space at the end (``"out of order "`` instead of ``"out of order"``),
by slicing one character too far in the third subscript and picking up the space after ``order``:
|
||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
| **Incorrect: Extra Space** | **Expected Output** | | ||
+===========================================================+=======================================================+ | ||
| .. literalinclude:: verify_examples/extra_space.py | .. literalinclude:: verify_examples/extra_space.out | | ||
| | :language: ini | | ||
| | | | ||
+-----------------------------------------------------------+-------------------------------------------------------+ | ||
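The difference between these attempts is easy to miss when reading the slices, so the short sketch below evaluates them in plain Python and prints each result with ``repr`` to make the spaces visible. The variable names are only for illustration.

```python
a_phrase = "order out of"

# Index 5 is the space between "order" and "out".
correct = a_phrase[6:] + a_phrase[5] + a_phrase[:5]
missing_space = a_phrase[6:] + a_phrase[:5]
extra_space = a_phrase[6:] + a_phrase[5] + a_phrase[:6]

for name, value in [("correct", correct),
                    ("missing_space", missing_space),
                    ("extra_space", extra_space)]:
    print(f"{name}: {value!r}")
```

The third subscript ``a_phrase[:6]`` includes index 5, which is why the last attempt ends with a trailing space.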
|
||
Another common mistake is to print the answer directly as a string literal, which is not allowed.
Note that the student tries to be clever here and includes a bare reference to the variable ``a_phrase``,
so that the system does not complain about an unused variable:
|
||
+---------------------------------------------------------------+-----------------------------------------------------------------+ | ||
| **Incorrect: Printing Answer** | **Expected Output** | | ||
+===============================================================+=================================================================+ | ||
| .. literalinclude:: verify_examples/tried_printing_answer.py | .. literalinclude:: verify_examples/tried_printing_answer.out | | ||
| | :language: ini | | ||
| | | | ||
+---------------------------------------------------------------+-----------------------------------------------------------------+ | ||
|
||
If we had all of the output files generated, we could run the following command: | ||
|
||
.. code-block:: console

    pedal verify grade_assignment.py verify_examples/

And that would generate the following unittest output:
|
||
.. code-block:: console

    .....
    ----------------------------------------------------------------------
    Ran 5 tests in 0.001s
    OK

This would show that all of the tests passed, and that the autograding script is working as expected. | ||
But what if we decided later to get rid of the custom message for the extra space mistake, without | ||
updating our output files? Then, the output would look like: | ||
|
||
.. code-block:: diff

    ======================================================================
    FAIL: validate.py, using extra_space.py
    ----------------------------------------------------------------------
    Traceback (most recent call last):
    File "pedal\command_line\modes.py", line 450, in new_test
    self.assertEqual(entire_expected, entire_actual,
    AssertionError: "correct: False\nsuccess: False\nscore: 0[156 chars]ne\n" != 'correct: False\nsuccess: False\nscore: 0[265 chars]ne\n'
      correct: False
      success: False
      score: 0
      scores: []
      category: specification
    - label: printing_extra_space
    + label: assert_output
      title: Failed Instructor Test
    - message: There's an extra space at the end!
    + message: Student code failed instructor test.
    + I ran your code.
    + The printed output was:
    + out of order
    + But I expected the output to be:
    + out of order
      location: None
    : Wrong value for 'label', 'message' in extra_space.

This shows why the custom feedback message is helpful: without it, the student is shown two output strings
that look almost identical. Most likely, you would want to keep the assertion with its custom message.
More importantly, the verification system shows you exactly which lines differ between the actual
and expected output.
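The ``-`` and ``+`` markers in that report are the style of diff produced by Python's standard ``difflib`` module, which ``unittest`` uses when two multi-line strings are not equal. As a small sketch, independent of Pedal, the same kind of comparison can be reproduced on a shortened pair of outputs (the abbreviated field lists here are only for illustration):

```python
import difflib

# Shortened versions of the frozen output and the newly generated output.
expected = """label: printing_extra_space
title: Failed Instructor Test
message: There's an extra space at the end!
location: None"""

actual = """label: assert_output
title: Failed Instructor Test
message: Student code failed instructor test.
location: None"""

# ndiff marks removed lines with "- ", added lines with "+ ",
# and unchanged lines with two leading spaces.
diff_lines = list(difflib.ndiff(expected.splitlines(), actual.splitlines()))
print("\n".join(diff_lines))
```

Lines starting with ``-`` come from the expected (frozen) output, and lines starting with ``+`` come from the output that the script currently produces.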
|
||
|
||
Output File Format | ||
------------------ | ||
|
||
The output files use a format similar to the ``ini`` format, which is a simple key-value format.
Section names are enclosed in square brackets, and each key is separated from its value by a colon.
A value can be a string, a number, or any other ``repr``-able object.
The results will be compared as strings, so be careful with floating point numbers!
Indentation is used in the message field to allow for multi-line messages.
|
||
Only the fields included in the file are compared, so you can include as much or as little information
as you want in the output files. If a field is not important, then you don't need to include it
in the output file. So if you wanted to confirm that a feedback was not ``correct``, but didn't
care what the exact message was, you could have a file like this:
|
||
.. code-block:: ini

    [standard.final]
    correct: False

This would only check that the feedback was not correct, and would ignore the message, the location, and any other fields.
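The following sketch illustrates the two ideas in this section: reading the ``ini``-like format, and comparing only the fields that the file lists. It uses Python's standard ``configparser`` purely for illustration. Pedal may well use its own parser, ``parse_expected`` and ``mismatched_fields`` are hypothetical helpers, and the four-space indentation of the continuation lines is an assumption.

```python
import configparser

EXPECTED = """\
[standard.final]
correct: False
label: prevent_ast
message: You used a string literal on line 2.
    You may not use that more than 1 times,
    but you used it 2 times.
"""


def parse_expected(text):
    """Parse the ini-like text into {section: {field: string value}}."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def mismatched_fields(expected_fields, actual_fields):
    """Return the expected field names whose string values differ.

    Fields that are absent from the expected file are never checked.
    """
    return [key for key, value in expected_fields.items()
            if str(actual_fields.get(key)) != value]


expected = parse_expected(EXPECTED)["standard.final"]

# Stand-in for the fields of the feedback produced by a grading run.
actual = {
    "correct": False,
    "score": 0,  # not listed in the expected file, so it is ignored
    "label": "prevent_ast",
    "message": "You used a string literal on line 2.\n"
               "You may not use that more than 1 times,\n"
               "but you used it 2 times.",
}

print(mismatched_fields(expected, actual))
print(mismatched_fields(expected, {**actual, "correct": True}))
```

Because every value is converted with ``str`` before comparing, a float such as ``0.1 + 0.2`` would be compared as ``"0.30000000000000004"``, which is the floating point pitfall mentioned above.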
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
[standard.final] | ||
correct: True | ||
success: True | ||
score: 1 | ||
scores: [] | ||
category: complete | ||
label: set_correct_no_errors | ||
title: Complete | ||
message: Great work! | ||
location: None |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
a_phrase = "order out of" | ||
print(a_phrase[6:] + a_phrase[5] + a_phrase[:5]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
[standard.final] | ||
correct: False | ||
success: False | ||
score: 0 | ||
scores: [] | ||
category: specification | ||
label: printing_extra_space | ||
title: Failed Instructor Test | ||
message: There's an extra space at the end! | ||
location: None |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
a_phrase = "order out of" | ||
print(a_phrase[6:] + a_phrase[5] + a_phrase[:6]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
[standard.final] | ||
correct: False | ||
success: False | ||
score: 0 | ||
scores: [] | ||
category: specification | ||
label: prevent_ast | ||
title: May Not Use Code | ||
message: You used a string literal on line 2. | ||
You may not use that more than 1 times, | ||
but you used it 2 times. | ||
location: <Location(2, 21, None)> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
a_phrase = "order out of" | ||
print(a_phrase[6:] + " " + a_phrase[:5]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
[standard.final] | ||
correct: False | ||
success: False | ||
score: 0 | ||
scores: [] | ||
category: specification | ||
label: assert_output | ||
title: Failed Instructor Test | ||
message: | ||
Student code failed instructor test. | ||
I ran your code. | ||
The printed output was: | ||
out oforder | ||
But I expected the output to be: | ||
out of order | ||
location: None |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
a_phrase = "order out of" | ||
print(a_phrase[6:] + a_phrase[:5]) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
[standard.final] | ||
correct: False | ||
success: False | ||
score: 0 | ||
scores: [] | ||
category: specification | ||
label: prevent_ast | ||
title: May Not Use Code | ||
message: You used a string literal on line 3. | ||
You may not use that more than 1 times, | ||
but you used it 2 times. | ||
location: <Location(3, 6, None)> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
a_phrase = "order out of" | ||
a_phrase | ||
print("out of order") |