TM031 Automated Grading Support
Goal: define and implement a (JavaScript) interface that can run an assignment's test suites against a set of implementations, exporting data as needed.
(Scroll down to #Motivation for the original beginning of the document.)
This is a list of all the things that should go into `pyret-lang`. Everything else can, and therefore will, exist outside of Pyret.
- Finish check results API.
  - Add tests to the `checker-api` branch (see open pull request: #997).
    - I was initially waiting on Joe or Ben to comment on the proposed interface before doing so.
- Get `shared-gdrive` imports working from the command line.
  - I believe most of the work is already done on the `httplib` branch.
- Add command-line option to specify a local directory to serve as the source of `my-gdrive` imports.
  - Haven't done any work for this, but it should be a relatively straightforward addition.
After all that is done, I envision the usage to look like this:
To evaluate a student implementation, run something like
```
$ make foo-tests-ta.jarr
$ node foo-tests-ta.jarr --my-gdrive student_alpha@brown.edu/final/ --run-full-report > student_alpha_impl.json
```
To evaluate a student test, run
```
$ make student_alpha@brown.edu/sweep/foo-tests.jarr
$ node student_alpha@brown.edu/sweep/foo-tests.jarr --my-gdrive foo-ta-resources/ --run-full-report > student_alpha_test.json
```
From there, the JSON data can be processed outside of Pyret, and contains all the data one would want in order to assign grades.
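Since the check results API isn't finalized, the report schema below (a `checkBlocks` array carrying `passed`/`failed` counts) is purely hypothetical, but post-processing the export might look something like this:

```js
const fs = require("fs");

// Hypothetical report shape: { checkBlocks: [{ passed, failed }, ...] }.
// Substitute whatever fields the check results API actually emits.
const report = JSON.parse(fs.readFileSync("student_alpha_impl.json", "utf8"));

let passed = 0;
let failed = 0;
for (const block of report.checkBlocks) {
  passed += block.passed;
  failed += block.failed;
}
console.log(`${passed}/${passed + failed} checks passed`);
```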
## Motivation

The pedagogy that Brown's CS019 and CS173 have adopted involves having students hand in two files: `foo-code.arr` and `foo-tests.arr`. The former is an implementation of some specified functions, and may contain implementation-dependent tests, while the latter contains implementation-independent tests. Evaluating a submission involves both checking `foo-code.arr` for correctness, by running the staff's test suite against it, and checking `foo-tests.arr` for its ability to classify incorrect implementations, by running it against one known-correct implementation ("gold") and some number of known-buggy implementations ("coals").

As a result, for each assignment, there are (a) many runs of Pyret that need to happen, and (b) a lot of data to be collected.
Suppose student submissions are from Captain Teach, and exporting gives you the following directory structure:
```
submissions/
├── student_alpha@brown.edu/
│   ├── sweep/
│   │   └── foo-tests.arr
│   └── final/
│       ├── foo-code.arr
│       └── foo-tests.arr
├── student_beta@brown.edu/
│   ├── sweep/
│   │   └── foo-tests.arr
│   └── final/
│       ├── foo-code.arr
│       └── foo-tests.arr
├── ...
```
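An external driver would need to enumerate this tree. Here is a minimal Node sketch, assuming exactly the layout above and treating a missing `final/` directory as one of the common hand-in issues:

```js
const fs = require("fs");
const path = require("path");

const SUBMISSIONS = "submissions";

// List each student and whether their final hand-in is complete.
for (const student of fs.readdirSync(SUBMISSIONS)) {
  const finalDir = path.join(SUBMISSIONS, student, "final");
  if (!fs.existsSync(finalDir)) {
    console.log(`${student}: missing final/ (hand-in issue)`);
    continue;
  }
  const hasCode = fs.existsSync(path.join(finalDir, "foo-code.arr"));
  const hasTests = fs.existsSync(path.join(finalDir, "foo-tests.arr"));
  console.log(`${student}: code=${hasCode} tests=${hasTests}`);
}
```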
Then, the input could/would be (gathered into a single record in the sketch after this list):

- submissions directory: `submissions/ :: DirectoryIdentifier`
- the sub-directory: `"final" :: String`
- implementation name: `"foo-code.arr" :: String`
- test name: `"foo-tests.arr" :: String`
- the staff test suite: `foo-tests-ta.arr :: FileIdentifier`
- the staff gold: `foo-gold.arr :: FileIdentifier`
- the staff coals: `[foo-coal-1.arr, foo-coal-2.arr] :: List<FileIdentifier>`, or `coals/ :: DirectoryIdentifier`
- timeout: `x-minutes :: Time`
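In the eventual JavaScript interface, these inputs might be bundled into a single configuration record along these lines (all field names are illustrative; the comments echo the `::` annotations above):

```js
// Sketch of the input record described above; nothing here is final.
const gradingConfig = {
  submissionsDir: "submissions/",                       // DirectoryIdentifier
  subDirectory:   "final",                              // String
  implName:       "foo-code.arr",                       // String
  testName:       "foo-tests.arr",                      // String
  staffTests:     "foo-tests-ta.arr",                   // FileIdentifier
  staffGold:      "foo-gold.arr",                       // FileIdentifier
  staffCoals:     ["foo-coal-1.arr", "foo-coal-2.arr"], // List<FileIdentifier>
  timeoutMs:      5 * 60 * 1000,                        // Time (x minutes)
};
```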
From there, it should (a driver along these lines is sketched after the list):

- For each `$student_email`, run `foo-tests-ta.arr`, where its `import my-gdrive("foo-code.arr")` resolves to `submissions/$student_email/final/foo-code.arr`.
- For each `$student_email`, for each `$staff_impl` in `[foo-gold.arr, foo-coal-1.arr, foo-coal-2.arr]`, run `submissions/$student_email/final/foo-tests.arr`, where its `import my-gdrive("foo-code.arr")` resolves to `$staff_impl`.
- Any time running Pyret takes longer than `x-minutes`, halt, report `timeout` as an error, and move on.
- Output organized data.
- Not require all of these arguments; e.g., when grading sweeps, we can skip the first step.
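A minimal driver sketch in Node, assuming the compiled `.jarr` files and the `--my-gdrive` flag from the usage example above; how `my-gdrive("foo-code.arr")` gets pointed at a staff implementation is faked here with a hypothetical staging step. `execFile`'s `timeout` option supplies the halt-and-report-`timeout` behavior.

```js
const { execFile } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

const TIMEOUT_MS = 5 * 60 * 1000; // stand-in for x-minutes

// Run a compiled Pyret program, capturing its JSON report or a timeout.
function runPyret(jarr, myGdriveDir) {
  return new Promise((resolve) => {
    execFile(
      "node",
      [jarr, "--my-gdrive", myGdriveDir, "--run-full-report"],
      { timeout: TIMEOUT_MS, maxBuffer: 64 * 1024 * 1024 },
      (err, stdout) => {
        if (err && err.killed) resolve({ error: "timeout" });
        else if (err) resolve({ error: String(err) });
        else {
          try { resolve({ report: JSON.parse(stdout) }); }
          catch (e) { resolve({ error: "bad JSON: " + e.message }); }
        }
      }
    );
  });
}

// Hypothetical staging step: copy a staff implementation into a scratch
// directory under the name the student's tests import.
function stageImpl(implFile) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "impl-"));
  fs.copyFileSync(implFile, path.join(dir, "foo-code.arr"));
  return dir;
}

async function gradeAll(students) {
  const results = {};
  for (const student of students) {
    const finalDir = path.join("submissions", student, "final");
    // Step 1: staff tests against the student's implementation.
    const impl = await runPyret("foo-tests-ta.jarr", finalDir);
    // Step 2: the student's tests against the gold and each coal.
    const tests = {};
    for (const staffImpl of ["foo-gold.arr", "foo-coal-1.arr", "foo-coal-2.arr"]) {
      tests[staffImpl] = await runPyret(
        path.join(finalDir, "foo-tests.jarr"), // assumes tests were compiled too
        stageImpl(staffImpl)
      );
    }
    results[student] = { impl, tests };
  }
  return results;
}
```

Running each Pyret program in its own child process keeps timeouts and crashes isolated from the driver, so one hung submission can't stall the whole batch.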
Optionally, it could:
- Output summarized grade data for each student, based on some specified grading heuristic (one possibility is sketched after this list).
- Enforce internal consistency: create a "submission" with `foo-gold.arr` and `foo-tests-ta.arr`, and make sure that "submission" gets a 100% score.
- Collect data about external consistency: for each `$student_email`, run `submissions/$student_email/final/foo-tests.arr`, where its `import my-gdrive("foo-code.arr")` resolves to `submissions/$student_email/final/foo-code.arr`.
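As one illustrative heuristic for the summarized grade (both the scoring rule and the result shape are assumptions, not a prescribed policy): score a test suite by the fraction of coals it catches, treating it as invalid if it flags the gold.

```js
// Illustrative sweep-grading heuristic; not a prescribed policy.
// `results` maps each staff implementation to that run's failure count,
// e.g. { "foo-gold.arr": { failed: 0 }, "foo-coal-1.arr": { failed: 3 }, ... }.
function sweepGrade(results) {
  if (results["foo-gold.arr"].failed > 0) return 0; // flagged the gold: invalid
  const coals = Object.keys(results).filter((k) => k !== "foo-gold.arr");
  const caught = coals.filter((k) => results[k].failed > 0).length;
  return caught / coals.length; // fraction of buggy implementations caught
}

console.log(sweepGrade({
  "foo-gold.arr":   { failed: 0 },
  "foo-coal-1.arr": { failed: 2 },
  "foo-coal-2.arr": { failed: 0 },
})); // 0.5
```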
What's needed to make all of this work:

- Check Result API.
- Ability to have `import my-gdrive("foo-code.arr")` resolve to a specific, chosen replacement for `foo-code.arr`.
- Ability to have `shared-gdrive` imports resolve correctly from the command line.
- Awareness of and/or integration with Captain Teach, including awareness of and robustness against common hand-in issues.
- Web interface. There's some work on the `grade` branch of `code.pyret.org`, which was able to get the job done this semester. It wasn't great, but it worked.