[racket] testing student programs

Todd O'Bryan

Oct 16, 2010, 2:16:29 PM

to PLT-Scheme Mailing List

I know this has come up on the list before, and I've reread those
threads but am still a little confused.

Here's a sample student program file:
------------------------------------------------------------------------------------------
; volume-of-solid: number number number -> number
; given the length, width, and height of a rectangular prism,
; produces the volume
(define (volume-of-solid length width height)
  (* length width height))

(check-expect (volume-of-solid 2 3 4) 24)
(check-expect (volume-of-solid 3 5 7) 105)
--------------------------------------------------------------------------------------------

I'd like to test this file with something like:
---------------------------------------------------------------------------------------------
#lang racket

(define score 0)

(when (= (volume-of-solid 3 4 5) 60)
  (set! score (add1 score)))

(when (= (volume-of-solid 10 5 4) 200)
  (set! score (add1 score)))
---------------------------------------------------------------------------------------------
but I can't figure out how to do it safely.

It seems like if I use make-module-evaluator, I'm stuck in the context
of the original student program--that is, Beginning Student Language,
without the ability to accumulate a score or use constructs that
aren't defined in BSL. If I provide the student functions in full
Racket, I don't get the safety of the sandbox, and the student code
could do something not nice to my test system.

Obviously, there's a place here for a really nice macro-based testing
harness that checks for errors in each student function call, lets you
assign points for each test, etc., but I have to figure out how to get
the definitions I want to test safely into a context that lets me
write the code to evaluate them.

Thanks in advance,
Todd

Nadeem Abdul Hamid

Oct 16, 2010, 3:00:21 PM

to PLT-Scheme Mailing List

I don't think it should be that difficult once you get an evaluator
set up. I've done something like what you want, taking some ideas from
the handin server code. What I came up with (no fancy macros) is you
define a test specification for an assignment like this (in a #lang
racket file):

****************************************
;; this is my implemented solution of an exercise,
;; to test against students':
(define (total-profit x)
  ...blah blah blah....)

(define the-assignment
  (assignment "Homework 2"
              "hw2"
              '(htdp intermediate)
              (list
               (problem "2a. Movie Theater Profit"
                        "hw2-movie.rkt"
                        `(
                          (proc total-profit 1)
                          (type (total-profit 5) "a number" ,number?)
                          (test (total-profit 0) -20)
                          (test (total-profit 4) ,(total-profit 4)) ; <-- note unquote
                          (test (total-profit 10) ,(total-profit 10))
                          ))
               .... more problems ...
****************************************

then run the program and it loads and tests students' files against
the specification to produce an output text file like this:

****************************************
ASSIGNMENT: Homework 2
Language: (htdp intermediate)
Passed 46 out of 48 tests.

PROBLEM: 2a. Movie Theater Profit
Passed 8 out of 8 tests.
PASS: File name matches 'hw2-movie.rkt'?
PASS: File evaluated without error?
PASS: File ran without timeout?
----
PASS: Is 'total-profit' defined as a function of 1 parameter?
PASS: Does (total-profit 5) produce a number?
PASS: Does (total-profit 0) produce an expected result?
PASS: Does (total-profit 4) produce an expected result?
PASS: Does (total-profit 10) produce an expected result?

... etc. ...
***************************************

From my experience, BSL files are somewhat of a pain to test in this
way, because definitions are processed as syntax, so I came up with a
hack to override the language of BSL files and load them in ISL mode
instead.

I'll attach a few files of mine that hopefully you can adapt for your
purposes. (I've actually sent the files separately to Todd; if anyone
else wants them, I'll be happy to send them individually.)

eval2.rkt is the stuff for setting up an evaluator (given source code
as a stream of bytes); checker2.rkt is the stuff for
checking an evaluator against assignment specifications such as the
one above; and csc-auto-check.rkt is the main script for checking
student subdirectories, and it also provides a simple gui interface to
choose the assignment spec and student directory to check against.
I've also attached a complete homework assignment specification file.
There are probably some rough edges here and there in this code, and
certainly some additional stuff could be done to make it more useable,
less tedious to write test specifications, etc., but it works to some
degree. I tried to get coverage tests working, and succeeded to some
extent, but not completely, so that is disabled in the code now, which
causes some failures in the test suite. The checker does provide
simple timeout functionality (in case student code has an infinite
loop), and handles the case when an input file has syntax errors (all
tests automatically fail), and there is a flag in the output
indicating that the file did not evaluate without errors.

HTH,

nadeem

Eli Barzilay

Oct 16, 2010, 3:18:47 PM

to Todd O'Bryan, PLT-Scheme Mailing List

40 minutes ago, Todd O'Bryan wrote:
> I know this has come up on the list before, and I've reread those
> threads but am little confused.

> [...]

> It seems like if I use make-module-evaluator,

For student languages it might be better to use `make-evaluator'.
(With `make-module-evaluator' you should use something like "#lang
htdp/bsl", but IIRC it's not completely the same.)

Get your sandbox up with:

(require racket/sandbox) ; provides `make-evaluator'

(define e (make-evaluator '(special beginner) "

  (define (volume-of-solid length width height)
    (* length width height))
  (check-expect (volume-of-solid 2 3 4) 24)
  (check-expect (volume-of-solid 3 5 7) 105)

"))

But also note that the `check-expect' expressions are not being
executed. (They're done in a way that makes it very hard to run them,
and I don't think that there's a known easy way for that.)

You can also use a path that points to a file

(define e (make-evaluator '(special beginner)
                          (string->path "/some/path")))

or just plain s-expressions.
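
As a minimal sketch of the s-expression form (the function `double' is
a made-up example, not from the student file above), the program can be
passed as quoted data instead of a string or path:

```racket
#lang racket
(require racket/sandbox)

;; The student program is given as a quoted s-expression.
;; `double' is a hypothetical student function used only for illustration.
(define e
  (make-evaluator '(special beginner)
                  '(define (double x) (* x 2))))

;; Expressions sent to the evaluator are quoted data as well.
(e '(double 4))
```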

> I'm stuck in the context of the original student program--that is,
> Beginning Student Language, without the ability to accumulate a
> score or use constructs that aren't defined in BSL.

You shouldn't do your accounting inside the sandbox -- think about it
as a way to restrict a piece of code, but if you run code in it, then
your code isn't safe. For example, what if the student code redefines
`+'? (I know that in the student languages you can't redefine things,
but it's generally not a good idea.)

So just do the evaluation in the student's context, and the rest
outside:

> (define score 0)
> (when (= (e '(volume-of-solid 3 4 5)) 60) (set! score (add1 score)))
> (when (= (e '(volume-of-solid 10 5 4)) 200) (set! score (add1 score)))
> score
2

You can also do something like this:

> (define volume-of-solid (e 'volume-of-solid))
> (when (= (volume-of-solid 3 4 5) 60) ...)

but that means that you're running the body of the student function
outside of the sandbox -- so it can now eat up your memory etc.

One more note: this works because the results of these expressions are
numbers, and numbers are the same inside and outside the sandbox.
This becomes an issue when you're dealing with structs -- the
sandboxed environment will have its own instance of these struct
definitions, distinct from the one outside, so you need to specify
sharing of the relevant modules if you're dealing with these cases.
For example, if you want to check posn results, etc. Yet another
alternative is to do the comparison
inside the sandbox:

(when (e '(= (volume-of-solid 3 4 5) 60)) (set! score (add1 score)))

which will work for structs too -- but that again depends on `=' doing
the expected thing in the sandbox.
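
A rough sketch of the module-sharing route, assuming that adding
`lang/posn' to the `sandbox-namespace-specs' parameter is enough to
make posn values mean the same thing on both sides (the student
function `shift-right' is invented for the example):

```racket
#lang racket
(require racket/sandbox
         lang/posn) ; the posn struct, instantiated outside the sandbox

;; Keep the default namespace specs and add lang/posn to the list of
;; modules whose instance is shared with the sandbox.
(define e
  (parameterize ([sandbox-namespace-specs
                  (append (sandbox-namespace-specs) '(lang/posn))])
    (make-evaluator '(special beginner)
                    "(define (shift-right p dx)
                       (make-posn (+ (posn-x p) dx) (posn-y p)))")))

;; The posn is built inside the sandbox and inspected outside of it.
(define result (e '(shift-right (make-posn 1 2) 10)))
(posn? result)
(posn-x result)
```

Without the shared module, the outside `posn?' would be expected to
reject a posn created in the sandbox, which is the problem described
above.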

> Obviously, there's a place here for a really nice macro-based
> testing harness that checks for errors in each student function
> call, lets you assign points for each test, etc., but I have to
> figure out how to get the definitions I want to test safely into a
> context that lets me write the code to evaluate them.

The sandbox tests do something similar to that:

http://git.racket-lang.org/plt/HEAD:/collects/tests/racket/sandbox.rktl

and note how the testing needs to switch into evaluation in the
sandbox and outside of it.

But for just the kind of test counting as you do above, I suspect that
this is more in the direction of what you want:

> (define tests '([(volume-of-solid 3 4 5) 60]
                  [(volume-of-solid 10 5 4) 200]))
> (for/fold ([score 0]) ([test (in-list tests)])
    (+ score (if (equal? (e (car test)) (cadr test)) 1 0)))
2
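
The same loop can be stretched a little toward the harness asked about
above. This is only a sketch: the three-element test format (expression,
expected value, points) and the helper `points-for' are made up here,
and any error raised while evaluating in the sandbox simply earns zero
points.

```racket
#lang racket
(require racket/sandbox)

;; Student code, inlined as a string for the sake of the sketch.
(define e
  (make-evaluator '(special beginner)
                  "(define (volume-of-solid length width height)
                     (* length width height))"))

;; Hypothetical test format: (expression expected-value points).
(define tests
  '([(volume-of-solid 3 4 5) 60 2]
    [(volume-of-solid 10 5 4) 200 3]
    [(volume-of-solid 1 2) 2 5])) ; wrong number of arguments: an error

;; Evaluate one test expression in the sandbox and compare outside it.
;; Any exn:fail raised by the evaluation is caught and scored as 0.
(define (points-for test)
  (match-define (list expr expected points) test)
  (with-handlers ([exn:fail? (lambda (exn) 0)])
    (if (equal? (e expr) expected) points 0)))

(define score
  (for/sum ([test (in-list tests)])
    (points-for test)))

score
```

The accounting stays outside the sandbox, as recommended earlier in
this message; only the student expressions are sent to `e'.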

--
((lambda (x) (x x)) (lambda (x) (x x))) Eli Barzilay:
http://barzilay.org/ Maze is Life!
