REQ, RFC, RFP: How to secure client-side notebook grading in WASM for scale and offline learning


Wes Turner

Jan 30, 2023, 11:41:04 AM1/30/23
to Teaching with Jupyter Notebooks
REQ: Request, RFC: Request for Comment, RFP: Request for Proposal

Given:
- Students can learn by reviewing and preparing notebooks of executable code and Markdown, given grading and feedback.

Problem:
- In order to grade or review notebooks, Instructors must execute arbitrary code with sufficient process isolation.
- Notebook grading solutions like otter-grader spawn containers to grade submitted notebooks with sufficient process isolation, executing the notebook and running tests against its inputs and outputs.
- With server-side grading, internet access and sufficient server CPU, RAM, and storage are necessary to grade notebooks.
- With client-side grading, students could get early feedback by running the tests locally;
  though client-side grading implies that the answers (the tests) must be distributed to the clients so that the tests can be run locally.
- Cloud providers currently offer Confidential Computing services that execute code on data encrypted before upload; related techniques include Homomorphic Encryption and Zero-Knowledge Proofs.
- Instructors could encrypt the grading code for notebooks in order to do client-side grading.
  - Client-side grading with sufficient obfuscation of the answers (the tests) would be advantageous because:
    - Students could get fast early feedback
    - Students could do their work without internet access
    - Instructors could offload resource demands to the students' machines
  - Client-side grading with sufficient obfuscation of the answers (the tests) would be disadvantageous because:
    - Students would have a copy of the tests if the homomorphic encryption is insufficient, or indeed effectively only obfuscatory
    - Students should not run arbitrarily encrypted [grading] code on their machines
      - Like Instructors, Students should not run unsigned/untrusted code

Is this resolvable? Could there be low-risk client-side grading of Jupyter Notebooks? Perhaps both client-side and server-side grading: run the tests locally, and then submit once the tests already pass locally?
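As a minimal sketch of the obfuscation concern, assuming plain symmetric (Fernet) encryption rather than true homomorphic evaluation (the function name add and the test string here are hypothetical):

    # Hypothetical sketch, not otter/nbgrader code: symmetrically encrypted
    # tests are only obfuscated, because the key must ship to the client.
    from cryptography.fernet import Fernet

    def add(a, b):  # stand-in for a student's answer
        return a + b

    # Instructor side: encrypt the grading tests before distribution.
    key = Fernet.generate_key()
    token = Fernet(key).encrypt(b"assert add(2, 3) == 5")

    # Client side: anything the local grader can decrypt and execute,
    # a motivated student can decrypt and read.
    tests = Fernet(key).decrypt(token)
    exec(compile(tests, "<tests>", "exec"))  # passes silently if correct

True homomorphic evaluation would avoid shipping a usable key to the client, but at substantial cost and maturity risk.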

If the disadvantageous Impacts are solvable or acceptable, then the Capability would be:
- [ ] nbgrader/otter-grader in WASM with a grading-code encryption feature
- [ ] detect unsigned WASM consuming resources on local machines
- [ ] vscode extension for grading your own notebooks locally?

Christopher Brooks

Jan 30, 2023, 11:58:26 AM1/30/23
to Wes Turner, Teaching with Jupyter Notebooks
Wes,

Having put some thought into this previously, I think it is useful to think of unit testing for learning as being made up of two parts: the diagnostic test (e.g. the assert conditional statement) and the feedback string (whatever the feedback is that the student should learn from). A given pedagogical unit test then might fall somewhere along two dimensions:
a) learning feedback vs. certification of correctness; the first aims to detect errors in order to teach, the second to provide a grade based on demonstration of competencies
b) answer leaking vs. answer hiding; the first being a unit test which not only probes the student code but, when read by the student, "gives away the answer" (which is pedagogically poor), while the latter hides the answer so that the feedback string is what the student learns from

Sometimes there are clever ways to test for something that needs to exist in an answer and, if it doesn't, has a high probability of indicating a given conceptual issue but doesn't give away the answer. Sometimes there isn't.
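As a concrete, hypothetical illustration of an answer-hiding test: distribute only a digest of the model answer, so reading the test exposes the feedback string but not the answer itself. (The names here are illustrative, and short answers would of course still be brute-forceable.)

    # Hypothetical answer-hiding check; not from any particular grader.
    import hashlib

    # In practice the instructor precomputes and ships only the hex digest;
    # it is derived inline here just to keep the sketch self-contained.
    EXPECTED_DIGEST = hashlib.sha256(b"model answer").hexdigest()

    def check(student_answer: str) -> str:
        digest = hashlib.sha256(student_answer.encode()).hexdigest()
        if digest == EXPECTED_DIGEST:
            return "Correct."
        # The feedback string carries the pedagogy, not the answer.
        return "Not yet: check how you normalize the input before comparing."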

My interest in this (currently) isn't in WASM but actually in just the "server side" container processes. If a Docker spawner is being used, can I host the certified autograder process in the student's container and still be guaranteed that the student can't manipulate the results/unit tests? The issues are similar though (e.g. speed of feedback response and general agility of the autograding process).

Regards,

Chris



--
Christopher Brooks
Assistant Professor, School of Information

E-Mail: broo...@umich.edu
Web: http://christopherbrooks.ca

School of Information
University of Michigan
4439 North Quad
105 S. State St.
Ann Arbor, MI 48109-1285

Philipp Risius

Jan 30, 2023, 12:21:57 PM1/30/23
to Teaching with Jupyter Notebooks

Hi,

I have been developing a Fundamentals of Programming with Python class for three years now. I had many of the same problems and ideas, and have settled on the following structure:

  • students and instructors run nbgrader
  • tests are transparent
    • Students install a self-written "pytest-nbgrader" plugin (so far unpublished, and in need of polishing) that provides an interface between the two
    • they receive test cases as .yml files with input/expected-output pairs and the test categories to be run for each subtask (see the sketch after this list)
    • dozens or even hundreds of test cases are generated automatically using the model solution
  • students develop and test from inside the notebook
    • their code is passed to pytest via pytest-nbgrader
    • pytest runs the predefined tests as input/expected-output pairs
    • pytest reports back any test cases that fail
    • students can compare expected and actual outputs for inputs that make tests fail
  • submissions are graded server-side
    • student submissions are run in a container
    • additional test cases can be defined
    • if client-side test cases pass, server-side cases should pass as well
    • a lookup table built from the distributed input/expected-output pairs is impractical for students and, against the additional server-side cases, doesn't work
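
A rough sketch of that structure (illustrative only; this is not the actual pytest-nbgrader API), with cases kept in YAML and fed to pytest via parametrize:

    # Illustrative sketch only -- not the real pytest-nbgrader plugin.
    import textwrap

    import pytest
    import yaml

    CASES = yaml.safe_load(textwrap.dedent("""
        cases:
          - {input: [2, 3], expected: 5}
          - {input: [0, 0], expected: 0}
    """))["cases"]

    def add(a, b):  # stand-in for the student's submitted function
        return a + b

    @pytest.mark.parametrize("case", CASES, ids=str)
    def test_add(case):
        actual = add(*case["input"])
        assert actual == case["expected"], (
            f"input={case['input']}: got {actual}, expected {case['expected']}"
        )
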
This has turned out to work well: students do not try to cheese the tests. We have one TA looking over solutions manually, and they have not detected attempts at cheesing. What they have discovered is copy/paste plagiarism of last year's model solution (which could also be detected automatically). That the same tasks were handed out in more than one year is simply due to resource limits: test generation is relatively straightforward, and the actual work lies in thinking of novel tasks (we have not yet gotten around to parametrizing them, not that this would be very desirable in my opinion).

While this nbgrader+pytest approach has proved appropriate for entry-level undergrad Programming 101 (where students don't have the capability to cheese and the simple programming concepts are easy to test), it has also worked well for a C++ class (with pytest-cpp, because nbgrader is so neat), and I expect it to scale and extend well to larger classrooms and more complex topics. Currently, the next challenge is applying this to an ML/AI class, where we could keep a server-side test set.

In the future, we may move to a JupyterHub for ease of environment rollout, but the structure would generally stay the same.

Best,
Philipp

Eric Van Dusen

Jan 30, 2023, 12:26:49 PM1/30/23
to Christopher Brooks, Sean Morris, Chris Pyles, Tetsu HARUYAMA, Jeremy Tuloup, Wes Turner, Teaching with Jupyter Notebooks
Hi 
Thanks for this conversation!

Wes - thanks for these ideas; I think this would be an exciting and fruitful direction to build towards. I am hopeful about Lite/WASM having a huge future in the education space, basically by enabling easy, frictionless access to notebooks without a server for more students.

I will just add a few anecdotes from my perspective, but I think they relate to the design thinking that you are both describing.

UC Berkeley Data Science classes teach at scale with otter-grader: thousands of students across dozens of notebooks, where students get real-time feedback via public tests embedded in the notebooks' metadata. Students then submit assignments to Gradescope, where Docker containers are created to test against private tests. The public side of this is inherently workable in Lite/WASM. Do students find the public tests and use them to do their homework? That is potentially an empirical question, but my general sense is that they do not, and our current system generally works.
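For readers new to Otter: a public test in its OK-style format is roughly a Python file like the following (a from-memory sketch with a hypothetical function total; check the otter-grader docs for the authoritative schema). The hidden flag is what separates the public, client-side feedback tests from the private server-side ones:

    # Rough sketch of an OK-format test file as used by otter-grader;
    # field details may vary by version -- see the otter-grader docs.
    test = {
        "name": "q1",
        "points": 1,
        "suites": [{
            "type": "doctest",
            "cases": [
                {"code": ">>> total(2, 3)\n5", "hidden": False},   # public
                {"code": ">>> total(-2, 2)\n0", "hidden": True},   # private
            ],
        }],
    }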

A second issue is that we have a team who helps instructors at other institutions, notably California Community Colleges, use OER notebooks built for Data 8; many of these institutions do not have access to Gradescope. Generally, an easy-to-use, non-server-dependent way to use Otter-enabled notebooks would be really useful in this space, and a Lite/WASM approach as you outlined would be democratizing in terms of who has access to this type of curriculum. But it also raises questions: is a grading solution easy to use for folks new to the field, and what can we do to lower the barriers to entry for instructors new to notebooks and autograding?

Finally - I went to Japan to visit a colleague in early January, and he was teaching a 100+ person class at scale using Otter within Lite/WASM. So hey - here's to the early adopters and what their experience might teach us!

Thanks
Eric 

ps adding a few relevant voices on CC

Jonathan McMenamin-Balano

Jan 30, 2023, 6:09:40 PM1/30/23
to Teaching with Jupyter Notebooks
Eric, 

As a professor at a CC looking to do exactly what both you and Philipp have discussed, I am wondering where you might point me, and where I can point my adjuncts, to get up to speed. Additionally, if you were starting over tomorrow, what would you do differently, pay more attention to, or for that matter pay less attention to, so that your students exit your classes with applicable skills?

Great conversation, this made my day!

Jonathan 

T H

Jan 31, 2023, 12:22:52 AM1/31/23
to Teaching with Jupyter Notebooks
Hi,
Eric forwarded this thread, in which I was mentioned at the end of his message. So here I am.
I used JupyterLite in two courses this academic year (April to March in Japan), with about 150 students in each. I started with JupyterLab Desktop first, but it did not work for some students using Windows (e.g. installation errors). So I created a JupyterLite site for them.

Otter worked smoothly, which made my life much (much) easier than otherwise.

What I did:
* Use the template -> Create a new repository
* The newly created repo tries to deploy the site but fails (do not worry about this).
* In the new GitHub repo, go to Settings -> Pages -> under "Build and deployment", choose GitHub Actions
* In the new GitHub repo, go to Actions -> choose the "Initial commit" run (which failed) and re-run it. After successful deployment, you can find your URL.

The above is the basic setup for JupyterLite.
In addition, for my class I:
* added necessary modules like otter-grader (and py4macro, py4etrics, and wooldridge, which I created for class) to requirements.txt (see the sketch after this list)
* added files/folders in content
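
For reference, the requirements.txt additions would look something like this (package names as above; version pins omitted):

    # requirements.txt additions for the class
    otter-grader
    py4macro
    py4etrics
    wooldridge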

Do not forget to re-run GitHub Actions to refresh the site if you make changes.

That's it!

Hope it helps.

Tetsu HARUYAMA
Kobe Univ (in Japan)
Economics


Philipp Risius

Jan 31, 2023, 2:39:33 AM1/31/23
to Teaching with Jupyter Notebooks
Hi Jonathan,

if I were to start over tomorrow, I would give myself the following advice (first on infrastructure, then on teaching content and formats):
  • use a grading framework like nbgrader or otter-grader from the start
  • (if teaching Python) use pytest for testing
  • keep separate git repos for source materials and released materials, and optionally submodules for different aspects (e.g. lectures / live exercises / homework)
  • distribute exclusively via GitLab / GitHub, not your school's LMS. Write tools for interacting with your school's LMS where you must, e.g. nbgrader plugins to
    • import lists of students
    • collect submitted assignments
    • export reports via pdf or mail
  • look into tools like https://github.com/jmshea/jupyterquiz or classroom polling for additional interactive content (see the sketch after this list)
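
As an example of that last item, a jupyterquiz quiz is a list of question dicts passed to display_quiz; this is a from-memory sketch, so check the jupyterquiz README for the exact schema:

    # Rough sketch; see https://github.com/jmshea/jupyterquiz for details.
    # Run inside a notebook cell (renders HTML/JS output).
    from jupyterquiz import display_quiz

    questions = [{
        "question": "What does len('abc') return?",
        "type": "multiple_choice",
        "answers": [
            {"answer": "3", "correct": True, "feedback": "Right."},
            {"answer": "2", "correct": False,
             "feedback": "Count every character, not just the distinct ones."},
        ],
    }]
    display_quiz(questions)
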
That's more on the infrastructure side. It's kind of obvious to me now, but wasn't when I first started. On the teaching side (to enable my students to learn applicable skills effectively), I'd recommend the following:
  • keep your test cases and testing process as transparent as possible
    • test cases should be self-explanatory, and report inputs, outputs, expected values, and how they differ (see the sketch after this list)
  • break tasks down into subtasks with their own specifications and tests
    • e.g. when asking to find anagrams, first demand and test only lowercase words, then mixed-case words, then sentences with special characters
  • don't worry too much about cheesing, giving the solution away, or running untrusted code
    • while it's prudent to run server-side grading in a container, your students don't need to set one up.
    • cheesing isn't worth it: if beginner Python students can cheese, they're typically well enough equipped to solve the simple tasks you give them
    • beginner students lack the skills to extract meaningful solutions from tests. I had one case where I basically included the solution in a tests.py file (before I transitioned to YAML test cases). Nobody noticed. While the testing process (input/output/expected) should be transparent, the code which does the testing will be opaque to beginners
  • keep it simple, give individual feedback, and explain the testing process
    • coding against tests can be frustrating for beginners. Hence tasks should be relatively straightforward, the testing process should be walked through at the beginning, and results should be reported transparently
    • I had one TA for giving individual written feedback for (randomly) selected students. That was quite helpful.
    • this year, I started giving small, ungraded introductory tasks using the same testing framework, to be solved during live exercises (not as homework). This helps prepare for the homework and gives an "angle of attack".
  • finally, you can save yourself and your students lots of frustration if you set up a JupyterHub instead of depending on everyone setting up their own environment
    • student hardware and OSes are heterogeneous; you will not be able to provide instructions that work flawlessly on all machines
    • we had quite some attrition from that process alone, despite offering individual support
    • setting up environments is an important skill, but can be taught separately, in a later portion of the class. It's frustrating for beginners if this doesn't work right from the start
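
To make the transparency and subtask points concrete, here is a hypothetical pair of tests for the anagram example; the second fails against this stand-in solution on purpose, and its failure message is meant to do the teaching:

    # Hypothetical tests: the failure message itself reports what was
    # called, what came back, and what was expected.
    def is_anagram(a: str, b: str) -> bool:  # stand-in student solution
        return sorted(a) == sorted(b)        # handles only subtask 1

    def test_lowercase_words():
        actual = is_anagram("listen", "silent")
        assert actual, (
            f"is_anagram('listen', 'silent') returned {actual}, expected True"
        )

    def test_mixed_case_words():
        actual = is_anagram("Listen", "silent")
        assert actual, (
            f"is_anagram('Listen', 'silent') returned {actual}, "
            "expected True -- did you normalize the case?"
        )
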
Keep in mind that I'm "just" in the third iteration of this module, and both I and my students have lots of thoughts on improvements to be made. Still, I hope that some of this advice is helpful, and I wish you success on your own journey :)

Philipp