Let AI take your exams

Anand S

Jun 11, 2026, 10:48:32 PM

to s-a...@googlegroups.com

At 2 pm IST today (Fri 12 Jun 2026), I'm running a workshop at Paradox, IITM - at DOMS 101.

You can join online at https://meet.google.com/cpt-faee-ucx and ask questions on chat.

Agenda:

You've been told AI can pass your exams. But what happens when you actually watch it try — live, on your questions, in real time?

This workshop starts with a collective experiment: we ask coding agents to solve real exams (including IITM exams) and watch how they solve them.

What follows isn't a tutorial on prompting — it's an autopsy that reveals what your exams are actually testing, where AI confidently hallucinates, and what that means for what's worth learning.

You'll leave with a reframed understanding of your degree (the goal isn't answers, it's the ability to catch wrong ones) and a concrete study ritual that uses AI as a Socratic sparring partner rather than an answer machine.

Come with a question you got wrong recently — it's going to be useful.

Real agenda: An ask-me-anything session plus real-life experiments.

Anand S

Jun 14, 2026, 9:50:47 AM

to s-a...@googlegroups.com

At 2 pm IST on Fri 12 Jun 2026, I conducted a workshop at Paradox, IITM - at DOMS 101.

My core message is: "AI can solve exams and help you learn. Delegate what AI can do. Learn what AI can't do instead."

Video: Watch video

My talks page for "Let AI take your exams" includes:

The full story + transcript + audio
How Codex solved a real exam, live
My collection of AI-learning techniques - which was not covered in the workshop, but is a useful reference

Here are the takeaways from the workshop:

AI is more capable than you think — and getting smarter. Recalibrate constantly what it can and can't do. Note down what it can't, because that is precisely where your value lives.
Delegate first; learn the rest. Give everything to AI. Focus your learning on what it can't yet do — that's where the value will be. It's a moving filter; revisit it every quarter.
Always use the best model, turned up high. Reserve "fast and cheap" for the ~5% of moments you need a quick answer. And remember students get serious AI free via the GitHub Student Pack and Gemini.
Make the AI ask you for context. "If you need more information, ask me." You don't have to know what context it needs — push that burden back to the machine.
Beat hallucination with a maker-checker. Two independent models that must agree cut errors from 14% to under 4%. Tell the checker to "find the errors," not to grade.
Loop with feedback in verifiable environments. Point an agent at an exam, a codebase, anything that scores itself — let it try, submit, read the result, retry. This is the most powerful technique AI has.
Calibrate, don't just trust. Practise predicting whether AI will get something right — even on topics you don't know. Watch for base-rate traps and familiar problems with one changed premise.
Be lazy, productively. Don't read AI's 20-page output — train it to give you five words. Working well with AI is a management skill.
Learn from peers. Multiple people trying things is how you discover what works. Non-transactional relationships are the rising currency of the AI era.
Apply the scientific method to everything. Form a hypothesis, hunt for evidence, try to falsify yourself. And when a system blocks you unfairly — hack it, then publish what you learned.
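To make the maker-checker takeaway above concrete, here is a minimal sketch in Python. It is an illustration, not the setup used in the workshop: `maker_checker`, the `Model` type, the prompts, and the "NO ERRORS" reply convention are all assumptions, and the two demo functions are stand-ins for real, independent models. The checker is told to find errors rather than to grade, and an answer is returned only when the checker finds none.

```python
from typing import Callable, Optional

# Hypothetical interface: a model is any function from prompt text to reply text.
Model = Callable[[str], str]


def maker_checker(question: str, maker: Model, checker: Model,
                  max_rounds: int = 3) -> Optional[str]:
    """Return an answer only if the checker finds no errors in it."""
    prompt = question
    for _ in range(max_rounds):
        answer = maker(prompt)
        # Ask the checker to hunt for errors, not to grade the answer.
        review = checker(
            f"Question: {question}\nProposed answer: {answer}\n"
            "Find the errors in this answer. Reply NO ERRORS if there are none."
        )
        if review.strip().upper() == "NO ERRORS":
            return answer
        # Feed the objections back to the maker and try again.
        prompt = (
            f"Question: {question}\nPrevious answer: {answer}\n"
            f"A reviewer found these errors: {review}\nGive a corrected answer."
        )
    return None  # No agreement reached: hand the question to a human.


# Stand-in models for demonstration only; real ones would call two different LLMs.
def demo_maker(prompt: str) -> str:
    return "4" if "reviewer" in prompt else "5"


def demo_checker(prompt: str) -> str:
    return "NO ERRORS" if "Proposed answer: 4" in prompt else "2 + 2 is not 5."


result = maker_checker("What is 2 + 2?", demo_maker, demo_checker)
print(result)
```

Returning None when the two never agree is a deliberate choice in this sketch: disagreement is the signal that a human should look.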

Here are the takeaways from how Codex solved the exam:

An agent operates the environment; a chatbot answers the question. Codex read source, ran code, clicked Check, and looped on feedback. That's why it beat copy-paste — the exam was full of affordances a chatbot can't touch.
Verifiable environments favour AI. The more checkable the exam — validators, error strings, downloadable files, a live Check button — the more it helped the agent, not the student. "AI-proof" and "feedback-rich" are opposites.
Most failures were wording, not reasoning — and the source fixed them. The fix for a brittle validator was to read the validator. Pass the error message back to the agent and it converges.
Don't guess where attempts are limited. The network game punished early guesses. The recoverable mistakes had feedback; the costly ones didn't. Triage cheap-and-certain first.
The gap between 9 and 10 was a credential, not a brain. Same model, same skill. Anand's missing mark was an invalid token. In the AI era, "can it?" often means "does it have the keys?"
It cost about a coffee. ~$2–3 of tokens for the whole exam, ~96% cached. The real cost is the human judgment to know when the agent is plausibly, confidently wrong.
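The try, submit, read the result, retry loop described in these takeaways can be sketched as follows. This is a toy under stated assumptions, not how Codex was actually driven: `solve_with_feedback`, the `agent` and `check` callables, and the two-decimal validator are all hypothetical. The validator is deliberately brittle about wording, so the first attempt fails on format and the error message, passed back to the agent, fixes it.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: the agent sees the question plus the last error
# message; the checker returns (passed, message), like a "Check" button.
Agent = Callable[[str, str], str]
Checker = Callable[[str], Tuple[bool, str]]


def solve_with_feedback(question: str, agent: Agent, check: Checker,
                        max_attempts: int = 5) -> Tuple[Optional[str], int]:
    """Retry until the checker passes or the attempt budget runs out."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        answer = agent(question, feedback)
        passed, message = check(answer)
        if passed:
            return answer, attempt
        feedback = message  # Pass the error message back to the agent.
    return None, max_attempts


# Toy validator: right value, but it insists on exactly two decimal places.
def demo_check(answer: str) -> Tuple[bool, str]:
    if answer == "0.50":
        return True, "Correct"
    return False, "Expected exactly two decimal places"


# Toy agent: only fixes its formatting once it has seen the error message.
def demo_agent(question: str, feedback: str) -> str:
    return "0.50" if "two decimal places" in feedback else "0.5"


result = solve_with_feedback("What is 1/2 as a decimal?", demo_agent, demo_check)
print(result)
```

The `max_attempts` cap matters where attempts are limited or penalised: in that case a low budget, or no blind retries at all, is the safer setting.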
