Dave Touretzky
Dec 4, 2025, 9:27:49 AM
to Dale Lane, Machine Learning for Kids
Thanks for this explanation, Dale. I can't find any fault with your
design choices. A two-hidden-layer neural net should do well on a wide
range of problems if sufficient training data are supplied. But it can
also fail in surprising ways if these expectations aren't met. Making
the model more transparent would help people understand what's going on.
I have a few suggestions...
First, let me be clear about how outrageous my little training set is.
The source code indicates there are 50 units per hidden layer. The
two-input, one-output problem I'm submitting will create a network with
(I+1)*H1 + (H1+1)*H2 + (H2+1)*O = (2+1)*50 + (50+1)*50 + (50+1)*1 = 2,751 weights
You're holding out 20% of the data for validation, so that means I'm
trying to train a model with 2,751 weights on just 8 data points.
That's nuts. Using a saturating transfer function (sigmoid) and
normalizing the inputs and outputs will keep things from going
completely off the rails, but there's no reason to expect reasonable
results on this problem, especially since my dataset is quite noisy.
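
In case anyone wants to check that count, here is a quick Python sketch.
The layer sizes are the ones I read out of the source code, and the
formula is just the standard weights-plus-biases count for fully
connected layers; the function name is mine:

    # Count trainable parameters in a fully connected network:
    # each dense layer has (inputs + 1) * units parameters (weights + biases).
    def count_params(layer_sizes):
        total = 0
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
            total += (n_in + 1) * n_out
        return total

    # Two inputs -> 50 -> 50 -> one output.
    print(count_params([2, 50, 50, 1]))   # 2751
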
We encourage our teachers to push the limits of a model to better
understand how it works and where it breaks. I tried the simplest
possible problem: y=x, trained on the integers from 0 to 9. A linear
regression would solve this trivially, and a k-nearest neighbor
approximator (what Code.org's AI Lab uses) would also do well, but the
two-layer neural net does poorly. It's not designed for this. That's
fine; people just need to recognize when they're using it inappropriately.
Similarly, supplying inputs well outside the training set distribution
will not produce the extrapolation people might expect -- which they
would get with a non-saturating model. That's okay, but it would be
helpful to let them know that they're violating the model's assumptions.
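
If anyone wants to poke at this outside the site, here is roughly the
experiment, approximated with scikit-learn. I'm guessing at the training
details (optimizer, normalization, number of epochs), so treat it only
as an illustration of the saturation and extrapolation behavior, not as
a reproduction of the actual model:

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.linear_model import LinearRegression

    # Training data: y = x on the integers 0..9.
    X = np.arange(10, dtype=float).reshape(-1, 1)
    y = np.arange(10, dtype=float)

    # Stand-in for the site's model: two hidden layers of 50 sigmoid units.
    mlp = MLPRegressor(hidden_layer_sizes=(50, 50), activation='logistic',
                       solver='lbfgs', max_iter=5000, random_state=0).fit(X, y)
    lin = LinearRegression().fit(X, y)

    # Two in-range points and two points well outside the training range.
    test = np.array([[2.5], [7.0], [20.0], [100.0]])
    print("MLP:   ", mlp.predict(test))   # flattens out beyond the data
    print("Linear:", lin.predict(test))   # extrapolates y = x exactly
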
So some suggestions for increasing transparency:
1. Describe the model so people know what they're getting. Even kids
these days are learning about neural networks, so telling them that the
model is "a neural network with two hidden layers with 50 units each"
helps them form a mental picture of what's inside the black box. And
their teachers, if properly trained, will be able to appreciate what
this means, and why small training sets aren't appropriate.
2. After training, show a graph of the loss function vs. epochs. This
reinforces the point that they are training a neural network using an
iterative algorithm.
3. If the user enters an input value that is more than 10% outside the
range of training values, flag the input as "out of range" so they
understand that they're asking the model to do something it wasn't
trained on. I don't think the input should be rejected; it can be
instructive to see what the model produces. But flagging it will help
them understand why the results might not be what they expect. It might
also help catch input errors, e.g., if someone enters (age,weight)
instead of (weight,age). Kids are prone to errors like this. (A quick
sketch of such a range check follows these suggestions.)
4. It would be really helpful if, after users train their model, they
could see the predicted output next to the desired output, for each
training input. That would immediately help them assess how well the
model is doing. One way to do this would be to add extra columns to the
display on the "Train" page. If the user then clicks on "Download CSV"
they could choose to include the predicted values as well. For
two-input, one-output problems, this would allow users to generate a
surface plot of their data, and another plot of the model's predictions,
which would be very instructive. In the old days they'd have to use
Python or MATLAB or R to generate such a plot, but today we show our
teachers how to ask ChatGPT to do it. Brave new world! (A rough sketch
of that kind of comparison is included below as well.)
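
To be concrete about the range check in suggestion 3, I have nothing
fancier in mind than the following; the 10% margin and all the names are
placeholders:

    # Flag any input more than 10% of the training span outside the range.
    def out_of_range(value, train_min, train_max, margin=0.10):
        span = train_max - train_min
        lo = train_min - margin * span
        hi = train_max + margin * span
        return value < lo or value > hi

    # Example: suppose the ages in the training data ran from 5 to 18.
    print(out_of_range(4.0, 5, 18))    # False -- within the 10% margin
    print(out_of_range(45.0, 5, 18))   # True  -- flag it as "out of range"
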
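And for suggestion 4, here is the sort of thing a teacher (or ChatGPT)
could produce once a predicted column exists in the CSV. The numbers are
made up purely to show the shape of the comparison:

    import numpy as np
    import matplotlib.pyplot as plt

    # Stand-ins for the two input columns, the desired output, and the
    # model's predictions -- the real values would come from the CSV.
    x1, x2 = [g.ravel() for g in np.meshgrid(np.arange(5.0), np.arange(5.0))]
    y = 2 * x1 + x2
    y_hat = y + np.random.default_rng(0).normal(0, 0.5, y.shape)

    # Predicted next to desired, one row per training example.
    print("  x1   x2   desired  predicted")
    for row in zip(x1, x2, y, y_hat):
        print("%4.1f %4.1f %9.2f %10.2f" % row)

    # One surface for the data, another for the model's predictions.
    fig = plt.figure()
    for i, (z, title) in enumerate([(y, "training data"), (y_hat, "model")], 1):
        ax = fig.add_subplot(1, 2, i, projection='3d')
        ax.plot_trisurf(x1, x2, z)
        ax.set_title(title)
    plt.show()
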
Best,
-- Dave