predicting numbers produces bad results


David Touretzky

Dec 1, 2025, 3:35:53 AM
to Machine Learning for Kids
When I train a "predicting numbers" project to predict "required calories" from "weight" and "age" using the CSV dataset below, the result is always a number of the form 2312.4xxxxxx, no matter what the input values are.

"weight","age","required calories"
58, 10, 2400
110, 15, 3000
70, 12, 1800
140, 16, 2200
180, 18, 2600
115, 14, 2350
150, 17, 2100
85, 13, 2050
95, 14, 2150
105, 15, 2250

Dale Lane

Dec 1, 2025, 3:50:04 AM
to Machine Learning for Kids
Sorry about that! 

I assume that something I've done is preventing the service from adequately handling such a small training set (mainly because I don't see that behaviour with the larger training sets that users normally have). 
I've got a very busy couple of days coming up, so I won't be able to look at this now - but I'll pick this up on Wednesday to see what I can do to handle tiny data sets more appropriately. 

Kind regards

D

Dale Lane

Dec 3, 2025, 6:38:14 PM
to Machine Learning for Kids
I've made some changes that I think should help. I was only normalizing the inputs before - I've updated this so now I normalize both inputs and outputs, and I hope this will give better results with small datasets. 
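The normalization being described can be sketched in a few lines - a minimal illustration of min-max scaling applied to both inputs and targets, not the actual ML4K code: scale everything into [0, 1] using the training data's range, train on the scaled values, then invert the scaling on the model's prediction.

```python
def fit_minmax(values):
    """Return (lo, span) so that (v - lo) / span maps the values into [0, 1]."""
    lo, hi = min(values), max(values)
    return lo, (hi - lo) or 1.0  # guard against a constant column

def scale(v, lo, span):
    return (v - lo) / span

def unscale(v, lo, span):
    return v * span + lo

# Targets from the "required calories" column of the example dataset
calories = [2400, 3000, 1800, 2200, 2600, 2350, 2100, 2050, 2150, 2250]
lo, span = fit_minmax(calories)

scaled = [scale(c, lo, span) for c in calories]
assert min(scaled) == 0.0 and max(scaled) == 1.0

# A model output of 0.5 in normalized space maps back to real calories
print(unscale(0.5, lo, span))  # 2400.0
```

Normalizing the outputs as well as the inputs matters here because sigmoid units can only produce values in (0, 1); without output scaling, the network could never reach targets in the thousands.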

Kind regards

D

Dave Touretzky

Dec 3, 2025, 7:12:38 PM
to Dale Lane, Machine Learning for Kids
It appears this fix wasn't tested on the little 10-example "calorie"
dataset I supplied. It's still failing spectacularly.

How are you doing the regression? What algorithm is used?

-- Dave

Dale Lane

Dec 3, 2025, 7:27:47 PM
to Machine Learning for Kids
"spectacularly" feels a little over the top as a reaction, don't you think? 

Considering I took the time to reply to you, to investigate the issue you raised, to implement a fix, and yes - to test the fix, and then reach out to you again to let you know... how likely do you really think it is that I'd do all of that if I hadn't tried it with your data? 

Did you perhaps consider that you're using a web app and try doing things like refreshing the page a few times, disabling your cache, that sort of thing? And even if you did try that, and there really is some other issue I haven't considered, there are still nicer ways to raise an issue. 

At this point, it feels a little ridiculous to have to prove that I really did try your data but given that's where we seem to be now, here you go: https://youtu.be/CU2S4rrONOU 

As for what algorithm I'm using, I literally gave you a link to the implementation in my last post, so perhaps take 30 seconds to go look? 

Sorry if my reaction seems over the top, but there are more courteous ways to ask for things. 

Dave Touretzky

Dec 4, 2025, 3:36:04 AM
to Dale Lane, Machine Learning for Kids
I apologize for offending you, Dale. I have great admiration for what
you've accomplished with this site, and am grateful that you continue to
provide this important resource.

But I still can't make sense of the model's behavior. The behavior
changed, so I believed I was getting the updated version of your code.
But just to make extra sure, I deleted my Chrome browser's cache,
deleted the model I had trained, closed the browser tab, and started
from scratch. I also tried it in another browser (Edge). I'm still
getting results I didn't expect.

For example, hold the second input (age) constant at 16, and let the
first input (weight) vary from 140 to 250. Then the result varies from
2367 to 2388. It's not perfectly linear, though. Something nonlinear
is going on. What if we go further? A weight of 300 produces 2390, and
400 produces 2391. But 500 produces a lower value, 2389, and 600
produces 2387. How about some really far out values? 6000 produces
2347, 60,000 produces 2335, and 600,000 also produces 2335.

Okay, so this is clearly not a linear model. What's it doing? Is this
some kind of radial basis function model?

I did click on the link you supplied in your reply, but it's showing me
a Github change log, and I didn't feel like digging through the source
code to try to figure out the model when I could just ask you.

Again, I'm sorry for my rude reply. We have lots of teachers using
ML4K, and I want to make sure I understand it properly and can explain
its behavior.

-- Dave

Dale Lane

Dec 4, 2025, 3:43:56 AM
to Machine Learning for Kids

Thanks for the apology, it’s appreciated.

I’m using a neural network implemented in TensorFlow.js, using sigmoid activation functions in the two hidden layers. (model definition)

As you observe, this isn’t linear regression, but I feel like it should do a reasonable job with relatively complex, non-linear patterns.

As I mentioned before, I’m normalizing inputs, and now output values as well. This does mean that I’m implicitly expecting inputs that are within the training data range. (normalization impl, normalization usage). This will have the impact that extreme inputs, not represented in the training, will be normalized to some stable value. That feels like an okay behaviour to me.
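The "stable value for extreme inputs" behaviour falls out of the combination of min-max normalization and sigmoid activations, and can be demonstrated with plain Python (an illustrative single unit with an assumed weight of 1, not the real network):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Training weights ranged from 58 to 180, so min-max normalization maps
# an input weight w to (w - 58) / (180 - 58)
def normalize(w):
    return (w - 58) / (180 - 58)

# Far outside the training range, the normalized value grows without bound...
print(normalize(6000), normalize(600000))   # ~48.7 vs ~4917.6

# ...but a sigmoid hidden unit saturates, so the two inputs become
# indistinguishable after the first layer
print(sigmoid(normalize(6000)))    # 1.0
print(sigmoid(normalize(600000)))  # 1.0
```

This matches the earlier observation that weights of 60,000 and 600,000 produce identical predictions: once the first layer saturates, the rest of the network sees the same activations.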

It’s obviously super challenging to come up with a single generic implementation that can generalize to a wide variety of use cases, for a user group who can’t be expected to describe what they need in any technical detail. What I’ve got seems effective for the kinds of use cases people are using it for. While I don’t think linear regression would work for most use cases I’ve seen students come with, some of the decisions I’ve made are certainly debatable (e.g. should I have gone with ReLU rather than sigmoid? I keep going back and forth on that - and ReLU would likely have been a better fit for your use case in particular).

Kind regards

D

Dave Touretzky

Dec 4, 2025, 9:27:49 AM
to Dale Lane, Machine Learning for Kids
Thanks for this explanation, Dale. I can't find any fault with your
design choices. A two-hidden-layer neural net should do well on a wide
range of problems if sufficient training data are supplied. But it can
also fail in surprising ways if these expectations aren't met. Making
the model more transparent would help people understand what's going on.
I have a few suggestions...

First, let me be clear about how outrageous my little training set is.
The source code indicates there are 50 units per hidden layer. The
two-input, one-output problem I'm submitting will create a network with

(I+1)*H1 + (H1+1)*H2 + (H2+1)*O = 2,751 weights

You're holding out 20% of the data for validation, so that means I'm
trying to train a model with 2,751 weights on just 8 data points.
That's nuts. Using a saturating transfer function (sigmoid) and
normalizing the inputs and outputs will keep things from going
completely off the rails, but there's no reason to expect reasonable
results on this problem, especially since my dataset is quite noisy.
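The weight count above is easy to verify - each fully connected layer contributes (inputs + 1) × outputs parameters, the +1 being the bias:

```python
def dense_params(n_in, n_out):
    """Weights plus biases for one fully connected layer: (n_in + 1) * n_out."""
    return (n_in + 1) * n_out

# Two inputs, two hidden layers of 50 units each, one output
total = dense_params(2, 50) + dense_params(50, 50) + dense_params(50, 1)
print(total)  # 2751
```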

We encourage our teachers to push the limits of a model to better
understand how it works and where it breaks. I tried the simplest
possible problem: y=x, trained on the integers from 0 to 9. A linear
regression would solve this trivially, and a k-nearest neighbor
approximator (what Code.org's AI Lab uses) would also do well, but the
two-layer neural net does poorly. It's not designed for this. That's
fine; people just need to recognize that their use is inappropriate.
Similarly, supplying inputs well outside the training set distribution
will not produce the extrapolation people might expect -- which they
would get with a non-saturating model. That's okay, but it would be
helpful to let them know that they're violating the model's assumptions.
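For comparison, a k-nearest-neighbor approximator of the kind mentioned above can be written in a few lines (an illustration only - not Code.org's implementation). With k=1 it memorizes the y=x training points exactly, and its "extrapolation" beyond the training range is flat:

```python
def knn_predict(train_x, train_y, x, k=1):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

xs = list(range(10))   # the y = x toy problem, integers 0..9
ys = list(range(10))

print(knn_predict(xs, ys, 5))     # 5.0 - exact on a training point
print(knn_predict(xs, ys, 100))   # 9.0 - flat beyond the training range
```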

So some suggestions for increasing transparency:

1. Describe the model so people know what they're getting. Even kids
these days are learning about neural networks, so telling them that the
model is "a neural network with two hidden layers with 50 units each"
helps them form a mental picture of what's inside the black box. And
their teachers, if properly trained, will be able to appreciate what
this means, and why small training sets aren't appropriate.

2. After training, show a graph of the loss function vs. epochs. This
reinforces the point that they are training a neural network using an
iterative algorithm.

3. If the user enters an input value that is more than 10% outside the
range of training values, flag the input as "out of range" so they
understand that they're asking the model to do something it wasn't
trained on. I don't think the input should be rejected; it can be
instructive to see what the model produces. But flagging it will help
them understand why the results might not be what they expect. It might
also help catch input errors, e.g., if someone enters (age,weight)
instead of (weight,age). Kids are prone to errors like this.

4. It would be really helpful if, after users train their model, they
could see the predicted output next to the desired output, for each
training input. That would immediately help them assess how well the
model is doing. One way to do this would be to add extra columns to the
display on the "Train" page. If the user then clicks on "Download CSV"
they could choose to include the predicted values as well. For
two-input, one-output problems, this would allow users to generate a
surface plot of their data, and another plot of the model's predictions,
which would be very instructive. In the old days they'd have to use
Python or MATLAB or R to generate such a plot, but today we show our
teachers how to ask ChatGPT to do it. Brave new world!

Best,

-- Dave

Dave Touretzky

Dec 4, 2025, 10:05:04 AM
to Dale Lane, Machine Learning for Kids
I just realized that another reason why the model doesn't do well on the
simple linear problem y=x with integer training points from 0 to 9 is
that the training is using early stopping based on a 20% validation set,
which means just 2 numbers. So the stopping point is going to be
somewhat random, whereas if learning ran until the training loss reached
asymptote, the model could memorize the training set and do perfectly on
at least those 8 points -- and perhaps generalize acceptably if the
model used some regularization method, such as weight decay.

Suggestions for transparency:

1. If the dataset is small or the validation set is small, warn the
user. For example, if there are fewer training instances than model
parameters, this is troubling. If the validation set size is smaller
than the size of a hidden layer, this is also troubling.

2. After training, tell the user the size of the training and validation
sets that were used.

3. Plot the validation loss on the same graph as the training loss.
Little kids will happily ignore this, but we can teach our middle school
kids what it means.

And a possible mitigation strategy: if the validation set is small,
instead of using early stopping you could let the model overfit the
training set, but use weight decay to encourage it to generalize better.
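The mitigation can be illustrated with a one-parameter linear model on the y=x toy problem (a sketch only, not the ML4K network): run gradient descent to convergence with no early stopping, adding an L2 penalty (weight decay) to the loss.

```python
def fit_with_weight_decay(xs, ys, lr=0.01, decay=1e-3, epochs=500):
    """Gradient descent on MSE plus an L2 penalty (weight decay) on w,
    for the model y = w * x."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        grad += 2 * decay * w        # the weight-decay term
        w -= lr * grad
    return w

xs = list(range(10))   # y = x, trained to convergence: no early stopping
w = fit_with_weight_decay(xs, xs)
print(round(w, 3))  # ~1.0
```

With a small decay constant the penalty barely shifts the solution away from w=1, but on an overparameterized network the same term discourages the large weights that extreme overfitting tends to produce.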

Best,

-- Dave

Dale Lane

Dec 10, 2025, 5:25:51 AM
to Machine Learning for Kids
Thanks for the suggestions - some of these are obviously significant bits of work, so I’m not sure when I will get to them. But I suppose I’ve got a Christmas break coming up, so plenty of time for side-project coding, right? :) 

Dave Touretzky

Dec 11, 2025, 5:17:03 AM
to Dale Lane, Machine Learning for Kids
I appreciate your willingness to consider these suggestions. This will
make the "predicting numbers" feature a truly awesome teaching tool.

Enjoy your Christmas holiday. I'll be doing a bunch of coding of my own
once the semester is over and I finally get some free time.