
Q: confidence intervals for model parameters and future predictions


Cosine

Apr 16, 2023, 6:06:50 PM
Hi:

Often we want to build a model to make predictions about a population. To do that, we draw a set of samples and then estimate the parameters of the model in some sense, e.g., in the least-squares sense. Having the model, we can use it to predict future outcomes. However, since we are dealing with random variables, the estimated parameters have uncertainty, i.e., their values would differ if we drew another set of samples to estimate them. Therefore, we need to determine confidence intervals for these parameters. For the same reason, a future outcome predicted by the model also needs such an interval.

We have explicit expressions for these confidence intervals when we use the linear least-squares model. The question is: how do we determine these confidence intervals when we use a model other than linear least squares?
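
For concreteness, here is a minimal sketch of the explicit linear least-squares case in Python (numpy/scipy; the data and all names are made up for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n)])  # intercept + slope
y = X @ np.array([1.0, 2.0]) + rng.normal(0, 1.5, n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                 # least-squares estimate
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                 # unbiased residual variance
tcrit = stats.t.ppf(0.975, df=n - p)

# 95% CIs for the parameters
se_beta = np.sqrt(s2 * np.diag(XtX_inv))
ci_beta = np.column_stack([beta_hat - tcrit * se_beta,
                           beta_hat + tcrit * se_beta])

# 95% prediction interval for a future observation at x0
x0 = np.array([1.0, 5.0])
se_pred = np.sqrt(s2 * (1 + x0 @ XtX_inv @ x0))
pi = (x0 @ beta_hat - tcrit * se_pred, x0 @ beta_hat + tcrit * se_pred)
print(ci_beta, pi)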



David Jones

Apr 17, 2023, 1:42:48 PM
The question is answered by the theory of maximum likelihood. You might
find the details already worked out for some specific models. In
particular, see https://en.wikipedia.org/wiki/Generalized_linear_model
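
For example, here is a sketch of the Wald intervals that maximum
likelihood gives for a GLM, assuming the statsmodels package (the
Poisson data here are invented for illustration):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, 200)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.5 + 1.2 * x))       # Poisson response, log link

res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(res.conf_int(alpha=0.05))              # Wald CIs for the coefficients

# CI for the mean response at a new point (columns: const, x)
X_new = np.array([[1.0, 1.0]])
print(res.get_prediction(X_new).conf_int())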

Cosine

Apr 17, 2023, 11:45:20 PM

What if we use the method of cross-validation, e.g., the k-fold method?

Then we will have k sample values for each of the parameters and for the predicted value.

We could then calculate the sample mean and standard error for each of them to build the corresponding confidence interval.

However, this requires the assumption that the parameters and the predicted value are normally distributed or follow a Student's t-distribution.
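
A rough sketch of what I mean, assuming Python/numpy and a simple linear model (made-up data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 100, 5
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

idx = rng.permutation(n)
folds = np.array_split(idx, k)

slopes = np.array([
    np.polyfit(x[np.setdiff1d(idx, f)],      # train on all but this fold
               y[np.setdiff1d(idx, f)], 1)[0]
    for f in folds
])

m, se = slopes.mean(), slopes.std(ddof=1) / np.sqrt(k)
tcrit = stats.t.ppf(0.975, df=k - 1)
print((m - tcrit * se, m + tcrit * se))      # t-interval from the k fits

One worry: the k training sets overlap, so the k estimates are not independent and the naive standard error may be biased.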

Rich Ulrich

Apr 18, 2023, 12:55:09 AM
On Mon, 17 Apr 2023 20:45:18 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6191021/

Here is a long article from a generally good site, discussing their
own proposal and earlier ones. They use k-fold plus bootstrap, and
intend to remove the biases in parameter estimates (and their
errors) inherent in simple applications of k-fold or bootstrap.

In the early part of it that I read, it does mention CIs as a
product.
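
Not the article's exact procedure, but for reference, the plain
nonparametric bootstrap ingredient they build on is just a percentile
interval; a Python/numpy sketch with invented data:

import numpy as np

rng = np.random.default_rng(3)
n, B = 100, 2000
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)

boot = np.empty(B)
for b in range(B):
    i = rng.integers(0, n, n)                # resample rows with replacement
    boot[b] = np.polyfit(x[i], y[i], 1)[0]   # refit, keep the slope

print(np.percentile(boot, [2.5, 97.5]))      # percentile CI for the slope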

--
Rich Ulrich

David Jones

Apr 18, 2023, 4:33:49 AM
Some of the ideas here relate to the now-old idea of balanced
bootstrapping: see
https://mathweb.ucsd.edu/~ronspubs/90_09_bootstrap.pdf
for example.
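
A minimal sketch of the balanced scheme, assuming Python/numpy
(made-up data; each observation appears exactly B times across the B
resamples):

import numpy as np

rng = np.random.default_rng(4)
n, B = 100, 500
data = rng.normal(5.0, 2.0, n)

pool = rng.permutation(np.tile(np.arange(n), B))  # each index exactly B times
resamples = pool.reshape(B, n)

means = data[resamples].mean(axis=1)
print(np.percentile(means, [2.5, 97.5]))     # balanced-bootstrap CI for the mean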

I have seen early work on cross-validation for model selection in
multiple regression where a typical suggestion was to leave out 20%
of the samples at a time, but that may relate to the context of
overall sample size and of having data that are not from designed
experiments.

But the joint questions of "balance" and of "designed experiments"
raise the question of whether any of the considerations of
partially-balanced factorial designs can be employed or extended to
provide a scheme for slicing the data into units for some
cross-validation or other analysis.

The OP says "However, this requires the assumption that the parameter
and predicted value are normal distributions or student distributions."
This may indicate that the plan would be to do multiple analyses on
small sections of the data, in contrast to doing multiple analyses on
nearly-complete versions of the data where only a small part is
left out each time. The possible benefits of either approach would
depend on what is being attempted. In theory, if all the usual
assumptions apply, the best answers come from a single analysis of the
complete dataset. That one contemplates doing something else suggests
that there are worries about the assumptions: not having a fixed model
in mind, not having Gaussian random errors, or not having independence
between observations.

Rich Ulrich

Apr 18, 2023, 6:13:51 PM
On Tue, 18 Apr 2023 08:33:45 -0000 (UTC), "David Jones"
<dajh...@nowherel.com> wrote:

> In theory, if all the usual
>assumptions apply, the best answers come from a single analysis of the
>complete dataset. That one contemplates doing something else suggests
>that there are worries about the assumptions: not having a fixed model
>in mind, not having Gaussian random errors, or not having independence
>between observations.

Nicely put.

"All the usual assumptions" must include having the proper
model, scales of measurement, and suitable sample.


--
Rich Ulrich

