stat/base regression disappointment

rsy...@disroot.org

May 17, 2026, 4:17:19 PM

to fo...@jsoftware.com

Dear list,

today I tried to use J for some statistics and wanted to
use the stat/base regression only to find out that:

-- the verb can only do regression of the y = ax + b
kind, ie, not y = ax,

-- the output of the regression has the precision
(format of the output) hardcoded (and is thus not useful
in many cases, e.g., when a coefficient comes out as, say,
1e-7).

It was a surprise and is a pity.

NB., one is (at least in physics) often interested in
various qualities such as the estimate of the std.
deviation of the coefficients, which is why one
reaches for a prepared verb rather than designing one's
own (which is, of course, possible but does not really make
much sense, unless one wants to reinvent a wheel, repeatedly).

Ruda

PS.: It also turned out to be surprisingly difficult to get
from the jsoftware.com page to some documentation of regression,
and then the doc itself does not help -- one has to read the
source code to see what things really compute...

Ben Gorte

May 18, 2026, 4:24:52 AM

to fo...@jsoftware.com

Concerning the PS, to be fair,

https://code.jsoftware.com/wiki/Vocabulary/percentdot#dyadic (the page about %.) says:

------

To calculate a linear regression:

indep =. 1 2 3 5 6 NB. independent variable

dep =. 3 4 5 8 9 NB. dependent variable

dep %. (indep ,. 1) NB. calculate line of best fit

1.24419 1.56977

------

These are the a and b that make

b + a * indep

closest to dep.

From there you easily get to:

dep %. indep

1.6

(just a, no b)

To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.

Rudolf Sykora

May 18, 2026, 10:27:05 AM

to fo...@jsoftware.com

Ben Gorte <bgo...@gmail.com> wrote:
> Concerning the PS, to be fair,
> https://code.jsoftware.com/wiki/Vocabulary/percentdot#dyadic (the page
> about %.) says:
> ------
> To calculate a linear regression:
>
> indep =. 1 2 3 5 6 NB. independent variable
>
> dep =. 3 4 5 8 9 NB. dependent variable
>
> dep %. (indep ,. 1) NB. calculate line of best fit
>
> 1.24419 1.56977
>
> ------
>
> These are the a and b that make
>
> b + a * indep
>
> closest to dep.
>
>
> From there you easily get to:
>
> dep %. indep
>
> 1.6
> (just a, no b)

In my PS I spoke about the 'regression' in stat/base, not
about %., which is not that useful as soon as you also need
some estimate of the statistical error of 'a' and/or 'b'
(which is, or should be, always when analyzing data)... as I
also said.

RS
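
For what it is worth, the standard errors can also be had from %. with a few more lines. The following is only a minimal sketch, assuming ordinary least squares with noise of constant variance; the names X, coef, res, dof, mse and se are illustrative and are not part of stats/base.

```j
NB. Sketch only: least-squares coefficients and their standard errors via %.
indep =: 1 2 3 5 6                  NB. independent variable (Ben's data)
dep   =: 3 4 5 8 9                  NB. dependent variable
X     =: indep ,. 1                 NB. design matrix; use  ,. indep  alone to force b = 0
coef  =: dep %. X                   NB. least-squares coefficients (a, then b)
res   =: dep - X +/ . * coef        NB. residuals
dof   =: (# dep) - {: $ X           NB. observations minus fitted parameters
mse   =: (+/ *: res) % dof          NB. residual variance estimate
NB. standard errors: square roots of the diagonal of mse * inverse(X'X)
se    =: %: mse * (<0 1) |: %. (|: X) +/ . * X
coef ,: se                          NB. row 0: coefficients, row 1: standard errors
```

With the intercept column kept, this should reproduce the Coeff. and S.E. columns that the stats/base regression verb prints for the same data, but as plain numbers that can be formatted in whatever way is needed.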

Henry Rich

May 18, 2026, 10:30:57 AM

to fo...@jsoftware.com

Out of curiosity, what are you modeling that leads you to discard the
intercept term?

Henry Rich

Michael Day

May 18, 2026, 11:21:41 AM

to fo...@jsoftware.com

I haven't thought this through thoroughly, but it looks as if you
can force a regression with no constant term by changing
the first line of fn "regression" in multivariate.ijs from
   v=. 1,.x
to
   v=. x
I've called my amended fn "regression0".
Applied to Ben's data, variables "indep" and "dep", we get:
   ,.(,.indep) (regression; regression0) dep NB. compare the resulting tabulations:
┌─────────────────────────────────────────────────────────┐
│           Var.         Coeff.           S.E.           t│
│              0        1.56977        0.22517        6.97│
│              1        1.24419        0.05814       21.40│
│                                                         │
│Source     D.F.           S.S.           M.S.           F│
│Regression    1       26.62558       26.62558      457.96│
│Error         3        0.17442        0.05814            │
│Total         4       26.80000                           │
│                                                         │
│S.E. of estimate         0.24112                         │
│Corr. coeff. squared     0.99349                         │
├─────────────────────────────────────────────────────────┤
│           Var.         Coeff.           S.E.           t│
│              0        1.60000        0.10000       16.00│
│                                                         │
│Source     D.F.           S.S.           M.S.           F│
│Regression    0       23.80000              _           _│
│Error         4        3.00000        0.75000            │
│Total         4       26.80000                           │
│                                                         │
│S.E. of estimate         0.86603                         │
│Corr. coeff. squared     0.88806                         │
└─────────────────────────────────────────────────────────┘

Yes, these results are the same as Ben's, but with error estimates
as well. The 2nd result is what Rudolf appears to require, also with
error estimates, presumably appropriate to a model
Y = b X
which I suppose is acceptable if you're sure that no constant term is
involved.

Personally, I'd be inclined to test the hypothesis that a is zero in
Y = a + b X
by examining the results of the original version of the regression
function, checking that a's estimate is not significantly different
from zero.

Mike
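
The figures in the second tabulation can be checked by hand. This is a sketch assuming the usual least-squares formulas for the model Y = b X with n - 1 error degrees of freedom (the value shown in the table's D.F. column); it is not taken from the multivariate.ijs source. For Ben's data, n = 5, the sum of x_i^2 is 75 and the sum of x_i y_i is 120:

$$
\hat b = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{120}{75} = 1.6,
\qquad
\mathrm{SSE} = \sum (y_i - \hat b\, x_i)^2 = 1.96 + 0.64 + 0.04 + 0 + 0.36 = 3.0
$$

$$
s^2 = \frac{\mathrm{SSE}}{n-1} = \frac{3.0}{4} = 0.75,
\qquad
\mathrm{S.E.}(\hat b) = \sqrt{\frac{s^2}{\sum x_i^2}} = \sqrt{\frac{0.75}{75}} = 0.1,
\qquad
t = \frac{\hat b}{\mathrm{S.E.}(\hat b)} = \frac{1.6}{0.1} = 16
$$

These agree with the Coeff., S.E., t and Error rows, and the square root of 0.75 gives the 0.86603 shown as S.E. of estimate. The Regression row (D.F. 0, infinite M.S. and F) and the Total S.S. of 26.8, which is taken about the mean of Y, appear to be carried over unchanged from the with-intercept code, so those entries probably need separate attention in a no-intercept version.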


Rudolf Sykora

May 25, 2026, 2:03:42 PM

to fo...@jsoftware.com

Dear Michael,

Michael Day <mike_l...@tiscali.co.uk> wrote:
> I haven't thought this through thoroughly, but it looks as if you
> can force a regression with no constant term by changing
> the first line of fn "regression" in multivariate.ijs, v=. 1,.x , to
> v =. x

thanks, that seems to be a way. I hacked up some lines to be
able to switch to the zero-b code, essentially using a
global-variable switch. But I bumped into the 'formatting
part' I mentioned: the code contains lines like this one
(line 99 at
https://github.com/jsoftware/stats_base/blob/master/multivariate.ijs
)

r=. r, 15 15j5 15j5 12j2 ": (i.>:k),.b,.seb,.b%seb

which stop being useful when the numbers get small and the
15j5 format effectively turns them into 0 (15j5 ": 1e_6 ==>
0.00000). Just changing 15j5 to 15j_5 is not quite enough,
because that change also (!) switches the justification from
right to left, and the column alignment is spoiled.
I have no good idea how to change the line above: I need
to insert some space between the first column (which is
right-justified) and the second (which is now
left-justified), but I guess this is not feasible with
just ":.

Thanks.

Ruda
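
One possible workaround for the justification problem, sketched here under assumptions: format each number on its own in exponential form, strip the surrounding blanks, and pad on the left with a negative take. The helper name fmte and the sample values in b are made up for illustration; dltb comes from the J standard library, and nothing here is part of stats/base.

```j
NB. fmte is a hypothetical helper, not part of stats/base.
NB. It formats one number in exponential form with 5 decimals,
NB. deletes leading and trailing blanks, then pads on the left to width 15.
fmte =: 3 : '_15 {. dltb 0j_5 ": y'

b =: 1.2e_6 3.4e_7                  NB. made-up small coefficients
(15 ": ,. i. # b) ,. fmte"0 b       NB. index column stitched to right-justified values
```

Note that the negative take keeps only the last 15 characters, so a number whose formatted form is longer than the field would be cut on the left; a wider field or fewer decimals avoids that.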

Henry Rich

May 25, 2026, 2:06:31 PM

to fo...@jsoftware.com

consider

load 'format/printf'

What are you modeling that makes you want to remove the constant term?

Henry Rich

Michael Day

May 25, 2026, 3:57:48 PM

to fo...@jsoftware.com

Thanks

First, I think it's worth repeating that I, and Henry, and others
perhaps, have commented that you should at least consider the
possibility of a non-zero constant term.

Second, perhaps we've all ignored or overlooked your gripe about the
stats addon's presentation of the output. You don't have to use the
regression function as is; one of J's (and APL's etc) great strengths
is that it's so easy to bend functions to one's own wishes. Those
15j5s etc are only there for your convenience. You can capture the
output of the library routine in various ways, perhaps by merely
inserting some global results, using eg R =: r to catch a function's
local variable and examine it at your leisure. The library function
is just a useful tool; you can always write your own.

Also, if your RESULT is a value for a in Y = a X, and you can't see
its value in the function's display because a is of order 10^_15,
then just change your units (if it's Physics or Applied Maths) from
amps (say) to femto-amps.
(You haven't shown us any example data!)

If you're talking about vanishing values for t values, standard errors
or the like, you should either be grateful for a very good model, or
extremely doubtful about the validity of your data!

Hope that helps,

Mike


Rudolf Sykora

May 25, 2026, 7:18:32 PM

to fo...@jsoftware.com

Michael Day <mike_l...@tiscali.co.uk> wrote:
>
> First, I think it's worth repeating that I, and Henry, and
> others perhaps, have commented that you should at least
> consider the possibility of a non-zero constant term.

I am aware of that and I did so. I am even aware that Henry
asked the curiosity question before (and repeated it today).
I just have not answered it yet.

> Second, perhaps we've all ignored or overlooked your gripe
> about the stats addon's presentation of the output.

Yes, I noticed that nobody reacted.

> You don't have to use the regression function as is; one
> of J's (and APL's etc) great strengths is that it's so easy
> to bend functions to one's own wishes. Those 15j5s etc are
> only there for your convenience.

The only thing that you probably really have to do (so far)
is to die. All the rest usually has some alternatives (unless
super-determinism rules the world); one is, indeed, to read
the source and do something with it.

> Also, if your RESULT is a value for a in Y = a X, and you
> can't see its value in the function's display because a is
> of order 10^_15 , then just change your units (if it's
> Physics or Applied Maths) from amps (say) to femto-amps.

That's exactly what I did when I saw the thing happening.

> If you're talking about vanishing values for t values,
> standard errors or the like,

No.

It unfortunately takes time to gain some wisdom. Had I known
what I know today, I would not, say, have used Linux (not to
speak of M$oft) before I switched to OpenBSD/Plan9, I would
not have spent time trying to use emacs and vim instead of
just nvi/acme/sam/ed, and I would not have spent so much
time with J, either. Sometimes I do not wish to program
everything myself, but want to use something elegant,
ready to use and solid (as well as libre; I know I want
a lot).

Regarding the question about forcing b=0... If you have a
machine that is known to produce data 'y' upon inserting 'x'
and it just does so by multiplying the x by a to-be-found
constant followed by blurring the result with some noise of
a given and x-independent variance, then the task of
estimating the unknown constant, in my opinion, leads to
enforcement of b=0. (And although ax + b could lead to a
better fit of the data, it probably would not give you what
you desire.)

More towards 'real' physics, say you want to calibrate a
force gauge (imagine one hanging from a support). Its
starting zero may vary depending on some pre-load: you hang
a mass on it, let it stabilize, and take this state as your
reference zero (ie, you call this moment a zero by
definition and you precisely align the zero of your scale
beside the gauge). Then you hang extra (and very well
defined, call them calibration) masses and note down the
extension; but be prepared: the scale is intentionally
fairly rough, so at some moments it is difficult to decide
what the number really is... And that's it. Again, with a
limited number of points, it seems better to enforce b=0.

NB. The two previous paragraphs are related.
NB. My thing was a bit more specialized, but similar
in its nature.

... So I am truly convinced that there are cases when b=0
enforcement makes sense. However, this is secondary to the
fact that regression in stats/base is just not ready to be
used by the 'general public', imho.

Best regards,
RS
