stat/base regression disappointment


rsy...@disroot.org

May 17, 2026, 4:17:19 PM (8 days ago) May 17
to fo...@jsoftware.com
Dear list,


today I tried to use J for some statistics and wanted to
use the stat/base regression only to find out that:

-- the verb can only do regression of the y = ax + b
kind, i.e., not y = ax,

-- the output of the regression has the precision
(format of the output) hardcoded (and is thus not useful
in many cases, e.g., when a coefficient comes out as, say,
1e-7).

It was a surprise and is a pity.

NB., one is (at least in physics) often interested in
various quantities such as the estimate of the std.
deviation of the coefficients, which is why one
reaches for a prepared verb rather than designing one's
own (which is, of course, possible but does not really make
much sense, unless one wants to reinvent the wheel, repeatedly).


Ruda


PS.: It also turned out to be surprisingly difficult to get from
the jsoftware.com page to some documentation of regression,
and then the doc itself does not help -- one has to read the
source code to see what things really calculate...

Ben Gorte

May 18, 2026, 4:24:52 AM (8 days ago) May 18
to fo...@jsoftware.com
Concerning the PS, to be fair,
https://code.jsoftware.com/wiki/Vocabulary/percentdot#dyadic (the page
about %.) says:
------
To calculate a linear regression:

indep =. 1 2 3 5 6 NB. independent variable

dep =. 3 4 5 8 9 NB. dependent variable

dep %. (indep ,. 1) NB. calculate line of best fit

1.24419 1.56977

------

These are the a and b that make

b + a * indep

closest to dep.


From there you easily get to:

dep %. indep

1.6

(just a, no b)


Rudolf Sykora

May 18, 2026, 10:27:05 AM (8 days ago) May 18
to fo...@jsoftware.com
Ben Gorte <bgo...@gmail.com> wrote:
> Concerning the PS, to be fair,
> https://code.jsoftware.com/wiki/Vocabulary/percentdot#dyadic (the page
> about %.) says:
> ------
> To calculate a linear regression:
>
> indep =. 1 2 3 5 6 NB. independent variable
>
> dep =. 3 4 5 8 9 NB. dependent variable
>
> dep %. (indep ,. 1) NB. calculate line of best fit
>
> 1.24419 1.56977
>
> ------
>
> These are the a and b that make
>
> b + a * indep
>
> closest to dep.
>
>
> From there you easily get to:
>
> dep %. indep
>
> 1.6
> (just a, no b)


In my PS I spoke about the 'regression' in stat/base, not
about %., which is not that useful as soon as you also need
some estimate of the statistical error of 'a' and/or 'b'
(which is, or should be, always when analyzing data)... as I
also said.

RS

Henry Rich

May 18, 2026, 10:30:57 AM (8 days ago) May 18
to fo...@jsoftware.com
Out of curiosity, what are you modeling that leads you to discard the
intercept term?

Henry Rich

Michael Day

May 18, 2026, 11:21:41 AM (8 days ago) May 18
to fo...@jsoftware.com
I haven't thought this through thoroughly, but it looks as if you
can force a regression with no constant term by changing
the first line of fn "regression" in multivariate.ijs, v=. 1,.x , to v=. x
I've called my amended fn "regression0".
Applied to Ben's data, variables "indep" and "dep", we get:
   ,.(,.indep) (regression; regression0) dep   NB. compare the resulting tabulations:
┌─────────────────────────────────────────────────────────┐
│             Var.       Coeff.         S.E.           t  │
│              0        1.56977        0.22517        6.97│
│              1        1.24419        0.05814       21.40│
│                                                         │
│  Source     D.F.        S.S.          M.S.           F  │
│Regression    1       26.62558       26.62558      457.96│
│Error         3        0.17442        0.05814            │
│Total         4       26.80000                           │
│                                                         │
│S.E. of estimate         0.24112                         │
│Corr. coeff. squared     0.99349                         │
├─────────────────────────────────────────────────────────┤
│             Var.       Coeff.         S.E.           t  │
│              0        1.60000        0.10000       16.00│
│                                                         │
│  Source     D.F.        S.S.          M.S.           F  │
│Regression    0       23.80000              _           _│
│Error         4        3.00000        0.75000            │
│Total         4       26.80000                           │
│                                                         │
│S.E. of estimate         0.86603                         │
│Corr. coeff. squared     0.88806                         │
└─────────────────────────────────────────────────────────┘
   
Yes, these results are the same as Ben's, but with error estimates
as well. The 2nd result is what Rudolf appears to require, also with
error estimates, presumably appropriate to a model
   Y = b X
which I suppose is acceptable if you're sure that no constant term is
involved.

Personally, I'd be inclined to test the hypothesis that a is zero in
   Y = a + b X
by examining the results of the original version of the regression
function, checking that a's estimate is not significantly different
from zero.
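
The Coeff., S.E. and t columns of the first table can be cross-checked by hand with %. alone. The following is only a minimal sketch, assuming ordinary least squares with uncorrelated noise of constant variance; the names X, res, df, s2, cov, se and t are illustrative and are not taken from multivariate.ijs. Note that with indep ,. 1 the slope comes first and the constant second, the reverse of the table's row order.

```j
indep =: 1 2 3 5 6
dep   =: 3 4 5 8 9

X   =: indep ,. 1                  NB. design matrix; use ,.indep for the no-constant model
b   =: dep %. X                    NB. least-squares coefficients (slope, constant)
res =: dep - X +/ . * b            NB. residuals
df  =: (#dep) - {: $ X             NB. observations minus fitted parameters
s2  =: (+/ *: res) % df            NB. residual variance estimate
cov =: s2 * %. (|: X) +/ . * X     NB. covariance matrix of the coefficients
se  =: %: (<0 1) |: cov            NB. standard errors: square roots of the diagonal
t   =: b % se                      NB. t statistics
```

Because b, se and t stay ordinary numeric arrays here, they can be displayed with whatever precision is needed.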


Mike



Rudolf Sykora

May 25, 2026, 2:03:42 PM (13 hours ago) May 25
to fo...@jsoftware.com
Dear Michael,


Michael Day <mike_l...@tiscali.co.uk> wrote:
> I haven't thought this through thoroughly,  but it looks as if you
> can force a regression with no constant term by changing
> the first line of fn "regression" in multivariate.ijs,  v=. 1,.x   , to 
>   v  =. x

thanks, that seems to be a way. I hacked up some lines to be
able to switch to zero-b code, essentially using some
global-variable switcher. But I bumped into the mentioned
'formatting part', and that is that in the code there
are lines like (line 99 at
https://github.com/jsoftware/stats_base/blob/master/multivariate.ijs
)

r=. r, 15 15j5 15j5 12j2 ": (i.>:k),.b,.seb,.b%seb

which stop being useful when numbers get small and the 15j5
format effectively makes 0 out of them (15j5 ": 1e_6 ==>
0.00000). Just changing 15j5 to 15j_5 is not quite enough,
because such a change also (!) changes the justification from
right to left, and suddenly the alignment is spoiled.
I then have no good idea as to how to change the line
above: I need to insert some space between the first
column (which is right-justified) and the second (which
is now left-justified), but I guess this is not feasible
with just ":.
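
One possible workaround, sketched here under assumptions rather than taken from the addon: format each number separately with 15j_5, strip the trailing blanks, and pad on the left again, so that the exponential column ends up right-justified. The verbs trimr and rjexp and the sample values in coef are hypothetical; only ": and the 15j_5 format come from the discussion above.

```j
trimr =: #~ [: +./\. ' '&~:                 NB. drop trailing blanks from a string
rjexp =: 3 : '> (_15 {. trimr)&.> <@(15j_5&":)"0 y'
                                             NB. exponential format, right-justified in 15 columns

coef =: 1e_6 2.5e_7                          NB. made-up small coefficients, for illustration
tbl  =: (15 ": ,. i. # coef) ,. rjexp coef   NB. index column stitched to the coefficient column
```

The same idea would apply to the S.E. column; the t column can keep its fixed 12j2 format.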

Thanks.


Ruda

Henry Rich

May 25, 2026, 2:06:31 PM (13 hours ago) May 25
to fo...@jsoftware.com
consider

   load 'format/printf'

What are you modeling that makes you want to remove the constant term?

Henry Rich

Michael Day

May 25, 2026, 3:57:48 PM (11 hours ago) May 25
to fo...@jsoftware.com
Thanks

First, I think it's worth repeating that I, and Henry, and others
perhaps, have commented that you should at least consider the
possibility of a non-zero constant term.
Second, perhaps we've all ignored or overlooked your gripe about the
stats addon's presentation of the output. You don't have to use the
regression function as is; one of J's (and APL's etc) great strengths
is that it's so easy to bend functions to one's own wishes. Those
15j5s etc are only there for your convenience. You can capture the
output of the library routine in various ways, perhaps by merely
inserting some global results, using eg R =: r to catch a function's
local variable and examine it at your leisure. The library function
is just a useful tool; you can always write your own.
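
The global-capture trick can be sketched as follows. The verb fit is a hypothetical stand-in, not the library's regression; it only illustrates adding one global assignment to a verb that otherwise returns nothing but formatted text.

```j
NB. Hypothetical stand-in for a library verb that only returns formatted text.
fit =: 4 : 0
b =. y %. x            NB. local result, normally lost when the verb returns
B =: b                 NB. added line: keep a full-precision copy in a global
15j5 ": b              NB. formatted text, as before
)

indep =: 1 2 3 5 6
dep   =: 3 4 5 8 9
txt   =: (indep ,. 1) fit dep   NB. txt is the display; B now holds the numbers
```

After the call, B can be inspected or reformatted freely, independent of the 15j5 display.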

Also, if your RESULT is a value for a in Y = a X, and you can't see
its value in the function's display because a is of order 10^_15,
then just change your units (if it's Physics or Applied Maths) from
amps (say) to femto-amps.
(You haven't shown us any example data!)

If you're talking about vanishing values for t values, standard errors
or the like, you should either be grateful for a very good model, or
extremely doubtful about the validity of your data!

Hope that helps,

Mike

Rudolf Sykora

May 25, 2026, 7:18:32 PM (8 hours ago) May 25
to fo...@jsoftware.com
Michael Day <mike_l...@tiscali.co.uk> wrote:
>
> First, I think it's worth repeating that I, and Henry, and
> others perhaps, have commented that you should at least
> consider the possibility of a non-zero constant term.

I am aware of that and I did so. I am even aware that Henry
asked the curiosity question before (and repeated it today).
I just have not answered it yet.


> Second,  perhaps we've all ignored or overlooked your gripe
> about the stats addon's presentation of the output.

Yes, I noticed that nobody reacted.


> You don't have to use the regression function as is;  one
> of J's (and APL's etc) great strengths is that it's so easy
> to bend functions to one's own wishes.   Those 15j5s etc are
> only there for your convenience. 

The only thing that you probably really have to do (so far) is
to die. All the rest usually has some alternatives (unless
super-determinism rules the world); one is, indeed, to read
the source and do something with it.


> Also,  if your RESULT is a value for a in Y = a X,   and you
> can't see its value in the function's display because a is
> of order 10^_15 ,  then just change your units (if it's
> Physics or Applied Maths) from amps (say) to femto-amps.

That's exactly what I did when I saw the thing happening.


> If you're talking about vanishing values for t values, 
> standard errors or the like, 

No.


It unfortunately takes time to get some wisdom. Had I known
what I know today, say, I would not have used linux (not to
speak of M$oft) before I switched to OpenBSD/Plan9, I would
not have spent time trying to use emacs and vim instead of
just nvi/acme/sam/ed, and I would not have spent so much
time with J, either. Sometimes, I do not wish to program
everything myself, but want to use something elegant,
ready-to-be-used and solid (as well as libre; I know I want
a lot).

Regarding the question about forcing b=0... If you have a
machine that is known to produce data 'y' upon inserting 'x'
and it just does so by multiplying the x by a to-be-found
constant followed by blurring the result with some noise of
a given and x-independent variance, then the task of
estimating the unknown constant, in my opinion, leads to
enforcement of b=0. (And although ax + b could lead to a
better fit of the data, it probably would not give you what
you desire.)
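
For the model just described (y = a x plus noise of x-independent variance), the least-squares estimate has a simple closed form, a = sum(xy) % sum(x^2), with standard error sqrt(s2 % sum(x^2)). A minimal sketch on Ben's sample data; the names a, res, s2 and sea are illustrative only.

```j
indep =: 1 2 3 5 6
dep   =: 3 4 5 8 9

a   =: (+/ indep * dep) % +/ *: indep   NB. slope of y = a x, same as dep %. indep
res =: dep - a * indep                  NB. residuals
s2  =: (+/ *: res) % <: # dep           NB. residual variance, n-1 degrees of freedom
sea =: %: s2 % +/ *: indep              NB. standard error of a
```

On this data a is 1.6 and sea is 0.1, in agreement with the regression0 table posted earlier in the thread.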

More towards 'real' physics, say you want to calibrate a
force gauge (imagine one hanging from a support). Its
starting zero may vary depending on some pre-load: you hang
a mass on it, let it stabilize, and take this state as your
reference zero (i.e., you call this moment a zero by
definition and you precisely align the zero of your scale
beside the gauge). Then you hang extra (and very well
defined, call them calibration) masses and note down the
elongation; but be prepared: the scale is intentionally
fairly rough so at some moments it is difficult to decide
what the number really is... And that's it. Again, with a
limited number of points, it seems better to enforce b=0.

NB. The two previous paragraphs are related.
NB. My thing was a bit more specialized, but similar
in its nature.

... So I am truly convinced that there are cases when b=0
enforcement makes sense. However, this is secondary to the
fact that regression in stats/base is just not ready to be
used by the 'general public', imho.


Best regards,
RS
