Problem with nub ..


James Patrick Harrington

Aug 23, 2025, 10:42:23 AM
to fo...@jsoftware.com
I'm having a problem with ~. I am importing data from a huge Fortran program called synspec, which generates the observable spectrum from a model of a star's atmosphere at a given temperature. It generates the emergent intensity at a huge list of wavelengths: between 1200A and 4000A in the ultraviolet, the wavelength list w1 has $w1 = 132160 values. The spacing of these points is not regular; it is determined by the synspec program based on the wavelengths of the various atomic spectral lines. The issue is that if you run the program for a different temperature, the list of wavelengths, say w2, will be slightly different (new lines appear, some disappear). But I need to combine data from 90 different atmospheres, so I need to interpolate this data to a common wavelength list.

So I can ravel all the lists: ww=. w1,w2,w3, ... ,w90 and then sort the bunch: www=. /:~ ww. But mostly the lists have wavelengths in common, so we then use nub to make a master list: W=. ~. www. This list is quite a bit longer, $W = 209941 points. Now every point in w1 or w2 or w3, etc. should be in the master list W. I can then extend w1 by interpolating the values of the parameters at the wavelengths of W that are not in w1. Doing this with each of w1, w2, w3, etc. puts all the lists on the same list W, and I can then combine their parameters. So what's the problem?
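
A minimal sketch of the steps above, using two tiny made-up wavelength lists in place of w1 ... w90 (the values are illustrative only, not synspec output):

```j
NB. Hypothetical miniature wavelength lists (Angstroms), standing in for w1..w90
w1 =. 1200 1200.006 1200.011 1200.02
w2 =. 1200 1200.006 1200.015 1200.02

W =. ~. /:~ w1 , w2      NB. sort the combined list, then take the nub
$ W                      NB. 5
*./ (w1 , w2) e. W       NB. 1: every original wavelength is in W
W -. w1                  NB. the points of W missing from w1 (here only 1200.015)
```

The last line gives the wavelengths at which the parameters belonging to w1 would need to be interpolated.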

It turns out that the master list W has values that are considered unique in the whole list:
W = ~. W, but a sublist behaves differently. Let WW=. (i. 10000) { W so that $WW --> 10000. But
if U=. ~. WW then we find $U --> 9904. It appears that 96 values are not unique! This effect appears early in the list: if WW=. (i. 120) { W, so $WW --> 120, still $ ~. WW --> 119. This must be
some sort of comparison tolerance issue.

Has anyone encountered a similar problem? The wavelength lists w1, w2, ... were written to a file using ((1) 3!:1 [) 1!:2 [: < ] and read back with (1) 3!:1^:_1 [: 1!:1 < The numbers look like this example of the beginning of list w1:

1200 1200.01 1200.01 1200.02 1200.02 1200.02 1200.02 1200.03

but displaying more digits with (":!.16) shows they are distinct:
1200 1200.006 1200.011 1200.015 1200.018 1200.02 1200.022 1200.025

and still more with (":!.20) shows

1200 1200.0060000000000855 1200.0109999999999673 1200.0150000000001 1200.0180000000000291 ...

So where does ~. draw the same/different line? And why is a substring different than the whole list?

This is a real problem because any interpolating program divides by the space between adjacent numbers in the list, and if that number -->0 we crash.
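
One way to test a list for this before interpolating is to look at the spacings directly. The sketch below assumes a sorted list and uses made-up values with one deliberate repeat:

```j
NB. Hypothetical sorted list with one repeated wavelength
w =. 1200 1200.006 1200.006 1200.011
d =. 2 -~/\ w            NB. successive differences: next value minus previous
+/ d = 0                 NB. 1: number of zero spacings
w =. ~. w                NB. drop the repeats before interpolating
<./ 2 -~/\ w             NB. smallest remaining spacing, now positive
```

Running the same check on each of w1, w2, ... as well as on W would show which list actually carries the repeats.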

Thanks for your patience if you only read this :-)

Patrick




robert therriault

Aug 23, 2025, 12:18:09 PM
to fo...@jsoftware.com
Hi James Patrick,

I think that you can use Fit (!.) to set the comparison tolerance for Nub (~.): https://code.jsoftware.com/wiki/Vocabulary/bangdot

If you set the tolerance to 0, i.e. (~.!.0), then only exactly equal values are treated as duplicates.

Cheers, bob
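
For reference, a small sketch of what the tolerance does. J's default comparison tolerance is 2^_44, applied relative to the larger of the two magnitudes being compared; the list a below is made up so that its first two items sit within tolerance of each other:

```j
t =. 2^_44               NB. default comparison tolerance, about 5.7e_14
a =. 1 , (1 + 2^_45) , 2 NB. second item is within tolerance of the first
# ~. a                   NB. 2: tolerant nub merges the first two items
# ~.!.0 a                NB. 3: exact nub keeps all three
1200 * t                 NB. about 6.8e_11: rough scale of the tolerance near 1200
```

Wavelengths that differ by 0.001 are roughly seven orders of magnitude outside that tolerance, so on data like the sample shown earlier ~. and ~.!.0 would be expected to give the same result.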

Henry Rich

Aug 23, 2025, 12:21:15 PM
to fo...@jsoftware.com
You are saying that

   (-: ~.) W
1
   (-: ~.) 10000 {. W
0

That has to be a bug.  Please send me W.

Henry Rich
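
A sketch of why the two results cannot both be right: an item survives ~. only if it fails to match every item kept before it, so a list that is its own nub should have prefixes that are their own nubs too. The list below is made up for illustration:

```j
W =. 1200 1200.006 1200.011 1200.015   NB. hypothetical list, already a nub
(-: ~.) W                NB. 1
(-: ~.) 2 {. W           NB. 1
*./ (-: ~.)\ W           NB. 1: the same holds for every prefix
```

The all-prefixes check is quadratic, so it only suits short lists. Note also that n {. W pads with zeros when n exceeds # W, which would itself introduce repeated items.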

James Patrick Harrington

Aug 23, 2025, 1:55:21 PM
to fo...@jsoftware.com
Yes, that is the behavior I was seeing last night. However, in a fresh session today, I cannot reproduce that behavior. It was not a one-off thing, but so clearly wrong that I repeated it several times. One problem with this program is that the files (each wavelength is linked to several multidimensional arrays) become so large that it is easy to get "out of memory" messages. But I now have a clue -- although the master list has been culled with nub, the individual lists (w1, w2, etc.) are later read in again and not treated with ~.  So that could be why the interpolation code crashes.   
If this should reappear, I'll preserve the files so I can document it properly.
Thanks,  Patrick

bill lam

Aug 24, 2025, 7:44:37 PM
to fo...@jsoftware.com
I suspect ~. on floating-point numbers is implementation dependent because tolerant comparison is non-transitive:
a=b and b=c doesn't imply a=c.

You may clean up the data to a fixed precision, e.g. 1e_10, so that comparison becomes transitive.
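
A small illustration of both points, with values chosen by hand so that the gaps are 0.75 and 1.5 times the default tolerance of 2^_44. The verb round is a hypothetical helper, not a J primitive:

```j
a =. 1
b =. 1 + 3 * 2^_46       NB. 0.75 tolerance units above a
c =. 1 + 6 * 2^_46       NB. 1.5 tolerance units above a
(a = b) , (b = c) , (a = c)   NB. 1 1 0: equality is not transitive

NB. Hypothetical helper: round y to the nearest multiple of x
round =. [ * [: <. 0.5 + %~
# ~. 1e_3 round 1200.0061 1200.0059 1200.011   NB. 2: first two land on 1200.006
```

The grid 1e_3 is used here only to keep the example readable; 1e_10 would be passed the same way. Whatever grid is chosen, its spacing needs to stay well above the tolerance scale (the value times 2^_44) for neighbouring grid points to remain distinct under tolerant comparison.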

Henry Rich

Aug 24, 2025, 8:04:45 PM
to fo...@jsoftware.com
The result of ~. is specified in the definition of the verb.  Moreover, it is vital that the result of ~. match the result of ([/..~ y) and ({./.~ y).  That leaves little leeway in the implementation.

It is true that the result of ~. depends on the floating-point architecture, but all the machines we support today follow the IEEE754 standard.

The idea of adjusting the values to floating-point buckets is interesting.  That could be done pretty easily using bit operations on the floating-point representations.  JE does something like that in the implementation of (~. y).

hhr