Performance regression in J9.7


Remington Furman

Sep 5, 2025, 1:21:14 AM
to fo...@jsoftware.com
I wrote the attached script to sonify tide simulations using NOAA data.  It generates 0.1 seconds of .wav audio for each of the 1274 NOAA tide stations.  More info at https://remcycles.net/blog/tides.html.
I noticed a rather large performance regression between J9.6 and J9.7:
Single threaded 9.6: 0m3.980s
Single threaded 9.7: 3m42.671s
Seven threads 9.6: 0m1.444s
Seven threads 9.7: 0m45.124s
I haven't yet determined the source of the slowdown, but will hopefully have time for that this weekend.
A useful tip for the t. conjunction: the verb ]&.> waits for every pyx to be populated.
Without it, J9.6 crashes because threads are still running during an exit. J9.7 exits gracefully even if threads aren't finished.  Nice!
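
For readers new to t., here is a minimal sketch of the pattern described above. The verb name work and its argument values are hypothetical stand-ins (they are not from tides.ijs), and the thread count of 2 is arbitrary; the thread-creation idiom is the one used in the script.

```j
NB. Hypothetical stand-in for the per-station work: sum of a sine table.
work=: {{ +/ 1 o. i. y }}

NB. Add two worker threads to threadpool 0.
{{ 0 T. 0 }}^:] 2

NB. Each call returns at once with a box holding a pyx (a pending result).
pyxes=: work t. ''"0 ] 1000 2000 3000

NB. Opening a pyx blocks until its value is ready, so ]&.> acts as a
NB. barrier over all tasks and is otherwise a no-op.
results=: ]&.> pyxes
```

After the last line, results holds the same boxes with every value filled in, so it should be safe to exit.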
Thanks,
Remington
tides.ijs
harcon_amp.csv.gz
cpuinfo.txt

bill lam

Sep 5, 2025, 2:30:47 AM
to fo...@jsoftware.com
The csv addon failed on my computer, so I just read the data using

dat=: dltb&.> ','&cut;._2 fread 'harcon_amp.csv'
and tested timing by

  tm=. 6!:1''                    
  if. 1 do.
    NB. Single threaded.        
    DUR_SAMP (synth_station >)"0 stations
  else.
    NB. Multi-threaded.
    NB. Initialize threadpool 0 with N-1 threads.
    {{0 T.0}}^:] <: {. 8 T. ''
    ]&.> DUR_SAMP (synth_station >)t.''"0 stations
    NB. The above ]&.> waits for every pyx to be populated, but is
    NB. otherwise a no-op.
    NB. J9.6 will crash if exit is called and tasks are still running.
    NB. J9.7 won't crash, but also doesn't wait for unfinished threads.
  end.                          
  echo tm -~ 6!:1''

J9.7
Single threaded version  : 2.5s
7 threaded version : 0.9s

I didn't run the test for J9.6.

BTW, your script doesn't ensure that the subfolder named tides exists.
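
One possible way to guard against the missing folder from inside the script, as a sketch only: it assumes the output folder is a relative path named tides, and it relies on the foreign 1!:5 (create directory) signalling an error when the directory already exists.

```j
NB. Try to create the output directory; :: 0: turns a failure
NB. (typically "already exists") into a result of 0 instead of an error.
NB. Note that this also hides other failures, such as a permissions problem.
(1!:5 :: 0:) < 'tides'
```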


Remington Furman

Sep 7, 2025, 3:24:40 PM
to fo...@jsoftware.com

Oops, I sent my first response to Bill only.

My initial suspicion was wrong.  I tracked the performance change down to 1&o. (sine):

tm=: 6!:1''
1 o. i.1000*1000
echo tm-~6!:1''

j9.6: 0.0129981
j9.7: 2.32261

About 180 times slower.
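
The same measurement can also be sketched with the timing foreign 6!:2, which takes the sentence as a string and returns elapsed seconds. The repeat count of 5 is an arbitrary choice, and the last line only redoes the arithmetic on the two timings reported above (it comes to roughly 179).

```j
NB. Time one execution of the sentence, in seconds.
echo 6!:2 '1 o. i. 1000*1000'

NB. With a left argument, run the sentence that many times and report
NB. the average.  The parentheses keep 5 from joining 6 as the list 5 6.
echo 5 (6!:2) '1 o. i. 1000*1000'

NB. Slowdown ratio from the j9.7 and j9.6 timings quoted above.
echo 2.32261 % 0.0129981
```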

I don't see anything about changes to o. in the release notes.  Haven't checked git.

I don't know if AVX is an issue here, but my CPU does not have AVX2.  The installer page says "The initial installation from zips is non-AVX and finishing the installation steps upgrades to AVX2 as appropriate for your hardware."

I'm not sure what installation steps it's referring to, but I'll note that I get the same performance before and after installing all packages:

load 'pacman'
'install' jpkg '*'
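
To see which engine build a session is actually running, a quick check along these lines may help. It assumes the usual layout of the version string, where the second /-separated field names the build (for example j64 or j64avx2), and that JVERSION has been defined by the standard profile.

```j
NB. Full engine version string.
echo 9!:14 ''

NB. Second /-separated field, which names the engine build.
echo 1 {:: <;._1 '/' , 9!:14 ''

NB. Longer summary defined by the standard profile.
echo JVERSION
```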

Please let me know if there's any other info I can provide to help debug this issue.

Thanks,

Remington

On 9/5/25 11:07 AM, Remington Furman wrote:

Thanks for testing, Bill.

Seems like you get okay performance with J9.7.

I forgot to say I'm running Manjaro Linux, kernel version 6.1.135.

I suspect it's related to file system code, but won't have time to profile until Sunday.  The 6!:1 session time trick will be helpful.

Thanks for the simplified parsing code.  I didn't know about dltb, etc, yet.

> BTW your script doesn't ensure the subfolder named tides is there.

Sorry.  I usually orchestrate that kind of thing from a Makefile.

-Remington

Gilles Kirouac

Sep 7, 2025, 3:59:08 PM
to fo...@jsoftware.com
Remington

I tried your expressions in jconsole on Win10 with a non-AVX2 processor.

My timings for the two versions are very similar; J9.7 was about 3% faster on the fifth execution.

~ Gilles

Henry Rich

Sep 7, 2025, 4:22:40 PM
to forum
It sounds like you've found the problem. We ship 3 builds: one with no extra instruction support, one for 'old' hardware, and one for 'new' hardware. We have made AVX2 the 'old' version now, so a build for original AVX alone is no longer available.

Treat yourself to a new computer.  AVX2 is 12 years old. 

Henry Rich

Remington Furman

Sep 7, 2025, 10:37:21 PM
to fo...@jsoftware.com

Thank you, fair enough.  I think your explanation here is more clear than the release notes and install page.

I do have a newer desktop collecting dust because I haven't bothered to make it my primary machine yet.  Now I have another reason, but in the meantime I'll continue to treat myself to J9.6.  :)

-Remington

bill lam

Sep 7, 2025, 11:14:17 PM
to fo...@jsoftware.com
I checked: J9.7 switched from SLEEF 3 to SLEEF 4, and the performance of the non-AVX build degraded dramatically. I suspect it tries to emulate hardware FMA in software. If the speed of trigonometric and exponential functions is critical, I suggest you upgrade to an AVX2 or arm64 machine.

Remington Furman

Sep 7, 2025, 11:31:18 PM
to fo...@jsoftware.com

Thanks Bill.  I'm primarily interested in writing signal processing code in J, so trig and exp performance is important to me.

I'll be writing a blog post about this script soon, and plan to include a slide or two with this code at a signal processing conference next month.  So upgrading is on the back burner for now.

-Remington

bill lam

Sep 8, 2025, 12:10:46 AM
to fo...@jsoftware.com
I disabled SLEEF for all non-AVX x86_64 platforms, and the performance of 1 o. becomes reasonable, about 20% slower than in J9.6.
Thank you for the report. The fix will be in the next beta.

Henry Rich

Sep 8, 2025, 10:32:37 AM
to fo...@jsoftware.com
Thanks, Bill.  Perhaps SLEEF tries to be bit-perfect across platforms and slows down when it has to emulate.  Given that kind of speed loss, you are right to abandon cross-platform perfection.

Henry Rich

Remington Furman

Sep 15, 2025, 10:23:28 AM
to fo...@jsoftware.com

Thank you both.  I tried j9.7.0-beta8, and with SLEEF disabled the performance is much closer to j9.6: about a 163% increase in runtime for the trig portions of my script, but only a 27% increase for the script overall.

Remington
