-rw-r--r-- 1 501 20 1067 Mar 16 01:24 LICENSE
-rw-r--r-- 1 501 20 3041 Mar 16 01:24 README.md
-rwxr-xr-x 1 501 20 1023 Mar 16 01:24 download.sh
-rw-r--r-- 1 501 20 1898 Mar 23 14:06 encoder.ijs
-rw-r--r-- 1 501 20 2270 Mar 16 01:24 gpt2.ijs
drwxr-xr-x 6 501 20 192 Mar 16 01:36 models
-rw-r--r-- 1 501 20 1757 Mar 16 01:24 utils.ijs
./models:
total 2936
drwxr-xr-x 4 501 20 128 Mar 16 01:25 124M
drwxr-xr-x 4 501 20 128 Mar 16 01:36 1558M
-rw-r--r-- 1 501 20 456318 Mar 16 01:25 merges.txt
-rw-r--r-- 1 501 20 1042301 Mar 16 01:25 vocab.json
./models/124M:
total 1070528
-rw-r--r-- 1 501 20 665 Mar 16 01:25 config.json
-rw-r--r-- 1 501 20 548105171 Mar 16 01:30 model.safetensors
./models/1558M:
total 12562176
-rw-r--r-- 1 501 20 689 Mar 16 01:36 config.json
-rw-r--r-- 1 501 20 6431829964 Mar 16 01:58 model.safetensors
Each model is stored in a subdirectory named after its size.
Now I can just follow the first example:
j9.6src/jsource/jlibrary/bin/jconsole gpt2.ijs
Loading tokenizer...
Reading merges.txt
Reading vocab.json
Processing vocab
Building lookup verbs
Done.
Load (or switch) model with `model 'SIZE'` (124M, 355M, 774M, 1558M)
Then generate with [tokens to gen (default: 40)] gen 'PROMPT'
model '1558M'
Loading model: 1558M
Processing header
Reading data
Done.
gen 'Alan Turing theorized that computers would one day become'
so powerful that they would be able to think like humans.
In the 1950s, he proposed a way to build a computer that could think like a human. He called it the "T
This was just an informational posting. I figured there may be others who would like to see a simple LLM in J.
Thanks for pointing me to this interesting experiment, and for the write up.
I'm very much interested in this, and will give it a spin on an AVX2 computer soon, and let you know how it goes.
Jan-Pieter.
To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.
encode =: {{ ; {{vocab_i bpe cs {~ (bs{a.)&i. y}} each pat rxall utf8 y }}
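For readers who don't speak J, here is a rough Python sketch of what that `encode` verb is doing: split the text with a regex, run byte-pair-encoding merges on each piece, and look the results up in the vocabulary. The `merges` and `vocab` tables below are toy stand-ins (the real `merges.txt` and `vocab.json` are far larger), and GPT-2's actual splitter regex and byte-to-unicode mapping are omitted for brevity.

```python
import re

# Toy stand-in tables -- NOT the real GPT-2 merges.txt / vocab.json.
merges = {("l", "o"): 0, ("lo", "w"): 1}          # pair -> merge rank
vocab = {"lo": 10, "low": 11, "e": 12, "r": 13}   # token -> id

# Simplified stand-in for GPT-2's splitter regex.
pat = re.compile(r"\S+|\s+")

def bpe(piece):
    """Repeatedly apply the lowest-ranked merge until none applies."""
    parts = list(piece)
    while len(parts) > 1:
        ranked = [(merges.get(p, float("inf")), i)
                  for i, p in enumerate(zip(parts, parts[1:]))]
        rank, i = min(ranked)
        if rank == float("inf"):
            break
        parts[i:i + 2] = ["".join(parts[i:i + 2])]
    return parts

def encode(text):
    return [vocab[t] for piece in pat.findall(text) for t in bpe(piece)]

print(encode("lower"))  # -> [11, 12, 13]
```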
I have attached the whole file if you prefer
Tom McGuire
darwin/j64arm*) # darwin arm
TARGET=libj.dylib
CFLAGS="$common $macmin -march=armv8-a+crc -mno-outline-atomics -DC_CRC32C=1 -DSYSTEM_BLAS=1"
LDFLAGS=" -dynamiclib -install_name libj.dylib -lm -ldl $LDOPENMP $LDTHREAD $macmin -framework Accelerate"
OBJS_AESARM=" aes-arm.o "
SRC_ASM="${SRC_ASM_IOS}"
GASM_FLAGS="$macmin"
FLAGS_SLEEF=" -DENABLE_ADVSIMD "
FLAGS_BASE64=" -DHAVE_NEON64=1 "
;;
On Mar 24, 2025, at 2:24 PM, More Rice <mrmor...@gmail.com> wrote:
> But with Elijah Stone’s patch to run
> the matrix multiplication with the M2
> accelerate framework so the multiplication
> takes place with Apple’s specialized AMX
> instructions.

How may I apply this patch? (I’m on M4. I wanted to take a look at J’s Apple silicon optimization in general.)

Thanks
Sent from mobile
a=. ?1e3 2e3$0
b=. ?2e3 3e3$0
100 timex 'a +/ . * b'
0.355397
NB. number of cores,maxthreads on my M2 macbook
8 T. ''
12 63
NB. spin up N-1 threads in threadpool 0
{{0 T.0}}^:] <: {. 8 T. ''
11
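The idiom `{{0 T.0}}^:] <: {. 8 T. ''` reads right to left: `8 T. ''` returns the pair cores,maxthreads, `{.` takes the core count, `<:` decrements it, and `^:]` applies the thread-creating verb `{{0 T.0}}` that many times. A rough Python analogue of the intent (an illustration, not of J's semantics):

```python
# Read the core count, decrement it, and start that many worker threads --
# mirroring the J idiom {{0 T.0}}^:] <: {. 8 T. ''.
import os
from concurrent.futures import ThreadPoolExecutor

ncores = os.cpu_count() or 1        # J: {. 8 T. ''  (first of cores,maxthreads)
nworkers = ncores - 1               # J: <:          (decrement)
pool = ThreadPoolExecutor(max_workers=max(1, nworkers))
```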
a=. ?1e3 2e3$0
b=. ?2e3 3e3$0
100 timex 'a +/ . * b'
0.361707
Just so you don’t think jqt is the problem, here is a jconsole version:
$ ijconsole
a=. ?1e3 2e3$0
b=. ?2e3 3e3$0
100 timex 'a +/ . * b'
0.360285
{{0 T.0}}^:] <: {. 8 T. ''
11
100 timex 'a +/ . * b'
0.365402
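For reference, those timings can be converted to throughput. `100 timex` reports the average time per run, and an m×k by k×n matrix product costs about 2·m·k·n floating-point operations, so (my arithmetic, not from the posts):

```python
# Convert the timex result above into GFLOP/s for the
# 1000x2000 times 2000x3000 product.
m, k, n = 1000, 2000, 3000
t = 0.355                          # seconds per run, from `100 timex` above
gflops = 2 * m * k * n / t / 1e9
print(round(gflops, 1))            # -> 33.8
```

That lines up with the "always blas" figure in the benchmark output below, suggesting the single-threaded matrix product is already BLAS-bound.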
I believe I tested this a while ago on an Intel Mac and saw a speedup from using threads in this example.
Just for completeness here is Bill Lam’s benchmark run on my machine:
j9.7.0-beta1/j64arm/darwin/commercial/www.jsoftware.com/2025-03-12T13:42:25/clang-15-0-0/SLEEF=1
threads 0
OMP_NUM_THREADS=
never blas 7.598 GFlop
always blas 33.678 GFlop
lapack 504.122 GFlop
{{0 T.0}}^:] <: {. 8 T. ''
11
t1''
j9.7.0-beta1/j64arm/darwin/commercial/www.jsoftware.com/2025-03-12T13:42:25/clang-15-0-0/SLEEF=1
threads 11
OMP_NUM_THREADS=
never blas 7.628 GFlop
always blas 32.778 GFlop
lapack 525.529 GFlop
Tom McGuire
I downloaded m64.zip and swapped it into my regular j9.7.0-beta1 installation, as requested:
j9.7.0-beta1/j64arm/darwin/commercial/www.jsoftware.com/2025-03-28T07:00:49/clang-15-0-0/SLEEF=1
threads 0
OMP_NUM_THREADS=
never blas 9.531 GFlop
always blas 581.905 GFlop
lapack 487.535 GFlop
{{0 T.0}}^:] <: {. 8 T. ''
11
t1''
j9.7.0-beta1/j64arm/darwin/commercial/www.jsoftware.com/2025-03-28T07:00:49/clang-15-0-0/SLEEF=1
threads 11
OMP_NUM_THREADS=
never blas 9.640 GFlop
always blas 601.690 GFlop
lapack 540.344 GFlop