testing LLM code generation


Thomas McGuire

May 6, 2025, 12:46:23 AM
to fo...@jsoftware.com
I have been lurking on reddit's r/LocalLLaMA, gauging the status of LLMs that can be run locally to perform coding.
On my 32GB MacBook Pro M2 Max I can run many of the favorites from that subreddit.

This raises the question: what prompts would be useful in evaluating how well an LLM codes?

Many people keep their favorites close to the vest. The fear is that if they are published online, they will be trained on in the next generation of LLMs and will no longer test capability.

- One that I use, because it’s easy to specify, is: code a function that will sum the integers from 1 to the parameter given to the function.

- Another I saw mentioned was to sum the even integers in a list of integers. 

- Another was a test of code completion: take a medium-sized code base with good unit testing and gut portions of the function bodies. Have the LLM complete the code and then run the unit tests. The total number of passed tests is a measure of the LLM's code-completion capability.
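
For reference, here is one way a human might answer the first two prompts in J (my own definitions; the names are just illustrative):

NB. sum of the integers 1..n  (the closed form (* >:) % 2: also works)
sumToN  =: [: +/ [: >: i.
NB. sum of the even integers in a list
sumEven =: [: +/ (#~ 0 = 2&|)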

I downloaded the following model from Hugging Face, qwen2.5-coder-32b-instruct-q4_k_m.gguf, and am running it under llama.cpp.
 
For the sum of integers from one, the qwen2.5-coder LLM had trouble with the positioning of the increment verb. Even when I told it twice what it should do, it still insisted on reversing its position. Here is the pertinent exchange:

---------

> your placement of the increment verb `>:` after the `i.` makes the `>:` appear dyadic instead of the proper use before `i.` in its monadic form as increment

You are correct, and I apologize for the mistake. The `>:` verb should be used before `i.` to increment the value, not after it. Let's correct the definitions.


### Using Tacit Definition


Here's the corrected tacit definition:


```j
sumToN =: 13 : '+/ i.>: y'
```


### Using Explicit Definition


Here's the corrected explicit definition:


```j
sumToN =: 3 : 0
  numbers =. i.>: y
  +/ numbers
)
```

---------
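
(For readers newer to J: `>:` is increment when applied monadically but larger-or-equal when applied dyadically, which is exactly the ambiguity at issue:)

   >: 4             NB. monadic >: increments
5
   i. >: 4          NB. so i. >: y lists 0 1 ... y
0 1 2 3 4
   3 >: 4           NB. dyadic >: is >= , a different verb
0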

In the first exchange (which I didn’t show) the LLM did suggest `1 + i.` as a possible alternative. So when I asked it to code with that alternative I got the following:

---------

> use the `1 +` version and don't use increment (>:)        

Certainly! Using the `1 + i. y` approach is a straightforward and clear way to generate the integers from 1 to `y`. Here are both the tacit and explicit definitions using this method.


### Using Tacit Definition


Here is the corrected tacit definition:


```j
sumToN =: 13 : '+/ 1 + i. y'
```


### Using Explicit Definition


Here is the corrected explicit definition:


```j
sumToN =: 3 : 0
  numbers =. 1 + i. y
  +/ numbers
)
```

---------

Here the LLM gets it right, likely because the `+` verb is used similarly in almost every programming language.

I thought it was interesting that I am running a quantized model on my laptop and it’s getting close on the J code. I can get considerable memory on a MacBook Pro, so the next one I buy may be able to run a llama-4-maverick model. Since it is a mixture-of-experts (MoE) LLM, there is a way to split the MoE layers and the transformer across CPU and GPU to fit the whole thing, IF you have enough VRAM.

For duller languages like C++ and Java, this model had no trouble writing sumEven or sumIntegers.

If anyone has suggestions for something that is easy to specify but difficult for an LLM to program in J, please let me know. It would be nice to collect some of these tests so we can identify an LLM that may be truly useful to the J community.

Thomas McGuire

May 6, 2025, 6:11:31 AM
to forum, Thomas McGuire
Testing out Claude 3.7 Sonnet online (I can't run it locally), it does a much better job.
The chatter on reddit is that if you're willing to pay for the service it is even better.

sum integers 1 to N
NB. Verb that sums integers from 1 to n
sumToN =: +/ @: >: @: i.

NB. Alternative implementations
sumToN2 =: [: +/ 1 + i.
sumToN3 =: [: +/ [: >: i.
sumToN4 =: [: +/ [: (1&+) i.

NB. Mathematical formula version (n * (n + 1) ÷ 2)
sumToNFormula =: [ * 0.5 * >:

sum even numbers:
NB. Verb that sums even integers in an array
sumEvens =: +/ @: (] #~ 0 = 2 | ])
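
A quick spot check of these definitions in a J session (my verification, not part of Claude's output):

   sumToN 10
55
   sumToNFormula 10
55
   sumEvens 1 2 3 4 5 6
12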



Thomas McGuire

May 7, 2025, 4:46:24 PM
to fo...@jsoftware.com

The article B. Johnson and T. Menzies, "AI Over-Hype: A Dangerous Threat (and How to Fix It)," IEEE Software, vol. 41, no. 6, pp. 131-138, Nov.-Dec. 2024, doi: 10.1109/MS.2024.3439138, looks at the shortcomings of AI LLMs, but their example is a good test to use on future LLMs: it is simple to pose, and the human solution is easy to understand. In a nutshell, you want to put numbers into bins, fitted to an estimated Gaussian distribution. Their proposed distribution is as follows (translated into J):


pecdf =: verb define
0 1 pecdf y
:
'mu sd' =. x
z =. sd %~ y - mu
1 - 0.5 * ^(_0.717*z)- 0.416*z*z
)


This is only good for z >: 0


The bin function is defined in the prompt below.


It turns out that ChatGPT-4o struggled with reworking this problem; the authors had to stop the chatbot from thinking about it. In my case, here in J (they were using Python), Claude 3.7 Sonnet had trouble reworking the function even though it thought of the correct answer:

  • The current pecdf function provides an approximation of the CDF for z ≥ 0 using the formula:
    1 - 0.5 * ^(_0.717*z) - 0.416*z*z
  • For negative z values, we can use the symmetry property of the normal distribution:
    CDF(-z) = 1 - CDF(z)


The authors turned to StackExchange, and the human solution is the following (which I have translated into J):


cdf1 =: monad define
1 - 0.5 * ^(_0.717*y)- 0.416*y*y
)

ecdf =: verb define
0 1 ecdf y
:
'mu sd' =. x
z =. sd %~ y - mu
if. z < 0 do.
  1 - cdf1 -z
  return.
end.
cdf1 z
)
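
A quick sanity check of the translated solution (my own session, with the definitions above loaded):

   0 1 ecdf 0                    NB. CDF at the mean
0.5
   (0 1 ecdf 1) + 0 1 ecdf _1    NB. symmetry: CDF(z) + CDF(-z) = 1
1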



PROMPT:

the bin problem uses an estimate of the cumulative distribution function in order to decide what “bin” (for some number of bins N) a number belongs in. The definition of the "bin" function in the J programming language from jsoftware, and a proposed but not fully functional estimated cumulative distribution function, follow the text of this prompt delimited by "===". The proposed estimated cumulative distribution function (named pecdf in the code) is only valid for z greater than or equal to zero. Please rework "pecdf" to create the needed "ecdf" that works for any value of z, both above and below zero.

===

bin =: verb define
15 0 1 bin y
:
'N mu sd' =. x
<. N * (mu,sd) ecdf y
)

pecdf =: verb define
0 1 pecdf y
:
'mu sd' =. x
z =. sd %~ y - mu
1 - 0.5 * ^(_0.717*z)- 0.416*z*z
)

===

Pascal Jasmin

Aug 6, 2025, 4:50:43 PM
to fo...@jsoftware.com


I haven't tried Claude yet.

A problem that is hard enough to test LLMs with is solving chess mate-in-X puzzles.

ninja.ai was pretty bad, but it did work for a sudoku solver. Perhaps it had web access to a J solution for that. (The chess problem was not done in J.)

Copilot is worse, but it is unlimited with Windows 11, so I can teach it J, and it tells me the world benefits from my corrections. Copilot did not understand the amend verb `}`, for example, attempting to use `{~` to implement it.

qwen 3-235b ... https://chat.qwen.ai/ (qwen 2.5 is said to be a good, locally approachable LLM at 24gb)

It seems to provide a better starting point, and better improvements to code, though all of the LLMs understand the nature of the requests given.

Something relatively hard, that you have to help the AI with and correct along the way, is the best way to test them, IMO. You can also jump from limited free sessions to other sessions, giving them input from other AIs. Copilot will rule the world because it trains on massaged corrections.

All of the models seem to have organized the code well; it's just that the other two were far from working. There were enough problems with qwen 3 too, just closer to working.

kimi.com (released last week) is hyped from leaderboard scores, and I will try that next.

Thomas McGuire

Aug 10, 2025, 6:03:12 AM
to fo...@jsoftware.com
Yes, I would be interested in anything you find out with these models in relation to J. Lately I have been trying models that I can run on my own hardware.
I have a GTEK AMD Max 395 with 128GB of shared GPU memory. It is able to run the newest open-source release from OpenAI: openai-gpt-oss-128b-experimental.

LMStudio worked with AMD to get this model to run. It is surprisingly quick, though I don’t know where to find the stats in LMStudio that would tell you how many tokens per second.

I have found that for J, a detailed prompt (I suppose this is the case for getting good answers from any LLM) can often be the key to getting working code.

Most of these LLMs have no clue how to do any complex tacit programming. They spit out an answer quickly but it is just delusional J programming. 

However on my Kadane test case I tweaked the prompt as follows and got what appears to be a valid implementation:

PROMPT:
the J language is an array programming language from jsoftware. You are a helpful J language programming assistant. Write the code for a verb (function in normal programming parlance) that will find the maximum subsequence sum of an array of numbers. You don't have to keep track of the indices of the subsequence just return the maximum sum. Use explicit definition to define the verb.

THE ANSWER:

NB. maxSubSum y - maximum sum of any contiguous sub-array of y
maxSubSum =: 3 : 0               NB. explicit definition
if. 0 = #y do. 0 return. end.    NB. empty argument -> 0

NB. If every element is negative the classic Kadane algorithm
NB. would return 0 (the empty subsequence). Often we want the
NB. largest (least-negative) element instead, so handle that case.
maxNeg =. >./ y                  NB. greatest element of y
if. maxNeg < 0 do. maxNeg        NB. all numbers <= 0 -> return it
else.
  p =. 0 , +/\ y                 NB. prefix sums with a leading 0
  m =. <./\ p                    NB. running minimum of the prefixes
  >./ p - m                      NB. maximum (current prefix - min so far)
end.
)

   maxSubSum _2 1 _3 4 _1 2 1 _5 4
6
   maxSubSum 2 3 _8 7 _1 2 3
11
   maxSubSum 5 4 1 7 8
25
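
As an aside, the same prefix-sum trick can be written as a tacit one-liner. This is my own sketch, not LLM output, and it skips the all-negative special case (it permits the empty subsequence, so an all-negative argument yields 0):

   maxSubTacit =: [: >./ (] - <./\)@(0 , +/\)
   maxSubTacit _2 1 _3 4 _1 2 1 _5 4
6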


————————

An aside on all of this: I have been thinking that the JHS provided by J would be a good agentic platform. These coding LLMs can give a fairly concise JSON-encapsulated answer to run a function or perform shell commands, and provide that as a sequence of commands. Running JHS from a local account on your machine, you should be able to create a web interface that lets you ask the LLM to set up an entire project and, by creating a J agent to listen in, respond to the JSON that’s returned. You can run straightforward J verbs to create directories, files, etc.
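
For example, a minimal sketch of the J side (hypothetical names; assumes the convert/json addon is installed for decoding the LLM's reply):

require 'convert/json'             NB. supplies dec_json / enc_json

NB. verbs the agent could dispatch to after decoding a command object
make_dir =: monad : '1!:5 < y'     NB. create directory y
put_file =: dyad : 'x 1!:2 < y'    NB. write text x to file y

NB. e.g. a decoded {"action":"put_file", ...} command would route to:
NB.   ('NB. generated by LLM',LF) put_file 'proj/run.ijs'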

Tom McGuire

Devon McCormick

Aug 12, 2025, 8:22:30 PM
to fo...@jsoftware.com
We talked about this topic extensively at today's NYCJUG meeting. Take a look at the notes: https://code.jsoftware.com/wiki/NYCJUG/2025-08-12. Let me know what you think, and let's keep the discussion going.



--

Devon McCormick

Flâneur


Jose Mario Quintana

Aug 13, 2025, 4:30:56 PM
to fo...@jsoftware.com

SuperGrok...

C
code a function that will sum the integers from 1 to the parameter given to the function in the J programming language

A
sum_to_n =: +/ @ (i. @ >:)

C
code a function that sums even integers in an array

A
sum_even =: +/ @ (2&| # ]) @ ]

This function:

Uses 2&| to test for even numbers (returns 0 for even, 1 for odd).
# filters the array to keep only elements where 2&| is 0 (even numbers).
+/ sums the filtered array.
@ ] applies the operation to the input array.

Example usage:
   sum_even 1 2 3 4 5 6
12  NB. Sums 2 + 4 + 6

C
sum_even 1 2 3 4 5 produces 9 (sums 1 3 5). Try again?

A

Apologies for the confusion. The previous function summed odd numbers instead of even numbers due to an error in the filtering logic. In J, 2&| returns 0 for even numbers and 1 for odd numbers, so we need to filter where 2&| is 0. Here's the corrected function to sum even integers in an array:

sum_even =: +/ @ (#~ 0 = 2&|)

This function:

2&| checks if numbers are even (returns 0 for even, 1 for odd).
0 = 2&| creates a mask where 1 indicates even numbers.
#~ filters the array to keep only even numbers.
+/ sums the filtered even numbers.

Example usage:
   sum_even 1 2 3 4 5
6  NB. Sums 2 + 4

Pascal Jasmin

Aug 13, 2025, 6:22:39 PM
to fo...@jsoftware.com
I read your notes, and was waiting to respond to Tom until more progress was made, but here's an update.

I asked LLMs to make a "solve mate in X" chess program. My successful workflow to get legal-move generation working started with Kimi K2 (Claude Sonnet 4 had made a partially working previous file) and went fairly easily. Qwen 3 235b has been useful in expanding the program, including a capability of understanding my kv.ijs library and generating good board-scoring functions. I could probably have fed Qwen 3's output into Claude's input to get a working program; I just tried K2 and it was good enough. My experience is based on models having some understanding of chess. Qwen 3 often makes suggestions compared against Stockfish, and so may have been specially trained on chess.

Only the above models limited the hair-pulling experience of not understanding `}`, or taking a cell from a table with `{`, and of not understanding how to call dyads and decompose arguments inside the dyad definitions. Asking models about coding-style calling conventions is a hair-pulling experience. OpenAI's open-source 120b model and Copilot were bad for my task. Sonnet 4 is good and fast, partly because it separates code-file output into a separate pane and updates just parts of it based on requests, instead of regenerating the whole file.

Even bad LLMs were good at organizing code and understanding prompt requests; ignoring them once the general skeleton was in place would have completed basic functionality quicker. My interest in chess is tangential to creating a generic search function in J, and I did not have the patience to generate a "legalmoves" verb for a chess position from scratch; no one before me has ever published one. Part of my process is learning how to talk to LLMs, though all of them seem to understand my prompts even when they are bad at J.

Some workflow advice at this late stage of working with Qwen 3: instead of testing the LLM for how stupid its response could possibly be, just give it corrected code once you are done testing its stupidity. It is still not 100% at remembering 100% of the file, which is hair-pulling later on.

I find the "vibe coding" workflow useful now. After several days of fighting/testing stupidity, and moving away from models that were too frustrating, this suits my concentration commitment level. Qwen 3 can provide genuinely useful insights. Qwen 3 gets slower and slower the longer you use it, which is great, because it considers your full past history before opening its mouth. Thoughtfulness > speed, 100% of the time.

For local LLM sizes, AFAIK everything is written in Python with either PyTorch (no native C API, and hard to find) or TensorFlow 2.0 (has a C API). Though I haven't looked, modifying the models to call J with generated files, and then fix their output based on bugs, before ever spitting out a token to the user, would be a big improvement.

Having a TDD framework (or just any extra J script doing the testing) and teaching another model to understand the framework and generate tests from function signatures would be a big productivity boost.
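
For example, a minimal sketch of such a test script, using the built-in assert (here exercising the maxSubSum verb from earlier in the thread; the names are illustrative):

NB. one test verb per unit under test
test_maxSubSum =: monad define
  assert  6 = maxSubSum _2 1 _3 4 _1 2 1 _5 4
  assert 11 = maxSubSum 2 3 _8 7 _1 2 3
  assert 25 = maxSubSum 5 4 1 7 8
  'maxSubSum: all tests passed'
)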

Pascal Jasmin

Nov 21, 2025, 1:19:10 PM
to 'Pascal Jasmin' via forum
The magic LLM question that has an incredibly high failure rate:

In J language, what is -/ 1 2 3

Devon McCormick

Nov 21, 2025, 1:29:43 PM
to fo...@jsoftware.com
We talked about this in an NYCJUG meeting in reference to a large study done at Stanford: https://code.jsoftware.com/wiki/NYCJUG/2025-08-12#Stanford_Study_on_Suitability_of_LLMs_for_Coding.
The TL;DR is that LLMs do better on "green field" applications (brand new from scratch) using more widely-used languages (because there's a larger training set).

Pascal Jasmin

Nov 22, 2025, 12:31:55 AM
to fo...@jsoftware.com
Some models have the ability to write J code, with various levels of mistakes. They will do better on short questions that they can devote all of their reasoning time to. Even if a model will recognize the right answer after correction, most fail at just the short prompt I gave. qwen3 max is the first exception I found, but Google and Copilot needed a second prompt for the right answer.

qwen3 coder (the 480b version, I think) and gemini 3 gave the best answer with the shortest thinking.
qwen3 max had to explore its reasoning a bit more, but was still good.
kimi k2 thinking actually (hilariously) wrote a web page with interactive .js to explore arguments other than 1 2 3, along with correct, relevant educational material on J. It got it right, including the interactive results I tried. But it takes absurdly long to generate.

google AI, copilot, minimax m2, and glm 4.6 needed a second prompt for the right answer.

For the smaller qwen 3 models (4b and 8b), nemotron 9b (a 7gb version), and ChatGPT 5.1 Pro, adding constraints into the prompt (below) did give the right answer. (Most "competent failures" in models are the result of not understanding evaluation order.)

"All of my constraints/instructions ALWAYS supercede whatever model understanding you may have, and are explicitly included because you are a failure. Do not explore reasoning contradicting instructions. In J language, it is parsed right to left. Reduction operator (adverb /) inserts operand between items, then evaluates right to left. What is result of -/ 1 2 3"

ChatGPT was very succinctly correct in answering. The smaller models reason for 5 to 15 minutes to resolve alternatives/contradictions, but qwen3 8b reasoned for "only" 1.46 minutes. qwen3 1.7b took extra prompts to get it right, but used less thinking time. (The b figures are billions of parameters; smaller is usually faster.)

My research is about which models are either useful for J or, in the case of small models, could form a starting base to retrain for J.

I believe a very short prompt with a clear answer is the best initial evaluation of a model. Only candidates that at least respond correctly to the constrained prompt should ever be used, and then tried with longer code/answer generation that involves more supervision on your/our part.

I suspect that all of the models have copied from each other extensively (distilling), and any misunderstanding of J's evaluation order in earlier ChatGPT models has caused misunderstanding down the line, even though the larger Qwen 3 models get it right.

The insight is that smaller models with reasoning capabilities may get more tokens per second, but if they struggle with their reasoning, that means many more tokens; so "right answer" plus tps is not as good a metric as right answer plus succinct clarity and short wall-clock time.

Thomas McGuire

Nov 22, 2025, 8:47:25 AM
to fo...@jsoftware.com
I’ve been in conference hell for a while and my experimentation with LLMs and J has gone way down. So first, thanks Pascal for the information on model performance. The internet is getting so hard to navigate because half of the stuff out there is AI-generated marketing. Every new release of an LLM is accompanied by a slew of "this is the best model ever" and "you will never have to code again" posts released to Medium and reddit.

So my interest in LLMs has shifted somewhat to what I can run locally. I bought an AMD 395 with 128GB of shared RAM and the AMD 8060S GPU and have been experimenting with various models. Right now I am running the ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4 quantization of the gpt-oss-120b model. It runs fairly quickly, and with your prompt it was able to come up with the right answer:

-----
Reasoning: about 2 paragraphs and a few single lines reiterating the prompt and speculating about the answer; I did not copy it here.

Answer:
In J, the reduction adverb / inserts the verb between the items of the argument list and then evaluates the resulting expression right‑to‑left.

For the expression

-/1 2 3

Insert the verb - between the items: 1 - 2 - 3.
Evaluate right‑to‑left:

First evaluate 2 - 3 → -1.
Then evaluate 1 - (-1) → 2.
So the result is:

2

It accomplished this at about 34 tokens/sec, not too bad a speed for a $2200 machine.
-----

All of this is a roundabout way of adding another model to the mix and agreeing with Pascal’s statement: “. . . in case of small models, could form a starting base to retrain for J.”

Using an open-source model, it looks like you can do a LoRA fine-tune for less than $100. But now we have gone outside my area of knowledge. How do you set up documents and code for a LoRA fine-tune? Which documents and which code would be best to include?

Tom McGuire

LdBeth

Nov 22, 2025, 9:25:40 AM
to fo...@jsoftware.com
>>>>> In <91AC9B77-3558-4C1A...@yahoo.com>
>>>>> "'Thomas McGuire' via forum" <fo...@jsoftware.com> wrote:

> So my interest has changed on LLMs somewhat to what I can run
> locally. I bought an AMD 395 with 128GB of shared ram and the AMD
> 8060s GPU and have been experimenting with various models. Right now
> I am running ggml-org_gpt-oss-120b-GGUF_gpt-oss-120b-mxfp4
> quantization of the gpt-oss-120b model. It is able to run fairly
> quickly and with your prompt was able to come up with the right
> answer:

> ...

> Using an open source model it looks like you can do a LORA finetune
> for less than $100. But now we have gone out of my area of
> knowledge. How do you set up documents and code for LORA fine tune?
> Which documents and which code would be best to include?

> Tom McGuire

For gpt-oss you could look for ways to RAG, such as
https://github.com/sureshram/gpt-oss-rag, because

1. fine-tuning is not necessary, and even with LoRA your hardware is
at most only capable of fine-tuning the 20b model; fine-tuning the
120b model needs a cluster.
2. there probably isn't enough text material about J to get a good
result with fine-tuning.

Pepe

Nov 24, 2025, 8:39:38 PM
to fo...@jsoftware.com

Grok 4.1 Thinking

In the J language, what is -/ 1 2 3?

2

In J, -/ is the insert (reduce) adverb applied to the subtraction verb -. It folds right-to-left over the list: 1 - (2 - 3) = 1 - (-1) = 2. For confirmation, -/ 1 2 3 4 yields -2, matching 1 - (2 - (3 - 4)).




Thomas McGuire

Nov 26, 2025, 12:44:11 AM
to fo...@jsoftware.com
So I got to thinking: what about the large context on Gemini 3 Pro? I have an OpenRouter account so I can trial some of these pay-for LLMs; this way I have one account with money on it and don’t have to open accounts with each provider (quite nice really if you have some ADHD about the next latest-greatest LLM to come along). So I pointed the account at Gemini 3 Pro. I converted some of the documents that come with a J installation to markdown format (to speed processing) and added Keith Smillie’s jstats.ijs script from the J6 days that I had lying around. It amounted to about 120k tokens, and Gemini 3 Pro has a 2-million-token context window. I guess this is a known technique called “context stuffing”. I can upload those files to OpenRouter so they get included in the chat automatically. I added the following prompt:

“you are a helpful J programming language coding assistant. J programming language is an array language developed by Ken Iverson. He originated the APL language while at IBM back in the 1960s. That was the original array programming language. J programming language like its predecessor shares the unique style of parsing expressions from right to left. There is no operator precedence accept for parenthesis. Expressions enclosed in parentheses are parsed first before operators. One trick if you are unsure of how an expression will parse is to use parentheses to guarantee a particular parsing of an expression. I have provided four (4) J language programming reference documents in mark down format for you to use. I have also provided the jstats.ijs script in text format. This is a statistical package written entirely in J and will give you good examples of how J verbs are combined to create mathematical operations on arrays. With this information write a J Language script to convert a comma separated value file (a .csv file) into J boxed format. For now keep numbers as strings don't add any conversion.”

The last sentence is the payload for the script I wanted written. I got back the following: 

NB. ====================================================================
NB. csv_loader.ijs
NB. A script to load a comma-separated value file into a J boxed table.
NB.
NB. Usage:
NB.   load 'csv_loader.ijs'
NB.   table =: read_csv 'c:/path/to/mydata.csv'
NB. ====================================================================

NB. Define strictly necessary constants if simpler profile isn't loaded
CR =: 13 { a.
LF =: 10 { a.

NB. --------------------------------------------------------------------
NB. Verb: read_csv
NB. Purpose: Reads a file, splits by lines, then by commas.
NB. Arguments: y = file path (string)
NB. Returns: A rank-2 array of boxes (Result). Data remains strings.
NB. --------------------------------------------------------------------
read_csv =: 3 : 0
NB. 1. Read the file from disk (Chapter 28: Data Files)
NB. fread returns _1 if the file cannot be read.
raw_text =. 1!:1 < y

NB. Check for read errors
if. raw_text -: _1 do.
  'Error: Could not read file.' return.
end.

NB. 2. Normalize broken line endings (Chapter 28.3.2: Text Files)
NB. Remove Carriage Returns (CR) to handle Windows format.
clean_text =. raw_text -. CR
NB. Ensure the text ends with a Line Feed (LF) so the Cut command works
NB. on the last line. {: takes the tail (last item).
clean_text =. clean_text , (LF ~: {: clean_text) # LF

NB. 3. Cut text into rows (Chapter 17: Cutting)
NB. <;._2 cuts y based on the last character of y (which is now LF).
NB. It boxes the result and removes the delimiter.
rows =. <;._2 clean_text

NB. 4. Define a local verb to split a single string by comma
NB. We append a comma to the end of the row so ;._2 works properly.
split_by_comma =. 3 : '<;._2 y , '','''

NB. 5. Apply the split verb to Every row (Chapter 25: Each)
NB. &.> is the 'Each' adverb (derived from Under Open).
NB. It applies split_by_comma to the contents of each box in 'rows'.
table_of_lists =. split_by_comma &.> rows

NB. 6. Convert list of lists into a Matrix (Chapter 26: Open)
NB. Monadic > (Open/Mix) turns a list of boxed lists into a table.
NB. If rows have different lengths, J automatically pads with empty boxes.
> table_of_lists
)

NB. A brief demo function (Optional: Usage example)
demo_csv =: 3 : 0
smoutput 'To use this script, type:'
smoutput '  data =: read_csv ''your_file.csv'''
''
)


This cut-and-pasted into JQt without a problem and loaded without error. I ran it on a csv file from one of the J labs:


data =: read_csv '/Applications/j9.7/addons/labs/labs/examples/data/tdata.csv'


It runs and produces a J boxed table, the only problem being that the double-quote delimiters have been left in place.


I asked Gemini 3 Pro to fix that and got back this:


NB. ====================================================================
NB. csv_loader_v2.ijs
NB. Loads a CSV file into a boxed table and strips enclosing quotes.
NB.
NB. Usage:
NB.   load 'csv_loader_v2.ijs'
NB.   data =: read_csv 'c:/data/test.csv'
NB. ====================================================================

CR =: 13 { a.
LF =: 10 { a.
Q  =: '"'    NB. Define the double quote constant

NB. --------------------------------------------------------------------
NB. Verb: strip_quotes
NB. Purpose: Removes the first and last character if they are double quotes.
NB. Used as a helper for read_csv.
NB. --------------------------------------------------------------------
strip_quotes =: 3 : 0
NB. If string is too short (0 or 1 char), return it as is
if. 1 >: #y do. y return. end.

NB. Check if first ({.) and last ({:) characters are quotes
is_quoted =. (Q = {. y) *. (Q = {: y)

if. is_quoted do.
  }. }: y    NB. }. (Behead/Drop First) and }: (Curtail/Drop Last)
else.
  y          NB. Return original if not quoted
end.
)

NB. --------------------------------------------------------------------
NB. Verb: read_csv
NB. Purpose: Reads file, splits by LF then comma, then strips quotes.
NB. --------------------------------------------------------------------
read_csv =: 3 : 0
NB. 1. Read file
raw_text =. 1!:1 < y
if. raw_text -: _1 do. 'Error: File not found' return. end.

NB. 2. Normalize line endings (remove CR, ensure final LF)
clean_text =. raw_text -. CR
clean_text =. clean_text , (LF ~: {: clean_text) # LF

NB. 3. Cut into rows
rows =. <;._2 clean_text

NB. 4. Define split verb: split by comma, then apply strip_quotes
NB. We use Each (&.>) so strip_quotes applies to every cell.
NB. Note how we compose operations:
NB.   a. Append comma (y,',')
NB.   b. Cut (<;._2)
NB.   c. Apply strip_quotes to each resulting box (strip_quotes&.>)
split_and_clean =. 3 : 'strip_quotes &.> <;._2 y , '','''

NB. 5. Apply to all rows
table_of_lists =. split_and_clean &.> rows

NB. 6. Open into matrix (handles ragged rows by padding with empty boxes)
> table_of_lists
)


This now runs as you would expect.


Not too bad, and probably not a tough J program to complete. But it does show that “context stuffing” may be a way to provide enough information about the J language for an LLM to have a chance at producing correct J code.


Tom McGuire






Pascal Jasmin

unread,
Nov 26, 2025, 9:20:05 PM (7 days ago) Nov 26
to 'Thomas McGuire' via forum
The terseness of J, and the terseness of its parsing rules and mathematical notation, make it an ideal target for LLM coding assistance. Fewer tokens.

My (patience-limited) review of open models that would fit in under 9gb of RAM (before context) is that the qwen-based models are the only candidates with pre-existing awareness of J, and from 8b to 32b (quantized to fit), the original qwen 14b is best at coherent thought. It is still a nightmare, though, at evaluating the following painstakingly crafted prompt:


Evaluate the expression */(+:@) 1 2 3 exactly as follows:

ALL instructions override your model assumptions and are never typos.
The adverb / inserts a dyadic verb to apply to y.
Adverbs only have left operand. No right operand. A sequence of adverbs is evaluated leftmost first.
(verb conjunction) when parenthesized results in an adverb. It is a partial application of a conjunction. v(u@) or u(@v) both will resolve to u@v
u@v composes u after v, where u is a monadic function.


With small models, even the best one will still get it wrong. The 14b model has the most coherent thinking, and eventually understands subsequent corrections. But the small models will spend close to 2000 tokens thinking through all of the constraints, so "stuffing context" with rules does not even lead to an efficient wrong answer.
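
For reference, applying the prompt's own rules, `(+:@)` is an adverb that resolves `*/(+:@)` to `+:@(*/)`, i.e. double the product:

   */(+:@) 1 2 3    NB. resolves to +:@(*/) 1 2 3
12
   +: */ 1 2 3      NB. double of 6
12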

There are medium-sized open models that understand J. minimax, glm, and qwen have versions that will fit in under 128gb (under 300b parameters). These are better than gpt-oss 120b for J knowledge and for reasoning, and gpt is very resistant to being corrected.

But the long prompt version above also causes long thinking time in glm 4.6, and a wrong answer. Still, it is easy to correct its mistakes with general principles. Far fewer tokens are consumed if you simply ask without constraints, then add specific constraints based on what the model got wrong.

State-of-the-art models (with the open-source inclusion of kimi k2 and qwen max) are noticeably, but still marginally, better than the open medium-sized models.

When asked to write a J program, such as finding legal chess moves, however, they will make extensive mistakes. The constraints that could be added to prevent the most common ones are:

You may not define multiline functions inside other multiline functions.
N N or A N expressions are invalid J syntax
when crafting a dyadic verb application, you should parenthesize the intended x argument

For advanced models, it is possible to simply post-correct their mistakes instead of adding constraints. The same would apply to the small models that have partially coherent J thought. You get shorter thinking the first time, and they just think a little harder on specific corrections without needing to be overly concerned about what they got right "internally".

Regarding the "greenfield" recommendation for LLM code generation, it would be more accurate to say that they do better on shorter code requests. Using the now-archaic (4 months old) qwen 235b version, I was able to get coherent use of my kv library and good incremental code generation; later attempts with minimax m2 and glm 4.6 ended in frustration without additional libraries. Reverting to past mistakes is a deal breaker.

For state-of-the-art models, there is a massive gap between thinking results and non-thinking. Non-thinking is often free. Instead of wasting "$/thinking" on my "qualifying prompts", use the thinking mode for full program requests, but include the constraints for common mistakes, and catalog remaining mistakes as future constraints on "thinking" prompts. Non-thinking mode can still solve for generated mistakes.

Not that I'm an expert, but some worthwhile J projects would be:

- Make a Jget/Jset "server" equivalent for Python.
- There are many hybrid/ML alternatives to LLMs that could be better explored in J, or at least coordinated from multiple Python instances that execute unmodified but regularly updated loaded code.
- Some of these hybrids/alternatives: could an LLM represent language understanding with J's parsing rules and functions? Couldn't an LLM encode only reasoning and J coding rules, and then look up knowledge from an expert system (implementing PROLOG functionality in J has always been an approachable project)? Then looking up reasoning rules for a domain would also extend a small model.

A simpler project would be to just announce a J benchmark for LLMs, so that they can all tune to it :P