Literate J and a JLang parser in TypeScript (aided by LLM)

LdBeth

unread,

Mar 31, 2026, 9:08:50 AMMar 31

to forum

I had the idea of literate programming tool for J for a while, last
time I was able to write a grammar for tokenizing J, but it wasn't for
serious jobs.

When programming in J, I often start with a coarse, longer J program
and refine it to terse, refined J sentence. I want to have a tool to
capture that process, but there isn't any yet. And that is not
literate programming in any transitional sense.

Anyhoo, in my plan there needs a tool that can rewrite a tacit
function, which still needs to be implemented.

Before that, I used Claude Code to write the tool and parser for J in
TypeScript (because it turns out for many reasons LLMs still cannot
write complicated J program in my favored style, and JavaScript with
static type checking can help the coding agent correct some obvious
errors).

Still one cannot yet say "I want the J parser" and expect Claude Opus
to write a working one.

For the tokenizer, Claude by itself figured out the grammar for
quoted string, number, and direct definition (although the JWiki
got recently updated so its knowledge is outdated.)

However, Claude cannot figure out that in J one can have `+::....::.`
be a token and hard coded a lot of the primitives, so I summarized
there core rules and tell it to check tokanization with ;: in J console.

1. One graphic character (or standalone `.` / `:`), then consume any trailing `.`
and `:` characters.
2. One alpha character, then `[a-zA-Z0-9_]*`, then any trailing `.` and `:`.
3. One digit or `_`, then `[a-zA-Z0-9_.]*`, then any trailing `.` and `:`.

and told it let me to maintain the table on which primitives J
currently has, then it get almost right. Except me also forget that
array of number is one token in J, but once been told that Opus
quickly fixed that.

On the parser, it seems Claude had some of the knowledge, about the
parsing table in https://www.jsoftware.com/help/dictionary/dicte.htm
but it turns out not only forget how to handle the parentheses, it
also totally missed the "moving successive elements from the queue to
a stack" in the document, directly used the input of tokens as a
stack, and when I pointed out the problems it took forever to produce
any response and wasted lots of token.

So I turned off the "thinking" mode, and seems it tripped on something
fairly obvious to me, confused by the 4 hidden markers not shown in the
diagram of section E:

| ❯ why you need 4 marks?
|
| ⏺ Looking at the spec again:
|
| ▎ "initially four markers"
|
| But that's for the full J parser which handles names and assignments. Rule 7
| (Is) needs to inspect 4 elements deep: NAME+N ASGN CAVN any. With 4 marks,
| even a 1-token sentence has enough padding for the 4-element window.
|
| However, parsePrimTokens skips rule 7 entirely (no names/assignments). The
| rules it handles need at most 4 elements, but rule 6 (Bident) is EDGE CAVN
| CAVN any — the EDGE guard is at position w (deepest of top 4) and z (any) just
| needs to exist.
|
| For +/ with only 1 mark: stack after pushing both = [M / +]. Only 3 items —
| the top-4 window can't fill. That's why it fails.
|
| With 4 marks: stack = [M M M M / +], top 4 = M M / +. The question is which
| rule matches. Let me re-check — I think my column mapping might be wrong. Let
| me verify with the dictionary example and the 2+1 case first. Which
| column-to-stack mapping do you use?
|
| ❯ what is a marker?
|
| ⏺ A marker is a sentinel value representing the edge of the sentence. Looking at
| the parse table legend:
|
| ▎ EDGE denotes MARK + ASGN + LPAR
|
| So MARK is one of the things that counts as an EDGE. The purpose of marks is
| to serve as guards — they tell the parser "this is the boundary, nothing
| further to the left."
|
| In the full J parser with 4 marks, the 4 marks ensure the top-4 window always
| has something to compare against even when the stack is nearly empty. But for
| parsePrimTokens which only handles primitive tokens (no names, no assignment),
| the only rules needing an EDGE guard are:
|
| - Rule 0 (Monad): EDGE in column 1
| - Rule 6 (Bident): EDGE in column 1
|
| Rules 1-5 need EDGE+AVN in column 1, which is a superset — any real token
| (adv/verb/noun) satisfies AVN, and marks satisfy EDGE.
|
| So the number of marks just needs to be enough to fill the 4-element window.
| For +/ (2 tokens + marks), with 4 marks the stack is M M M M / + giving a full
| 4-element window. With 1 mark it's only M / + — too few.
|
| But my real problem isn't the mark count — it's that I can't figure out the
| correct column-to-stack mapping. For +/, with top 4 = M M / +, I can't get
| rule 3 (EDGE+AVN V+N A any) to match regardless of which direction I read
| the columns. Can you clarify how the 4 columns map to the stack positions?
|
| ❯ § + / => § + / § § § § => $ + / § § § § now you understand?
|
| ⏺ Now I see. The queue starts as § + / and the stack starts as § § § §. Moving
| from the tail of the queue to the top of the stack:
|

then it proceed to produce the correct parser. Well, it did some amazing job.

LdBeth

unread,

Mar 31, 2026, 9:15:30 AMMar 31

to forum

The code is at https://github.com/LdBeth/litj/tree/main/src and a parsed example

```
> parseJ("9&o.(((6:o.])*1:o.[)j.(5:o.])*2:o.[)11&o.")
{
kind: "fork",
f: {
kind: "conj",
left: { kind: "num", nk: "integer", text: "9", pos: "noun" },
con: { kind: "prim", token: "&", pos: "conj" },
right: { kind: "prim", token: "o.", pos: "verb" },
pos: "verb"
},
g: {
kind: "fork",
f: {
kind: "fork",
f: {
kind: "fork",
f: { kind: "prim", token: "6:", pos: "verb" },
g: { kind: "prim", token: "o.", pos: "verb" },
h: { kind: "prim", token: "]", pos: "verb" },
pos: "verb"
},
g: { kind: "prim", token: "*", pos: "verb" },
h: {
kind: "fork",
f: { kind: "prim", token: "1:", pos: "verb" },
g: { kind: "prim", token: "o.", pos: "verb" },
h: { kind: "prim", token: "[", pos: "verb" },
pos: "verb"
},
pos: "verb"
},
g: { kind: "prim", token: "j.", pos: "verb" },
h: {
kind: "fork",
f: {
kind: "fork",
f: { kind: "prim", token: "5:", pos: "verb" },
g: { kind: "prim", token: "o.", pos: "verb" },
h: { kind: "prim", token: "]", pos: "verb" },
pos: "verb"
},
g: { kind: "prim", token: "*", pos: "verb" },
h: {
kind: "fork",
f: { kind: "prim", token: "2:", pos: "verb" },
g: { kind: "prim", token: "o.", pos: "verb" },
h: { kind: "prim", token: "[", pos: "verb" },
pos: "verb"
},
pos: "verb"
},
pos: "verb"
},
h: {
kind: "conj",
left: { kind: "num", nk: "integer", text: "11", pos: "noun" },
con: { kind: "prim", token: "&", pos: "conj" },
right: { kind: "prim", token: "o.", pos: "verb" },
pos: "verb"
},
pos: "verb"
}
```

Michal Wallace

unread,

Apr 1, 2026, 10:06:47 AMApr 1

to fo...@jsoftware.com

Interesting. I've been working on a tool that is also like literate programming, but not in the traditional sense.
I call it narrative programming - basically, instead of a dictionary mapping chunk names to texts, you use
a list of list of (chunk-name , "op" , text) tuples. Each op is a change to the chunks: create, replace, or append,
and each list of those changes is like a git "commit" after which the code is meant to be in a stable, runnable state.

To unsubscribe from this group and stop receiving emails from it, send an email to forum+un...@jsoftware.com.

Reply all

Reply to author

Forward