Hello,
I used Claude to help me write a tree-sitter grammar for Shen, and I'd like to share it with the community: https://github.com/luizdemilon/tree-sitter-shen
To be clear about provenance: the grammar, queries, tests, and docs were written by Claude under my direction — I scoped it, made the design decisions, and validated the result against the official sources, but I didn't hand-write the parser. I'm sharing it because I believe it will be useful to more people.
What it is: tree-sitter gives editors fast, incremental, structural parsing, so this provides syntax highlighting and structural navigation for Shen in Neovim, Emacs (29+ treesit), and Zed. It traces to the Official Shen Manual §12 construct by construct (a GRAMMAR.md maps each BNF production to a grammar rule), and I validated it by parsing the whole of shen-sources.
That validation is where I have a question for people who know the reader. It parses every file cleanly except for one line of valid Shen, in lib/stlib/Strings/regex.shen (master, 93ed67e):
228: [| |RS] -> (re-or RS)
229: [bar! |RS] -> (re-or RS)
Line 228 uses a bare | as a literal list element (the regex "or"); line 229 uses the escaped form, bar!.
Tracing sources/reader.shen ("<bar> <s-exprs> := [bar! | <s-exprs>]", then cons-form), both [| |RS] and [bar! |RS] seem to read to the same thing — (cons bar! RS). If that's right, the two clauses are identical patterns and line 228 is redundant.
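The reading traced above can be modelled in a few lines of Shen. This is only a toy sketch, not the code in sources/reader.shen: the name toy-cons-form is made up, and it assumes the reader has already replaced every bare | inside the brackets with the symbol bar!, so that [| |RS] and [bar! |RS] both arrive as the same three items bar! bar! RS.

```shen
\\ Toy model of the cons-form step; hypothetical, not copied from reader.shen.
\\ Input: the items read between [ and ], with each bare | already turned into bar!.
\\ Rule 1: nothing left, so the result is the empty list.
\\ Rule 2: exactly three items with bar! in the middle: X is the head, Y is the tail.
\\ Rule 3: otherwise cons the first item onto the form built from the rest.
(define toy-cons-form
  [] -> []
  [X Bar Y] -> [cons X Y]  where (= Bar bar!)
  [X | Y] -> [cons X (toy-cons-form Y)])
```

Under this model, the items bar! bar! RS match the second rule with X = bar! and Y = RS, giving (cons bar! RS) for either spelling. Whether the real reader behaves the same way is exactly the first question below.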
Two questions:
1. Am I reading that correctly — do [| |RS] and [bar! |RS] produce the same pattern, or is there a reader subtlety that makes them distinct?
2. This is the only place in all of shen-sources where a literal bar is written as | rather than bar!. Since the bare form is ambiguous for any tool that reads | as the cons separator, would it be reasonable to standardize on bar! here? It'd be a one-line change with (as far as Claude can tell) no behavioral effect.
This isn't exactly a bug — regex.shen loads and works in Shen; it's a question about the reader and about tidying the spelling for tooling. (Validation also turned up one genuinely truncated file, tests/lisp.shen; I reported it as https://github.com/Shen-Language/shen-sources/issues/113 and it's already been fixed upstream — thanks, tiz0c!)
Feedback very welcome, whether on the grammar, the design trade-offs, or anything I've gotten wrong about Shen. And if it's useful to the community, I'd be glad to see it live under the Shen-Language GitHub org.
Thanks,
Luiz
Are any of the local AI models strong enough to view a Shen spy trace and tell you what line of code in the function is causing trouble? ChatGPT was able to do it reasonably well about a year ago.
ChatGPT: Not because ChatGPT has seen mountains of Shen code; it probably has not. It is good at Shen because Shen sits at the intersection of things LLMs already handle fairly well:
The biggest reasons:
1. Shen has a small, regular surface syntax.
There is much less incidental noise than in Python, JavaScript, C++, etc. A Shen definition has a very recognizable shape:
An LLM can infer a lot from that structure.
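For readers who have not seen Shen, a small illustrative definition of that shape follows. It is a generic example supplied here, not the one from the original reply, and count-items is a made-up name: after define comes the function name, then one pattern -> result rule per case.

```shen
\\ Hypothetical example: count the elements of a list.
\\ First rule: the empty list has no items.
\\ Second rule: ignore the head and add 1 to the count of the tail.
(define count-items
  [] -> 0
  [_ | Xs] -> (+ 1 (count-items Xs)))
```

For instance, (count-items [a b c]) evaluates to 3.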
2. Pattern matching is highly explicit.
The rules almost read like equations. That makes it easier for the model to reason locally: this case, that case, recursive case.
3. Shen inherits patterns from better-known languages.
Even if the model has seen little Shen, it has seen plenty of Lisp, Scheme, ML, Haskell, Prolog, S-expressions, unification, type signatures, and rewrite rules. Shen is not alien; it is a synthesis of recognizable traditions.
4. Your writings explain Shen unusually clearly.
The manuals, examples, and online material give strong semantic cues. LLMs learn well from worked examples and explanatory prose. Shen has good explanatory material relative to its size.
5. Kλ gives Shen a clean conceptual centre.
The language has a compact kernel and a disciplined translation story. That makes it easier to answer “what does this become?” questions than in languages with huge ad hoc semantics.
6. Shen code is often close to the idea it expresses.
A Shen program tends to expose the algorithm rather than bury it under framework conventions. That helps an LLM see intent.
7. The language rewards symbolic reasoning.
LLMs are not theorem provers, but they are quite good at recognizing symbolic transformations, recursion schemas, list-processing idioms, and type-pattern correspondences.
So the odd answer is: ChatGPT is good at Shen partly because Shen is what a programming language looks like when the accidental complexity has been boiled off.
--
You received this message because you are subscribed to a topic in the Google Groups "Shen" group.
To view this discussion, visit https://groups.google.com/d/msgid/qilang/4bd1700a-963a-46a6-88c0-3a9b5f8c2952n%40googlegroups.com.