Status of C-runtime and majestic branch — parsing incomplete?

Martin Gercke

Jun 9, 2026, 5:39:25 AM
to Grammatical Framework
Hi,
I am currently experimenting with GF-WordNet to set up an example which:
- parses a German sentence,
- removes ambiguity by selecting the "correct" abstract syntax/meaning,
- translates the result into other languages.
For example, "Die Frau singt" should yield an abstract syntax tree such as PredVPS (DetCN (DetQuant DefArt NumSg) (UseN woman_1_N)) (MkVPS (TTAnt TPres ASimul) PPos (UseV sing_2_V)), with woman_1_N and sing_2_V picked as the intended WordNet senses, which then linearizes to English "the woman sings", Spanish "la mujer canta" and Italian "la donna canta".
For this I would love to use the C runtime: the Haskell runtime enumerates all trees, and even for simple sentences their number explodes to more than 1,000. The C runtime promises bounded n-best parsing, which could help. I also wanted to see how I can narrow down the trees given that I already know the "correct" abstract meaning of the nouns/verbs/adjectives involved.
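
A minimal sketch of both ideas, assuming the parser hands back (score, tree) pairs lazily in best-first order, with lower scores meaning more probable trees (an assumption here, not a documented guarantee of either runtime). `itertools.islice` stops after the n best candidates instead of enumerating everything, and a set-inclusion test keeps only trees that use every intended WordNet sense. `fake_parse`, `functions_in` and `pick_trees` are hypothetical helpers, and the sample trees and scores are made up.

```python
import itertools
import re


def fake_parse():
    """Hypothetical stand-in for a parser: yields (score, tree) pairs best-first.

    Lower score = more probable. The trees and scores are made-up sample data.
    """
    yield (10.2, "PredVPS (DetCN (DetQuant DefArt NumSg) (UseN woman_1_N)) (MkVPS (TTAnt TPres ASimul) PPos (UseV sing_1_V))")
    yield (10.5, "PredVPS (DetCN (DetQuant DefArt NumSg) (UseN woman_1_N)) (MkVPS (TTAnt TPres ASimul) PPos (UseV sing_2_V))")
    yield (11.0, "PredVPS (DetCN (DetQuant DefArt NumSg) (UseN woman_2_N)) (MkVPS (TTAnt TPres ASimul) PPos (UseV sing_2_V))")


def functions_in(tree):
    """Return the set of identifiers (function names) in a bracketed tree string."""
    return set(re.findall(r"[A-Za-z_][A-Za-z0-9_']*", tree))


def pick_trees(results, required_senses, n_best):
    """Inspect at most n_best results; keep those that use every required sense."""
    required = set(required_senses)
    top = itertools.islice(results, n_best)  # lazily take only the n best
    return [(score, tree) for score, tree in top if required <= functions_in(tree)]


picked = pick_trees(fake_parse(), {"woman_1_N", "sing_2_V"}, n_best=100)
for score, tree in picked:
    print(score, tree)
```

With the sample data only the second tree survives the filter; with `n_best=1` nothing survives, because the single best candidate uses the wrong verb sense. Matching on bare identifiers is deliberately crude; a real implementation would walk the expression objects the runtime returns.
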
So I went looking and came across the majestic branch through an issue (#130, "PGF as a database"), which looked very promising. I built the whole stack on WSL/Debian:
- C runtime (`src/runtime/c`, libpgf),
- Python binding (`src/runtime/python`),
- gf compiler (gf-4.0.0),
and downloaded the robust German grammar as an NGF.
`bootNGF`, `readNGF` and `lookupMorpho` work great and are *instant* (thanks to mmap), which is really nice.
The problem: parsing crashes. I uncommented `Concr.parse` in the binding (it wraps `pgf_parse`) to see whether I could get it running. It segfaults *inside the runtime*, even on the tiny `Food` example compiled by the majestic `gf` itself (so it's not a format mismatch). The crashes seem to be in the LR machinery: `PgfParser::shift` dereferences an invalid `shift->seq`, and there is a division by zero (`÷0`) in `Production::operator new` when `lin->res.size()==0`. `git log` on `parser.cxx` shows active work ("an experimental left-corner table maker", etc.), so it looks like the parser on `majestic` is still mid-rewrite or incomplete.
Could somebody give me some guidance?
1. What's the current status of the C runtime overall, and which branch is the "live" one?
2. Which C runtime runs on cloud.grammaticalframework.org/robust?
3. How does `majestic` relate to the other C-runtime branches? I see several (`pgf2-complete`, `lpgf`/`lpgf-memo`/`lpgf-string`, `concrete-new`, `compact-pgf`, `c-runtime`, ...).
4. Is there a branch/commit where end-to-end *parsing* with the NGF format actually works, or is it planned to get the parser running on the `majestic` branch at some point?
5. Any guidance on getting the C runtime + NGF running for parsing (not just lookup/linearization) would be hugely helpful.
Thanks a lot for any pointers!
Martin

Krasimir Angelov

Jun 9, 2026, 7:30:03 AM
to gf-...@googlegroups.com
Hi Martin,

The parser in the majestic branch still has bugs. I plan to get back to it at the end of the summer, when I will have fewer distractions from the many other things happening during the normal semester. The runtime that runs cloud.grammaticalframework.org/robust is the one from the majestic branch, but an older version which doesn't include the parser. The branches `pgf2-complete`, `lpgf`/`lpgf-memo`/`lpgf-string`, `concrete-new`, `compact-pgf` and `c-runtime` are other experimental versions which, as far as I know, are not used.

If you compile GF from the main branch and then compile the C runtime in src/runtime/c plus the bindings in src/runtime/haskell-bind or src/runtime/python, you can use the older C runtime, which includes a parser but doesn't allow dynamic changes to the grammar. This also means that you have to compile GF WordNet with the compiler from the main branch.

Best,
Krasimir

Martin Gercke

Jun 11, 2026, 8:40:40 AM
to Grammatical Framework
Hi Krasimir,

Thank you, that was exactly the information I needed.

I set things up the way you suggested: GF compiled from the main branch, the C runtime from src/runtime/c together with the Python binding, and GF WordNet compiled with the main-branch compiler. That works well — parsing, lookupMorpho and linearization all behave as expected with the German WordNet grammar.

Along the way I found a parser bug in the C runtime: words whose lexical sequence has a case-variant twin in the sequence table (e.g. "Schule" / "schule") can never be parsed. pgf_parsing_lookahead registers only one of the two sequences that compare equal, so for some words the noun lemma is never predicted. The failure is deterministic per word: "Schule", "Brot" and "Stadt" fail, while "Birne" and "Haus" work by luck, probably because of the array layout.
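
The following is a hypothetical Python model of the failure mode described above, not the actual C code of `pgf_parsing_lookahead`. It assumes the lookahead index is keyed by a case-insensitive form of each sequence, so that "Schule" and "schule" collide. If the index keeps only one sequence per key, which twin survives depends on the order of the sequence table; keeping every colliding sequence avoids that. The sequence table and the helpers `build_index_buggy`, `build_index_fixed` and `predicted_lemmas` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sequence table: (sequence id, token, lemma it belongs to).
# "Schule" (noun) and "schule" (a verb form) differ only in case.
SEQUENCES = [
    (0, "schule", "schulen_V"),
    (1, "Schule", "Schule_N"),
    (2, "Birne", "Birne_N"),
]


def key(token):
    # Assumed case-insensitive comparison: both twins map to the same key.
    return token.casefold()


def build_index_buggy(sequences):
    """Register only the first sequence seen for each key (order-dependent)."""
    index = {}
    for seq_id, token, _ in sequences:
        index.setdefault(key(token), seq_id)
    return {k: [v] for k, v in index.items()}


def build_index_fixed(sequences):
    """Register every sequence whose key matches, so no twin is lost."""
    index = defaultdict(list)
    for seq_id, token, _ in sequences:
        index[key(token)].append(seq_id)
    return dict(index)


def predicted_lemmas(index, sequences, word):
    """Set of lemmas the model would predict when it sees `word`."""
    lemma_of = {seq_id: lemma for seq_id, _, lemma in sequences}
    return {lemma_of[s] for s in index.get(key(word), [])}


print(sorted(predicted_lemmas(build_index_buggy(SEQUENCES), SEQUENCES, "Schule")))
# ['schulen_V']  (the noun lemma is never predicted)
print(sorted(predicted_lemmas(build_index_fixed(SEQUENCES), SEQUENCES, "Schule")))
# ['Schule_N', 'schulen_V']
```

Swapping the first two entries of the table makes the buggy index keep the noun instead, which mirrors the observation that some words work "by luck" depending on layout.
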

I filed an issue and opened a pull request with a fix (verified against an unmodified master build before/after):


Regards

Martin