Hello SIGARAB members...
I am reaching out to the SIGARAB community with a focused question and a proposal.
Question:
Is there ongoing work that moves beyond token-based representations toward meaning-grounded units, especially in Qur’an-centered studies or semantically anchored embedding models?
I am particularly interested in approaches where the representation unit is not a token/subword, but a stable semantic entity.
Context (brief):
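Most current embedding models operate on surface units: Word2Vec and GloVe on word tokens, FastText on words plus character n-grams, and BERT variants on subword pieces. Even when they capture semantic similarity, the unit being represented is still a surface string, so the representation layer itself remains fundamentally symbolic and surface-driven.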
Our Direction (Marble Hypothesis):
We are experimenting with a shift from Token → Marble (meaning particle).
- A Marble is defined as a dynamic but stabilizing meaning node.
- Words do not directly represent meaning; instead, they activate candidate Marbles.
- Context (syntax, morphology, operators) collapses the resulting “probability cloud” of candidates into a specific Marble.
- Morphological derivations (especially in Turkish) generate new Marble candidates rather than mere surface variants.
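
To make the activate-then-collapse idea concrete, here is a minimal sketch rather than our actual implementation. Everything in it is an assumption made for illustration: the `Marble` class, the `CANDIDATES` and `CUE_SCORES` tables, the prior weights, and the softmax scoring rule. It uses the Turkish surface form "yüz", which can mean "face", "hundred", or "swim", and lets context cues reweight the candidates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Marble:
    """Hypothetical meaning node: an identifier plus a short gloss."""
    marble_id: str
    gloss: str


# Hypothetical lexicon: a surface word activates candidate Marbles with prior weights.
CANDIDATES = {
    "yüz": {
        Marble("FACE", "front of the head"): 0.4,
        Marble("HUNDRED", "the number 100"): 0.4,
        Marble("SWIM", "to move through water"): 0.2,
    },
}

# Hypothetical context cues: additive evidence (log scale) each cue lends to a Marble.
CUE_SCORES = {
    "numeral_context": {"HUNDRED": 2.0},
    "verb_suffix": {"SWIM": 2.5},
    "possessive_suffix": {"FACE": 1.5},
}


def collapse(word, cues):
    """Return a normalized distribution over the word's candidate Marbles."""
    scores = {}
    for marble, prior in CANDIDATES[word].items():
        evidence = sum(CUE_SCORES.get(c, {}).get(marble.marble_id, 0.0) for c in cues)
        scores[marble] = math.log(prior) + evidence
    # Softmax over the log-scores (shifted by the maximum for numerical stability).
    top = max(scores.values())
    weights = {m: math.exp(s - top) for m, s in scores.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def best_marble(word, cues):
    """Pick the most probable Marble after the context cues are applied."""
    dist = collapse(word, cues)
    return max(dist, key=dist.get)


print(best_marble("yüz", ["verb_suffix"]).gloss)  # prints "to move through water"
```

In this toy setup the "probability cloud" is simply the dictionary returned by `collapse`; a real system would presumably learn the priors and cue weights from data instead of hard-coding them.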
This leads to a three-layer interpretation model:
- A-field (Meaning): internal semantic core of a Marble
- B-field (Relations): interactions between Marbles
- C-field (State/Consciousness): temporal, modal, and operator-driven dynamics
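
As a rough illustration of how the three fields could sit together in one record, the following sketch uses plain dataclasses. The class names (`AField`, `BField`, `CField`, `MarbleRecord`), the choice of a float vector for the semantic core, and the specific C-field attributes are all assumptions made for this example, not a fixed specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class AField:
    """Meaning: internal semantic core, modeled here as a plain vector (assumption)."""
    core: List[float]


@dataclass
class BField:
    """Relations: labeled links to other Marbles, keyed by relation name."""
    edges: Dict[str, Set[str]] = field(default_factory=dict)

    def relate(self, relation: str, target_id: str) -> None:
        """Record that this Marble stands in `relation` to `target_id`."""
        self.edges.setdefault(relation, set()).add(target_id)


@dataclass
class CField:
    """State: temporal, modal, and operator-driven features (illustrative attributes)."""
    tense: Optional[str] = None
    modality: Optional[str] = None
    operators: Tuple[str, ...] = ()


@dataclass
class MarbleRecord:
    """One Marble with its three fields; B and C start empty."""
    marble_id: str
    a: AField
    b: BField = field(default_factory=BField)
    c: CField = field(default_factory=CField)


# Usage with made-up identifiers and values.
m = MarbleRecord("M1", AField([0.1, 0.2]))
m.b.relate("derived_from", "M0")
m.c = CField(tense="past", operators=("negation",))
```

Keeping the fields as separate objects means each layer can be replaced independently, for example swapping the vector in `AField` for a richer structure without touching relations or state.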
We are currently testing this idea using:
- Multilingual Qur’an corpora (Turkish, English, Arabic, and others)
- Embedding baselines (FastText, Word2Vec, SBERT)
- Morphology-aware pipelines (Turkish morphological analysis)
- Anchor-based semantic clustering (recurrent verses, named entities, semantic motifs)
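
For readers unfamiliar with the last item, here is one simple way anchor-based clustering can be set up: each unit is assigned to its most similar anchor by cosine similarity, or left unassigned if nothing is close enough. This is a sketch under stated assumptions, not a description of our pipeline. The function names, the `threshold` value, the anchor labels, and the three-dimensional toy vectors are invented for illustration; real vectors would come from one of the embedding baselines above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_to_anchors(items, anchors, threshold=0.5):
    """Map each item id to its most similar anchor id, or None below the threshold."""
    assignment = {}
    for item_id, vec in items.items():
        best_id, best_sim = None, threshold
        for anchor_id, anchor_vec in anchors.items():
            sim = cosine(vec, anchor_vec)
            if sim >= best_sim:
                best_id, best_sim = anchor_id, sim
        assignment[item_id] = best_id
    return assignment


# Toy vectors, invented for illustration only.
anchors = {"anchor_A": [1.0, 0.0, 0.0], "anchor_B": [0.0, 1.0, 0.0]}
items = {
    "unit_1": [0.9, 0.1, 0.0],
    "unit_2": [0.1, 0.8, 0.1],
    "unit_3": [0.0, 0.0, 1.0],
}
clusters = assign_to_anchors(items, anchors)
print(clusters)  # {'unit_1': 'anchor_A', 'unit_2': 'anchor_B', 'unit_3': None}
```

Units that fall below the threshold stay unassigned (`None`), which makes it easy to inspect what the current anchor set fails to cover.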
Why this matters:
Token-based systems approximate meaning.
We are trying to model meaning as a first-class computational object.
---
Open Questions to the Community:
1. Are there known frameworks that explicitly treat meaning units (not tokens) as the primary representation layer?
2. Is there any work combining scriptural corpora (such as the Qur’an) with embedding alignment or semantic topology analysis?
3. Has anyone attempted morphology-driven semantic unit construction at scale?
---
If there is relevant work, references, or even partial overlap, I would appreciate being pointed to it.
Either way, I am open to discussion and collaboration.
Best regards,
Uğur Sürmeli
Independent Researcher — NeuroCosmology (NK)