From Tokens to Meaning Units: A Marble-Based Representation Proposal (seeking discussion)

Uğur Sürmeli

Mar 26, 2026, 4:20:16 AM
to sig...@googlegroups.com
Hello SIGARAB members...

I am reaching out to the SIGARAB community with a focused question and a proposal.

Question:
Are there ongoing works that move beyond token-based representations toward meaning-grounded units—especially in Qur’an-centered studies or semantically anchored embedding models?
I am particularly interested in approaches where the representation unit is not a token/subword, but a stable semantic entity.

Context (brief):
Most current NLP systems (Word2Vec, GloVe, FastText, BERT variants) operate on tokens or subword units. Even when they capture semantic similarity, the representation layer itself is still fundamentally symbolic and surface-driven.

Our Direction (Marble Hypothesis):
We are experimenting with a shift from Token → Marble (meaning particle).

- A Marble is defined as a dynamic but stabilizing meaning node.
- Words do not directly represent meaning; instead, they activate candidate Marbles.
- Context (syntax, morphology, operators) collapses this “probability cloud” into a specific Marble.
- Morphological derivations (especially in Turkish) generate new Marble candidates rather than mere surface variants.
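
The activation-and-collapse idea above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `Marble`, `LEXICON`, `BOOST`, and `collapse`, the prior weights, and the cue words are hypothetical and do not describe an existing implementation. The example uses the Turkish word "yüz", which can mean "face", "hundred", or "swim".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Marble:
    """Hypothetical meaning node: an id plus context cues that favor it."""
    marble_id: str
    cues: frozenset

# Hypothetical lexicon: one surface word activates several candidate
# Marbles, each with an assumed prior weight (the "probability cloud").
LEXICON = {
    "yüz": [
        (Marble("FACE", frozenset({"göz", "güzel"})), 0.4),
        (Marble("HUNDRED", frozenset({"lira", "yıl"})), 0.4),
        (Marble("SWIM", frozenset({"deniz", "havuz"})), 0.2),
    ],
}

BOOST = 4.0  # assumed strength of each matching context cue

def collapse(word, context):
    """Re-weight candidate Marbles by cue overlap with the context.

    Returns (marble_id, probability) pairs, most likely first,
    or an empty list when the word activates no candidates.
    """
    context_set = set(context)
    scored = []
    for marble, prior in LEXICON.get(word, []):
        overlap = len(marble.cues & context_set)
        scored.append((marble.marble_id, prior * (1.0 + BOOST * overlap)))
    total = sum(score for _, score in scored)
    if total == 0:
        return []
    ranked = [(mid, score / total) for mid, score in scored]
    return sorted(ranked, key=lambda pair: -pair[1])
```

Calling `collapse("yüz", ["deniz"])` ranks the SWIM candidate first, while an empty context leaves the prior weights unchanged. A real system would replace the cue-overlap score with morphological and syntactic evidence.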

This leads to a three-layer interpretation model:

- A-field (Meaning): internal semantic core of a Marble
- B-field (Relations): interactions between Marbles
- C-field (State/Consciousness): temporal, modal, and operator-driven dynamics
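
One possible way to hold the three fields in a single record is sketched below. The class name `MarbleFields`, its attribute names, and the sample values are assumptions chosen for illustration only; the proposal itself does not fix a data layout.

```python
from dataclasses import dataclass, field

@dataclass
class MarbleFields:
    """Hypothetical container for the three fields of one Marble."""
    marble_id: str
    # A-field: internal semantic core, here a feature -> weight map.
    a_field: dict = field(default_factory=dict)
    # B-field: relations to other Marbles, relation label -> set of ids.
    b_field: dict = field(default_factory=dict)
    # C-field: temporal, modal, and operator-driven state.
    c_field: dict = field(default_factory=dict)

    def relate(self, label, other_id):
        """Record a B-field relation to another Marble."""
        self.b_field.setdefault(label, set()).add(other_id)

    def apply_operator(self, name, value):
        """Update the C-field without touching the semantic core."""
        self.c_field[name] = value

# Toy usage with made-up values.
m = MarbleFields("WRITE", a_field={"action": 1.0})
m.relate("agent", "PERSON")
m.apply_operator("tense", "past")
m.apply_operator("negated", True)
```

Keeping the C-field separate means that tense or negation changes the state of a Marble while its A-field stays the same.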

We are currently testing this idea using:

- Multi-language Qur’an corpora (TR, EN, AR, etc.)
- Embedding baselines (FastText, Word2Vec, SBERT)
- Morphology-aware pipelines (Turkish morphological analysis)
- Anchor-based semantic clustering (recurrent verses, named entities, semantic motifs)
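
As a rough illustration of the anchor-based clustering item, the sketch below assigns each item to its most similar anchor by cosine similarity. It is a simplified stand-in, not a description of the actual pipeline: the function names, the `min_sim` threshold, and the two-dimensional toy vectors are assumptions, and real vectors would come from one of the embedding baselines listed above.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_by_anchor(items, anchors, min_sim=0.5):
    """Group item ids under the most similar anchor.

    items:   {item_id: vector}
    anchors: {anchor_name: vector}
    Items whose best similarity is below min_sim go to "unassigned".
    """
    clusters = {name: [] for name in anchors}
    clusters["unassigned"] = []
    for item_id, vec in items.items():
        best_name, best_sim = "unassigned", min_sim
        for name, anchor_vec in anchors.items():
            sim = cosine(vec, anchor_vec)
            if sim >= best_sim:
                best_name, best_sim = name, sim
        clusters[best_name].append(item_id)
    return clusters

# Toy data: hypothetical verse ids and motif anchors.
anchors = {"motif_a": [1.0, 0.0], "motif_b": [0.0, 1.0]}
items = {"v1": [1.0, 0.1], "v2": [0.1, 1.0], "v3": [-1.0, -1.0]}
clusters = cluster_by_anchor(items, anchors)
```

With this toy data, `v1` falls under `motif_a`, `v2` under `motif_b`, and `v3` stays unassigned because it is not close to either anchor.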

Why this matters:
Token-based systems approximate meaning.
We are trying to model meaning as a first-class computational object.

---

Open Questions to the Community:

1. Are there known frameworks that explicitly treat meaning units (not tokens) as the primary representation layer?
2. Is there any work combining scriptural corpora (such as the Qur’an) with embedding alignment or semantic topology analysis?
3. Has anyone attempted morphology-driven semantic unit construction at scale?

---

If there is relevant work, references, or even partial overlap, I would appreciate being pointed to it.

If not, I am open to discussion and collaboration.

Best regards,
Uğur Sürmeli
Independent Researcher — NeuroCosmology (NK)

Mohamed H.

Mar 26, 2026, 6:48:16 AM
to sig...@googlegroups.com

assalamu alaikum,

Perhaps you can ask the team at https://www.qurancomputing.org/

They are a Quran-focused research group.

Hope that helps.

Shukran,

-- 
Find me at:
https://www.kentoseth.com
https://fosstodon.org/web/@kentoseth

Ibad-ur-Rehman Rashid

Mar 27, 2026, 12:12:32 AM
to SIGARAB: Special Interest Group on Arabic Natural Language Processing
Assalam u Alaikum

Several methodologies have been applied across different research efforts.

We have previously done work similar to what you are suggesting, but for a specific Quranic domain.

Our focus is on a Quranic Scientific Exegesis Ontology. We built it in three layers: an Ayah Ontology layer, an Exegesis Ontology layer for grounding, and a Scientific Ontology layer that defines the scientific concepts described in the exegesis. Relations between scientific concepts are represented as structural and causal-temporal relations.

You may find something useful in the paper: https://aclanthology.org/2026.abjadnlp-1.22/

Currently we are working on better retrieval of these nodes, multihop analysis, and a RAG approach to test the usage of our dataset.
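
For readers unfamiliar with multihop analysis over layered nodes, a minimal sketch is given below. It is only an illustration of the general technique (breadth-first traversal with a hop limit): the toy graph, the node ids, the relation labels, and the function name `multihop` are hypothetical and do not reflect the schema of the dataset.

```python
from collections import deque

# Hypothetical toy graph: node -> list of (relation, target) edges,
# crossing from an ayah node to exegesis and then to concept nodes.
GRAPH = {
    "ayah:1": [("explained_by", "exegesis:1")],
    "exegesis:1": [("mentions", "concept:A")],
    "concept:A": [("causes", "concept:B")],
    "concept:B": [],
}

def multihop(graph, start, max_hops):
    """Breadth-first search from start, following at most max_hops edges.

    Returns {node: path}, where path is the list of (relation, target)
    hops taken from start to reach that node.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # hop budget used up on this branch
        for relation, target in graph.get(node, []):
            if target not in paths:
                paths[target] = paths[node] + [(relation, target)]
                queue.append(target)
    return paths

reachable = multihop(GRAPH, "ayah:1", max_hops=3)
```

Here `reachable["concept:B"]` holds the three-hop chain from the ayah through its exegesis to the causally linked concept, which is the kind of path a retrieval step could pass on as context.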

I also look forward to seeing your approach, Inshaa Allah.

Best Regards!
Ibad-ur-Rehman Rashid