Salam dear SIGARAB Community
I hope you are all doing well.
I am writing to share a piece of work I developed some time ago on Hadith isnad parsing, which I would like to publicize now. Isnad parsing converts plain hadith text into a structured representation of the chain of narration mentioned within the isnad portion of the text.
The goal is to produce a graph-like representation of transmission, where narrator-to-narrator relations are modeled as edges for analysis.
A key requirement is preserving multiple distinct narration paths of a given hadith, rather than merging them into a single sequence. These paths are often explicitly marked in the text, for example by phrases such as "وأخبرنا" or "في رواية أخرى" and symbols like "(ح)", indicating alternative transmission routes.
To address this, I built a pipeline that processes raw isnad text and outputs structured relation tuples using pretrained Arabic transformer models:
Models: CAMeLBERT-CA, AraBART, AraT5 (GitHub repo)
Input: raw Arabic isnad text
Outputs:
Duplets (chain-blind): (head, tail) — capture direct transmission links between narrators without distinguishing between multiple chains within the same hadith
Triplets (chain-aware): (head, tail, chain_number) — preserve distinct narration paths within the same hadith, enabling separation and comparison of parallel transmission routes
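To make the difference between the two output formats concrete, here is a minimal sketch in Python. It is not the pipeline itself (which uses the transformer models listed above); it only illustrates the tuple shapes. The narrator names "A" to "E" are placeholders, the function names are hypothetical, and the edge direction (head = the narrator who appears first in the text) is an assumption made for illustration.

```python
from typing import List, Tuple

Duplet = Tuple[str, str]
Triplet = Tuple[str, str, int]


def chains_to_triplets(chains: List[List[str]]) -> List[Triplet]:
    """Turn ordered narrator chains into (head, tail, chain_number) tuples."""
    triplets: List[Triplet] = []
    # Chains are numbered from 1 in the order they are given.
    for chain_number, chain in enumerate(chains, start=1):
        # Each pair of adjacent narrators is one direct transmission link.
        for head, tail in zip(chain, chain[1:]):
            triplets.append((head, tail, chain_number))
    return triplets


def triplets_to_duplets(triplets: List[Triplet]) -> List[Duplet]:
    """Drop the chain number and keep each (head, tail) link once."""
    seen = set()
    duplets: List[Duplet] = []
    for head, tail, _chain_number in triplets:
        if (head, tail) not in seen:
            seen.add((head, tail))
            duplets.append((head, tail))
    return duplets


# Two hypothetical paths of one hadith that meet at narrator "D".
chains = [["A", "B", "D", "E"], ["C", "D", "E"]]
triplets = chains_to_triplets(chains)
duplets = triplets_to_duplets(triplets)
print(triplets)
print(duplets)
```

In this toy case the link ("D", "E") appears in both chains. The triplets keep it twice, once per chain number, while the duplets keep it once and can no longer tell the two paths apart.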
The base data for this work was scraped from the Ifta’ Sunnah Hadith platform, which provides a large structured corpus of hadith texts, chains of narration, narrator information, and many more features.
From this source, I curated the following derivative datasets:
Isnad parsing dataset: includes NER tags for narrator and narration word extraction, along with relation tuples used for fine-tuning the isnad parsing models
Graph-focused dataset: hadiths with linked chains of narration, plus JSON-to-Cypher conversion scripts for building property graph databases, supporting analysis at the level of an individual hadith, multiple hadiths, and cross-collection narration networks
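As a rough idea of what a JSON-to-Cypher conversion step can look like, here is a minimal sketch. It is not the conversion script from the dataset: the JSON layout (`hadith_id`, `triplets`), the `Narrator` label, the `NARRATED_FROM` relationship type, and the function name are all assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, List, Tuple

# Parameterized Cypher: MERGE avoids creating duplicate narrator nodes or edges.
CYPHER_EDGE = (
    "MERGE (h:Narrator {name: $head}) "
    "MERGE (t:Narrator {name: $tail}) "
    "MERGE (h)-[:NARRATED_FROM "
    "{hadith_id: $hadith_id, chain_number: $chain_number}]->(t)"
)


def record_to_cypher(record_json: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Convert one JSON hadith record into (query, parameters) pairs."""
    record = json.loads(record_json)
    statements = []
    for head, tail, chain_number in record["triplets"]:
        params = {
            "head": head,
            "tail": tail,
            "hadith_id": record["hadith_id"],
            "chain_number": chain_number,
        }
        statements.append((CYPHER_EDGE, params))
    return statements


# Hypothetical record with placeholder narrator names.
example = json.dumps(
    {"hadith_id": "h-1", "triplets": [["A", "B", 1], ["C", "B", 2]]}
)
statements = record_to_cypher(example)
for query, params in statements:
    print(query, params)
```

Keeping the query text fixed and passing values as parameters means narrator names never need to be escaped by hand. Each pair could then be handed to whatever graph database driver is in use.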
This work can be extended in several directions, including:
enriching graph edges with additional information in the form of hyper-relational tuples. Beyond the current (head, tail, chain_number), we could add, for example:
Narrator IDs: extending parsing to include entity linking for narrator disambiguation, mapping variant surface forms of a narrator to a unified profile
Narration words: incorporating terms such as "حدثنا" and "أخبرنا" to analyze transmission styles and linguistic patterns
building interactive graph playground tools for exploring and analyzing isnad structures across multiple hadiths and large collections
and of course re-evaluating the work at hand.
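To illustrate the hyper-relational extension listed above, here is a small sketch of an edge record that carries narrator IDs and the narration word as optional qualifiers on top of (head, tail, chain_number). The class name, field names, and IDs such as "N001" are hypothetical, and the narrator names are placeholders.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IsnadEdge:
    """One transmission link plus optional qualifiers."""
    head: str
    tail: str
    chain_number: int
    head_id: Optional[str] = None         # linked narrator profile, if resolved
    tail_id: Optional[str] = None
    narration_word: Optional[str] = None  # e.g. "حدثنا" or "أخبرنا"


edges = [
    IsnadEdge("A", "B", 1, head_id="N001", tail_id="N002", narration_word="حدثنا"),
    IsnadEdge("B", "D", 1, head_id="N002", tail_id="N004", narration_word="أخبرنا"),
    IsnadEdge("C", "D", 2, head_id="N003", tail_id="N004", narration_word="حدثنا"),
]

# Count how often each narration word is used across the edges.
word_counts = Counter(e.narration_word for e in edges if e.narration_word)
print(word_counts)
```

Because the qualifiers default to None, existing (head, tail, chain_number) triplets fit this shape unchanged, and the extra fields can be filled in as entity linking and narration word extraction become available.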
Resources:
I would appreciate any feedback, thoughts, or suggestions, and I am open to collaboration.
Best regards,
Jehad Oumer
Linkedin: https://www.linkedin.com/in/jehadoumer/
Masha Allah,
Wonderful work.
I would suggest taking it a step further and generating the asanid for comparison.
Perhaps using something like https://mermaid.ai/open-source/ to generate/visualize the hadith chains.
And if you'd like to build further upon that, you can add the biography details as popups for each narrator.
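As a rough sketch of the Mermaid idea, the (head, tail, chain_number) triplets from the original post could be turned into Mermaid flowchart text along these lines. The function name is hypothetical and the narrator names are placeholders. Nodes get generated IDs (n0, n1, ...) so that Arabic names only appear inside quoted labels.

```python
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, int]


def triplets_to_mermaid(triplets: List[Triplet]) -> str:
    """Render chain-aware triplets as Mermaid flowchart source text."""
    node_ids: Dict[str, str] = {}
    lines = ["flowchart TD"]

    def node_id(name: str) -> str:
        # Declare each narrator once, with a generated ID and a quoted label.
        if name not in node_ids:
            node_ids[name] = f"n{len(node_ids)}"
            lines.append(f'    {node_ids[name]}["{name}"]')
        return node_ids[name]

    for head, tail, chain_number in triplets:
        h = node_id(head)
        t = node_id(tail)
        # Label each edge with its chain number so parallel paths stay visible.
        lines.append(f"    {h} -->|chain {chain_number}| {t}")
    return "\n".join(lines)


mermaid_text = triplets_to_mermaid([("A", "B", 1), ("B", "D", 1), ("C", "D", 2)])
print(mermaid_text)
```

The resulting text can be pasted into any Mermaid renderer. Names that contain double quotes would need escaping first, which this sketch does not handle.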
All the best.
--
You received this message because you are subscribed to the Google Groups "SIGARAB: Special Interest Group on Arabic Natural Language Processing" group.
-- Find me at: https://www.kentoseth.com https://fosstodon.org/web/@kentoseth