Language of the Qur'an


klei...@astound.net

Sep 25, 2005, 4:53:50 PM
to Islam_Origins
The Qur'an would seem to be a natural object of study for corpus
linguistics. But, so far as I can tell, such a thing has never been
undertaken. This thread is intended to explicate and perhaps even
create such a study.

Here I intend to describe the corpus I mean to study.

The corpus is the 114 surats of the traditional consonantal text of the
Qur'an. By consonantal I mean the text written with the 28 consonants,
ta marbuta and a word divider: 30 symbols in all. The only recognized
punctuation is the separation into ayats.

The use of ta marbuta is a matter of taste. I have chosen to include it
in the consonantal text because the two dots used in ta marbuta are
clearly the same two dots used in ta itself and it is at least
plausible that they would have been introduced at the same time. If any
issues arise concerning whether or not some ta marbuta is a ha, this
decision will have to be revisited.

Note especially that I do not include hamza in the corpus text.

It is my feeling that the Qur'an must be assumed to have been written
down in the script and dialect already in use among the persons who
wrote it down. These persons were, one assumes, the earliest Muslim
community.

We are quite sure that the original script did not include vowels and
the other diacritic marks that presently appear in the Qur'an. Hamza
was also added later, possibly at the same time. We are less convinced
that the dots which differentiate the basic character shapes were used
in the original text of the Qur'an, primarily because so many manuscripts
have survived without these dots. I believe there is adequate evidence
that the dots were in use in pre-Islamic times, and we should assume
they were used when the Qur'an was first written down. The
harder-to-read "Kufic" manuscripts appear to have been deluxe editions
intended for show rather than actual reading.

The ayat division in the Qur'an shows some variations in existing
manuscripts but nothing seems to hinge on exactly where the ayats are
divided. The ayats "rhyme" to some extent, but I intend to ignore this
feature.

Hence the target corpus is a text written in a thirty-symbol alphabet
(twenty-eight consonants, ta marbuta and a word divider) that is
punctuated by being divided into short passages called ayats, which are
in turn combined into larger units called surats, of which there are 114.
I do not consider surat titles or ayat numbers part of the text.
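As a rough illustration of this structure (not an existing tool; the names Corpus, words and word_frequencies are hypothetical), the corpus could be held as a list of surats, each a list of ayat strings in the ASCII transliteration introduced later in this post, with titles and ayat numbers simply not stored. A minimal Python sketch:

```python
from collections import Counter

# Hypothetical layout: a corpus is a list of surats, and each surat is a list of
# ayat strings. Surat titles and ayat numbers are deliberately not stored; an
# ayat's position in its list is its only identity.
Surat = list[str]
Corpus = list[Surat]


def words(corpus: Corpus) -> list[str]:
    """Return every word in corpus order, splitting each ayat at the word divider (a space)."""
    return [w for surat in corpus for ayat in surat for w in ayat.split()]


def word_frequencies(corpus: Corpus) -> Counter:
    """Count how often each distinct written word form occurs in the corpus."""
    return Counter(words(corpus))


# Toy corpus: one surat containing one ayat (the example phrase used in this post).
toy: Corpus = [["BSM ALLE ALRHMN ALRHYM"]]
print(len(toy), sum(len(surat) for surat in toy), len(words(toy)))  # surats, ayats, words: 1 1 4
print(word_frequencies(toy))
```

A full corpus would hold 114 such surat lists, so a check like len(corpus) == 114 is a cheap guard against a missing surat.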

The status of the word divider perhaps deserves a comment. There are,
in the text, occurrences of the character alif that function, according
to the traditional reading, as word dividers. We intend to treat them
as ordinary consonants. The division of text into words seems to be
very old, even pre-Islamic, and I intend to accept the traditional
divisions. In the present-day Arabic script many characters have
a different shape at the end of a word, which makes word division
fairly easy. We do not know what the original script actually looked
like, but it seems reasonable to assume word divisions were marked.

Because I work with tools that work best with ASCII-coded texts, I
intend to work in a version of the Arabic alphabet transliterated into
ASCII. In the transliteration I use, the usual Arabic alphabet appears
as:

A B T Th G H Kh D Dh R Z S Sh C Ch X Xh O Gh F Q K L M N E W Y

and ta marbuta is Eh. The word divider is, of course, a space. This
particular transliteration is based on the ancestral phonetic values,
in Phoenician, of the present-day ASCII (meaning English) characters.
Note that this transliteration is designed to have vowels and other
diacritics added as lower case letters.

For example: BSM ALLE ALRHMN ALRHYM
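Because several letters are written as two-character digraphs (Th, Kh, Dh, Sh, Ch, Xh, Gh and the ta marbuta Eh), a program reading this transliteration has to split words into letters before it can count them. A minimal Python sketch, assuming that in the unvocalized corpus a lowercase h only ever appears as the second half of a digraph; the function names are hypothetical:

```python
# The 28 consonants in the order listed in this post, plus ta marbuta.
CONSONANTS = ["A", "B", "T", "Th", "G", "H", "Kh", "D", "Dh", "R", "Z", "S", "Sh",
              "C", "Ch", "X", "Xh", "O", "Gh", "F", "Q", "K", "L", "M", "N", "E", "W", "Y"]
TA_MARBUTA = "Eh"
LETTERS = set(CONSONANTS) | {TA_MARBUTA}  # 29 letters; the space makes 30 symbols
DIGRAPHS = {letter for letter in LETTERS if len(letter) == 2}


def tokenize_word(word: str) -> list[str]:
    """Split one transliterated word into letters, trying two-character digraphs first."""
    letters, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:      # e.g. "Sh" or the ta marbuta "Eh"
            letters.append(word[i:i + 2])
            i += 2
        elif word[i] in LETTERS:           # a single-character letter
            letters.append(word[i])
            i += 1
        else:                              # anything else (e.g. a lowercase vowel) is not a corpus symbol
            raise ValueError(f"not a corpus letter: {word[i]!r} in {word!r}")
    return letters


def tokenize_ayat(ayat: str) -> list[list[str]]:
    """Tokenize every word of an ayat; words are separated by spaces."""
    return [tokenize_word(word) for word in ayat.split()]


print(tokenize_ayat("BSM ALLE ALRHMN ALRHYM"))
```

Trying digraphs first is a simple greedy longest-match rule; it is unambiguous only under the stated assumption that lowercase h never stands as a letter by itself.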

I intend to separate morphemes with a period, B.SM ALLE AL.RHMN
AL.RHYM, and to use other special characters as seems convenient.
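With the period as a morpheme separator, segmented text can be split into morphemes for counting, and the plain consonantal text can be recovered by deleting the periods. A small Python sketch; the function names are hypothetical:

```python
from collections import Counter


def morphemes(segmented: str) -> list[list[str]]:
    """Split a segmented ayat into words (at spaces) and each word into morphemes (at periods)."""
    return [word.split(".") for word in segmented.split()]


def unsegment(segmented: str) -> str:
    """Recover the unsegmented corpus text by deleting the morpheme separators."""
    return segmented.replace(".", "")


def morpheme_counts(segmented: str) -> Counter:
    """Count each morpheme form across all words of the segmented text."""
    return Counter(m for word in morphemes(segmented) for m in word)


seg = "B.SM ALLE AL.RHMN AL.RHYM"
print(morphemes(seg))              # [['B', 'SM'], ['ALLE'], ['AL', 'RHMN'], ['AL', 'RHYM']]
print(unsegment(seg))              # BSM ALLE ALRHMN ALRHYM
print(morpheme_counts(seg)["AL"])  # 2
```

Comparing unsegment() of an analysed ayat with the original ayat is a cheap check that segmentation has not altered any letters.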

Just because one is doing corpus linguistics, one does not have to
ignore the traditional meaning of the text. I propose an operating
strategy of accepting the traditional meaning, whenever possible, as a
basis for proposing an analysis, and then testing that analysis against
the corpus as a whole.
