
[NGL-project] Word frequency and word length


Jack Durst

Mar 29, 2000, 3:00:00 AM
Moderator's note: All NGL-project posts while I was in Seattle have been
lost. :( Use the archives at onelist.com

---------- Forwarded message ----------
Date: Wed, 29 Mar 2000 17:38:11 -0800 (PST)
From: Gerald Koenig <j...@NETCOM.COM>
Subject: Re: [NGL-project] Word frequency and word length

From: Gerald Koenig <j...@NETCOM.COM>


>Looks to me like there's no significant difference, or a slight difference
>in favour of NGL in terms of word lengths for the most common English words.
> As with all languages, there are places where one is better than the
>other, but on average NGL is slightly shorter in writing and about the same
>in speech as English.
>
>Sincerely,
> Jack Durst


When we discuss the differences in syllables-per-concept in natlangs and
conlangs, it is really hard to get a handle on them because of the lack
of good data and the extreme variations depending on context and
concept. Concerning conlangs, a look at the Babel translations will
show the tendency for conlangs to get very wordy. I doubt that we will
be able to reduce word coinage to a rational formula; Steven mentioned
the difficulties. In spite of the fact that the Ogdens are no longer
than their English counterparts, the moment we leave the Ogdens we are
into "soundiness", i.e. moving our mouths more to say the same concept.
We should try to keep "soundiness" well within the range of major
natlangs, as a design goal, I believe. Personally I would like to see
a better "soundiness" than English, and I believe the bare VXT tense
system is slightly better in this regard.

If all applicable VXT contractions and English-equivalent noun lengths
were used, the rainwalk composition would have a length of 59 syllables
instead of 71, a reduction of about 17% relative to English. I believe the
composition is typical and that an accurate translation to Tokcir would
show a greater expansion from English.

This expansion calls for parallel vocabularies. On a TV quiz program
that was on as I wrote this, the question came up: what is the English
short word for a baby carriage? It's <pram>, from perambulator. That
is an example of what I think would be a good thing in NGL: a short
word that some group uses a lot, and a long derived form for those who
don't deal with the concept frequently. We can have it both ways.

Summary for paninpu.

English version: 71 syllables
Tokcir version: 74*
Original VXT version: 78
Contraction VXT version: 75
xoyig + contractions VXT version: 68*
VXT using English-length nouns: 59

------------------------------------------------------------------
>
> Yas mi pa pas pasoem pir ku pu et tos ton fis kormarpa.16 Mip 'ec xiriem
> guibluos ke am pas zos ko et inko col ku xoyes.a. 18 Su am xir ikor ku
> ton, pIMirdo anal.11 Luoses mar am gom sievig et inpasig ton ku xucte.14
> Fuyo `ol luos pa *kari.7 *Siemael zonig pa *tertok, et mi pasliv. 12

16 18 11 14 07 12 =78

Rewritten like TVS:
a. zos ko et inko col ku xoyes 9 ---> xoyig. 2. -7--->71

*Note: <xoyig>, "wavely", doesn't capture the full meaning of the VXT
vector adverb series above, but in order to compare the VXT and TVS
versions I substituted <xoyig> in the VXT version to compare syllable
counts.

Contractions:
b. mi pa >mip; Su am-->Sam; ku xukte-->0, 5-->-3.----> 75

Words that expand leaving English:
c. thunder, niunfan     -1
d. pier, kormarpa       -2
e. grebe, guibluo       -2
f. food, anal           -1
g. sand, xukte          -1
h. bird, luo            -1 ?
i. cry, kari            -1
--------------------------
                        -9
Contractions, ku:       -3
--------------------------
                       -12
xoyig                   -7
==========================
                       -19

> Today I walked through the rain and out onto the ocean pier.16 I saw
> swimming grebe moving up and down with the waves. 14 They were swimming
> below the surface looking for food. 13 Sea birds (gulls) sat silent and
> motionless on the sand. 12 Only one cried out.6 Thunder suddenly
> threatened, and I left. 10
16 14 13 12 6 10 =71
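
The syllable bookkeeping above can be rechecked mechanically. The following is only a minimal Python sketch using the per-sentence counts and savings given in this post; the variable and function names are illustrative and not part of any NGL tooling.

```python
# Per-sentence syllable counts as tallied in the post.
english = [16, 14, 13, 12, 6, 10]
vxt_original = [16, 18, 11, 14, 7, 12]

# Savings listed in the post, in syllables.
savings = {
    "xoyig substitution": 7,
    "contractions": 3,
    "English-length nouns": 9,
}


def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


english_total = sum(english)
vxt_total = sum(vxt_original)
vxt_shortest = vxt_total - sum(savings.values())

print(f"English: {english_total}, VXT: {vxt_total}, shortest VXT: {vxt_shortest}")
print(f"shortest VXT vs. English: {percent_change(english_total, vxt_shortest):.1f}%")
print(f"shortest VXT vs. original VXT: {percent_change(vxt_total, vxt_shortest):.1f}%")
```

Keeping the two baselines separate matters here: the same 59-syllable version looks like a larger saving when measured against the 78-syllable VXT original than against the 71-syllable English text.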

In Tokcir (translating from the English), this would be:

Yasig[1], paomnos u' q pu u' kuajvodkanto[2].13 'Ekomnot òl guribuloser
xiri[3] ke tibernos xoyig[4].17 Xirernos :under: q vod, 'a vupa sahoer
poiso.15 Gomes inyùgig & sievig luosesom mari u' xucte ace[5]; ol
fuyo niunfemot.24 Ku fan estànemno[6], loj atibeomnot 'enyo. 14
13 17 15 15 14= 74

Literal translation:
Todayly, walk-I-narrative-past-imperfect in the rain to big-boat-place.
Sense-I-narrative-past-perfect some grebe-paucal swim-Adj that
move-3p-paucal-narrative-past-imperfect wavely.
Swim-3p-paucal-narrative-past-imperfect under the water, past-imperfect
try find-they fish. Sit-they stilly & quietly birds-nominative sea-Adj at
sand there; one only sound-make-he-narrative-past-perfect. The
thunder/lightning was there, thus I went-from there.

Jerry



Jack Durst

Mar 30, 2000, 3:00:00 AM
to NGL-p...@onelist.com
On Wed, 29 Mar 2000, Gerald Koenig wrote:

> From: Gerald Koenig <j...@NETCOM.COM>
>
>
> >Looks to me like there's no significant difference, or a slight difference
> >in favour of NGL in terms of word lengths for the most common English words.
> > As with all languages, there are places where one is better than the
> >other, but on average NGL is slightly shorter in writing and about the same
> >in speech as English.
>

> When we discuss the differences in syllables-per-concept in natlangs and
> conlangs, it is really hard to get a handle on them because of the lack
> of good data and the extreme variations depending on context and

This is something I can agree with you on 100%. Not only are there extreme
variations depending on context, but there are also language-dependent
differences. Phonology makes a *huge* difference in terms of average
number of syllables per concept. Hawai'ian takes an average of 3 syllables
per morpheme to express what English can in an average of 1.2; this is
entirely due to the restrictive phonology of the language.

Native languages vary significantly as to their average number of
syllables per concept, both due to phonological effects and to the
particular style of the language.

English is not the only language in the world, and NGL is similar to the
ones it's closest to, phonologically and stylistically.

> concept. Concerning conlangs, a look at the babel translations will
> show the tendency for conlangs to get very wordy. I doubt that we will

Unfortunately, it's a useless comparison. Translations are almost
*always* longer than original texts. Compare the Hebrew, Latin, and
English versions of the Babel texts, for instance.

Even though, of the three languages, English has the shortest average word
length, it is the longest of the three texts. This is partly because it's
a more distant translation.

Add to this the fact that most of the conlang versions of the Babel text
are third-degree translations (translations of the English, itself a
translation of the Hebrew) and we've set an impossible task. Of all the
conlang translations of the Babel text I've seen, only one, Speedwords (a
language designed as a shorthand system), beats the original.

Though mutual translations of the same text are convenient as tools for
comparing *between* conlangs, it's useless to compare to an original or a
translation which served as a source.

For example, take the following passage from Ku Volef, which was written
originally in NGL, and its English translation (note that this is a loose
translation into English and loses most of the nuance of the original; a
strict translation would be longer).

Ku inyàs beh sugorjemno, Luna ta vuerjem & xufem cinig ezo mogfesnos u'
baeresac hibsig paemnos Martinom u' kuaje. Yasemno xietem col q ru'es
invùer fo inyàs yex. 'Ekemnot òl niunvolefem ke becfemnot u' q inyàs
inyùgi, kivjanintofemnos & 'a ta paem cike 'ekemnot òl pavintorem ke pa
wi' ru'; `lehjemzas fo dasoma yerman wi' moc´ Martin nemofemnos,
invònifemnot q kuaj u' dosru'... (122 syllables)

The night was beautiful again, the moon was full and shining brightly
through the mist, Martin was walking through[?] the trees to his vehicle.
He began to drive along the empty streets because it was getting late. He
heard a wolf's howl cut into the still night, he was shaking and walked
when he saw a hitchhiker who walked alongside the road; "It may be good
for to have someone beside me." Martin, was thinking, (as) he pulled over
to the side of the road ... (108 syllables)

The translation is shorter than its source, but not nearly by as much as
the difference in average word length between the two languages would
seem to predict. By all rights, the NGL *should* be in the 150-syllable
range given the phonological and vocabulary size differences alone
(compare the English as it would be if we imported the words straight into
NGL). The difference? A translation effect; a stricter translation would
be even longer.

> the difficulties. In spite of the fact that the Ogdens are no longer
> than their english counterparts, the moment we leave the Ogdens we are
> into "soundiness", ie moving our mouths more to say the same concept.

You're arguing against yourself here; your previous argument was that
commonness was the criterion on which words should be judged. Now, it's
"soundiness" (which isn't defined very well).

> We should try to keep "soundiness" well within the range of major
> natlangs, as a design goal, I believe. Personally I would like to see

OK, but can we at least pick a natlang with a comparable phonology as our
basis for comparison? Spanish, for instance, which has a phonology and
stress system comparable to NGL.

Unscientifically comparing 7 major languages on this vague criterion
in my head, it comes out something like this:
1. Chinese
2. English
3. French
4. Spanish
5. Arabic
6. German
7. Japanese
NGL as it is, to me, sounds as if it's somewhere between Spanish and
German, depending on the text.

If you defined your terms, I could compare a little more easily.


> composition is typical and that an accurate translation to Tokcir would
> show a greater expansion from english.

I don't doubt it... Compare the Tokcir to the Japanese, however, and I'd
guess we'd come out ahead...

Setting syllable counts for translations from English is an unfair goal.
First of all, translation effects ensure that the translation will be a
little longer than it rightly should be; secondly, we're not native
speakers and we don't have a full command of the language with which to
compensate in our writing; and thirdly, the phonology of NGL would have to
be *completely* exhausted to match the word lengths of English over a
similar vocabulary size (English doesn't use 30% of its phonological
capacity).

> This expansion calls for parallel vocabularies. On a TV quiz program
> that was on as I wrote this, the question came up: what is the English
> short word for a baby carriage? It's <pram>, from perambulator. That
> is an example of what I think would be a good thing in NGL: a short
> word that some group uses a lot, and a long derived form for those who
> don't deal with the concept frequently. We can have it both ways.

In that case, why even have official second words at all? Every field can
come up with its own unofficial slang when they need it.


Sincerely,
Jack Durst
Sp...@sierra.net
[this posting written in Net English]

Jack Durst

Apr 3, 2000, 3:00:00 AM
---------- Forwarded message ----------
Date: Mon, 3 Apr 2000 02:13:47 -0700 (PDT)
From: Gerald Koenig <j...@NETCOM.COM>
Subject: Re: [NGL-project] Word frequency and word length

>From: Jack Durst <sp...@sierra.net>
>Subject: Re: [NGL-project] Word frequency and word length

>On Wed, 29 Mar 2000, Gerald Koenig wrote:
>> From: Gerald Koenig <j...@NETCOM.COM>
>>>=Jack
>>>Looks to me like there's no significant difference, or a slight difference
>>>in favour of NGL in terms of word lengths for the most common English words.
>>> As with all languages, there are places where one is better than the
>>>other, but on average NGL is slightly shorter in writing and about the same
>>>in speech as English.

>>=Jerry

>> When we discuss the differences in syllables-per-concept in natlangs and
>> conlangs, it is really hard to get a handle on them because of the lack
>> of good data and the extreme variations depending on context and

>=Jack


>This is something I can agree with you on 100%. Not only are there extreme
>variations depending on context, but there are also language-dependent
>differences. Phonology makes a *huge* difference in terms of average
>number of syllables per concept. Hawai'ian takes an average of 3 syllables
>per morpheme to express what English can in an average of 1.2; this is
>entirely due to the restrictive phonology of the language.
>
>Native languages vary significantly as to their average number of
>syllables per concept, both due to phonological effects and to the
>particular style of the language.

I found the following facts about Hawaiian in Comrie, The World's Major
Languages. I believe Julian once said that the NGL phonology had an
Austronesian flavor; correct me if I'm wrong, Julian.
1. Hawaiian has just 13 phonemes.
2. Morphological complexity is likewise average to low.
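
To illustrate why a small phoneme inventory pushes word length up, here is a rough Python sketch. It rests on simplifying assumptions that are not from the source: the 13 phonemes are taken as 8 consonants plus 5 vowels, every syllable has the shape (C)V, vowel length and diphthongs are ignored, and the 10,000-word vocabulary is a made-up target. The function names are hypothetical.

```python
def syllable_inventory(consonants: int, vowels: int) -> int:
    """Count (C)V syllables: an optional consonant followed by one vowel."""
    return (consonants + 1) * vowels


def syllables_needed(inventory: int, vocabulary: int) -> int:
    """Smallest n such that words of 1..n syllables can cover `vocabulary`."""
    if inventory < 1:
        raise ValueError("inventory must be at least 1")
    n, available = 0, 0
    while available < vocabulary:
        n += 1
        available += inventory ** n  # all distinct words of exactly n syllables
    return n


# Assumption: 8 consonants + 5 vowels = 13 phonemes.
hawaiian_like = syllable_inventory(8, 5)

# Assumption: a target vocabulary of 10,000 distinct words.
print(hawaiian_like, syllables_needed(hawaiian_like, 10_000))
```

Under these assumptions there are 45 possible syllables, so one- and two-syllable words together cannot cover the target and the longest words must run to three syllables. A language with a much larger syllable inventory reaches the same vocabulary size with shorter words.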

I also found this wordlist for Futuna, a Polynesian language.

two          lua
five; hand   lima
eye          mata
ear          talinga
stone        fatu
fish         ika [ikan in a related language]
louse        kuto
weep         tangi
die          mate

So again here is the syllable expansion from English that I refer to and
that we agree exists.

I have used the word "soundiness" to describe this, but I want to
refine the concept in response to Jack's request. I equated the word
to syllables per concept. What I find I really have in mind is the work
expended to express a concept. The units would be for example Newton-meters
per concept. Don't laugh just because it's strange. I am thinking of the
actual muscular work needed to express a concept. An analogy would be
the energy spent swimming across a pool. My young grandsons spend a great deal
of energy dog-paddling across on their first test, with a lot of splashing.
Skilled swimmers easily move across using the "Australian crawl", and
are hardly out of breath.

Imagine a multisyllable word that is hard to pronounce, and another of
equal length that rolls trippingly from the tongue. Obviously the second
consumes less energy. In the present case of the above wordlist, (it
seems to me) it is just as easy to say <ika>, 2 syllables, as it is to say
<fish>, 1 syllable. But <talinga> takes more energy than <ear>. So work
expended does not exactly match a syllable count.

Given that Julian has given us an easy-to-pronounce phonology, syllable
counts alone are not an accurate gauge of energy costs of speaking. I
surmise that there is a law of language which states that energy costs
of words tend to be minimized, accounting for Zipf's law and also for
(I guess) an approximate equivalence between languages of energy costs
of expression. I stress "approximate". I would like to see NGL
consciously designed to minimize energy costs per concept expressed.
If NGL could be designed with a streamlined and energy-efficient
lexicon, it would be one more advantage that would attract speakers.

To do this would require deep knowledge of the energy of sound
production combined with good word (concept) frequency tables. As far as
I know the data do not exist. But perhaps we could start by getting a
feel for it, as above. Cross-language comparisons of concepts could
perhaps be helpful. If an energy cost of the IPA sounds could be
developed, it might help. Context would be a problem. But I appeal first
to the language instincts of the wordmakers. I think we could get a feel
for the energy and frequency of concepts, and factor in these
considerations, perhaps even unconsciously, as Jack mentioned.
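
As a purely illustrative sketch of what such a measure might look like, the Python below assigns each phoneme a cost and weights each word's total by how often its concept is used. Every number in it is invented: the per-phoneme costs are hypothetical arbitrary units, not measured articulatory work, and the frequencies are made up. Only the two Futuna words come from the list above.

```python
# Hypothetical articulation costs per phoneme, in arbitrary units.
# These values are invented for illustration; no measured data are assumed.
PHONEME_COST = {
    "a": 1.0, "i": 1.0, "k": 1.5, "t": 1.4, "l": 1.2, "ng": 1.6,
}


def word_cost(phonemes):
    """Total cost of a word given as a list of phoneme symbols."""
    return sum(PHONEME_COST[p] for p in phonemes)


def expected_cost(lexicon):
    """Frequency-weighted mean cost per concept.

    `lexicon` maps a concept to (phoneme list, relative frequency).
    """
    total_freq = sum(freq for _, freq in lexicon.values())
    weighted = sum(word_cost(ph) * freq for ph, freq in lexicon.values())
    return weighted / total_freq


# Two Futuna words from the list above, with made-up relative frequencies.
LEXICON = {
    "fish": (["i", "k", "a"], 0.6),
    "ear": (["t", "a", "l", "i", "ng", "a"], 0.4),
}

print(f"ika: {word_cost(LEXICON['fish'][0]):.2f}")
print(f"talinga: {word_cost(LEXICON['ear'][0]):.2f}")
print(f"expected cost per concept: {expected_cost(LEXICON):.2f}")
```

A simple sum ignores context effects between neighbouring sounds, which is the difficulty noted above, but it shows how frequent concepts would dominate the average and so reward short, easy forms for them.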

>
>English is not the only language in the world, and NGL is similar to the
>ones its closest to, phonologically and stylistically.

Phonologically it does seem like an expanded Polynesian; the style is yet
to be discovered, as we only have three or four exponents. I do miss
Carlos, but he's still loaded up with schoolwork.

>
>> concept. Concerning conlangs, a look at the babel translations will
>> show the tendency for conlangs to get very wordy. I doubt that we will

>Unfortunately, it's a useless comparison. Translations are almost
>*always* longer than original texts. Compare the Hebrew, Latin, and
>English versions of the Babel texts, for instance.

I find that I can translate English tense to VXT with almost always a
shortening. It's not much, because I think English is one of the most
concise languages. And I have no doubt I could create an English-->NGL
translation that shrank if I had a derivation-free and frequency-based
lexicon to work with. And since I worked from a very raw English
version given by a Hebrew speaker, I think I could do it as well for
Hebrew, but that will never be known as I don't intend to learn Hebrew
just for the purpose.


>
>Even though, of the three languages, English has the shortest average word
>length, it is the longest of the three texts. This is partly because it's
>a more distant translation.
>
>Add to this the fact that most of the conlang versions of the Babel text
>are third-degree translations (translations of the English, itself a
>translation of the Hebrew) and we've set an impossible task. Of all the
>conlang translations of the Babel text I've seen, only one, Speedwords (a
>language designed as a shorthand system), beats the original.
>
>Though mutual translations of the same text are convenient as tools for
>comparing *between* conlangs, it's useless to compare to an original or a
>translation which served as a source.
>
>For example, take the following passage from Ku Volef, which was written
>originally in NGL, and its English translation (note that this is a loose
>translation into English and loses most of the nuance of the original; a
>strict translation would be longer).
>

>Ku inyàs beh sugorjemno, Luna ta vuerjem & xufem cinig ezo mogfesnos u'
>baeresac hibsig paemnos Martinom u' kuaje. Yasemno xietem col q ru'es
>invùer fo inyàs yex. 'Ekemnot òl niunvolefem ke becfemnot u' q inyàs
>inyùgi, kivjanintofemnos & 'a ta paem cike 'ekemnot òl pavintorem ke pa
>wi' ru'; `lehjemzas fo dasoma yerman wi' moc´ Martin nemofemnos,
>invònifemnot q kuaj u' dosru'... (122 syllables)


>
>The night was beautiful again, the moon was full and shining brightly
>through the mist, Martin was walking through[?] the trees to his vehicle.
>He began to drive along the empty streets because it was getting late. He
>heard a wolf's howl cut into the still night, he was shaking and walked
>when he saw a hitchhiker who walked alongside the road; "It may be good
>for to have someone beside me." Martin, was thinking, (as) he pulled over
>to the side of the road ... (108 syllables)
>

A very neat NGL composition.

>The translation is shorter than its source, but not nearly by as much as
>the difference in average word length between the two languages would
>seem to predict. By all rights, the NGL *should* be in the 150-syllable
>range given the phonological and vocabulary size differences alone
>(compare the English as it would be if we imported the words straight into
>NGL). The difference? A translation effect; a stricter translation would
>be even longer.

I agree there are general translation effects of lengthening from the
native language, but there are also inherent differences in the target
languages; otherwise all translations from a source would be the same,
and you point out they're not. And IMO, English is a very concise
language.

>
>> the difficulties. In spite of the fact that the Ogdens are no longer
>> than their english counterparts, the moment we leave the Ogdens we are
>> into "soundiness", ie moving our mouths more to say the same concept.
>You're arguing against yourself here; your previous argument was that
>commonness was the criterion on which words should be judged. Now, it's
>"soundiness" (which isn't defined very well).

I don't feel any contradiction.

>
>> We should try to keep "soundiness" well within the range of major
>> natlangs, as a design goal, I believe. Personally I would like to see

>OK, but can we at least pick a natlang with a comparable phonology as our
>basis for comparison? Spanish, for instance, which has a phonology and
>stress system comparable to NGL.
>
>Unscientifically comparing 7 major languages on this vague criterion
>in my head, it comes out something like this:
> 1. Chinese
> 2. English
> 3. French
> 4. Spanish
> 5. Arabic
> 6. German
> 7. Japanese
>NGL as it is, to me, sounds as if it's somewhere between Spanish and
>German, depending on the text.

I agree, but there is no reason why NGL could not be moved up to the
top of the range with a carefully optimized high-frequency parallel
lexicon. Maybe not on the basis of syllable counts but rather on the
basis of work/concept, using the precise physical definition of work
above, instead of the vague syllable-count criterion I first offered.


>
>If you defined your terms, I could compare a little more easily.
>
>> composition is typical and that an accurate translation to Tokcir would
>> show a greater expansion from english.
>I don't doubt it... Compare the Tokcir to the Japanese, however, and I'd
>guess we'd come out ahead...
>
>Setting syllable counts for translations from English is an unfair goal.
>First of all, translation effects ensure that the translation will be a
>little longer than it rightly should be; secondly, we're not native
>speakers and we don't have a full command of the language with which to
>compensate in our writing; and thirdly, the phonology of NGL would have to
>be *completely* exhausted to match the word lengths of English over a
>similar vocabulary size (English doesn't use 30% of its phonological
>capacity).

I agree that counting syllable expansions from English is not a perfect
yardstick, but there is something there we need to measure, and until we
make a better yardstick, that's all we have.


>
>> This expansion calls for parallel vocabularies. On a TV quiz program
>> that was on as I wrote this, the question came up: what is the English
>> short word for a baby carriage? It's <pram>, from perambulator. That
>> is an example of what I think would be a good thing in NGL: a short
>> word that some group uses a lot, and a long derived form for those who
>> don't deal with the concept frequently. We can have it both ways.

>In that case, why even have official second words at all? Every field can
>come up with its own unofficial slang when they need it.

I had more in mind here than specialized vocabularies. I have in mind
parallel vocabularies wherever the derived forms are cumbersome
compared to energy/frequency/concept optimized synonyms, just as in
VXT there are equivalent forms for the same tense construct, and I
expect subtle connotations to arise for each member of the pair, and
enrich the language. I do think we have enough phonological resources
to do both. We never set out to make NGL simple or easy to learn,
although its core would retain those qualities.

I know we're all just trying to make a better language in our individual
ways, and I do appreciate your feedback, Jack. There are so many
parameters to be taken account of, I hope we can reach the best
combination for the language as a whole.

Jerry

>Sincerely,
> Jack Durst
>Sp...@sierra.net
>[this posting written in Net English]


Jack Durst

Apr 3, 2000, 3:00:00 AM
---------- Forwarded message ----------
Date: Mon, 3 Apr 2000 06:41:10 -0500
From: Carlos Thompson <carlos_...@correo.javeriana.edu.co>
Subject: Re: [NGL-project] Word frequency and word length

Gerald Koenig wrote:

> Phonologically it does seem like an expanded Polynesian; the style is yet
> to be discovered, as we only have three or four exponents. I do miss
> Carlos, but he's still loaded up with schoolwork.

Well, I've been lurking for a while, still stuck in my schoolwork, and now I'm
beginning a job, so I scarcely have time to skim messages...

The thread is interesting and I have some ideas I would like to explore when
I have a little more time.

-- Carlos Th
=================================================
Carlos Eugenio Thompson Pinzón
Bogotá, Colombia
ICQ: 19156333
URL: http://www.geocities.com/Paris/Rue/9028/
