Paraphrasing

Mok-Kong Shen

Sep 20, 2009, 6:06:38 AM

Hi,

I suppose everyone has in his school days done quite a few exercises
in paraphrasing. Though being a layman in linguistics, I am very
interested to know whether (and how) the diverse techniques/principles
employed by humans in doing good paraphrasing could somehow be
systematically analysed/characterized and hence eventually be
profitably used in automatic machine processing.

Being non-native and not very good in English, I'll constrain myself
to pointing out the fact that there is apparently a fairly wide
range of possibilities of obtaining variants of given pieces of texts,
because there seems to be, in my understanding, good and rather
general exchangeability in e.g. the types of constructs shown below:

UN vs. United Nations

cheap vs. inexpensive

my friend vs. a friend of mine

The dog chases the cat. vs. The cat is chased by the dog.

I need your help. vs. Could you help me?

Thanks,

M. K. Shen

Joe Devin

Sep 20, 2009, 2:01:37 PM

Mok-Kong Shen wrote:

This problem would seem to be very similar to the problem of machine
translation but more difficult. Why? Because in machine translation, as
soon as the machine finds any single way of outputting the meaning in the
target language, it is finished, whereas what you are asking for is really
how to exhaust all the ways the machine knows of generating text for the
same meaning. Also, a translator probably does not need to be able to
switch back and forth between active and passive voice and a few other
things, since all it is trying to do is to get as close to the original
form for the meaning as possible. In other words, if in the original text
the active voice was used, then all the MT system needs to do is find the
active-voice construction in the target language, and it is done.

So it would appear that for this kind of system a second level of internal
representation is required--one in which all meanings are always encoded in
the same way regardless of their original form (regardless of whether in
the original form the passive or active voice was used, etc.). And a text
generator would have to be built that would generate all the surface
(textual) forms available to the system (known by the system) for every
meaning encoded in this fashion (in this second level of internal
representation).

And this question is a good one because it at once answers the question,
"If there is a generic subsurface language underlying all human languages,
then is there also a deeper sub-subsurface language underlying that one?"
and the answer would seem to be an unequivocal YES. So first you get to
the subsurface language (our universal grammar) by reducing every word in
the text to nothing but a semantic link and a syntactic link emanating from
the same node (recall my theorem stating that:)

THEOREM: Every word of every coherent phrase can be reduced to a semlink
and a synlink emanating from a common point, or node, after which the
external symbol, be it written or spoken, can be safely discarded.

But it will be obvious that the constructs built from the raw materials
provided by this universal subsurface language will closely mirror the
original surface text from which they have been "lifted." So how do we get
from the subsurface language to the deeper sub-subsurface representation?
This is a great mystery, and one that I am as yet incapable of answering.
Of course it would be possible for us to design a deep
sub-subsurface language built from the same links and nodes used in the
subsurface one but employing strict arbitrary rules designed to ensure that
every deep construct having the same meaning would always come out exactly
the same, but this would be a nontrivial undertaking requiring extensive
linguistic knowledge, and might take a massive investment of time and
expertise. My other complaint is that from my experimental acquaintance
with language, I have never observed nature doing anything like this
because this would involve RULES, whereas nature seems to avoid rules and
operate by means of precedent.

So the question is then not whether a deeper sub-subsurface can be
developed, but, "How does nature do it?" because if we have found anything
in all of our tedious research, it is that (in linguistics at least) nature
always does things right, and we always run into trouble as soon as we
deviate in any significant way from nature. And this is no real wonder,
because even a physical phenomenon like human flight could never be
achieved till men started carefully observing nature, understanding what
was happening in nature, and emulating it.

So how can we be so sure that we have the subsurface language right (the
universal grammar) and not have a clue about the sub-subsurface one?
Because of my theorem, above. With this theorem we are able to determine
that words hang together syntactically, that we can recognise compound
words when there is no dependency between them, that every word must link
to a meaning in the ontology, etc., leaving only minor questions not
completely resolved. For example, it may not be immediately apparent that
auxiliary verbs are really just verbs being used as adverbs, and we might
make the mistake of interpreting them as top verbs. For example, take the
phrase, "Birds can fly." At first glance it might seem like "can" is the
top verb, "birds" is the subject, and "fly" is a verb being used as object
(a THING that birds can do). But after we come to a deeper understanding
of language, and understand the fact that every word must have two roles
(semantic and syntactic), and that these two roles need not necessarily
"agree," we will probably see that it is better to make "fly" the top verb,
and assume that "can" is really just a verb being used as an adverb in
order to tell us more about birds flying. Nevertheless, we find that our
automated systems will in fact continue working just fine no matter how we
choose the dependency structure for "birds can fly" as long as we always do
it in exactly the same way--in other words, so long as the internals remain
self consistent.

Thus it may be that word dependency is an arbitrary choice that goes on
subconsciously and may differ from individual to individual. If this is
so, then maybe the way one comes up with a sub-subsurface representation is
also an arbitrary choice made subconsciously by individuals, and we are
free to design our own deep meaning representations without getting into
trouble by violating some law of nature. And then another question arises,
namely whether the marked differences we observe in human intelligence
might be the result of subconsciously but arbitrarily choosing one kind of
dependency structure over another, or whether one individual can show such
a marked "gift for languages" above his/her fellows because of some
accidental good choice for the method of deep representation.

Needless to say, mountains more research and careful observation are
required in all these crucial areas. But meantime what are humans doing?
Ha-ha, you guessed it! Quibbling over the Loebner prize, for which people
should just shut up and be thankful. Hating people for making discoveries
that, had they been on their toes, they themselves might easily have made.
Hating people for disagreeing with them when the facts are right there
staring them in the face. Etc., etc., etc., ad infinitum. This is the
hateful, petty, beggarly, miserable, disgusting, shameless human race from
which I have sprung, and I am a part of it. Can I also find truth by going
for its essence and then discarding these outward symbols of human
depravity just as the parser goes for the elegant but hidden linkages of
language after which it can safely dispense with symbolic and irregular
outward word forms?

--Chaumont Devin.

http://panlingua.net
http://witchit.com
http://chaumontdevin.com
http://oldmaluku.com

Mok-Kong Shen

Sep 21, 2009, 5:55:10 AM

Joe Devin wrote:

[snip]

Thank you for your valuable comments which I snipped for space reasons.

I would just like to add a point to my original post: If one restricts
oneself to those types of replacements exemplified by the first
three I indicated, namely:

UN vs. United Nations

cheap vs. inexpensive

my friend vs. a friend of mine

would that be fairly doable at present? If yes, how should one best
proceed?

Thanks,

M. K. Shen

Ted Dunning

Sep 21, 2009, 3:42:25 PM

Generally, the most usable state-of-the-art in document summarization
is based on snipping verbatim pieces of text out and presenting them
as a synopsis of the longer document. These snipped bits can be full
sentences or phrases and anaphoric resolution is sometimes done to
improve clarity. In some domains, this can work really, really well
because there are strong conventions about putting good summary
sentences into certain kinds of text. For example, scientific papers and
longer news stories are both pretty amenable to sentence level
summaries. Scientific or technical books (and other non-fiction
books) are often well described by phrase level snippets. Clusters of
technical documents often are well described by phrase level snippets
as well. Many other kinds of text are very poorly summarized using
these techniques.

What you are suggesting is more far reaching. There has been quite a
bit of work on more interesting (academically) approaches to
summarization with only very modest results. Methods such as you
describe are easy to implement at first, but rapidly run into one or
more of the typical NLP issues:

a) large lexical resources are required

b) a few narrow domains work well but not others

c) the system can be tuned for any particular set of examples, but
whenever it is let loose on novel input you get ludicrous results.

In terms of cost per impact to actual usable systems, the simplest
text snipping approaches dominate completely. They are relatively
easy to implement and give pretty good results. The more advanced
approaches quickly lead to very difficult implementations and very
commonly give dramatically worse results.

One opportunity you might have for improving things would be to use
techniques somewhat like what you are suggesting to determine a set of
snipped descriptors that have maximal coverage of a topic with a
minimal number of descriptors while maximizing some sort of estimate
of how understandable (or common or familiar) the terms in the snippet
might be. You might look at the carrot2 package for an example of
pretty good summarization of cluster contents.

Likewise, David Blei and John Lafferty have done a very nice job of
extending my work on this from the early 90's. See this paper for
details: http://arxiv.org/abs/0907.1013
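
The descriptor-selection idea above (few descriptors, maximal coverage, preference for familiar terms) can be illustrated with a greedy sketch in Python. This is only an assumption about how one might set it up, not carrot2's method or anything from the cited paper: the function name `pick_descriptors`, the `familiarity` scores, and the sample data are all hypothetical.

```python
def pick_descriptors(candidates, familiarity, max_descriptors=3):
    """Greedily choose descriptors that cover the most still-uncovered documents.

    candidates:  dict mapping descriptor -> set of document ids it describes
    familiarity: dict mapping descriptor -> weight in (0, 1]; hypothetical
                 estimate of how understandable the descriptor is
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered and len(chosen) < max_descriptors:
        # Score = newly covered documents, weighted by familiarity.
        best = max(
            (d for d in candidates if d not in chosen),
            key=lambda d: len(candidates[d] & uncovered) * familiarity.get(d, 1.0),
            default=None,
        )
        if best is None or not candidates[best] & uncovered:
            break  # nothing left that adds coverage
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Made-up example data: descriptor -> ids of documents it describes.
candidates = {
    "topic model": {1, 2, 3},
    "latent Dirichlet allocation": {1, 2, 3, 4},
    "news": {5},
}
familiarity = {"topic model": 0.9, "latent Dirichlet allocation": 0.4, "news": 1.0}

print(pick_descriptors(candidates, familiarity, max_descriptors=2))
```

Greedy selection is a common heuristic for coverage problems of this kind; it does not guarantee the smallest possible descriptor set.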

pipeDream

Sep 22, 2009, 7:39:57 PM

How can you be so sure there is a "deeper sub-subsurface language"?
Maybe there is no language at all at that level (leaving out
metaphors). Is it not possible that it is simply an association with the
facts of the world?

While "How does nature do it?" might be interesting, it is not
relevant to the nlp task, except in an academic way. Sitting here in
India I am often told "they don't do like that in America" or that
"The Americans haven't done it" and my stock reply is that we should
be more worried about getting the end results than worrying about
Americans or Chinese.

"we always run into trouble as soon as we deviate in any significant
way from nature" is quite obviously patently false in many
technological areas. Latest I read about is a new way of making bricks
invented by a company in silicon valley. Tell me how we might benefit
by making bricks the way stones of caves are made.

Human flight is hardly based on emulating nature - the way birds and
bees fly is different from the way a modern plane flies.

As to linguistics (not my prime concern), there is no way you will
ever know "How does nature do it?" Questions such as whether there is a
universal grammar, whether humans come with a prewired grammar network,
or whether they think in a language can never be answered with any
degree of certainty. You are free to vote for your pet theory. This
is because the working of the mind till date is a mystery. To say that
I observe a certain part of the brain is active (using any form of
imaging of your choice) in language tasks is not the same as
understanding how nature does it. Perhaps it will remain a mystery
forever. In my not so humble opinion cognitive science is far too
wooly.

This leaves us with a very simple criterion for any theory. Does it
work? All else is secondary.

Question is not "How does nature do it?", but "In how many ways it
can be done and out of these which is the best?"

I actually had a look at your sites as I am on the lookout for better
ways of doing things. I wasn't too happy with them.
I think the "SLIM theory of language" by Prof. Dr. Roland Hausser is a
better bet.


Joe Devin

Sep 21, 2009, 8:45:54 PM

Mok-Kong Shen wrote:

> If one restricts
> oneself to those types of replacements exemplified by the first
> three I indicated, namely:
>
> UN vs. United Nations
>
> cheap vs. inexpensive
>
> my friend vs. a friend of mine
>
> would that be fairly doable at present? If yes, how should one best
> proceed?
>
> Thanks,
>
> M. K. Shen

Surely. Simply use the global replacement feature on most any text editor.
The first examples you gave are simply interchangeable synonyms. The last
one ("my friend" vs "a friend of mine") can also be accomplished by global
replacement, but the meaning will end up slightly different.

As an example, you might first check for the existence of some character
string in your text, say "kkk." If "kkk" exists nowhere in your text file,
then you might convert (setting conversion to word-boundary mode) all
occurrences of "UN" to "kkk." Then you might convert all "United Nations"
to "UN." And at last, you might convert all "kkk" to "United Nations." We
call this "swapping." When you are done, you will end up with every
occurrence of "UN" replaced by "United Nations," and vice versa.
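
A minimal Python sketch of the swap just described, for readers who would rather script it than use an editor. The function name `swap_terms` is hypothetical, and the regular-expression `\b` word boundary is assumed to be an adequate stand-in for an editor's "word-boundary mode."

```python
import re


def swap_terms(text, a, b, placeholder="kkk"):
    """Swap whole-word occurrences of a and b via a temporary placeholder."""
    # Step 0: the placeholder must not already occur in the text.
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text; pick another")
    pat_a = re.compile(r"\b" + re.escape(a) + r"\b")
    pat_b = re.compile(r"\b" + re.escape(b) + r"\b")
    # Step 1: a -> placeholder (functions as replacements avoid escape handling).
    text = pat_a.sub(lambda m: placeholder, text)
    # Step 2: b -> a.
    text = pat_b.sub(lambda m: a, text)
    # Step 3: placeholder -> b.
    return text.replace(placeholder, b)


print(swap_terms("The UN met. United Nations staff left.", "UN", "United Nations"))
# prints: The United Nations met. UN staff left.
```

Because of the word-boundary match, a longer token such as "UNESCO" is left untouched. The sketch is case-sensitive and knows nothing about context, so it shares the limits of any global replacement.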

You see? I am your knowledgeable friend. Stop seeing me as a stupid enemy
and I will always love you.

--Chaumont Devin.

http://witchit.com
http://panlingua.net
http://chaumontdevin.com
http://oldmaluku.com

Joe Devin

Sep 22, 2009, 5:52:15 PM

pipeDream wrote:

> "we always run into trouble as soon as we deviate in any significant
> way from nature" is quite obviously patently false in many
> technological areas.

You are 100% right, the critical phrase being, "in many technological
areas."

After 23 years of off-and-on research in computational linguistics
(modeling human languages and making them work on computers), I feel
confident in asserting that language is a special case.

Here in America sometimes people like to sound macho by saying things in
"mountain man" talk. So I can well remember my old instructor from the
late 1970s telling us that there is always more than one way of writing
some computer program or other because "there's always more than one way of
skinning a cat." And all during the years that I wrote business software,
this philosophy held. But when I started working with advanced linguistic
analysis, I chanced upon a very different world--one in which I became
convinced that the problem of language is quite different from anything
else I had ever dealt with, and that it is impossible to model language on
computers unless you get it over, say, 95% right. I have grabbed 95% out
of the air, of course, but what I am saying is that, yes, there is a tiny
margin of flexibility in language processing, but that it is very small.
This is NOT to imply that different styles of coding won't work for
natural language, but that one must get the theory exactly right or one's
linguistic machine will stick and balk. Therefore this matter of
understanding just how NLP works in humans is of critical importance, and
not just an intellectual exercise as I had also imagined before. It was
only after this realization, and a long time after this realization, that I
was able to do what I did.

>Latest I read about is a new way of making bricks
>invented by a company in silicon valley. Tell me how we might benefit
>by making bricks the way stones of caves are made.

But you see, modeling language on computers in a way that really works is
much different from making bricks in the same way that, say, baking bread
is different from electronics manufacture. When a baker bakes bread,
he/she can add a pinch more salt or sugar and change the recipe slightly
here or there and the result will still be bread; but change the smallest
component in an electronic circuit and it may not work at all. So how does
the engineer make sure that his/her electronic circuit will actually work?
The only way is by knowing how the physics involved work beforehand.

>Human flight is hardly based on emulating nature - the way birds and
>bees fly is different from the way a modern plane flies.

Not true. Study history carefully and you will find that the first
significant contributor to the science of aviation was Leonardo, who gained
his knowledge by observing birds. But after Leonardo many more people
tried to fly without paying attention to nature and lost their lives for
their pains. One of the more successful designers was a seaman who spent
many years at sea and modeled an apparatus after the albatross, but he
failed to get it quite right. Later there was a German gentleman who
built gliders from willow branches, got it right, and made many successful
flights before losing his life in one of them. At last came the Wright
brothers, who very carefully studied bird-wing structure and tested
emulations in improvised wind tunnels and managed to build the first
reliable aircraft.

Now people are trying to build tiny flying spy robots, which they are
learning to do by carefully observing the flight of insects.

In no case is mathematics sufficient for these tasks. In all cases they
are first understood by a close observation of nature and the math follows
successful experimentation. But once the math is right, it is possible for
others to USE the math to design new aircraft. Why? Because math is only
a way to DESCRIBE physical phenomena and no determiner of physical
phenomena as many modern scientists have been taught to believe.

>As to linguistics ( not my prime concern ) there is no way you will
>ever know "How does nature do it?"

You fail to give any reason why not.

>Questions such as whether there is a
>universal grammar, whether humans come with a prewired grammar network,
>or whether they think in a language can never be answered with any
>degree of certainty,

Again, and after huge scientific progress in this direction of which you
are obviously ignorant, you repeat your same pronouncement without saying
why. May I remind you that before Sputnik millions of people said that men
would never be able to put anything in earth orbit?

>You are free to vote for your pet theory.

Er, God hath not made all pet theories equal.

>This is because the working of the mind till date is a mystery.

This newsgroup was not designed as a place to tell people that natural
language and human intelligence are insoluble mysteries. For that you
might want to consider starting a new one called something like
comp.ai.religion, where I am confident you would get many more postings.

>To say that
>I observe a certain part of the brain is active (Using any form of
>imaging of your choice)in language tasks is not the same as
>understanding how nature does it.

I think that how nature has made you think this way is worth some further
study, remains a major mystery, etc.--or was it just your mother?

>Perhaps it will remain a mystery
>forever.

It may be a mystery to us, but it is no mystery to our designer or
designers, therefore you may safely consider it already solved and waiting
to be solved again.

>In my not so humble opinion cognitive science is far too
>wooly.

IMHO YOU may be too wooly for cognitive science. Have you considered
religion--especially Islam?

>This leaves us with a very simple criterion for any theory. Does it
>work? All else is secondary.

I quite agree, and therefore dare to say that God hath not created all pet
theories equal (see above).

>Question is not "How does nature do it?", but "In how many ways it
>can be done and out of these which is the best?"

Both questions interest me immensely, but not in a rhetorical fashion.

>I actually had a look at your sites as I am on the look out for better
>ways of doing things. I wasn't too happy with it .

I think I now be judged and hanged, or what think you?

>I think "SLIM theory of language" -Prof. Dr. Roland Hausser is a
>better bet.

Slim chance. How many functioning systems has HE done, or do functioning
systems somehow bother you?

Attitude, attitude! Pray when will science replace attitude? Not in my
lifetime--unless they quickly find the drug!

--Chaumont Devin

http://witchit.com
http://panlingua.net
http://chaumontdevin.com
http://oldmaluku.net

Mok-Kong Shen

Sep 23, 2009, 10:09:03 AM

pipeDream wrote:

> Human flight is hardly based on emulating nature - the way birds and
> bees fly is different from the way a modern plane flies.

I am afraid the books on bionics would disagree with you. At the end of
the wing of a modern airplane there is a tiny stick. Even that is
inspired by a certain bird wing, if I remember correctly.

M. K. Shen

Mok-Kong Shen

Sep 23, 2009, 10:34:34 AM

Joe Devin wrote:

> Surely. Simply use the global replacement feature on most any text editor.
> The first examples you gave are simply interchangeable synonyms. The last
> one ("my friend" vs "a friend of mine") can also be accomplished by global
> replacement, but the meaning will end up slightly different.

Synonyms like 'cheap'/'inexpensive' seem to have senses that are
particularly close to each other and therefore higher chances
of exchangeability. Are there listings of such materials?

Thanks,

M. K. Shen

pipeDream

Sep 23, 2009, 8:43:32 PM

Studying nature is fine. Copying nature is not so fine, as the
disastrous results of people who tried to do it the bird way show.

> Now people are trying to build tiny flying spy robots, which they are
> learning to do by carefully observing the flight of insects.

Heard that one.

> In no case is mathematics sufficient for these tasks. In all cases they
> are first understood by a close observation of nature and the math follows
> successful experimentation. But once the math is right, it is possible for
> others to USE the math to design new aircraft. Why? Because math is only
> a way to DESCRIBE physical phenomena and no determiner of physical
> phenomena as many modern scientists have been taught to believe.
>
> >As to linguistics ( not my prime concern ) there is no way you will
> >ever know "How does nature do it?"
>
> You fail to give any reason why not.

Here is the first reason.
Is it ever possible to read what happens in another's mind?
This is more a philosophical position and has little to do with religion.
You might want to take a look at the views of Daniel Dennett and Karl
Popper. Philosophical positions are based on critical analysis and use
the distinction between "technically feasible" and "logically
possible". Some conclusions apply regardless of technical progress.
Quite obviously you take positions closer to Daniel Dennett (even if you
are unaware of it).
You can of course rubbish it and say this is not applicable in
linguistics or that it is irrelevant.
Each time you conduct an experiment to determine, say, whether
syntax is more important or reasoning is more important by imaging the
brains of volunteers, you are trying to figure out "what's going on".
This is not imaginary; I did come across such a study. Language
abilities are very much a part of the mind. You may not agree. But
that is my view.

> >Questions such as whether there is a
> >universal grammar, whether humans come with a prewired grammar network,
> >or whether they think in a language can never be answered with any
> >degree of certainty,
>
> Again, and after huge scientific progress in this direction of which you
> are obviously ignorant, you repeat your same pronouncement without saying
> why. May I remind you that before Sputnik millions of people said that men
> would never be able to put anything in earth orbit?

I think I have indicated reasons for my beliefs. They are not
traceable to Islam but to western thought.
As to the sputnik thing I can claim with equal justification ( from my
point of view ) that my views will prevail.

> >You are free to vote for your pet theory.
>
> Er, God hath not made all pet theories equal.
>
> >This is because the working of the mind till date is a mystery.
>
> This newsgroup was not designed as a place to tell people that natural
> language and human intelligence are insoluble mysteries. For that you
> might want to consider starting a new one called something like
> comp.ai.religion, where I am confident you would get many more postings.

By no means am I saying that "language and human intelligence are
insoluble mysteries". All I am saying is that we might as well be
aware of the limitations of what we are doing . You may have decades
of experience researching language but there are people whose decades
of expertise in their fields does have a bearing on the issue.

Whether you like it or not you cannot ignore the limitations of our
approach. These limitations manifest themselves in many ways. Consider
Gödel, think of the Loebner prize, and if you can still talk glibly of
"huge scientific progress", I think you are being less than honest.
There are many in the NLP community who believe that a mind sitting in
the software is the ultimate answer.

> >To say that
> >I observe a certain part of the brain is active (Using any form of
> >imaging of your choice)in language tasks is not the same as
> >understanding how nature does it.
>
> I think that how nature has made you think this way is worth some further
> study, remains a major mystery, etc.--or was it just your mother?
>
> >Perhaps it will remain a mystery
> >forever.
>
> It may be a mystery to us, but it is no mystery to our designer or
> designers, therefore you may safely consider it already solved and waiting
> to be solved again.

If you are referring to god then he seems to have marked certain zones
as prohibited

> >In my not so humble opinion cognitive science is far too
> >wooly.
>
> IMHO YOU may be too wooly for cognitive science. Have you considered
> religion--especially Islam?
>
> >This leaves us with a very simple criterion for any theory. Does it
> >work? All else is secondary.
>
> I quite agree, and therefore dare to say that God hath not created all pet
> theories equal (see above).

Some accord after lot of discord. Thank god!!

> >Question is not "How does nature do it?", but "In how many ways it
> >can be done and out of these which is the best?"
>
> Both questions interest me immensely, but not in a rhetorical fashion.

They interest me too. Otherwise I would not take the trouble of even
looking at these messages.
In relation to language I think the more important question is: what
empirical language behaviours should a theory cater to? Obviously this
again depends on the application. I don't believe in a theory of
everything.

> >I actually had a look at your sites as I am on the look out for better
> >ways of doing things. I wasn't too happy with it .
>
> I think I now be judged and hanged, or what think you?

When you go to a shop you say this is what I am looking for.
So my judgement is based on my shopping list and is not absolute.
So I was trying to figure out:

0. The thing that turns me on is Natural Language Understanding as opposed
to Natural Language Interfaces for databases or merely a two-way human
interface.
My impression was that it fell in the latter category.

1. Whether it has a way of incorporating context. I know it's dicey. But
humans can understand several things from context even if the grammar
is bad. As a matter of fact, in high school arranging a garbled list of
words into a grammatical sentence is a grammar exercise.

2. Whether it incorporates a mechanism to use common sense. For example
ConceptNet does this.

3. Computational complexity - the lower the better.

4. Less dependency on grammar. My view is the less we depend on it the
better off we are.

5. Integrated logic and language -- I tend to see them as two sides of
the coin.
> >I think "SLIM theory of language" -Prof. Dr. Roland Hausser is a
> >better bet.

It comes closer to my shopping list.

> Slim chance. How many functioning systems has HE done, or do functioning
> systems somehow bother you?

A theory or blueprint is the only thing one starts with before a
system is built. So before dirtying the hands and spending time one
reviews the blueprint. I am not really worried how many "functioning
systems HE has done" but how many functioning systems I can do with
that theory. (Certainly it is not mainstream theory.) I hope to build
a better Google with it. (I can hear you say 'SLIM' chance.) Well, I
believe in checking for myself and would rather spend time
producing working stuff than writing a scholarly article (which needs lots
of time). The latter can wait and that is why my reasons appear
'SLIM'.

Joe Devin

Sep 23, 2009, 7:57:01 PM

Mok-Kong Shen wrote:

> Synonyms like 'cheap'/'inexpensive' seem to have senses that are
> particularly close to each other and therefore higher chances
> of exchangeability. Are there listings of such materials?
>
> Thanks,
>
> M. K. Shen

In my work, I have done most things from scratch and not by copying
synonyms from other works. The reason is that many entries in standard
dictionaries and thesauruses may not work as expected under my system. It
is safer always to let the system grow up naturally so that anything that
would cause problems gets dealt with as the system develops. So, to put it
bluntly, I simply don't know.

But if you are interested in how my systems work, and these work the same
way for many languages, then it is as follows:

The place my system checks for synonymy is in a kind of "black box" called
the ontology. The nodes, or connecting points in the ontology can be
thought of as representing meanings, but they are not symbols. These nodes
in the ontology are called "semantic nodes," or just "semnods." Each such
semnod is usually linked to one or more nodes connected to actual English
words in another "black box" called the lexicon. These nodes within the
lexicon are called "lexical nodes," or just "lexnods." So each semnod is
generally linked to one or more lexnods, whereas each lexnod can only link
to a single English word. Thus there is a different lexnod linked to
"run," "runs," "ran," and "running," but the semnod for the particular
meaning of run in question (and there are several such meanings or semnods
for different kinds of "run") links to each of these lexnods. And not only
this, but ALL the semnods for the various meanings of "run" link to this
selfsame set of lexnods.

And now, if I haven't confused you enough, to return to the ontology.
There can only be one lexlink (link from semnod to lexnod) of a particular
type emanating from any semnod. Thus for special cases such as English
"am" and "be," which would have an identical lexlink type (that type being
"present tense verb"), separate semnods are required even though the
meanings remain the same. So I arbitrarily use the semnod linking through
to "be" as the main meaning, and then set up a separate semnod linking
through to "am" as a synonym. So the semnod linking to "am" as a present
tense verb also has a link of type "synonym" to the semnod linked to "be."
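
The one-lexlink-per-type rule and the synonym link can be sketched roughly as follows in Python. Everything here (the `Semnod` class, `add_lexlink`, `synonym_of`, and the idea that generation always resolves to the main semnod) is an assumption made for illustration, not a description of the real implementation.

```python
class Semnod:
    """A meaning node holding at most one lexlink per link type (assumed)."""

    def __init__(self, gloss):
        self.gloss = gloss
        self.lexlinks = {}      # link type -> word, at most one per type
        self.synonym_of = None  # optional "synonym" link to a main semnod

    def add_lexlink(self, link_type, word):
        # Enforce the rule: only one lexlink of a given type per semnod.
        if link_type in self.lexlinks:
            raise ValueError(f"{self.gloss!r} already has a {link_type!r} lexlink")
        self.lexlinks[link_type] = word


def main_semnod(semnod):
    """Follow synonym links until the main meaning is reached."""
    while semnod.synonym_of is not None:
        semnod = semnod.synonym_of
    return semnod


def generate(semnod, link_type):
    """Pick the word deterministically via the main semnod (an assumption)."""
    return main_semnod(semnod).lexlinks[link_type]


be = Semnod("to be (main)")
be.add_lexlink("present tense verb", "be")

am = Semnod("to be (first person form)")
am.add_lexlink("present tense verb", "am")
am.synonym_of = be  # "am" is set up as a synonym of the main meaning

print(generate(am, "present tense verb"))
```

Because each semnod has a single word per link type and generation always goes through the main semnod, the choice between "am" and "be" is never left to chance in this sketch.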

It was hard to see how all of this would work at first, but I found that
this had to be the rule, else during text generation the system might grab
"am" one time and "be" another time at random.

I am explaining all of this because, as I have written before, the old
axiom, "There's always more than one way of skinning a cat," won't seem to
work for linguistics. This process of much trial and error using computer
systems is therefore probably a very good guide to what really happens
inside human heads. Right now we understand part of these processes, but
we keep always learning more, one little piece at a time, and as we learn,
our systems keep getting better.

regards,
Chaumont Devin.

witchit.com
panlingua.net
chaumontdevin.com
oldmaluku.net

Joe Devin

Sep 23, 2009, 8:18:15 PM

I am continually amazed by the power of psychological denial. Sometimes I
do something, publish my results, and years later get told that if I wanted
people to believe what I was telling them, then I should just DO it. And
being without the love of women, my only consolations are the Wright
brothers, who kept hearing that humans would never fly long years after they
had already been flying, and George Noory, who keeps getting told we never
made it to the moon on Coast to Coast AM.

Meantime <gopalara...@gmail.com> writes:

> You might want to take a look at the views of Daniel Dennet and Karl
> Popper.

No time. I proved the existence of God in only three steps. If any of
these people had anything relevant to say you would have been able to
repeat it here. I am not interested in hearing about books people have
read, authoritarian figures they admire, etc. Just give me the facts.

> Philosophical positions are based on critical analysis

Bah. What philosophical positions, and why are anybody else's
philosophical positions any better than mine?

> Quite obviously You take postions closer to Daniel Dennet (even if you
> are unaware of it).

I am not interested in philosophical positions, but only in results.
Nevertheless I will say that in order to get results, people need to
change their philosophy of life, start divesting themselves of preconceived
notions, and get real about seeing reality just as it really is. It is
easy for people to believe that they are really thinking, but herein lies
the danger: it is precisely when we THINK that we are smart that we get
stupid. So we need to examine ourselves without mercy day by day lest
being unaware of it we still harbor foolish and unfounded notions.

> You can ofcourse rubbish it and say this is not applicable in
> linguistics or that it is irrelevant.

So what have you told me that is new or relevant or applicable? Can you be
clear or will you just hide yourself behind references to books and
authoritarian names, etc., etc., ad infinitum?

> Each time you conduct an experiment to determine such as whether
> syntax is more important or reasoning is more important

Impossible. How can you presume to separate syntax from reasoning? This
is a prime example of what I just said--another unexamined, preconceived,
cut and dried notion. If you are really interested in philosophy, then
please go back to Socrates.

> All I am saying is that we might as well be
> aware of the limitations of what we are doing .

Then all I am asking is, "Who are you to place limitations on what we are
doing since God Himself hath not yet struck us dead? Doth He need help
from thee, and if so, why?" There is a certain philosophy called Islam in
which Allah is adulated but held to be an incompetent wimp who continually
needs much bold human help to blow people to smithereens for disagreeing
with Him, cut off people's hands for stealing, cut off girlish clits for
causing female erection, etc. If God be truly Almighty and Merciful, then
why is he unable to do these things Himself, and why is he forced ever to
rely upon his frail human servants? Be He on drugs or what else aileth
Him?

> Whether you like it or not you cannot ignore the limitations of our
> approach.

I like it not, so you keep right on hugging your limitations while I ignore
them, and we will see who first gets the Nobel Prize for linguistic
whatever if you don't blow me up first in your religious fervor to help
Allah.

> These limitations manidest themself in many ways. Consider
> Godel, think of the Lobner prize and if you can still talk glibly of "
> huge scientific progress"., I think you are being less than honest.

I am honestly amazed at your benighted pessimism that persists in the light
of scientific progress. Why can't you see what has been happening?

> There are many in the nlp community who beleive that a mind sitting in
> the software is the ultimate answer.

Interesting. And I thought I was having a hard time persuading them that
it is even possible. But beware, that mind within the software may be
Satan or Allah, either one of which could easily kill billions either for
good or evil.

> If you are referring to god then he seems to have marked certain zones
> as prohibited

If this be true, then why am I still alive? Doubtless because His servants
are lax in carrying out their sacred duty of helping the Poor Old Fellow in
His Weakness and Infirmity. But be thou careful, I have a good supply of
rocks up here on this here mountain, and I will not be reluctant to use
them.

Ha-ha, just kidding--unless ...

> I don't beleive in Theory of
> everything

A good first step, but if you really want to discover ANYTHING, then I
would recommend believing in NOTHING, because it is only by divesting
ourselves of all preconceived notions that we begin to "see."

To use an allegorical example, by lowering the temperature of space
telescopes as nearly as possible to absolute zero, we begin to make out
distant stars.

> 0. Thing that turns me on is Natural Language Understanding

Please define "understanding."

> as opposed
> to Natural language Interfaces for databases

Natural-language interfaces for databases are not opposed to understanding.
In fact they work much better when machines understand what people are
asking them, hence the research pursued by people like myself.

> or merely a two way human
> interface.

If natural-language interfaces were "merely" anything, then people wouldn't
have to waste years working on them.

> My impression was that it fell in the latter category.

If you are talking about Brainchild 5, then you have definitely missed
everything I have been saying.

Chaumont Devin


Mok-Kong Shen

Sep 24, 2009, 3:16:33 AM

Joe Devin wrote:

If I don't err in reading the above, you do everything, so to
say, by hand. I wonder whether it wouldn't eventually be possible
to get certain machine support. Suppose, for example, there were
a software that could extract from a large corpus sets of words
that are somehow synonymous, it would be possible to incorporate
that data into your system, through careful human screening,
of course.
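
One common way to get such machine support is to compare the contexts in which words occur. The Python sketch below is only a toy illustration of that idea under stated assumptions: the tiny corpus is invented, the window size is arbitrary, and cosine similarity over neighbour counts is a stand-in technique, not something taken from any system discussed in this thread. Candidates found this way would still need human screening.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=1):
    """Count, for every word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy corpus, for illustration only.
CORPUS = [
    "the cheap hotel was full",
    "the inexpensive hotel was full",
    "the cheap ticket sold out",
    "the inexpensive ticket sold out",
    "the dog chased the cat",
]

VECTORS = context_vectors(CORPUS)
print(cosine(VECTORS["cheap"], VECTORS["inexpensive"]))
print(cosine(VECTORS["cheap"], VECTORS["dog"]))
```

In this toy corpus "cheap" and "inexpensive" share all their neighbours and so score higher than "cheap" and "dog"; on real text the scores are far noisier, which is why the screening step matters.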

Thanks,

M. K. Shen

Mok-Kong Shen

Sep 24, 2009, 3:56:42 AM

Joe Devin wrote:

> ......... If God be truly Almighty and Merciful, then

> why is he unable to do these things Himself, and why is he forced ever to
> rely upon his frail human servants?

Yes, this is a "logical" problem for His existence. With the premise
above, He shouldn't have tolerated the sins of the mankind, including,
in particular, wars, genocide, political and economical exploitation
of the poor and breach of human rights (though perhaps the human
doesn't "have" any rights from the very beginning!), and would have
eliminated all the evils through a single waving of His hand. This
problem is understandably ignored, avoided or even vehemently
suppressed through some means in all religions (excepting possibly
in each individual's own religion) in my humble view.

M. K. Shen

Mok-Kong Shen

Sep 24, 2009, 3:56:28 AM

pipeDream wrote:

> is it ever possible to read what happens in the others mind.?

Devices using signals from the brain to help disabled persons
to manipulate with robotic hands and certain studies in linguistics
with the help of MRI, etc. indicate that some, though yet humble,
progress has been made in that direction.

M. K. Shen

Joe Devin

Sep 24, 2009, 5:24:39 PM

Mok-Kong Shen <mok-ko...@t-online.de> wrote:

>> ......... If God be truly Almighty and Merciful, then
>> why is he unable to do these things Himself, and why is he forced ever to
>> rely upon his frail human servants?

>Yes, this is a "logical" problem for His existence.

No problem for his EXISTENCE, but only for his INTERFERENCE.

If what we call God is defined as that age-old source of life and
intelligence that exists outside the universe and infuses life and
intelligence into the universe, then the proof of his/her/its/their
existence is quite elementary, as I have shown.

And we need not search far for damning evidence against those who deny
the existence of one or more creators. For example, just take this little
matter of language. Do you really believe that such a thing could evolve
on its own? It is clearly quite improbable that any such thing could
happen even in 444444.4 billion years, let alone 4.4 billion. We also know
clearly that Darwinism works, else we wouldn't be going in droves for flu
vaccine. But why should this be a problem when even demonstrably
simpleminded computer programmers like myself know how to create software
capable of adjusting to new environments, learning, and evolving. It is
inconceivable, therefore, that any being smart enough to create life would
create some kind of rigid, stupid life that can't evolve. People should
just get over this nonsense on both sides and move their butts along.

So everybody knows at some level or other that some creator or creators
must exist. The question is then why this creator or creators would let
things get so bad.

To assume a personal Christian God and call him "good" would seem utterly
impossible. How could any God who is any kind of "good" stand by and let
people calling upon Him be slaughtered by mad Arabs? But people
steadfastly refuse to face reality and the utter bankruptcy of their own
beliefs, so they keep clinging to them in the face of everything with wacky
rationalizations like, "God knows best." "God is testing our faith." "God
is allowing these things to happen in order to test our faith." "God is
almighty and in perfect control, so He is not doing these things, it is the
Devil." Etc., etc., etc. The last attempt at rationalization, of course,
being the most ludicrous, and the great weakness of the entire argument,
since it is impossible for someone all-powerful and all-knowing and
all-caring to be in control and not be in control of Satan, whom the Book
of Job names one of the "sons of God."

And the God of Islam is many orders of magnitude worse as evidenced by the
unbroken tide of mayhem that has poured over the earth since its inception
in the 7th century. But for some obscure reason, it is manifestly harder
for a Moslem to shed the preconceptions of his forebears than it is for
Christians, hence the great advances of science in the West while
stagnation grips the Moslem world.

So what does all this prove? Human intelligence is a very recent
development along the long path of evolution fraught with many weaknesses and
dangers, or (in computer speak) many "bugs." It is apparently designed to
debug itself unless it inadvertently destroys itself and everything around
it beforehand.

But here is what the evidence will tell us:

1. Although shaped by evolution, we are definitely the product of creation
and by no means any kind of "accident."

2. Nobody is watching us all the time, loving us, and making sure that all
goes well.

3. If our creator or creators do continue to exist, then they must be
outside this universe and not in a position to interfere in the affairs of
(micromanage) Planet Earth.

4. Although this vast communications gap exists between us and our creator
or creators, it is possible for them to bridge this gap from time to time
in order to add a little this or that to life on Planet Earth, hence the
totally unexplainable Cambrian Explosion and other evolutionary phenomena.

5. The sudden appearance of human intelligence and language may constitute
the last such interference or "trimming of the evolutionary sails."

The evidence for my last statement (item #5, above) is powerful and clear
for reasons I have already given, and would seem to constitute an
evolutionary "smoking gun." It may be okay (ha-ha) to conjecture how
insect flight evolved by bugs jumping on water, but I challenge ANYONE to
show ANY WAY that language could spontaneously evolve. Go ahead and shut
your eyes in ignorance, invoke authoritarian scientific names, books,
libraries, "science," or whatever you like, but the smoking gun keeps right
on smoking in our faces. Language is NOT evolving, and there is therefore
no reason other than the human frailty of preconception and superstition to
believe that it ever evolved at all.

Sincerely,
Chaumont Devin.

Joe Devin

Sep 24, 2009, 5:17:16 PM

Mok-Kong Shen <mok-ko...@t-online.de> wrote:

> If I don't err in reading the above, you do everything, so to
> say, by hand. I wonder whether it wouldn't eventually be possible
> to get certain machine support. Suppose, for example, there were
> a software that could extract from a large corpus sets of words
> that are somehow synonymous, it would be possible to incorporate
> that data into your system, through careful human screening,
> of course.
>
> Thanks,
>
> M. K. Shen

I fail to understand the advantage, since it goes from current human
intervention to further human intervention instead of to the promised
automation.

To me it remains a mystery exactly how humans are able to determine things
like synonymy and hypernymy and holonymy, especially if they are blind and
unable to take visual cues, and yet there never seems to be a problem. I
have yet to ever build a system capable of determining these relationships
between meanings without being explicitly told. So how do humans do it?
If they can see, then they should be able to observe similarities or
dissimilarities in form and function, so this must probably be part of it, but
I have never met a blind person who wouldn't be able to tell you that a cat
is a mammal, a Chevrolet is an automobile, etc. Yet I think we may
explicitly tell children such things all the time and just not be aware of
it. For example, "What's a maple, Mommy?" "A kind of tree, Darling." etc.
A lot more recording work on childhood is required, say with kids in the
back seat of the car.

But many other semantic relationships are easy to determine from parsed
texts, for example potential agency (in English, what meanings for things
can function as the subjects of what meanings for verbs), potential
patiency (what can be object), potential state (what adjective can modify
what noun), etc. And because it is SO easy to determine these relations
from parsed texts, it is a waste of time putting them in the ontology,
where they are redundant. This is because the corpus (all parsed text) and
the ontology (all known meanings) are permanent features of the human
linguistic apparatus--in other words, both are always instantly available
to the automated linguistic system.
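
As a rough illustration of reading these relations straight off parsed text, here is a small Python sketch. The triple format `(relation, head, dependent)`, the relation labels, and the sample data are all assumptions invented for the example; a real parser's output would look different.

```python
from collections import defaultdict

# Hypothetical parsed corpus as (relation, head, dependent) triples.
PARSED = [
    ("subject", "chase", "dog"),
    ("object", "chase", "cat"),
    ("modifier", "dog", "big"),
    ("subject", "sleep", "cat"),
]


def collect(parsed, relation):
    """Map each head to the set of dependents seen with the given relation."""
    table = defaultdict(set)
    for rel, head, dependent in parsed:
        if rel == relation:
            table[head].add(dependent)
    return table


# Potential agency: which things have been seen as subject of which verbs.
agency = collect(PARSED, "subject")
# Potential patiency: which things have been seen as object of which verbs.
patiency = collect(PARSED, "object")
# Potential state: which adjectives have been seen modifying which nouns.
state = collect(PARSED, "modifier")

print(sorted(agency["chase"]), sorted(patiency["chase"]), sorted(state["dog"]))
```

Since each table is recomputed from the parsed corpus on demand, nothing about these relations needs to be stored in the ontology itself, which is the redundancy argument made above.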

Regards,
Chaumont Devin.


Brian Martin

Sep 25, 2009, 9:53:46 AM

Wordnet - http://wordnet.princeton.edu/
has a comprehensive data set of synonyms, hypernyms, hyponyms, etc.

This is a highly detailed data set, with some convenience wrapper APIs
for simplicity, and the raw data format is documented if you prefer as I
do to "just read the raw data" and build your own model.

Brian Martin

Sep 25, 2009, 9:59:06 AM

I will applaud whoever first wins the Loebner Prize, or the Nobel Prize
in NLP, be they Gentile, Jew, Islamic, or even, God help us, a Baptist.

Pax Vobiscum, Shalom, and Ins'Allah.

Brian Martin

Sep 25, 2009, 10:05:01 AM

The key to "how we think" is not in which areas of the brain show
activity under various input, but in simply becoming attuned to how each
of us thinks internally, i.e. our sub-vocalisation stream.

This becomes more exposed when we learn a new language or more so, when
we are immersed and "thinking" in that new language, as opposed to
thinking in our native language and then translating.

Basically that is closely parallel to one possible solution of the whole
AI/NLP problem.

There may well be some better, more efficient solution, but our best
chance at this stage is to apply our awareness of how our brains
already do this task, i.e. with an underlying semantic model of the
world of objects and events, and 2 mappings of language syntax onto
that semantic model.

>> built gliders from willow branches, got it right, and made many successful

Brian Martin

Sep 25, 2009, 10:06:29 AM

Wordnet (Princeton) have already done the hard work of extracting
synonym sets. I suggest we just use their work rather than reinvent the
wheel.

Joe Devin

Sep 26, 2009, 7:52:34 AM

Brian wrote:

>Wordnet - http://wordnet.princeton.edu/

I downloaded wordnet many years ago and benefitted from it as a learning
tool. I have no idea what it is like now, but at that time, here were its
fatal flaws:

1. Its construction was not guided by any coherent underlying theory, and
it showed it.

2. Instead of finding a way to include all meanings in the same ontology
(or box of meanings), because wordnet was not based upon a theory that
could do this, they had to put different parts of speech in different
files.

3. The people who did wordnet were not computer savvy, so they made choices
that gobbled up computer resources and made things too slow for real
applications.

4. A lot of the work on wordnet was evidently done by volunteers in their
spare time, and so it has (or had) a lot of errors which made it too
unreliable to use on real systems.

Nevertheless I strongly recommend wordnet as a valuable learning tool.

--Chaumont Devin.

Joe Devin

Sep 26, 2009, 7:57:45 AM

Brian Martin wrote:

>Wordnet (Princeton) have already done the hard work of extracting
>synonym sets. I suggest we just use their work rather than reinvent the
>wheel.

Synonymy is really quite trivial and not worth spending much time on in the
larger scheme of things. I am always setting it up and revising it with
just a few keystrokes using Semlex, which is tightly coupled to my larger
system. That way I can be immediately certain that not only am I getting a
synonym link to the right English word, but also to the right word sense or
meaning.

--Chaumont Devin.

Joe Devin

Sep 26, 2009, 7:55:48 AM

Brian Martin <brian...@futuresoftware.com.auNOSPAM> wrote:

>The key to "how we think" is not in which areas of the brain show
>activity under various input, but in simply becoming attuned to how each
>of us thinks internally, i.e. our sub-vocalisation stream.

Mapping out the way things happen inside human brains under linguistic
stimuli is important for several reasons. It tells us, for example,
whether the connections for verbs are kept in a different part of the brain
than that for nouns, etc., and these things will probably all turn out to
be very important in the design of true AI.

As an example, I think the work on human brains helped me to formalize the
theory covering the structure of ontologies. One of the things it must
have helped me to understand is that the ontology can be made 100% self
contained if we grasp what we are doing.

To help visualize what I am saying in the following, imagine that the
ontology were the skin on your back and that the nodes of the ontology were
like tiny points that could be stimulated on your skin. Now supposing all
of the skin on your back were treated equally and nouns and verbs could
just be anywhere, and supposing you felt a tiny stimulus somewhere. You
would have no idea whether the stimulus were for some kind of object or for
some kind of action or for some kind of state of being. But if these three
areas had been mapped out beforehand, you would be able to tell right away.
And in fact this is how things work in the real world. Take a totally
illiterate person and explain carefully to him/her what a noun is, what a
verb is, etc., and before you know it you will have him/her recognizing
nouns and verbs for himself/herself without even paying attention.

So now imagine that every linguistic link, besides having a source and a
destination also has a type associated with some pinpoint on the skin of
your back. Then every time that link were activated (used), you would feel
a tiny stimulus in the region associated with that link type on your back.

So language provides a window into the mind from outside by which, if we
are clever like Alan Turing was in WW2, we can deduce a lot of what is
going on inside. But don't forget the scientific method. First we observe
(as by analyzing spoken or written language), then we hypothesize (oh yeah,
if this is happening, then it probably means such and so), and then we
test. And it is in this testing stage when empirical input from the people
scanning brains can make a difference, because they may be able to confirm
or deny the hypotheses we come up with, enabling us to know which ones to
trash, which ones to hang onto, and move ON.

And as regards thinking, you raise a complicated issue because thinking
involves traversal of the links (relations) found in both the corpus AND
the ontology at the same time in complex ways we only partly understand.
All I feel confident in saying is what I just said. Thinking requires an
ontology and a corpus of parsed sentences, but beyond that the process is
only vaguely understood. Visualize these two black boxes (the corpus and
the ontology) as if you could hold them in your hands. Both of them would
be hot, so you would know that they were active, but there would be no
obvious way of knowing what was going on inside. Nevertheless if you had
an oscilloscope and the ability to disconnect and reconnect the wires one
by one, you would surely see some patterns beginning to emerge, and
steadily but surely you might figure out all of the details. Right now
our instruments are not precise enough to take these oscilloscopic
readings on live people, and killing them to do it wouldn't work because
they would be dead. Nevertheless if Charley Darwin were around, he might
just find some way of reviving them in warm ponds--one never knows, and one
must always keep an open mind where science is concerned.

--Chaumont Devin.

Brian Martin

Oct 17, 2009, 7:37:15 AM

I've found Wordnet data files useful in their raw form, though I don't
use their provided APIs, which are query/response based.

I prefer to just load all their datafiles & parse them directly into an
internal format, sidestepping the Wordnet APIs for efficiency, i.e.
parse & remap the full dataset rather than use millions of API calls.
The raw datafile format is well documented, though I agree it's a bit
convoluted.
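
For anyone wanting to try the raw-file route, the Python sketch below parses one line laid out the way I understand the WordNet `data.*` files to be documented: synset offset, lexicographer file number, part of speech, a hexadecimal word count, word/lex_id pairs, a decimal pointer count, four-field pointers, then `|` and the gloss. Treat that layout as an assumption to check against the official format documentation. The sample line is made up and not copied from any WordNet release, and verb frame fields are simply ignored.

```python
def parse_data_line(line):
    """Parse one synset line of the assumed WordNet data-file layout."""
    fields, _, gloss = line.partition("|")
    tok = fields.split()
    offset, lex_filenum, ss_type = tok[0], tok[1], tok[2]

    w_cnt = int(tok[3], 16)  # word count is written in hexadecimal
    # Words come as (word, lex_id) pairs; underscores stand for spaces.
    words = [tok[4 + 2 * i].replace("_", " ") for i in range(w_cnt)]

    pos = 4 + 2 * w_cnt
    p_cnt = int(tok[pos])  # pointer count is written in decimal
    pointers = []
    for i in range(p_cnt):
        start = pos + 1 + 4 * i
        symbol, target, target_pos, _source_target = tok[start:start + 4]
        pointers.append((symbol, target, target_pos))

    return {
        "offset": offset,
        "pos": ss_type,
        "words": words,
        "pointers": pointers,
        "gloss": gloss.strip(),
    }


# Made-up sample line following the assumed layout ("@" marks a hypernym).
SAMPLE = "00000001 03 n 02 cat 0 true_cat 0 001 @ 00000002 n 0000 | an illustrative gloss"

print(parse_data_line(SAMPLE))
```

Parsing every line once into dictionaries like this, keyed by offset, is one way to build an in-memory model without going through the query/response wrappers.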

Brian Martin

Oct 17, 2009, 7:39:06 AM

Wordnet, while oriented to synonym sets, also includes hypernym /
hyponym links, and other ontology frameworks.

Joachim Pimiskern

Oct 17, 2009, 12:57:15 PM

Mok-Kong Shen wrote:

> Synonyms like 'cheap'/'inexpensive' seem to have senses that are
> particularly close to each other and therefore higher chances
> of exchangeability. Are there listings of such materials?

A couple of links to resources:
http://pespmc1.vub.ac.be/CLUSTERW.html
http://pespmc1.vub.ac.be/SPREADACT.html
http://homepages.cwi.nl/~paulv/papers/amdug.pdf
http://zhouzhuang.livejournal.com/1458.html
http://en.wikipedia.org/wiki/Moby_Project
http://conceptnet.media.mit.edu/
http://www.mpii.de/yago
http://www.newscientist.com/article/dn9997-software-learns-new-words-from-wikipedia.html
http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html
http://www.physorg.com/news87276588.html
http://www.biomedcentral.com/1471-2105/9/159
http://research.microsoft.com/en-us/projects/mindnet/default.aspx
http://www.openthesaurus.de/about/download
http://www.dwds.de/kollokationsgraph
http://www.augos.com/ki/semnet_en.html

Regards,
Joachim