
Suggestions on better contextual lookup algorithm?


chadmaester
Dec 2, 2009, 3:14:52 PM

I am developing an application that provides contextual knowledge
lookup. The application is in its design phase.

I need to know whether a simple graph structure and a traversal
algorithm would be sufficient or whether I should go with a neural
network. I want to go with the most long-term solution.

I am thinking of representing individual concepts with simple nodes.
Suppose I want to look up John's hair color (which is black). I think I
need four concept nodes: John, hair, color, and black. Here are two
algorithms--which one is best suited for my task?

**Traversal lookup algorithm**

See this diagram for reference: http://chadjohnson.ath.cx:8080/static/concept_map.png

1. Input (something like this), in order:
1. person
2. john
3. hair
4. color
2. Find the graph node corresponding to 'person'.
3. Look at all nodes adjacent to the 'person' node, and find the
'john' node.
4. Look at all nodes adjacent to the 'john' node, and find a node that
has links to both the 'hair' node and the 'color' node.

Another option would be to represent 'hair color' as its own concept
node, making it so 'blond hair' is a 'hair color' concept. Then step
(4) would become

Look at all nodes adjacent to the 'john' node, and find a node that
is a 'hair color' concept.
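
To make the first version of the traversal concrete, here is a rough C++
sketch of what I have in mind; the adjacency-list structure and the node
names are just illustrative, not a finished design:

#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Each concept is one node; edges simply link related concepts (undirected here).
struct ConceptGraph {
    std::unordered_map<std::string, std::unordered_set<std::string>> adj;

    void link(const std::string& a, const std::string& b) {
        adj[a].insert(b);
        adj[b].insert(a);
    }

    bool linked(const std::string& a, const std::string& b) const {
        auto it = adj.find(a);
        return it != adj.end() && it->second.count(b) > 0;
    }

    // Steps 2-4: start at 'person', step to 'john', then find a neighbour of
    // 'john' that is linked to both 'hair' and 'color'.
    std::string lookup(const std::string& category, const std::string& name,
                       const std::string& attr1, const std::string& attr2) const {
        if (!linked(category, name)) return "";
        for (const std::string& candidate : adj.at(name))
            if (linked(candidate, attr1) && linked(candidate, attr2))
                return candidate;                      // e.g. 'black'
        return "";
    }
};

int main() {
    ConceptGraph g;
    g.link("person", "john");
    g.link("john", "black");
    g.link("black", "hair");
    g.link("black", "color");
    std::cout << g.lookup("person", "john", "hair", "color") << "\n";  // prints: black
}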


**Neural network algorithm:**

1. Input, in no particular order:
1. person
2. john
3. hair
4. color
2. Train the network to map these particular inputs to the concept
'blond hair'.
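
To illustrate what I mean by the network option, here is a toy sketch: a
single-layer softmax classifier over made-up concept vocabularies, trained on
the one example above. It is only meant to show the input-to-concept mapping,
not an actual design:

#include <cmath>
#include <iostream>
#include <string>
#include <vector>

// A multi-hot vector over input concepts is mapped through a weight matrix and
// a softmax to a distribution over output concepts, trained by gradient descent.
int main() {
    std::vector<std::string> inputs  = {"person", "john", "hair", "color"};
    std::vector<std::string> outputs = {"blond hair", "black hair"};

    std::vector<double> x(inputs.size(), 1.0);   // all four input concepts active
    std::size_t target = 0;                      // desired output: "blond hair"

    std::vector<std::vector<double>> W(outputs.size(),
                                       std::vector<double>(inputs.size(), 0.0));
    const double lr = 0.5;

    for (int step = 0; step < 100; ++step) {
        // Forward pass: logits, then softmax probabilities.
        std::vector<double> z(outputs.size(), 0.0), p(outputs.size(), 0.0);
        for (std::size_t k = 0; k < outputs.size(); ++k)
            for (std::size_t j = 0; j < inputs.size(); ++j)
                z[k] += W[k][j] * x[j];
        double sum = 0.0;
        for (std::size_t k = 0; k < outputs.size(); ++k) { p[k] = std::exp(z[k]); sum += p[k]; }
        for (std::size_t k = 0; k < outputs.size(); ++k) p[k] /= sum;

        // Backward pass: the cross-entropy gradient for each weight is (p - y) * x.
        for (std::size_t k = 0; k < outputs.size(); ++k) {
            double grad = p[k] - (k == target ? 1.0 : 0.0);
            for (std::size_t j = 0; j < inputs.size(); ++j)
                W[k][j] -= lr * grad * x[j];
        }
    }

    // After training, the active inputs map to the trained output concept.
    std::vector<double> z(outputs.size(), 0.0);
    for (std::size_t k = 0; k < outputs.size(); ++k)
        for (std::size_t j = 0; j < inputs.size(); ++j)
            z[k] += W[k][j] * x[j];
    std::cout << outputs[z[0] >= z[1] ? 0 : 1] << "\n";  // prints: blond hair
}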


Any feedback would be appreciated. Thanks!

Message has been deleted

Ian Parker
Dec 3, 2009, 9:06:04 AM

I have two suggestions. The first would be to have classes, as in Java or C++:

class person {
    bool sex;
    double height;
    double weight;
    color skin;   // 'color' is assumed to be defined elsewhere
    color hair;
    int age;
};

person john;      // note: 'person john();' would declare a function, not an object

I could add a few more things.

The second thing is Latent Semantic Analysis (LSA). I take a word, but to
find out its meaning I need to employ LSA to get all the possible
meanings. LSA vectors are normally computed by looking at the
surrounding words in a passage, but a "chat" can have an overall
context.

Does this help?


- Ian Parker

chadmaester
Dec 3, 2009, 2:38:45 PM

I think a hard-coded OO approach would be too simple a mechanism for
achieving the milestones described, as everything would then be hard-
coded and symbolic, which is inherently limited.

I'll research LSA. Thank you for the suggestion.

A milestone goal would be the ability to answer the question, "What
color is John's hair?"

So, I want to be able to store the concept of 'person,' concepts of
different people, the concept of hair, the concept of color, concepts
of multiple different colors, and then I want to link different people
to different hair colors.

Once this knowledge and their associations are in place, I'd like to
be able to look up a given person's hair color, given (inputs) a
reference to a person, the concept of hair, and the concept of color.

That is where my question comes into play: traversal vs. a neural
network.

In case it changes the way you answer, future milestones would be:
1) the computer learning a language's grammar from scratch
2) a computer to answer questions using its knowledge of a language's
grammar
3) a computer to translate sentences from one language to another,
correctly, based on context and existing knowledge. In its *most
simple form*, it would store a concept, associate a word for a given
language with the concept, and associate another word for another
language with that same concept. To translate, the word for the target
language associated with the concept would be found and returned (see
http://chadjohnson.ath.cx:8080/static/language_translation.png).
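
In code, that most simple form might look roughly like this; the tiny
vocabulary and the concept ids are made up purely for illustration:

#include <iostream>
#include <string>
#include <unordered_map>

// Words in each language point at a shared concept id; translating a word
// means finding its concept and returning the target language's word for it.
int main() {
    std::unordered_map<std::string, int> english = {{"dog", 1}, {"house", 2}};  // word -> concept
    std::unordered_map<int, std::string> spanish = {{1, "perro"}, {2, "casa"}}; // concept -> word

    const std::string word = "dog";
    auto it = english.find(word);
    if (it != english.end() && spanish.count(it->second))
        std::cout << word << " -> " << spanish[it->second] << "\n";  // prints: dog -> perro
}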

chadmaester
Dec 4, 2009, 12:22:48 PM

On Dec 3, 6:06 am, Ian Parker <ianpark...@gmail.com> wrote:

Any thoughts?

Ian Parker
Dec 4, 2009, 5:19:00 PM

Nothing else is any better. You can expand the class requirements at will.


>
> I'll research LSA. Thank you for the suggestion.
>
> A milestone goal would be the ability to answer the question, "What
> color is John's hair?"

john.hair.color is the color of John's hair. This is indeterminate
when we form the class. As we proceed to chat we fill in more details.


>
> So, I want to be able to store the concept of 'person,' concepts of
> different people, the concept of hair, the concept of color, concepts
> of multiple different colors, and then I want to link different people
> to different hair colors.
>
> Once this knowledge and their associations are in place, I'd like to
> be able to look up a given person's hair color, given (inputs) a
> reference to a person, the concept of hair, and the concept of color.
>

What you are thinking about is a database. I need to search for all
males with red hair. I can either do this by direct searching, or
faster by the use of a hashing algorithm.

> That is where my question comes into play: traversal vs. a neural
> network.
>
> In case it changes the way you answer, future milestones would be:
> 1) the computer learning a language's grammar from scratch

I am going to use a pseudonym, not my real name! I want you to realize
exactly what this would mean. If it were possible for a computer to
learn grammar from scratch, it would also be able to accomplish what
scholars have been trying to do, and that is to trace the development
of a language and its literature.

Despite what Kurzweil might claim, such a program is well beyond
current capabilities.

http://books.google.co.uk/books?id=EfFA4dF-Zg8C&dq=text+of+koran&pg=PP1&ots=TYaCFkE_fr&source=citation&sig=uaOdwbxKvQ22BEG1UuxwL139D9s&hl=en&sa=X&oi=book_result&resnum=13&ct=result#v=onepage&q=text%20of%20koran&f=false

This is why I said I would write using a pseudonym. Professor
Luxenbourg's life has been threatened because of what he has found. AI
of the power you are implying would be able to verify (or otherwise)
his conclusions. In fact, to achieve what you are suggesting, you would
have to look at the way scholarship works and attempt to duplicate
this in AI.

Mind, I think this is absolutely disgraceful. I may disagree with you,
you may disagree with me, but we are not going to kill each other.

> 2) a computer to answer questions using its knowledge of a language's
> grammar

Systran uses grammatical considerations. Google is a "statistical"
system which does not take grammar into account.

> 3) a computer to translate sentences from one language to another,
> correctly, based on context and existing knowledge. In its *most
> simple form*, it would store a concept, associate a word for a given
> language with the concept, and associate another word for another
> language with that same concept. To translate, the word for the target
> language associated with the concept would be found and returned (see http://chadjohnson.ath.cx:8080/static/language_translation.png).

You either do this the statistical (Google) way, or you use LSA. Look
at

http://docs.google.com/Doc?docid=0AQIg8QuzTONQZGZxenF2NnNfNzY4ZDRxcnJ0aHI&hl=en_GB

and a raw word basis

http://spreadsheets.google.com/ccc?key=0AgIg8QuzTONQdFZXbHdJZU15b0tOVlI4b2l0TnhtdUE&hl=en_GB

You can see that each word has a number of meanings; you reject the
meanings that don't fit. You need to apply LSA to the alternatives.

One other fly in the ointment is n-grams. These can have specific
meanings: nSf qTrhA can literally mean "half of Qatar", but it actually
means "radius". Iran would like Bahrain and the radius! You need to
search for these specific n-grams and determine their positions in
vector space.

Sorry to sound discouraging.


- Ian Parker

chadmaester
Dec 4, 2009, 6:11:19 PM

Storing concept representations in a database will only work if a
unique table structure does not have to be defined for each concept.

'Person,' for example, having attributes such as weight, height, hair
color, age, etc., cannot have its own table. Each attribute would have
to be a concept in itself. 'John,' 'weight,' 'height,' 'hair color,'
'age,' and even the actual attribute values such as 21 for age, 150
for weight, etc. would have to be concepts in themselves.

The ability for the computer to learn and store information for new
concepts would simply break down if a new table schema had to be
defined for each new concept classification.

So, databases in the standard relational, normalized sense are
completely out of the question. That is why I am looking at graph-like
data structures.
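
Roughly, I am picturing one generic structure holding every concept and
association, something like the following sketch (the relation names and
values are only illustrative):

#include <iostream>
#include <string>
#include <tuple>
#include <vector>

// One generic structure holds every concept and association, so a new kind of
// concept ('age', '21', 'hair color', ...) never needs a new table schema.
struct ConceptStore {
    // (subject, relation, object) triples; everything in them is just a concept name.
    std::vector<std::tuple<std::string, std::string, std::string>> links;

    void link(const std::string& s, const std::string& r, const std::string& o) {
        links.emplace_back(s, r, o);
    }

    std::string find(const std::string& s, const std::string& r) const {
        for (const auto& t : links)
            if (std::get<0>(t) == s && std::get<1>(t) == r)
                return std::get<2>(t);
        return "";
    }
};

int main() {
    ConceptStore kb;
    kb.link("john", "is a", "person");
    kb.link("john", "hair color", "black");
    kb.link("john", "age", "21");
    std::cout << kb.find("john", "hair color") << "\n";  // prints: black
}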

>
> > That is where my question comes into play: traversal vs. a neural
> > network.
>
> > In case it changes the way you answer, future milestones would be:
> > 1) the computer learning a language's grammar from scratch
>
> I am going to use a pseudonym not my real name! I want you to realize
> exactly what this would mean. If it were possible for a computer to
> learn grammar from scratch it would also be able to accomplish what
> scholars have been trying to do, and that is to trace the development
> of a language and its literature.
>
> Despite what Kurtzweil might claim such a program is well beyond
> current capabilities.
>

> http://books.google.co.uk/books?id=EfFA4dF-Zg8C&dq=text+of+koran&pg=P...


>
> This  why I said I would write using a pseudonym. Professor
> Luxenbourg's life has been threatened because of what he has found. AI
> of the power you are implying would be able to verify (or otherwise)
> his conclusions. In fact to achieve what you are suggesting you would
> have to look at the way scholarship works and attempt to duplicate
> this in AI.
>
> Mind I think this is absolutely disgraceful. I may disagree with you,
> you may disagree with me, but we are not going to kill each other.
>
> > 2) a computer to answer questions using its knowledge of a language's
> > grammar
>
> Systran uses grammatical consideration. Google is a "statistical"
> system which does not take account of grammar.
>
> > 3) a computer to translate sentences from one language to another,
> > correctly, based on context and existing knowledge. In its *most
> > simple form*, it would store a concept, associate a word for a given
> > language with the concept, and associate another word for another
> > language with that same concept. To translate, the word for the target
> > language associated with the concept would be found and returned (see http://chadjohnson.ath.cx:8080/static/language_translation.png).
>
> You either do this the statistical (Google) way, or you use LSA. Look
> at
>

> http://docs.google.com/Doc?docid=0AQIg8QuzTONQZGZxenF2NnNfNzY4ZDRxcnJ...


>
> and a raw word basis
>

> http://spreadsheets.google.com/ccc?key=0AgIg8QuzTONQdFZXbHdJZU15b0tOV...
>

Bookmarking these.

chadmaester
Dec 4, 2009, 6:13:00 PM

The same problem applies to using objects (OOP).

Ian Parker
Dec 5, 2009, 6:28:14 AM

First point, perhaps a trivial one: I am not American, I am EU, and I
feel that in scientific discussions SI units should be used. 150 is
obese; you mean 70. One point about databases: it is perfectly possible
to perform a direct search using graphs, although as you expand the
amount of data the search time becomes longer and longer. In contrast,
a hashing algorithm has a search time which is independent of the
amount of data stored.

The basic algorithm takes a prime modulus: you multiply the ASCII
values of the characters together within that modulus and you search
starting from the result of the calculation. You can avoid the need for
a large number of files by assigning a numerical value to each type of
variable. In the field you have an address which will take you to your
variable. I will assign "John" to height, age, and weight.

If you have multiple entries (as you will have), you simply assign a
pointer to another array which will give everyone whose weight is
70 kg.
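
As a rough sketch of the kind of hashed index with chaining I mean (the exact
way the character codes are mixed, the key format, and the values are only
examples, not a prescription):

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Hash index with chaining: the key is reduced modulo a prime to pick a bucket,
// and every entry landing in that bucket goes on a chain, so a key such as
// "weight=70" can list everyone with that weight.
struct HashIndex {
    std::size_t prime;
    std::vector<std::vector<std::pair<std::string, std::string>>> buckets;

    explicit HashIndex(std::size_t p = 101) : prime(p), buckets(p) {}

    std::size_t hash(const std::string& key) const {
        std::size_t h = 0;
        for (unsigned char c : key)
            h = (h * 31 + c) % prime;    // combine character codes within the modulus
        return h;
    }

    void insert(const std::string& key, const std::string& value) {
        buckets[hash(key)].emplace_back(key, value);
    }

    std::vector<std::string> find_all(const std::string& key) const {
        std::vector<std::string> out;
        for (const auto& kv : buckets[hash(key)])
            if (kv.first == key)
                out.push_back(kv.second);   // walk the chain of matches
        return out;
    }
};

int main() {
    HashIndex index;
    index.insert("weight=70", "john");
    index.insert("weight=70", "mary");
    for (const auto& name : index.find_all("weight=70"))
        std::cout << name << "\n";          // prints: john, then mary
}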

You say a database cannot be constructed, yet Google has done
precisely this with its search. Google stores all words, all bigrams,
all trigrams, etc. in this way. You can search for a phrase and Google
will give all the occurrences.

I have, in fact, a class which will hash, decide whether that hash has
already occurred, and then build a chain which gives all the occurrences.


- Ian Parker

chadmaester
Dec 8, 2009, 1:19:36 AM

Is this what you are thinking? http://chadjohnson.ath.cx:8080/static/hash_map.png

It's similar to the original diagram but organized a little
differently.

Thank you for your feedback so far--it's been helpful. If this diagram
does not correctly represent what you have in mind, would you mind
quickly drawing one and posting a link to it or emailing it to me?
Thanks Ian.

Ian Parker
Dec 8, 2009, 6:22:17 AM

It boils down to that. The diagram, though, is implemented by hashing. I
originally had a C++/Java-type structure in mind, but that structure
can be represented as shown. In fact, you need to implement it in this
way to use hashing.


- Ian Parker

On 8 Dec, 06:19, chadmaester <chad.d.john...@gmail.com> wrote:
>
> Is this what you are thinking? http://chadjohnson.ath.cx:8080/static/hash_map.png

Ian Parker
Dec 8, 2009, 6:38:25 AM

There is one other point to think about further down the line, and
that is that you will want to write something like a mini-compiler to
convert data from one form into another.


- Ian Parker

chadmaester
Dec 8, 2009, 11:48:20 AM

What are the two forms the compiler would convert between?

Ian Parker
Dec 8, 2009, 11:57:12 AM

Well, if you recall, my original scheme was a hierarchical one, viz.:

person john;

class person {
    hair q;
    ...
};

which can be written in the way you suggest.
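
For instance, roughly like this; the field names and values are only
illustrative:

#include <iostream>
#include <string>
#include <utility>
#include <vector>

// A hierarchical record (person.hair, person.age, ...) can be "compiled" down
// into the flat node/edge form of the graph: one (attribute, value) edge per field.
struct Person {
    std::string name;
    std::string hair;
    int age;
};

std::vector<std::pair<std::string, std::string>> to_edges(const Person& p) {
    return {{"is a", "person"},
            {"hair color", p.hair},
            {"age", std::to_string(p.age)}};
}

int main() {
    Person john{"john", "black", 21};
    for (const auto& e : to_edges(john))
        std::cout << john.name << " --" << e.first << "--> " << e.second << "\n";
}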

There is also a more general problem. One of the beefs I have about AI
research is that there is enough software floating around for there to
be a high level of AI. One of the main questions is compatibility. In
general, if we string software together it won't work, because although
software A provides all the information for software B, software B
cannot understand the output of A.

What I have written above can easily be placed in your scheme. However,
your program will not understand it.


- Ian Parker
