wanted: 2 tasks (update)


YKY (Yan King Yin, 甄景贤)

Sep 3, 2012, 2:46:47 AM
to general-in...@googlegroups.com, AGI mailing list, codea...@googlegroups.com
Hi all,

Our AI project needs 2 tasks to be completed; neither is too difficult, and each may take approximately a few weeks.  Our project aims to eliminate racism, sexism, and other forms of discrimination in the high-tech entrepreneurial space.  Please support it!! =)

2.  Ontology and concept classifier (See attached PDF)
===========================================
Requirements:  Some familiarity with machine learning (hierarchical clustering).

1.  Simple English grammar in logic form
===============================
We'll use a very simple grammar consisting of 5 lines.  This will be used as a seed for machine learning later.
Requirements:  Some familiarity with computational linguistics.  Basic logic programming such as Prolog (you'll need some time to get acquainted with our logic, but it is very easy to learn and use).

--
KY
"The ultimate goal of mathematics is to eliminate any need for intelligent thought" -- Alfred North Whitehead

Genifer - ontology.pdf

Ivan Vodišek

Sep 3, 2012, 1:19:41 PM
to general-in...@googlegroups.com
Hi KYK :)

I'm thrilled to read new stuff from you. I've read the PDF and I'd like to offer some constructive criticism:

  1. I would like to see a list of examples of what can be held by the ontology (probably any kind of knowledge, but this is unclear from the slides). Explicitly noting the most successful example would make the document more interesting.
  2. The purpose of the matrices is not clear to me.
  3. The purpose and mechanism of the clustering are not clear enough for me.
I hope this helps with sharing knowledge.

(I have worked on ontologies too and achieved some results. If you're interested, I could put together an explanation of it for you.)

2012/9/3 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>

Matt Mahoney

Sep 3, 2012, 1:41:51 PM
to general-in...@googlegroups.com
On Mon, Sep 3, 2012 at 2:46 AM, YKY (Yan King Yin, 甄景贤)
<generic.in...@gmail.com> wrote:
> Hi all,
>
> Our AI project needs 2 tasks to be completed, both are not too difficult,
> each may take approximately a few weeks. Our project aims to eliminate
> racism, sexism, and other forms of discrimination in the high-tech
> entrepreneurial space. Please support it!! =)
>
> 2. Ontology and concept classifier (See attached PDF)
> ===========================================
> Requirements: Some familiarity with machine learning (hierarchical
> clustering).

Why matrices? The normal representation of a semantic concept is a vector.

Do you have a more detailed specification that includes a test set and
scoring criteria?


> 1. Simple English grammar in logic form
> ===============================
> We'll use a very simple grammar consisting of 5 lines.

Oh, really? When did the number of rules drop from the millions down to 5?

> This will be used as
> a seed for machine learning later.

How would that work?

> Requirements: Some familiarity with computational linguistics. Basic logic
> programming such as Prolog (you'll need some time to get acquainted with our
> logic, but it is very easy to learn and use).

Also, what are the overall project goals? How do we know if they are
being achieved? I assume we are being paid in recognition for our
work, as opposed to being paid in play money or real money, so this is
important. What is the challenge?


-- Matt Mahoney, mattma...@gmail.com

Ivan Vodišek

Sep 3, 2012, 3:34:54 PM
to general-in...@googlegroups.com
>> 1.  Simple English grammar in logic form
>> ===============================
>> We'll use a very simple grammar consisting of 5 lines.
>
>Oh, really? When did the number of rules drop from the millions down to 5?

I know I could use some kind, civilized behavior and a sense of support if I were in YKY's place. He is doing research for humankind, if you ask me.

There are a bit more than five rules for English grammar, but not that many more, if we are thinking of the same thing:
  1. noun placement, plural-forming rules
  2. verb placement, tense-forming rules
  3. placement and forming rules for other atomic words
  4. questions, negation, direct speech
  5. more rules that I don't know about
I've gathered some useful links from SeH for extracting English knowledge from publicly available databases:
  http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
  https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=frameIndex
  http://www.mpi-inf.mpg.de/yago-naga/yago/
I can't find a free downloadable database of English grammar rules, but you can probably buy one somewhere on the web.

Oh, I hate to ruin the party, but I have to, sorry; I have delayed this for a long time.

AI is a pretty dangerous thing, if you ask me. The combinatorial possibilities of trillions of operations per second could, in a minute, think up a major hell to put us through, from just one tiny software or hardware mistake (i.e. substituting right for wrong). Personally, I WOULD NOT DARE to run an autonomous AI program, even if one existed. I don't trust myself that much, and no one should trust me that much.

I'm for distributed, human-controlled inducing and deducing; that much, it seems, humans can take.


2012/9/3 Matt Mahoney <mattma...@gmail.com>

Matt Mahoney

Sep 3, 2012, 4:49:50 PM
to general-in...@googlegroups.com
On Mon, Sep 3, 2012 at 3:34 PM, Ivan Vodišek <ivanv...@gmail.com> wrote:
>>> 1. Simple English grammar in logic form
>>> ===============================
>>> We'll use a very simple grammar consisting of 5 lines.
>>
>>Oh, really? When did the number of rules drop from the millions down to 5?
>
> I know I could use some kind, civilized behavior and a sense of support if I
> were in YKY's place. He is doing research for humankind, if you ask me.

Nothing is going to be accomplished without a clear specification or
goal. After several years the only thing actually produced is a bunch
of slides about logic and a couple of toy programs.

> There are a bit more than five rules for English grammar, but not that many
> more, if we are thinking of the same thing:
>
> noun placement, plural-forming rules
> verb placement, tense-forming rules
> placement and forming rules for other atomic words
> questions, negation, direct speech
> more rules that I don't know about

About half of a language model can be described using a 200 word
vocabulary and maybe a dozen grammar rules. The other half requires
millions of words and rules. We keep going down this path and it goes
nowhere.

> I've gathered some useful links from SeH for extracting english knowledge
> from publicly available databases:
>
> http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
> https://framenet.icsi.berkeley.edu/fndrupal/index.php?q=frameIndex
> http://www.mpi-inf.mpg.de/yago-naga/yago/

Yes, these are useful resources, but an AGI should be able to learn
this kind of knowledge from unlabeled text in the same way that humans
learn language.

> I can't find a free downloadable database of English grammar rules, but you
> can probably buy one somewhere on the web.

Or write a program that learns grammar from a text corpus. For
example, if you cluster words in context space, you'll find that words
are grouped into parts of speech like nouns and verbs.
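
For example, a quick toy sketch (assuming numpy and scipy are available; nothing to do with Genifer) that clusters words by the contexts they occur in:

    import numpy as np
    from collections import defaultdict
    from scipy.cluster.hierarchy import linkage, fcluster

    corpus = ("the cat sees the dog the dog sees the cat "
              "a cat chases a dog a dog chases a cat").split()
    vocab = sorted(set(corpus))
    index = {w: i for i, w in enumerate(vocab)}

    # Context vector of w = counts of words seen immediately to its left/right.
    vectors = defaultdict(lambda: np.zeros(2 * len(vocab)))
    for i, w in enumerate(corpus):
        if i > 0:
            vectors[w][index[corpus[i - 1]]] += 1               # left neighbor
        if i + 1 < len(corpus):
            vectors[w][len(vocab) + index[corpus[i + 1]]] += 1  # right neighbor

    X = np.array([vectors[w] for w in vocab])
    labels = fcluster(linkage(X, method="average"), t=3, criterion="maxclust")
    for c in sorted(set(labels)):
        print([w for w, l in zip(vocab, labels) if l == c])
    # Roughly: the nouns group together, the verbs group together,
    # and the determiners group together, with no labels given.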

> Oh, I hate to ruin the party, but I have to, sorry; I have delayed this for
> a long time.
>
> AI is a pretty dangerous thing, if you ask me. The combinatorial possibilities
> of trillions of operations per second could, in a minute, think up a major hell
> to put us through, from just one tiny software or hardware mistake (i.e.
> substituting right for wrong). Personally, I WOULD NOT DARE to run an
> autonomous AI program, even if one existed. I don't trust myself that much,
> and no one should trust me that much.
>
> I'm for distributed, human-controlled inducing and deducing; that much, it
> seems, humans can take.

There is no risk that we could ever build something dangerous.
Globally we pay people $70 trillion per year to do work that machines
aren't smart enough to do. Do you really think we are going to solve
this problem by ourselves? Do you have any idea what it will cost, in
hardware, software, and human knowledge collection, to replicate the
function of billions of human brains, or even a single brain?


-- Matt Mahoney, mattma...@gmail.com

Ivan Vodišek

Sep 3, 2012, 5:58:39 PM
to general-in...@googlegroups.com
@kids
Get the fuck away from this post; you might get serious brain damage if you continue to read this. I'M SAD NOW!!!!

@ All others
Read this carefully if you want to be useful to humankind.

I think that humankind can do everything it can imagine. Once we thought we couldn't figure out how to make fire for heating. Once we thought we couldn't heal diseases. Once we thought we couldn't fly in the sky. And once we thought we couldn't interfere with evolution. People are breeding hybrids between humans and animals now; I've seen terrifying photos of those creatures. I wonder how those creatures feel, whether they face fucking fear their whole lives until they die. Scary enough? All because of experimenting with unknowns. Humankind can do it all, including making an autonomous artificial intelligence mechanism; we are smart kids.

Not afraid that machines could pepper down things on Earth? Not afraid of what machines could do to all living species in the Universe? Beyond the Universe? I'm terrified, sorry. All it takes is one mistake to make things hell-like. Afraid of killing all creatures in the Universe? How about making them feel fucking fear for all eternity? Play a song of fucking fear for eternity? And that is just what I can think of. Imagine what zillions of combinatorial operations per microsecond could dig out in the future. I'll leave out something even worse here; I think you've got the point. All it takes is one mistake.

Get real, we are not criminals. There is no excuse for abusing the power we have. No matter how smart we are, we make mistakes; you can't deny that. We have to be wise with the power of thinking; we would never forgive ourselves such a mistake. And I believe things could be much worse than in my description; we are aware only of what we have experienced.

Here is what people are taking up these days on the web. They invested a lot of time in it, an ontology language; try not to waste their achievement, use it for inspiration, use it wisely:

I've got a better ontology language. I've invested a lot of time too and got some results. In fact, that is the only thing I've been doing for the past 15 years, 4-9 hours per day. You might want to try to combine it with your achievements. Compare the number of keywords with the previous example. Both examples do exactly the same thing. They describe knowledge:
Yes, of course, maybe I made major mistakes in synth, maybe it is useless; you'll have to decide on that. I just don't want to hide synth from you, that's all. Fire up questions about it; I believe I've covered it all.

Now get sober from this torture and think about things again. I propose a human-controlled distributed database with deductive and inductive capabilities. That much I have decided to trust humankind with. An autonomous AGI is out of the question, sorry.

2012/9/3 Matt Mahoney <mattma...@gmail.com>

Matt Mahoney

Sep 3, 2012, 7:59:25 PM
to general-in...@googlegroups.com
On Mon, Sep 3, 2012 at 5:58 PM, Ivan Vodišek <ivanv...@gmail.com> wrote:
> I think that human kind can do everything they can imagine.

I didn't say that AGI is impossible. I said that we aren't going to be
the ones to build it.

The value of making machines smart enough to do all the work that
people can do is equal to the world GDP ($70 trillion) divided by
market interest rates. That is about US $1 quadrillion. Don't you
think that other people have been working on this? Already, a lot of
the simpler work has been automated.

People that think there is a simple solution to AGI haven't done a
cost estimate. The best known solutions to hard problems like
language, vision, and robotics use neural networks. A human brain
sized neural network has 10^15 connections and runs at 10 Hz. A
computer simulation requires 10 petaflops and 1 petabyte. Such
computers exist. They fill a large building and use 10 MW of power. If
you wait about 25 years, Moore's Law should bring the cost down to
make it competitive with human labor and we can build several billion
of them to automate the labor force.

There is also software. The human brain is complex. Evolution has
programmed in a lot of optimizations and hacks, hard coding lots of
specific functions like the sneeze reflex, fear of heights and
spiders, and an unknown algorithm for recognizing humor and good music
that you would need to replicate if you want to automate the
entertainment industry along with the rest of the economy. The
complexity is upper bounded by the information content of your DNA,
which is at most 6 x 10^9 bits. An equivalent program is about 10M
lines of code, which would cost about $1 billion at $100 per line. But
you only have to write it once and make lots of copies. Well, almost,
because humans are not genetically identical. It takes about 1000 bits
to describe your DNA given your parents' DNA, due mostly to mutations.
Assuming 100 bits per line of code, then you need 100B lines to
describe the diversity of human brains, or $10 trillion.
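
The arithmetic is easy to sanity-check (same assumptions as above, nothing new added):

    # Hardware
    connections = 1e15                     # synapses
    ops_per_sec = connections * 10         # at 10 Hz -> 1e16 = 10 petaflops
    memory = connections * 1               # ~1 byte/connection = 1 petabyte

    # Software, single brain: ~10M lines at $100/line
    single_brain_cost = 1e7 * 100          # $1e9 = $1 billion

    # Software, human diversity: 1000 bits/person x 1e10 people, 100 bits/line
    lines = 1000 * 1e10 / 100              # 1e11 = 100B lines
    diversity_cost = lines * 100           # $1e13 = $10 trillion

    print(f"{ops_per_sec:.0e} ops/s, {memory:.0e} bytes")
    print(f"${single_brain_cost:.0e}, ${diversity_cost:.0e}")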

But whether it's $1 billion or $10 trillion I don't care, because
either way it is insignificant compared to the cost of hardware and
human knowledge collection. Knowledge will be the most expensive
component once the hardware becomes affordable. People communicate
successfully with other people because they can guess what the other
person knows and how they will act. The reason computers don't
understand us is because they don't have models of our minds like we
do of other people. A model of a mind is a function that takes sensory
input and returns a prediction of your actions. With a model of you, I
could predict what would make you happy, or what would make you buy
something. If I programmed a robot to carry out those predictions in
real time, then I would have an upload of you. Already, companies like
Google and Facebook are building crude models of your mind every time
you write a message. They use these models to predict which emails or
posts or ads are likely to interest you and which ones to block, and
increasingly to understand our messages and respond intelligently.

We can estimate the cost of acquiring this knowledge. According to
Landauer ( http://csjarchive.cogsci.rpi.edu/1986v10/i04/p0477p0493/MAIN.PDF
) human long term memory capacity is 10^9 bits. About 99% of this
knowledge is written down or is known to other people. I estimate this
percentage based on the U.S. Dept of Labor's estimate that it costs 1%
of lifetime earnings to replace an employee. This leaves 10^7 bits
known only to you. It can only be learned through human communication
channels like speech, writing, or typing, which have a rate of about 2
to 5 bits per second each way over a 2 way channel. Human time is
worth about $5 per hour, assuming global per capita income of $10K and
2000 hours per year. Thus, it costs $10K per person, or $100 trillion
for the world population of 10 billion that is likely in 25 years.
That cost will rise as the economy grows and wages go up.
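
Roughly, picking ~3 bits/s from the range above and counting the time of both ends of the 2 way channel (my reading of the estimate):

    unique_bits = 1e7                  # knowledge known only to one person
    seconds = unique_bits / 3          # at ~3 bits/s
    person_hours = seconds / 3600 * 2  # both parties on the channel
    per_person = person_hours * 5      # $5/hour -> roughly $10K
    world = per_person * 1e10          # ~$1e14 = $100 trillion
    print(round(per_person), f"{world:.1e}")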

Clearly, AGI will require years of global effort. So we need to think
hard about what we actually plan to build.


-- Matt Mahoney, mattma...@gmail.com

Ivan Vodišek

Sep 4, 2012, 2:15:50 AM
to general-in...@googlegroups.com
Matt, do you really think it is worth the risk? We (humans) are going to mess something up this way.

Sorry, I don't want the thing we would build here. If we continue with these risky intentions, all I can do is hope that there is some higher natural force to put up some fences for us.

I'll say no more on this theme. Probably never; I wasn't taken seriously enough anyway.

2012/9/4 Matt Mahoney <mattma...@gmail.com>

Matt Mahoney

Sep 4, 2012, 9:56:27 AM
to general-in...@googlegroups.com
On Tue, Sep 4, 2012 at 2:15 AM, Ivan Vodišek <ivanv...@gmail.com> wrote:
> Matt, do you really think it is worth the risk? We (humans) are going to mess
> something up this way.

As far as I can tell, you, I, and YKY are the only ones still
participating on this list. Maybe there are others lurking, but nobody
here is going to build self replicating robots, artificial life,
uploads, or any other kind of autonomous superhuman AI. Maybe there
are people who think they can, but I at least have no illusions of
grandeur.


-- Matt Mahoney, mattma...@gmail.com

Ivan Vodišek

Sep 4, 2012, 10:36:36 AM
to general-in...@googlegroups.com
Sorry, humankind itself seems grand to me. YKY is after that, and I find it divine and ambitious. Yes, I believe YKY can do it; it requires a lot of work, and I've been following this list for years. Maybe a span of many generations is required to build the damn thing. It started with the invention of logic. It is a scientific investigation, and we need dreams to function.

2012/9/4 Matt Mahoney <mattma...@gmail.com>

james northrup

Sep 4, 2012, 3:07:39 PM
to general-in...@googlegroups.com
I'm watching, if only to see the math bandied about, not that my strong point is following the theoretical.

Having worked inside of and tinkered with knowledge bases and compression code and just about everything else, I haven't quite got my head around the math that binds smart handwaving with respectable algorithm implementations.  I can recognize the latter, however, and it's quite a bit rarer than the former.

I see a lack of alife involved in AI discussions, which looks like building the roof of the house first to me.  Humans observing and guiding an evolution of simple components towards goals might be more successful, towards goal solving and genericity, than a handful of competing philosophies and spiky exploratory projects, imho.  Definitely a multi-generational thing in internet time, Ivan.

Ivan Vodišek

Sep 4, 2012, 4:40:21 PM
to general-in...@googlegroups.com
YKY, please allow a suggestion. I've had some thoughts about another version of the knowledge base over several attempts, but never managed any success with that version. Matt, here you will see what problems typical loony AGI programmers are dealing with:

This one would incorporate sets, unions, and other set operations from mathematics. Consider this analogy between sets and number manipulation:

union = addition
difference = difference
intersect = something new
complement = negation


Numbers can form equations like

x + y + z = K

So, we can transform that expression to

x + y = K - z

Suppose we had a "union of three sets" equation.

This one might be interesting. Take a pen, draw set diagrams and follow the text.

a U b U c = S

I think of transforming it into

a U b = S "intersect" "complement of" c

I drew set diagrams to check it, and the transformation seemed to hold. What happened to set "c"? When it changed sides of the equation, it got complemented, and the union operator had to change to the intersect operator.

Let's see what happens with "union of two sets equation":

a U b = S

a = S "intersect" ("complement of" b)

This is not true (check the diagram), but it gives material for thinking. To make it true, we need to incorporate the intersection of the included sets:

a = S "intersect" (("complement of" b) "union" "intersection of a, b")

Let's see what happens with "intersection of two sets equation":

a "intersect" b = S

a = S "union" "complement of" b

If we draw a diagram here, we'll see that this is not completely true either. We need to intersect with a general domain that is the union of the sets included in the equation:

a = S "union" (("complement of" b) "intersect" "union of a, b")

By now we have had to incorporate two weird expressions: the union of all included sets and the intersection of all included sets.

Let's see what happens with "intersection of three sets equation":

a "intersect" b "intersect" c = S

a "intersect" b = S "union" "complement of" c

Also not true. This is the correct form:

a "intersect" b = S "union" (("complement of" c) "intersect" "complement of (intersect of a, b, c)")


Now put the pencil behind your ear :)

So in each case we have had a transformation of union to intersect-complement, or of intersect to union-complement, occasionally accompanied by some weird union of all included sets / intersection of all included sets / complement of that weird all-sets formation. I feel that something big is going on here, something that we could use to get a transformable, bipolar set knowledge base. But I didn't manage to work out the rules for the transformations. I tried several times, between long periods of doing something else, and never reached a complete solution. But I feel something really big is going on here. Well, I have the rest of my life to solve it. I'd like to know if anyone else solves this one in the future.
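
If anyone wants to poke at this, the candidate transformations are easy to sanity-check by brute force on random sets (plain Python; "complement" is taken relative to a fixed 10-element universe). The corrected two-set identity above, for instance, survives the test:

    import random

    UNIVERSE = set(range(10))
    comp = lambda x: UNIVERSE - x
    rand_set = lambda: {e for e in UNIVERSE if random.random() < 0.5}

    def holds(candidate, trials=10000):
        """candidate(a, b) returns (lhs, rhs); True if they matched on every trial."""
        return all(lhs == rhs for lhs, rhs in (candidate(rand_set(), rand_set())
                                               for _ in range(trials)))

    # Corrected "union of two sets" identity from above:
    #   a U b = S  =>  a = S "intersect" (("complement of" b) "union" "intersection of a, b")
    def two_set_union(a, b):
        S = a | b
        return a, S & (comp(b) | (a & b))

    print(holds(two_set_union))   # True on every random trial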

Matt, this is a pain in the ass for us, but we can do it. Just think of the joy we could give to the world. It's only a matter of time before at least some of our experiments succeed. We want to give you something you might be able to use; please have some understanding.

2012/9/4 james northrup <northru...@gmail.com>

Matt Mahoney

Sep 4, 2012, 4:58:48 PM
to general-in...@googlegroups.com
On Tue, Sep 4, 2012 at 4:40 PM, Ivan Vodišek <ivanv...@gmail.com> wrote:
> Matt, this is a pain in the ass for us, but we can do it. Just think of the joy
> we could give to the world. It's only a matter of time before at least some of
> our experiments succeed. We want to give you something you might be able to
> use; please have some understanding.

I do think that AGI is possible. But it is bigger than any of us can
solve. I have proposed a framework for solving it in
http://mattmahoney.net/agi2.html which maybe you have already read
since it is 4 years old. The protocol is not hard to implement, but
not a solution by itself. It is like defining the first versions of
the HTTP and HTML protocols, as opposed to building the web. It took
one person 6 weeks to write the first version of the NCSA web server
and Mosaic browser, and it ultimately changed the way that billions of
people use the internet.


-- Matt Mahoney, mattma...@gmail.com

Sandeep Pai

Sep 4, 2012, 5:01:15 PM
to general-in...@googlegroups.com
Why haven't you implemented it yet?

Matt Mahoney

Sep 4, 2012, 5:11:08 PM
to general-in...@googlegroups.com
On Tue, Sep 4, 2012 at 5:01 PM, Sandeep Pai <sandee...@gmail.com> wrote:
> Why haven't you implemented it yet?

It wouldn't be very useful until millions of people start using it. I
could write it in a few weeks, but it would take years before anyone
found it useful. Making it something that people wanted to use is a
much harder problem than making it work.

The idea is that you could post a message and it would go to anyone
who cared. The client would learn your preferences from the messages
you post, and send you messages with similar content. Facebook does
something similar when it ranks messages, and it has the advantage of
already having a billion users.
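
To make the idea concrete, here is a toy sketch (mine, for illustration only; it is not the protocol in agi2.html, and the bag-of-words profile and the 0.2 threshold are arbitrary choices):

    import math
    from collections import Counter

    def bag(text):
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    class Peer:
        def __init__(self, name):
            self.name, self.profile, self.inbox = name, Counter(), []
        def post(self, text, network):
            self.profile.update(bag(text))     # learn preferences from own posts
            network.route(text, sender=self)
        def offer(self, text):
            if cosine(self.profile, bag(text)) > 0.2:
                self.inbox.append(text)        # "anyone who cared" gets it

    class Network:
        def __init__(self, peers):
            self.peers = peers
        def route(self, text, sender):
            for p in self.peers:
                if p is not sender:
                    p.offer(text)

    alice, bob, carol = Peer("alice"), Peer("bob"), Peer("carol")
    net = Network([alice, bob, carol])
    bob.post("gardening tips for tomato plants", net)
    carol.post("image text compression algorithms", net)
    alice.post("neural network text compression models", net)
    print(carol.inbox)   # alice's post arrives: it matches carol's profile
    print(bob.inbox)     # [] -- nothing similar to gardening was posted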

The other useful thing it does is distributed indexing. It could be
1000 times bigger than Google, but before that happens it has to get
over the hump of being as big as Google. Otherwise people will just
use Google and it won't grow.

AGI is being built. That will happen with or without my proposed
protocol. It's just that the agents will use a more complex hodgepodge
of protocols to talk to each other. Or you could argue that I'm just
adding one more protocol to the hodgepodge.


-- Matt Mahoney, mattma...@gmail.com

Ivan Vodišek

Sep 4, 2012, 5:12:52 PM
to general-in...@googlegroups.com
I liked that hill climbing method from the framework :)

2012/9/4 Matt Mahoney <mattma...@gmail.com>

YKY (Yan King Yin, 甄景贤)

Sep 6, 2012, 2:51:48 AM
to general-in...@googlegroups.com
On Tue, Sep 4, 2012 at 1:41 AM, Matt Mahoney <mattma...@gmail.com> wrote:

Why matrices? The normal representation of a semantic concept is a vector.

Matrices because A•B ≠ B•A
         john•loves•mary ≠ mary•loves•john
If it were, our world would have much fewer troubles =)
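
(Purely for illustration, with made-up random matrices rather than our actual concept matrices from the slides:)

    import numpy as np
    rng = np.random.default_rng(0)
    john, loves, mary = (rng.standard_normal((3, 3)) for _ in range(3))
    print(np.allclose(john @ loves @ mary, mary @ loves @ john))      # False: order matters
    print(np.linalg.norm(john @ loves @ mary - mary @ loves @ john))  # Frobenius distance > 0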

Do you have a more detailed specification that includes a test set and
scoring criteria?

It can be used to refine Google search results, if the query is a text question and the results are textual...

> 1.  Simple English grammar in logic form
> ===============================
> We'll use a very simple grammar consisting of 5 lines.

Oh, really? When did the number of rules drop from the millions down to 5?

The toy grammar is:
    Sentence ::= NP VP
    NP ::= Pronoun
    Pronoun ::= "I" | "you"
    VP ::= Verb NP
    Verb ::= "love"
It can only recognize 2 sentences, but they're important =)

Remember that parsing can be broken into syntactic and semantic parsing.  The latter generates logic formulas as the target, and is quite difficult because of the lack of a nice logical form.

Now the good news is that we can do syntactic and semantic parsing in 1 step, using a very elegant formulation.  For example, we'd say informally:

    "If x is a noun phrase and y is a verb phrase, and y follows x, then x concatenated with y would be a sentence".

This is complicated to do in classical logic, but in our new logic it can be done easily:
    x Є NP, y Є VP, x follows y -> x*y Є Sentence
where "Є" is the set-element symbol.

>  This will be used as
> a seed for machine learning later.

How would that work?

Above, I've shown how parsing can be done purely as logic inference.  Logic rules such as the one above can be learned by machine.

Also, what are the overall project goals? How do we know if they are
being achieved? I assume we are being paid in recognition for our
work, as opposed to being paid in play money or real money, so this is
important. What is the challenge?

A possible application is to refine search engine results, to answer questions.  Algorithm:

1.  Translate question to matrix form.  If more than 1 proposition, they are the union of several points.

2.  Scan document sentence by sentence, find their locations in matrix space.  A match is found when the point(s) occur in the proximity of the target point(s).

Other applications may be possible too....
KY

Ivan Vodišek

Sep 6, 2012, 5:40:51 AM
to general-in...@googlegroups.com
YKY, could you make use of a JavaScript shift-reduce parser?

You'd have to equip me with BNF rules for parsing, and I would readjust my existing code to parse any text by those rules. One public function would be exposed.

The function's parameter would be:
  • the text to parse
The function would return:
  • a JavaScript object containing the syntax tree of the parsed text
Right now the code I have is adjusted to parse a BNF rules string and produce a syntax tree of those rules (currently I had to put a semicolon at the end of every line, which is not the case in the standard BNF definition on the wiki). In a week or two, if I get lucky, I should be able to publish a parser for the readjusted rules. You can check the current version at: http://styled.host-ed.me/sharing/bnf-parser-1.html

Of course, it would be ideal to have a universal function that can derive the syntax tree of a text given a second parameter, a string of BNF rules for parsing that text. But it is complex stuff; no one has done it yet, and there are indications that it can't be done at all (I think it is connected to the famous halting problem). Of course, I've been working on that one too (silly me), but I haven't yet covered all possible combinations of grammars; I get infinite loops for some grammars.

Anyway, maybe your grammar could work with the readjusted code, and if the JavaScript shift-reduce approach I mentioned suits you, I'd be happy to be of help.

2012/9/6 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>

Matt Mahoney

Sep 6, 2012, 11:16:25 AM
to general-in...@googlegroups.com
On Thu, Sep 6, 2012 at 2:51 AM, YKY (Yan King Yin, 甄景贤)
<generic.in...@gmail.com> wrote:
> On Tue, Sep 4, 2012 at 1:41 AM, Matt Mahoney <mattma...@gmail.com>
> wrote:
>
>> Why matrices? The normal representation of a semantic concept is a vector.
>
>
> Matrices because A•B ≠ B•A
> john•loves•mary ≠ mary•loves•john
> If it were, our world would have much fewer troubles =)

Matrix multiplication is associative. A(BC) = (AB)C, but pretty
(girl's school) ≠ (pretty girl's) school.

And what do the matrix elements represent anyway?

>> Do you have a more detailed specification that includes a test set and
>> scoring criteria?
>
> It can be used to refine Google search results, if the query is a text
> question and the results are textual...

OK. Are we going to collect a few hundred Google search results and
re-rank them by hand?

>> > 1. Simple English grammar in logic form
>> > ===============================
>> > We'll use a very simple grammar consisting of 5 lines.
>>
>> Oh, really? When did the number of rules drop from the millions down to 5?
>
>
> The toy grammar is:
> Sentence ::= NP VP
> NP ::= Pronoun
> Pronoun ::= "I" | "you"
> VP ::= Verb NP
> Verb ::= "love"
> It can only recognize 2 sentences, but they're important =)

I count 4, of which 2 are grammatically incorrect and one seems
unlikely. So I guess you need more rules. How many rules do you need
to recognize all of English?

> Remember that parsing can be broken into syntactic and semantic parsing.
> The latter generates logic formulas as target, and is quite difficult
> because of the lack of a nice logical form.
>
> Now the good news is that we can do syntax and semantic parsing in 1 step,
> using a very elegant formulation. For example, we'd say informally:
>
> "If x is a noun phrase and y is a verb phrase, and y follows x, then x
> concatenated with y would be a sentence".
>
> This is complicated to do in classical logic, but in our new logic it can
> done easily:
> x Є NP, y Є VP, x follows y -> x*y Є Sentence
> where "Є" is the set-element symbol.
>
>> > This will be used as
>> > a seed for machine learning later.
>>
>> How would that work?
>
>
> Above, I've shown how parsing can be done purely as logic inference. Logic
> rules such as the one above can be learned by machine.
>
>> Also, what are the overall project goals? How do we know if they are
>> being achieved? I assume we are being paid in recognition for our
>> work, as opposed to being paid in play money or real money, so this is
>> important. What is the challenge?
>
> A possible application is to refine search engine results, to answer
> questions. Algorithm:
>
> 1. Translate question to matrix form. If more than 1 proposition, they are
> the union of several points.

What is the mapping of words to matrices?

> 2. Scan document sentence by sentence, find their locations in matrix
> space. A match is found when the point(s) occur in the proximity of the
> target point(s).
>
> Other applications may be possible too....
> KY

Google doesn't parse queries or documents. It starts with matching
words and phrases in the query to the document. It also matches
synonyms and grammatical variants, which it learns because related
words are likely to occur on the same document. It uses PageRank,
which increases the rank of pages that have links to them from other
highly ranked pages. It also increases the rank of any pages you
visit, which it can do because they give away a browser that collects
these statistics. It personalizes results by increasing the rank more
for you than for others, and by learning your preferences, whether by
using Chrome or an Android phone or gmail or youtube or any of their
other services. Their algorithm is fast because they have enough
computing power to keep a copy of the internet cached entirely in RAM.

Please tell me how you plan to compete with that.

--
-- Matt Mahoney, mattma...@gmail.com

YKY (Yan King Yin, 甄景贤)

Sep 6, 2012, 1:40:03 PM
to general-in...@googlegroups.com
On Thu, Sep 6, 2012 at 11:16 PM, Matt Mahoney <mattma...@gmail.com> wrote:

> Matrices because A•B ≠ B•A
>          john•loves•mary ≠ mary•loves•john
> If it were, our world would have much fewer troubles =)

Matrix multiplication is associative. A(BC) = (AB)C, but pretty
(girl's school) ≠ (pretty girl's) school.

In my formulation,
    pretty (girls' school)  ==>  (pretty, girls') * school
    (pretty girls') school  ==>  pretty * girls' * school
because in the first case, both "pretty" and "girls' " modify "school".

And what do the matrix elements represent anyway?

Well, it's a bit complex.  Referring to these slides (slightly updated).  Basically, from an ontology of concepts we can distribute concepts in a high dimensional vector space (slides 3,4,5,12,13,14,15).  Then each vector corresponds to a matrix simply by rearranging entries in a square matrix (slide 11).  In the vector space the distance is given by the Euclidean norm, whereas in the matrix space the distance is given by the Frobenius norm (which is the same), and doing so will respect invariance under rotation.  Thus we have the matrices for each concept.

> It can be used to refine Google search results, if the query is a text
> question and the results are textual...

OK. Are we going to collect a few hundred Google search results and
re-rank them by hand?

The purpose is not just to re-rank search results, but we can select from a bunch of results to find the text that specifically answers a question.  It can be a complex sentence or even a story with a few lines.

I count 4, of which 2 are grammatically incorrect and one seems
unlikely. So I guess you need more rules. How many rules do you need
to recognize all of English?

Yes, the case of "me" pronoun could serve as a test for machine learning -- see if our system can learn when to use "me" instead of "I".  In classical (predicate) logic this involves "predicate invention" which is considered difficult.  I have not looked into this yet, but my new logic may have some advantages for this task...

As for how many grammar rules we need, I have no idea yet.  Could be up to millions of rules, but the core generalizations could be much fewer.  I've asked this question on StackOverflow.

> 1.  Translate question to matrix form.  If more than 1 proposition, they are
> the union of several points.

What is the mapping of words to matrices?

Explained in the slides.

Google doesn't parse queries or documents. It starts with matching
words and phrases in the query to the document. It also matches
synonyms and grammatical variants, which it learns because related
words are likely to occur on the same document. It uses PageRank,
which increases the rank of pages that have links to them from other
highly ranked pages. It also increases the rank of any pages you
visit, which it can do because they give away a browser that collects
these statistics. It personalizes results by increasing the rank more
for you than for others, and by learning your preferences, whether by
using Chrome or an Android phone or gmail or youtube or any of their
other services. Their algorithm is fast because they have enough
computing power to keep a copy of the internet cached entirely in RAM.

Please tell me how you plan to compete with that.

We don't need to compete head-on with Google, but rather post-process Google's search results, to better answer complex queries.

Also, this is just one possible application....

KY 

Matt Mahoney

Sep 6, 2012, 2:20:39 PM
to general-in...@googlegroups.com
On Thu, Sep 6, 2012 at 1:40 PM, YKY (Yan King Yin, 甄景贤)
<generic.in...@gmail.com> wrote:
> On Thu, Sep 6, 2012 at 11:16 PM, Matt Mahoney <mattma...@gmail.com>
> wrote:
>
>> > Matrices because A•B ≠ B•A
>> > john•loves•mary ≠ mary•loves•john
>> > If it were, our world would have much fewer troubles =)
>>
>> Matrix multiplication is associative. A(BC) = (AB)C, but pretty
>> (girl's school) ≠ (pretty girl's) school.
>
>
> In my formulation,
> pretty (girls' school) ==> (pretty, girls') * school
> (pretty girls') school ==> pretty * girls' * school
> because in the first case, both "pretty" and "girls' " modify "school".

Actually, I used the parentheses to mean the school that the pretty girl goes to.

>> And what do the matrix elements represent anyway?
>
>
> Well, it's a bit complex. Referring to these slides (slightly updated).
> Basically, from an ontology of concepts we can distribute concepts in a high
> dimensional vector space (slides 3,4,5,12,13,14,15). Then each vector
> corresponds to a matrix simply by rearranging entries in a square matrix
> (slide 11). In the vector space the distance is given by the Euclidean
> norm, whereas in the matrix space the distance is given by the Frobenius
> norm (which is the same), and doing so will respect invariance under
> rotation. Thus we have the matrices for each concept.

I don't see the point. The elements of a semantic vector represent
associations to other words or concepts. The dot product of two word
vectors tells you the extent that the words are related to each other.
The order of the elements is arbitrary, as long as they are
consistent. If you rearrange the elements into a matrix and multiply,
then you are combining unrelated terms in a way that depends on this
arbitrary order.


-- Matt Mahoney, mattma...@gmail.com

YKY (Yan King Yin, 甄景贤)

Sep 6, 2012, 2:49:03 PM
to general-in...@googlegroups.com
On Fri, Sep 7, 2012 at 2:20 AM, Matt Mahoney <mattma...@gmail.com> wrote:

I don't see the point. The elements of a semantic vector represent
associations to other words or concepts. The dot product of two word
vectors tells you the extent that the words are related to each other.
The order of the elements is arbitrary, as long as they are
consistent. If you rearrange the elements into a matrix and multiply,
then you are combining unrelated terms in a way that depends on this
arbitrary order.


Yes, the dot product (or the Euclidean distance between the 2 vectors) tells you how similar 2 concepts are.

The entries of each vector do not have specific meanings, insofar as the entire vector represents 1 concept.  It's the distance between vectors that matters.

If we translate a vector to a matrix, the resulting matrix represents a linear transformation.  These linear transformations can be composed (i.e., multiplied).

If we multiply 2 matrices, we get a 3rd matrix, corresponding to the composed concept.  Then we can convert it back to the vector space.  This conversion preserves distances, in the sense that d(A,B) in matrix space = d(f A, f B) in vector space.

The point of this is to get the "location" of the new composite, so we can measure distances between it and other composites...
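
A quick numpy check of these mechanics, with random vectors standing in for our concept vectors (the real ones would come from the ontology):

    import numpy as np
    rng = np.random.default_rng(1)
    n = 4
    a_vec, b_vec = rng.standard_normal(n * n), rng.standard_normal(n * n)

    A, B = a_vec.reshape(n, n), b_vec.reshape(n, n)    # vector -> matrix
    # The rearrangement preserves distance: Frobenius norm = Euclidean norm.
    print(np.isclose(np.linalg.norm(A - B, "fro"),
                     np.linalg.norm(a_vec - b_vec)))   # True

    composed = A @ B                                   # compose two concepts
    print(composed.reshape(-1).shape)                  # back in the vector space: (16,)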

KY

YKY (Yan King Yin, 甄景贤)

Sep 6, 2012, 3:06:19 PM
to general-in...@googlegroups.com
A bit more explanation:

Forget about vectors, 'cause they are secondary.

What we really want is to represent concepts as matrices, since they have non-commutative multiplication.  We can multiply them and measure the distances between them (the Frobenius norm).

It just happens that matrices can be vectorized, and the Frobenius norm then becomes the Euclidean norm between the corresponding vectors.  So it is easier to visualize matrices as vectors, and think of their "positions" in space.

Does it make sense now?
KY

Matt Mahoney

Sep 6, 2012, 10:07:36 PM
to general-in...@googlegroups.com
In a matrix representation of a semantic language model, the rows and
columns are words from the vocabulary, and the elements are the
probabilities that these words will appear near each other. Thus, each
word is a row or column vector, and the matrix is symmetric, which
makes multiplication commutative. The square of this matrix gives you
inferred similarities, like "snow" and "water" are related to each
other because they are mutually related to "cold" or "wet".

To represent word ordering constraints (grammar), you can construct
another matrix where the elements give the probabilities of one word
immediately following another. This matrix is not symmetric. The n'th
power of this matrix gives the probability that one word will be
followed by the other n places later.
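
A toy version of both matrices (tiny corpus, plain numpy; the wrap-around is only there so every word has a successor):

    import numpy as np

    corpus = "the cat sat on the mat the dog sat on the rug".split()
    vocab = sorted(set(corpus))
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    bigram = np.zeros((n, n))
    for w1, w2 in zip(corpus, corpus[1:] + corpus[:1]):   # wrap around
        bigram[idx[w1], idx[w2]] += 1
    bigram /= bigram.sum(axis=1, keepdims=True)           # P(next word | word)

    cooc = bigram + bigram.T            # crude symmetric "near each other" score
    two_step = bigram @ bigram          # P(word two places later | word)

    print(np.allclose(bigram, bigram.T))          # False: word order matters
    print(np.allclose(cooc, cooc.T))              # True: symmetric
    print(two_step[idx["the"]].round(2))          # distribution two words after "the"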

But I guess you have something different in mind.

YKY (Yan King Yin, 甄景贤)

Sep 8, 2012, 5:16:54 AM
to general-in...@googlegroups.com
On Thu, Sep 6, 2012 at 5:40 PM, Ivan Vodišek <ivanv...@gmail.com> wrote:
YKY, could U make a use of javascript enriched shift-reduce parser?

Good news:  the "John loves Mary" grammar is successfully parsed (by the Clojure prototype), using only 6 logic rules.

Perhaps you can help translate (with your scripts) an existing grammar into Genifer logic?

There are a few such grammars on the web, eg Attempto.

Here are my Genifer logic rules:

1)  NP VP is-a sentence <- NP is-a noun-phrase, VP is-a verb-phrase, text Fill1 NP VP Fill2

2)  V NP is-a verb-phrase <- V is-a verb, NP is-a noun-phrase, text Fill1 V NP Fill2

3)  N is-a noun-phrase <- N is-a noun

4) X is-a verb <- lexeme-loves X

5) X is-a noun <- lexeme-John X

6) X is-a noun <- lexeme-Mary X

The input is:
    text john loves mary
    lexeme-John john
    lexeme-loves loves
    lexeme-Mary mary

=)
KY

Ivan Vodišek

Sep 8, 2012, 6:41:23 AM
to general-in...@googlegroups.com
YKY, I'm glad you might be able to make use of the parser.

I've decided that I need a universal parser badly enough that I'll try to implement it, with custom grammar-manager adjustments available for scientists to tune manually according to their needs. It seems that natural language might be a source of interesting data for scientists, from which humanity can benefit a lot.

Attempto? Thanks, I'll use it as a test for the universal parser, if I'm able to extract just the grammar rules without the code.

I'll inform you of my results in a week. Maybe that is enough time for the whole thing.

2012/9/8 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>

Ivan Vodišek

Sep 18, 2012, 1:59:10 PM
to general-in...@googlegroups.com
People, do you know what could be done with artificial intelligence?

Yes, it is that important; it could solve the most important problem on the whole Earth!

Another solution for the survival of living beings could be found. We could investigate the rules of the Universe with Genifer to find a way of inventing artificial food. Animals, very beautiful beings, would no longer be violated and undermined by us.

YKY, I love You very very very very very very.

Yes, it is that important to me.

Thank You for being that important for humanity. Living beings need friends, and I think You are the one.

Thank You very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very very

I did not use a copy-paste solution to write that last expression.

I would do everything for You.

Love here from me; I would love to be Your friend.

2012/9/8 Ivan Vodišek <ivanv...@gmail.com>