Is there a wiki page for this project? I forget.
2011/7/19 YKY (Yan King Yin, 甄景贤) <generic.in...@gmail.com>:
Do you have a wiki page somewhere with links to "Background Reading"?
Wikis are sticky. I recommend brushing it off or starting a new one.
--
Charles Esterbrook
http://charles-esterbrook.com
-- I would just use a Google Project which gives you a wiki, along
with a Google Group which gives you a forum. I went the custom route
with Cobra and now I have to spend time on maintenance. :-(
-- I'm limited in time right now, so I'm limited to discussions. I
don't think this will change for a while.
-- I'm partial to the BSD or MIT licenses. I forget what you're using,
but I don't think it's one of those.
-- A member of the Cobra community is working on the JVM back-end and
has made progress recently, but it's a long road and I don't think it
will be ready for several months or more. Despite the long timeline,
that will be my approach in the future to writing reasonably fast
software that runs on both .NET and JVM.
-- If you make progress on the software and provide instructions for
set-up, I may be able to squeeze in some testing and feedback.
Cats need herding. Point us in your preferred direction.
You don't have to repeat their content--just specify project stuff and
then point to the other wikis for more info.
If you're really going to work through OpenCog then obviously this is
moot. But then I get the impression that you're working outside
OpenCog.
>> -- A member of the Cobra community is working on the JVM back-end and
>> has made progress recently, but it's a long road and I don't think it
>> will be ready for several months or more. Despite the long time line,
>> that will be my approach in the future to writing reasonably fast
>> software that runs on both .NET and JVM.
>
> Do you really expect people to write an AGI in Cobra?
What I expect is that people will do their AGI work in whatever they
like which includes C, C++, Java, C#, Haskell, Ocaml, LISP, etc.
> It seems that you're creating a cross-platform language and you expect it to
> take over the world. I'm neutral about that, but it just appears
> unrealistic to me.
I don't expect it to "take over the world", but I do expect that upon
having a mature JVM back-end, it will be significantly more popular,
which is generally beneficial for any open source community.
Therefore, I'm looking forward to it.
I also enjoy productivity benefits in Cobra that I don't find in C#,
Java, Python, Ruby and other languages. I also have the power to
improve it as needed which I can't generally do with the others (in a
practical sense).
> So the precondition for you to actively participate is if we all code in
> Cobra? Would you accept other possibilities...?
Practically, I don't have time to dive into something interesting
right now like Ocaml or Haskell--I wish I did! Languages I would never
use for AGI are C, C++ and Java, for reasons I think we have already
covered.
I'd probably have an easier time with C# (and F#) because I use it at
my day job, so there is less cognitive load when switching between
projects.
But these points are largely specific to me. Well, I do have arguable
reasons why some languages are bad and some are good. But putting that
aside, those views vary per person. I don't expect you to manage
Genifer based solely on my views. I offer them as input, like others
here do.
I stand by my earlier statement that you should quit worrying who will
join and who won't. Pick a decent language, set some project goals and
then recruit new folks to help out.
You seem to like LISP and have established that it can run on .NET,
JVM and outside. And it wouldn't be unheard of to use LISP on an AI
project. :-)
>> -- If you make progress on the software and provide instructions for
>> set-up, I may be able to squeeze in some testing and feedback.
>
> Right now I'm doing 100% of the coding. I can do it, but it's going very
> slow, and I also believe that things should not be like this. What have I
> done wrong??
Well you change your mind a lot. :-)
I used to do 100% of the coding on the Cobra project, but now there
are other contributors. I had to lead out on my own (and wanted to
since I had very specific goals in mind).
I suggest you consider "recruiting" as a regular, ordinary activity.
To do it, you will need some goals, guidelines, etc. that you can
present. And you need some stable decisions. Good luck.
There is no document telling anyone else what they are supposed to write.
"YKY has done the Lisp version of this but needs some help translating to Haskell."
Point me to the Lisp version, and I'll go wild with translation.
And then switch again when the next developer says "Wah, I want to use
bzr!" and then switch again when the next developer says "Wah, I want
to use hg!" and then darcs, git, ...
Let's switch every 6 months so each person can use their favorite tool
at least once.
;-)
> PPS: if you don't like using Hg, we're considering switching to Git. What's your preference?
In my experience, Git, alas, is preferred.
YKY's list of tasks still lacks a specification. A specification describes what the software will do and not how it will work. I guess I should give up on trying to get YKY to write one and just write it myself. Can we agree on the following?
1. We are not going to automate a $60 trillion/year global economy, launch a singularity, or solve the life extension or uploading problems by ourselves.
2. We are not going to solve AGI by any definition that implies (1).
3. In particular we are not going to write a programming assistant that rewrites a smarter version of itself. It is mathematically impossible.
So I would like to discuss what you think it might be worthwhile to work on. Keep in mind that you won't get paid. I'm not going to make any pretense over shares or virtual credits or play money. This is not a company or business. Any reward you get for your efforts will be to your reputation for having written something cool, that has an impact on the way people use computers. You are free to work on whatever you want and do it how you want, or not. The purpose of this discussion is to coordinate our actions. Whether or not your code is useful will depend on what others decide to write. So here are some suggestions:
1. Write a Mailpool server. It should be a website that presents the user with an email client that lacks a "to" box. Initially it might just present a list of every other message posted by anyone else, sorted from newest to oldest. The ranking problem can be solved later. The important requirement at this point is that it should implement the CMR protocol described in the appendix of http://mattmahoney.net/agi2.html specifically HTTP handshake and Diffie-Hellman key exchange, unless you can suggest something better. Once others start independently writing peers, we will be stuck with whatever protocol we have chosen.
2. Solve the ranking problem for Mailpool for text messages. This is a language modeling problem. The success of your algorithm will ultimately be measured by user feedback and the prices you can charge for advertising on an open market. I believe that this will correlate with compression ratio, so I will probably work in this area. You may prefer a different approach. I have suggested that OpenCog could implement a peer, for example.
3. Extend the ranking problem to images and video (i.e. solve the vision problem) such as matching names to faces.
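Item 1 leans on Diffie-Hellman key exchange. The CMR appendix is not reproduced in this thread, so what follows is only a textbook Diffie-Hellman sketch in Python, not the CMR handshake itself. The modulus 2**127 - 1, the generator 3, the SHA-256 key derivation and the helper names are all illustrative assumptions, and these parameters are far too weak for real use.

```python
import hashlib
import secrets

# Toy parameters (assumption): 2**127 - 1 is a Mersenne prime, but a real
# peer should use a vetted group (e.g. RFC 3526) or a modern curve like X25519.
P = 2**127 - 1
G = 3

def make_keypair() -> tuple[int, int]:
    """Return (private, public) for one side of the exchange."""
    private = secrets.randbelow(P - 3) + 2   # secret exponent in [2, P-2]
    public = pow(G, private, P)              # G^private mod P, sent in the clear
    return private, public

def shared_key(my_private: int, their_public: int) -> bytes:
    """Combine my secret with the peer's public value, then hash to a 32-byte key."""
    secret = pow(their_public, my_private, P)          # G^(ab) mod P on both sides
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

# Usage: each peer sends only its public value; both derive the same key.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()
assert shared_key(a_priv, b_pub) == shared_key(b_priv, a_pub)
```

The agreement works because (G^a)^b and (G^b)^a are equal mod P; an eavesdropper sees only G^a and G^b.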
The success of this project will depend on others finding it useful. I suggest publishing your code open source (GPL, MIT or whatever license you feel is appropriate). I don't care what language you write it in.
Comments?
-- Matt Mahoney, matma...@yahoo.com
"Solve the ranking problem for Mailpool for text messages" - better
language understanding/search/text mining would certainly have many
uses, Mailpool among them. One guy I know who's doing biomedical
research told me the one thing AI researchers could possibly come up
with in the near future that would really make a difference to him
would be something that could sift through thousands of published
papers and bring to his attention the ones most relevant to his work,
based on understanding the content not just keyword search.
The question is how to break down this task further, including how to
do it in such a way as to make use of work other people are already
doing. It's too big a job for one person to take on as a single chunk.
For example, I'm trying to figure out how to write a general-purpose
inference engine; that's a plenty big enough job for me, and it would
only be one component of a system that could do a reasonably thorough
job of understanding English.
So if you want to go that route, I'm inclined to think a good starting
point would be to survey what relevant work is already being done in
the area. For example, I know the OpenCog guys are doing some work on
natural language understanding. Would it be possible to make use of
their code? If so, how? If not, why not?
However, they have not solved the natural language problem either. They are still using traditional parsing techniques, which, as we all know, don't work very well.
I don't know a good solution that isn't expensive. Specialization reduces the computational load, but there is still a lot of human knowledge that the system needs that isn't written down. That's why I'm suggesting Mailpool as our initial approach. As people use it, the knowledge will be collected.
-- Matt Mahoney, matma...@yahoo.com
What do you mean by different areas? Subject matter? Parsing methods?
Something else?
The primary use of P2P is to evade censorship. It would be more useful if there were search indexes for it. Even that doesn't work well because search is controlled by a few large engines like Google and Bing. Hopefully Mailpool would solve that problem by implementing distributed indexing.
P2P is hampered by the difficulty of running servers on home computers. There are problems with slow upload speed, multiple firewalls blocking incoming requests, dynamic IP addresses, the lack of DNS names, reliability, and security. It is easier to rent space from a hosting service or cloud and let someone else solve those problems. CMR could then be implemented as a CGI script.
The CMR protocol is intended for peer-to-peer communication. A peer is both a server and a client. Peer-to-user communication can still be done through a standard web server and browser.
It would be good to see CMR implemented over multiple transport protocols like HTTP, email, and existing P2P. A CMR address is a string that uniquely identifies a peer. It could be a URL or an email address.
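Since a CMR address may be a URL or an email address, a peer has to choose a transport from the address string alone. A minimal sketch, assuming a hypothetical helper `transport_for` and simple classification rules that are not part of any CMR specification:

```python
from urllib.parse import urlparse

def transport_for(address: str) -> str:
    """Guess the transport implied by a CMR address (hypothetical rules)."""
    parsed = urlparse(address)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "http"
    if parsed.scheme == "mailto":
        return "email"
    # Bare "user@host": exactly one '@', non-empty local part, dotted host.
    local, sep, host = address.partition("@")
    if sep and local and "." in host and "@" not in host:
        return "email"
    raise ValueError(f"unrecognized CMR address: {address!r}")
```

Other transports, such as an existing P2P network, would add their own URL schemes to the first check.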
>Message content will be written in different ways and in different narrative modes.
Messages should be human-understandable. CMR doesn't require anything more specific than that. For example, if a message was translated from French, the translator might put at the beginning "(Translated from French by X)" or something.
>Some service peers on the network can be programmed to respond with suggestions for revisions of messages you author, and the process repeats if a revised message is transmitted again. It can correct spelling, suggest extra information you may want to include, or show an example of a similar message.
There is no provision for updating or deleting messages. It would be a security risk. Instead, you post another message with a newer timestamp.
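The no-update, no-delete rule, combined with the newest-first listing suggested for a first Mailpool server, can be sketched as an append-only store. The `Message` and `Pool` names are hypothetical; a correction is simply another post with a later timestamp, so it sorts ahead of the message it corrects.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass(frozen=True)
class Message:
    author: str
    body: str
    timestamp: datetime   # note: there is no "to" field at all

@dataclass
class Pool:
    """Append-only message pool: messages can be posted but never edited or deleted."""
    _messages: List[Message] = field(default_factory=list)

    def post(self, author: str, body: str,
             timestamp: Optional[datetime] = None) -> Message:
        msg = Message(author, body, timestamp or datetime.now(timezone.utc))
        self._messages.append(msg)   # the only mutation the pool supports
        return msg

    def newest_first(self) -> List[Message]:
        """Every message in the pool, newest first (ranking is left for later)."""
        return sorted(self._messages, key=lambda m: m.timestamp, reverse=True)
```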
-- Matt Mahoney, matma...@yahoo.com
By topic. For example, one peer might be an expert on the history of aspirin. It might respond to any query with the words "history" and "aspirin" in it with a 1 page article on the subject that was prepared in advance. A more intelligent version might respond to other queries like "when was aspirin discovered?". The idea is to respond to X with Y whenever the mutual information between X and Y is high.
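Mutual information between two texts can be approximated with a compressor, which ties in with the compression-ratio idea for ranking: with C(s) the compressed length, I(X;Y) is roughly C(X) + C(Y) - C(XY). The sketch below uses zlib and picks the prepared article that shares the most information with a query. This is a swapped-in illustrative technique (in the spirit of normalized compression distance), not the method Matt proposes; the function names are hypothetical, and zlib is a weak model for short texts.

```python
import zlib

def clen(text: str) -> int:
    """Compressed length in bytes, a rough stand-in for information content."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def shared_information(x: str, y: str) -> int:
    """Approximate I(X;Y) as C(X) + C(Y) - C(XY)."""
    return clen(x) + clen(y) - clen(x + " " + y)

def best_reply(query: str, articles: dict[str, str]) -> str:
    """Key of the prepared article that shares the most information with the query."""
    return max(articles, key=lambda k: shared_information(query, articles[k]))
```

Text repeated between query and article lets the compressor encode the pair more cheaply than each part alone, so related pairs score higher.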
More generally, the topic may be chosen implicitly by its users. The peer would keep a copy of X. Later it might receive a message Z that had high mutual information, like "Bayer patented aspirin in 1900 according to http://inventors.about.com/library/inventors/blaspirin.htm". Then it would forward Z to the sender of X, and X to the sender of Z. In this way, a peer would send, receive, and accumulate lots of messages that had a lot of mutual information, even if it was not initially a specialist on any particular topic.
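The store-and-cross-forward behaviour described above can be sketched as a small peer. Here word-set overlap (Jaccard similarity) is a crude stand-in for mutual information, and the 0.1 threshold, the rule against forwarding a sender's own messages back to them, and all names are assumptions for illustration.

```python
import re
from dataclasses import dataclass, field

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a: str, b: str) -> float:
    """Jaccard similarity of word sets, a crude stand-in for mutual information."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

@dataclass
class ForwardingPeer:
    threshold: float = 0.1   # hypothetical cut-off
    stored: list[tuple[str, str]] = field(default_factory=list)   # (sender, text)

    def receive(self, sender: str, text: str) -> list[tuple[str, str]]:
        """Store message Z and return (recipient, text) pairs to send out."""
        forwards = []
        for old_sender, old_text in self.stored:
            # assumption: never echo a sender's own messages back to them
            if old_sender != sender and overlap(text, old_text) >= self.threshold:
                forwards.append((old_sender, text))   # Z goes to the sender of X
                forwards.append((sender, old_text))   # X goes to the sender of Z
        self.stored.append((sender, text))
        return forwards
```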
AGI comes from having billions of such specialists covering every conceivable topic.
We do not have to build billions of narrow experts. We only need to write the software on which they will run. People have an incentive to publish data of their choosing and pay for making it available. That is why the above webpage exists, and the reason that Google exists so that I could easily find it. The CMR protocol allows peers to set their own policies with regard to replying to, saving, or deleting messages. Peer owners can charge fees to give user messages preferential rankings to offset the cost of running a peer. They will have an incentive to provide high quality information that people want so that they can charge higher fees.
-- Matt Mahoney, matma...@yahoo.com
Okay. So presumably the software can be divided into a framework
common to all topics, plus topic-specific content. The task for AI
researchers then is...
> We do not have to build billions of narrow experts. We only need to write the software on which they will run.
... to design the framework, exactly. Any ideas on how to break down
that task into smaller chunks?
Sent: Monday, July 25, 2011 11:18 AM
Subject: Re: [GI] Direction of Genifer
The more difficult part of the job, however, starts when the message
has arrived at the right node tagged for the attention of the right
agent: how does that agent actually understand what the message means?
That's the part we don't know how to do yet.
It's what we're doing now. One nice feature of Mailpool as conceived
so far is that the infrastructure is straightforward enough to build.
The intelligence grows into/onto it as it proves itself. Right now
you could measure responses as part of a message's "value" or simply
+1 a message to keep in cache. Eventually the old "important"
messages are no longer important, so they'll age out. Also for narrow
domains the relevance of certain ideas will be predictable enough to
offload the responsibility to the agent that manages it for you.
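The "+1 to keep in cache, old messages age out" idea can be sketched with votes whose weight halves over time. The half-life, the eviction floor and the `VoteCache` name are hypothetical parameters chosen for illustration, not part of the Mailpool design.

```python
class VoteCache:
    """Keep messages whose decayed +1 score stays above a floor (hypothetical parameters)."""

    def __init__(self, half_life: float = 7 * 24 * 3600, floor: float = 0.5):
        self.half_life = half_life   # seconds for a vote's weight to halve (assumed)
        self.floor = floor           # evict when the decayed score drops below this
        self.votes: dict[str, list[float]] = {}   # message id -> vote timestamps

    def plus_one(self, msg_id: str, now: float) -> None:
        self.votes.setdefault(msg_id, []).append(now)

    def score(self, msg_id: str, now: float) -> float:
        # Each vote contributes 0.5 ** (age / half_life), so old votes fade.
        return sum(0.5 ** ((now - t) / self.half_life)
                   for t in self.votes.get(msg_id, []))

    def evict_stale(self, now: float) -> list[str]:
        """Drop and return the ids of messages that are no longer important."""
        stale = [m for m in self.votes if self.score(m, now) < self.floor]
        for m in stale:
            del self.votes[m]
        return stale
```

A message stays cached only while fresh votes keep arriving, which matches the intent that yesterday's "important" messages drop out on their own.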
There was once a time when spam was a manual process we all had to
deal with - I haven't manually managed spam in years. People
subscribe to RSS, at some point they've subscribed to so much nonsense
that it's too difficult to properly unsubscribe from the bulk of it -
so we'll need automation to filter even the opt-in content we asked to
receive. The Mailpool interface can grow in complexity as the users
and the system evolve with it.
Forgive me if I say something about Mailpool that is already in print
or still only in Matt's head; I feel like so much of it is
synchronicity.
I have looked at YKY's book, slides, Lisp code, and list of tasks. I would classify the code as experimental. It shows that inductive inference is possible on toy problems by exhaustive search over a tiny hypothesis space tested on a tiny discrete knowledge base. We may learn something from this code. However, it obviously won't scale. First, the algorithm would be horribly inefficient over a knowledge base on the scale of human intelligence, about 10^8 facts and 10^7 rules.
Second, we already know from experiments dating back to the 1960s (ELIZA, SHRDLU) that encoding all of human knowledge in Lisp is not feasible. This must also be true of all formal languages (CycL, FOL, HOL, Progol, Erlang, Haskell, F#, C++, whatever) because it is not hard to write a converter to go from any implemented formal language to any other. So something other than the choice of knowledge representation must be the hard step. Arguing over languages is avoiding the question and a waste of time.
Just a brainstorming idea:
Let's have elections. Let candidates state their intentions and approaches so that we can vote.
I'm glad you solved the problem anyhow. But if you run into problems, I'll be there.
From: "YKY (Yan King Yin, 甄景贤)" <generic.in...@gmail.com>
To: general-in...@googlegroups.com
Sent: Monday, July 25, 2011 11:27 PM
Alright, alright, maybe it's a bad idea, but at least voting could be
applied to Genifer's design, programming language and other
important questions where different opinions may exist. Voting
is a good thing; it beautifies the commanding process.
Sent: Tuesday, July 26, 2011 12:43 PM
Want to use Twitter for broadcasting notifications? ... or does this
violate the terms of service of both Twitter AND Mailpool?
Here is a quick mockup of the Blogpool variation of Mailpool:
http://blogpool.enformable.com
It's WordPress with a microblogging theme called P2. It shows how it would accumulate replies from various bots that monitor it when bots submit comments to posts (which can be moderated and spam-filtered).
Another method for accumulating bot feedback is by syndicating an RSS feed provided by each bot which holds the bot's responses for one's blog. Each response would appear as a top-level post, not a comment.
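The RSS variant amounts to merging each bot's feed into one newest-first list of top-level posts. A minimal sketch using only Python's standard library, assuming RSS 2.0 feeds whose items all carry a `pubDate` with an explicit timezone offset; the function names are hypothetical and nothing here reflects how the Blogpool mockup is actually built.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def feed_items(rss_xml: str, bot: str) -> list[dict]:
    """Extract a bot's RSS 2.0 items as top-level posts tagged with the bot's name."""
    channel = ET.fromstring(rss_xml).find("channel")
    return [
        {
            "bot": bot,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RFC 822 date; assumed to include a timezone offset such as +0000
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }
        for item in channel.findall("item")
    ]

def merge_feeds(feeds: dict[str, str]) -> list[dict]:
    """Merge every bot's feed into one newest-first list of top-level posts."""
    posts = [p for bot, xml in feeds.items() for p in feed_items(xml, bot)]
    return sorted(posts, key=lambda p: p["published"], reverse=True)
```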
This model does not have the strong security features of Mailpool but may still be useful in certain situations.
About logic: should the machine learn the very set of logic rules from induction tests on user data?
About probability: should the machine learn that from user data too?
From: "YKY (Yan King Yin, 甄景贤)" <generic.in...@gmail.com>
To: general-in...@googlegroups.com
Sent: Tuesday, July 26, 2011 10:34 PM
I don't think anyone is asking for credits or virtual shares for their ideas. At least I'm not.