Mailpool/Genifer


Matt Mahoney

Jul 26, 2011, 3:41:25 PM
to a...@listbox.com, general-intelligence
After years of floundering in the general-intelligence (Genifer) group, I have been attempting to coordinate a real effort to develop AGI. I thought there might be some interest in the AGI group too. My Mailpool proposal is posted to http://groups.google.com/group/general-intelligence/msg/43afe2dbae577a96 with followup discussion in http://groups.google.com/group/general-intelligence/browse_thread/thread/f252cf1ec4f62b9/ee07e083d924da50

The user view will be an email or chat client without a "to" box. Your messages will be posted to a global public pool and go to anyone who cares, human or machine. From this pool you would receive messages ranked according to relevance, which the system would guess from messages you posted in the past.
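For concreteness, the client's ranking step could be as crude as scoring each pool message against everything the user has posted before. This is only a sketch with made-up names, not a spec:

import re
from collections import Counter

def words(text):
    # Lowercased word counts of a message.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def relevance(message, history):
    # Cosine similarity between one pool message and everything the user posted before.
    m, h = words(message), words(" ".join(history))
    dot = sum(m[w] * h[w] for w in m)
    norm = (sum(v * v for v in m.values()) * sum(v * v for v in h.values())) ** 0.5
    return dot / norm if norm else 0.0

def rank_pool(pool, history, top=10):
    # Messages the user is most likely to care about, best guesses first.
    return sorted(pool, key=lambda msg: relevance(msg, history), reverse=True)[:top]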

This is a distributed design based on lots of specialists and a distributed index for routing messages to the right experts. Ownership would be distributed. People will have an economic incentive to compete for reputation by providing useful, high-quality information on a hostile P2P network because doing so will allow them to sell preferential ranking (advertising) at higher market prices. Thus, our role would be to design and code the software to build the infrastructure by which peers will communicate. This would be a tiny but influential effort relative to the enormous costs of providing the human knowledge and corresponding computing resources needed to implement AGI. It would be analogous to designing and implementing the first versions of HTTP and HTML (which took 1 person 6 weeks) as opposed to building the Web.

As the first step, we need to nail down the P2P protocol, because once we write the software and people start using it, we are stuck with it. The architecture is based on passing digitally signed and timestamped messages in natural language, which later could be extended to images, video, and other human-understandable data types. Once the infrastructure is built, we can focus on the more interesting problem of distributed indexing, which boils down to quickly estimating mutual information between messages. This is an AI problem, deciding which messages are appropriate responses to other messages, where the question and answer could contain words or pictures. Whether this is best solved by OpenCog, Genifer, NARS, TexAI, Cyc, text compression, or something else will be decided by the market.
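As a baseline, mutual information between two messages can be estimated with any off-the-shelf compressor: compress each message alone and both together and see how many bytes the pairing saves. A rough sketch (zlib here is just a placeholder for a better model):

import zlib

def clen(s):
    # Compressed length of a string: a crude stand-in for its information content.
    return len(zlib.compress(s.encode("utf-8"), 9))

def mutual_info(a, b):
    # Bytes saved by compressing the two messages together: roughly C(a) + C(b) - C(ab).
    return clen(a) + clen(b) - clen(a + b)

def similarity(a, b):
    # Normalized compression distance, flipped into a similarity in [0, 1].
    ca, cb, cab = clen(a), clen(b), clen(a + b)
    return 1.0 - (cab - min(ca, cb)) / max(ca, cb)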

Specifically I have proposed the CMR protocol described in appendix A of http://mattmahoney.net/agi2.html with HTTP handshake and Diffie-Hellman key exchange.
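For those who haven't seen it, the key exchange itself is a few lines. The sketch below uses textbook finite-field Diffie-Hellman with a toy 64-bit prime purely for illustration; a real peer would use a standard large group and wrap the exchange in the HTTP handshake:

import secrets

# Toy group parameters, for illustration only. A real peer would use a
# standard 2048-bit MODP group, not a 64-bit prime.
P = 0xFFFFFFFFFFFFFFC5   # 2**64 - 59, a 64-bit prime; far too small for real use
G = 5

def dh_keypair():
    # Private exponent x and public value g^x mod p.
    x = secrets.randbelow(P - 2) + 1
    return x, pow(G, x, P)

# Each side generates a key pair and sends only the public half over the handshake.
a_priv, a_pub = dh_keypair()
b_priv, b_pub = dh_keypair()

# Both sides then derive the same shared secret from the other's public value.
assert pow(b_pub, a_priv, P) == pow(a_pub, b_priv, P)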


-- Matt Mahoney, matma...@yahoo.com 

Abram Demski

Jul 26, 2011, 8:25:51 PM
to general-in...@googlegroups.com, a...@listbox.com
Matt,

Silly question. Will it be compatible with RSS?

(Perhaps) less silly question. In your description, information has negative value, but agents prefer to offload it to each other rather than delete it outright. Why is this? The idea that it's beneficial to keep similar information together (building specialists) makes sense, but it seems that in the cases in which we don't just want to throw information away, it still has positive value. It's just more valuable to someone who is building more expertise in the area (perhaps to the point where they won't delete it to free up room).

The idea that a *lot* of information has negative value to us (such as irrelevant facebook posts) also makes sense. But, if all information had negative value, it seems like we would discard all of it.

And, finally... it makes sense to me that the mutual-information distance metric would be an ok approximation of what might interest someone, but really, interests are dependent on goals in a complex way. For example, if I type a bloglike post into mailpool, I'll be interested if it spits back a few similar posts made by other people. If it spits back hundreds, I won't be so interested. Mere similarity is not enough. I'd rather see the posts from the experts in the area, if there are any. Is there a way this is modeled in your framework?

--Abram

Matt Mahoney

Jul 26, 2011, 9:26:07 PM
to general-in...@googlegroups.com, agi
 Abram Demski <abram...@gmail.com> wrote:

>Silly question. Will it be compatible with RSS?

Yes. CMR protocol is for peer to peer. Peer to user can be anything, like RSS, HTML, email, facebook, twitter, text message...

>(Perhaps) less silly question. In your description, information has negative value, but agents prefer to offload it to each other rather than delete it outright. Why is this? The idea that it's beneficial to keep similar information together (building specialists) makes sense, but it seems that in the cases in which we don't just want to throw information away, it still has positive value. It's just more valuable to someone who is building more expertise in the area (perhaps to the point where they won't delete it to free up room).
>
>The idea that a *lot* of information has negative value to us (such as irrelevant facebook posts) also makes sense. But, if all information had negative value, it seems like we would discard all of it.


Negative value is much greater to humans than to machines because the cost of human time is much greater than machine storage. But in either case that is just the average. Some information has positive value to some people.

>And, finally... it makes sense to me that the mutual-information distance metric would be an ok approximation of what might interest someone, but really, interests are dependent on goals in a complex way. For example, if I type a bloglike post into mailpool, I'll be interested if it spits back a few similar posts made by other people. If it spits back hundreds, I won't be so interested. Mere similarity is not enough. I'd rather see the posts from the experts in the area, if there are any. Is there a way this is modeled in your framework?

Good point. A peer receiving lots of similar messages can store them very cheaply by compressing them. But the cost to humans is their uncompressed size.

The CMR protocol says that duplicate messages and messages with the recipient in the list of senders should be discarded. This prevents routing loops if all peers behave, but they might not. So peers need to be smart about which messages they accept. Peers will be ranked by reputation networks. You trust X and X says you can trust Y. It's how a lot of the internet works now. You trust the big websites and the links on them. CMR supports this by providing secure authentication.
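In code, a peer's accept/discard check from that part of the protocol might look roughly like this (field and variable names are illustrative, not the actual CMR message format):

import hashlib

seen = set()          # hashes of messages this peer has already accepted
MY_ID = "peer-1234"   # this peer's identity (made up for the example)

def accept(body, sender_path):
    # sender_path is the list of peers the message has already passed through.
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest in seen:
        return False   # duplicate: we already have this message
    if MY_ID in sender_path:
        return False   # it has looped back to us; dropping it breaks the cycle
    seen.add(digest)
    return True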


-- Matt Mahoney, matma...@yahoo.com

Abram Demski

Jul 26, 2011, 9:40:41 PM
to general-in...@googlegroups.com
On Tue, Jul 26, 2011 at 6:26 PM, Matt Mahoney <matma...@yahoo.com> wrote:
> Abram Demski <abram...@gmail.com> wrote:
>
>>Silly question. Will it be compatible with RSS?
>
> Yes. CMR protocol is for peer to peer. Peer to user can be anything, like RSS, HTML, email, facebook, twitter, text message...
>
>>(Perhaps) less silly question. In your description, information has negative value, but agents prefer to offload it to each other rather than delete it outright. Why is this? The idea that it's beneficial to keep similar information together (building specialists) makes sense, but it seems that in the cases in which we don't just want to throw information away, it still has positive value. It's just more valuable to someone who is building more expertise in the area (perhaps to the point where they won't delete it to free up room).
>>
>>The idea that a *lot* of information has negative value to us (such as irrelevant facebook posts) also makes sense. But, if all information had negative value, it seems like we would discard all of it.
>
> Negative value is much greater to humans than to machines because the cost of human time is much greater than machine storage. But in either case that is just the average. Some information has positive value to some people.

Ah, ok.

>>And, finally... it makes sense to me that the mutual-information distance metric would be an ok approximation of what might interest someone, but really, interests are dependent on goals in a complex way. For example, if I type a bloglike post into mailpool, I'll be interested if it spits back a few similar posts made by other people. If it spits back hundreds, I won't be so interested. Mere similarity is not enough. I'd rather see the posts from the experts in the area, if there are any. Is there a way this is modeled in your framework?
>
> Good point. A peer receiving lots of similar messages can store them very cheaply by compressing them. But the cost to humans is their uncompressed size.
>
> The CMR protocol says that duplicate messages and messages with the recipient in the list of senders should be discarded. This prevents routing loops if all peers behave, but they might not. So peers need to be smart about which messages they accept. Peers will be ranked by reputation networks. You trust X and X says you can trust Y. It's how a lot of the internet works now. You trust the big websites and the links on them. CMR supports this by providing secure authentication.

Perhaps a better way to state my question: if I submit a question to mailpool, it sounds like I'd get a bunch of questions back rather than a bunch of answers (if the mutual-information protocol is how replies are determined). How could I get answers back, rather than more questions?

> -- Matt Mahoney, matma...@yahoo.com

Matt Mahoney

Jul 27, 2011, 1:00:31 PM
to general-in...@googlegroups.com
Abram Demski <abram...@gmail.com> wrote: 
> Perhaps a better way to state my question: if I submit a question to mailpool, it sounds like I'd get a bunch of questions back rather than a bunch of answers (if the mutual-information protocol is how replies are determined). How could I get answers back, rather than more questions?

I see the problem. Using only mutual information, the best match to a query is another copy of the same question. The response should also be information dense, i.e., not a lot of identical copies of the same response. But it is not as simple as that either. A spammer could exploit that strategy by responding to a question with a copy of the question, an ad, and a lot of random characters. Peers will have to be intelligent to estimate relevance to their users and the reliability of their sources. It is a hard problem. The market will determine the winners.
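One direction is to score candidate replies by how much information they share with the question minus how much they share with replies already selected, so that piles of identical responses collapse to one. A sketch only, reusing the compression idea; it still does not stop the copy-the-question trick:

import zlib

def clen(s):
    return len(zlib.compress(s.encode("utf-8"), 9))

def shared(a, b):
    # Rough shared information: compressed bytes saved by pairing a with b.
    return clen(a) + clen(b) - clen(a + b)

def pick_replies(question, candidates, k=5):
    # Greedily pick replies that overlap with the question but add something
    # beyond what has already been chosen; a duplicate of an earlier pick
    # scores near zero on the second term and drops out.
    chosen, pool = [], list(candidates)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda c: shared(c, question)
                   - max((shared(c, p) for p in chosen), default=0))
        chosen.append(best)
        pool.remove(best)
    return chosen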

 
-- Matt Mahoney, matma...@yahoo.com

Abram Demski

Jul 27, 2011, 1:37:30 PM
to general-in...@googlegroups.com, a...@listbox.com
Matt,

[accidentally dropped the AGI list from the receiver list, re-adding...]

Glad to see you are not dogmatic about mutual information being exactly the right formula. I agree, the right solution is just to say that the market has to work it out.

I'm not totally clear on what the typical use-case would look like, though. CMR is supposed to act like a big associative memory, I take it, which "learns" high-value input-output behavior by adding peers that implement different valuable transformations. Mailpool is then just one kind of peer, right? I.e., a mailpool peer will store messages and respond to messages with relevant messages it has received in the past. Other peers might do other kinds of things, like interface with Wolfram Alpha or a search engine. Human users will want to send their message to a peer that generates the kind of response they are after, which might often be a "mixer" peer that receives a message, sends it off to several peers (such as search engines, mailpools, etc.), records the responses, and ranks them by estimated quality for the original user.
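In code, the mixer I have in mind would be little more than this (the backends and the ranking function are whatever the operator plugs in; none of this is from the CMR spec):

def mixer(message, backends, rank, top=10):
    # A hypothetical mixer peer: fan the message out to several other peers
    # (search engines, mailpools, ...), collect whatever comes back, and hand
    # the user the responses ranked by some estimate of quality.
    # Each backend is just a function from a message to a list of responses;
    # rank is whatever quality estimate the mixer's operator trusts.
    responses = []
    for backend in backends:
        responses.extend(backend(message))
    return sorted(responses, key=rank, reverse=True)[:top]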

Is this right? If so, what is the advantage over the existing internet? I already decide where to send each message based on the sort of responses I want (facebook, buzz, google+, google search, emailing people...), and frequently use "mixers" of various sorts to make the task easier...

swkane

Jul 27, 2011, 2:18:30 PM
to general-in...@googlegroups.com
> Peers will have to be intelligent to estimate relevance to their users and the reliability of their sources. It is a hard problem. The market will determine the winners.

What if the market determines that the cost of determining the winners to a degree that is sufficiently useful is greater than the payoff of determining the winners? Large corporations with large amounts of resources such as Google have had access to a massive amount of email and other data. Where's their 'MailPool'? For instance, why can't I set up my Gmail account as a node in a 'MailPool'?

Steven

Matt Mahoney

Jul 27, 2011, 5:24:39 PM
to a...@listbox.com, general-intelligence
Abram Demski <abram...@gmail.com> wrote:
> what is the advantage over the existing internet?

We should not have to remember which website, which application, or which file to go to for different things. We should just tell the computer what we want, and it will do it.

Facebook, Twitter, blogs, and mailing lists should figure out who you know and who you would like to know, based on common interests.

A few large companies like Google and Microsoft (and the governments that regulate them) should not control which parts of the internet you see.

When you update a website, you should not have to wait for search engines to be aware of the changes.

Some of this is already happening, but I think the trend in search engines, at least, is in the wrong direction. Smaller players cannot compete with larger ones that can keep a copy of a larger piece of the internet in their cache and update it faster. We can do better. Google has only 0.1% of the knowledge and computing power of the internet. A distributed index could have all of it.

 
-- Matt Mahoney, matma...@yahoo.com

Abram Demski

Jul 27, 2011, 6:04:12 PM
to general-in...@googlegroups.com, a...@listbox.com
Matt,

More comments. I hope it's clear that the tone is meant to be constructive rather than critical; I like the idea, but I'm trying to pick it apart.

On Wed, Jul 27, 2011 at 2:24 PM, Matt Mahoney <matma...@yahoo.com> wrote:
> Abram Demski <abram...@gmail.com> wrote:
>> what is the advantage over the existing internet?
>
> We should not have to remember which website, which application, or which file to go to for different things. We should just tell the computer what we want, and it will do it.

Sounds fair, but in the scenario I wrote out, a human user would still have to decide where to direct a message-- though they will often direct it at some mixer in order to minimize the thought put into this. How can this be avoided? How does the protocol get around this?

> Facebook, Twitter, blogs, and mailing lists should figure out who you know and who you would like to know, based on common interests.

Again, fair, but I'm not sure how the protocol causes this to happen. There is an economic incentive to provide this service, but that's equally true of existing social networking sites.

> A few large companies like Google and Microsoft (and the governments that regulate them) should not control which parts of the internet you see.

How does this fix that problem? In the loose description I wrote out, there were still "mixers" which would mostly be run by large companies/organizations. How can this be avoided?

> When you update a website, you should not have to wait for search engines to be aware of the changes.

This seems like a decent point; we can directly notify an index of updates... however, how do we decide who to send our updates to? We can't just send them to anyone on the network, even trusted peers, because some peers are optimized for query/response behavior such as a wolfram|alpha interface peer. We would have to decide which mixers and so on we wanted to notify, right? (A mixer which automatically notified other relevant peers would be a good service, of course, and would get business.)

> Some of this is already happening, but I think the trend in search engines, at least, is in the wrong direction. Smaller players cannot compete with larger ones that can keep a copy of a larger piece of the internet in their cache and update it faster.

How does the system encourage small players? In the scenarios I've imagined, it seems like big players are still encouraged.

I suppose the closest thing I can think of is the technology news situation, where there are a lot of players like slashdot, reddit, small blogs, large blogs which aggregate small blogs... Slashdot and reddit rise to the top due to quality crowdsourced filtering techniques.

Matt Mahoney

Jul 27, 2011, 7:47:15 PM
to general-in...@googlegroups.com
 Abram Demski <abram...@gmail.com> wrote:
>On Wed, Jul 27, 2011 at 2:24 PM, Matt Mahoney <matma...@yahoo.com> wrote:
>>> what is the advantage over the existing internet?
>>
>>We should not have to remember which website, which application, or which file to go to for different things. We should just tell the computer what we want, and it will do it.
>
>Sounds fair, but in the scenario I wrote out, a human user would still have to decide where to direct a message-- though they will often direct it at some mixer in order to minimize the thought put into this. How can this be avoided? How does the protocol get around this?

We can't completely avoid it. The CMR protocol makes all messages available to all peers, but peers may have different user interfaces and different strategies for ranking the messages presented to the user. The user will still have to make choices.

>>Facebook, Twitter, blogs, and mailing lists should figure out who you know and who you would like to know, based on common interests.
>
>Again, fair, but I'm not sure how the protocol causes this to happen. There is an economic incentive to provide this service, but that's equally true of existing social networking sites.


Yes, and they are already doing it, for example, Facebook's news feed. A peer could have a user interface with "like" and "spam" buttons to learn your preferences. I expect there will be experimentation.
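A minimal version of that preference learning could be a naive-Bayes-style score over the words of liked and flagged messages (a sketch only; the buttons and names here are hypothetical):

import math, re
from collections import Counter

liked, flagged = Counter(), Counter()

def tokens(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def feedback(message, is_liked):
    # Record one press of the hypothetical "like" or "spam" button.
    (liked if is_liked else flagged).update(tokens(message))

def preference(message):
    # Naive-Bayes-style log-odds that the user would want to see this message.
    lt, ft = sum(liked.values()) + 1, sum(flagged.values()) + 1
    return sum(math.log((liked[w] + 1) / lt) - math.log((flagged[w] + 1) / ft)
               for w in tokens(message))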

>>A few large companies like Google and Microsoft (and the governments that regulate them) should not control which parts of the internet you see.

>
>How does this fix that problem? In the loose description I wrote out, there were still "mixers" which would mostly be run by large companies/organizations. How can this be avoided?

Right now, indexing is centralized and content is distributed. But there is no reason that indexing can't be distributed too. Each part of the index would know about some topic and about other peers that know about similar topics. Messages would get routed to the right expert along multiple paths that can't be shut down by controlling just a few peers. Large companies would still have the advantage of higher reputations, but would no longer have the advantage of access to more content and faster updates that now shut out small competitors.
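Concretely, each peer could keep a small cache of what its neighbors have relayed and forward an incoming message to the few neighbors whose cached traffic looks most like it. A sketch using plain word overlap as the similarity measure (any better estimate would slot in the same way):

import re

def word_set(text):
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def overlap(a, b):
    # Jaccard overlap between the word sets of two messages.
    wa, wb = word_set(a), word_set(b)
    return len(wa & wb) / len(wa | wb) if wa or wb else 0.0

def route(message, neighbor_caches, copies=2):
    # neighbor_caches maps a neighbor id to the messages seen from it.
    # Forward toward the neighbors whose past traffic looks most like this
    # message, so it drifts toward the peers that specialize in its topic.
    def affinity(n):
        return max((overlap(message, m) for m in neighbor_caches[n]), default=0.0)
    return sorted(neighbor_caches, key=affinity, reverse=True)[:copies]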

> how do we decide who to send our updates to?

If each peer used a simple strategy of sending 2 copies to other peers picked at random, then eventually the message would get to every peer in O(log n) time. But peers that use a more targeted strategy based on message content would be rewarded by preferential treatment of their messages. Peers would learn the specialties of their neighbors by caching their messages and comparing incoming messages with them.
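The O(log n) claim is easy to check with a toy simulation of that random two-copy flooding (the numbers are arbitrary):

import math, random

def rounds_to_reach_all(n=10000, copies=2):
    # Naive gossip: every peer that has the message forwards it to `copies`
    # random peers each round. Count rounds until all n peers have it.
    informed, rounds = {0}, 0
    while len(informed) < n:
        rounds += 1
        for _ in range(copies * len(informed)):
            informed.add(random.randrange(n))
    return rounds

# The informed set grows roughly geometrically, so the count scales like log n.
print(rounds_to_reach_all(), "rounds; log2(10000) is about", round(math.log2(10000), 1))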

>How does the system encourage small players? In the scenarios I've imagined, it seems like big players are still encouraged.

Just like anyone can start a website or a blog now.


-- Matt Mahoney, matma...@yahoo.com

Matt Mahoney

Jul 27, 2011, 8:45:00 PM
to general-in...@googlegroups.com
swkane <diss...@gmail.com> wrote:
> For instance, why can't I set up my Gmail account as a node in a 'MailPool'?

Google might not be interested in developing something that would compete with their search engine and which they could not own. Or maybe they would, if presented with the right opportunities.

 
-- Matt Mahoney, matma...@yahoo.com

Abram Demski

Jul 27, 2011, 8:49:02 PM
to general-in...@googlegroups.com, a...@listbox.com
On Wed, Jul 27, 2011 at 4:47 PM, Matt Mahoney <matma...@yahoo.com> wrote:
> Abram Demski <abram...@gmail.com> wrote:
>>On Wed, Jul 27, 2011 at 2:24 PM, Matt Mahoney <matma...@yahoo.com> wrote:
>>>> what is the advantage over the existing internet?
>>>
>>>We should not have to remember which website, which application, or which file to go to for different things. We should just tell the computer what we want, and it will do it.
>>
>>Sounds fair, but in the scenario I wrote out, a human user would still have to decide where to direct a message-- though they will often direct it at some mixer in order to minimize the thought put into this. How can this be avoided? How does the protocol get around this?
>
> We can't completely avoid it. The CMR protocol makes all messages available to all peers, but peers may have different user interfaces and different strategies for ranking the messages presented to the user. The user will still have to make choices.

Ok, makes sense.

>>>Facebook, Twitter, blogs, and mailing lists should figure out who you know and who you would like to know, based on common interests.
>>
>>Again, fair, but I'm not sure how the protocol causes this to happen. There is an economic incentive to provide this service, but that's equally true of existing social networking sites.
>
> Yes, and they are already doing it, for example, Facebook's news feed. A peer could have a user interface with "like" and "spam" buttons to learn your preferences. I expect there will be experimentation.

Makes sense...

>>>A few large companies like Google and Microsoft (and the governments that regulate them) should not control which parts of the internet you see.
>>
>>How does this fix that problem? In the loose description I wrote out, there were still "mixers" which would mostly be run by large companies/organizations. How can this be avoided?
>
> Right now, indexing is centralized and content is distributed. But there is no reason that indexing can't be distributed too. Each part of the index would know about some topic and about other peers that know about similar topics. Messages would get routed to the right expert along multiple paths that can't be shut down by controlling just a few peers. Large companies would still have the advantage of higher reputations, but would no longer have the advantage of access to more content and faster updates that now shut out small competitors.

This sounds somewhat plausible, but only if a "deep network" developed. A "shallow network" (with most messages only traveling 1 hop, or 2 when a mixer is used) might be more likely.

>> how do we decide who to send our updates to?
>
> If each peer used a simple strategy of sending 2 copies to other peers picked at random, then eventually the message would get to every peer in O(log n) time. But peers that use a more targeted strategy based on message content would be rewarded by preferential treatment of their messages. Peers would learn the specialties of their neighbors by caching their messages and comparing incoming messages with them.

This is where I'm missing something. It seems to me that most peers would not even re-send messages that came to them out of the blue. Peers corresponding to humans' clients would be for sending stuff to the network and looking at what came back, so I don't see that they would host content or try to direct messages sent at them. Peers corresponding to automated response services would try to send back information in reply, and they may talk to other peers in the process (especially if they are what I'm calling a "mixer"), but only with targeted queries.

I see that you specify the behavior of "routers" to be more like what you are talking about, but it seems as if people would (ahem) route around these in order to get more predictable results.

So, I suppose the problem of creating a useful deep network boils down to the problem of creating useful routers. This is a hard problem; as you say, an AGI problem; so I feel it would not happen...

>>How does the system encourage small players? In the scenarios I've imagined, it seems like big players are still encouraged.
>
> Just like anyone can start a website or a blog now.
>
> -- Matt Mahoney, matma...@yahoo.com

Matt Mahoney

Jul 27, 2011, 9:17:14 PM
to general-in...@googlegroups.com, a...@listbox.com
 Abram Demski <abram...@gmail.com> wrote:
> It seems to me that most peers would not even re-send messages that came to them out of the blue.

They might, if they know where to send them. It wouldn't be hard to do using term matching. Even if peers used the very simple strategy of copying messages to everyone they know and discarding duplicates, every message would get to every router. As the network grew, this would no longer work, so peers would need to specialize by topic. Peers would tend to receive targeted messages as well as send them because other peers would learn their specialty.

Peers can act as sources, sinks, routers, or all three. Peers with user interfaces are obviously useful to their users. I suppose it is less obvious why a peer should provide the service of routing messages. The answer is that routers rank messages before deciding whether to discard them or to cache and relay them, and they can charge a fee for higher rankings if they want. Even if they don't do this immediately, they can invest in their reputation by providing this service, which will enable them to charge higher fees later because other peers have ranked them highly.

-- Matt Mahoney, matma...@yahoo.com
