RFC - contacts data model

Steve Ivy

Dec 28, 2007, 10:41:52 PM

to diso-project

Hi all,

I can't seem to stay on one task, but they're all related, really!
Chris and I have started working on a data model for a combined
Contact object, to be implemented as a separate table within WordPress
and loosely tied to the XMPP roster work and WordPress users. We would
appreciate any feedback or thoughts you have on the data model.

Right now it's a Google Spreadsheet and can be found here:

http://spreadsheets.google.com/ccc?key=prxS6MTFWALYj_BV3x6PIWw&hl=en

Thanks,

--Steve

--
Steve Ivy
http://redmonk.net // http://diso-project.org
This email is: [ ] bloggable [x] ask first [ ] private

Steve Ivy

Dec 29, 2007, 4:02:01 AM

to diso-project

I was going to throw out that I'd like to hear from the OpenContacts
(http://opencontacts.org) guys, then hit their wiki and found:

http://wiki.opencontacts.org/index.php?title=Contact_schema

...which, in addition to anything you want to bring up there, has some
good thoughts.

--Steve

Tijs Teulings

Dec 29, 2007, 7:19:49 AM

to diso-p...@googlegroups.com

In further discussions we settled on using tags as group identifiers.
I don't have many contacts who are in just one group, so I need the
more fine-tuned categorization of tags, where people can belong to
the big three (family, friends, work) but also to a lot of smaller
entities like a project, a company, etc. This is probably the main
difference from what's on the wiki, but I see you have tags/categories
in your model already. I presume this is what they would be for.

Regarding a DiSo model: on the one hand I would like DiSo to use its
own overarching list of fields, or contact attributes, as you seem to be
collecting; on the other hand it might make more sense to just pick
one of the existing standards and see how far we can go with it. I
can fit my address book entries into a vCard, so while more fields may
be nice, they're not strictly necessary in my view.

An OpenContacts server was supposed to be somewhat agnostic, in that it
would offer different plugins for different data sources and exports:
import from the Mac OS X Address Book, a list of vCards, or a FOAF file;
export as an endpoint for sync services, publish as a list of vCards,
etc.

Providers of OpenContacts servers could then compete on utility, i.e.
"we support Exchange syncing" or "we support attribute exchange", and
users could just pick the provider that fits their style. In this view
DiSo would be just one of your options (the install-it-yourself option).

Not a real answer to your questions perhaps, but that's about where we
(at OpenContacts) were regarding this problem.

Tijs

--
Tijs Teulings
tel: +31645004824
http://tijs.jaiku.com

more:
http://www.automatique.nl
http://roomwareproject.org
http://wiki.opencontacts.org

Stephen Paul Weber

Dec 29, 2007, 10:57:42 AM

to diso-p...@googlegroups.com

I'm a big fan of using hCard+XFN field names directly, even in the database :)

--
- Stephen Paul Weber, Amateur Writer
<http://www.awriterz.org>

MSN/GTalk/Jabber: singp...@gmail.com
ICQ/AIM: 103332966
BLOG: http://singpolyma.net/

Steve Ivy

Dec 29, 2007, 12:03:24 PM

to diso-p...@googlegroups.com

My current thinking (no doubt affected by a sleepless night with the
3yo) is to store a subset of what I've proposed in a Contact table,
then have a second table, ContactMeta, mirroring the User/UserMeta
model in WordPress. A common subset of data would be in Contact (more
than is in User but less than, say, the complete hCard set) while an
extended set would be in ContactMeta. In addition, ContactMeta would be
available to plugins to extend the model further.

As with User, when you call get_contact($contact_id), we'd merge the
data from both tables for you.
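
A minimal sketch of the two-table idea, in Python with SQLite rather
than WordPress's PHP/MySQL so that it runs on its own. The table names,
column names, and sample rows are all assumptions made up for
illustration; only get_contact and the Contact/ContactMeta split come
from the proposal above.

```python
import sqlite3

# Hypothetical schema: names are illustrative, loosely modelled on the
# WordPress users/usermeta pair that the proposal mirrors.
SCHEMA = """
CREATE TABLE contact (
    id           INTEGER PRIMARY KEY,
    display_name TEXT NOT NULL,
    url          TEXT
);
CREATE TABLE contact_meta (
    contact_id INTEGER NOT NULL REFERENCES contact(id),
    meta_key   TEXT NOT NULL,
    meta_value TEXT
);
"""

def get_contact(db, contact_id):
    """Return core fields merged with any meta rows, or None if not found."""
    row = db.execute(
        "SELECT id, display_name, url FROM contact WHERE id = ?", (contact_id,)
    ).fetchone()
    if row is None:
        return None
    contact = {"id": row[0], "display_name": row[1], "url": row[2]}
    meta = db.execute(
        "SELECT meta_key, meta_value FROM contact_meta WHERE contact_id = ?",
        (contact_id,),
    )
    for key, value in meta:
        # Core columns win if a plugin reuses one of their names.
        contact.setdefault(key, value)
    return contact

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.execute("INSERT INTO contact VALUES (1, 'Alice', 'http://alice.example.org/')")
db.execute("INSERT INTO contact_meta VALUES (1, 'xfn_rel', 'friend met')")
print(get_contact(db, 1))
```

The caller sees one flat record; whether a field lives in the core
table or was added later by a plugin is invisible at this level.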

Thoughts?

Stephen Paul Weber

Dec 29, 2007, 7:14:21 PM

to diso-p...@googlegroups.com

Sounds perfect -- I love the WordPress extensible model :)

Chris Messina

Dec 29, 2007, 7:21:07 PM

to diso-p...@googlegroups.com

The more I think about it and look over the OpenSocial data model, the
more it occurs to me that a lot of the complexity of vCard/hCard could be
handled through the use of rel attributes, links, and identifiers.

In particular, telephone numbers, location information, URLs, IM
handles... these are all identifiers. Whether we should store all
the available data is not obvious, but it does seem that, where
identifiers are found, we should store them and attach them to
verified OpenIDs (as a latent way of identifying people as they come
across your activity input stream).
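
One way to picture "attach identifiers to verified OpenIDs" is a
two-way index: from any identifier to the OpenID it was attached to,
and from an OpenID to everything known about it. This is only a sketch;
the class name, method names, and the example.org data are hypothetical,
and the normalisation is deliberately naive.

```python
from collections import defaultdict

class IdentifierIndex:
    """Maps loose identifiers (phone, URL, IM handle) to a verified OpenID."""

    def __init__(self):
        self._owner = {}                    # (kind, value) -> OpenID URL
        self._by_openid = defaultdict(set)  # OpenID URL -> {(kind, value)}

    @staticmethod
    def _normalise(kind, value):
        # Naive on purpose: real normalisation differs per identifier type.
        return kind.lower(), value.strip().lower()

    def attach(self, openid, kind, value):
        key = self._normalise(kind, value)
        self._owner[key] = openid
        self._by_openid[openid].add(key)

    def who_is(self, kind, value):
        """Return the OpenID an identifier was attached to, or None."""
        return self._owner.get(self._normalise(kind, value))

    def identifiers_for(self, openid):
        return sorted(self._by_openid.get(openid, ()))

index = IdentifierIndex()
index.attach("http://alice.example.org/", "im", "alice_im")
index.attach("http://alice.example.org/", "url", "http://alice.example.org/blog")
print(index.who_is("IM", "Alice_IM"))
```

When an unfamiliar handle turns up in an activity stream, who_is is the
lookup that would tie it back to a known person.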

I like OpenSocial's use of Atom as the foundation for its data model,
and if we could be OpenSocial-compatible by default, that seems wise.
Take a look:

http://code.google.com/apis/opensocial/docs/gdata/people/reference.html#Other

Meanwhile, I'm adding Facebook's boil-the-ocean data model to the spreadsheet.

Chris

--
Chris Messina
Citizen-Participant &
Open Source Advocate-at-Large
Work: http://citizenagency.com
Blog: http://factoryjoe.com/blog
Cell: 412.225.1051
IM: factoryjoe
This email is: [ ] bloggable [X] ask first [ ] private

Stephen Paul Weber

Dec 29, 2007, 9:58:39 PM

to diso-p...@googlegroups.com

So, I read the Pibb discussion, and started to feel queasy as I
understood where this is going. Do we WANT to store contacts' data
beyond name+link+relationship? I mean, this is what's broken about
address books and what people love about social networks. If my
friend changes address he updates his profile page and I go see the
new data... NOTHING happens (even automatically) on my end. I have
the new info because I rely on HIS page for data about him...

David Recordon

Dec 30, 2007, 1:35:52 AM

to diso-p...@googlegroups.com

I think this is a really good point. There is some data which should
be cached (name, link, relationship, date updated) but you're right
that the majority of it shouldn't really be stored. That said, we
don't want to get into a situation where we're constantly displaying
Chris Messina's profile and hitting his site in realtime for our
visitors. This might be something which ends up starting as storing
very little, seeing what people build that displays remote data, and
then adding appropriate caching locally.

--David

Stephen Paul Weber

Dec 30, 2007, 9:05:16 AM

to diso-p...@googlegroups.com

Or caching the HTML output...
But yes, we don't want to melt bandwidth for requesting everything every time.

Steve Ivy

Dec 30, 2007, 2:51:31 PM

to diso-p...@googlegroups.com

Stephen,

Yes, as David and Chris and I were chatting the other day I started
feeling the same (not queasy, but that we need to back off a bit on
what we want to store). Ultimately I think we just need enough so that
an app can get access to a person's profile, and that we can hang
other services off of.

Chris mentioned the idea of identifiers: URLs, email addresses, chat
IDs, relationships. Perhaps via rel=me and hCard we can cache parts of
the hCard.

Tangentially, we're working on some XMPP-based push ideas, so possibly
we could hitchhike on XMPP "contact updated" events to figure out when
to recache the profile.

Thanks for the thoughts!

--Steve

Dan Weinand

Dec 30, 2007, 3:18:35 PM

to diso-p...@googlegroups.com

On Dec 30, 2007 1:51 PM, Steve Ivy <stev...@gmail.com> wrote:
> Tangentially, we're working on some XMPP-based push ideas, so possibly
> we could hitchhike on XMPP "contact updated" events to figure out when
> to recache the profile.

You could also simply use HTTP's standard cache control mechanisms:
Last-Modified and If-Modified-Since, or ETag and If-None-Match. I'm
curious to see what you guys come up with using XMPP, but I for one
would rather see a straight HTTP solution all-around.
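
For reference, the conditional-request pattern looks roughly like this.
It is a sketch only: the CachedProfile type and both function names are
hypothetical, and the actual HTTP transport is left out so that the
logic can be read (and run) without a network.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class CachedProfile:
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None

def conditional_headers(cached: Optional[CachedProfile]) -> dict:
    """Build the validators to send when re-requesting a profile page."""
    headers = {}
    if cached is not None:
        if cached.etag:
            headers["If-None-Match"] = cached.etag
        if cached.last_modified:
            headers["If-Modified-Since"] = cached.last_modified
    return headers

def update_cache(cached, status: int, headers: Mapping[str, str], body: bytes):
    """Return the profile to keep after a response arrives."""
    if status == 304 and cached is not None:
        return cached  # Not Modified: reuse the stored copy, no body was sent
    if status == 200:
        return CachedProfile(body, headers.get("ETag"), headers.get("Last-Modified"))
    raise ValueError(f"unexpected status {status}")

first = update_cache(
    None,
    200,
    {"ETag": '"v1"', "Last-Modified": "Sun, 30 Dec 2007 20:18:35 GMT"},
    b"<div class='vcard'>...</div>",
)
print(conditional_headers(first))
```

An unchanged profile then costs the remote server a 304 with no body,
which is most of the saving being discussed.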

--
Dan Weinand

James D Kirk

Dec 30, 2007, 5:50:58 PM

to DiSo Project

Does one of these mechanisms have more or less overhead for either the
requester or the provider than the other? Is security an issue or concern
with one vs. the other? Will it be easier for site owners/managers
to use the tools of one mechanism over the other?

James.

Steve Ivy

Dec 30, 2007, 6:01:04 PM

to diso-p...@googlegroups.com

Dan,

In my mind, XMPP is just one part of the notification model. I'd use
modified dates and ETags in addition; that's just smart resource
usage.

--Steve

Tao Takashi

Jan 2, 2008, 6:26:46 PM

to diso-p...@googlegroups.com

Wouldn't such an update be nothing more than the way we update blogs these days via RSS? Feed readers use (hopefully) sensible timespans between updates, and I could think of something like that for profile information, too. So in fact it's cached then, and to me using standard HTTP headers seems like a useful idea for that.

-- Christian

--
taota...@gmail.com
Blog: http://mrtopf.de
Planet: http://worldofsl.com

RL: Christian Scholz, mrt...@gmail.com
http://mrtopf.de

Company: http://comlounge.net
Tech Video Blog: http://comlounge.tv
IRC: MrTopf/Tao_T

Chris Messina

Jan 2, 2008, 9:38:10 PM

to diso-p...@googlegroups.com

The problem with RSS/Atom or even plain HTML is that it really doesn't
scale with the number of contacts you have. While many people might
have 10 or fewer contacts, it'll be increasingly common for folks to
have hundreds or even thousands of friends (depending on "how" they
social network). RSS works when you're dealing with, on average, 30-50
items. When you're distributed and dealing with many hundreds of
friends across many contexts, you need to really do something like
subscribing only to diffs and changes.
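
To make "subscribing only to diffs" concrete: the publisher compares
two snapshots of a contact list once and sends subscribers just the
difference, however it is delivered. A rough sketch follows; the
function name, the snapshot shape, and the example.org URLs are all
assumptions for illustration.

```python
def diff_contacts(old, new):
    """Compare two {contact_url: attributes} snapshots and return the changes."""
    added = {url: new[url] for url in new.keys() - old.keys()}
    # "removed" only means this publisher no longer lists the contact.
    removed = sorted(old.keys() - new.keys())
    changed = {
        url: new[url]
        for url in old.keys() & new.keys()
        if old[url] != new[url]
    }
    return {"added": added, "removed": removed, "changed": changed}

old = {
    "http://alice.example.org/": {"rel": "friend"},
    "http://bob.example.org/": {"rel": "colleague"},
}
new = {
    "http://alice.example.org/": {"rel": "friend met"},
    "http://carol.example.org/": {"rel": "friend"},
}
print(diff_contacts(old, new))
```

With hundreds of contacts and only a handful changing, the diff stays
small while a full dump grows with the size of the list.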

It's also true that the nature of this data is somewhat different,
since it's primarily nodal, as in, this node is connected to this node
with these attributes, or these nodes are no longer connected, etc...
RSS is pretty much a dumb data dump. It gets even more convoluted when
the friending activity is two-way.

XMPP has solved this problem with its approach and architecture. While
I'd like to do something over basic HTTP, I'm worried about scaling
and being respectful to various servers.

Then again, I'm no technical expert, so I'd be happy to defer to
someone who has experience in this stuff.

Chris

Dan Brickley

Jan 2, 2008, 9:57:39 PM

to diso-p...@googlegroups.com

On 03/01/2008, Chris Messina <chris....@gmail.com> wrote:

> The problem with RSS/Atom or even plain HTML is that it really doesn't
> scale with the number of contacts you have. While many people might
> have 10 or fewer contacts, it'll be increasingly common for folks to
> have hundreds or even thousands of friends (depending on "how" they
> social network). RSS works when you're dealing with, on average, 30-50
> items. When you're distributed and dealing with many hundreds of
> friends across many contexts, you need to really do something like
> subscribing only to diffs and changes.

Yup, makes sense. But RSS/Atom could be a transport format for
descriptions of those updates, perhaps?

> It's also true that the nature of this data is somewhat different,
> since it's primarily nodal, as in, this node is connected to this node
> with these attributes, or these nodes are no longer connected, etc...

On that last point, negation's really slippery. Especially in a
distributed, open world environment where you're compelled to admit
you only have partial knowledge. Facebook often goofs this, by
assuming it knows all: if you tell it you're single, they'll show a
broken heart icon, ie implicitly assume that you can't have been in
this state without its database knowing. So "no longer known by this
data source to be connected" might be all we can get, rather than "are
no longer connected", if we're in a distributed, partial-knowledge
world. (some notes on which here:
http://danbri.org/words/2007/09/13/194 )
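
A small sketch of what that distinction could look like in a data
model: links are recorded per source, and the store can say "asserted",
"no longer asserted", or "unknown", but never "not connected". The
class names, the three states, and the directed (a, b) pairs are
assumptions made for illustration.

```python
from enum import Enum

class Link(Enum):
    ASSERTED = "asserted by at least one source"
    NO_LONGER_ASSERTED = "previously asserted, no source asserts it now"
    UNKNOWN = "never asserted by any source we have seen"

class LinkStore:
    def __init__(self):
        self._current = {}  # source -> set of (a, b) pairs it asserts now
        self._ever = set()  # every pair any source has ever asserted

    def update_source(self, source, pairs):
        """Replace what `source` currently asserts with `pairs`."""
        pairs = set(pairs)
        self._current[source] = pairs
        self._ever |= pairs

    def status(self, a, b):
        pair = (a, b)
        if any(pair in pairs for pairs in self._current.values()):
            return Link.ASSERTED
        if pair in self._ever:
            return Link.NO_LONGER_ASSERTED
        return Link.UNKNOWN

store = LinkStore()
store.update_source("site-a", {("alice", "bob")})
store.update_source("site-a", set())  # site-a stops listing the link
print(store.status("alice", "bob").value)
```

There is deliberately no state meaning "definitely not connected",
since no single source can know that.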

> RSS is pretty much a dumb data dump. It gets even more convoluted when
> the friending activity is two-way.
>
> XMPP has solved this problem with its approach and architecture. While
> I'd like to do something over basic HTTP, I'm worried about scaling
> and being respectful to various servers.

Just to complicate matters, there's also an HTTP binding for XMPP,
"BOSH" --- http://www.xmpp.org/extensions/xep-0124.html ... though
I've no experience with it. Whenever I talk to Peter Saint-Andre about
doing this kind of stuff in XMPP, I learn about a new XEP. That was
the last one on my reading list :)

cheers,

Dan

Steve Ivy

Jan 2, 2008, 10:44:19 PM

to diso-p...@googlegroups.com

Hi Dan,

On Jan 2, 2008 7:57 PM, Dan Brickley <danbr...@gmail.com> wrote:
>
> Yup, makes sense. But RSS/Atom could be a transport format for
> descriptions of those updates, perhaps?

Yes, in fact I have it on extremely good authority that Twitter is
going to be using Atom messages as payloads inside XMPP for their
upcoming pubsub implementation.

> > It's also true that the nature of this data is somewhat different,
> > since it's primarily nodal, as in, this node is connected to this node
> > with these attributes, or these nodes are no longer connected, etc...
>
> On that last point, negation's really slippery. Especially in a
> distributed, open world environment where you're compelled to admit
> you only have partial knowledge. Facebook often goofs this, by
> assuming it knows all: if you tell it you're single, they'll show a
> broken heart icon, ie implicitly assume that you can't have been in
> this state without its database knowing. So "no longer known by this
> data source to be connected" might be all we can get, rather than "are
> no longer connected", if we're in a distributed, partial-knowledge
> world. (some notes on which here:
> http://danbri.org/words/2007/09/13/194 )
>
> > RSS is pretty much a dumb data dump. It gets even more convoluted when
> > the friending activity is two-way.
> >
> > XMPP has solved this problem with its approach and architecture. While
> > I'd like to do something over basic HTTP, I'm worried about scaling
> > and being respectful to various servers.
>
> Just to complicate matters, there's also an HTTP binding for XMPP,
> "BOSH" --- http://www.xmpp.org/extensions/xep-0124.html ... though
> I've no experience with it. Whenever I talk to Peter Saint-Andre about
> doing this kind of stuff in XMPP, I learn about a new XEP. That was
> the last one on my reading list :)

Yeah, the XEPs are a bit out of control. ;-) Dan, since you're an XMPP
buff I hope we can chat at some point about some of the finer points
of what I'm hoping to do with DiSo and contacts.

>
> cheers,
>
> Dan

Stephen Paul Weber

Jan 2, 2008, 10:52:41 PM

to diso-p...@googlegroups.com

ugh... ATOM... sorry, this is just noise, couldn't help myself ;)

Tijs Teulings

Jan 3, 2008, 4:33:36 AM

to diso-p...@googlegroups.com

On 3 Jan 2008, at 03:38, Chris Messina wrote:

>
> The problem with RSS/Atom or even plain HTML is that it really doesn't
> scale with the number of contacts you have. While many people might
> have 10 or fewer contacts, it'll be increasingly common for folks to
> have hundreds or even thousands of friends (depending on "how" they
> social network). RSS works when you're dealing with, on average, 30-50
> items. When you're distributed and dealing with many hundreds of
> friends across many contexts, you need to really do something like
> subscribing only to diffs and changes.

Indeed. And those "1000s of friends" would just be the case where you
are hosting your own service. Now imagine someone like an identity
provider offering this service to a few hundred thousand people, who
would be in the address books of a multiple of that number. Besides
needing to go out and look for new data for all these people, it will
also be hit by the identity providers of all these contacts. And even
if all those other identity providers checked for updated info
only twice a day, it would still be a bitch to keep that server live.

Twitter is a good example of that; I think they get (a lot) more
traffic from feed readers and other aggregators wishing to update their
copy of the data than from actual humans.
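
A back-of-the-envelope version of the polling load, as a sketch. The
figures are assumptions picked only to show the shape of the
arithmetic: 300,000 hosted users stands in for "a few hundred
thousand", 100 address-book listings per user is a guess, and two polls
a day is the figure mentioned above. It also assumes every listing is
polled separately, with no batching between providers.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def inbound_poll_rate(hosted_users, listings_per_user, polls_per_day):
    """Average requests per second a provider receives from remote pollers."""
    requests_per_day = hosted_users * listings_per_user * polls_per_day
    return requests_per_day / SECONDS_PER_DAY

# Assumed figures, not measurements.
rate = inbound_poll_rate(hosted_users=300_000, listings_per_user=100, polls_per_day=2)
print(f"{rate:.0f} requests/second on average")
```

Under those assumptions that is 60 million requests a day, roughly 694
per second on average, before any human visits the site.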
