I've been working with people in various contact management/email
projects on defining a standard schema for storing contacts in
CouchDB. Here's a rundown of the thinking so far and what everyone's
working with.
First, the record serialisation format. This is the "basic" format for
storing data in CouchDB; CouchDB itself doesn't impose any structure on
data that you store within it. This "record serialisation format" is the
base level of structure on stuff stored in Couch that allows apps to
co-operate in a "standard" way.
--------------------8<-------------------------
CouchDB record serialisation format
A record has the following top level keys:
* record_type: a URL unique to this record type (the URL should
point to a description of the record schema, but this is not enforced)
* record_type_version: a version of the above (so there can be
multiple versions of a record_type)
* application_annotations: a key containing application-specific
data and configuration
* other keys: as required by the schema
An application can store whichever keys it likes at the top level. These
keys should be defined by the schema because it makes co-operation on
records easier, but again this is not enforced.
Requirements for applications
Applications agree to honour the following:
1. Each record has a record_type
2. If the application does not use the application_annotations key,
it promises to not alter the contents of it, and to preserve those
contents unchanged
3. The application_annotations key is a dictionary, keyed on
application name. An application that uses the application_annotations
key promises to store its information under its own application name.
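
As a rough illustration of the rules above, here is a minimal Python
sketch. The helper names (make_record, annotate), the application names
and the example.com record_type URL are all made up for the example;
nothing here is part of the format itself.

```python
import copy


def make_record(record_type, record_type_version=None, **fields):
    """Build a record carrying the top-level keys described above."""
    record = {"record_type": record_type, "application_annotations": {}}
    if record_type_version is not None:
        record["record_type_version"] = record_type_version
    record.update(fields)  # "other keys: as required by the schema"
    return record


def annotate(record, app_name, **values):
    """Return a copy of record with values stored under app_name only.

    Annotations belonging to other applications are left untouched.
    """
    updated = copy.deepcopy(record)
    annotations = updated.setdefault("application_annotations", {})
    annotations.setdefault(app_name, {}).update(values)
    return updated


# Hypothetical record type URL and application names.
record = make_record("http://example.com/schemas/note", "1.0", text="hello")
record = annotate(record, "AppOne", colour="red")
record = annotate(record, "AppTwo", pinned=True)
```

After the two annotate calls, each application's data sits under its own
name, and neither call has altered what the other one stored.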
MergeableSets
JSON offers a "list" as part of its standard serialisation. While it is
possible to use this list serialisation in CouchDB records, it makes
merging complicated if you're synchronizing data between two CouchDBs.
To avoid this complication, it is suggested that instead of lists,
applications store data in a MergeableSet structure. A MergeableSet is a
dictionary with keys which are generated UUIDs. Thus, the following list:
[ "one", "two", "three" ]
is serialised into a MergeableSet instead:
{
"b215fec4-ee21-4195-b54f-7ac2e06aa9f7": "one",
"0ff24bf8-c152-472d-b044-d4ff7ac110e2": "two",
"2cdc6c05-40ad-47e7-99fd-b8559e7de6f6": "three"
}
--------------------8<-------------------------
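
To make the MergeableSet idea concrete, here is a small Python sketch.
The function names are made up, and the merge shown is just a naive
union of keys that I'm assuming for illustration (it doesn't deal with
deletions or with both sides changing the same entry); the format above
doesn't prescribe a merge algorithm.

```python
import uuid


def to_mergeable_set(items):
    """Turn a list into a dictionary keyed on freshly generated UUIDs."""
    return {str(uuid.uuid4()): item for item in items}


def merge_sets(a, b):
    """Naive merge: union of the keys; on a key clash, b's value wins."""
    merged = dict(a)
    merged.update(b)
    return merged


base = to_mergeable_set(["one", "two", "three"])

# Two replicas each add an entry independently.
replica_a = dict(base)
replica_a[str(uuid.uuid4())] = "four"
replica_b = dict(base)
replica_b[str(uuid.uuid4())] = "five"

merged = merge_sets(replica_a, replica_b)
```

Because each new entry gets its own UUID key, the two additions never
collide, which is what makes this easier to merge than two diverging
lists.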
Second, the proposed schema specifically for contact data. A "record
schema" here is a list of fields that make up a particular type of
record (in this example, contacts). This isn't a fixed format; we're
still working on what we think it ought to look like, and I'd definitely
welcome input and collaboration from you on what your requirements are!
--------------------8<-------------------------
Contact schema
The core fields
first_name (string)
last_name (string)
birth_date (string, "YYYY-MM-DD")
addresses (MergeableSet of "address" dictionaries)
    city (string)
    address1 (string)
    address2 (string)
    pobox (string)
    state (string)
    country (string)
    postalcode (string)
    description (string, e.g., "Home")
email_addresses (MergeableSet of "emailaddress" dictionaries)
    address (string)
    description (string)
phone_numbers (MergeableSet of "phone number" dictionaries)
    number (string)
    description (string)
Things that we're thinking about being in the core fields but aren't
sure about (very interested in thoughts on these)
chat_addresses
    where each address has: address, description, protocol
work_title
work_company
work_department
nick_name
display_name
notes
--------------------8<-------------------------
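
For illustration, here is what one contact record using the core fields
might look like when built in Python. The record_type URL, the version
string and all of the personal data are invented for the example; only
the field names come from the schema above.

```python
import json
import uuid


def mergeable_set(items):
    """Dictionary keyed on generated UUIDs, as in the MergeableSet section."""
    return {str(uuid.uuid4()): item for item in items}


contact = {
    # Hypothetical URL and version; the real values are not fixed yet.
    "record_type": "http://example.com/schemas/contact",
    "record_type_version": "1.0",
    "first_name": "Jane",
    "last_name": "Doe",
    "birth_date": "1980-01-31",  # "YYYY-MM-DD"
    "addresses": mergeable_set([
        {
            "city": "Springfield",
            "address1": "1 Example Street",
            "address2": "",
            "pobox": "",
            "state": "",
            "country": "Exampleland",
            "postalcode": "12345",
            "description": "Home",
        },
    ]),
    "email_addresses": mergeable_set([
        {"address": "jane@example.com", "description": "Home"},
        {"address": "jane.doe@example.org", "description": "Work"},
    ]),
    "phone_numbers": mergeable_set([
        {"number": "+1 555 0100", "description": "Mobile"},
    ]),
    "application_annotations": {},
}

# The record is plain JSON, so it can be stored in CouchDB as-is.
print(json.dumps(contact, indent=2, sort_keys=True))
```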
In particular, note that the base "record serialisation format" has an
"application_annotations" section in each record. Data that a specific
application (say, Evolution) requires or would like to store can
therefore either be made part of the contacts schema (if it's generally
useful to everyone) or stored under application_annotations.Evolution
(meaning that it's always there for Evolution to work with). For
example, one of the apps we have that works with contact records is
Funambol, which stores data like "spouse's name"; this, to us, doesn't
seem like something that should go into the main contacts schema, so
Funambol stores it for a contact in
application_annotations.Funambol.spouse_name.
This schema is obviously inspired by vCard, and we have a mapping from
vCard fields to these fields, which we're finding perfectly adequate for
what we need. Obviously, vCard import and export need to be available to
and from this format.
http://desktop-couchdb.googlegroups.com/web/createCouchContacts.py is a
very simple Python script which creates some contact records in the
above schema for you to play with. Requires python-couchdb. Syntax:
python createCouchContacts.py 5984 contacts 100
will create 100 contacts in the "contacts" database in CouchDB running
on local port 5984.
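
I haven't reproduced the linked script here, but a script of that kind
could look roughly like the sketch below. It assumes the python-couchdb
package and a CouchDB listening on localhost; the name lists, the
record_type URL and the function names are invented for the example.

```python
import random
import uuid

# Invented sample data for generating test contacts.
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Brown", "Taylor"]


def mergeable_set(items):
    """Dictionary keyed on generated UUIDs."""
    return {str(uuid.uuid4()): item for item in items}


def make_contact(rng):
    """Build one random contact record in the schema described above."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = "%s.%s@example.com" % (first.lower(), last.lower())
    return {
        "record_type": "http://example.com/schemas/contact",  # hypothetical
        "first_name": first,
        "last_name": last,
        "email_addresses": mergeable_set(
            [{"address": email, "description": "Home"}]
        ),
        "application_annotations": {},
    }


def create_contacts(port, db_name, count):
    """Save count random contacts into db_name on the local CouchDB."""
    # Imported here so that make_contact can be used without python-couchdb.
    import couchdb

    server = couchdb.Server("http://localhost:%d/" % port)
    db = server[db_name] if db_name in server else server.create(db_name)
    rng = random.Random()
    for _ in range(count):
        db.save(make_contact(rng))
```

With a CouchDB running locally, create_contacts(5984, "contacts", 100)
would correspond to the command line shown above.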
sil