I've been doing some work on migrating Plings to CouchDB, partly as an
exercise in improving my CouchDB skills and partly because I believe
CouchDB would be a great way of storing Plings data in the future. I've
found a few bugs in the current API and have some thoughts on a CouchDB
version.

First, the bugs:

The field names in the current API's JSON output don't match those in
the XML output. That's mostly just annoying, since the XML is more
human-readable during development, but the difference isn't documented,
and it suggests the code for each format is being duplicated rather than
reused.
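
One way to avoid that kind of mismatch is to define the public field
names once and have both serializers read from that single definition.
Here's a minimal sketch in Python (not the current API's code, which I
haven't seen); the names in `ACTIVITY_FIELDS` and the `to_json`/`to_xml`
helpers are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical public field names, defined once and shared by both
# output formats so the JSON keys and XML tags cannot drift apart.
ACTIVITY_FIELDS = ("id", "name", "starts", "ends", "venue")


def to_record(activity):
    """Project an internal activity dict onto the public field names."""
    return {field: activity.get(field) for field in ACTIVITY_FIELDS}


def to_json(activities):
    return json.dumps({"activities": [to_record(a) for a in activities]})


def to_xml(activities):
    root = ET.Element("activities")
    for activity in activities:
        node = ET.SubElement(root, "activity")
        for field, value in to_record(activity).items():
            # Same names as the JSON keys; None becomes an empty element.
            ET.SubElement(node, field).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```

Because both functions go through `to_record`, renaming or adding a
field changes both outputs at once, and the field list doubles as the
documentation of what each format contains.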

The linked activities fields don't appear to return any data in the
JSON output. I haven't looked into this in any detail, mostly because I
don't want to handle linked activities in my code.

The number-of-days filter doesn't state whether it matches events that
start within the period, events that end within it, or events taking
place at any point during it. Filtering on start or end times is
probably more efficient, but if there are multi-day events, filtering on
events that are taking place during the period is more useful.
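
To make the ambiguity concrete, here's a small Python sketch of the
three possible interpretations. The `matches` function, its `mode` names
and the half-open window are my own assumptions, not the current API's
behaviour.

```python
from datetime import datetime, timedelta


def matches(event_start, event_end, window_start, days, mode):
    """Does an event fall in the window [window_start, window_start + days)?"""
    window_end = window_start + timedelta(days=days)
    if mode == "starts":
        # Event begins inside the window.
        return window_start <= event_start < window_end
    if mode == "ends":
        # Event finishes inside the window.
        return window_start <= event_end < window_end
    if mode == "overlaps":
        # Event is taking place at some point during the window.
        return event_start < window_end and event_end > window_start
    raise ValueError(f"unknown mode: {mode}")
```

For a festival running from 28 April to 3 May, a one-day window starting
on 1 May matches only under `"overlaps"`. The start and end modes are
each a single range check on one field, which is what makes them cheaper
to index; the overlap test needs a condition on both fields.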

Now, my version:

It's by no means feature-complete; it's just a coding spike that
demonstrates features of CouchDB and Sinatra you won't get with
PHP/MySQL. The code is kept in Git at
http://github.com/kaerast/Plings-to-CouchDB

Without the existing codebase and a database dump, porting the API to
Sinatra is a little difficult, but it has also taught me exactly how
everything fits together (hence the bugs above).

Having the data in CouchDB makes replication between sites easy.
Replication is a simple HTTP call between the databases, and it only
transfers documents that have changed. CouchDB's design documents store
views (the queries) alongside their indexed results, so queries run very
fast, and views can be queried with stale results allowed, meaning the
index doesn't have to be updated on every query. The summing and
statistics operations can live in CouchDB too, as reduce functions, so
you'd have all of that cached without rebuilding it in PHP every time.
CouchDB is schema-less, so adding extra fields to documents is really
easy and doesn't involve massive database changes or downtime. And if
you ever do run into problems with CouchDB's speed, you can just add an
HTTP caching proxy in front of it, because everything the database
returns is plain JSON over HTTP.
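
For anyone who hasn't used CouchDB, here's roughly what those pieces
look like as data. This Python sketch only builds the JSON bodies and a
query URL and doesn't talk to a server: the design document would be PUT
to `/plings/_design/activities`, the view queried with a GET, and the
replication body POSTed to `/_replicate`. The database name `plings`,
the design document and view names, the document fields and the remote
URL are all hypothetical. `stale=ok` is the CouchDB 1.x parameter; newer
releases deprecate it in favour of `stable=true&update=false`.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical design document: one view for date-range queries and one
# that totals activities per venue using CouchDB's built-in _sum reduce.
DESIGN_DOC = {
    "_id": "_design/activities",
    "views": {
        "by_start": {
            # Map functions are JavaScript source stored as strings.
            "map": "function (doc) { if (doc.starts) { emit(doc.starts, null); } }",
        },
        "per_venue": {
            "map": "function (doc) { if (doc.venue) { emit(doc.venue, 1); } }",
            "reduce": "_sum",
        },
    },
}

# Body for POST /_replicate: keep pulling changes from a remote copy.
REPLICATION = {
    "source": "http://plings.example.org/plings",  # hypothetical remote
    "target": "plings",
    "continuous": True,
}

# Key-type query parameters must be JSON-encoded in view queries.
JSON_PARAMS = ("key", "startkey", "endkey")


def view_url(base, db, ddoc, view, **params):
    """Build a view query URL, JSON-encoding key-type parameters."""
    query = {k: json.dumps(v) if k in JSON_PARAMS else v for k, v in params.items()}
    return f"{base}/{quote(db)}/_design/{ddoc}/_view/{view}?{urlencode(query)}"
```

For example, `view_url("http://localhost:5984", "plings", "activities",
"by_start", startkey="2010-05-01", endkey="2010-05-08", stale="ok")`
asks for `by_start` rows with keys in that range while accepting a stale
index, and querying `per_venue` with `group="true"` would return one
total per venue.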

The existing API can be rebuilt in Sinatra, taking care to ensure good
code reuse along with both unit and integration tests.

Sinatra is a Ruby microframework that is really quick to develop in.
It's not necessarily the best tool for rebuilding the current API,
though in time you may not need the current API quite as much: you
could simply give read access to the raw data in the CouchDB database.
(For writes you'd probably want to keep the existing APIs to do data
validation, and you might want to offer geo functions that the database
can't do on its own.)
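
As an illustration of the kind of checks a write API could keep doing
before anything reaches CouchDB, here's a minimal Python sketch. The
required fields and rules in the hypothetical `validate_activity` are my
own assumptions; the real rules would come from the existing API.

```python
from datetime import datetime

# Hypothetical set of fields every activity must have.
REQUIRED_FIELDS = ("name", "starts", "ends", "venue")


def validate_activity(doc):
    """Return a list of validation errors; an empty list means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if not doc.get(f)]
    try:
        # Missing values become "" (ValueError); None raises TypeError.
        starts = datetime.fromisoformat(doc.get("starts", ""))
        ends = datetime.fromisoformat(doc.get("ends", ""))
    except (TypeError, ValueError):
        errors.append("starts/ends must be ISO 8601 timestamps")
    else:
        if ends < starts:
            errors.append("event ends before it starts")
    return errors
```

A write endpoint would save the document only when the returned list is
empty, and otherwise report the errors back to the client.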

I guess the real question is how this performs compared to the current
setup. Well, that depends on where it's hosted. Self-hosted, it should
be at least as quick for reads, with the added bonus of being easy to
scale. Cloud-hosted, with the API on Heroku and the database on
Cloudant, it could really fly. Writes to CouchDB are fairly slow, but
since the vast majority of database operations are reads, this isn't
really an issue.

So there we are: my CouchDB version is working well enough to say "I
told you it'd work". It doesn't yet cover all of the existing APIs, but
now that the basics are in place it'd be relatively quick to add much of
the remaining functionality. Geo searching isn't something I've done in
CouchDB before, so searching by postcode might take a little more work,
but everything else should be relatively simple.
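
As a rough sketch of the postcode search, one approach outside the
database is to look up the postcode's coordinates and filter activities
by great-circle (haversine) distance. The `POSTCODES` table, its single
entry with approximate coordinates, and the `lat`/`lon` fields on
activities are all hypothetical; a real version would load postcode
centroids from a postcode dataset and would probably narrow the
candidates with a view before computing distances.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical postcode -> (latitude, longitude) lookup, approximate values.
POSTCODES = {"SW1A 1AA": (51.501, -0.142)}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def near_postcode(activities, postcode, radius_km):
    """Activities within radius_km of the postcode, nearest first."""
    lat, lon = POSTCODES[postcode]
    hits = []
    for activity in activities:
        distance = haversine_km(lat, lon, activity["lat"], activity["lon"])
        if distance <= radius_km:
            hits.append((distance, activity))
    # Sort on distance only, so the dicts themselves are never compared.
    hits.sort(key=lambda pair: pair[0])
    return [activity for _, activity in hits]
```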
--
Alice