Bugs & CouchDB Migration


Alice Kaerast

Jun 22, 2010, 4:30:30 PM

to pli...@googlegroups.com

Hi,

I've been doing some work on migrating Plings to CouchDB, partially as
an exercise in improving my CouchDB skills and partially because I
believe CouchDB would make a great way of storing Plings in the
future. I have found a few bugs in the current API and have some thoughts on
a CouchDB version.

First the bugs:

The JSON output from the current API does not match
the field names coming from the XML output. That's mostly just
annoying because the XML is more human readable whilst developing, but
it's also not documented and it suggests you're repeating code rather
than having nice code reuse.
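
One way to keep the two formats in step is to serialise both from a single record, so each field is only ever named once. A minimal sketch in Python, not the Plings code; the field names and the `activity` root tag are made up for illustration:

```python
import json
import xml.etree.ElementTree as ET

def to_json(record):
    """Serialise the record as JSON, using its keys as field names."""
    return json.dumps(record, sort_keys=True)

def to_xml(record, root_tag="activity"):
    """Serialise the same record as XML, one child element per key."""
    root = ET.Element(root_tag)
    for key in sorted(record):
        child = ET.SubElement(root, key)
        child.text = str(record[key])
    return ET.tostring(root, encoding="unicode")

# Hypothetical activity record; the real Plings fields may differ.
record = {
    "name": "Football",
    "starts": "2010-06-23T09:00",
    "ends": "2010-06-23T11:00",
}

print(to_json(record))
print(to_xml(record))
```

Because both functions walk the same keys, the field names cannot drift apart between outputs.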

The linked activities fields don't appear to be giving me any data in
the JSON output. I haven't looked into this in any detail, mostly
because I don't want to handle linked activities in my code.

The number of days filter doesn't clearly state whether it's searching
events with a start time in that period, events with an end time in
that period or events which are taking place in that period. It's
probably more optimal to search on start or end times, but more useful
to search on events currently taking place if there are multi-day
events taking place.

Now my version:

It's by no means feature-complete, it's just a coding spike which
demonstrates features of CouchDB and Sinatra you won't get with
PHP/MySQL. The code is being kept in Git at
http://github.com/kaerast/Plings-to-CouchDB

Without the existing codebase and a db dump, porting the API to Sinatra
is a little difficult, but it has also given me a lesson in exactly how
everything fits together (see the bugs above).

Having the data in CouchDB makes for easy replication between sites.
Pulling the data is a simple http call between the databases, and it
only pulls in changed data. The Designs feature in CouchDB (design
documents) means that view results are cached, and so queries run very
fast; Designs store the queries to be run along with the cached data, and
can be configured to return stale data - meaning you don't need to
recalculate on every query. All
the summing and statistics operations can be stored in CouchDB too, so
you'd have all that nicely cached without need for rebuilding in PHP
all the time. CouchDB is schema-less, meaning adding extra fields to
documents is really easy and doesn't involve massive database changes
or downtime. And if you ever do face problems with CouchDB speeds, you
just add in an http caching proxy because all the data from the
database is simply json going over http.
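
To make that concrete, here is a rough sketch in Python of the HTTP requests involved. It only builds the URLs and bodies and sends nothing. The host, the database name `plings`, the design document `stats`, the view `per_day` and the `starts` field are all assumptions for illustration; `/_replicate`, the `_design/.../_view/...` path, the built-in `_sum` reduce and the `stale=ok` parameter are CouchDB's own as of current releases.

```python
import json
from urllib.parse import urlencode

COUCH = "http://localhost:5984"  # assumed local CouchDB address

def replication_request(source, target):
    """URL and JSON body for a POST to CouchDB's /_replicate endpoint."""
    body = json.dumps({"source": source, "target": target})
    return COUCH + "/_replicate", body

def view_url(db, design, view, **params):
    """URL for reading a view; stale='ok' asks for the cached index."""
    path = "{}/{}/_design/{}/_view/{}".format(COUCH, db, design, view)
    return path + ("?" + urlencode(params) if params else "")

# Hypothetical design document: counts activities per start date.
# The map function is JavaScript stored as a string; "_sum" is a
# built-in reduce, so the totals are kept in the view index.
design_doc = {
    "_id": "_design/stats",
    "views": {
        "per_day": {
            "map": "function(doc) { if (doc.starts) emit(doc.starts.substr(0, 10), 1); }",
            "reduce": "_sum",
        }
    },
}

# Pull a remote copy (hypothetical URL) into the local database.
url, body = replication_request("http://example.org:5984/plings", "plings")
print("POST", url, body)

# Read the per-day totals without forcing the index to be rebuilt.
print("GET", view_url("plings", "stats", "per_day", stale="ok", group="true"))
```

Since every read is a plain GET on a stable URL, putting a caching proxy in front needs no changes to the client.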

The existing API can be rebuilt in Sinatra, taking care to ensure good
code reuse and both unit testing and integration testing.

Sinatra is a Ruby microframework which is really quick to develop in.
It's not necessarily the best tool for building the current API, though
in time you may not need the current API quite as much - you can just
give access to the CouchDB database for reading the raw data (for
writing you'd probably want to keep existing APIs to do data
validation, and you'd maybe want to offer geo functions that the
database can't do on its own).

I guess the real question is how does this perform compared to the
current setup? Well that's going to depend on where it's hosted.
Self-hosted it should be as quick if not faster for reading, with the
added bonus of being easy to grow. Cloud-hosted it could fly along,
with the API being hosted on Heroku and the database being hosted on
Cloudant. Writing to CouchDB is always fairly slow, but since the vast
majority of operations on the database are reads then this isn't really
an issue.

So there we are, my CouchDB version is working well enough to say "I
told you it'd work". I've not added complete coverage of the existing
APIs, but it'd be relatively quick to add much of the functionality now
the basics are all in place. Geodb isn't something I've done in
CouchDB before, so the searching by postcode might take a little more
work but everything else would be relatively simple.
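
For what it's worth, the distance part of a postcode search could live in the API layer rather than the database. A rough sketch in Python using the haversine formula, assuming each postcode has already been resolved to a latitude/longitude pair and that documents carry hypothetical `lat` and `lon` fields:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def within_radius(docs, lat, lon, radius_km):
    """Keep only the documents within radius_km of the given point."""
    return [d for d in docs
            if distance_km(lat, lon, d["lat"], d["lon"]) <= radius_km]

# Hypothetical documents with approximate coordinates.
docs = [
    {"name": "Manchester activity", "lat": 53.48, "lon": -2.24},
    {"name": "London activity", "lat": 51.51, "lon": -0.13},
]
print(within_radius(docs, 53.48, -2.24, 10))
```

This scans every document, so it is only sensible after the database has narrowed the candidates down some other way.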

--
Alice

David Carpenter

Jun 23, 2010, 5:52:03 AM

to Plings Developer Support

Thanks Alice

Good work - to be honest this is going to take a bit for me personally
to digest, but I'm sure other people will get it quicker!

I can answer a few of your points directly - see below....

On Jun 22, 9:30 pm, "Alice Kaerast" <kaer...@newscloud.com> wrote:
> Hi,
>
> I've been doing some work on migrating Plings to CouchDB, partially as
> an exercise in improving my CouchDB skills and partially because I
> believe CouchDB would make a great way of storing Plings in the
> future. I have a few bugs in the current API and some thoughts on
> a CouchDB version.
>
> First the bugs:
>
> The JSON output from the current API does not match
> the field names coming from the XML output. That's mostly just
> annoying because the XML is more human readable whilst developing, but
> it's also not documented and it suggests you're repeating code rather
> than having nice code reuse.

Noted: We're already taking a look at this and hope to improve our
JSON support in the coming weeks, as well as making sure development
on one output is reflected across all outputs automagically. We're also
working on ways of communicating our progress on such issues, to make
that better.

>
> The linked activities fields don't appear to be giving me any data in
> the JSON output. I haven't looked into this in any detail, mostly
> because I don't want to handle linked activities in my code.
>

Again, thanks - we'll try to address this...

> The number of days filter doesn't clearly state whether it's searching
> events with a start time in that period, events with an end time in
> that period or events which are taking place in that period. It's
> probably more optimal to search on start or end times, but more useful
> to search on events currently taking place if there are multi-day
> events taking place.
>

I'll look to update the documentation on this, but the answer is:
We should be returning anything that is 'taking place in the time
period requested'
i.e. if it starts or ends in the requested time period it is included
- and therefore, if it has started, but not yet ended, then it is
included - I'm also pretty certain that if the start time is before
the search period and the end time is after (multi-day events) then
that would also be picked up. (However, in many cases, these should
probably be submitted to Plings as a number of day-long linked
activities, with distinct start and end times each day - perhaps
residential trips are an exception to that)

This is the same if you searched with a time/date parameter - e.g.
2010-06-23+10:00 - would return an activity that takes place between
9am and 11am (Please tell me if I am wrong!!)
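
Put as code, the rule described above is an interval overlap test. This is a sketch in Python of the intended behaviour as I understand it, not the actual Plings implementation; the function and parameter names are invented:

```python
from datetime import datetime

def takes_place_in(event_start, event_end, window_start, window_end):
    """True if the event overlaps the search window at any point.

    Covers events that start in the window, end in the window, or
    start before it and end after it (multi-day events).
    """
    return event_start <= window_end and event_end >= window_start

# An activity from 9am to 11am, searched for at 10:00 on the same day.
start = datetime(2010, 6, 23, 9, 0)
end = datetime(2010, 6, 23, 11, 0)
moment = datetime(2010, 6, 23, 10, 0)
print(takes_place_in(start, end, moment, moment))
```

A single date/time is treated here as a window whose start and end are the same instant.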

Cheers
David

> Now my version:
>
> It's by no means feature-complete, it's just a coding spike which
> demonstrates features of CouchDB and Sinatra you won't get with

> PHP/MySQL. The code is being kept in Git at
> http://github.com/kaerast/Plings-to-CouchDB
