Shameless Plug: ShopTalk

Christian Wyglendowski

Jul 23, 2009, 10:47:00 PM

to cl...@googlegroups.com

Hey guys,

On the heels of the story of the end of one startup, here's a bit of
info about a new one.

We're in a beta period right now, so I covet your feedback if any of
you would like to sign up for an account and try it out with a few
others at your respective organizations.

Techie Details:

The main chat page and the administrative interface are served by
CherryPy. The chat service itself is built on a custom libevent-based
async framework for Python. Hopefully we'll have more announcements
about that project in the near future. Much of the data is stored in
a distributed key/value store called Dynomite
(http://github.com/cliffmoon/dynomite/tree/master). We are using
nginx, HAProxy and memcached to all of their respective advantages.

The application is light on features right now. We wanted to get it
out there months ago, but since we are moonlighting it has taken us
longer than we thought. Some stuff that is definitely on the roadmap
is a bot-API and some sort of way for organizations to integrate their
current LDAP directories.

The biggest challenges we've faced have been x-browser issues, mostly
on the administrative interface. We haven't really added any new
features to the application for months - it's been a slow tedious
process of making the thing look and work right in
IE7/IE8/FF3/Safari3/4. We were trying to support IE6 for a while too,
and have since dropped support for it. That felt good.

Anyhow, we announced it on Hacker News early last week too. Here is
the link to the post there: http://news.ycombinator.com/item?id=703925
There are some more details about the project there.

Thanks again to anyone who wants to test drive it, and sorry for the
self-promotion. You all can give me a hard time next time I show up
for a ClePy meeting. :P

Christian
http://www.dowski.com

Nick Barendt

Jul 24, 2009, 9:12:41 AM

to cl...@googlegroups.com

Christian,

Best of luck!

On the techie side, I'd appreciate any feedback or experience you (or anyone else on the list) has with Dynomite or any other persistent distributed key/value stores and/or highly scalable distributed file systems (e.g., Kai, MogileFS, Cassandra, CloudBase, HDFS, Voldemort, etc.).

-Nick

Christian Wyglendowski

Jul 24, 2009, 10:04:59 AM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 9:12 AM, Nick Barendt<nickb...@gmail.com> wrote:
> Best of luck!

Thanks Nick.

> On the techie side, I'd appreciate any feedback or experience you (or
> anyone else on the list) has with Dynomite or any other persistent
> distributed key/value stores and/or highly scalable distributed file systems
> (e.g., Kai, MogileFS, Cassandra, CloudBase, HDFS, Voldemort, etc.).

Well, one of the crucial points is how to deal with conflict
resolution in the eventually-consistent model. This is something we
have sort of put on the back burner for now, and we're using
distributed locking for the time being. It's something we're going to
have to address though to get the full benefits of the
reliability/scalability of a dist. k/v store.

What I have in mind (all theory right now) is some sort of
object-diffing library. If

User(1, displayname='David', email='da...@shoptalkapp.com',
rooms=['Lobby', 'IT'])

gets changed on two separate Dynomite nodes before they synchronize,
we'll get two records in the DB for the key. Doing some sort of a
diff and merging the changes would allow the application to construct
the complete state of the object, as long as the changes weren't
overlapping (ie, one change updated email, one added a room). For
overlapping changes (two changes to email address) it would fall back
to last-writer-wins.
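
A minimal sketch of the diff-and-merge idea above, in Python. It assumes the application can also get at the last common version of the record (the "base"), which is what makes a field-by-field diff possible. The function name, the dict representation of User, the example.com addresses, and the theirs_is_newer flag are all hypothetical; nothing here is Dynomite's API.

```python
from typing import Any, Dict

_MISSING = object()  # marks a field that is absent from a version


def merge_versions(base: Dict[str, Any], ours: Dict[str, Any],
                   theirs: Dict[str, Any], theirs_is_newer: bool) -> Dict[str, Any]:
    """Three-way merge of two conflicting versions of one record."""
    merged: Dict[str, Any] = {}
    for field in set(base) | set(ours) | set(theirs):
        b = base.get(field, _MISSING)
        o = ours.get(field, _MISSING)
        t = theirs.get(field, _MISSING)
        if o == t:
            value = o              # both sides agree (or made the same change)
        elif o == b:
            value = t              # only "theirs" touched this field
        elif t == b:
            value = o              # only "ours" touched this field
        else:
            # Overlapping change: fall back to last-writer-wins.
            value = t if theirs_is_newer else o
        if value is not _MISSING:
            merged[field] = value
    return merged


# Hypothetical record: one node changes the email, the other adds a room.
base = {"displayname": "David", "email": "david@example.com",
        "rooms": ["Lobby", "IT"]}
ours = dict(base, email="dave@example.com")
theirs = dict(base, rooms=["Lobby", "IT", "Dev"])
merged = merge_versions(base, ours, theirs, theirs_is_newer=True)
# merged carries the new email from one side and the extra room from the other.
```

Lists are compared as whole values here, so two nodes adding different rooms at the same time would count as an overlapping change and fall back to last-writer-wins.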

Christian

David Stanek

Jul 24, 2009, 12:55:28 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 10:04 AM, Christian
Wyglendowski<chri...@dowski.com> wrote:
>
>
> What I have in mind (all theory right now) is some sort of
> object-diffing library. If
>
> User(1, displayname='David', email='da...@shoptalkapp.com',
> rooms=['Lobby', 'IT'])
>
> gets changed on two separate Dynomite nodes before they synchronize,
> we'll get two records in the DB for the key. Doing some sort of a
> diff and merging the changes would allow the application to construct
> the complete state of the object, as long as the changes weren't
> overlapping (ie, one change updated email, one added a room). For
> overlapping changes (two changes to email address) it would fall back
> to last-writer-wins.
>

Dynomite doesn't do this for you? That's one of the reasons I like
CouchDB. I haven't tried it yet, but it supposedly handles changes
happening on different nodes and then replicating them across a
cluster. The way it works, IIRC, is similar to version control.

--
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek

Nick Barendt

Jul 24, 2009, 1:08:22 PM

to cl...@googlegroups.com

I'm not sure how Dynomite handles conflict resolution with eventual consistency (I think the Amazon Dynamo prototype attempted to provide some developer configuration so that particular applications could trade off consistency and availability), but it is fundamentally a different beast than CouchDB which, while insanely cool, isn't really distributed in the same sense as Dynomite/S3/etc.

CouchDB (based on my understanding) supports replication, but not scalable distribution like these other distributed key/value stores. The distributed key/value stores have to deal with the whole CAP tradeoff, and often the "right" answer is to give up consistency to guarantee availability and tolerate partitioning. There are all sorts of solutions to the different problems (e.g., Lamport's Paxos and Vector Clocks) that I only vaguely understand.
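
For anyone else who only vaguely understands vector clocks, here is a rough Python sketch of the comparison at their core. The idea is that each version of a value carries a per-node write counter, and two versions conflict only when neither clock has seen everything the other has. The names and the dict representation are assumptions for illustration, not any particular store's format.

```python
from typing import Dict

VectorClock = Dict[str, int]  # node name -> number of writes seen from that node


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock a has seen every write that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the relationship between two versions of a value."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a_newer"
    if b_covers_a:
        return "b_newer"
    return "concurrent"  # neither saw the other's write: a real conflict
```

Only the "concurrent" case needs application-level conflict resolution; in the other cases the older version can simply be dropped.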

-Nick

Christian Wyglendowski

Jul 24, 2009, 2:07:32 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 12:55 PM, David Stanek<dst...@dstanek.com> wrote:
>

> Dynomite doesn't do this for you? That's one of the reasons I like
> CouchDB. I haven't tried it yet, but it supposedly handles changes
> happening on different nodes and then replicating them across a
> cluster. The way it works, IIRC, is similar to version control.

Well it does do the low-level replication and stuff. It's the
application level stuff that you need to handle on your own.

Imagine this scenario. Two Dynomite nodes become disconnected due to
a network partition and key X is updated on both of them,
independently. When the network connection between those nodes is
restored, both values of X will be retained. That's a conflict for
that value though.

The application will get both values for X the next time it asks for
it and it will have to rectify the two back to a single value. That's
what I mean when I say conflict resolution.
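
The scenario above can be played out with a toy model in Python. ToyNode, sync, and read_resolved are made-up names and this is not how Dynomite is implemented; a real store would track causality (for example with vector clocks) instead of just comparing values. It only shows the shape of the problem: after the partition heals, a read returns two values and the application has to pick or build one.

```python
from typing import Callable, Dict, List


class ToyNode:
    """Stand-in for one storage node; keeps a list of values per key."""

    def __init__(self) -> None:
        self.store: Dict[str, List[str]] = {}

    def put(self, key: str, value: str) -> None:
        # A local write replaces whatever this node currently holds.
        self.store[key] = [value]

    def get(self, key: str) -> List[str]:
        return list(self.store.get(key, []))


def sync(a: ToyNode, b: ToyNode) -> None:
    """Partition heals: both nodes end up holding every distinct value."""
    for key in set(a.store) | set(b.store):
        merged: List[str] = []
        for value in a.get(key) + b.get(key):
            if value not in merged:
                merged.append(value)
        a.store[key] = list(merged)
        b.store[key] = list(merged)


def read_resolved(node: ToyNode, key: str,
                  resolve: Callable[[List[str]], str]) -> str:
    """Application-level read: collapse conflicting values and write back."""
    values = node.get(key)
    if not values:
        raise KeyError(key)
    if len(values) == 1:
        return values[0]
    winner = resolve(values)
    node.put(key, winner)
    return winner


# Two nodes are partitioned and each accepts a write for key "X".
n1, n2 = ToyNode(), ToyNode()
n1.put("X", "written-on-n1")
n2.put("X", "written-on-n2")
sync(n1, n2)                      # both values of X are retained
conflicting = n1.get("X")         # the application now sees two values
resolved = read_resolved(n1, "X", resolve=lambda values: values[-1])
```

The resolve callback is where the application policy lives, whether that is taking the last value or doing a field-level merge.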

At least I think that is all right.*

Christian

* A lot of it depends on how you tune it (which two of Consistency,
Availability and Partition tolerance you go for).

David Stanek

Jul 24, 2009, 2:08:56 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 1:08 PM, Nick Barendt<nickb...@gmail.com> wrote:
> I'm not sure how Dynomite handles conflict resolution with eventual consistency (I think the Amazon Dynamo prototype attempted to provide some developer configuration so that particular applications could trade off consistency and availability), but it is fundamentally a different beast than CouchDB which, while
> insanely cool, isn't really distributed in the same sense as
> Dynomite/S3/etc.
> CouchDB (based on my understanding) supports replication, but not scalable
> distribution like these other distributed key/value stores. The distributed
> key/value stores have to deal with the whole CAP tradeoff, and often the
> "right" answer is to give up consistency to guarantee availability and
> tolerate partitioning. There are all sorts of solutions to the different
> problems (e.g., Lamport's Paxos and Vector Clocks) that I only vaguely
> understand.

What do you mean by scalable distribution?

David Stanek

Jul 24, 2009, 2:10:24 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 2:07 PM, Christian
Wyglendowski<chri...@dowski.com> wrote:
>
> The application will get both values for X the next time it asks for
> it and it will have to rectify the two back to a single value. That's
> what I mean when I say conflict resolution.
>
> At least I think that is all right.*
>

My understanding of CouchDB is that it will figure out which entry was
last and use that. It doesn't do any updates to records. Instead
everything makes a new one. This obviously doesn't work for all
datasets.

Christian Wyglendowski

Jul 24, 2009, 2:19:00 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 2:10 PM, David Stanek<dst...@dstanek.com> wrote:
>
> My understanding of CouchDB is that it will figure out which entry was
> last and use that. It doesn't do any updates to records. Instead
> everything makes a new one. This obviously doesn't work for all
> datasets.

Yeah, you could choose to use Dynomite that way too. Basically use
the result[-1] of the data that you get back.

I'm sure the # conflicts will differ in our environment (based on our
usage of Dynomite) but acc'd to the Dynamo paper, conflicts at Amazon
are fairly rare (though still significant b/c of their volume of
traffic). So it's sort of an edge case, but an interesting one.

Christian

Nick Barendt

Jul 24, 2009, 2:36:21 PM

to cl...@googlegroups.com

So, CouchDB, as I understand it (I've only toyed with it), runs on a node (like most "normal" databases), node A. At any time, you can choose to replicate all or a portion of a database to another node, node B. You now have a copy of that database on nodes A and B. At a later time, you can merge any changes to the database on node B and resolve conflicts. It is akin to a DVCS.

Most importantly, though, at any given time, the contents of the database must fit onto a single node (again, this is my understanding of how CouchDB works), and there's no built-in load balancing of clients making requests. If they're configured to talk to node A and node A is down, game over. If node A is busy, they have to wait.

Distributed Hash Tables (DHTs) like Dynamo, Dynomite, etc. have (hopefully) many, many nodes, and their total storage capacity and read/write bandwidth scale with the number of nodes added to the cluster by some function (hopefully linearly).
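
To make the scaling point concrete: Dynamo-style stores spread keys over nodes with consistent hashing, so adding a node moves only a slice of the keys instead of reshuffling everything. Below is a minimal Python sketch of that technique. The HashRing class, the virtual-node count, and the choice of SHA-256 are illustrative assumptions, not taken from Dynomite's source.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(text: str) -> int:
    """Map a string to a point on the ring."""
    return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._owner: Dict[int, str] = {}
        for node in nodes:
            # Several points per node smooth out the key distribution.
            for i in range(vnodes):
                self._owner[_hash(f"{node}#{i}")] = node
        self._points = sorted(self._owner)

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from the key."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


# Growing the cluster from three nodes to four.
keys = [f"key{i}" for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])
moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
# Every key that moved now lives on the new node; the rest stayed put.
```

With plain `hash(key) % number_of_nodes` placement, most keys would change owner when a node is added, which is what makes manual sharding so painful to grow.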

I haven't seen anything with CouchDB that makes me think it can scale that way, other than maybe via some ad hoc load balancing/sharding/rapid replication mechanism on top of it. CouchDB isn't really a distributed application in that sense. But, maybe I've missed something.

-Nick

David Stanek

Jul 24, 2009, 2:53:23 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 2:36 PM, Nick Barendt<nickb...@gmail.com> wrote:
>
> So, CouchDB, as I understand it (I've only toyed with
> it), runs on a node (like most "normal" databases),
> node A. At any time, you can choose to replicate all or a portion of a database to another node,
> node B. You now have a copy of that database on nodes A and B. At a later
> time, you can merge any changes to the database on node B and resolve
> conflicts. It is akin to a DVCS.
> Most importantly, though, at any given time, the contents of the database
> must fit onto a single node (again, this is my understanding of how CouchDB
> works)

This is correct as far as I understand. It uses the shared-nothing model.

>, and there's no built-in load balancing of clients making requests.
> If they're configured to talk to node A and node A is down, game over. If
> node A is busy, they have to wait.

Using the shared-nothing approach to data management this is really not
a concern. Put your CouchDB nodes behind a load balancer.

> Distributed Hash Tables (DHT) like Dynamo, Dynomite, etc. have (hopefully),
> many, many nodes, and their total storage capacity and read/write bandwidth
> scales with the number of nodes added to the cluster by some function
> (hopefully linearly).

Right. CouchDB does not do this. So if you have more than a few
hundred gigs of data you may have a problem. Luckily (or unluckily)
none of my projects using CouchDB have greater than 100 megs.

> I haven't seen anything with CouchDB that makes me think it can scale that
> way, other than maybe via some ad hoc load balancing/sharding/rapid
> replication mechanism on top of it. CouchDB isn't really a distributed
> application in that sense. But, maybe I've missed something.

It is highly distributed. The data, however, is not partitioned across nodes.

Nick Barendt

Jul 24, 2009, 3:09:27 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 2:53 PM, David Stanek <dst...@dstanek.com> wrote:
> On Fri, Jul 24, 2009 at 2:36 PM, Nick Barendt<nickb...@gmail.com> wrote:
> >
> > So, CouchDB, as I understand it (I've only toyed with
> > it), runs on a node (like most "normal" databases),
> > node A. At any time, you can choose to replicate all or a portion of a database to another node,
> > node B. You now have a copy of that database on nodes A and B. At a later
> > time, you can merge any changes to the database on node B and resolve
> > conflicts. It is akin to a DVCS.
> > Most importantly, though, at any given time, the contents of the database
> > must fit onto a single node (again, this is my understanding of how CouchDB
> > works)
>
> This is correct as far as I understand. It uses the shared-nothing model.
>
> >, and there's no built-in load balancing of clients making requests.
> > If they're configured to talk to node A and node A is down, game over. If
> > node A is busy, they have to wait.
>
> Using the shared-nothing approach to data management this is really not
> a concern. Put your CouchDB nodes behind a load balancer.
>
> > Distributed Hash Tables (DHT) like Dynamo, Dynomite, etc. have (hopefully),
> > many, many nodes, and their total storage capacity and read/write bandwidth
> > scales with the number of nodes added to the cluster by some function
> > (hopefully linearly).
>
> Right. CouchDB does not do this. So if you have more than a few
> hundred gigs of data you may have a problem. Luckily (or unluckily)
> none of my projects using CouchDB have greater than 100 megs.

For the projects I was looking at these for (like BitBacker), it was for massive data storage, akin to Amazon's S3, and petabytes would be the goal. No load balancer is going to fix that.

> > I haven't seen anything with CouchDB that makes me think it can scale that
> > way, other than maybe via some ad hoc load balancing/sharding/rapid
> > replication mechanism on top of it. CouchDB isn't really a distributed
> > application in that sense. But, maybe I've missed something.
>
> It is highly distributed. The data, however, is not partitioned across nodes.

I don't want to make this a flame war, but others (notably Jonathan Ellis) have argued that for these and other reasons, CouchDB is not a distributed application, any more than Mercurial or git is. A comment by Ellis at the end of that page:

As everyone knows, scalability isn't about single-node numbers (although those don't look too hot either); it's about whether adding Nx machines gives you Nx performance, and it's about automating growing and failure recovery so that you don't have to add Nx members to your ops team at the same time. If sharding + replication were enough to scale we'd all stick with pg and mysql, but it's not; at this point everyone's pretty much concluded that that's not adequate for big data.

But manual sharding is labor intensive, error prone, and inflexible. You _can_ deal with machine failures but it's painful. And that's the good news. Growing your cluster is much worse. So is dealing with load hot spots.

That's what my problem is with couchdb -- as Zach quoted, the first feature they tout is "distributed," which has become associated, fairly or not, with scalability features couchdb doesn't have. But none of their devs ever post a correction to articles lumping couchdb in with scalable databases to say, "actually, we mean this _other_ definition of distributed." They seem content to allow people to assume they have these other features too, which is understandable in some sense, but not really honest.

I think CouchDB is really cool, I do. Been looking for a project to use it on.

-Nick

David Stanek

Jul 24, 2009, 3:47:00 PM

to cl...@googlegroups.com

On Fri, Jul 24, 2009 at 3:09 PM, Nick Barendt<nickb...@gmail.com> wrote:
>
> I don't want to make this a flame war, but others (notably Jonathan Ellis)
> have argued that for these and other reasons, CouchDB is not a distributed
> application, any more than Mercurial or git is.
> A comment by Ellis at the end of that page:

:-) CouchDB is good for certain things and something like ...
Cassandra for others (Jonathan works on it).

Douglas Stanley

Jul 24, 2009, 4:38:37 PM

to cl...@googlegroups.com

Since we're all on the subject, what would everyone recommend to use to
store general metadata in a scalable/reliable fashion?

I was originally leaning towards couchdb, as its JSON storage seems
like an ideal fit for metadata, but these discussions made me think
I should ask around.

Thanks,
Doug

Ralph Heimburger

Jul 24, 2009, 10:56:33 PM

to cl...@googlegroups.com

Doug,
What type of metadata?
I built a metadata app in python/sqlite that stores all of our metadata.

--
Ralph Heimburger
1stpOint incorporated
www.1stpointinc.com
Ph. 216-906-3640
Fax 702-995-3640

Douglas Stanley

Jul 27, 2009, 10:41:48 AM

to cl...@googlegroups.com

Well, any kind actually. It needs to be flexible. I would say it'd be all textual and not structured. JSON is kind of the ideal format that I'm envisioning, but a simpler key/value store should also be fine, it would just take a little more effort.

After hearing some of the talks at PyOhio, I was clued into mongodb, but I'm not quite sure about its scaling features, as they still seem "alpha". For my needs, couchdb's replication with something like couchdb-lounge in front would be sufficient for scaling.

I dunno, I think I'm just wasting time and avoiding the inevitable really. This is all for my master's thesis, which I've been putting off starting for about 5 years now. So it's not really business critical that I choose the ideal solution right now, just something I can run with to get something written up and finally graduate :)

Besides, I hope my project is modular enough, that I could easily swap out one solution for metadata storage for another one later on should I find a better solution.

Just thought I'd ask the masses what they think since everyone's on the subject right now.

Thanks,
Doug

Ralph Heimburger

Jul 27, 2009, 10:47:42 AM

to cl...@googlegroups.com

Typically metadata refers to the structure of storage objects (e.g. database tables, Cognos models) and answers questions of usability, consistency, where used, etc. I don't know how an unstructured db would answer questions across databases, or whether the metadata you are referring to is itself unstructured.

Douglas Stanley

Jul 27, 2009, 11:22:41 AM

to cl...@googlegroups.com

I see what you mean. I'm storing scientific data sets, so the metadata I need to store will be describing them. Some of the data sets might just be a 3-d tiff image, so the metadata I want to store would be the kind found in a tiff file. However, not all tiffs are created equal. For example, the olympus tiff format (I think it's actually called fluotiff) stores EXTRA metadata about the microscope that took the image and also about the fluorescent dyes used, and sticks it in unconventional places in the resulting tiff file.

So when I said it needed to be unstructured, I meant the metadata itself needed to be unstructured. Plus, the end user might add extra metadata too, so it won't all be coming from the datasets being stored.

Did that answer your question? I guess I'm talking about metadata like the librarian types I work with use the term, and not so much like CS people use the term anymore. So basically, extra "semantic" (those librarian types LOVE the word semantic) data describing the data being stored (which can just be thought of as big blobs of binary, actual bytes).

I hope that made sense.

Thanks,
Doug

Nick Barendt

Jul 28, 2009, 3:23:03 PM

to cl...@googlegroups.com

On Mon, Jul 27, 2009 at 11:22 AM, Douglas Stanley <douglas....@gmail.com> wrote:

> I see what you mean. I'm storing scientific data sets, so the metadata I need to store will be describing them. Some of the data sets might just be a 3-d tiff image, so the metadata I want to store would be the kind found in a tiff file. However, not all tiffs are created equal. For example, the olympus tiff format (I think it's actually called fluotiff) stores EXTRA metadata about the microscope that took the image and also about the fluorescent dyes used, and sticks it in unconventional places in the resulting tiff file.
>
> So when I said it needed to be unstructured, I meant the metadata itself needed to be unstructured. Plus, the end user might add extra metadata too, so it won't all be coming from the datasets being stored.
>
> Did that answer your question? I guess I'm talking about metadata like the librarian types I work with use the term, and not so much like CS people use the term anymore. So basically, extra "semantic" (those librarian types LOVE the word semantic) data describing the data being stored (which can just be thought of as big blobs of binary, actual bytes).
>
> I hope that made sense.

Do you need to be able to search/query the metadata, or just be able to retrieve a blob based on some unique key (e.g., the path to the TIFF file, hash thereof, or something)?

If you don't need to perform queries against the metadata, a simple key-value store (e.g., anydbm, S3, tokyo cabinet, etc.) might be the best way to go - pretty simple and lots of choices for the backend.
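
As a rough sketch of the simple key-value route: the standard library's dbm module (the Python 3 name for anydbm) can hold one JSON blob per data set, keyed by a hash of the file contents. The helper names, the sample metadata fields, and the choice of SHA-256 for the key are assumptions for illustration only.

```python
import dbm
import hashlib
import json
import os
import tempfile
from typing import Any, Dict


def metadata_key(data: bytes) -> str:
    """Derive a stable key from the bytes of the stored data set."""
    return hashlib.sha256(data).hexdigest()


def save_metadata(db_path: str, key: str, metadata: Dict[str, Any]) -> None:
    """Store the metadata as a JSON blob under the given key."""
    with dbm.open(db_path, "c") as db:  # "c": create the file if needed
        db[key.encode("utf-8")] = json.dumps(metadata).encode("utf-8")


def load_metadata(db_path: str, key: str) -> Dict[str, Any]:
    """Fetch and decode the JSON blob for the given key."""
    with dbm.open(db_path, "r") as db:
        return json.loads(db[key.encode("utf-8")])


# Hypothetical usage with placeholder bytes standing in for a TIFF file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata")
    key = metadata_key(b"placeholder image bytes")
    save_metadata(path, key, {"microscope": "Olympus 1234", "dye": "ABC"})
    loaded = load_metadata(path, key)
```

Because the blob is opaque to the store, swapping dbm for S3 or Tokyo Cabinet later would only change the two helper functions.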

If you need to be able to search the metadata efficiently (e.g., find all images taken by Olympus 1234 with dye ABC) and the metadata doesn't fit a single schema, then you might be best with a schema-free database, like CouchDB, tokyo cabinet's Table database, or Amazon's SDB if you want to stick it in the cloud.
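
To illustrate the kind of query meant here, a small Python sketch over schema-free records follows. It is a plain linear scan over in-memory dicts with made-up field names and keys; a real schema-free database would answer the same question through its own views or indexes, which this does not attempt to model.

```python
from typing import Any, Dict, List


def find(records: Dict[str, Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return the keys of records whose metadata matches every criterion."""
    return sorted(
        key for key, meta in records.items()
        if all(meta.get(field) == wanted for field, wanted in criteria.items())
    )


# Records do not share a schema: some have fields that others lack.
records = {
    "scan-001.tif": {"microscope": "Olympus 1234", "dye": "ABC"},
    "scan-002.tif": {"microscope": "Olympus 1234", "dye": "XYZ", "note": "rerun"},
    "scan-003.tif": {"format": "plain tiff"},
}
matches = find(records, microscope="Olympus 1234", dye="ABC")
```

Records that lack a queried field simply do not match, which is the behavior you want when the metadata does not fit a single schema.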