Riak


CLR

May 18, 2012, 1:20:30 PM
to pdx...@googlegroups.com
Hey there, fellow Rubyists.  I recall a talk on MongoDB a few months ago.  Any interest in hearing about Riak, a NoSQL database that is fault-tolerant and scales horizontally?  I've run a couple training courses on Riak now, and I could put together a talk pretty quickly if there is interest.

Thanks!
-Casey

kocherek

May 18, 2012, 1:21:20 PM
to pdx...@googlegroups.com
Super interested. 

--
You received this message because you are subscribed to the Google Groups "pdxruby" group.
To view this discussion on the web visit https://groups.google.com/d/msg/pdxruby/-/p7wZzFJzro8J.
To post to this group, send email to pdx...@googlegroups.com.
To unsubscribe from this group, send email to pdxruby+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pdxruby?hl=en.

Peter Keen

May 18, 2012, 1:25:39 PM
to pdx...@googlegroups.com
Yeah, that sounds really interesting.

Sean McCleary

May 18, 2012, 1:30:24 PM
to pdx...@googlegroups.com
I was just studying up on Riak. I would love to know more. I am currently trying to decide between using Riak or Cassandra for a project I am working on. I am leaning towards Riak just because it looks like more fun.

- Sean McCleary

Andrew Ettinger

May 18, 2012, 1:30:30 PM
to pdx...@googlegroups.com
+1

Adron Hall

May 18, 2012, 1:33:38 PM
to pdx...@googlegroups.com
Ditto. I'm working on putting something together with it soon and would love to see a preso and talk shop.

Adron

Brad Heller

May 18, 2012, 1:52:30 PM
to pdx...@googlegroups.com, Jon Frisby
+1, we've been evaluating riak for a bit now and this would be super useful to us.

Bill Burcham

May 18, 2012, 1:58:37 PM
to pdx...@googlegroups.com
Riak is the most powerful open-source, distributed database you'll ever put into production.

+1

PS I wonder where the emphasis goes in that blurb. Is it on "powerful", "distributed", "production" or "you" ;-)

Adron Hall

May 18, 2012, 2:14:55 PM
to pdx...@googlegroups.com
So that's enough +1s I think, you're coming to present and talk shop with the meetup eh Casey?  :)

What is the date of the next meeting anyway? 

-Adron






--
Adron B Hall
Iron Foundry Project http://www.ironfoundry.org

Audrey Eschright

May 18, 2012, 2:20:24 PM
to pdx...@googlegroups.com

On May 18, 2012, at 11:14 AM, Adron Hall wrote:

> What is the date of the next meeting anyway?

June 5th.

Audrey

CLR

May 18, 2012, 2:22:58 PM
to pdx...@googlegroups.com
Awesome.  Checking my calendar.  Thanks!
-Casey

Sam Livingston-Gray

May 18, 2012, 9:57:06 PM
to pdx...@googlegroups.com
I'd love to hear why someone might think it a good idea to wrap all Riak calls in a thousand-line Haskell program... http://devblog.bu.mp/from-mongodb-to-riak

--
(Sent from phone; please excuse brevity.)

CLR

May 30, 2012, 2:51:36 PM
to pdx...@googlegroups.com
With whom do I confirm this talk?  How much time do I have, etc?  Who runs this ship, anyhow?  ;-)

Thanks!
-Casey


On Friday, May 18, 2012 6:57:06 PM UTC-7, Sam LG wrote:
I'd love to hear why someone might think it a good idea to wrap all Riak calls in a thousand-line Haskell program... http://devblog.bu.mp/from-mongodb-to-riak

--
(Sent from phone; please excuse brevity.)

On May 18, 2012, at 10:20 AM, CLR wrote:

Hey there, fellow Rubyists.  I recall a talk on MongoDB a few months ago.  Any interest in hearing about Riak, a NoSQL database that is fault-tolerant and scales horizontally?  I've run a couple training courses on Riak now, and I could put together a talk pretty quickly if there is interest.

Thanks!
-Casey


Jesse Cooke

May 30, 2012, 3:41:01 PM
to pdx...@googlegroups.com
On Wed, May 30, 2012 at 11:51 AM, CLR <clr.m...@gmail.com> wrote:
With whom do I confirm this talk?  How much time do I have, etc?  Who runs this ship, anyhow?  ;-)
Confirmed!
You can have as much time as you want, but I tend to get antsy after 45 minutes.
This ship runs itself! :) 

Matthew Boeh

May 30, 2012, 3:41:23 PM
to pdx...@googlegroups.com
Nobody! It's an anarchist ghost pirate ship. Decisions are reached by
consensus and/or indifference and/or the occasional keelhauling.

The next meeting is Tuesday, June 5 at 7 PM. As far as I know, there's
nothing else scheduled for that meeting, so you should have as much
time as you like.

Matthew Boeh



Igal Koshevoy

May 30, 2012, 11:05:04 PM
to pdx...@googlegroups.com
On Fri, May 18, 2012 at 10:20 AM, CLR <clr.m...@gmail.com> wrote:
Hey there, fellow Rubyists.  I recall a talk on MongoDB a few months ago.  Any interest in hearing about Riak, a NoSQL database that is fault-tolerant and scales horizontally?  I've run a couple training courses on Riak now, and I could put together a talk pretty quickly if there is interest.

Sorry for not getting to this earlier. Can you post a paragraph summary of what you'd like to talk about? How long do you need? How does 45-60 minutes sound? 

It would be ideal if you covered the following topics in your talk:
* What is this thingy, in plain English?
* How is it different from relational and other NoSQL databases?
* Why do people choose it? Why do people NOT choose it?
* How is it used from Ruby, Rails, etc? This is a Ruby group, after all. :)
* How do you scale your storage?
* How does it deal with failure? At what point will it lose data?
* How stable is it? For reals?

I'm specifically mentioning these questions because they're often overlooked or glossed over by those with a professional stake in such a technology, and it's better for all if you've prepared answers for them because people will ask.

Thanks,

-igal, occasional ghost ship captain

Jesse Cooke

May 31, 2012, 11:55:40 AM
to pdx...@googlegroups.com

I've been talking with Dann Stayskal at Copious about presenting on open source licenses. Not specifically Ruby but should still be very interesting. He'll need at least 30 minutes.


Igal Koshevoy

May 31, 2012, 12:48:22 PM
to pdx...@googlegroups.com
On Thu, May 31, 2012 at 8:55 AM, Jesse Cooke <je...@jc00ke.com> wrote:

I've been talking with Dann Stayskal at Copious about presenting on open source licenses. Not specifically Ruby but should still be very interesting. He'll need at least 30 minutes.

Hmmm. PLUG <http://www.pdxlinux.org/> covers this topic regularly and seems like the more appropriate group for it.

Would people still like to hear about this at pdxruby? Do we want to dedicate this month's meeting to two non-Ruby related topics, or should we split these across meetings so there's more time for Ruby-related discussion at each?

-igal 

Jesse Cooke

May 31, 2012, 1:03:50 PM
to pdx...@googlegroups.com
On Thu, May 31, 2012 at 9:48 AM, Igal Koshevoy <ig...@pragmaticraft.com> wrote:
On Thu, May 31, 2012 at 8:55 AM, Jesse Cooke <je...@jc00ke.com> wrote:

I've been talking with Dann Stayskal at Copious about presenting on open source licenses. Not specifically Ruby but should still be very interesting. He'll need at least 30 minutes.

Hmmm. PLUG <http://www.pdxlinux.org/> covers this topic regularly and seems like the more appropriate group for it.
Yes, while PLUG might go over it more, in the last few years I haven't heard this kind of material at a pdxruby. 

Would people still like to hear about this at pdxruby? Do we want to dedicate this month's meeting to two non-Ruby related topics, or should we split these across meetings so there's more time for Ruby-related discussion at each?
Pushing Dann back to next month would be ok if people want more Ruby content this month. 

-igal 

Jacob Helwig

May 31, 2012, 1:10:40 PM
to pdx...@googlegroups.com
I don't quite see what makes PDX.rb any less of an appropriate venue for this than PLUG. We use (and sometimes even write) OSS on a regular basis as rubyists, so this seems like a very relevant topic for a PDX.rb meeting. I could definitely see wanting to do this a different month from the Riak talk, but if nobody else has anything they'd want to talk about, and if Dann's ready...

--
Jacob Helwig
http://technosorcery.net/about/me



Jesse Cooke

May 31, 2012, 1:31:51 PM
to pdx...@googlegroups.com
Even recently a prominent Ruby project, Sidekiq, went through some license changes. Mike wanted dual personal/commercial and it was pretty confusing. I think the Ruby community can be educated on this topic.

Dann Stayskal

May 31, 2012, 2:22:42 PM
to pdx...@googlegroups.com, Jesse Cooke
On 05/31/2012 10:03 AM, Jesse Cooke wrote:
> On Thu, May 31, 2012 at 9:48 AM, Igal Koshevoy <ig...@pragmaticraft.com
> <mailto:ig...@pragmaticraft.com>> wrote:
> On Thu, May 31, 2012 at 8:55 AM, Jesse Cooke <je...@jc00ke.com
> <mailto:je...@jc00ke.com>> wrote:
> Would people still like to hear about this at pdxruby? Do we want to
> dedicate this month's meeting to two non-Ruby related
> topics, or should we split these across meetings so there's more
> time for Ruby-related discussion at each?
>
> Pushing Dann back to next month would be ok if people want more Ruby
> content this month.

I'm fine with presenting this next month if need be or presenting at a
different group, just let me know either way whether I should have
something ready for Tuesday. This is a talk I've given multiple times,
so I don't need much advance notice to have it ready to go.

The topics I generally cover in this talk are:

* Where FL/OSS licensing fits within the intellectual property world
* The major differences between Licenses and Agreements (EULAs)
* Techniques to evaluate licenses for web v. native development
* The GPL, MIT, BSD, Affero, CC, and Artistic licenses (as well as any
others that a group specifically requests that I cover)
* Balance of rights and responsibilities with FL/OSS software
* Navigating the complexities of multiple licensing

My background is in software engineering (IANAL), though I've also
written two peer-reviewed books (and am actively working on a third)
that cover the academic territory between intellectual property, human
rights, and globalization. Both of my current books are available
through a Creative Commons license at <http://dann.stayskal.com/books/>.

-- Dann

Jon Guymon

May 31, 2012, 4:01:19 PM
to pdx...@googlegroups.com
If we don't already have enough presentations for Tuesday, I could present on Unix IPC in Ruby. I just finished a months-long expedition into this territory at New Relic and I could stand and report on what I found.

It would be some code walk-through of the New Relic Ruby agent, but I would also go over general concepts and some alternatives to what I ended up using.

Interest? Available time?

-Jon

Jesse Cooke

May 31, 2012, 8:16:12 PM
to pdx...@googlegroups.com
Yes, I would be interested in hearing about this. Since it's Ruby related, let's schedule it for this month & we can have Dann present next month.

So, Riak and Unix IPC plus Hangman of course ;) 

-Jon

CLR

Jun 1, 2012, 4:09:05 AM
to pdx...@googlegroups.com
45 to 60 minutes sounds great.  I will be covering the following topics in my talk:

* What is this thingy, in plain English?
* How is it different from relational and other NoSQL databases?
* Why do people choose it? Why do people NOT choose it?
* How is it used from Ruby, Rails, etc? This is a Ruby group, after all. :)
* How do you scale your storage?
* How does it deal with failure? At what point will it lose data?
* How stable is it? For reals?

:-)

Thanks!
-Casey


Igal Koshevoy

Jun 1, 2012, 3:50:46 PM
to pdx...@googlegroups.com
On Thu, May 31, 2012 at 1:01 PM, Jon Guymon <jon.g...@gmail.com> wrote:
Yes, this would be great.

> Available time?
How's 30 minutes sound?

Thanks for offering to do this talk.

Jesse Cooke wrote:
> Yes, I would be interested in hearing about this. Since it's Ruby related, let's schedule it for this month & we can have Dann present next month.

:)

-igal

Igal Koshevoy

Jun 1, 2012, 4:08:57 PM
to pdx...@googlegroups.com
On Fri, Jun 1, 2012 at 1:09 AM, CLR <clr.m...@gmail.com> wrote:
> 45 to 60 minutes sounds great.  I will be covering the following topics in
> my talk:
>
> * What is this thingy, in plain English?
> * How is it different from relational and other NoSQL databases?
> * Why do people choose it? Why do people NOT choose it?
> * How is it used from Ruby, Rails, etc? This is a Ruby group, after all. :)
> * How do you scale your storage?
> * How does it deal with failure? At what point will it lose data?
> * How stable is it? For reals?

Awesome, I look forward to it.

Also, I didn't mean to railroad you into these topics, nor imply that
you wouldn't cover them on your own. I've seen some product-oriented
talks recently that didn't cover these high-level topics as part of
the talk and things got awkward. I only suggested these because I
haven't seen you present before and wanted to help you prepare. I'm
sure you'll do great. Thanks for volunteering.

-igal

Jon Guymon

Jun 1, 2012, 4:19:30 PM
to pdx...@googlegroups.com
On Jun 1, 2012, at 12:50 PM, Igal Koshevoy wrote:
> How's 30 minutes sound?
>
> Thanks for offering to do this talk.

Thirty minutes would be great, I'll work it up over the weekend. See you guys Tuesday!

Igal Koshevoy

Jun 1, 2012, 4:20:53 PM
to pdx...@googlegroups.com, Jesse Cooke
Sounds great.

I read through the links Jesse posted (e.g.
<https://github.com/mperham/sidekiq/issues/128>), found Mike's comments
and Zed's essay interesting, and did some more reading about this. So,
yeah, I think this is worth talking about at pdxruby.

As Jesse suggested, let's schedule you for 30-45 minutes next month.

Thanks!

-igal

Igal Koshevoy

Jun 1, 2012, 4:21:18 PM
to pdx...@googlegroups.com
On Fri, Jun 1, 2012 at 1:19 PM, Jon Guymon <jon.g...@gmail.com> wrote:
> On Jun 1, 2012, at 12:50 PM, Igal Koshevoy wrote:
>> How's 30 minutes sound?
>
> Thirty minutes would be great, I'll work it up over the weekend.  See you guys Tuesday!

Excellent, thank you!

-igal

markus

Jun 6, 2012, 3:35:05 PM
to pdx...@googlegroups.com
Casey asked me to write up and send to the list the math behind a minor
quibble I had with one of the statements in his presentation.
Specifically, the claim was that:

If your network becomes partitioned into two equal-sized sets,
it is extremely/astronomically unlikely that all copies of any
of the keys will fall in one of the sets, leaving the other set
without a copy.

This seems plausible for small configurations; in fact, it is easy to
see that for S=4 nodes and n=3 copies, there is no way to split the
network into two equal sized sets {S1=2,S2=2} where either set has all
three copies of the key's data ('cause 3 > 2).

But in the limit of large S, we should in fact expect a significant
fraction (1/2^n) of the keys to fall entirely within one of the sets.
To see this, first consider an arbitrary key k and an arbitrary
partitioning of S -> {S1,S2}; for each of the n nodes which hold a copy
of k's data (in arbitrary but consistent order, say by IP address) we
write a "1" if that node is in S1 and a "0" if it is in S2. This will
give us an n bit binary number, of which there are 2^n. And clearly
1/2^n of these will be all 0.

For n = 3 this means that we should expect 1/8th of the keys, or 12.5%,
to be on the "other side" of a 50/50 partition in the limit case of
large S.
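
The bit-pattern argument above can be checked by brute force. This is a minimal Python sketch, not anything from the presentation: the function name `all_on_far_side_fraction` is made up for illustration, and it assumes each copy lands on either side of the partition independently, which only holds in the large-S limit.

```python
from fractions import Fraction
from itertools import product


def all_on_far_side_fraction(n: int) -> Fraction:
    """Fraction of n-bit patterns that are all 0 (every copy sits in S2)."""
    # One bit per replica node: 1 = node is in S1, 0 = node is in S2.
    patterns = list(product((0, 1), repeat=n))
    all_zero = [p for p in patterns if not any(p)]
    return Fraction(len(all_zero), len(patterns))


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        frac = all_on_far_side_fraction(n)
        print(f"n={n}: {frac} = {float(frac):.4f}")
```

For n = 3 the enumeration has 8 patterns and exactly one of them is all zeros, which is the 1/8 figure quoted above.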

The objection was raised that this is unrealistic (S isn't infinite) and
that in practical cases the situation is much more favourable.

Working through the math in detail, though, we can see that this is not
the case.

For the number of different ways to choose k elements from a set of n
without replacement, we define [from n choose k] = n!/(k!(n-k)!) as
usual, there are

T = [from S choose S/2]

different ways to partition a network of S nodes into two equal sized
{S/2,S/2} sets. For any key with n copies the bad cases will be those
in which all n are in the inaccessible side, of which there are:

B = [from S-n choose S/2]

and thus the fraction of all possible worlds in which we should expect our
arbitrarily selected key to be inaccessible is:

B/T = [from S-n choose S/2]/[from S choose S/2]
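
For anyone who wants to plug in their own numbers, the B/T ratio can be written directly with binomial coefficients. A minimal Python sketch follows; `inaccessible_fraction` is a hypothetical helper name, and it assumes the same model as the formula (an even S, and every equal split equally likely).

```python
from fractions import Fraction
from math import comb


def inaccessible_fraction(S: int, n: int) -> Fraction:
    """B/T: share of equal splits of S nodes that put all n copies on the far side."""
    if S % 2 != 0:
        raise ValueError("S must be even for a 50/50 split")
    if not 0 <= n <= S:
        raise ValueError("n must be between 0 and S")
    total = comb(S, S // 2)    # T: ways to choose the reachable half
    bad = comb(S - n, S // 2)  # B: reachable halves that avoid all n replica nodes
    return Fraction(bad, total)


if __name__ == "__main__":
    print(inaccessible_fraction(4, 3))   # the small case: no bad split exists
    print(inaccessible_fraction(50, 3))  # the case discussed at the meeting
```

`math.comb` returns 0 when asked to choose more items than are available, so the S=4, n=3 case comes out as 0 without special handling.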

Working through the specific case that was mentioned, S=50, n=3 we find:

B/T = [from 47 choose 25]/[from 50 choose 25]
= ((47!)/(25!22!)) / (50!/(25!25!))
= (25!/22!) / (50!/47!)
= (25 * 24 * 23) / (50 * 49 * 48)
= (25/50) * (24/49) * (23/48)
= 1/2 * 1/(2 1/24) * 1/(2 2/23)
~ 1/8.5

So for any arbitrary key, about 12% of the time we should expect it to
fall on the far side of a 50/50 partition of a 50-node network. 12% is
not "astronomically unlikely," which was the core of my point.
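
The roughly-12% figure can also be checked empirically. The Python sketch below is a Monte Carlo simulation under the same assumptions as the derivation: the n replica nodes are a uniformly random subset of the S nodes, and the partition is a uniformly random equal split. That is the model in this message, not a description of how Riak actually places replicas, and `simulate_far_side_rate` is a name invented here.

```python
import random


def simulate_far_side_rate(S: int, n: int, trials: int, seed: int = 0) -> float:
    """Estimate how often all n copies of a key land on the unreachable side."""
    rng = random.Random(seed)
    nodes = list(range(S))
    misses = 0
    for _ in range(trials):
        replicas = rng.sample(nodes, n)             # n distinct nodes holding the key
        reachable = set(rng.sample(nodes, S // 2))  # our side of the 50/50 partition
        if not any(r in reachable for r in replicas):
            misses += 1
    return misses / trials


if __name__ == "__main__":
    rate = simulate_far_side_rate(S=50, n=3, trials=200_000, seed=1)
    print(f"simulated: {rate:.4f}   exact: {25 * 24 * 23 / (50 * 49 * 48):.4f}")
```

With a few hundred thousand trials the estimate should settle close to the exact value of (25 * 24 * 23) / (50 * 49 * 48).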

For smaller networks, the situation is better, but not astronomically
so. With S = 10 & n = 3, we have, by the same formula:

B/T = (5*4*3)/(10*9*8) = (5*4*3)/(2*5 * 3*3 * 2*4) = 1/12

So about 8.3% of the keys will be AWOL.
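
To see how the fraction moves between the small-network case and the 1/2^n limit, the cancelled product form used in the two worked examples can be evaluated for a range of S. This is a Python sketch with a hypothetical helper name, `far_side_fraction`, using exact fractions so nothing is lost to rounding.

```python
from fractions import Fraction


def far_side_fraction(S: int, n: int) -> Fraction:
    """Product form of B/T: (S/2)(S/2 - 1)...(S/2 - n + 1) / (S(S - 1)...(S - n + 1))."""
    half = S // 2
    result = Fraction(1)
    for i in range(n):
        # Chance that the (i+1)-th copy is also on the far side, given the earlier ones are.
        result *= Fraction(half - i, S - i)
    return result


if __name__ == "__main__":
    for S in (4, 10, 50, 1000, 100_000):
        frac = far_side_fraction(S, 3)
        print(f"S={S:>6}: {float(frac):.4f}")
    print(f"limit  : {1 / 2 ** 3:.4f}")
```

Each factor is slightly below 1/2 and creeps toward it as S grows, so the fraction rises with S but stays under 1/2^n.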

If this is unacceptable, the solution is of course to increase n as the
network grows (within some reasonable bounds, such as S/2+2 > n >
ln2(risk indifference threshold), or sqrt(S) or...?) but this means that
the constant replication time scaling claim would have to be weakened to
an O(ln(n)) or O(n^(1/2)) bound.
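
As a rough illustration of treating n as a tuning parameter, the same product form can be searched for the smallest n that keeps the per-key risk under a chosen threshold. This is only a Python sketch under the per-key model used in this message; `smallest_n_for_risk` and the 1% threshold are made-up examples, not Riak settings.

```python
from fractions import Fraction
from math import ceil, log2


def smallest_n_for_risk(S: int, threshold: float) -> int:
    """Smallest n whose far-side fraction B/T is at or below the threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    half = S // 2
    frac = Fraction(1)
    for n in range(1, S + 1):
        frac *= Fraction(half - (n - 1), S - (n - 1))
        if frac <= threshold:
            return n
    raise ValueError("no n satisfies the threshold")  # not reached for even S >= 2


if __name__ == "__main__":
    threshold = 0.01  # hypothetical risk tolerance: 1% of keys unreachable
    for S in (10, 50, 1000):
        print(f"S={S:>4}: n >= {smallest_n_for_risk(S, threshold)}")
    print(f"large-S estimate: n >= {ceil(log2(1 / threshold))}")
```

Under this model the required n for a fixed threshold climbs with S toward log2(1/threshold), which corresponds to the lower bound on n mentioned above.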

This is a legitimate tuning parameter, and one that would be easy to map
to business choices (cost vs. risk tolerance). The situation is still
very good, just not "too good to be true."

-- Markus







CLR

Jul 22, 2012, 6:53:55 PM
to pdx...@googlegroups.com
This is a little late, given that my presentation was a few months ago, however: I want to thank everyone for coming out to the Introduction to Riak presentation that I gave.  Your questions were all excellent, and I had a wonderful time discussing Big Data with the group.

I want to thank Markus in particular for following up with the explanation of this condition that we were poking at near the end of the meeting.  It took me a re-read to follow the math, but of course I do indeed agree with Markus' conclusion.  In the case of an even network partition, we would have a small but significant portion of the data unreadable given the default, random Node names.

There are a few ways to mitigate this:
*) Increase the value of N.  This wouldn't necessarily decrease cluster performance, because you could allow W and R values to remain the same.  As long as internal network pressure isn't saturated, then cluster response time would remain the same.  In fact, cluster performance would be expected to improve, since for any given operation, with a higher N value, some closer node would respond to the R or W than would with a lower N on a cluster with lots of nodes.
*) Manually construct the ring file so that the node pref list reflects nodes that are physically distributed around the ring.
*) Wait for rack-awareness in an upcoming version of Riak. ;-)

There is a competing database to RiakCS that uses a ring where the replication offset is (360 / N) degrees around the ring.  So with N=3, if a data object hashes to a position on the ring in the first third of the state space, then the second copy will be propagated to the second third, and the third copy will be stored on the last third.  If the ring corresponds to a physical layout of nodes, then you can ensure that in a 50/50 network partition split there will always be at least one copy of the data available on each side, barring other simultaneous failures.  This has significant implications in the administration of new nodes, etc. -- but it does 'solve' this one situation, which I thought was interesting.
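
The offset idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the other database's real placement algorithm isn't described in this thread, the offsets are simply rounded to whole node positions, and a "50/50 partition" is modelled as cutting the ring into two contiguous arcs (which is what a physical layout matching the ring would give you). The function names are made up.

```python
from typing import List


def offset_replicas(position: int, S: int, N: int) -> List[int]:
    """Node indices for N copies spaced roughly S/N apart on a ring of S nodes."""
    return [(position + round(i * S / N)) % S for i in range(N)]


def survives_contiguous_split(S: int, N: int) -> bool:
    """True if every key keeps a copy on both sides of every contiguous 50/50 cut."""
    half = S // 2
    for start in range(S):  # each way to cut the ring into two arcs
        side_a = {(start + j) % S for j in range(half)}
        for position in range(S):  # each node a key could hash to
            copies = offset_replicas(position, S, N)
            on_a = sum(1 for c in copies if c in side_a)
            if on_a == 0 or on_a == N:  # all copies ended up on one side
                return False
    return True


if __name__ == "__main__":
    print(offset_replicas(0, 50, 3))
    print(survives_contiguous_split(50, 3))
```

The check only covers contiguous cuts. If the partition splits nodes arbitrarily, as in Markus's model, spacing the copies around the ring doesn't help by itself.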

Thanks again for the vigorous discussion.  I enjoyed it very much, and I encourage others to present if for no other reason than to endure such a lively exploration.

Thanks!
-Casey