Next session -- Hands on with Riak?

cclark

Oct 14, 2010, 12:34:29 AM

to NoSQL Summer Vancouver

My apologies for the late follow-up to our last session. I was out of
town but now that I'm back I wanted to get the planning rolling.

Turnout at the last session, where we discussed Riak, was lower than
usual, but Riak itself seemed quite interesting and is something
people want to get some real experience with. Discussion quickly
turned to the idea posed at the previous session about having a
hands-on hackathon with some of the NoSQL offerings.

=== NoSQL Hackathon ===

Here's what I took from the discussion:

Data: There are lots of datasets (energy, UN, Netflix, Honeypot logs
etc.) but we want one with a time series component. Everyone agreed
the Wikipedia page edit and access logs would be a good candidate.

Problem definition: We probably need to refine this, but aggregating
across time boundaries (hour, day, week, month, year) was the one
request made time and again. There was also interest in performing
some calculations like average, median and standard deviation. General
search was brought up at one point.
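
To make the aggregation request concrete, here is a minimal sketch in plain Python of grouping timestamped counts by hour, day, week, month or year and summarising each group. It does not involve any datastore; the function names (`bucket_key`, `aggregate`), the label formats and the sample numbers are all made up for illustration, and the population standard deviation is just one reasonable choice.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, median, pstdev


def bucket_key(ts, granularity):
    """Return a label for the hour, day, ISO week, month or year containing ts."""
    if granularity == "hour":
        return ts.strftime("%Y-%m-%d %H:00")
    if granularity == "day":
        return ts.strftime("%Y-%m-%d")
    if granularity == "week":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if granularity == "month":
        return ts.strftime("%Y-%m")
    if granularity == "year":
        return ts.strftime("%Y")
    raise ValueError(f"unknown granularity: {granularity}")


def aggregate(samples, granularity):
    """Group (timestamp, count) samples into buckets and summarise each one."""
    buckets = defaultdict(list)
    for ts, count in samples:
        buckets[bucket_key(ts, granularity)].append(count)
    return {
        key: {
            "total": sum(values),
            "mean": mean(values),
            "median": median(values),
            "stddev": pstdev(values),  # population standard deviation
        }
        for key, values in buckets.items()
    }


# Made-up hourly view counts for a single article.
samples = [
    (datetime(2010, 10, 11, 9), 10),
    (datetime(2010, 10, 11, 10), 30),
    (datetime(2010, 10, 12, 9), 20),
]
daily = aggregate(samples, "day")
```

Each datastore would express the grouping step differently (a map/reduce job, a view, a SQL GROUP BY), which is roughly what the sessions would end up comparing.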

Implementation: There has already been a suggestion that Team Erlang
be one team. An alternative suggestion was that, instead of comparing
NoSQL datastores AND different languages (and their features,
standard libraries etc.), everyone implement the same problem in
JavaScript and we just change the datastore from session to session.
Since almost all of the products use JSON, JavaScript seems like a
natural fit. Speaking of teams, we felt teams of 2-3 should each work
on an implementation and then take turns presenting their findings
afterward.

Format: All day or night hackathon vs a regular recurring event.
Immersion is great but the reality is we don't all necessarily have
the time. We'll aim to continue having sessions where we focus on
solving one/the same problem but each time use a different NoSQL
technology. I was going to suggest a 90 minute working window and
then 30 minutes to share findings -- my gut tells me it might be 120
and 45 though.

Tool Chain: We all agreed having a ready-to-go tool chain is a must
for a productive session. We'll have either an EC2 image or a dataset
ready to be imported into the chosen datastore before the session,
and everyone should come with their laptop ready to roll so we don't
waste time installing and configuring but instead can get to the
meat of solving the problem.

Candidate datastores: Riak, Couch, MongoDB, Cassandra, Redis,
Postgres, Hadoop. Postgres was included because we feel it will serve
as a good reference for NoSQL vs SQL.

=== Next Steps ===

1. Discussion: Anything I missed or suggestions for refinement?
2. Pick a data store: I'm going to suggest Riak, since it was the last
one we reviewed, and that everyone work through the intro at
https://wiki.basho.com/display/RIAK/The+Riak+Fast+Track
3. Tool chain prep: If we use the Wikipedia edits I think we have to
work in EC2 because the data is over 1TB but it can be easily mounted
as an EBS volume. Given that Riak is written in Erlang I'm guessing
it makes sense to create an AMI and share it amongst everyone. Any
volunteers to complete this critical step?
4. Refine the problem: We need to lay out exactly what we want to
achieve in the exercise. As a strawman suggestion:
- Pick 5 articles to get up and running and then expand the
algorithms out from there (Apple_Inc., Vancouver, Barack_Obama,
Beatles and Beer)
- Count the page edits and page views over the history of the
dataset and aggregate to report over days, weeks and months
- Identify the days of most edits and page views
5. Do it: I'm a little late in sending out the invite and the prep for
this session is more than just reading a paper. I'm going to suggest
we don't do it on Oct 18th which would've been 2 weeks but on Oct 25th
if we can get the image ready and problem defined by the 20th or Nov 1
if it takes longer.
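
As a rough illustration of the strawman in step 4, the sketch below sums hourly page-view records into daily totals for the five articles and picks out the busiest day. It assumes each record is a whitespace-separated line of the form `project title views bytes`, with the day taken from the name of the hourly file; that layout is an assumption about the dataset, not something confirmed in this thread. Page edits are left out, and `daily_views` and `busiest_day` are hypothetical names.

```python
from collections import Counter, defaultdict

ARTICLES = {"Apple_Inc.", "Vancouver", "Barack_Obama", "Beatles", "Beer"}


def daily_views(hourly_files, project="en"):
    """Sum hourly view counts into per-article, per-day totals.

    hourly_files is an iterable of (day, lines) pairs, where day is a
    'YYYY-MM-DD' string and lines holds 'project title views bytes'
    records (an assumed layout).
    """
    totals = defaultdict(Counter)
    for day, lines in hourly_files:
        for line in lines:
            parts = line.split()
            if len(parts) != 4:
                continue  # skip records that do not match the assumed layout
            proj, title, views, _size = parts
            if proj == project and title in ARTICLES:
                totals[title][day] += int(views)
    return totals


def busiest_day(totals, title):
    """Return the (day, views) pair with the most views for one article."""
    return totals[title].most_common(1)[0]


# Made-up records standing in for three hourly files.
hourly_files = [
    ("2010-10-01", ["en Beer 12 34567", "en Vancouver 5 1000", "de Beer 99 1"]),
    ("2010-10-01", ["en Beer 8 2000"]),
    ("2010-10-02", ["en Beer 15 3000", "en Main_Page 500 9"]),
]
totals = daily_views(hourly_files)
peak = busiest_day(totals, "Beer")
```

Weekly and monthly reports could then be built by re-grouping the daily totals instead of re-reading the hourly files.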

Thoughts, questions or concerns? I'm sure there are things I'm
missing out on.

thanks,
chuck

saem

Oct 14, 2010, 3:02:24 PM

to NoSQL Summer Vancouver

Over the weekend, my intention is to get the data (found here
http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset)
into an EBS volume and then start poking around. There are more data
sets there than detailed on the page, so I'm going to pull down a
subset -- not sure if I really feel like paying to pull down the whole
thing, and I'm still not sure how much it'll cost just to access the
data in the first place.

I'll try and do it earlier in the weekend and put up a list of all the
various data sets and their details from the README files, and we can
pick and choose what we want to pull down.


Tavis Rudd

Oct 15, 2010, 1:35:47 AM

to NoSQL Summer Vancouver

Saem, need any help?

+1 for the 25th.

Chuck, I'll look into preparing (or just re-using) an AMI this
weekend.


David Dossot

Oct 15, 2010, 2:11:25 AM

to nosql-summe...@googlegroups.com

25th is no-go for me as I really want to attend VanDev's presentation on Lean/Kanban.

D.

Tavis Rudd

Oct 15, 2010, 2:31:13 PM

to nosql-summe...@googlegroups.com

hmm, I'd also like to attend that -> so, +1 for this Monday. Even if
we're not ready for a dedicated hack session by then we could use the time
to finish the prep work. Once we have the data EBS volume and an AMI,
we should be ready to do view/edit count aggregation across time
slices.

geoff webb

Oct 17, 2010, 3:41:37 PM

to nosql-summe...@googlegroups.com

Hi Folks

I'd be keen on tomorrow, Oct 18th too, even if just to help with prep
and planning/setup.

I am in Australia from Oct 22nd - Nov 7th so will most likely miss the
real-deal hands-on session *sigh*.

Geoff

chuck clark

Oct 17, 2010, 4:01:41 PM

to nosql-summe...@googlegroups.com

I definitely don't think we're ready for the Riak session tomorrow because I'm sure most people haven't yet looked at the Riak Fast Track. And it sounds like we're starting to get a little bit light for next Monday. I'm also not confident yet we'd be ready for a Riak session next Monday. Unfortunately there is another internal event happening at the Pulse Energy offices tomorrow but I like the idea of an informal session to get an AMI with Riak and EBS volumes ready. That way we can be sure we're ready two weeks from now.

The Pulse offices are available on Wednesday this week; otherwise, if Monday works out best for everyone, we could pick a coffee shop with WiFi or another office if someone else can host.

thanks,

chuck

Tavis Rudd

Oct 18, 2010, 12:57:26 AM

to NoSQL Summer Vancouver

Saem, Geoff and I got started on prepping an EBS of a subset of the
data today. I'm up for something informal tomorrow night continuing
the prep work. Anyone else interested?


David Dossot

Oct 18, 2010, 11:36:37 AM

to nosql-summe...@googlegroups.com

I'm turning into the master of lousy excuses but... tonight Agile Vancouver is having a presentation that seems very interesting...

:)

saem

Oct 18, 2010, 6:16:29 PM

to NoSQL Summer Vancouver

I think 6PM at Agro Cafe in Yaletown
(http://maps.google.com/maps?f=q&hl=en&geocode=&time=&date=&ttype=&q=1207+Hamilton+Street+vancouver&sll=37.0625,-95.677068&sspn=45.197878,71.894531&ie=UTF8&z=16&iwloc=addr&om=1)
should be a decent meeting spot, as currently most of the people who
said they're coming will be in the area already. At 6:15PM we might
move if the venue doesn't work out -- if that's the case and you can't
find us, just email me directly via the list (Smart Phone FTW), or
call if you have my number.

On Oct 18, 8:36 am, David Dossot <da...@dossot.net> wrote:
> I'm turning into the master of lousy excuses but... tonight Agile Vancouver

> is having a presentation<http://www.agilevancouver.ca/2010/09/estimating-the-sociological-effe...>that
> seems very interesting...
>
> :)

saem

Oct 19, 2010, 1:10:09 AM

to NoSQL Summer Vancouver

Tavis, Geoff, and I got together tonight, the result:

The AMI is in better shape, we understand the data, and we got Riak up
and running. It's just a matter of some polish, and then we can push a
'canonical' form of the data into one of Riak's buckets -- flat
namespaces that it supports. This should be a good starting point for
people to start reworking it into any other schema they want. It's
pretty close to the point where people can clone the AMI and have a
useful environment. Another possibility I'm considering is pulling
down part of the data set locally and running it that way (yay for a
Linux laptop), and possibly just creating a simple VirtualBox image,
which should run decently on people's laptops, etc.
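
For anyone wondering what pushing a record into a bucket might look like from a client, here is a hedged sketch that builds (but does not send) an HTTP PUT using only the Python standard library. The node address, the `/riak/<bucket>/<key>` URL layout, the bucket name and the key scheme are all assumptions for illustration, not the schema we actually loaded.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumed address of a local node and assumed URL prefix of its HTTP interface.
RIAK_URL = "http://127.0.0.1:8098/riak"


def build_put(bucket, key, value):
    """Build (but do not send) an HTTP PUT storing value as JSON under bucket/key."""
    # Percent-encode the bucket and key so a '/' inside a key stays part of the key.
    url = f"{RIAK_URL}/{quote(bucket, safe='')}/{quote(key, safe='')}"
    body = json.dumps(value).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical bucket and key scheme: one object per project/article/day.
req = build_put("pageviews", "en/Beer/2010-10-01", {"views": 20})
```

Passing `req` to `urllib.request.urlopen` would actually send it, which only makes sense with a node listening at that address.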

Tomorrow is 'Why haskell matters?' put on by the Vancouver Functional
Programming Unmeetup, and the day after is the JS meetup with WebGL. I
think both Tavis and I are indisposed, so it looks like Thursday might
be when we next pick this up.

saem

Oct 28, 2010, 12:44:46 AM

to NoSQL Summer Vancouver

I have a free evening tomorrow (I'll be starting at 5:00), and will
hopefully be getting things better organized for Riak hacking.

I'm going to be at Agro Coffee shop
(http://www.agrocafe.org/locations/yaletown.php), with a possible
venue move (to be determined at 6:15).
Feel free to get in touch with me by emailing me directly (see: 'Reply
to author' link below), I get it on my phone so I can get back to you
quickly, I can also pass along my phone number that way.

Simon Claret

Oct 28, 2010, 2:15:07 AM

to nosql-summe...@googlegroups.com

Hi Saem,

I should have some time tomorrow afternoon, if not at 5pm then
starting at 5:30 or 6. I don't know what you guys have done so far
but I'm happy to help set things up.

Cheers,

Simon

Mobile: 778 230 4513

cclark

Oct 31, 2010, 10:38:53 PM

to NoSQL Summer Vancouver

So what are the thoughts on starting to move forward with a Riak
hack-a-thon?

We can meet at the Pulse Energy offices tomorrow and start playing
around with what Saem, Tavis, Geoff and Simon have cooked up so far.
My gut tells me it might take two sessions to get as far as we want so
we might as well get started.

Unfortunately I have a family commitment a little later in the evening
and will have to dodge out early, but if we do meet up Jerry will be
around and the space can stay open later so the fun can continue.

thoughts?

chuck

saem

Nov 1, 2010, 1:46:36 AM

to NoSQL Summer Vancouver

Apologies for the tardy reply. That sounds like a plan; we're
going to need at least one walk-through. There are a number of basic
data-related problems to tackle that I came up with, but they get
rather same-y, quickly. There is also the entire DevOps and
performance tuning side, so at the very least a pilot run would be a
very good idea.

saem

Nov 7, 2010, 6:40:15 PM

to NoSQL Summer Vancouver

NoSQL Hands-On: Riak Scheduling

So an AWS image and a snapshot, with the data, exist for people to
use; Tavis and I got that sorted a few nights back. I've got a VM in
VirtualBox up and running; I just need to load the dataset into the
cluster, then run through some basics so there are at least a
few people aware of the API.

Basics being:

* Map/Reduce
* Links
* V-clocks
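
For the V-clocks item, a toy sketch of the general vector clock idea may help as a warm-up. This is the textbook technique with clocks modelled as plain dicts of actor to counter; it is not Riak's actual encoding or client API, and the actor names are made up.

```python
def increment(clock, actor):
    """Return a copy of clock with actor's counter advanced by one."""
    updated = dict(clock)
    updated[actor] = updated.get(actor, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a, b):
    """Classify two clocks as 'equal', 'after', 'before' or 'concurrent'."""
    a_sees_b = descends(a, b)
    b_sees_a = descends(b, a)
    if a_sees_b and b_sees_a:
        return "equal"
    if a_sees_b:
        return "after"
    if b_sees_a:
        return "before"
    return "concurrent"  # neither side has seen the other's update


# Two hypothetical clients update the same object starting from a shared version.
base = increment({}, "client_x")
left = increment(base, "client_y")
right = increment(base, "client_z")
```

Here `left` and `right` both descend from `base` but not from each other, which is the situation where a store has to keep both versions or ask the client to resolve the conflict.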

There are a few items that I'm still not entirely clear as to how to
cover -- though, I'm not sure we'll get that far.

* Tune r,w,dw values
* Joining and leaving the cluster
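
On tuning r, w and dw, one way to reason about a choice before touching a cluster is the usual quorum arithmetic. The sketch below is a simplified model that assumes n fixed replicas and treats dw as the number of durable-write acknowledgements required; it ignores anything the datastore does to route around failed nodes, and `check_quorum` is a hypothetical helper, not part of any client library.

```python
def check_quorum(n, r, w, dw):
    """Describe what a choice of r, w and dw implies for n replicas."""
    for name, value in (("r", r), ("w", w), ("dw", dw)):
        if not 1 <= value <= n:
            raise ValueError(f"{name} must be between 1 and n={n}")
    return {
        # If r + w > n, every read set shares at least one replica
        # with every successful write set.
        "reads_overlap_writes": r + w > n,
        # Replicas that can be unavailable while requests still
        # collect enough responses (in this simplified model).
        "read_failures_tolerated": n - r,
        "write_failures_tolerated": n - max(w, dw),
    }


balanced = check_quorum(n=3, r=2, w=2, dw=1)
fast = check_quorum(n=3, r=1, w=1, dw=1)
```

In this model `balanced` trades one tolerated failure for overlapping reads and writes, while `fast` tolerates two failures but a read may miss the latest write.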

I think that's about all the preparation that we need, I'd like to
have others dive in, and see where we get. That being the case, and
seeing as you're hosting, would you like to determine a schedule?

chuck clark

Nov 7, 2010, 11:40:02 PM

to nosql-summe...@googlegroups.com

Awesome. Thanks to Saem and Tavis for all the legwork on getting it set up.

How about if we shoot for getting together at Pulse a week from tomorrow, November 15th at 6:00?

So you'd suggest everyone have VirtualBox ready to run so they can download the image you created and we can run from there?

thanks,

chuck

saem

Nov 8, 2010, 1:15:23 AM

to NoSQL Summer Vancouver

Either VirtualBox, or Amazon.

The only 'issue' with the VB image is that the quickest appliance I
could find that was nice and stripped down was a rather old version of
Ubuntu, it works and everything seems fine, but if people expect
things from repos, it'll be old.

The image Tavis got up and running for AWS is pretty awesome and up to
date, thanks to it being based on Gentoo, IIRC. It's got more love and
polish. It's up to whatever people are more comfortable with, and the
AWS costs are minor, and for new sign-ups, they can use a free account
-- lucky bastards.
