First migration from typical relational (MySQL) to NoSQL


Edgar Veiga

Mar 30, 2012, 8:01:23 PM
to NOSQL
Hello all!

I'm posting this message to get some ideas/opinions on
a migration from a typical relational solution to a NoSQL one.

First, a little context on the situation:
- Our database has grown fast in the past few months. We are quickly
approaching 1.5TB of information, and we expect it to grow even faster
until the end of 2012 (tripling this value).
- We are seriously considering migrating our persistence layer to a
NoSQL one, since MySQL is becoming the bottleneck of our application
(we replicated it, sharded it, partitioned it, but it still doesn't
respond in an acceptable time).
- We have already started splitting our application into different
services. Still, we want to give NoSQL a chance as the persistence
layer rather than stick "ad aeternum" to the relational paradigm.

Requirements:
- The NoSQL flavour should be a document store: we have the
visitor entity with a lot of attributes, sessions related to a
visitor, and actions performed by those visitors during a session. We
think that a document representing this information would fit
quite well!
- We write a lot of records per second (sessions and actions).
- We read very large sets of data (e.g. a visitor's entire history).
- On top of this, we still need to deliver accurate, on-demand
(real-time) analytics related to visitors and actions.

Hardware:
- We will have 4 dedicated servers (for persistence) to implement
this new architecture, each one with 96GB of RAM, SSD disks and 2
quad-core 2.4GHz CPUs (not quite sure about this, but it will be very similar).
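
As a rough back-of-the-envelope check on the figures above, here is a small sketch. It assumes the data is spread evenly over the 4 servers, ignores replication and index overhead, and takes 1 TB = 1000 GB; none of these assumptions come from the post itself.

```python
# Figures taken from the post.
current_tb = 1.5        # current data size
growth_factor = 3       # expected to triple by the end of 2012
servers = 4
ram_per_server_gb = 96

# Derived values (assumption: even distribution, no replication overhead).
projected_tb = current_tb * growth_factor
per_server_tb = projected_tb / servers
total_ram_gb = servers * ram_per_server_gb
ram_fraction = total_ram_gb / (projected_tb * 1000)  # assumption: 1 TB = 1000 GB

print(f"projected size: {projected_tb} TB")
print(f"per server:     {per_server_tb} TB")
print(f"total RAM:      {total_ram_gb} GB ({ram_fraction:.1%} of the data)")
```

Under these assumptions only a small share of the projected data set fits in memory, so reads of a visitor's entire history would mostly be served from the SSDs.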

Now the question is:
We have been reading a lot about Couchbase and MongoDB:
- The first one gives us an easier deployment, with master-master
replication, auto-sharding and balancing, but without a convenient way
of querying to retrieve smaller, more specific sets of data.
- The second one has master-slave replication and a more difficult
deployment and maintenance process, but a query/fetch system that is a
little more elastic than Couchbase's.

With your experience, can you give us some tips/ideas/opinions on
how to start this implementation?

Best Regards!

Itamar Syn-Hershko

Mar 31, 2012, 7:39:13 PM
to nosql-di...@googlegroups.com
You left the most important details out - what kind of queries are you going to be performing?

Having the data modeled correctly is 80% of the work really. Once you've done that, everything should more or less be a breeze. Modeling for a document-based DB is completely different from modeling for RDBMSes. You need to think in terms of transactional boundaries and remember you can store full object graphs (even non-flat ones). You should also consider and make use of the other features provided by the DB you choose.

In that respect - check out RavenDB - easy deploy, including master-master replication capabilities with the ability to automatically resolve write conflicts between nodes. I think you'll find the indexes and querying mechanism of Raven quite fluent and elastic.

Itamar.


--
You received this message because you are subscribed to the Google Groups "NOSQL" group.
To post to this group, send email to nosql-di...@googlegroups.com.
To unsubscribe from this group, send email to nosql-discussi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nosql-discussion?hl=en.



Edgar Veiga

Apr 2, 2012, 3:33:28 PM
to NOSQL
Hi Itamar,

First of all, thanks for your response!

So, just to complete the last post, let me give you a little bit more
context on our document model:
We have the visitor entity:
- name, telephone, email, country, etc.

Sessions (each visitor has a lot of sessions, with information like
the OS, browser, creation_timestamp, duration, ip, etc.);
Events (each session has a lot of events, like clicks, mouse moves, etc.)
Actions (during a session, a visitor may or may not execute some
actions)

The queries should be like: give me all the ACTIVE visitors from
countries A, B, C using Chrome as browser but not IE. Or give me all the
visitors with a session that lasted at least X minutes.
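
To make the two example queries concrete, here is a minimal sketch that runs them over plain in-memory Python dicts rather than any particular database. The field names (`active`, `duration_s`, and so on), the sample data, and the reading of "using Chrome but not IE" as "has a Chrome session and no IE session" are all assumptions for illustration.

```python
# Hypothetical visitor documents with embedded sessions (illustrative fields only).
visitors = [
    {
        "name": "Ana", "country": "PT", "active": True,
        "sessions": [
            {"browser": "Chrome", "duration_s": 900},
            {"browser": "Firefox", "duration_s": 120},
        ],
    },
    {
        "name": "Bob", "country": "PT", "active": True,
        "sessions": [
            {"browser": "Chrome", "duration_s": 60},
            {"browser": "IE", "duration_s": 30},
        ],
    },
    {
        "name": "Carla", "country": "US", "active": False,
        "sessions": [{"browser": "Chrome", "duration_s": 2000}],
    },
]


def browsers_used(visitor):
    """Return the set of browsers seen across all of a visitor's sessions."""
    return {s["browser"] for s in visitor["sessions"]}


def active_chrome_not_ie(docs, countries):
    """Active visitors from the given countries who used Chrome but never IE."""
    return [
        v for v in docs
        if v["active"]
        and v["country"] in countries
        and "Chrome" in browsers_used(v)
        and "IE" not in browsers_used(v)
    ]


def with_session_at_least(docs, minutes):
    """Visitors with at least one session lasting `minutes` or longer."""
    threshold = minutes * 60
    return [
        v for v in docs
        if any(s["duration_s"] >= threshold for s in v["sessions"])
    ]


first = [v["name"] for v in active_chrome_not_ie(visitors, {"PT", "ES", "FR"})]
second = [v["name"] for v in with_session_at_least(visitors, 10)]
print(first)   # ['Ana']
print(second)  # ['Ana', 'Carla']
```

Both filters scan every document, which is fine for a sketch but is exactly the work a real store would need indexes for at this data size.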

This information should fit very well in a document! We have started
to model this, but right now we're kind of stuck on which document store
system to use, as explained at the end of the last post.

Best Regards,

Itamar Syn-Hershko

Apr 4, 2012, 5:22:35 AM
to nosql-di...@googlegroups.com
So you are basically asking questions about the session - you can probably go ahead and have 2 types of documents: Visitor and Session, where the session document will contain all Event and Action objects. With smart use of indexes you can have all queries answered.
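
A minimal sketch of that two-document layout, with a hand-rolled in-memory index standing in for a real database index. All class and field names are hypothetical, and this is plain Python, not the API of RavenDB or any other store.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Visitor:
    visitor_id: str
    name: str
    country: str


@dataclass
class Session:
    session_id: str
    visitor_id: str                               # reference to the Visitor document
    browser: str
    duration_s: int
    events: list = field(default_factory=list)    # embedded Event objects
    actions: list = field(default_factory=list)   # embedded Action objects (may be empty)


def build_browser_index(sessions):
    """Map each browser to the ids of visitors who used it in any session."""
    index = defaultdict(set)
    for s in sessions:
        index[s.browser].add(s.visitor_id)
    return index


visitors = {
    "v1": Visitor("v1", "Ana", "PT"),
    "v2": Visitor("v2", "Bob", "ES"),
    "v3": Visitor("v3", "Carla", "US"),
}
sessions = [
    Session("s1", "v1", "Chrome", 900, events=[{"type": "click"}]),
    Session("s2", "v2", "Chrome", 60),
    Session("s3", "v2", "IE", 30),
    Session("s4", "v3", "Chrome", 2000, actions=[{"type": "signup"}]),
]

index = build_browser_index(sessions)

# "Chrome but not IE" becomes a set difference on the index,
# followed by a country check against the Visitor documents.
chrome_not_ie = index["Chrome"] - index["IE"]
result = sorted(
    vid for vid in chrome_not_ie
    if visitors[vid].country in {"PT", "ES"}
)
print(result)  # ['v1']
```

The point of the split is that each session, with its events and actions, is written as one self-contained document, while the index answers the visitor-level question without scanning every session.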

Did you have a look at RavenDB?

Emin Gun Sirer

Apr 4, 2012, 9:10:13 AM
to nosql-di...@googlegroups.com
Hi Edgar,

This is a very natural and straightforward fit for HyperDex
(http://www.hyperdex.org). Three tables, no indices, very fast and
consistent lookups coupled with very expressive queries. You get to
specify the schema in a way very similar to how you'd do it in SQL.
Let us know on the hyperdex-discuss mailing list if you need further
help.

- egs

Edgar Veiga

Apr 4, 2012, 11:33:14 AM
to nosql-di...@googlegroups.com
I already knew RavenDB because I tend to follow Ayende's blog!

Is RavenDB stable enough to run on a Linux box with a Python client?

Best Regards

Edgar Veiga

Apr 4, 2012, 11:39:03 AM
to nosql-di...@googlegroups.com
Thanks Emin,

I'm gonna take a look at hyperdex!

Best Regards,

Itamar Syn-Hershko

Apr 4, 2012, 11:39:31 AM
to nosql-di...@googlegroups.com
Yes. You can run the server on a Linux box (using Mono), but the recommended storage engine for production is Esent, which is not available there. There are plans to support a production-ready storage engine on Linux as well, but that might take a while.

Try looking at hosted solutions like CloudBird and RavenHQ; your Python client could talk to them just the same.

--
You received this message because you are subscribed to the Google Groups "NOSQL" group.
To view this discussion on the web visit https://groups.google.com/d/msg/nosql-discussion/-/RektESEze3gJ.