I'm posting this message to get some ideas/opinions on
a migration from a typical relational solution to a NoSQL one.
First, a little context on the situation:
- Our database has grown fast in the past few months. At the moment
we are quickly approaching 1.5TB of data, and we expect it to grow
even faster until the end of 2012 (to triple this value).
- We are seriously considering migrating our persistence layer to a
NoSQL system, since MySQL is becoming the bottleneck of our
application (we replicated it, sharded it and partitioned it, but it
still doesn't respond in an acceptable time).
- We have already started splitting our application into different
services. Still, we want to give NoSQL a chance as the persistence
layer rather than stick "ad aeternum" to the relational paradigm.
Requirements:
- The NoSQL flavour should be a document store: we have the visitor
entity with a lot of attributes, sessions related to a visitor, and
actions performed by those visitors during a session. We think that a
document representing this information would fit quite well!
- We write a lot of records per second (sessions and actions).
- We read very large sets of data (e.g. a visitor's entire history).
- On top of this, we still need to deliver accurate, on-demand
(real-time) analytics related to visitors and actions.
Hardware:
- We will have 4 dedicated servers (for persistence) to implement
this new architecture, each one with 96GB of RAM, SSD disks and 2
quad-core 2.4GHz CPUs (not quite sure about this, but it will be very
similar).
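As a rough sizing check on the figures above, here is a small back-of-envelope sketch. The replication factor of 2 and the 1TB = 1000GB conversion are assumptions for illustration and are not stated in the post.

```python
# Back-of-envelope sizing from the numbers in the post.
current_tb = 1.5          # data volume today
growth_factor = 3         # expected to triple by the end of 2012
servers = 4               # dedicated persistence servers
ram_gb_per_server = 96

# ASSUMPTION: every document is stored twice (not stated in the post).
replicas = 2

projected_tb = current_tb * growth_factor
stored_tb = projected_tb * replicas
disk_tb_per_server = stored_tb / servers
total_ram_gb = servers * ram_gb_per_server

# Share of one full copy of the data that fits in RAM (1 TB taken as 1000 GB).
ram_fraction = total_ram_gb / (projected_tb * 1000)

print(f"projected data:    {projected_tb} TB")
print(f"disk per server:   {disk_tb_per_server} TB")
print(f"total RAM:         {total_ram_gb} GB")
print(f"fits in RAM:       {ram_fraction:.1%}")
```

Under these assumptions most of the data cannot be memory-resident, so how each candidate store behaves once the working set exceeds RAM is worth testing early.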
Now the question is:
We have been reading a lot about Couchbase and MongoDB:
- The first one gives us an easier deployment, with master-to-master
replication, auto-sharding and balancing, but without a convenient way
of making queries that retrieve smaller, more specific sets of data.
- The second one has master-slave replication and a more difficult
deployment and maintenance process, but a query/fetch system a little
more flexible than Couchbase's.
Based on your experience, can you give us some topics/ideas/opinions
on how to start this implementation?
You left the most important details out - what kind of queries are you going to be performing?
Having the data modeled correctly is 80% of the work really. Once you've done that, everything should more or less be a breeze. Modeling for a document-based DB is completely different than for RDBMSes. You need to think in terms of transactional boundaries and remember you can store full object graphs (even non-flat ones). You should also consider and make use of other features that are provided by the DB you choose.
In that respect - check out RavenDB - easy deploy, including master-master replication capabilities with the ability to automatically resolve write conflicts between nodes. I think you'll find the indexes and querying mechanism of Raven quite fluent and elastic.
On Sat, Mar 31, 2012 at 3:01 AM, Edgar Veiga <edgarmve...@gmail.com> wrote:
> Hello all!
> Best Regards!
> --
> You received this message because you are subscribed to the Google Groups
> "NOSQL" group.
> To post to this group, send email to nosql-discussion@googlegroups.com.
> To unsubscribe from this group, send email to
> nosql-discussion+unsubscribe@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/nosql-discussion?hl=en.
So, just to complete the last post, let me give you a little bit more
context on our document model:
We have the visitor entity:
- name, telephone, email, country, etc.
Sessions (each visitor has a lot of sessions, with information like
the OS, browser, creation_timestamp, duration, ip, etc.);
Events (each session has a lot of events, like clicks, mouse moves,
etc.);
Actions (during a session, a visitor may or may not execute some
actions).
The queries should be like: give me all the ACTIVE visitors from
countries A, B, C using Chrome as browser but not IE. Or give me all
the visitors with a session that lasted at least X minutes.
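To make the two example queries concrete, here is a minimal sketch over in-memory documents. The field names, the sample data and the reading of "ACTIVE" as a boolean flag are all assumptions for illustration; "Chrome but not IE" is read as "has at least one Chrome session and no IE session".

```python
# Hypothetical visitor documents with embedded sessions (illustrative fields only).
visitors = [
    {"name": "Ana", "country": "PT", "active": True,
     "sessions": [{"browser": "Chrome", "duration_min": 12}]},
    {"name": "Bob", "country": "US", "active": True,
     "sessions": [{"browser": "Chrome", "duration_min": 3},
                  {"browser": "IE", "duration_min": 40}]},
    {"name": "Eve", "country": "BR", "active": False,
     "sessions": [{"browser": "Chrome", "duration_min": 25}]},
]


def browsers(visitor):
    """Set of browsers seen across all of a visitor's sessions."""
    return {s["browser"] for s in visitor["sessions"]}


def active_chrome_not_ie(docs, countries):
    """Active visitors from `countries` with a Chrome session and no IE session."""
    return [v["name"] for v in docs
            if v["active"]
            and v["country"] in countries
            and "Chrome" in browsers(v)
            and "IE" not in browsers(v)]


def with_long_session(docs, min_minutes):
    """Visitors with at least one session lasting `min_minutes` or more."""
    return [v["name"] for v in docs
            if any(s["duration_min"] >= min_minutes for s in v["sessions"])]


print(active_chrome_not_ie(visitors, {"PT", "US", "BR"}))  # ['Ana']
print(with_long_session(visitors, 20))                     # ['Bob', 'Eve']
```

Both queries filter visitors by properties of their sessions, so the chosen store needs to be able to index fields nested inside (or linked from) the visitor document.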
This information should fit very well in a document! We have started
to model this, but right now we're kind of stuck on which document
store to use, as described at the end of the last post.
Best Regards,
On Apr 1, 12:39 am, Itamar Syn-Hershko <ita...@code972.com> wrote:
> You left the most important details out - what kind of queries are you
> going to be performing?
So you are basically asking questions about the session - you can probably go ahead and have 2 types of documents: Visitor and Session, where the session document will contain all Event and Action objects. With smart use of indexes you can have all queries answered.
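A minimal sketch of that two-document layout, with an index built over the session documents, might look like the following. This is plain Python for illustration and not the API of RavenDB or any other store; the field names and sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical documents: one Visitor type and one Session type. Each
# session embeds its own events and actions and points back to its visitor.
visitors = {
    "v1": {"name": "Ana", "country": "PT"},
    "v2": {"name": "Bob", "country": "US"},
}
sessions = [
    {"visitor_id": "v1", "browser": "Chrome", "duration_min": 12,
     "events": [{"type": "click"}], "actions": []},
    {"visitor_id": "v2", "browser": "IE", "duration_min": 40,
     "events": [{"type": "mouse_move"}], "actions": [{"type": "signup"}]},
    {"visitor_id": "v2", "browser": "Chrome", "duration_min": 3,
     "events": [], "actions": []},
]


def build_index(docs, field):
    """Map each value of `field` to the set of visitor ids that have it."""
    index = defaultdict(set)
    for doc in docs:
        index[doc[field]].add(doc["visitor_id"])
    return index


by_browser = build_index(sessions, "browser")

# "Chrome but not IE" becomes a set difference on the index, so the
# session documents are not scanned again at query time.
chrome_only = by_browser["Chrome"] - by_browser["IE"]
print(sorted(visitors[v]["name"] for v in chrome_only))  # ['Ana']
```

Keeping events and actions inside the session document means a session is written and read as one unit, while the index answers the cross-visitor questions.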
On Mon, Apr 2, 2012 at 10:33 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
> Hi Itamar,
> First of all, thanks for your response!
This is a very natural and straightforward fit for HyperDex (http://www.hyperdex.org). Three tables, no indices, very fast and consistent lookups coupled with very expressive queries. You get to specify the schema in a way very similar to how you'd do it in SQL. Let us know on the hyperdex-discuss mailing list if you need further help.
On Wednesday, April 4, 2012 10:22:35 AM UTC+1, Itamar Syn-Hershko wrote:
> So you are basically asking questions about the session - you can probably
> go ahead and have 2 types of documents: Visitor and Session, where the
> session document will contain all Event and Action objects. With smart use
> of indexes you can have all queries answered.
On Wednesday, April 4, 2012 2:10:13 PM UTC+1, Emin Gun Sirer wrote:
> Hi Edgar,
> This is a very natural and straightforward fit for HyperDex
> (http://www.hyperdex.org). Three tables, no indices, very fast and
> consistent lookups coupled with very expressive queries. You get to
> specify the schema in a way very similar to how you'd do it in SQL.
> Let us know on the hyperdex-discuss mailing list if you need further
> help.
Yes. You can run the server on a Linux box (using Mono), but the recommended storage engine for production is Esent, which is not available there. There are plans to support a production-ready storage engine on Linux as well, but that might take a while.
Try looking at hosted solutions like CloudBird and RavenHQ; your Python client could talk to them just the same.
On Wed, Apr 4, 2012 at 6:33 PM, Edgar Veiga <edgarmve...@gmail.com> wrote:
> I already knew RavenDB because I tend to follow Ayende's blog!
> Is RavenDB stable enough to run on a Linux box with a Python client?
> Best Regards