Hi,
Just wanted recommendations from people on a task I'm trying to
achieve. Any help would be greatly appreciated.
I have about 500M log file entries each representing an "ad
impression" (we are an advertising company). Each "hit" has about 50
attributes to it (example: Country, State, City, Adsize, Browser, OS,
etc) .. I want to load all 500M into some form of database and then
run queries against this set. [PS: For types of queries we would
issue, you can checkout: http://chitika.com/research/ ]
Question for you: Which one of the following would you suggest?
1) Load 500M rows in Mysql with 50 columns -- and index each column
[Sounds lame!]
2) Load 500M rows as documents in mongo and index each column that
will be used in a WHERE clause (basically all 50 columns)
3) Load 500M rows into Hadoop -- and then run Map/Reduce on the
whole set.
PS: The 500M is a starting set. The actual number of rows will go
into the billions.
Mongo can probably handle it internally with indexes, but we're also working on an official hadoop plugin, so you can always use mongo as the storage layer, and use hadoop for processing.
On Fri, Aug 27, 2010 at 1:07 AM, Alden DoRosario (Chitika)
<adorosa...@gmail.com> wrote: > Hi, > Just wanted recommendations from people on a task I'm trying to > achieve. Any help would be greatly appreciated.
> I have about 500M log file entries each representing an "ad > impression" (we are an advertising company). Each "hit" has about 50 > attributes to it (example: Country, State, City, Adsize, Browser, OS, > etc) .. I want to load all 500M into some form of database and then > run queries against this set. [PS: For types of queries we would > issue, you can checkout: http://chitika.com/research/ ]
> Question for you: Which one of the following would you suggest? > 1) Load 500M rows in Mysql with 50 columns -- and index each column > [Sounds lame!] > 2) Load 500M rows as documents in mongo and index each column that > will be used in a WHERE clause (basically all 50 columns) > 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the > whole set.
> PS: The 500M is a starting set. The actual number of rows will go > into the billions.
> - Alden.
> -- > You received this message because you are subscribed to the Google Groups "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
Sounds like you also have to deal with the storage layer as a separate
problem if you are going to have billions of records. You probably
want something that can scale simply. Mongo may not be an ideal fit
because of your constantly increasing data. You may want something
where you can just add notes to increase storage very simply without
manually partitioning your data. I believe you have to manually
partition with Mongo.
Hadoop is not something you can store rows in from what I understand,
it consists of a file system and a map reduce engine. HBase is the
database on top of Hadoop.
Ankur
On Aug 27, 1:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com>
wrote:
> Hi,
> Just wanted recommendations from people on a task I'm trying to
> achieve. Any help would be greatly appreciated.
> I have about 500M log file entries each representing an "ad
> impression" (we are an advertising company). Each "hit" has about 50
> attributes to it (example: Country, State, City, Adsize, Browser, OS,
> etc) .. I want to load all 500M into some form of database and then
> run queries against this set. [PS: For types of queries we would
> issue, you can checkout:http://chitika.com/research/]
> Question for you: Which one of the following would you suggest?
> 1) Load 500M rows in Mysql with 50 columns -- and index each column
> [Sounds lame!]
> 2) Load 500M rows as documents in mongo and index each column that
> will be used in a WHERE clause (basically all 50 columns)
> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the
> whole set.
> PS: The 500M is a starting set. The actual number of rows will go
> into the billions.
On Fri, Aug 27, 2010 at 12:13 PM, Ankur <ankursethi...@googlemail.com> wrote: > Sounds like you also have to deal with the storage layer as a separate > problem if you are going to have billions of records. You probably > want something that can scale simply. Mongo may not be an ideal fit > because of your constantly increasing data. You may want something > where you can just add notes to increase storage very simply without > manually partitioning your data. I believe you have to manually > partition with Mongo.
Version 1.6 of MongoDB adds production ready support for auto-sharding (read auto-partitioning). So no, you don't need to manually partition, and adding new shards is easy.
> Hadoop is not something you can store rows in from what I understand, > it consists of a file system and a map reduce engine. HBase is the > database on top of Hadoop.
> Ankur
> On Aug 27, 1:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com> > wrote: >> Hi, >> Just wanted recommendations from people on a task I'm trying to >> achieve. Any help would be greatly appreciated.
>> I have about 500M log file entries each representing an "ad >> impression" (we are an advertising company). Each "hit" has about 50 >> attributes to it (example: Country, State, City, Adsize, Browser, OS, >> etc) .. I want to load all 500M into some form of database and then >> run queries against this set. [PS: For types of queries we would >> issue, you can checkout:http://chitika.com/research/]
>> Question for you: Which one of the following would you suggest? >> 1) Load 500M rows in Mysql with 50 columns -- and index each column >> [Sounds lame!] >> 2) Load 500M rows as documents in mongo and index each column that >> will be used in a WHERE clause (basically all 50 columns) >> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the >> whole set.
>> PS: The 500M is a starting set. The actual number of rows will go >> into the billions.
>> - Alden.
> -- > You received this message because you are subscribed to the Google Groups "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
> Sounds like you also have to deal with the storage layer as a separate > problem if you are going to have billions of records. You probably > want something that can scale simply. Mongo may not be an ideal fit > because of your constantly increasing data. You may want something > where you can just add notes to increase storage very simply without > manually partitioning your data. I believe you have to manually > partition with Mongo.
> Hadoop is not something you can store rows in from what I understand, > it consists of a file system and a map reduce engine. HBase is the > database on top of Hadoop.
> Ankur
> On Aug 27, 1:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com> > wrote: >> Hi, >> Just wanted recommendations from people on a task I'm trying to >> achieve. Any help would be greatly appreciated.
>> I have about 500M log file entries each representing an "ad >> impression" (we are an advertising company). Each "hit" has about 50 >> attributes to it (example: Country, State, City, Adsize, Browser, OS, >> etc) .. I want to load all 500M into some form of database and then >> run queries against this set. [PS: For types of queries we would >> issue, you can checkout:http://chitika.com/research/]
>> Question for you: Which one of the following would you suggest? >> 1) Load 500M rows in Mysql with 50 columns -- and index each column >> [Sounds lame!] >> 2) Load 500M rows as documents in mongo and index each column that >> will be used in a WHERE clause (basically all 50 columns) >> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the >> whole set.
>> PS: The 500M is a starting set. The actual number of rows will go >> into the billions.
>> - Alden.
> -- > You received this message because you are subscribed to the Google Groups "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
As I know, Mongo's JavaScript-based MapReduce is singlethreaded. So
with large dataset it may be a bottleneck here. But you can write your
own MapReduce application using Mongo as database.
Can you provide an example of query?
On Aug 27, 9:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com>
wrote:
> Hi,
> Just wanted recommendations from people on a task I'm trying to
> achieve. Any help would be greatly appreciated.
> I have about 500M log file entries each representing an "ad
> impression" (we are an advertising company). Each "hit" has about 50
> attributes to it (example: Country, State, City, Adsize, Browser, OS,
> etc) .. I want to load all 500M into some form of database and then
> run queries against this set. [PS: For types of queries we would
> issue, you can checkout:http://chitika.com/research/]
> Question for you: Which one of the following would you suggest?
> 1) Load 500M rows in Mysql with 50 columns -- and index each column
> [Sounds lame!]
> 2) Load 500M rows as documents in mongo and index each column that
> will be used in a WHERE clause (basically all 50 columns)
> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the
> whole set.
> PS: The 500M is a starting set. The actual number of rows will go
> into the billions.
I presume by "hadoop plugin" that means you will be releasing a Mongo InputReader, InputSplit, etc? That would be really nice. Is there a targeted version of MongoDB when this Hadoop interface will be ready? Thanks.
On Fri, Aug 27, 2010 at 6:29 AM, Eliot Horowitz <eliothorow...@gmail.com> wrote: > Mongo can probably handle it internally with indexes, but we're also > working on an official hadoop plugin, so you can always use mongo as > the storage layer, and use hadoop for processing.
> On Fri, Aug 27, 2010 at 1:07 AM, Alden DoRosario (Chitika) > <adorosa...@gmail.com> wrote: >> Hi, >> Just wanted recommendations from people on a task I'm trying to >> achieve. Any help would be greatly appreciated.
>> I have about 500M log file entries each representing an "ad >> impression" (we are an advertising company). Each "hit" has about 50 >> attributes to it (example: Country, State, City, Adsize, Browser, OS, >> etc) .. I want to load all 500M into some form of database and then >> run queries against this set. [PS: For types of queries we would >> issue, you can checkout: http://chitika.com/research/ ]
>> Question for you: Which one of the following would you suggest? >> 1) Load 500M rows in Mysql with 50 columns -- and index each column >> [Sounds lame!] >> 2) Load 500M rows as documents in mongo and index each column that >> will be used in a WHERE clause (basically all 50 columns) >> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the >> whole set.
>> PS: The 500M is a starting set. The actual number of rows will go >> into the billions.
>> - Alden.
>> -- >> You received this message because you are subscribed to the Google Groups "mongodb-user" group. >> To post to this group, send email to mongodb-user@googlegroups.com. >> To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. >> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
> -- > You received this message because you are subscribed to the Google Groups "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
> I presume by "hadoop plugin" that means you will be releasing a Mongo > InputReader, InputSplit, etc? That would be really nice. Is there a > targeted version of MongoDB when this Hadoop interface will be ready? > Thanks.
> On Fri, Aug 27, 2010 at 6:29 AM, Eliot Horowitz <eliothorow...@gmail.com> > wrote: > > Mongo can probably handle it internally with indexes, but we're also > > working on an official hadoop plugin, so you can always use mongo as > > the storage layer, and use hadoop for processing.
> > On Fri, Aug 27, 2010 at 1:07 AM, Alden DoRosario (Chitika) > > <adorosa...@gmail.com> wrote: > >> Hi, > >> Just wanted recommendations from people on a task I'm trying to > >> achieve. Any help would be greatly appreciated.
> >> I have about 500M log file entries each representing an "ad > >> impression" (we are an advertising company). Each "hit" has about 50 > >> attributes to it (example: Country, State, City, Adsize, Browser, OS, > >> etc) .. I want to load all 500M into some form of database and then > >> run queries against this set. [PS: For types of queries we would > >> issue, you can checkout: http://chitika.com/research/ ]
> >> Question for you: Which one of the following would you suggest? > >> 1) Load 500M rows in Mysql with 50 columns -- and index each column > >> [Sounds lame!] > >> 2) Load 500M rows as documents in mongo and index each column that > >> will be used in a WHERE clause (basically all 50 columns) > >> 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the > >> whole set.
> >> PS: The 500M is a starting set. The actual number of rows will go > >> into the billions.
> >> - Alden.
> >> -- > >> You received this message because you are subscribed to the Google > Groups "mongodb-user" group. > >> To post to this group, send email to mongodb-user@googlegroups.com. > >> To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > >> For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
> > -- > > You received this message because you are subscribed to the Google Groups > "mongodb-user" group. > > To post to this group, send email to mongodb-user@googlegroups.com. > > To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
> -- > You received this message because you are subscribed to the Google Groups > "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
> As I know, Mongo's JavaScript-based MapReduce is singlethreaded. So
> with large dataset it may be a bottleneck here. But you can write your
> own MapReduce application using Mongo as database.
> Can you provide an example of query?
> On Aug 27, 9:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com>
> wrote:
> > Hi,
> > Just wanted recommendations from people on a task I'm trying to
> > achieve. Any help would be greatly appreciated.
> > I have about 500M log file entries each representing an "ad
> > impression" (we are an advertising company). Each "hit" has about 50
> > attributes to it (example: Country, State, City, Adsize, Browser, OS,
> > etc) .. I want to load all 500M into some form of database and then
> > run queries against this set. [PS: For types of queries we would
> > issue, you can checkout:http://chitika.com/research/]
> > Question for you: Which one of the following would you suggest?
> > 1) Load 500M rows in Mysql with 50 columns -- and index each column
> > [Sounds lame!]
> > 2) Load 500M rows as documents in mongo and index each column that
> > will be used in a WHERE clause (basically all 50 columns)
> > 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the
> > whole set.
> > PS: The 500M is a starting set. The actual number of rows will go
> > into the billions.
I think that actually shows where mongo + hadoop would be a great fit.
- Store data in mongo - use mongo as the input into hadoop, using mongo's index for CC="US" && ISMOBILE=1 - use hadoop to do the aggregration - store result in mongo
On Fri, Aug 27, 2010 at 3:44 PM, Alden DoRosario (Chitika) <
> > Can you provide an example of query? > SELECT OS, SUM(IMPS), SUM(CLICKS) FROM TABLE WHERE CC = 'US' AND > ISMOBILE = 1 GROUP BY OS; (this would be the query used for: > Android Users 80% More Valuable Than iPhone Users -->
> On Aug 27, 12:30 pm, Nikolay <khit...@gmail.com> wrote: > > As I know, Mongo's JavaScript-based MapReduce is singlethreaded. So > > with large dataset it may be a bottleneck here. But you can write your > > own MapReduce application using Mongo as database.
> > Can you provide an example of query?
> > On Aug 27, 9:07 am, "Alden DoRosario (Chitika)" <adorosa...@gmail.com> > > wrote:
> > > Hi, > > > Just wanted recommendations from people on a task I'm trying to > > > achieve. Any help would be greatly appreciated.
> > > I have about 500M log file entries each representing an "ad > > > impression" (we are an advertising company). Each "hit" has about 50 > > > attributes to it (example: Country, State, City, Adsize, Browser, OS, > > > etc) .. I want to load all 500M into some form of database and then > > > run queries against this set. [PS: For types of queries we would > > > issue, you can checkout:http://chitika.com/research/]
> > > Question for you: Which one of the following would you suggest? > > > 1) Load 500M rows in Mysql with 50 columns -- and index each column > > > [Sounds lame!] > > > 2) Load 500M rows as documents in mongo and index each column that > > > will be used in a WHERE clause (basically all 50 columns) > > > 3) Load 500M rows into Hadoop -- and then run Map/Reduce on the > > > whole set.
> > > PS: The 500M is a starting set. The actual number of rows will go > > > into the billions.
> > > - Alden.
> -- > You received this message because you are subscribed to the Google Groups > "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote: > On Fri, Aug 27, 2010 at 12:13 PM, Ankur <ankursethi...@googlemail.com> wrote: > > Sounds like you also have to deal with the storage layer as a separate > > problem if you are going to have billions of records. You probably > > want something that can scale simply. Mongo may not be an ideal fit > > because of your constantly increasing data. You may want something > > where you can just add notes to increase storage very simply without > > manually partitioning your data. I believe you have to manually > > partition with Mongo.
> Version 1.6 of MongoDB adds production ready support for auto-sharding > (read auto-partitioning). So no, you don't need to manually partition, > and adding new shards is easy.
> On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote: > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur <ankursethi...@googlemail.com> > wrote: > > > Sounds like you also have to deal with the storage layer as a separate > > > problem if you are going to have billions of records. You probably > > > want something that can scale simply. Mongo may not be an ideal fit > > > because of your constantly increasing data. You may want something > > > where you can just add notes to increase storage very simply without > > > manually partitioning your data. I believe you have to manually > > > partition with Mongo.
> > Version 1.6 of MongoDB adds production ready support for auto-sharding > > (read auto-partitioning). So no, you don't need to manually partition, > > and adding new shards is easy.
> Tested for the said billions of documents?
> Bill
> -- > You received this message because you are subscribed to the Google Groups > "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
OK, so I have an ad-network background and can probably hit these
points here. (and I'm talking about experience with > 500M records /
day)
The MySQL method (#1), will suck for a variety of reasons. The biggest
issue you're going to face is that MySQL doesn't really scale
horizontally. You can't put 500 M records on 10 servers and then get a
reasonable roll-up. What you end up having to do is bulk inserts and
roll-ups on each server. You then pass those roll-ups on to another
server which then summarizes the data. (sound familiar?)
Of course, once you're designed this whole process of "bulk insert ->
summarize -> export -> bulk insert -> summarize", you come to this
great realization... you've built a map-reduce engine on top of MySQL
(yay!).
So now you get options #2 & #3. Hadoop will definitely work for this
problem. I know of working installations that not only do stats, but
also do targeting based using Hadoop. So this is indeed a known usage
for Hadoop. Of course, if you want to query the result, you'll have to
dump those output files somewhere (typically in to an RDBMS)
Where does Mongo fall in this?
That's a little more complex. Well, with the release of auto-sharding,
it's definitely looking like MongoDB can handle this type of
functionality. The fact that map-reduce is "single-threaded" can be
limiting in terms of performance (or you can just use more smaller
servers). However, the big win that Mongo gives you is data
flexibility.
You say that you have 50 fields, but are they always present?
(probably not)
Do they change? (probably with every client)
Do you query against all of them all of the time? (probably not)
So the benefit Mongo has is that it can accept all of your data, run
your map-reduces and the output is actually a database into which you
can query some more (holy smokes!). Or you can mongoexport the reduced
data to CSV and quickly drop it into MySQL for the finishing touches
on your reports. Mongo doesn't need a "data dictionary" or all kinds
of Metadata to parse your input files.
Oh and it's relatively easy to configure.
Picking MongoDB over other similar tools is really about learning
curve and ramp-up time. MongoDB is relatively simple compared to many
of the other tools out there (like Hadoop). Give me sudo, ssh and 10
boxes and we can have an auto-partitioned cluster available in
probably 30 minutes. Mongo can clock in at tens of thousands writes /
sec (especially across shards, even on "commodity hardware"). So
you're talking about an hour or two to insert 50M records and start
running sample roll-ups.
So for the cost of about a day, you can actually test if MongoDB is
right for you. That means a lot me :)
On Aug 29, 10:02 am, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> Yes - there are people with many billion documents.
> On Fri, Aug 27, 2010 at 4:08 PM, Bill de hÓra <b...@dehora.net> wrote:
> > On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote:
> > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur <ankursethi...@googlemail.com>
> > wrote:
> > > > Sounds like you also have to deal with the storage layer as a separate
> > > > problem if you are going to have billions of records. You probably
> > > > want something that can scale simply. Mongo may not be an ideal fit
> > > > because of your constantly increasing data. You may want something
> > > > where you can just add notes to increase storage very simply without
> > > > manually partitioning your data. I believe you have to manually
> > > > partition with Mongo.
> > > Version 1.6 of MongoDB adds production ready support for auto-sharding
> > > (read auto-partitioning). So no, you don't need to manually partition,
> > > and adding new shards is easy.
> > Tested for the said billions of documents?
> > Bill
> > --
> > You received this message because you are subscribed to the Google Groups
> > "mongodb-user" group.
> > To post to this group, send email to mongodb-user@googlegroups.com.
> > To unsubscribe from this group, send email to
> > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com>
> > .
> > For more options, visit this group at
> >http://groups.google.com/group/mongodb-user?hl=en.
> OK, so I have an ad-network background and can probably hit these
> points here. (and I'm talking about experience with > 500M records /
> day)
> The MySQL method (#1), will suck for a variety of reasons. The biggest
> issue you're going to face is that MySQL doesn't really scale
> horizontally. You can't put 500 M records on 10 servers and then get a
> reasonable roll-up. What you end up having to do is bulk inserts and
> roll-ups on each server. You then pass those roll-ups on to another
> server which then summarizes the data. (sound familiar?)
> Of course, once you're designed this whole process of "bulk insert ->
> summarize -> export -> bulk insert -> summarize", you come to this
> great realization... you've built a map-reduce engine on top of MySQL
> (yay!).
> So now you get options #2 & #3. Hadoop will definitely work for this
> problem. I know of working installations that not only do stats, but
> also do targeting based using Hadoop. So this is indeed a known usage
> for Hadoop. Of course, if you want to query the result, you'll have to
> dump those output files somewhere (typically in to an RDBMS)
> Where does Mongo fall in this?
> That's a little more complex. Well, with the release of auto-sharding,
> it's definitely looking like MongoDB can handle this type of
> functionality. The fact that map-reduce is "single-threaded" can be
> limiting in terms of performance (or you can just use more smaller
> servers). However, the big win that Mongo gives you is data
> flexibility.
> You say that you have 50 fields, but are they always present?
> (probably not)
> Do they change? (probably with every client)
> Do you query against all of them all of the time? (probably not)
> So the benefit Mongo has is that it can accept all of your data, run
> your map-reduces and the output is actually a database into which you
> can query some more (holy smokes!). Or you can mongoexport the reduced
> data to CSV and quickly drop it into MySQL for the finishing touches
> on your reports. Mongo doesn't need a "data dictionary" or all kinds
> of Metadata to parse your input files.
> Oh and it's relatively easy to configure.
> Picking MongoDB over other similar tools is really about learning
> curve and ramp-up time. MongoDB is relatively simple compared to many
> of the other tools out there (like Hadoop). Give me sudo, ssh and 10
> boxes and we can have an auto-partitioned cluster available in
> probably 30 minutes. Mongo can clock in at tens of thousands writes /
> sec (especially across shards, even on "commodity hardware"). So
> you're talking about an hour or two to insert 50M records and start
> running sample roll-ups.
> So for the cost of about a day, you can actually test if MongoDB is
> right for you. That means a lot me :)
> On Aug 29, 10:02 am, Eliot Horowitz <eliothorow...@gmail.com> wrote:
> > Yes - there are people with many billion documents.
> > On Fri, Aug 27, 2010 at 4:08 PM, Bill de hÓra <b...@dehora.net> wrote:
> > > On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote:
> > > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur <ankursethi...@googlemail.com>
> > > wrote:
> > > > > Sounds like you also have to deal with the storage layer as a separate
> > > > > problem if you are going to have billions of records. You probably
> > > > > want something that can scale simply. Mongo may not be an ideal fit
> > > > > because of your constantly increasing data. You may want something
> > > > > where you can just add notes to increase storage very simply without
> > > > > manually partitioning your data. I believe you have to manually
> > > > > partition with Mongo.
> > > > Version 1.6 of MongoDB adds production ready support for auto-sharding
> > > > (read auto-partitioning). So no, you don't need to manually partition,
> > > > and adding new shards is easy.
> > > Tested for the said billions of documents?
> > > Bill
> > > --
> > > You received this message because you are subscribed to the Google Groups
> > > "mongodb-user" group.
> > > To post to this group, send email to mongodb-user@googlegroups.com.
> > > To unsubscribe from this group, send email to
> > > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com>
> > > .
> > > For more options, visit this group at
> > >http://groups.google.com/group/mongodb-user?hl=en.
> On Fri, Aug 27, 2010 at 4:08 PM, Bill de hÓra <b...@dehora.net> wrote: > On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote: > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur > <ankursethi...@googlemail.com> wrote: > > > Sounds like you also have to deal with the storage layer > as a separate > > > problem if you are going to have billions of records. You > probably > > > want something that can scale simply. Mongo may not be an > ideal fit > > > because of your constantly increasing data. You may want > something > > > where you can just add notes to increase storage very > simply without > > > manually partitioning your data. I believe you have to > manually > > > partition with Mongo.
> > Version 1.6 of MongoDB adds production ready support for > auto-sharding > > (read auto-partitioning). So no, you don't need to manually > partition, > > and adding new shards is easy.
> Tested for the said billions of documents?
> Bill
> --
> You received this message because you are subscribed to the > Google Groups "mongodb-user" group. > To post to this group, send email to > mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user > +unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
> -- > You received this message because you are subscribed to the Google > Groups "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to mongodb-user > +unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.
About the single threaded issue: would it be possible (and advisable)
to run multiple MongoDB processes on the same machine? Consider this
setup: four machines each running three MongoDB processes each,
configured so that there are four shards and each shard is replicated
three times (on separate machines). This is similar to how a Hadoop
cluster of four machines would work (assuming the standard replication
factor of three), although Hadoop decouples processing from storage,
and you're not bound to one thread per process, and so on, so it can
probably schedule jobs more efficiently.
yours,
T#
On Aug 31, 12:15 pm, Bill de hÓra <b...@dehora.net> wrote:
> On Sun, 2010-08-29 at 11:02 -0400, Eliot Horowitz wrote:
> > Yes - there are people with many billion documents.
> I know, but I was asking specifically about auto-sharding. I'll assume
> you meant auto-sharding above.
> Bill
> > On Fri, Aug 27, 2010 at 4:08 PM, Bill de hÓra <b...@dehora.net> wrote:
> > On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote:
> > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur
> > <ankursethi...@googlemail.com> wrote:
> > > > Sounds like you also have to deal with the storage layer
> > as a separate
> > > > problem if you are going to have billions of records. You
> > probably
> > > > want something that can scale simply. Mongo may not be an
> > ideal fit
> > > > because of your constantly increasing data. You may want
> > something
> > > > where you can just add notes to increase storage very
> > simply without
> > > > manually partitioning your data. I believe you have to
> > manually
> > > > partition with Mongo.
> > > Version 1.6 of MongoDB adds production ready support for
> > auto-sharding
> > > (read auto-partitioning). So no, you don't need to manually
> > partition,
> > > and adding new shards is easy.
> > Tested for the said billions of documents?
> > Bill
> > --
> > You received this message because you are subscribed to the
> > Google Groups "mongodb-user" group.
> > To post to this group, send email to
> > mongodb-user@googlegroups.com.
> > To unsubscribe from this group, send email to mongodb-user
> > +unsubscribe@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/mongodb-user?hl=en.
> > --
> > You received this message because you are subscribed to the Google
> > Groups "mongodb-user" group.
> > To post to this group, send email to mongodb-user@googlegroups.com.
> > To unsubscribe from this group, send email to mongodb-user
> > +unsubscribe@googlegroups.com.
> > For more options, visit this group at
> >http://groups.google.com/group/mongodb-user?hl=en.
On Thu, Sep 2, 2010 at 3:08 AM, Theo <icon...@gmail.com> wrote: > About the single threaded issue: would it be possible (and advisable) > to run multiple MongoDB processes on the same machine? Consider this > setup: four machines each running three MongoDB processes each, > configured so that there are four shards and each shard is replicated > three times (on separate machines). This is similar to how a Hadoop > cluster of four machines would work (assuming the standard replication > factor of three), although Hadoop decouples processing from storage, > and you're not bound to one thread per process, and so on, so it can > probably schedule jobs more efficiently.
> yours, > T#
> On Aug 31, 12:15 pm, Bill de hÓra <b...@dehora.net> wrote: > > On Sun, 2010-08-29 at 11:02 -0400, Eliot Horowitz wrote: > > > Yes - there are people with many billion documents.
> > I know, but I was asking specifically about auto-sharding. I'll assume > > you meant auto-sharding above.
> > Bill
> > > On Fri, Aug 27, 2010 at 4:08 PM, Bill de hÓra <b...@dehora.net> wrote: > > > On Fri, 2010-08-27 at 12:15 -0400, Michael Dirolf wrote: > > > > On Fri, Aug 27, 2010 at 12:13 PM, Ankur > > > <ankursethi...@googlemail.com> wrote: > > > > > Sounds like you also have to deal with the storage layer > > > as a separate > > > > > problem if you are going to have billions of records. You > > > probably > > > > > want something that can scale simply. Mongo may not be an > > > ideal fit > > > > > because of your constantly increasing data. You may want > > > something > > > > > where you can just add notes to increase storage very > > > simply without > > > > > manually partitioning your data. I believe you have to > > > manually > > > > > partition with Mongo.
> > > > Version 1.6 of MongoDB adds production ready support for > > > auto-sharding > > > > (read auto-partitioning). So no, you don't need to manually > > > partition, > > > > and adding new shards is easy.
> > > Tested for the said billions of documents?
> > > Bill
> > > --
> > > You received this message because you are subscribed to the > > > Google Groups "mongodb-user" group. > > > To post to this group, send email to > > > mongodb-user@googlegroups.com. > > > To unsubscribe from this group, send email to mongodb-user > > > +unsubscribe@googlegroups.com. > > > For more options, visit this group at > > > http://groups.google.com/group/mongodb-user?hl=en.
> > > -- > > > You received this message because you are subscribed to the Google > > > Groups "mongodb-user" group. > > > To post to this group, send email to mongodb-user@googlegroups.com. > > > To unsubscribe from this group, send email to mongodb-user > > > +unsubscribe@googlegroups.com. > > > For more options, visit this group at > > >http://groups.google.com/group/mongodb-user?hl=en.
> -- > You received this message because you are subscribed to the Google Groups > "mongodb-user" group. > To post to this group, send email to mongodb-user@googlegroups.com. > To unsubscribe from this group, send email to > mongodb-user+unsubscribe@googlegroups.com<mongodb-user%2Bunsubscribe@google groups.com> > . > For more options, visit this group at > http://groups.google.com/group/mongodb-user?hl=en.