We tried _id (ObjectIds) as well as our preferred keys
> - Indexing after is better than indexing beforehand
So far we have been trying to index while importing.
We can give that another try.
> - If you already preshard the data, turn the balancer off first
I would shut down config server and mongos for the import.
Is that what you mean?
> - You should break the import data in the same way that you preshard
Of course.
> and use mongoimport to load them up
> - Your data should be sorted by shard key if possible
OK
Biggest question: will it be worth it?
cheers,
Torsten
Why is that?
The ObjectIds should be quite different across the machines and so
hopefully fall into different chunks.
> - You can leave your config server and mongos up and do the import via
> mongos.
Confused - that's what I was doing before.
mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport
Or do you mean...
Splitting up the pre-sharded dataset across the nodes. Then turn off
balancing. But instead of using --dbpath use mongos? Wouldn't --dbpath
be faster? Wouldn't writes still get routed to other shards with
mongos?
> - To turn off balancer,
> > use config
> > db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
Ah ... OK.
cheers,
Torsten
True ... but even with our preferred sharding key [user, time] it
doesn't behave much better.
> - You can use --dbpath but you have to take mongod offline.
That's fine.
> I just recommended another way without taking down mongod. As you will
> perform mongoimport split by shard key, mongos should route
> requests to one server per mongoimport.
But doesn't that depend on what chunks are configured in the config server?
> - Do you have mongostat, iostat, db.stats() during import process?
Certainly. With the current non-pre-sharded import...
- mongostat shows looong "holes" with no ops at all. I assume that's
the balancer - but not sure. Numbers were much better in the beginning
of the import.
- iostat shows quite uneven activity across the nodes.
- db.stats() we are monitoring over time. The following shows the
objects graphed:
2 main options:
- try 1.7.5
- pre-split the collection into a lot of chunks, let the balancer
move them around, then insert.
this will prevent migrates.
I would not mess with --dbpath or turning off the balancer, that's
much more complicated than you need.
1.6.5
> You should shard on user,time as you want to do.
_id was just for testing whether that improves things.
> The speed is probably because of migrations.
>
> 2 main options:
> - try 1.7.5
Anything particular why? ...or just trying the latest and greatest?
> - pre-split the collection into a lot of chunks, let the balancer
> move them around, then insert.
> this will prevent migrates.
The collection data is already split up in various files:
collectionA-1.json.gz
collectionA-2.json.gz
...
We are iterating over those files and handing them to mongoimport.
How would I "let the balancer move them around" before insert?
> I would not mess with --dbpath or turning off the balancer, that's
> much more complicated than you need.
Certainly not eager to. It would be a last resort measure.
But if we were to go down that road - would it be faster?
cheers,
Torsten
Lots of sharding improvements in 1.7.x
Though you should probably wait for 1.7.6 or use the nightly.
>> - pre-split the collection into a lot of chunks, let the balancer
>> move them around, then insert.
>> this will prevent migrates.
>
> The collection data is already split up in various files:
>
> collectionA-1.json.gz
> collectionA-2.json.gz
> ...
>
> We are iterating over those files and handing them to mongoimport.
> How would I "let the balancer move them around" before insert?
That's not quite what I meant.
I meant actual split the mongo chunks up:
http://www.mongodb.org/display/DOCS/Splitting+Chunks
So you could call split 1000 times (make sure you pick the points reasonably).
Then mongo will balance those 1000 chunks.
Once it's done, start the import again.
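To make the "call split 1000 times" step concrete, here is a minimal sketch that builds the command documents for evenly spaced split points. The namespace, the numeric `u` shard key, and the key-space bound are all assumptions for illustration; in the mongo shell each document would be passed to db.adminCommand(...).

```javascript
// Build split-command documents for numChunks evenly spaced chunks.
// ns, maxUser and the field name "u" are placeholders, not real names.
function buildSplitCommands(ns, maxUser, numChunks) {
  const commands = [];
  const step = maxUser / numChunks;
  for (let i = 1; i < numChunks; i++) {
    // Same shape as the shell command: db.adminCommand({split: ns, middle: {u: point}})
    commands.push({ split: ns, middle: { u: Math.floor(i * step) } });
  }
  return commands;
}

// e.g. 1000 chunks over an assumed keyspace of 1,000,000 users:
const cmds = buildSplitCommands("mydb.collectionA", 1000000, 1000);
// In the shell you would then loop: cmds.forEach(c => db.adminCommand(c))
```

Picking the points from the real key distribution (rather than assuming it is uniform, as this sketch does) gives more evenly sized chunks.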
>> I would not mess with --dbpath or turning off the balancer, that's
>> much more complicated than you need.
>
> Certainly not eager to. It would be a last resort measure.
> But if we were to go down that road - would it be faster?
This would not be faster than doing the pre-splitting.
That's going to give you the best results.
Let's assume I start a fresh import. The db is empty.
I have 6 machines and say 600 users.
I could do 5 splits then, on 100, 200, 300, 400, 500 which would give
me 6 segments in the key space.
The balancer then would assign those evenly to my 6 shards.
And when I now import everything should get distributed evenly without
any costly migrations.
Did I get that idea right?
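The arithmetic in that example can be sketched as a tiny helper (the function name is made up; the numbers are the ones from the example above):

```javascript
// For numShards equal segments over the key space [0, maxKey),
// you need numShards - 1 split points: maxKey/n, 2*maxKey/n, ...
function evenSplitPoints(maxKey, numShards) {
  const points = [];
  for (let i = 1; i < numShards; i++) {
    points.push((maxKey / numShards) * i);
  }
  return points;
}

// 600 users across 6 machines -> 5 splits, giving 6 segments:
console.log(evenSplitPoints(600, 6)); // [100, 200, 300, 400, 500]
```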
admin.runCommand( { split : obj.shardcollection , middle : { _id : key } } )
100 times to pre-split the key space into smaller chunks.
Looking at db.printShardingStatus() though there is only one big chunk
for every collection
{ "u" : { $minKey : 1 } } -->> { "u" : { $maxKey : 1 } } on :
shard0000 { "t" : 1000, "i" : 0 }
and all key are pointing to a single shard.
What am I missing?
(And what's the { "t" : 1000, "i" : 0 } btw)
cheers,
Torsten
shard0001 { "t" : 34000, "i" : 0 }
What's "t" and what's "i" ?
I thought that's what the balancer is for - to distribute the chunks
evenly across the nodes.
In 1.8 we've done some work to make the initial loading of a collection faster.
You may want to try with the 1.7 nightly to compare.
Now I am confused. Isn't that what the pre-sharding was for?
I know the keyspace and I evenly distributed it across the nodes.
> In 1.8 we've done some work to make the initial loading of a collection faster.
> You may want to try with the 1.7 nightly to compare.
How confident would you be to go into production with a 1.7 nightly at
this stage?
cheers,
Torsten
Now I'm confused :) Yes, that's what pre-sharding is for.
Was there an issue after you did that?
If you pre-split into a large number of chunks initially, were they
evenly distributed?
Long thread I know.
>> In 1.8 we've done some work to make the initial loading of a collection faster.
>> You may want to try with the 1.7 nightly to compare.
>
> How confident would you be to go into production with a 1.7 nightly at
> this stage?
>
> cheers,
> Torsten
>
Really depends on your comfort and how much testing you can do beforehand.
I certainly wouldn't just throw it into production without testing it first.
:)
I think where the confusion started for me was that after splitting I
still had to manually move chunks (of data that is not even imported yet).
> Was there an issue after you did that?
Still quite early in the import but so far it looks better.
> If you pre-split into a large number of chunks initially, were they
> evenly distributed?
Well, I did the distribution by splitting and assigning the chunks to
the shards - so yes :)
cheers,
Torsten
How many splits did you do?
Once there were more than 8 chunks, it should have started moving them.
Did that not happen?
How long did you wait?
Can you send the mongos logs from that period?
I created 100 splits.
> Once there were more than 8 chunks, it should have started moving them.
When exactly? After splitting ...or after I started the import?
> Did that not happen?
> How long did you wait?
Here is what I did:
I started with a fresh mongo cluster. Then did the 100 pre-splits.
Looked at the config server. Distribution of the chunks was not good.
Waited a couple of minutes (5?). Distribution of the chunks was still not good.
Then moved the chunks manually so the config server showed a good
distribution of chunks across the shards.
And then I started the import.
> Can you send the mongos logs from that period?
Sure can do ....or is the above expected behavior?
So it would have taken 25 minutes to be even.
After 5 minutes, it should have been 90/10 or so.
db.locks.find()
shows that the balancer is getting in the way.
Now I have turned off balancing
db.settings.update({_id:"balancer"},{$set : {stopped:true}}, true)
Is there a way of aborting current migration operations? I still see
the locks in the config db.
cheers,
Torsten
Are you sure your shard key is optimal?
What is the shard key?
What is the data distribution?
What order are you inserting in?
Pretty sure it's not optimal yet.
> What is the shard key?
Currently it's only user. It should better be user, time.
> What is the data distribution?
Not evenly enough I guess.
Just so I understand correctly: once the size of the data of all the
docs that belong to a certain key range becomes bigger than the defined
chunk size, the balancer will kick in, split the chunk in the
middle and transfer half of the documents onto the emptiest
shard. Is that correct?
> What order are you inserting in?
No particular order. We have many files. Within these files the shard
key will probably be monotonically increasing though.
>> Is there a way of aborting current migration operations? I still see
>> the locks in the config db.
Is there? I just want to confirm it's the migrations.
cheers,
Torsten
As for killing migrates, you can kill the "moveChunk" command on the shard.
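One way to find those operations: on each shard, db.currentOp() returns the in-progress ops, and the moveChunk ones can be picked out and passed to db.killOp(opid). A sketch of the filtering logic, run here against made-up fixture data since there is no live server:

```javascript
// Given the "inprog" array from db.currentOp(), return the opids of
// running moveChunk commands.
function findMoveChunkOpids(inprog) {
  return inprog
    .filter(op => op.query && op.query.moveChunk !== undefined)
    .map(op => op.opid);
}

// Fixture shaped roughly like db.currentOp().inprog (fields illustrative):
const inprog = [
  { opid: 12, op: "query", query: { moveChunk: "mydb.collectionA" } },
  { opid: 13, op: "insert", query: {} }
];
console.log(findMoveChunkOpids(inprog)); // [12]
// In the shell you would then call db.killOp(12) on that shard.
```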
Not quite. I have 2 imports per machine in a 3 node cluster setup.
So a total of 6 imports. They should be working mostly on different ranges.
Disk activity graphs seem to back up that assumption.
> So a couple of things you can try.
> - pre-splitting a lot more so you don't have to split during insertion
(nod)
> - running multiple import jobs at the same time from different ends
> of the ranges
see above.
Seems like we are peaking at 40,000 inserts/s until the memory is
full. When we hit the disk it seems to go down to 12,000 inserts/s.
https://skitch.com/tcurdt/rqfw4/stats
Which probably is something we could live with. I just would like to
get to the bottom of where "mongostat" shows phases of 0 inserts/s
over several seconds. Right now I am just assuming it's the
migrations.
> As for killing migrates, you can kill the "moveChunk" command on the shard.
Is that an individual process? Or how do I do that?
cheers,
Torsten
- a mongostat excerpt showing the insert pauses
- the logs of all 3 nodes
- stats.tsv which is count/objects total/per collection/per shard over time
- stats.html which is a munin dashboard
If you graph the stats.tsv you can see nicely the bump when migrations
were turned off.
The pauses are still there though. And the insert speed is again
flattening after a while.
(Which now could be because of an imbalance when comparing to the munin stats.)
My next try would be to pre-shard into more chunks (1000) and then
leave the balancer off for the import.
But the insert pauses are still concerning.
cheers,
Torsten
1) Balancer - this should be much better in 1.7.5 so may want to test
with 1.8.0 when it comes out (or 1.8.0-rc0)
2) Hitting machine ram/index limits - are you adding indexes on the
data before you insert? Do you have mongostat from the shards, or
just mongos?
When is the 1.8.0-rc0 expected to come out?
> 2) Hitting machine ram/index limits - are you adding indexes on the
> data before you insert?
Yes, this import has indexes before the insert.
The idea was that since there should be not much balancing because of
the pre-sharding, having the index during the import should not make
that much of a difference on the overall time. (Which might not have
been a good idea) Indexing while importing makes the process a bit
more predictable though.
> Do you have mongostat from the shards, or
> just mongos?
I just realized this could be the problem. It's a mongostat from one
mongos - there are 3.
But IIRC mongostat in 1.6.5 does not support a list of mongos yet.
Couple of weeks
>> 2) Hitting machine ram/index limits - are you adding indexes on the
>> data before you insert?
>
> Yes, this import has indexes before the insert.
>
> The idea was that since there should be not much balancing because of
> the pre-sharding, having the index during the import should not make
> that much of a difference on the overall time. (Which might not have
> been a good idea) Indexing while importing makes the process a bit
> more predictable though.
If the indexes or data are bigger than ram, then creating the indexes
after will be much faster.
You may want to try that.
>> Do you have mongostat from the shards, or
>> just mongos?
>
> I just realized this could be the problem. It's a mongostat from one
> mongos - there are 3.
>
> But IIRC mongostat in 1.6.5 does not support a list of mongos yet.
You can use mongostat from 1.7.5 with servers from 1.6.5
Alright ... another test just finished.
1000 splits up front and indexes last.
The import took only 6 hours. Which is fantastic! Pretty much linear
import speed.
Creating the indexes took about 4h. Still good!
But splitting up the keyspace up front (on an empty database!) took
almost 6 hours. Which is neither acceptable nor understandable. How
can moving zero data take so long? I would have expected that to be
instant.
Glad you see it that way too :)
> Do you have the log for that?
Logs are on their way.
cheers,
Torsten
Also, can you send the code that you used for pre-splitting?
On its way via email.
> Also, can you send the code that you used for pre-splitting?
https://gist.github.com/626ff7549261d83e8855
cheers,
Torsten
sent via email
cheers,
Torsten
thanks for the long-ish reply :)
> However when you have to do a massive data load - like what you are
> doing, we have to think outside the box.
That's how the idea of the manual pre-sharding came to life.
<snip/>
> In this approach, we continuously process new/changed OLTP data to
> load it into data warehouses and such,
> without trying to read the whole transactional system from the
> begining or a high watermark point every time.
Right. We do not necessarily want to do this continuously.
But we need it for iterations on the data model and one has to test
worst case recovery scenarios.
<snip/>
> Given this, let me ask you a few questions -
> - What is your requirement to complete the data load? What are the
> driving factors for that time constraint?
Please, see below.
> - Are there any other constraints? (e.g. throughput, storage space,
> hardware/network, etc.)?
Well, I guess in the real world all those constraint apply - at least somehow :)
We currently have the following objectives
- find maximal app throughput with our current hardware
- find the fastest way to fill the cluster from our json document
...from scratch
> - Can you give some idea on your document structure?
At the stage of the initial import it's really simple and flat. Along
the lines of
{
  user: 1,
  a: 1,
  b: 2,
  c: 3,
  d: 4
}
> - You mentioned using the objectid to shard your data? Is that a
> requirement?
No, we had big problems with the data and load distribution across the
shards. So we started experimenting. Using _id for sharding was one of
the tests. We really want to shard on the userid. And the last
pre-sharded test was based on the userid.
That said: our application performance dropped from 2ms per query to
200ms per query when switching from _id to userid. Which I don't fully
understand yet. We have large indexes - too large for keeping them in
all in memory. I assume that will be related. But that's for a
different thread.
> - Do you have any other application based option for a unique way to
> identify a row?
> (in other words, do you have a user/primary/application key)
Yes, we do. Most documents can be identified uniquely. There are a few
that also vary too much - we would need to use an md5 for those. We
also thought about creating a custom/non-objectid _id key for that
very reason. This means though, that quite some logic needs to be moved
into the application layer. It would be great to avoid that.
> - What are the specs on your hardware?
Mid-range servers with 32GB of RAM. Need to look up the exact specs if
of interest.
> - Is there any option to reconfigure/reallocate your hardware in a
> different way if required?
WDYM? Well, I am certain we could add more shards and improve the
situation. But I fear if we cannot support this app/data combination
with a 6 machine cluster then mongo isn't the right tool.
> - How often do you need to this kind of data load?
Ideally? Once ...but we're still testing various data models. So more
realistically - we are iterating. A turnaround of 10 hours would be
nice.
> From what you implied in your post, it seems you have already done
> it at least once, which took 24 hours.
Indeed. The last import was faster though.
pre-sharding: 6h (which in my book should take 0h)
importing: 6h
indexing: 4-5h
If only the pre-sharding (on an empty db!) would not take that long -
I would be happy.
The import and the indexing was perfectly distributed when pre-sharded.
> So is this a one time thing or a repeated thing?
In theory it's a one time thing - but this would also need to work as
a repeated thing.
> - Once you have loaded the data how do you expect to use it?
Lots of upserts and reads. Not finally sure about the ratio yet.
> - Do you know the query pattern?
Yes, we do. We are replaying a full day of user actions as fast as
possible to find the maximum throughput we can sustain.
> - Is there any subsequent append or update of the data?
> - Is the data being loaded from one or more clients or from the Mongo
> cluster?
This is only for the app itself. This thread so far focused only on
the initial import.
> So here's what I would suggest on what I know - it can be refined once
> we know more from the answers to the above questions.
> I am assuming that you
> - Select a user-defined key from your data if possible (or generate
> one if necessary).
Done.
> - Use an appropriate hashing algorithm that is able to generate a good
> distribution. I have found MD5 to give pretty good results.
This is another option we also thought about. (see above)
> - Next pre-shard your data - since we know that the key will always be
> 32 characters with each character's domain being 0-9A-F
> One algorithm can be '0xxxxx..' through '200000000..' should go to
> shard 1 and so on. Or you can do round-robin shard - by having
> '0xxxx...' to go to shard 1, '1xxxxx....' to go to shard 2, etc.
> (where xxxx = any character - so you have to create the range
> appropriately)
That's how we did the last import.
> - I would suggest creating the auxiliary indexes upfront and not in
> the end.
What do you mean by "auxiliary indexes"?
cheers,
Torsten
Just out of curiosity with no pre-splitting and using _id as sharding
key. No indexes up front.
While it started OK the performance is now terrible again.
insert query update delete getmore command mapped vsize res faults locked % idx miss % netIn netOut conn repl time
ams-db007 0 0 0 0 0 1 0m 395m 9m 0 0 0 970b 771b 4 RTR 08:58:00
ams-db008 8 0 0 0 0 1 0m 371m 8m 0 0 0 942b 771b 3 RTR 08:58:00
ams-db009 0 0 0 0 0 1 0m 234m 4m 0 0 0 62b 767b 1 RTR 08:58:00
ams-db007 0 0 0 0 0 1 0m 395m 9m 0 0 0 1k 771b 4 RTR 08:58:01
ams-db008 0 0 0 0 0 1 0m 371m 7m 0 0 0 1k 771b 3 RTR 08:58:01
ams-db009 0 0 0 0 0 1 0m 234m 4m 0 0 0 62b 767b 1 RTR 08:58:01
While the above is an exceptionally poor sample, insert performance
generally hovers at around 160 inserts/s - across the cluster.
Seems like also in 1.8 the balancing is just killing the performance.
I don't see the sharding improvements I was hoping for.
cheers,
Torsten
But even then it seems the impact of balancing is affecting the system
way too much.
I was hoping things had improved in that regard going from 1.6.5 to 1.8
Happy to share the munin graphs if you like.
Would love to ...but that's nothing I can just switch on, right? Needs
to happen in the app layer I presume.
>> You can see from the logs for example if migrates were happening and
>> how long they were taking if any were running.
Well, even if those are migrations... shouldn't they be as fast as
possible if the cluster is idle?
What I was wondering too - should one see migration activity in
mongostat? ...or is that activity kept out of those stats?
cheers,
Torsten
Yes, though we're going to add an option to do it in the server:
http://jira.mongodb.org/browse/SERVER-2001
> Well, even if those are migrations... shouldn't they be as fast as
> possible if the cluster is idle?
Yes, the question is what the bound is.
Did you check disk/io/network on both sides of the migrate?
> What I was wondering too - should one see migration activity in
> mongostat? ...or is that activity kept out of those stats?
You should see a lot of commands, but there is no explicit migrate
column, nor do the inserts show up.