We tried _id (ObjectIds) as well as our preferred keys
> - Indexing after the import is better than indexing beforehand
So far we have been trying to index while importing.
We can give that another try.
> - If you already preshard the data, turn the balancer off first
I would shut down the config server and mongos for the import.
Is that what you mean?
> - You should break the import data in the same way that you preshard
Of course.
> and use mongoimport to load them up
> - Your data should be sorted by shard key if possible
OK
Biggest question: will it be worth it?
cheers,
Torsten
Why is that?
The ObjectIds should be quite different across the machines and so
hopefully fall into different chunks.
> - You can leave your config server and mongos up and do the import via
> mongos.
Confused - that's what I was doing before.
mongo1: shardsrv mongos 2*mongoimport configsrv
mongo2: shardsrv mongos 2*mongoimport configsrv
mongo3: shardsrv mongos 2*mongoimport configsrv
mongo4: shardsrv mongos 2*mongoimport
mongo5: shardsrv mongos 2*mongoimport
mongo6: shardsrv mongos 2*mongoimport
Or do you mean...
Splitting up the pre-sharded dataset across the nodes, then turning off
balancing, but using mongos instead of --dbpath? Wouldn't --dbpath
be faster? And wouldn't writes still get routed to other shards with
mongos?
> - To turn off balancer,
> > use config
> > db.settings.update({_id:"balancer"}, {$set: {stopped: true}}, true)
Ah ... OK.
cheers,
Torsten
True ... but even with our preferred sharding key [user, time] it
doesn't behave much better.
> - You can use --dbpath but you have to take mongod offline.
That's fine.
> I just recommended another way without taking down mongod. As you will
> perform mongoimport split by shard key, mongos should route
> requests to one server per mongoimport.
But doesn't that depend on what chunks are configured in the config server?
> - Do you have mongostat, iostat, db.stats() during import process?
Certainly. With the current non-pre-sharded import...
- mongostat shows long "holes" with no ops at all. I assume that's
the balancer, but I'm not sure. Numbers were much better at the beginning
of the import.
- iostat shows quite uneven activity across the nodes.
- db.stats() we are monitoring over time. The following shows the
objects graphed:
2 main options:
- try 1.7.5
- pre-split the collection into a lot of chunks, let the balancer
move them around, then insert.
This will prevent migrations.
I would not mess with --dbpath or turning off the balancer; that's
much more complicated than you need.
1.6.5
> You should shard on user,time as you want to do.
_id was just for testing whether that improves things.
> The speed is probably because of migrations.
>
> 2 main options:
> - try 1.7.5
Any particular reason why? ...or just trying the latest and greatest?
> - pre-split the collection into a lot of chunks, let the balancer
> move them around, then insert.
> This will prevent migrations.
The collection data is already split up in various files:
collectionA-1.json.gz
collectionA-2.json.gz
...
We are iterating over those files and handing them to mongoimport.
How would I "let the balancer move them around" before insert?
> I would not mess with --dbpath or turning off the balancer; that's
> much more complicated than you need.
Certainly not eager to. It would be a last resort measure.
But if we were to go down that road, would it be faster?
cheers,
Torsten
Lots of sharding improvements in 1.7.x
Though you should probably wait for 1.7.6 or use the nightly.
>> - pre-split the collection into a lot of chunks, let the balancer
>> move them around, then insert.
>> This will prevent migrations.
>
> The collection data is already split up in various files:
>
> collectionA-1.json.gz
> collectionA-2.json.gz
> ...
>
> We are iterating over those files and handing them to mongoimport.
> How would I "let the balancer move them around" before insert?
That's not quite what I meant.
I meant actually splitting the mongo chunks up:
http://www.mongodb.org/display/DOCS/Splitting+Chunks
So you could call split 1000 times (make sure you pick the points reasonably).
Then mongo will balance those 1000 chunks.
Once it's done, start the import again.
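A minimal sketch of generating those split points, assuming an integer
user-id shard key u and a hypothetical namespace mydb.collectionA (both
just placeholders, not from the thread):

```javascript
// Generate evenly spaced "middle" points for an integer keyspace.
// maxKey and nChunks are assumptions for illustration.
function splitPoints(maxKey, nChunks) {
  var step = Math.floor(maxKey / nChunks);
  var points = [];
  for (var i = 1; i < nChunks; i++) {
    points.push(i * step);
  }
  return points;
}

// In the mongo shell (via mongos), one split per point would look like:
//   splitPoints(600, 6).forEach(function (p) {
//     db.adminCommand({ split: "mydb.collectionA", middle: { u: p } });
//   });
```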
>> I would not mess with --dbpath or turning off the balancer; that's
>> much more complicated than you need.
>
> Certainly not eager to. It would be a last resort measure.
> But we were to go down that road - would it be faster?
This would not be faster than doing the pre-splitting.
That's going to give you the best results.
Let's assume I start a fresh import. The db is empty.
I have 6 machines and say 600 users.
I could do 5 splits then, at 100, 200, 300, 400, 500, which would give
me 6 segments in the key space.
The balancer then would assign those evenly to my 6 shards.
And when I now import everything should get distributed evenly without
any costly migrations.
Did I get that idea right?
I ran
admin.runCommand( { split : obj.shardcollection , middle : { _id : key } } )
100 times to pre-split the key space into smaller chunks.
Looking at db.printShardingStatus(), though, there is only one big chunk
for every collection:
{ "u" : { $minKey : 1 } } -->> { "u" : { $maxKey : 1 } } on :
shard0000 { "t" : 1000, "i" : 0 }
and all keys are pointing to a single shard.
What am I missing?
(And what's the { "t" : 1000, "i" : 0 }, by the way?)
cheers,
Torsten
shard0001 { "t" : 34000, "i" : 0 }
What's "t" and what's "i"?
I thought that's what the balancer is for - to distribute the chunks
evenly across the nodes.
In 1.8 we've done some work to make the initial loading of a collection faster.
You may want to try with the 1.7 nightly to compare.
Now I am confused. Isn't that what the pre-sharding was for?
I know the keyspace and I evenly distributed it across the nodes.
> In 1.8 we've done some work to make the initial loading of a collection faster.
> You may want to try with the 1.7 nightly to compare.
How confident would you be to go into production with a 1.7 nightly at
this stage?
cheers,
Torsten
Now I'm confused :) Yes, that's what pre-sharding is for.
Was there an issue after you did that?
If you pre-split into a large number of chunks initially, were they
evenly distributed?
Long thread, I know.
>> In 1.8 we've done some work to make the initial loading of a collection faster.
>> You may want to try with the 1.7 nightly to compare.
>
> How confident would you be to go into production with a 1.7 nightly at
> this stage?
>
> cheers,
> Torsten
>
Really depends on your comfort and how much testing you can do beforehand.
I certainly wouldn't just throw it into production without testing it first.
:)
I think where the confusion started for me was that after splitting I
still had to manually move chunks (of data that is not even imported yet).
> Was there an issue after you did that?
Still quite early in the import but so far it looks better.
> If you pre-split into a large number of chunks initially, were they
> evenly distributed?
Well, I did the distribution by splitting and assigning the chunks to
the shards - so yes :)
cheers,
Torsten
How many splits did you do?
Once there were more than 8 chunks, it should have started moving them.
Did that not happen?
How long did you wait?
Can you send the mongos logs from that period?
I created 100 splits.
> Once there were more than 8 chunks, it should have started moving them.
When exactly? After splitting ...or after I started the import?
> Did that not happen?
> How long did you wait?
Here is what I did:
I started with a fresh mongo cluster. Then did the 100 pre-splits.
Looked at the config server. Distribution of the chunks was not good.
Waited a couple of minutes (5?). Distribution of the chunks was still not good.
Then moved the chunks manually so the config server showed a good
distribution of chunks across the shards.
And then I started the import.
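For the record, those manual moves can be scripted. A sketch, assuming
the same hypothetical namespace, an integer u shard key, and made-up
shard names, that assigns the pre-split chunks round-robin to the shards:

```javascript
// Build one moveChunk command per split point, cycling through the shards.
// Namespace, key, and shard names are placeholders, not from the thread.
function moveCommands(ns, points, shards) {
  return points.map(function (p, i) {
    return { moveChunk: ns, find: { u: p }, to: shards[i % shards.length] };
  });
}

// In the mongo shell (via mongos), each command would then be run as:
//   moveCommands("mydb.collectionA", [100, 200], ["shard0000", "shard0001"])
//     .forEach(function (c) { db.adminCommand(c); });
```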
> Can you send the mongos logs from that period?
Sure, can do ... or is the above expected behavior?
So it would have taken 25 minutes to become even.
After 5 minutes, it should have been 90/10 or so.
db.locks.find()
shows that the balancer is getting in the way.
Now I have turned off balancing:
db.settings.update({_id:"balancer"}, {$set: {stopped: true}}, true)
Is there a way of aborting current migration operations? I still see
the locks in the config db.
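One way to see what is still holding things up is to look at the lock
documents directly. A sketch, assuming the 1.6-era config.locks document
shape in which state 0 means unlocked (an assumption, not confirmed in
the thread):

```javascript
// Filter lock documents down to the ones currently held
// (e.g. the balancer lock or an in-flight migration).
function activeLocks(locks) {
  return locks.filter(function (l) { return l.state !== 0; });
}

// In the mongo shell, the equivalent query would be:
//   use config
//   db.locks.find({ state: { $ne: 0 } })
```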
cheers,
Torsten
Are you sure your shard key is optimal?
What is the shard key?
What is the data distribution?
What order are you inserting in?
Pretty sure it's not optimal yet.
> What is the shard key?
Currently it's only user. It would be better as [user, time].
> What is the data distribution?
Not evenly enough I guess.
Just so I understand correctly: once the size of the data of all the
docs that belong to a certain key range becomes bigger than the defined
chunk size, the balancer will kick in, split the chunk in the
middle, and transfer one half of the documents onto the emptiest
shard. Is that correct?
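The behavior asked about above can be put into a toy model (this mirrors
the question as phrased, not the server's exact logic; sizes and key
ranges are made up):

```javascript
// Toy model: when a chunk's data size exceeds maxChunkSize, split it at
// the middle of its key range; the balancer may then migrate one half to
// a less loaded shard. Integer keys and a flat size split are assumptions.
function maybeSplit(chunk, maxChunkSize) {
  if (chunk.size <= maxChunkSize) return [chunk];
  var mid = Math.floor((chunk.min + chunk.max) / 2);
  return [
    { min: chunk.min, max: mid, size: chunk.size / 2 },
    { min: mid, max: chunk.max, size: chunk.size / 2 }
  ];
}
```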
> What order are you inserting in?
No particular order. We have many files. Within these files the shard
key will probably be monotonically increasing though.
>> Is there a way of aborting current migration operations? I still see
>> the locks in the config db.
Is there? I just want to confirm it's the migrations.
cheers,
Torsten