Reindexing in the background?


Daniel Karp

Aug 17, 2010, 2:26:31 AM
to mongodb-user
After updating to 1.6, we ran reIndex on all of our collections, and
our total index size dropped by 50%. Since we regularly delete from
some of our collections, we'd like to be able to reindex regularly as
well.

We also noticed that indexes created with background:true are reindexed
in the background as well. That led to the thought that if all indexes
were created with background:true, we could reindex a collection entirely
in the background. Unfortunately, as far as I can tell, the _id_ index
cannot be created with background:true, because it cannot be
deleted.

Is there any way to reindex in the background, possibly by changing
the _id_ index so that it indexes in the background?

Alternatively, I seem to remember some discussion suggesting that
Mongo 1.6 would handle deletes better, so as to make reindexing less
important. Is that true now?

Eliot Horowitz

Aug 17, 2010, 9:50:36 AM
to mongod...@googlegroups.com
There is a case open for background re-indexing, though not sure when
it'll happen: http://jira.mongodb.org/browse/SERVER-787

1.6 does handle deletes better, so this may not be as much of an issue
in 1.6 as in 1.4.

Mircea Pasoi

Sep 15, 2010, 11:00:03 PM
to mongodb-user
We are in a very similar situation (we regularly delete from some of
our collections) and all our indexes get fragmented over time and
don't fit in RAM anymore.
We have a process to reindex everything in background with no
performance hit (we make sure there's always an index for a set of
keys by creating duplicates), except for the "_id" index which keeps
on growing in size.

Is there something we can do for the "_id" index, so that we can
periodically recreate it with no performance/locking issues?


Eliot Horowitz

Sep 15, 2010, 11:22:51 PM
to mongod...@googlegroups.com
What version are you on?
With 1.6 it shouldn't be much of a problem.

Mircea Pasoi

Sep 16, 2010, 12:11:37 AM
to mongodb-user
We're using a dev branch from August 23rd (it's after 1.6 was
launched). It's still a problem: I just recreated our database and
saved 250MB on indexes (going from 1.55GB total index size to 1.30GB),
mostly on "_id".

Eliot Horowitz

Sep 16, 2010, 12:16:34 AM
to mongod...@googlegroups.com
Are you using ObjectIds or a custom _id?
If ObjectIds, what kind of deletes do you do? Random or oldest?

Mircea Pasoi

Sep 16, 2010, 12:27:21 AM
to mongodb-user
We're using normal ObjectIds.

The deletes are not quite random. For each user we delete the "oldest"
documents in the database based on a custom ordering (it's not
"oldest" as in based on creation timestamp, we use a different
attribute to order things)

kevin

Sep 16, 2010, 12:33:55 AM
to mongod...@googlegroups.com
Can I insert data while a background index build is running in 1.6?
I got errors in 1.4 saying I can't insert data while background indexing is on.

Eliot Horowitz

Sep 16, 2010, 12:36:46 AM
to mongod...@googlegroups.com
You should be able to insert data both in 1.4 and 1.6...
Do you know what the exact error was in 1.4?

kevin

Sep 16, 2010, 12:55:37 AM
to mongod...@googlegroups.com
It said something like "insert not allowed" or "insert disabled" during the background index creation process.
I don't have the exact message; I moved to 1.6 and haven't tried it since.

Mircea Pasoi

Sep 16, 2010, 2:13:15 AM
to mongodb-user
Do you have any ideas on how we could periodically recreate the "_id"
index?

Eliot Horowitz

Sep 16, 2010, 9:30:50 AM
to mongod...@googlegroups.com
.reIndex() will do it - but it will block.
How long does it take to reIndex?

I'm not sure that's the best plan though.
I think even though there is wasted space - you're better off not re-indexing.
If you're deleting oldest items - there will be a cap on the size
based on how often you flush from the back.

Mircea Pasoi

Sep 16, 2010, 4:54:56 PM
to mongodb-user
It takes 20-30 minutes and it blocks, which is something we're trying
to avoid.

Eliot Horowitz

Sep 16, 2010, 10:20:52 PM
to mongod...@googlegroups.com
How much wasted space do you end up with in the index if you let it go
for a while?

Mircea Pasoi

Sep 21, 2010, 8:09:28 PM
to mongodb-user
It's hard to tell because we reindex every index daily, except for the
_id ones. Also, our dataset is slowly growing, so I'm not sure I can
quantify this. Should there be an upper bound on how big the indexes
get?
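
One possible way to separate fragmentation from plain dataset growth is to track _id index bytes per document instead of raw index size. The following is only a minimal Python sketch: it assumes stats dicts shaped like the output of the collStats command (its `count` and `indexSizes` fields), the helper name `index_bytes_per_document` is hypothetical, and the sample numbers are made up for illustration rather than taken from this thread.

```python
def index_bytes_per_document(stats, index_name="_id_"):
    """Bytes of the named index per document, from a collStats-shaped dict.

    `stats` is assumed to carry `count` (number of documents) and
    `indexSizes` (index name -> size in bytes), as collStats reports them.
    """
    count = stats["count"]
    if count == 0:
        return 0.0
    return stats["indexSizes"][index_name] / count


# Hypothetical daily samples (illustrative values only, not real measurements).
samples = [
    {"count": 1000000, "indexSizes": {"_id_": 40000000}},
    {"count": 1100000, "indexSizes": {"_id_": 52000000}},
]

for day, stats in enumerate(samples):
    per_doc = index_bytes_per_document(stats)
    print(f"day {day}: {per_doc:.2f} bytes of _id index per document")
```

If the per-document figure keeps climbing while the document count stays roughly flat, the growth is more likely wasted space than new data.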

Eliot Horowitz

Sep 21, 2010, 9:20:15 PM
to mongod...@googlegroups.com
The upper bound is going to depend on how much you delete - but there
should be an upper bound based on the app.
Hard for me to say since I can't know all the details - but that's
what I'm curious about.

Mircea Pasoi

Nov 3, 2010, 10:02:18 PM
to mongodb-user
We haven't reached an upper bound yet. It seems we have no choice but to
periodically reindex every collection, but that blocks the database.
Any chance of background indexing for _id any time soon?

Eliot Horowitz

Nov 4, 2010, 3:34:45 AM
to mongod...@googlegroups.com
Can you try the 1.7 nightly tomorrow?
How big is the index now?
How many objects?

Mircea Pasoi

Nov 4, 2010, 3:48:11 AM
to mongodb-user
What exactly changes in the 1.7 nightly tomorrow? Are we going to be
able to reindex _id in the background?

We have 12M documents and the total index size ranges from 2GB (fresh
replica) to 3GB (fragmented replica after 1 day). The majority of
the difference is from the _id indexes. As a side note, we delete
between 1K and 100K documents every 30 minutes and we reindex everything
in the background (except _id) daily.
We used to have 20M+ documents in the database a week ago and the
fragmentation issue was much more acute back then. We had to shrink
the database to make it work on our 7.5GB RAM instances.

Eliot Horowitz

Nov 4, 2010, 8:25:08 PM
to mongod...@googlegroups.com
Better online cleaning when a delete happens.
So it won't shrink immediately, but if the index is sparse it should start shrinking.

Mircea Pasoi

Nov 4, 2010, 9:26:24 PM
to mongodb-user
Ok, I'll try it on one of the replicas and see how it goes. Is the
nightly from http://www.mongodb.org/downloads good, or do I need to
wait for the next one?

Eliot Horowitz

Nov 5, 2010, 12:45:23 AM
to mongod...@googlegroups.com
That one is fine.

Mircea Pasoi

Nov 8, 2010, 1:48:00 AM
to mongodb-user
I installed the dev build on one of the machines and created a fresh
replica. I did the same thing on another machine with 1.6.3 installed.
They both started with 1.5GB of indexes, out of which ~400MB for _id
indexes.

Here are the results after 3 days:
- dev build - 2.1GB of indexes, 501MB of _id indexes
- 1.6.3 - 2.3GB of indexes, 643MB of _id indexes
It seems the dev build is better at handling fragmentation, but the
indexes are still growing.
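
To put the numbers above in relative terms, here is a small Python sketch of the arithmetic. It assumes 1GB = 1000MB when converting the reported totals, and the helper name `growth_percent` is just for illustration.

```python
# Index sizes reported in this message, in MB.
# Assumption: totals given in GB are converted with 1GB = 1000MB.
START_TOTAL_MB = 1500
START_ID_MB = 400

after_three_days = {
    "dev build": {"total": 2100, "_id": 501},
    "1.6.3": {"total": 2300, "_id": 643},
}


def growth_percent(start, end):
    """Relative growth from start to end, as a percentage."""
    return (end - start) / start * 100.0


for build, sizes in after_three_days.items():
    total_growth = growth_percent(START_TOTAL_MB, sizes["total"])
    id_growth = growth_percent(START_ID_MB, sizes["_id"])
    print(f"{build}: total index size +{total_growth:.1f}%, _id indexes +{id_growth:.1f}%")
```

Under that assumption, the _id indexes grew by about 25% on the dev build versus about 61% on 1.6.3, and total index size by 40% versus about 53%.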

I also noticed that there's a pretty big difference between running
reIndex() and creating a replica from scratch. Today I ran reIndex()
on all the collections in a replica and the indexes dropped to 1.9GB,
while a completely fresh replica has only 1.5GB of indexes. Where is
the difference coming from?

At this point, it seems the best option is to recreate replicas from
scratch every few days. What would be the easiest way to automate this?

