Terrible iteration speed over rows.
From: jordan <neutrin...@gmail.com>
Date: Fri, 3 Aug 2012 10:17:26 -0700 (PDT)
Local: Fri, Aug 3 2012 1:17 pm
Subject: Terrible iteration speed over rows.

I'm trying to remove records that match a very simple query (involving one
unindexed field).

There are about 266 million entries in the database (single instance, one
machine), and about 200 million of them will wind up being deleted.

It looks like my command: db.collection.remove({ "field":"value" }) is
deleting about 500 records per second.

By my calculations, this will take about 5 days to complete.
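As a rough check of that figure, here is a minimal sketch of the arithmetic. It assumes the observed rate of about 500 deletes per second holds for all 200 million documents; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(documents_to_delete, deletes_per_second):
    """Return the estimated wall-clock days for a delete at a constant rate."""
    return documents_to_delete / deletes_per_second / SECONDS_PER_DAY


# 200 million deletes at roughly 500 deletes per second
print(f"{estimated_days(200_000_000, 500):.1f} days")
```

This comes to 400,000 seconds, a little over four and a half days, which is consistent with the "about 5 days" estimate.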

Is mongo simply not the correct database for 266 million "small" documents?
Am I running the wrong query? It seems inadvisable to keep an index on
every field, but sometimes I will have to run queries involving un-indexed
fields.

Thank You--
--Jordan


 
From: Wes Freeman <freeman....@gmail.com>
Date: Fri, 3 Aug 2012 13:31:48 -0400
Local: Fri, Aug 3 2012 1:31 pm
Subject: Re: [mongodb-user] Terrible iteration speed over rows.

Can you share some mongostat output?


 
From: jordan <neutrin...@gmail.com>
Date: Fri, 3 Aug 2012 11:48:09 -0700 (PDT)
Local: Fri, Aug 3 2012 2:48 pm
Subject: Re: [mongodb-user] Terrible iteration speed over rows.

I have switched to using a simple python script that iterates over a cursor
for all records. This seems to be about 10X faster than remove().

*I can post mongostat output once this query finishes* (about one day). I
don't want to destroy data while I'm iterating.

This is mongostat for the query that is executing currently (iterate
manually). Stay tuned for an update.

insert  query update delete getmore command flushes mapped  vsize    res non-mapped locked % idx miss %  qr|qw  ar|aw  netIn netOut  conn      time
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:44:47
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:44:48
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:44:49
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:44:50
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:44:51
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:44:52
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:44:53
     0      0      0      0       1       1       0   258g   518g  1.41g       260g        0          0    0|0    0|0   107b     4m     4  14:44:54
     0      0      0      0       1       1       0   258g   518g  1.41g       260g        0          0    0|0    0|0   107b     4m     4  14:44:55
     0      0      0      0       0       1       0   258g   518g  1.41g       260g        0          0    0|0    0|0    62b     1k     4  14:44:56
     0      0      0      0       1       1       0   258g   518g  1.41g       260g        0          0    0|0    0|0   107b     4m     4  14:44:57
     0      0      0      0       1       1       1   258g   518g  1.41g       260g        0          0    0|0    0|0   107b     4m     4  14:44:58
     0      0      0      0       0       1       0   258g   518g  1.41g       260g        0          0    0|0    0|0    62b     1k     4  14:44:59
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:00
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:01
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:45:02
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:03
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:04
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:45:05
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:06
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    1|0    62b     1k     4  14:45:07
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:08
     0      0      0      0       1       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0   107b     4m     4  14:45:09
     0      0      0      0       0       1       0   258g   518g  1.42g       260g        0          0    0|0    0|0    62b     1k     4  14:45:10


 
From: Wes Freeman <freeman....@gmail.com>
Date: Fri, 3 Aug 2012 14:53:11 -0400
Local: Fri, Aug 3 2012 2:53 pm
Subject: Re: [mongodb-user] Terrible iteration speed over rows.

Were you running remove() from the shell before? Or from a driver?


 
From: Jeremy Mikola <jmik...@gmail.com>
Date: Fri, 3 Aug 2012 11:55:41 -0700 (PDT)
Local: Fri, Aug 3 2012 2:55 pm
Subject: Re: Terrible iteration speed over rows.

On Friday, August 3, 2012 1:17:26 PM UTC-4, jordan wrote:

> Is mongo simply not the correct database for 266 million "small"
> documents? Am I running the wrong query? It seems inadvisable to keep an
> index on every field, but sometimes I will have to run queries involving
> un-indexed fields.

Without an index for the query, you're inviting a table scan across all 266
million documents. Even if your entire working set fits in memory, the
initial page faults are going to cause disk to be read into memory the
first time. Additionally, each insert/delete is going to require updating
each index on the collection.

Depending on your use case and if this is a one-time migration, an
alternative approach may be to insert the 66 million documents you intend
to keep into a new collection and then recreate the indexes you need and
swap the collections. You could use the lack of concurrency in db.eval()
<http://www.mongodb.org/display/DOCS/Server-side+Code+Execution#Server...> to
your advantage to drop the old collection and rename the newly created
collection atomically. Also, considering these are small documents, inserting
them fresh would help avoid the fragmentation you would experience with a
mass deletion (unless you're willing to compact()
<http://www.mongodb.org/display/DOCS/compact+Command> afterwards).
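The copy-and-swap idea can be sketched roughly as follows. This is only an illustration, not the method anyone in this thread actually ran: it assumes `source` and `target` behave like PyMongo `Collection` objects (using their `find` and `insert_many` methods), and the function name, batch size, and the database and collection names in the usage comment are made up for the example.

```python
from itertools import islice


def copy_kept_documents(source, target, keep_filter, batch_size=1000):
    """Copy documents matching keep_filter from source to target in batches.

    Returns the number of documents copied. The source is still scanned
    once in full, but nothing is deleted in place.
    """
    cursor = iter(source.find(keep_filter))
    copied = 0
    while True:
        # Pull at most batch_size documents off the cursor at a time.
        batch = list(islice(cursor, batch_size))
        if not batch:
            break
        # Unordered inserts let the server continue past individual failures.
        target.insert_many(batch, ordered=False)
        copied += len(batch)
    return copied


# Hypothetical usage against a live server (names are placeholders):
#
#   from pymongo import MongoClient
#   db = MongoClient()["mydb"]
#   keep = {"field": {"$ne": "value"}}   # everything remove() would NOT match
#   copy_kept_documents(db["collection"], db["collection_new"], keep)
#   # ...recreate the needed indexes on collection_new, then swap:
#   db["collection_new"].rename("collection", dropTarget=True)
```

The swap in the usage comment relies on `rename` with `dropTarget=True` rather than db.eval(), since db.eval() was removed from later MongoDB releases. Batching keeps memory use bounded while avoiding a round trip per document.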


 
From: jordan <neutrin...@gmail.com>
Date: Fri, 3 Aug 2012 12:38:30 -0700 (PDT)
Local: Fri, Aug 3 2012 3:38 pm
Subject: Re: Terrible iteration speed over rows.

Wes,

*Were you running remove() from the shell before? Or from a driver? *
This was in the mongo shell.

Jeremy,

*Even if your entire working set fits in memory, the initial page faults
are going to cause disk to be read into memory the first time. *
By "small" document, I'm talking about 500 bytes per document. This gives a
total DB size of nearly 200GB (with indexes). Certainly too large for
memory.

*Additionally, each insert/delete is going to require updating each index
on the collection...*
*Depending on your use case and if this is a one-time migration, an
alternative approach may be to insert the 66 million documents you intend
to keep into a new collection and then recreate the indexes you need and
swap the collections. *
I believe that this reasoning does indeed apply to my use case. The python
script (which is 10x faster) is scanning the rows and inserting the wanted
entries into an entirely new Mongod instance (on another server). I was
surprised that even including network latency, this approach is still
faster.
If updates to large collections (i.e., remove()) are, indeed, considerably
slow, is this the generally accepted alternative?

Thank You
--Jordan


 
From: Jeremy Mikola <jmik...@gmail.com>
Date: Fri, 3 Aug 2012 13:46:53 -0700 (PDT)
Local: Fri, Aug 3 2012 4:46 pm
Subject: Re: Terrible iteration speed over rows.

It's one alternative. Basically, when removing documents, we need to (a)
clear out any references to those documents in the collection's indexes and
(b) add those documents' disk locations to the free list (DB internals). If
all of your indexes cannot be contained in memory, (a) may involve frequent
disk access. Meanwhile, (b) is definitely going to involve scattered disk
access. Overall, this remove operation would be bound by disk IO.

Another option would be to drop all indexes on the collection, proceed with
the remove query, and then recreate indexes for the 66 million retained
documents. The benefit of this is highly dependent on your schema (how many
indexes) and system (combined index size vs. memory) and may provide
negligible improvement in practice.
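A minimal sketch of this second option, again assuming a PyMongo-style `Collection` object (its `drop_indexes`, `delete_many`, and `create_index` methods). The function name and the `index_keys` parameter are hypothetical; the caller is assumed to know which indexes to rebuild, and any index options such as uniqueness would have to be passed along separately.

```python
def remove_with_indexes_dropped(collection, query, index_keys):
    """Drop secondary indexes, run the remove, then rebuild the indexes.

    index_keys is a list of key specifications, e.g.
    [[("user_id", 1)], [("created", -1)]]. Returns the deleted count.
    """
    # Drops every index except the mandatory one on _id.
    collection.drop_indexes()

    # With no secondary indexes, each delete only touches the _id index.
    result = collection.delete_many(query)

    # Rebuild the indexes over the (much smaller) retained set.
    for keys in index_keys:
        collection.create_index(keys)

    return result.deleted_count
```

Queries that depend on those indexes will be slow until the rebuild finishes, so this only makes sense if the remove can take precedence over normal traffic.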


 
From: Wes Freeman <freeman....@gmail.com>
Date: Fri, 3 Aug 2012 16:54:52 -0400
Local: Fri, Aug 3 2012 4:54 pm
Subject: Re: [mongodb-user] Re: Terrible iteration speed over rows.

Yeah, it sounds like you were churning index and data into RAM to do the
remove. I assume your indexes are bigger than RAM, also? So you're page
faulting on the index updates as well as the remove. Dropping the indexes
probably would have helped if this remove operation takes precedence over
queries.


 