Add safe=True to your insert call (or check the mongod logs during the
inserts). I suspect you are hitting duplicate key errors. If a document
has no _id field, pymongo creates an ObjectId for it on the client side
and stores it in the dict you passed in. Since random.sample returns the
same dict objects from new_docs, any document picked a second time
already carries the _id from its earlier insert. Without safe=True those
errors are not reported back to your script.
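
If that turns out to be the cause, one possible workaround is to insert copies of the sampled docs with any leftover _id removed, so pymongo generates a new ObjectId each time. A rough sketch; fresh_batch is a name I made up for illustration, and the docs list below is stand-in data rather than your new_docs:

```python
import random


def fresh_batch(docs, size):
    """Return `size` randomly sampled docs as new dicts without an '_id' key."""
    batch = []
    for doc in random.sample(docs, size):
        copy = dict(doc)        # shallow copy, so the original dict is not modified
        copy.pop("_id", None)   # drop any _id left over from an earlier insert
        batch.append(copy)
    return batch


# Stand-in data (assumption); in your script this would be new_docs.
docs = [{"author": "me", "n": i} for i in range(10)]
docs[0]["_id"] = "left over from an earlier insert"

batch = fresh_batch(docs, 5)
```

In your loop that would look something like coll.insert(fresh_batch(new_docs, 150000), safe=True). The copies are shallow, which should be fine here because the inserts only add a top-level _id key.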
On Thu, Sep 20, 2012 at 3:05 PM, Ian <ian...@gmail.com> wrote:
> I'm new to MongoDB and pymongo so it's possible I'm making a mistake here,
> but basically I want to generate a collection of 150,000,000 docs on an
> external host. I'm using the following code to randomly generate and select
> docs for bulk insertion, then inserting using a generator:
>
> <code>
> from datetime import datetime
> from dateutil.relativedelta import relativedelta
> from pymongo import Connection
> import random
>
>
> def create_generator(iterable):
>     for i in iterable:
>         yield i
>
>
> authors = ['me', 'you', 'abigail', 'barney', 'charlie', 'dennis', 'edward',
>            'frank', 'gregory', 'hank', 'ian']
> titles = ['One fish, two seas', 'Genetics', 'Party like it is 1999',
>           'Politics', 'Religion', 'Fashion', 'Games']
> tags = ['fun', 'interesting', 'boring', 'odd', 'fantastic', 'unique',
>         'terrible', 'funny', 'striking', 'amazing']
>
> conn = Connection(LOG_SERVER)
> coll = conn['big_db']['big_collection']
> now = datetime.now()
>
> # Create unique docs
> new_docs = list()
> doc_gen = create_generator(range(1500000))
> for i in doc_gen:
>     if i % 1000 == 0:
>         now = now - relativedelta(days=1)
>     new_docs.append({
>         "author": random.choice(authors),
>         "title": random.choice(titles),
>         "tag": random.sample(tags, random.randint(1, len(tags))),
>         "time": now,
>     })
>
> # Bulk insert subsets of docs
> sample_gen = create_generator(range(1000))
> for i in sample_gen:
>     coll.insert(random.sample(new_docs, 150000))
> </code>
>
>
> What happens is that the first bulk insert is successful and the collection
> has an additional 150,000 docs. Every bulk insert that comes after this
> generates the 150,000 docs just like the first bulk insert, but only 0-30
> docs are actually inserted into the collection. I get the same behavior when
> I run the bulk insert manually, even after reducing the batch size all the
> way down to 500, and when I run this code directly on the MongoDB host.
>
> Is there something else I should be doing during these bulk inserts?
>