Sequential auto-incrementing with PyMongo and Python 3


Yaroslav Kyrpych

Feb 17, 2014, 1:40:10 PM
to mongod...@googlegroups.com
Hello,
 
The schema design I am working on requires sequential auto-incrementing. I know it's not advised in MongoDB. However, I need to create documents with sequential key:value pairs that will be used as a referential join between related collections. I am not sure whether it's better to implement this in MongoDB or in Python, and how. Any advice is highly appreciated.
 
Thank you,
 
Yaroslav

Alfonso Pinto

Feb 17, 2014, 3:22:55 PM
to Yaroslav Kyrpych, mongod...@googlegroups.com
I had the same requirement in a Java project.
I solved it by creating a collection to hold the sequences; basically each document has a name for the sequence and the last value handed out.
Then you can use findAndModify with $inc to request a new value for the sequence.
Also, to improve performance, you can request a batch (100 or 200 ids): instead of using $inc with 1, use it with 100 or 200.
This gives you a range to work with: from lastId+1 up to the value returned by findAndModify. When you run out of ids, just run another findAndModify.
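That range arithmetic can be sketched in Python (the counter value here stands in for what findAndModify would return; the function name and numbers are illustrative, not from the thread):

```python
def reserved_range(new_value, batch):
    """After atomically incrementing the counter by `batch`
    (e.g. via findAndModify with $inc), the caller owns the ids
    from lastId+1 up to the returned counter value."""
    first = new_value - batch + 1
    last = new_value
    return first, last

# e.g. the counter was 500 and we reserved a batch of 100:
# the increment returns 600, so we own ids 501 through 600
print(reserved_range(600, 100))
```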

I implemented this as a service in the Java application and it works perfectly.

-- 
Alfonso Pinto


On February 17, 2014 at 19:40:17, Yaroslav Kyrpych (yaroslav...@gmail.com) wrote:

--
--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb
 

Yaroslav Kyrpych

Feb 18, 2014, 12:57:11 AM
to Alfonso Pinto, mongod...@googlegroups.com
Alfonso,

Thank you for responding! Could you expand a bit more on this sentence: "I solved it creating a collection to hold the sequences, basically it has a name for the sequence and last value provided." I was thinking of creating a list in Python and adding sequenced numeric values to it to keep track of the latest one, and then using that to generate the next number in the sequence for the id in Mongo.

Thanks again,

Yaroslav 

Alfonso Pinto

Feb 18, 2014, 6:19:19 AM
to mongod...@googlegroups.com
db.incrementer.find()
{ "_id" : "stagedCdr", "sequence" : NumberLong("445000000011718766") }

then you get the next sequence value (findAndModify returns the updated document; a plain update() would not return the new value):

// use your driver's equivalent
db.incrementer.findAndModify({query: {_id: "stagedCdr"}, update: {$inc: {sequence: 1}}, new: true})

But that means one call to db each time you need a new number.
If you use a variable to hold how many numbers you want to cache you can do this:

var numbersNeeded = 100;
var doc = db.incrementer.findAndModify({query: {_id: "stagedCdr"}, update: {$inc: {sequence: numbersNeeded}}, new: true});
var last = doc.sequence;
var first = last - numbersNeeded + 1;

Of course you need to check whether you still have cached numbers in memory for that specific sequence, or whether you need to go to the database again.
In my case I have this encapsulated in a Java service, and the entry point is a thread-safe method called getNextId.
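A rough Python sketch of that caching layer, with the database round-trip abstracted into a `reserve` callable (all names here are made up for illustration, not from the thread):

```python
class IdAllocator:
    """Hands out sequential ids, going to the database only
    when the locally cached range is exhausted."""

    def __init__(self, reserve, batch=100):
        # `reserve(n)` must atomically add n to the stored counter
        # and return the new counter value (findAndModify + $inc).
        self._reserve = reserve
        self._batch = batch
        self._next = 1   # next id to hand out
        self._last = 0   # last id we currently own

    def next_id(self):
        if self._next > self._last:
            # cache exhausted: reserve another batch in one call
            self._last = self._reserve(self._batch)
            self._next = self._last - self._batch + 1
        nid = self._next
        self._next += 1
        return nid
```

In a real service `next_id` would also need a lock around it to be thread safe, as Alfonso mentions.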


-- 
Alfonso Pinto


On February 18, 2014 at 06:57:11, Yaroslav Kyrpych (yaroslav...@gmail.com) wrote:

A. Jesse Jiryu Davis

Feb 18, 2014, 9:36:19 AM
to mongod...@googlegroups.com
You can ensure that multiple Python threads and processes will each get unique ids with findAndModify():

>>> import pymongo
>>> client = pymongo.MongoClient()
>>> client.test.ids.drop()
>>> client.test.ids.find_and_modify({}, {'$inc': {'i': 1}}, upsert=True, new=True)
{u'i': 1, u'_id': ObjectId('53036f61ca10a7ebe5caf875')}
>>> client.test.ids.find_and_modify({}, {'$inc': {'i': 1}}, upsert=True, new=True)
{u'i': 2, u'_id': ObjectId('53036f61ca10a7ebe5caf875')}
>>> client.test.ids.find_and_modify({}, {'$inc': {'i': 1}}, upsert=True, new=True)
{u'i': 3, u'_id': ObjectId('53036f61ca10a7ebe5caf875')}

But you're talking about auto-incrementing ids and relational joins, so it sounds to me like you're trying to make MongoDB act like a relational database. It is not a relational database. What is the fundamental problem you are trying to solve? Perhaps you can use MongoDB to solve it in a more natural way.

Yaroslav Kyrpych

Feb 18, 2014, 9:54:27 AM
to mongod...@googlegroups.com, mongod...@googlegroups.com
MongoDB is inherently rigid in that a single document cannot grow without bound: there is a 16MB limit on document size. The data will grow over time, and the only way around it is to have joins, i.e. to separate the data into smaller pieces. In addition, if I use embedding, which is the other way of doing it, I would be pulling tons of unnecessary data for most queries.

Yaroslav Kyrpych

Feb 18, 2014, 9:54:49 AM
to mongod...@googlegroups.com, mongod...@googlegroups.com
Thank you again!

A. Jesse Jiryu Davis

Feb 18, 2014, 12:39:31 PM
to mongod...@googlegroups.com
Perhaps this schema-design tutorial would help; it discusses how to store related data in multiple documents:


If you're concerned about pulling unnecessary data over the network, check out the "projection" feature. It lets you select which parts of the document to retrieve from the server:
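As a rough model of what an inclusion projection does to each matched document server-side (field names invented for illustration), a query like `db.readings.find({}, {'temp': 1, 'ts': 1})` behaves roughly as:

```python
def project(doc, fields):
    """Mimic an inclusion projection: keep _id (returned by
    default) plus the explicitly requested fields."""
    return {k: v for k, v in doc.items() if k == '_id' or k in fields}

doc = {'_id': 1, 'temp': -3.5, 'ts': '2014-02-18T14:00Z',
       'raw_samples': [0.1, 0.2, 0.3]}  # the bulky part we skip
print(project(doc, {'temp', 'ts'}))
```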

Yaroslav Kyrpych

Feb 18, 2014, 2:18:05 PM
to mongod...@googlegroups.com, mongod...@googlegroups.com
Thank you

Asya Kamsky

Feb 19, 2014, 7:23:24 PM
to mongodb-user
Yaroslav:

Sorry, I'm going to have to object to this characterization.

If the data grows over time and you run into the 16MB document limit, then you may have misunderstood how documents work.

A document is not analogous to a table in an RDBMS; it's more analogous to a row. It represents an entity, and all inherent attributes of that entity can (and in many cases should) be stored in that document. But that does NOT mean that everything that's ever related to the entity should be stuffed inside of that poor document!

Now, you correctly point out that embedding would mean pulling tons of unnecessary data for most queries, so then why would the 16MB limit even matter?

Asya

Yaroslav Kyrpych

Feb 19, 2014, 8:06:59 PM
to mongod...@googlegroups.com, mongodb-user
Just because the amount of data grows over time. Imagine collecting temperature readings for a specific location on an almost continuous basis, i.e. every minute; over a century it adds up. This is just a hypothetical example. I don't stuff anything in there except the temperature readings.

Asya Kamsky

Feb 19, 2014, 10:54:18 PM
to mongodb-user
I don't see why you would ever want to embed the temperature readings
inside a single object - wouldn't each reading be its own entity?
Unless you wanted to get this information back with the object, it
makes no sense to embed it.
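For the temperature example, each reading as its own document might look like this (a sketch with invented field names, not a prescription from the thread):

```python
# One small document per reading: individual documents never grow,
# only the collection does, so the 16MB document limit is irrelevant.
reading = {
    'station': 'kyiv-01',          # which sensor/location
    'ts': '2014-02-19T20:00:00Z',  # when the reading was taken
    'temp_c': -3.5,                # the measurement itself
}
```

An index on (station, ts) would then typically make range queries over one location's history cheap.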

Asya