Multiple Databases but same Schema Module?

Alexander Morano

Jul 18, 2012, 5:32:58 PM
to mongoen...@googlegroups.com
Aliasing and multiple-database support work great, even if the server is the same. But what if the definition is the same?

IOW, the following code works 100% great:

[code]
import mongoengine as me

# Two connections to the same server, one per database.
asset = me.connect('asset', alias='default', host='mongodb://nexusone/asset')
resource = me.connect('resource', alias='resource', host='mongodb://nexusone/resource')

class Person(me.Document):
    # Stored in the 'resource' database.
    meta = {'allow_inheritance': False, 'db_alias': 'resource'}
    name = me.StringField()
    age = me.IntField()

class User(me.DynamicDocument):
    # No db_alias, so this uses the 'default' connection ('asset').
    meta = {'allow_inheritance': False}
    user = me.StringField(required=True, unique=True, max_length=20)

a = Person(age=29)
a.save()

u = User.objects(user='moranoa').first()
print(u.user)
[/code]

But what if I wanted to use the User collection in the first database?

Trying to inject 'db_alias': 'resource' into the meta for the User class does not change its connection.

It almost seems (I have not dug too far) that the meta dict and the class itself are cached elsewhere, and the meta dict is not re-read for updated information.

The use case here is simple:

We have a single schema laid out, but want to sync items from one database to another, OR we have data in another data source, like MySQL, Filemaker, etc., and want to be able to copy/update from that to more than one database.

For example: We currently have data in our primary production database in SQL, but we have Maya, Nuke, etc. tools built on Mongo (via Mongoengine).

Every 10 minutes or so, we look at the live production data in SQL and check a dirty field for updates; we then copy parts of those records to the Mongo layer.

What we don't want to do is mix our development and live production data areas, so as we make new tools against the Mongo layer, we want them to hit a separate database that has the exact same schema as the live one.

In theory we can get around this by making a new module, copied from the schema in the live one, but that throws a monkey wrench into "update once, reuse", because we'd have to copy sections out of the development version into the live one.

One way around this, and what SQLObject does, is put the meta info as a class instance and refer to it pre-op. That way, if I wanted to stupidly change the connection at runtime, I could.

I don't think that is the way to go, nor do I suggest it; I am just bringing up a design point: it would be nice to be able to use a single module as the schema definition for multiple connections.

Currently, no matter what I do, it will always use the first connection the meta dict had at the time of initialization.
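
One way this behaviour can arise is sketched below. This is a toy model in plain Python, not MongoEngine's actual code: it assumes a metaclass that copies `meta` into a private `_meta` snapshot when the class is defined, and every name in it (`SnapshotMeta`, `ToyDocument`, `db_alias()`) is hypothetical.

```python
# Toy model only: this is NOT MongoEngine's code, just an illustration of
# how a metaclass can snapshot `meta` once, at class-definition time.

class SnapshotMeta(type):
    def __new__(mcs, name, bases, attrs):
        cls = super().__new__(mcs, name, bases, attrs)
        # Copy the settings once; later lookups use this private copy.
        cls._meta = dict(attrs.get("meta", {}))
        return cls


class ToyDocument(metaclass=SnapshotMeta):
    meta = {}

    @classmethod
    def db_alias(cls):
        # Reads the snapshot, never the public `meta` attribute.
        return cls._meta.get("db_alias", "default")


class User(ToyDocument):
    meta = {"allow_inheritance": False}


# Injecting a key after the class exists does not touch the snapshot.
User.meta["db_alias"] = "resource"
print(User.db_alias())  # prints "default"
```

Under that assumption, editing `meta` after the class statement has run changes only the public dict, which nothing consults again.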

Is any of this use case valid for anyone? Or am I over-complicating it?

I wish I could share my DBO factory, to show you guys exactly what we are doing. It wraps Filemaker, MySQL and Mongo great, just this last bit that would be nice to have.

Cheers.

Ross Lawley

Jul 19, 2012, 3:52:51 AM
to mongoen...@googlegroups.com
Hi,

Sorry, can you clarify your intent here?  You want to change the database related to a Document class at runtime?  Normally people use the same module but have different settings for production, test, development and QA, and then run those instances on different machines.  It looks like that's not an option for you?

I don't like the idea of changing the database related to a Document at runtime, as I think it will lead to bugs or to data being stored in the wrong place if not handled correctly.  Perhaps a context manager could be used to switch it and handle switching back automatically.

Ross


Alexander Morano

Jul 19, 2012, 4:56:07 PM
to mongoen...@googlegroups.com
Indeed. I don't want to change it at runtime per se. 

We run a local branch, an alpha, a beta and a stable branch. Development areas are not the issue. We need to be able to use the same schema (shared module.py) on multiple databases from whatever area.

I have multiple databases, some of which use the same collections. I'd rather NOT make multiple copies of the schema in my module.py to accommodate connecting to those multiple databases during a single Python interpreter session.

A quick use case is simply syncing from one database to another with the same collection/schema.

I also do not like the idea of changing a database connection at runtime. What I tried to do is do it ahead of time, but realized that only a single copy of the module.py will be in memory, with whatever db_alias I set the first time.

I realize this is a difficult idea, since Python loads module.py into memory once and, even if I could change the connection, that would cause a slew of other issues, since we track sessions from the live production database. All of a sudden session data would go into a development database, then back, then switch again, and so on, as we updated the connection.

I do not propose that AT ALL.

There should be a way to have two different databases connected, using the same schema, so that something like a simple sync could be accommodated.

I am trying to keep the code once, reuse principle in effect here.

Otherwise, yes, I could solve this by making a copy of the module, call it module2, and set the connection on that.

It just does not seem correct to have XX module copies for as many databases that share the same schema.

As for a context manager, that is what my DBO factory is doing with SQL, FM, and Postgres.

We just can't do the same thing with ME, at least not the way I have been trying (through the meta dict).

Thus, I am not advocating a major design change (unless?); I am merely trying to get a good idea of how to accomplish this with ME.

Cheers for the reply!

Ross Lawley

Jul 20, 2012, 4:40:52 AM
to mongoen...@googlegroups.com
Hi,

Ah ok - I think I get your intent here - this is so you can migrate the data across from production to dev using pymongo code, rather than, say, a mongodump.

No problem, I will look at implementing a context manager to allow you to temporarily switch a db alias or similar.

Pseudocode:

    for post in Post.objects(created__gte=last_week):
        # SwitchConnection is the proposed context manager; it does not exist yet.
        with SwitchConnection(db_alias="default", connection={}):
            Post(post).save()

Hmm, that's quite verbose...

Any suggestions?

Ross


Alexander Morano

Jul 20, 2012, 12:27:18 PM
to mongoen...@googlegroups.com
Yes, exactly.

Sorry my first post was muddled =p

I definitely appreciate the look-see.

As for implementation, yes, I agree, it is a little verbose.

What about just an explicit setting in the call itself:

[code]
for post in Post.objects(created__gte=last_week):
    Post(post, db_alias="default").save()
[/code]

Not sure that is "better". 

I suppose it simply accomplishes wrapping the verbosity of the context away from the user directly.
 


paddy....@gmail.com

Dec 14, 2012, 3:18:49 PM
to mongoen...@googlegroups.com
Hi Ross,
     I have a similar requirement. Have you already completed the solution that you have mentioned in your email? If so, in which version of mongoengine is it available?

My requirement:
  • The schema is the same.
  • Use multiple databases
  • Values in the URL determine which database is used. The database will be set at the beginning of an HTTP request. The database will NOT change during the course of an HTTP transaction.
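
The third requirement could be sketched roughly as follows. This is only an illustration in plain Python, assuming the database is selected by the first path segment of the URL; the mapping `ALIAS_BY_SEGMENT`, its entries, and the function `alias_for_url` are all hypothetical names, not part of django or mongoengine.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the first URL path segment to a connection alias.
ALIAS_BY_SEGMENT = {"studio-a": "studio_a", "studio-b": "studio_b"}


def alias_for_url(url, default="default"):
    """Return the connection alias selected by the first path segment."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        return default
    # Unknown segments fall back to the default alias.
    return ALIAS_BY_SEGMENT.get(segments[0], default)


print(alias_for_url("http://example.com/studio-a/assets/12"))  # prints "studio_a"
```

The alias would be resolved once at the start of the request and then held fixed for the rest of it.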

It looks like the above cannot be done with django+mongoengine and one single connection to mongod. Am I correct?
As a workaround, I was thinking of using multiple connections. Is it possible to create new connections? I'm having problems creating new connections with register_connection.

Appreciate your expert advice.

Thanks,
PC

Ross Lawley

Dec 15, 2012, 8:34:12 AM
to mongoen...@googlegroups.com
Hi Paddy,

Unfortunately I haven't got round to this yet.  Connections are just stored in a dict so you could manually manipulate it. However, it is in module space so a context manager seems the safest way to achieve this.
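
A minimal sketch of that idea, using a stand-in dict rather than the library's real registry: `CONNECTIONS` and `switch_alias` are hypothetical names, and the string values stand in for connection objects. The point is only that the swap is undone in a `finally` block, so the alias is restored even if the body raises.

```python
from contextlib import contextmanager

# Toy stand-in for a module-level connection registry (alias -> connection).
# The real registry's name and layout are not assumed here.
CONNECTIONS = {"default": "conn-to-live", "dev": "conn-to-dev"}


@contextmanager
def switch_alias(alias, target):
    """Temporarily make `alias` resolve to whatever `target` resolves to."""
    original = CONNECTIONS[alias]
    CONNECTIONS[alias] = CONNECTIONS[target]
    try:
        yield CONNECTIONS[alias]
    finally:
        # Always restore the original entry, even if the body raises.
        CONNECTIONS[alias] = original


with switch_alias("default", "dev") as conn:
    print(conn)                  # prints "conn-to-dev"
print(CONNECTIONS["default"])    # prints "conn-to-live"
```

Because the registry is module-level state, the swap is visible to every thread for the duration of the block, which is the same risk that makes editing the dict by hand unattractive.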

Ross