*Long-time lurker; first-time poster*. My partner and I are standardizing on Django / Python for the front end of www.ookles.com, our new startup. We're both PHP folk, and I've convinced him that Django is better than Ruby (based, honestly, on my respect for Simon and Adrian rather than on a real analysis; transitive geek fu; anyhow).
The issue at hand for us is that we essentially have one application: Ookles, which is a highly interactive web site. Having scaled a MySQL app to the terabyte level at my last gig (let's not go down the Postgres discussion; we're using MySQL 5, so we have views and stored procs), I learned the lesson of "DO NOT; REPEAT, DO NOT; PUT EVERYTHING IN ONE DB". But Django at least feels like it is really oriented around the concept of one DB. To wit:
I've been playing with Django today (really like it so far!). To start a project, you define your database model as a set of subclasses, one for each database table. Once you've defined everything, you can do one of two things: either "django-admin.py sql app", which will spit out an SQL script for you to create your database tables with, or "django-admin.py install app", which will create all the tables for you. Fantastic stuff, and if you can get your head around doing it this way, it saves time because defining the model also sets up the database.
So far, Django feels much more solid than Rails. Yes, you have to do more, but you feel good about doing it. Relying on Rails magic is sometimes a little hard to swallow :)
(From my co-founder Mike)
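The model-to-SQL workflow Mike describes can be illustrated with a toy imitation. This is not Django's code or its 2005 API: the `Field`, `CharField`, `IntegerField` and `Model` classes and the `create_table_sql()` method are hypothetical stand-ins, written only to show how declaring one class per table is enough to generate the `CREATE TABLE` statement.

```python
# Toy illustration only: NOT Django's implementation or API.
class Field:
    """Base class for a hypothetical column type."""

    def db_type(self):
        raise NotImplementedError


class CharField(Field):
    def __init__(self, max_length):
        self.max_length = max_length

    def db_type(self):
        return "varchar(%d)" % self.max_length


class IntegerField(Field):
    def db_type(self):
        return "integer"


class Model:
    @classmethod
    def create_table_sql(cls):
        """Build a CREATE TABLE statement from the Field attributes of cls."""
        table = cls.__name__.lower()
        # Every table gets an implicit integer primary key (Django does something similar).
        columns = ['"id" integer NOT NULL PRIMARY KEY']
        # vars(cls) preserves attribute definition order (Python 3.7+).
        for name, value in vars(cls).items():
            if isinstance(value, Field):
                columns.append('"%s" %s NOT NULL' % (name, value.db_type()))
        return 'CREATE TABLE "%s" (%s);' % (table, ", ".join(columns))


class Poll(Model):
    question = CharField(max_length=200)
    votes = IntegerField()


print(Poll.create_table_sql())
```

Running it prints a single `CREATE TABLE` statement for a `poll` table; a real ORM would also handle foreign keys, indexes and per-backend type names.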
Now, I haven't hacked on Django much myself yet (I've been working on the back-end tools, DB loader and overall schema). What support does Django have for multiple-DB stuff?
I apologize if this is a real newbie question. I did run through the basic docs (great job, btw; thank you). Also, we'll need to write a simple, baseline, DB-backed CAPTCHA class almost immediately to keep out the spam bots. Any interest in our contributing that back?
Thanks,
Scott

-- 
J. Scott Johnson
blog: http://fuzzyblog.com/
On 12/29/05, Scott Johnson <fuzzygr...@gmail.com> wrote:
> Now I haven't hacked Django much myself yet (I've been working on the back end tools, db loader and overall schema). What support does Django have for multiple db stuff?
Hey Scott,
Welcome!
Django doesn't support multiple DBs out of the box, but this is something we (Simon particularly) have been very keen on adding. Fortunately, the database code is rather nicely abstracted, so most of the refactoring would be in just two modules.
To what extent would you be using multiple DBs? Would you be employing the technique whereby certain records (e.g. users 1 to 1,000,000) are in certain databases, or would you be doing more of a standard setup, in which you'd want database reads to be spread evenly across multiple DBs? Go ahead and explain the setup, and we can get started on designing the feature.
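The two setups Adrian contrasts can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not a proposed Django API: `RangeShardRouter`, `ReadWriteRouter` and the database names are hypothetical, and each router simply returns a connection name for the caller to look up.

```python
import bisect
import itertools


class RangeShardRouter:
    """Partitioning by key range: each database holds a contiguous block of user ids."""

    def __init__(self, upper_bounds, databases):
        # upper_bounds[i] is the highest user id stored in databases[i];
        # the bounds must be sorted in ascending order.
        if len(upper_bounds) != len(databases):
            raise ValueError("need one upper bound per database")
        self.upper_bounds = list(upper_bounds)
        self.databases = list(databases)

    def db_for_user(self, user_id):
        i = bisect.bisect_left(self.upper_bounds, user_id)
        if i == len(self.databases):
            raise KeyError("no database holds user %d" % user_id)
        return self.databases[i]


class ReadWriteRouter:
    """Replication-style setup: writes go to the master, reads rotate over replicas."""

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def db_for_write(self):
        return self.master

    def db_for_read(self):
        # Simple round-robin; a real setup might weight by load or replica lag.
        return next(self._replicas)


shards = RangeShardRouter([1_000_000, 2_000_000], ["users_a", "users_b"])
print(shards.db_for_user(42), shards.db_for_user(1_500_000))  # users_a users_b

reads = ReadWriteRouter("master", ["replica1", "replica2"])
print([reads.db_for_read() for _ in range(3)])  # ['replica1', 'replica2', 'replica1']
```

The range router answers "where does this user live?", which depends on application data; the read router is the kind of even read distribution that could also be pushed down to a pooling layer.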
> Also we'll need to write a baseline simple db backed, captcha class near immediately to keep out the spam bots. Any interest in our contributing that back?
On Dec 29, 2005, at 9:10 AM, Adrian Holovaty wrote:
> ... or would you be doing more of a standard setup, in which you'd want database reads to be spread evenly across multiple DBs? Go ahead and explain the setup, and we can get started on designing the feature.
I've always thought that this particular -- and common -- use case should be delegated to the DB level, using one of the many excellent replication/distribution tools for your database. For example, you could easily do read distribution with pg_pool or sqlrelay, and it would be transparent to Django. I don't see a good reason to tackle replication in Django itself, as that's more or less a solved problem.
On 29 Dec 2005, at 20:29, Jacob Kaplan-Moss wrote:
> I've always thought that this particular -- and common -- use case should be delegated to the DB level using one of the many excellent replication/distribution tools for your database. For example, you could easily do read distribution with pg_pool or sqlrelay, and it would be transparent to Django. I don't see a good reason to tackle replication in Django itself as that's more or less a solved problem.
I disagree. There's a lot more to separate databases than just replication: when you scale big, there are all kinds of ways things might need to be partitioned. You might want to keep "cheap" data (like traffic logs for users' weblogs) on a different DB cluster from expensive data (like the blog entries themselves). Some data gets accessed all the time, while other data is only ever written, etc.
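Simon's "cheap data vs. expensive data" split can be sketched as a table-to-cluster map that hands out one connection per cluster. The class name, table names and cluster names are hypothetical, and `sqlite3` in-memory databases stand in for separate MySQL clusters so the example runs on its own.

```python
import sqlite3


class PartitionedConnections:
    """Route each table to the database cluster that stores it."""

    def __init__(self, table_to_cluster, default_cluster, dsn_for_cluster):
        self.table_to_cluster = table_to_cluster
        self.default_cluster = default_cluster
        self.dsn_for_cluster = dsn_for_cluster
        self._connections = {}  # cluster name -> open connection, created lazily

    def cluster_for(self, table):
        return self.table_to_cluster.get(table, self.default_cluster)

    def connection_for(self, table):
        cluster = self.cluster_for(table)
        if cluster not in self._connections:
            # Each ":memory:" connect() creates a separate, independent database.
            self._connections[cluster] = sqlite3.connect(self.dsn_for_cluster[cluster])
        return self._connections[cluster]


conns = PartitionedConnections(
    table_to_cluster={"weblog_hit": "logs"},  # cheap, write-mostly traffic data
    default_cluster="main",                   # everything else, e.g. weblog entries
    dsn_for_cluster={"main": ":memory:", "logs": ":memory:"},
)
conns.connection_for("weblog_entry").execute("CREATE TABLE weblog_entry (id INTEGER, body TEXT)")
conns.connection_for("weblog_hit").execute("CREATE TABLE weblog_hit (entry_id INTEGER, ts TEXT)")
```

The cost of this layout is that a join between `weblog_entry` and `weblog_hit` can no longer be done in SQL; the application has to query both connections and combine the results itself.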
I'd love Django to have a reputation as the web framework that scales. As far as I can tell, big LAMP sites that scale are mostly done in PHP with a whole load of custom scary stuff: connections to multiple databases, memcached, even XML-RPC calls to backend services written in Java. We already have caching, and we can call backend services easily, but the single-database-connection assumption is baked right into the framework.
Unfortunately, I don't have enough experience of scaling big to say much more than that. This is where input from people like Scott becomes invaluable :)
> I've always thought that this particular -- and common -- use case should be delegated to the DB level using one of the many excellent replication/distribution tools for your database. For example, you could easily do read distribution with pg_pool or sqlrelay, and it would be transparent to Django. I don't see a good reason to tackle replication in Django itself as that's more or less a solved problem.
Besides the fact that most of those "excellent" after-market replication solutions just plain suck ;-), there are very good reasons to have data-driven distribution. This might be controlled by table (some tables living in different databases or on different servers) or by content (for example, data from older years living in other databases or on other servers). So even if you get those replication solutions to work reliably (which I never managed with sqlrelay; the client goes bozo if the server has problems), you _can't_ do that kind of data-driven distribution with after-market tools. It has to be done at the application level.
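Distribution by content means the application decides from the data itself (here, the year) which database to use. A minimal sketch, assuming a hypothetical cutoff year and two databases named `active` and `archive`:

```python
ARCHIVE_CUTOFF_YEAR = 2004  # hypothetical: rows from earlier years live in the archive


def database_for_year(year):
    """Pick the database that holds rows for a given year."""
    return "active" if year >= ARCHIVE_CUTOFF_YEAR else "archive"


def databases_for_range(first_year, last_year):
    """List the databases a query over first_year..last_year must touch, in order."""
    databases = []
    for year in range(first_year, last_year + 1):
        name = database_for_year(year)
        if name not in databases:
            databases.append(name)
    return databases


print(databases_for_range(2002, 2005))  # ['archive', 'active']
```

A query whose range straddles the cutoff has to be run against both databases and the results merged by the application.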
A scenario from my work: an ERP system that produces loads of accounting data. Older data is moved to an external database, because it would put too much load on the active database. Then there are specially prepared data aggregates that are stored in yet another database for faster queries. So the application needs to access three databases simultaneously, and in large installations those databases will even run on different servers.
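From the application's side, the ERP scenario (active data, archived data and pre-computed aggregates in three separate databases) might look roughly like this. It is only a sketch: the table layouts are invented for illustration, and three in-memory `sqlite3` databases stand in for what would be separate servers in a large installation.

```python
import sqlite3

# Three independent databases; each connect(":memory:") call creates a new one.
active = sqlite3.connect(":memory:")      # current accounting entries
archive = sqlite3.connect(":memory:")     # older entries moved out of the active DB
aggregates = sqlite3.connect(":memory:")  # pre-computed totals for fast reporting

for conn in (active, archive):
    conn.execute("CREATE TABLE entry (year INTEGER, amount INTEGER)")
aggregates.execute("CREATE TABLE yearly_total (year INTEGER PRIMARY KEY, total INTEGER)")

active.executemany("INSERT INTO entry VALUES (?, ?)", [(2005, 100), (2005, 250)])
archive.executemany("INSERT INTO entry VALUES (?, ?)", [(2003, 40), (2004, 60)])
aggregates.executemany("INSERT INTO yearly_total VALUES (?, ?)",
                       [(2003, 40), (2004, 60), (2005, 350)])


def detail_total():
    """Sum every entry, reading from both the active and the archive database."""
    total = 0
    for conn in (active, archive):
        (part,) = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM entry").fetchone()
        total += part
    return total


def aggregate_total():
    """Sum the pre-computed yearly totals from the aggregates database."""
    (total,) = aggregates.execute("SELECT COALESCE(SUM(total), 0) FROM yearly_total").fetchone()
    return total


# A consistency check that needs all three databases at once.
print(detail_total() == aggregate_total())  # True
```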
So, yes, I do think it would be very useful for Django to be able to access multiple databases via its ORM. In fact, I already have exactly that requirement in one Django project: I'm trying to build an admin interface for my DB-based nameserver and DB-based mail system, but DNS and mail are handled by different databases, so I'm currently blocked on that project, as Django won't let me work with both databases ...
> So, yes, I do think it would be very useful for Django to be able to access multiple databases via its ORM
Incidentally, we also need support for Postgres schemas. I have a financial accounting app where, in multi-company mode, the tables for each company are stored in a separate schema; this is also ideal for multilingual stuff.
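One common way to use a schema per company in PostgreSQL is to set the connection's `search_path`, so that unqualified table names resolve to the current company's tables. The helper names and the `company_<slug>` naming scheme are hypothetical; the code only builds the SQL string, which a caller would execute on its own connection.

```python
def quote_pg_identifier(name):
    """Quote a PostgreSQL identifier: wrap in double quotes, double any embedded quote."""
    return '"%s"' % name.replace('"', '""')


def schema_for_company(slug):
    # Hypothetical naming scheme: one schema per company, e.g. company_acme.
    return "company_%s" % slug


def search_path_sql(slug):
    """SQL that makes unqualified table names refer to one company's schema.

    `public` stays on the path so shared tables are still found.
    """
    return "SET search_path TO %s, public" % quote_pg_identifier(schema_for_company(slug))


print(search_path_sql("acme"))  # SET search_path TO "company_acme", public
```

Because a plain `SET` lasts for the rest of the session, the same model code can serve every company as long as each request sets the path before it queries.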
> Now I haven't hacked Django much myself yet (I've been working on the back end tools, db loader and overall schema). What support does Django have for multiple db stuff?
I've started a ticket to track discussions on this issue:
I'm faced with the multiple-schema problem in MySQL, which AFAICT is a lot simpler than actually having multiple databases, because at least you don't have to coordinate multiple connections. Actually, it basically already works. I wrote:
    from django.core import meta

    class Poll(meta.Model):
        class META:
            db_table = "myschema.poll"
but then Django tries to quote the whole name (including the period), which of course makes MySQL complain (because the SQL should be `myschema`.`poll`, not `myschema.poll`). So I hacked DatabaseWrapper.quote_name so that it doesn't quote periods. Is that the "right" way to handle this, or is it an ugly hack? It seems to work very well, but it feels like it might be cleaner to add a db_schema attribute or something instead. (Should I submit a patch? :-)
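The quoting change described here amounts to quoting each dot-separated part on its own. This is a sketch of the idea as a standalone function, not Django's actual `DatabaseWrapper.quote_name`; the trade-off is that a table name can no longer contain a literal period.

```python
def quote_name(name):
    """Quote a MySQL identifier that may be schema-qualified.

    "myschema.poll" -> `myschema`.`poll`
    """
    if name.startswith("`") and name.endswith("`"):
        return name  # already quoted; leave it alone
    # MySQL escapes a backtick inside an identifier by doubling it.
    return ".".join("`%s`" % part.replace("`", "``") for part in name.split("."))


print(quote_name("myschema.poll"))  # `myschema`.`poll`
print(quote_name("poll"))           # `poll`
```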
> So I hacked DatabaseWrapper.quote_name so that it doesn't quote periods, but is that the "right" way to handle this or is it an ugly hack? It seems to work very well, but it feels like it might be cleaner to add a db_schema attribute or something instead. (Should I submit a patch? :-)
Perhaps we could have the best of both worlds, and have the Model constructor split on period, storing db_schema internally if a schema was provided.
Adrian
-- 
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org
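Adrian's suggestion (split `db_table` on the period when the model is set up, and keep the schema separately) could look something like this. `ModelOptions` and `qualified_table()` are hypothetical names for illustration, and MySQL-style backtick quoting is assumed.

```python
class ModelOptions:
    """Hypothetical per-model metadata that separates schema from table name."""

    def __init__(self, db_table):
        # "myschema.poll" -> ("myschema", "poll"); "poll" -> (None, "poll")
        schema, _, table = db_table.rpartition(".")
        self.db_schema = schema or None
        self.db_table = table

    def qualified_table(self):
        """Return the quoted, schema-qualified name to use in generated SQL."""
        quoted = "`%s`" % self.db_table
        if self.db_schema:
            quoted = "`%s`.%s" % (self.db_schema, quoted)
        return quoted


opts = ModelOptions("myschema.poll")
print(opts.db_schema, opts.db_table)  # myschema poll
print(opts.qualified_table())         # `myschema`.`poll`
```

Storing the schema separately means the SQL generator never has to guess whether a period in a name is a separator.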