Potential way forward for DATABASE_URL

Raphael G

Nov 27, 2022, 2:40:48 PM
to Django developers (Contributions to Django itself)

Some basic industry background. It's a pretty common convention to share credentials in environment variables. For many PaaS, it's common to use connection URLs to do so. So DATABASE_URL will have a URL like postgres://my_user:mypassword@somedomain/database stuffed into a single environment variable.

Django expects a configuration dictionary for its drivers. So what do people do? They install dj-database-url and pass the string into that library (or rather, the library reads a blessed environment variable). Absent that, they need to parse the information out manually and build the configuration dictionary. So if you just have Django, it's time to futz around with urlparse.
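
For illustration, the manual route might look roughly like this. It is a minimal sketch that assumes a postgres-style URL; parse_postgres_url is a made-up name and not part of Django or dj-database-url.

```python
from urllib.parse import unquote, urlsplit


def parse_postgres_url(url):
    """Turn a postgres:// URL into a dict shaped like one DATABASES entry."""
    parts = urlsplit(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        # The database name is the URL path without its leading slash.
        "NAME": unquote(parts.path.lstrip("/")),
        # Credentials may be percent-encoded inside a URL, so decode them.
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        # Django accepts an empty string to mean "use the default port".
        "PORT": str(parts.port or ""),
    }


# In a real settings file the URL would come from os.environ["DATABASE_URL"].
config = parse_postgres_url("postgres://my_user:mypassword@somedomain/database")
```

Every project that skips the helper library ends up writing some variation of this by hand.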

There have been some discussions about how to make this better, including:

These haven't seen much movement in the past couple of years. A comment in one of these e-mail threads:

> I suspect this is a "good enough is good enough" situation. Something like what Raffaele is talking about, or dsnparse, or whatever would probably be ideal. And for something to be merged into core, I think it'd need to be a more full solution than just dj-database-url.

dj-database-url takes something from an environment variable and provides a configuration dictionary. There's this feeling that having Django directly accept a string would feel more natural and correct. There are also other libraries like dsnparse, and people proposing things like adding a DSN name into settings.

I think of all the options, the third option (the proposal by Tom Forbes) is a very good option. What it looks like in practice is the addition of the following:

  • The ability for database backends to register protocol names for URLs, so that postgres://localhost:5432 properly maps to the django.db.backends.postgresql backend, while people can still show up with their own mappings.
  • A configure_db(url) function, that will return a configuration dictionary meant for DATABASES
  • A similar configure_cache(url) function that will give cache configuration dictionaries meant for CACHES
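
As a rough sketch of how scheme registration and configure_db(url) could fit together (this is not the code from the PR; register_db_scheme, the _DB_SCHEMES registry, and the "mydb" backend are assumptions made up for illustration):

```python
from urllib.parse import unquote, urlsplit

# Hypothetical registry mapping a URL scheme to a backend's dotted path.
_DB_SCHEMES = {}


def register_db_scheme(scheme, engine):
    """Let a backend (built-in or third-party) claim a URL scheme."""
    _DB_SCHEMES[scheme] = engine


def configure_db(url):
    """Return a dict suitable for one entry of DATABASES."""
    parts = urlsplit(url)
    if parts.scheme not in _DB_SCHEMES:
        raise ValueError(f"no database backend registered for {parts.scheme!r}")
    return {
        "ENGINE": _DB_SCHEMES[parts.scheme],
        "NAME": unquote(parts.path.lstrip("/")),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


register_db_scheme("postgres", "django.db.backends.postgresql")
# A third-party backend could register its own scheme the same way
# ("mydb" and "my_package.backend" are made-up names).
register_db_scheme("mydb", "my_package.backend")

db = configure_db("postgres://localhost:5432")
```

The result is a plain dictionary, so it can be printed or tweaked before it is assigned into DATABASES.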

A thing that is notably absent here is any blessing of DATABASE_URL. You have to do

import os

DATABASES = {
    'default': configure_db(os.environ['DATABASE_URL']),
}

yourself. It's not "ideal" in that you don't magically get behavior from your URLs, but that also means you're just doing something in a straightforward way that should be easy to debug with some print statements when needed. It feels way less likely for this to be a major design miss.

The motivating example for supporting both of these is that Heroku provides DATABASE_URL and REDIS_URL.
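
For the cache side, a sketch under similar assumptions could be quite small, since Django's built-in Redis cache backend already takes a URL as its LOCATION. The configure_cache body and the _CACHE_SCHEMES mapping below are illustrative guesses, not the proposed implementation, and the URL is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from cache URL schemes to Django cache backends.
_CACHE_SCHEMES = {
    "redis": "django.core.cache.backends.redis.RedisCache",
    "rediss": "django.core.cache.backends.redis.RedisCache",
}


def configure_cache(url):
    """Return a dict suitable for one entry of CACHES."""
    scheme = urlsplit(url).scheme
    if scheme not in _CACHE_SCHEMES:
        raise ValueError(f"no cache backend registered for {scheme!r}")
    # The Redis backend accepts the connection URL itself as LOCATION,
    # so no further parsing is needed in this sketch.
    return {"BACKEND": _CACHE_SCHEMES[scheme], "LOCATION": url}


# Stand-in for os.environ so the example runs anywhere.
environ = {"REDIS_URL": "redis://:secret@cache.example.com:6379/0"}
CACHES = {"default": configure_cache(environ["REDIS_URL"])}
```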

The nice thing about this solution is that it doesn't block future design space. We get a configuration dictionary that matches the existing functionality. Because the added API is simple, it's easy for people to inspect the results, and of course it doesn't preclude people from keeping their existing solutions. There isn't even an assumed usage of DATABASE_URL like with dj-database-url! Mostly magic free.

I tried to rebase the PR including the above functionality from a couple years ago, and added some basic documentation. This doesn't try and convince users to use this, but I believe the usage would be sufficient for simple cases.

So my ask here: how do people feel about moving forward with this limited scope? Previous discussions talked about wanting a larger scope for it to get merged into core. I believe that instead targeting a smaller scope means we can at least provide a workable answer to the DATABASE_URL question in the near term. And when consensus coalesces around a good overall answer to settings, the actual URL parsing logic will already be present and even more battle tested.

Jörg Breitbart

Nov 27, 2022, 3:24:14 PM
to django-d...@googlegroups.com
On 27.11.22 at 13:51, Raphael G wrote:
> So my ask here: how do people feel about moving forward with this
> limited scope? Previous discussions talked about wanting a larger scope
> for it to get merged into core. I believe that instead targeting a
> smaller scope means we can at least provide a workable answer to
> the DATABASE_URL question in the near term. And when consensus coalesces
> around a good overall answer to settings, the actual URL parsing logic
> will already be present and even more battle tested.

+1 from my side, even with the reduced versatility or scope it covers
(sometimes too broad a scope just gives you more pros-and-cons
discussions, with a higher probability of sidetracking things into nowhere).

Some background: several years back we used SQLObject as an ORM a lot
in Python projects and later switched to SQLAlchemy (still using it
for heavy db lifting projects). I always found the URI connection scheme
appealing and easy to go with.

On the Django side of things I kinda always wondered why the connection
settings were that verbose to get running, but never wondered hard
enough to question its explicit style.

Well, in terms of conformance to other frameworks I see a clear benefit
here.

Cheers,
Jörg

Tobias McNulty

Nov 27, 2022, 3:37:37 PM
to django-developers
Hi Raphael,

Thanks for taking this on.

Starting with a limited scope seems like a good idea to me.

A couple other things I like about this approach:

- It tackles cache URLs at the same time (it makes sense for them to mirror one another, IMO).
- No implicit usage of DATABASE_URL, but as you said it still supplies an easily searchable answer for "Django DATABASE_URL."

Cheers,
Tobias

Adam Johnson

Nov 28, 2022, 11:47:24 PM
to django-d...@googlegroups.com
I’m happy with this approach; it’s a small step towards maintainable settings files.

Raphael G

Nov 29, 2022, 12:45:31 AM
to Django developers (Contributions to Django itself)
(I'm very sorry about the threading going on here. I originally replied to the very old mailing thread, then realized it had not generated consensus, so I am going to try to make this thread a more focused discussion about consensus.)

In the other thread people are discussing more generalized settings helpers. I am trying to avoid this, mostly because I think this work doesn't exclude that work. But I also personally don't want to introduce more magic, nor do the work involved in the settings magic. os.environ is straightforward IMO.

Carlton posted the following comment:

> Given that it's a single import I might still lean towards seeing it as an external package, at least for a cycle, so unknowns that come up can be resolved, and folks on an older LTS can opt-in early, etc.
> (But that's not a point of religion.)

I am OK with putting in work to have it as a separate package for a cycle. The glib comment would be that dj-database-url was that package for many cycles, but this is not very true in practice. This introduces extra things not originally present, for cache configuration, and has this concept of the database backends holding the parsing logic. And of course there's an extremely valid underlying point here: the API really needs to be "right". Personally I believe that Django's very good deprecation strategy means that big mess-ups are fortunately fixable, but it's work for everyone (and would likely involve some weird hack in the intermediate steps).

I would like to offer an alternative narrative to the background here, that I think more strongly justifies introducing this into Django proper. It is not the real narrative, but it is a narrative.

URL-based configuration conventions exist for database backends and cache backends in various libraries. This lets us pass in credentials as one string rather than a bunch of components to be assembled. But each backend will handle things like configuration options within those URLs differently. Overall URL parsing logic is all very similar, with important differences coming from how the database name might get passed in, how certain connection options get passed in, etc.

So it would be helpful to provide both a method on DatabaseWrapper that does basic URL parsing (to pull out the host/username/password), and for Django's supported DB backends to override this URL parsing method based on whatever convention is being applied by other libraries (or from backend-specific tooling). Same thing for caches. 

Because this is ultimately a bit backend-specific, having this logic close to the actual backend connection logic (so on these classes themselves) is the most natural, more so than having separate dictionaries with mappings to backends. New configuration option? Would be good to make sure it's handled in the URL parser as well, somehow.
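
A sketch of that shape, with the parsing living on the backend classes themselves, might look as follows. The class names, the config_from_url method, and the SQLite URL convention used here are all assumptions for illustration; none of them is Django's actual API.

```python
from urllib.parse import parse_qsl, unquote, urlsplit


class DatabaseWrapperSketch:
    """Stand-in for a base DatabaseWrapper; not Django's real class."""

    engine = None

    @classmethod
    def config_from_url(cls, url):
        # Generic parsing: host, credentials, name, and query-string options.
        parts = urlsplit(url)
        return {
            "ENGINE": cls.engine,
            "NAME": unquote(parts.path.lstrip("/")),
            "USER": unquote(parts.username or ""),
            "PASSWORD": unquote(parts.password or ""),
            "HOST": parts.hostname or "",
            "PORT": str(parts.port or ""),
            "OPTIONS": dict(parse_qsl(parts.query)),
        }


class PostgresWrapperSketch(DatabaseWrapperSketch):
    # The generic parsing is enough for this backend in the sketch.
    engine = "django.db.backends.postgresql"


class SqliteWrapperSketch(DatabaseWrapperSketch):
    engine = "django.db.backends.sqlite3"

    @classmethod
    def config_from_url(cls, url):
        # Assumed convention: everything after "sqlite:///" is a file path,
        # and an empty path means an in-memory database.
        path = unquote(urlsplit(url).path[1:])
        return {"ENGINE": cls.engine, "NAME": path or ":memory:"}


pg = PostgresWrapperSketch.config_from_url(
    "postgres://u:p@db.example.com:5432/app?sslmode=require"
)
lite = SqliteWrapperSketch.config_from_url("sqlite:///db.sqlite3")
```

A backend that gains a new configuration option would extend its own config_from_url, which keeps the URL handling next to the connection logic it feeds.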

Rambling a bit, but really am open to any(?) way forward that leads to "I will not need to install an extra package to handle this, nor am I personally parsing the URL with urllib.parse", and am ready to do the legwork. 

Speaking to that, if we have consensus on the principle, what would be the right step forward? An actual DEP?

Raphael

Carlton Gibson

Nov 29, 2022, 2:45:12 AM
to django-d...@googlegroups.com
Hey Raphael. 

My only query is: are we sure the API is correct going forward?
The answer could be yes there, but I haven't (as yet) gotten to review the history in depth.

We **can** deprecate things, but we get an awful lot of complaints and pushback, even for changes that are clearly for the good.
I'd rather measure twice and cut once, is all.
The whole point of the "Do it in a third-party app" approach is that we get to make sure the APIs are right, without adding churn to Django,
and without being tied to the long release cycle for fixing the unforeseen issues that arise.

Kind Regards,

Carlton

Raphael G

Nov 29, 2022, 3:41:41 AM
to Django developers (Contributions to Django itself)
Alright, I'm writing up a review aid that tries to re-explain the actual changes in the PR I opened before. This document should go over all of the actual API changes that are exposed to users as well. I believe the API _is_ correct, and that future settings improvements could rely on these to implement their features (so in the larger discussion, this is offering a low-level API while an overarching high-level settings API is still being worked on). At the end of the day there are only so many ways to structure a dictionary containing a hostname, port, username, and password!

But of course the specifics are important here, so I will finish up a review aid, paste it into the PR discussion (along with actually getting that branch passing), and crosslink it here.

Raphael G

Dec 20, 2022, 10:06:44 AM
to Django developers (Contributions to Django itself)
OK, after looking at this some more and trying to write up a review aid, I'm giving up on this branch and on trying to integrate DATABASE_URL support into Django proper.

A couple reasons:

- I misread the original mailing list thread which made me think there was a consensus on this branch, and there wasn't.
- There are a lot of tiny backend-specific things going on that are in because... well, because dj-database-url has that backend-specific behavior.
- Lots of people want the cache backend as a part of this. I'd be happy to have it. But basically every cache backend has weirdness (what's a service URL for the dummy backend? do we really need one for the file backend?), so ... I don't even know what makes sense there, honestly.

So there's this dual thing of not wanting to "just" vendor dj-database-url, but really the original branch that I tried to revive was either "just vendor that library" (battle tested) or "explore doing this for caches as well" (not battle tested). On top of all of this, apart from postgres I'm really having a hard time finding docs for URLs that are "industry standard" for much of anything.

So I'm trying to write up examples or justification for code I'm barely convinced of.

This might be a thing where if there was a workshop day at a conference then a group of people with diverse experiences on various systems could land on a convincing thing and build consensus, along with a wonderful patch. But staring at this just feels like a tarpit for me. Especially given the sort of pressure to get the API right on the first shot. 

Anyways.... if I were the only person working on Django I would sidestep all of this by throwing dj_database_url into django.contrib and relying on the years of usage by everyone as "proof" that the thing works. And I think the branch I tried reviving is "correct", I just don't have the background in most of these backends to know if it's right.

Raphael G

Dec 20, 2022, 10:10:09 AM
to Django developers (Contributions to Django itself)
To be clear, "pressure to get the API right on the first shot" is a statement of fact about adding APIs to a heavily used project in general, not a comment on anything said in this mailing thread. Just that in this case there's a lot of ways to get the ergonomics wrong.

Jörg Breitbart

Dec 21, 2022, 8:15:06 AM
to django-d...@googlegroups.com
> To be clear, "pressure to get the API right on the first shot" is a
> statement of fact about adding APIs to a heavily used project in
> general, not a comment on anything said in this mailing thread. Just
> that in this case there's a lot of ways to get the ergonomics wrong.

Yes, and I think that's also a good reason to "bake smaller buns" (German
saying), and not try to revamp too many things in one go. So how
about doing this iteratively - fixing the DB url thing first (as there
is prior art to learn from), and in a second step trying to deal with
the cache settings?

Cheers,
Jörg