Snowflake db backend


Scott Fought

Jan 22, 2021, 4:49:07 PM
to Django developers (Contributions to Django itself)

We have written a Snowflake DB backend that we use internally in several projects and are considering releasing it for public use.  My preference would be to incorporate it as a core backend so the nuances of Snowflake SQL are considered when changes are made to the core system.  Is this something we should pursue?


Thanks,


Scott

Tom Forbes

Jan 22, 2021, 4:57:44 PM
to django-d...@googlegroups.com
I think this should definitely be released as a third-party package first, and if there is enough community interest it might be considered for inclusion. We could certainly update the docs to link to the package, though.

On a side note, is Snowflake fast enough for general-purpose web apps? When we evaluated it, the performance characteristics were similar to Redshift's: optimised for large analytical aggregations rather than the fast, smaller result sets typically seen with Django apps.

Tom


Adam Johnson

Jan 22, 2021, 5:22:24 PM
to django-d...@googlegroups.com
Yes, we would favour a third-party backend. Merging the backend straight into core would impose the requirement that all changes to Django core consider Snowflake, which is a fairly steep cost when there's only one organization using it at present. If you release it as a third-party package and show some adoption, it could then be argued that it's worth merging.

Additionally, Scott: we would love to see Disney appear on this page; perhaps the "Platinum" section would feel comfortable: https://www.djangoproject.com/fundraising/ ;)



--
Adam

Scott Fought

Jan 22, 2021, 5:34:14 PM
to Django developers (Contributions to Django itself)
Snowflake (at this point) isn't as fast as PostgreSQL for smaller transactions.  PostgreSQL couldn't handle the performance we needed to drive our apps and we really didn't want to support a hybrid db app.  We give up a little speed for the convenience of having everything in one system.  At a recent Snowflake event, they stated that 30% of their users were using Snowflake for application backends.  They are also working to improve their execution time on smaller transactions.

Honestly, I don't think we'd release this as a third-party package.  We have had to code around a lot of core database-specific code to get Snow SQL to work.  If there were a hard API for DB backends, my response would be different.  It would only make sense for us if this backend were part of standard Django testing and support consideration.

"Merging the backend straight to core would impose the requirement that all changes to Django core consider Snowflake"
That's what has made the process harder than it should be to support a non-core db.  It creates second-class support for other systems in favor of the core dbs that are coded and tested against.  Remove all of the db backends, make them all third party, and you'd see a lot more db support.

Scott

Tim Graham

Jan 22, 2021, 6:08:39 PM
to Django developers (Contributions to Django itself)
Hi Scott,

Recently I've written backends for CockroachDB and Cloud Spanner. Perhaps those databases aren't as different from standard SQL as Snowflake is, but I didn't hit any major obstacles. I've had good success contributing database backend API changes to Django to make future maintenance of those backends easier. It sounds like you might like to do the same.

Based on past discussions, I don't expect any more backends will be included in Django soon, nor do I expect any of the backends in core to be removed. That would only increase the maintenance burden, since it would require coordinating changes across several repositories. Third-party backends have to "catch up" to database API changes and new features at their convenience, but Django's development would be slowed if contributors had to be proficient in every database under the sun that has a backend for Django.

You can read past threads in the mailing list for more discussions:
Removing Oracle from Django core in 3.0
Moving database backends out of the core

Tom Forbes

Jan 22, 2021, 6:41:35 PM
to django-d...@googlegroups.com
> Honestly, I don't think we'd release this as a third party package.  We have had to code around a lot of core db specific code to get Snow SQL to work

This is valuable feedback for us to hear. Even if you don’t release it as a fully maintained package, just understanding the pain points you encountered while developing this backend can help us improve.

Would you be willing to share specific code snippets here, or just general areas that you found challenging while developing this backend?

> That's what has made the process harder than it should be to support a non-core db.  It creates second class system support in favor of the core dbs that are code and tested against.  Remove all of the db backends and make them all third party and you'd see a lot more db support.

I can see a world where even core Django database backends are much more plugin-oriented, which could definitely bring some advantages. But that’s a non-trivial task with a lot of constraints, especially for a fairly community-driven project like Django.

Tom


Florian Apolloner

Jan 23, 2021, 10:36:38 AM
to Django developers (Contributions to Django itself)
On Saturday, January 23, 2021 at 12:41:35 AM UTC+1 t...@tomforb.es wrote:
> Would you be willing to share specific code snippets here, or just general areas that you found challenging while developing this backend?

+1; being able to see the full code (even if you are not going to maintain it in the long run) would be very helpful for improving things.

Like Tim, I wrote one or two database backends in the past (Informix and the like) and it was not too bad. That said, mine were relatively easy to implement due to their closeness to existing databases and (somewhat) conformance with SQL standards. I'd love to see the code for Snowflake, and maybe we can identify a few areas that need improvement.

Cheers,
Florian

Scott Fought

Jan 26, 2021, 5:38:18 PM
to Django developers (Contributions to Django itself)
In order to release the code, I need to provide a sustainment plan per our open source committee, which is why I was reaching out.  I can share some of the challenges of conforming Snowflake to the SQL model Django uses.

It sounds like we may have had similar experiences writing backends, but possibly we differ in completeness.  I was able to write a working backend that met all of our application needs in about 3 weeks.  Conforming to the unit tests is another story.  We pass ~90% of the tests, but that last 10% will take a lot more effort to complete.

For reference, we're using the snowflake-connector-python module, which implements PEP 249, v2.0.

Snowflake does not support lastrowid.  So, we grab the last ID inserted with a 'SELECT MAX(pk_name) FROM table_name'.  This is obviously prone to failure.  Assigning an ID during the INSERT would provide similar results on all backends.
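The workaround (and why it's prone to failure) can be sketched with any PEP 249 driver; SQLite stands in here for the Snowflake connector, and `last_inserted_id`, the table, and column names are hypothetical, not part of the actual backend:

```python
import sqlite3

def last_inserted_id(cursor, table_name, pk_name):
    # Mimics the workaround described above for a driver without
    # cursor.lastrowid. Racy: a concurrent session's INSERT can win
    # the MAX between our INSERT and this SELECT.
    cursor.execute(f"SELECT MAX({pk_name}) FROM {table_name}")
    return cursor.fetchone()[0]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO t (name) VALUES ('a')")
cur.execute("INSERT INTO t (name) VALUES ('b')")
print(last_inserted_id(cur, "t", "id"))  # 2
```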

The feature flag `supports_transactions` really means `supports_nested_transactions`.  Snowflake supports a single level of transaction, BEGIN + (ROLLBACK|COMMIT); additional BEGINs contribute to the current (only) transaction.  Since I have to set this flag to False, no transactions are used, even the ones that are supported, and testing grinds to a crawl with all of the table truncation and repopulation.  Since Django normally operates in autocommit mode, this isn't a huge deal.  Snowflake also doesn't support savepoints, but that feature flag seems to do what is expected when disabled.

Snowflake does not support column references without a FROM or JOIN clause.  I've only encountered this in the unit tests.

I've been able to work around most of the function differences by adding as_snowflake methods to the appropriate classes.  There are a few cases like JSON_OBJECT that don't translate well, which work, but not entirely as expected.
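The `as_<vendor>` hook is real Django convention (the compiler looks for `as_snowflake`, `as_postgresql`, etc. before falling back to `as_sql`), but the classes below are simplified stand-ins to show the dispatch shape, not Django's actual `Func`/compiler; `OBJECT_CONSTRUCT` is Snowflake's closest analogue to JSON_OBJECT:

```python
# Simplified model of Django's vendor-dispatch convention for
# expression compilation. Not Django classes -- illustration only.
class Func:
    function = "GENERIC"

    def as_sql(self, compiler=None, connection=None):
        return f"{self.function}(x)"

    def compile(self, connection):
        # The real compiler tries ``as_<vendor>`` first, then as_sql.
        vendor_impl = getattr(self, "as_" + connection.vendor, None)
        if vendor_impl is not None:
            return vendor_impl(None, connection)
        return self.as_sql(None, connection)

class JSONObject(Func):
    function = "JSON_OBJECT"

    def as_snowflake(self, compiler, connection):
        # Snowflake spells this differently.
        return "OBJECT_CONSTRUCT(x)"

class Conn:
    vendor = "snowflake"

print(JSONObject().compile(Conn()))  # OBJECT_CONSTRUCT(x)
```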

A search for 'connection.vendor == ' in the core code shows one example of where core backends are given privileged access to core Django inner workings that third-party backends don't get.

Please take these comments as observations, not complaints.  I understand project evolution and the resistance to change something that doesn't seem broken.  Maybe keep some of these thoughts in mind when a change is made to the core backends or the compiler and consider how other backends will need to implement that new feature.

Scott

Florian Apolloner

Jan 27, 2021, 4:08:13 AM
to Django developers (Contributions to Django itself)
Hi Scott,

Thank you for your response, this is very helpful.

On Tuesday, January 26, 2021 at 11:38:18 PM UTC+1 foug...@apps.disney.com wrote:
> Snowflake does not support lastrowid.  So, we grab the last ID inserted with a 'SELECT MAX(pk_name) FROM table_name'.  This is obviously prone to failure.  Assigning an ID during the INSERT would provide similar results on all backends.

Uffff, the 'SELECT MAX()' is not going to fly well, you are right. Assigning an ID during INSERT has its own problems. In PostgreSQL it would be possible because you could just select the next value from the created sequence (more or less), but with IDENTITY columns this might get harder, and I do not think there is a sensible way to do this in MySQL at all. While lastrowid support is a nice-to-have, Django should work (mostly?) without it: https://github.com/django/django/blob/464a4c0c59277056b5d3c1132ac1b4c6085aee08/django/db/models/sql/compiler.py#L1372-L1387 -- the requirement here is that your database is at least able to return values after insert. From the looks of it, it does not, I think? Or put differently: which ways does Snowflake offer to get an ID? Solely by providing it directly in the INSERT?
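The branch Florian links to can be paraphrased roughly as follows; `insert_sql` is an illustrative stand-in, though `can_return_columns_from_insert` is a real Django feature flag:

```python
# Rough paraphrase of the compiler logic linked above: Django prefers
# "INSERT ... RETURNING" when the backend advertises support, and only
# falls back to cursor.lastrowid otherwise. Simplified stand-in.
def insert_sql(base_insert, returning_fields, features):
    if returning_fields and features.get("can_return_columns_from_insert"):
        cols = ", ".join(returning_fields)
        return f"{base_insert} RETURNING {cols}"
    # Fallback path: plain INSERT, then rely on cursor.lastrowid
    # (unavailable in Snowflake's driver).
    return base_insert

print(insert_sql("INSERT INTO t (name) VALUES ('a')", ["id"],
                 {"can_return_columns_from_insert": True}))
```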

> The feature flag `supports_transactions` really means `supports_nested_transactions`.  Snowflake supports a single level of transaction, BEGIN + (ROLLBACK|COMMIT).  Multiple BEGINs contribute to the current (only) transaction.  Since I have to set this flag to False, no transactions are used, even ones that are supported and testing grinds to a crawl with all of the table truncations and repopulation.

Huh, which databases support nested BEGIN? As far as I am aware, Django never nests BEGINs -- do you have an example for me? I'd very much like to fix this. From a quick glance at the code, we only start a transaction if we are not already in one: https://github.com/django/django/blob/master/django/db/transaction.py#L196-L208
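The nesting rule in the linked code can be modeled as a toy (simplified names, no rollback or autocommit handling; real `Atomic` does much more):

```python
class Atomic:
    """Toy model of django.db.transaction.Atomic's nesting rule:
    BEGIN only for the outermost block, SAVEPOINTs for inner ones."""

    def __init__(self, conn, savepoint=True):
        self.conn, self.savepoint = conn, savepoint
        self.outermost = False

    def __enter__(self):
        if not self.conn["in_transaction"]:
            self.conn["log"].append("BEGIN")   # first block only
            self.conn["in_transaction"] = True
            self.outermost = True
        elif self.savepoint:
            self.conn["log"].append("SAVEPOINT")

    def __exit__(self, *exc):
        if self.outermost:
            self.conn["log"].append("COMMIT")
            self.conn["in_transaction"] = False
        elif self.savepoint:
            self.conn["log"].append("RELEASE SAVEPOINT")

conn = {"in_transaction": False, "log": []}
with Atomic(conn):
    with Atomic(conn):  # nested: no second BEGIN is issued
        pass
print(conn["log"])  # ['BEGIN', 'SAVEPOINT', 'RELEASE SAVEPOINT', 'COMMIT']
```

With `savepoint=False` on the inner block (the Snowflake situation), the nested block emits nothing at all; only one BEGIN is ever issued.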

> Snowflake does not support column references without a FROM or JOIN clause.  I've only encountered this used in the unit tests.

Ok, see below.

> I've been able to work around most of the function differences by adding as_snowflake methods to the appropriate classes.  There are a few cases like JSON_OBJECT that don't translate well, which work, but not entirely as expected.

Perfect, this sounds as if our extension system works as intended in this area.

> A search for 'connection.vendor == ' in the core code shows one example of where core backends are given privileged access to core Django inner workings that 3rd party backends don't.

We have been working to get rid of those: https://github.com/django/django/commit/275dd4ebbabbbe758c7219a3d666953d5a7b072f#diff-69f332030b6f25f8f4d95842548d847139ee2cc163aef18f1c8d83b90f3f9e5f -- this is only in 3.2, but Tim can suggest a workaround for earlier versions (he was doing something similar in his cockroachdb tests before 3.2).

> Please take these comments as observations, not complaints.  I understand project evolution and the resistance to change something that doesn't seem broken.  Maybe keep some of these thoughts in mind when a change is made to the core backends or the compiler and consider how other backends will need to implement that new feature.

No offense taken at all. If at all possible, I'd like to encourage you to work with us on your pain points. I think at least some of them are addressed, or at least addressable. I am happy to offer code and fixes, but I'll need a few more details, especially wrt transaction handling.

Cheers,
Florian

Taylor

Apr 21, 2021, 1:29:46 AM
to Django developers (Contributions to Django itself)
Hey Everyone,

Sorry to open up an old thread. 

Tim - were you ever able to open source your Snowflake backend? We would love to use it and even devote resources (developers or funding for contractors) to improving it and getting the tests passing. At Cedar, we were planning on creating our own Snowflake backend, but it would be great to not duplicate work here. What are your thoughts?

Best,
Taylor

Taylor

Apr 21, 2021, 1:36:02 AM
to Django developers (Contributions to Django itself)
Sorry, I meant to write Scott, not Tim. I shouldn't write emails late at night.

- Taylor

Tim Graham

Dec 4, 2021, 8:29:00 PM
to Django developers (Contributions to Django itself)
Cedar Cares, Inc. funded me to write a Snowflake backend. The first alpha for Django 3.2 was released this week. Testers and feedback are welcome!
I didn't find the implementation particularly onerous, but perhaps there are things that the team at Disney implemented which I haven't addressed. I haven't triaged 100% of Django's tests, but I think I've covered the major stuff.

Regarding what Scott pointed out about Django's TestCase assuming support for nested atomic blocks when DatabaseFeatures.supports_transactions=True: I addressed this by running Django's test suite with a forked copy of Django that (among other things) modifies TestCase so that, when running with Snowflake, it doesn't do any setup at the class level (which it usually does in its own Atomic [cls_atomics]) and instead does the following for each test (in TestCase._fixture_setup()):
- starts a transaction [self._enter_atomics()]
- populates class-level data [self.setUpTestData()]
- loads initial data from migrated apps; loads fixtures [super()._fixture_setup()]
- runs the test
- rolls back the transaction [self._rollback_atomics(self.atomics)]
This is on the order of 10x faster than not using transactions at all and truncating the tables after each test.
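The per-test flow above can be sketched as a stand-alone class; the names in the comments mirror the TestCase internals cited above, but the class itself is only an illustration of the ordering, not Django code:

```python
# Hedged sketch of the per-test transaction flow described above.
class PerTestTransactionCase:
    def __init__(self):
        self.log = []

    def _fixture_setup(self):
        self.log.append("BEGIN")          # self._enter_atomics()
        self.log.append("setUpTestData")  # class-level data, per test
        self.log.append("load fixtures")  # super()._fixture_setup()

    def run_test(self, test):
        self._fixture_setup()
        test(self)                        # the test body
        self.log.append("ROLLBACK")       # self._rollback_atomics(...)

case = PerTestTransactionCase()
case.run_test(lambda self: self.log.append("test body"))
print(case.log)
```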

Later I thought that a faster approach could be to do the class-level setup, then run each test in its own transaction, then truncate the tables at the end of the test class. I'm guessing it might be faster for some test classes and not for others, depending on how much work setUpTestData() does.

I hope to incorporate this TestCase work into Django so that users of this backend can take advantage of it too (with stock Django, the backend can't use transactions to speed up tests), but this won't happen sooner than Django 4.1, since such a change doesn't qualify for a backport in Django.

I've added some other DatabaseFeatures in my Django fork that I also hope to contribute upstream (enforces_foreign_key_constraints, enforces_unique_constraints, supports_indexes, supports_tz_offsets... all False for Snowflake).
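Those flags as a plain-class sketch (the names come from Tim's message and are additions in his fork, not stock Django flags; in a real backend they would live on a subclass of BaseDatabaseFeatures):

```python
# Sketch of the extra DatabaseFeatures flags mentioned above.
class SnowflakeDatabaseFeatures:
    # Snowflake accepts FOREIGN KEY and UNIQUE in DDL but does not
    # enforce them, and has no conventional indexes or TZ offsets.
    enforces_foreign_key_constraints = False
    enforces_unique_constraints = False
    supports_indexes = False
    supports_tz_offsets = False
```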