My developers have run amok


Dan Davis

Nov 6, 2018, 2:02:57 PM
to Django users
My developers, who have come from ColdFusion, are fond of this:

cursor = connection.cursor()
cursor.execute(some_sql)
data = [r for r in cursor]
return render(request, some_template_name, {'data': data})

The problem with this is that the cursor is never closed, so the database cursor reference is leaked.

How do I add a check for outstanding cursors to a unit test environment (single-threaded, single-process)?

Jason

Nov 6, 2018, 2:32:52 PM
to Django users
Refactor to use the ORM for those bits? And implement a team style guide that says no direct connections unless it's proven to work better?

Dan Davis

Nov 6, 2018, 3:40:21 PM
to Django users


On Tuesday, November 6, 2018 at 2:32:52 PM UTC-5, Jason wrote:
Refactor to use the ORM for those bits? And implement a team style guide that says no direct connections unless it's proven to work better?

I'm looking to identify the memory-consuming bits so we can refactor them. Not a bad idea to do as you suggest.
 

Jason

Nov 6, 2018, 6:22:25 PM
to Django users
oh boy...  Good luck.

You can use connection.cursor() as a context manager, which will close the cursor when the block exits. You could grep the project for connection.cursor and add the context manager.
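A minimal sketch of the pattern. In Django, connection.cursor() is itself a context manager (`with connection.cursor() as cursor:`); the stdlib sqlite3 driver's cursors are not, so here contextlib.closing stands in to give the same guarantee that close() runs even if an exception is raised:

```python
import sqlite3
from contextlib import closing

def fetch_all(conn, sql):
    # closing() guarantees cursor.close() runs, even on error, just as
    # Django's "with connection.cursor() as cursor:" does.
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        return list(cursor)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
rows = fetch_all(conn, "SELECT x FROM t ORDER BY x")
```

The try-finally spelling works too, but the context manager version is harder to get wrong.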


how did none of the devs know to clean up after themselves, either with a context manager or try-finally?

Dan Davis

Nov 6, 2018, 6:40:15 PM
to django...@googlegroups.com
I am less certain this is the problem now. It may simply be that my project is one of the larger ones on the server. I did a quick audit, and although they are new to Django, they either used connection.cursor() as a context manager or called close() explicitly.

I will try to develop a middleware around pympler, tracemalloc, or objgraph.
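A sketch of what such a middleware could look like with tracemalloc (the class name and the choice of top-5 reporting are my own; Django middleware is just a callable wrapping get_response, so this runs without the framework):

```python
import tracemalloc

class MemoryDiffMiddleware:
    """Hypothetical sketch: report the top allocation growth per request."""

    def __init__(self, get_response):
        self.get_response = get_response
        if not tracemalloc.is_tracing():
            tracemalloc.start()

    def __call__(self, request):
        before = tracemalloc.take_snapshot()
        response = self.get_response(request)
        after = tracemalloc.take_snapshot()
        # Largest growth first; in a real project you would log, not print.
        for stat in after.compare_to(before, "lineno")[:5]:
            print(stat)
        return response
```

For non-streaming responses this does catch lazy QuerySets, since templates are rendered before get_response returns; streaming responses would still evaluate after the snapshot, which is the "too much will still be lazy" caveat.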

--
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/TZ682yh2QfE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/0b436954-3e31-4dd7-b1ae-2bf75d72f472%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Davis

Nov 6, 2018, 6:42:23 PM
to django...@googlegroups.com
This will likely do it. Tracking as a Django middleware, though, I suspect too much will still be lazy:

Dan Davis

Feb 26, 2019, 8:18:00 PM
to Django users
For the group: the eventual culprit was not complicated. It is a model with a BinaryField that holds a file's contents. The Django documentation advises against this, but why? Well, on-premise with a big-iron database like Oracle, storing the file in the database is attractive. So, what's the problem?

      ModelWithFileContents.objects.all()

There you go: even if you are not using the file, its contents will be fetched into memory.
The solution, if it must be at the Django level, is to use defer() and bake it into the model's manager.
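A sketch of that manager approach, assuming a model roughly like this (the names Document, contents, and DocumentManager are hypothetical, not from the actual project):

```python
from django.db import models

class DocumentManager(models.Manager):
    def get_queryset(self):
        # Leave the blob in the database until it is explicitly accessed.
        return super().get_queryset().defer("contents")

class Document(models.Model):
    name = models.CharField(max_length=255)
    contents = models.BinaryField()

    # Document.objects.all() no longer pulls every blob into memory.
    objects = DocumentManager()
```

Note the trade-off: touching doc.contents on a deferred instance triggers an extra query per row, which is fine for "one file on request" but bad inside a loop.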

What I did is to make sure the developer *always* used a database view that doesn't have the binary field for most operations.

Derek

Mar 4, 2019, 2:10:31 AM
to Django users
Just because something can be done, does not mean it should be done.

I have always avoided putting large binary files into the DB. Why? Primarily because the information in them is "dead": very hard to search or filter, which rather defeats the point of using a DB. Rather leave them on disc and add a pointer to each of them. For one app, I converted all the data in the files (typically Excel or PDF) into raw text and stored that text in the DB so it was at least searchable (though obviously not really viewable in that state).

Also, a call like:

ModelWithFileContents.objects.all()

can be quite overwhelming unless you know what you are doing; rather look to extract the specific fields you know you want to display or access. I would imagine the only time you will want to retrieve that BLOB (assuming you stick to your current design) is per individual record, on request.

Dan Davis

Mar 4, 2019, 9:30:04 AM
to django...@googlegroups.com
Derek, it all depends on where you are. I'm at a government agency. The wait time to get an NFS directory (yup, really) where I can write my data between multiple redundant web servers is weeks. After that, there are size and synchronization issues. So, there are other reasons than table design to consider. After all, we don't do the work to put our tables in 1NF, 2NF, 3NF just because; we do it when our data is going to be used for OLTP, but prefer less normalized structures for OLAP.

I just think it is interesting that the standard advice about files in the database has to do with Django's standard operations, and maybe ignores the realities on the ground for some developers.


Derek

Mar 7, 2019, 6:34:53 AM
to Django users
I am not sure I understand about the "size and synchronization" but I assume that is to do with them not giving you adequate infrastructure.

I don't think the DB advice is in any way specific to Django - I'd make the same recommendation regardless of the framework being adopted - but obviously there are sometimes constraints that force our hand to do things in non-optimal ways.  You have my sympathies (for what they are worth).

dans...@gmail.com

May 2, 2019, 11:40:32 AM
to Django users
I've come around to your way of thinking now that I'm considering django-storages and AWS deployments.

Robert Wahoo

May 2, 2019, 11:43:56 AM
to django...@googlegroups.com

Currently in the process of evaluating Azure Pipelines for CI/CD. So far, it's been fairly simple to set up. Set up an Azure app this morning for a Python/Django project; took 5 minutes.


stephenlig...@gmail.com

May 2, 2019, 12:31:17 PM
to Django users
Maybe also store a hash of the file so you can search the DB for the file itself if necessary; just a suggestion, as I don't know your use cases.
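For example, with hashlib (a sketch; the indexed-column pairing is an assumption, not part of the original design):

```python
import hashlib

def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the file contents, usable as an indexed lookup key."""
    return hashlib.sha256(data).hexdigest()

digest = content_hash(b"example file contents")
# The digest is always 64 hex characters, so a
# CharField(max_length=64, db_index=True) can hold it.
```

To check whether an uploaded file already exists, hash the upload and query on the digest column instead of comparing blobs.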

Dan Davis

May 22, 2019, 1:30:03 PM
to Django users
Sure - setting it up is easy. There are hard questions about database mutations, however. If you drop column bar from table fubar, then the old code is still querying for both fu and bar:

SELECT id, fu, bar FROM fubar ....

On-premise, this is the case for maybe 10 seconds, tops.   In the cloud, this could be the case for 30 minutes.

We've decided that this is the developer's responsibility, and not the responsibility of DevOps CI/CD, just as the developer needs to either use cache-busting techniques when changing JS and CSS, or educate their users to press Shift-F5 when things go wrong.

Mike Dewhirst

May 23, 2019, 5:50:45 PM
to django users
On 23/05/2019 3:30 am, Dan Davis wrote:
> Sure - setting it up is easy.   There are hard questions about
> database mutations however.   If you drop column bar from table Fubar,
> then the old code is still querying for buth Fu and Bar:
>
> SELECT id, fu, bar FROM fubar ....
>
> On-premise, this is the case for maybe 10 seconds, tops. *In the
> cloud, this could be the case for 30 minutes.*

Are you sure?

My (possibly flawed) understanding is that if you can make the change in
a single transaction, no-one else can see what is happening until it is
committed. Effectively that would be an instantaneous change.

Depending on your data architecture you might need to make some parts
read-only for the 30 minutes prior to committing.

>
> We've decided that this is the developers responsibility, and not the
> responsibility of DevOps CI/CD, just like the developer needs to
> either use cache-busting techniques when changing JS and CSS, or
> educate his users to press Shift-F5 when things go wrong.

Dan Davis

Jun 5, 2019, 4:24:49 PM
to django...@googlegroups.com
> Are you sure?
>
> My (possibly flawed) understanding is that if you can make the change in
> a single transaction, no-one else can see what is happening until it is
> committed. Effectively that would be an instantaneous change.
>
> Depending on your data architecture you might need to make some parts
> read-only for the 30 minutes prior to committing.
The transaction can be atomic on a database such as PostgreSQL that supports DDL changes inside a transaction. The issue is that rather than updating the web application's code in place and then running migrations, my CI/CD pipeline brings up an entirely new set of servers. This makes a more isolated change to the environment, and deployment also takes care of OS patching and such, but the old web servers are still online and querying the database. They may well be querying a column that no longer exists, which is a form of "read after write" synchronization problem.

I think with an alternative CI/CD, in the cloud or elsewhere, this would not be an issue.