“Abusing BinaryField” warning about binary files in DB


Antoine Pietri

Feb 25, 2018, 6:21:42 PM
to Django developers (Contributions to Django itself)
Hi!

In the documentation, the BinaryField has a warning called “Abusing
BinaryField” that states:

> Although you might think about storing files in the database, consider that
> it is bad design in 99% of the cases. This field is not a replacement for
> proper static files handling.

https://docs.djangoproject.com/en/2.0/ref/models/fields/#django.db.models.BinaryField

I agree with the intention of this warning: we don't want people to
start using their database for image uploads, large static files, or
thinking they can completely replace proper static file serving with a
database.

That said, I think this warning is a huge overstatement. I think the
moment you're wondering "maybe this would be a good use case to store
it in my database", your case for storing files in database might not
be absurd at all. There are tradeoffs, that are documented here, for
instance: https://wiki.postgresql.org/wiki/BinaryFilesInDB . It's
definitely not as clear-cut as "don't do it". People should be aware
of the tradeoffs instead of just dismissing the possibility.

Can I suggest replacing the warning with something like this?

> Although you might think about storing files in the database, consider that
> it might be a bad design choice. This field is not a replacement for proper
> static files handling.
>
> That said, there might be cases where you do want the guarantees that the
> database offers you for binary files. Be sure to be aware of the
> trade-offs[1] before you decide to do so.
> [1]: https://wiki.postgresql.org/wiki/BinaryFilesInDB

As I'm not subscribed to this mailing list, I would appreciate being
CC'd on the responses :-)

Cheers,

-- 
Antoine Pietri

Curtis Maloney

Feb 25, 2018, 7:06:38 PM
to django-d...@googlegroups.com, Antoine Pietri
On 02/26/2018 08:30 AM, Antoine Pietri wrote:
> Can I suggest replacing the warning with something like this?
>
>> Although you might think about storing files in the database, consider that
>> it might be a bad design choice. This field is not a replacement for proper
>> static files handling.
>>
>> That said, there might be cases where you do want the guarantees that the
>> database offers you for binary files. Be sure to be aware of the
>> trade-offs[1] before you decide to do so.
>> [1]: https://wiki.postgresql.org/wiki/BinaryFilesInDB

As discussed on IRC, I think the wording here is a bit weak... "it might
be" probably ought to be "it is probably".


--
Curtis

Tom Forbes

Feb 25, 2018, 7:22:47 PM
to Antoine Pietri, django-d...@googlegroups.com

Hey Antoine,

Personally I’m quite against changing that warning. I have only ever seen one application where the use of an in-database file is appropriate, and they were using the FILESTREAM type in SQL Server, which offers some pretty advanced semantics compared to other databases (more akin to Django’s file storage than a BLOB column).

I’ve seen a lot of beginners use BLOB/byte fields where they’re really not needed and struggle with some insane performance issues as a result, especially with Django fetching all columns in a model by default. Also, the link you gave (and thanks for linking, it’s an interesting read) is obviously Postgres-specific. The issues you might face doing this are very vendor-specific and non-portable: SQLite recommends against storing anything larger than 100 kB in a row, for example.

I feel like the warning should implicitly say “do not do this, really don’t, but if you’re super super super sure you 100% need to then you’re going to disregard this warning anyway”, which the current one does quite well. To put it another way, if you’re at the point where you need to do this you’re way past reading the warning in the Django docs, and we should deter people who might make the wrong choice at the start.

Tom

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/28cec919-ae57-4eed-960b-d598a01c2711%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Adam Johnson

Feb 25, 2018, 8:23:12 PM
to django-d...@googlegroups.com, Antoine Pietri
Did you know Facebook store their assets in MySQL, because it's the fastest replicated super-reliable thing to put them in? https://secure.phabricator.com/book/phabflavor/article/soon_static_resources/ (near the end of 'Caches and Serving Content')

I am in favour of weakening the warning.

--
Adam

Yo-Yo Ma

Feb 25, 2018, 10:49:40 PM
to Django developers (Contributions to Django itself)
The nice thing about leaving the warning as stern as it is is that anybody who is absolutely sure that they need to store files this way isn’t going to stop because of the warning to begin with, while weakening the warning will most assuredly lead to “Django is Slow” posts by newcomers who didn’t know SELECT * would be slow when there’s 3-5MB of data per row.

Antoine Pietri

Feb 26, 2018, 7:38:23 AM
to django-d...@googlegroups.com
Hey all,

On Mon, Feb 26, 2018 at 1:06 AM, Curtis Maloney <cur...@tinbrain.net> wrote:
> As discussed on IRC, I think the wording here is a bit weak... "it might be"
> probably ought to be "it is probably".

Sure, sgtm.

On Mon, Feb 26, 2018 at 1:22 AM, Tom Forbes <t...@tomforb.es> wrote:
> Personally I’m quite against changing that warning. I have only ever seen
> one application where the use of an in-database file is appropriate and they
> were using the FILESTREAM type in SQL Server which offers some pretty
> advanced semantics compared to other databases (more akin to Django’s file
> storage than a BLOB column).

I think you might be underestimating some good use cases of using
databases for binary files. Off the top of my head, some random examples:

- PDF attached to objects (identification forms, etc)
- small spreadsheets to review
- word documents
- user submitted source code

Those are small files that will have a small performance impact on
your database, and that will greatly benefit from having ACID
guarantees and no need for consistency synchronisation between the
database and the filesystem.
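
To make the ACID argument concrete, here is a minimal sketch using Python's standard-library sqlite3 module rather than Django itself. The `document` and `attachment` tables and the `save_document` helper are hypothetical names chosen for illustration. The point is only that a record and its file either both land or both disappear, with no filesystem cleanup step; in a Django project the same idea would typically be expressed with `transaction.atomic()` around the model saves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, owner TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE attachment ("
    " id INTEGER PRIMARY KEY,"
    " document_id INTEGER NOT NULL REFERENCES document(id),"
    " content BLOB NOT NULL)"
)
conn.commit()


def save_document(owner, content):
    """Insert a record and its file in one transaction.

    Returns True on success. On a constraint error both inserts are
    rolled back together, so no orphaned record is left behind.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute("INSERT INTO document (owner) VALUES (?)", (owner,))
            conn.execute(
                "INSERT INTO attachment (document_id, content) VALUES (?, ?)",
                (cur.lastrowid, content),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = save_document("alice", b"%PDF-1.4 small form")
failed = save_document("bob", None)  # violates NOT NULL on content

doc_count = conn.execute("SELECT COUNT(*) FROM document").fetchone()[0]
att_count = conn.execute("SELECT COUNT(*) FROM attachment").fetchone()[0]
print(ok, failed, doc_count, att_count)
```

With files on disk, the failed case would instead need explicit cleanup code to avoid a record pointing at a missing file, or a file with no record.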

> I’ve seen a lot of beginners use BLOB/byte fields where it’s really not
> needed and struggle with some insane performance issues due to it -
> especially with Django fetching all columns in a model by default.

That's a really good point I hadn't considered. Django fetching
everything by default is a big pitfall that requires being extra
careful, and people should definitely be made aware of that.
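
As a rough illustration of that pitfall, the sketch below again uses the standard-library sqlite3 module instead of Django, with a made-up `attachment` table and a hypothetical `fetched_bytes` helper. It compares how much data comes back from a `SELECT *` with a query that names only the small columns. In Django, the narrow query corresponds roughly to calling `defer("content")`, `only("id", "name")`, or `values("id", "name")` on the queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachment (id INTEGER PRIMARY KEY, name TEXT, content BLOB)"
)

payload = b"\x00" * (3 * 1024 * 1024)  # about 3 MB of binary data per row
conn.executemany(
    "INSERT INTO attachment (name, content) VALUES (?, ?)",
    [(f"file-{i}.pdf", payload) for i in range(5)],
)
conn.commit()


def fetched_bytes(query):
    """Return the total size of all text and blob values a query returns."""
    total = 0
    for row in conn.execute(query):
        for value in row:
            if isinstance(value, bytes):
                total += len(value)
            elif isinstance(value, str):
                total += len(value.encode())
    return total


# Fetching every column drags the blobs along even if only names are needed.
everything = fetched_bytes("SELECT * FROM attachment")
# Naming the small columns leaves the blobs in the database.
names_only = fetched_bytes("SELECT id, name FROM attachment")
print(everything, names_only)
```

The first query transfers every blob just to list five file names, which is the situation a default model query ends up in when a large binary column sits on the same table.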

> Also the
> link you gave (and thanks for linking, it’s an interesting read) is
> obviously Postgres specific, the issues you might face doing this are very
> vendor specific and non-portable - sqlite recommends against storing
> anything larger than 100kb in a row for example.

Sure, but it wouldn't be the only thing in the docs with a warning
that states "it might be a bad idea depending on your database
vendor". I don't think we should deter people from doing the things
their vendor empowers them to do, but rather make them aware of the
differences.

> I feel like the warning should implicitly say “do not do this, really don’t,
> but if you’re super super super sure you 100% need to then you’re going to
> disregard this warning anyway”, which the current one does quite well. To
> put it another way, if you’re at the point where you need to do this you’re
> way past reading the warning in the Django docs, and we should deter people
> who might make the wrong choice at the start.

But my whole point is that there *are* cases where beginners might be
scared by that warning although it would be easier and better for them
to use a database for what they are doing.

On Mon, Feb 26, 2018 at 2:22 AM, Adam Johnson <m...@adamj.eu> wrote:
> Did you know Facebook store their assets in MySQL, because it's the fastest
> replicated super-reliable thing to put them in?
> https://secure.phabricator.com/book/phabflavor/article/soon_static_resources/
> (near the end of 'Caches and Serving Content')

Oh, that's neat. Although I don't really expect beginners to have
Facebook-level problems distributing their assets. It definitely
showcases that it's not always a bad idea, though.

> The nice thing about leaving the warning as stern as it is is that
> anybody who is absolutely sure that they need to store files this way
> isn’t going to stop because of the warning to begin with;

Again, the point is that sometimes people are *not* absolutely sure,
but they would greatly benefit from doing what their DB vendor
empowers them to do. This is also the case for experienced people, and
I think we want to point them in the right direction to know more
rather than just dismiss the idea altogether.

> while
> weakening the warning will most assuredly lead to “Django is Slow” posts
> by newcomers that didn’t know SELECT * would be slow when there’s
> 3-5MB of data per row.

Well, sure, if your goal is to stop having people bug you, being
misleadingly dismissive is always a good idea. I'm not sure that's
what we should aim for, though. I'd rather inform people of the
drawbacks so that they can make an informed choice, even if that means
having to deal with people making the wrong choice with good advice.

After reading all your (very interesting) comments, I'd like to update
my documentation suggestion:

> Although you might think about storing files in the database, consider
> that it is probably a bad design choice. This field is not a
> replacement for proper static files handling.
>
> There might be some edge cases where you do want the guarantees that
> the database offers you for small binary files, depending on your
> database vendor. Be sure to be aware of the general and
> vendor-specific trade-offs and limitations[1][2] before you decide to
> do so.

> You should also consider another performance pitfall: Django fetches
> all columns of a row by default, and thus requires extra care if you
> create large rows. You should properly limit the amount of data
> fetched from the table by using values() when needed.

> [1]: https://wiki.postgresql.org/wiki/BinaryFilesInDB
> [2]: https://www.sqlite.org/intern-v-extern-blob.html

Cheers,

--
Antoine Pietri