Support for a binary storage field?

Simon G.

Mar 26, 2007, 6:40:24 AM
to Django developers
Hi folks.

Digging through the tickets, it appears we have a few near-duplicates
requesting some form of binary storage inside the database. Whilst I'm
-0 (RDBMSes are not for binary data!), I can see the use for it
in some circumstances (#2417 lists a few), and it is likely to be
a fairly commonly requested feature.

We have an ancient ticket at #250 from Jacob detailing the need for
this, and #652 wants an upload-to-database facility for Image- and
FileFields. Finally, there's a patch in #2417 which implements a small
binary field which JKM says is "very very good". Marc Fargas has added
docs, etc.

So - if we do want a BinaryField we could use #2417 and make it
suitable for larger binary stores (e.g. the VARBINARY used for MySQL
has a max length of 255 bytes - perfect for the small bin. chunks
wanted in #2417, but not for larger data), and then hook it up to
Image/FileFields for #652.

An alternate solution is to check in #2417 for small binary chunks,
and then hold #652 back until we decide if we want a LargeBinaryField
for large binary chunks suitable for file uploads.

Relevant Tickets:

http://code.djangoproject.com/ticket/250
http://code.djangoproject.com/ticket/652
http://code.djangoproject.com/ticket/2417

Cheers,
Simon

Noah Slater

Mar 26, 2007, 6:48:58 AM
to Django developers
Whether you think RDBMSes are for binary data is mostly by the by.

IMO the patch for #2417 is sub-optimal as it:

1) Subclasses the Char field.
2) Does not provide an intelligent manipulator.

The solution to this problem is to provide a form upload field with
the addition of a checkbox to signify deletion of the currently saved
binary data. There is no sensible way to SHOW the current binary data
- we can leave that up to the application designer.
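
A minimal sketch of the upload-plus-checkbox logic proposed above, in plain Python rather than as a real manipulator. The function name and its parameters are hypothetical, and letting a new upload win over a ticked checkbox is an assumption, not something the proposal specifies.

```python
def resolve_binary_value(current, uploaded=None, delete=False):
    """Decide what to store after a form submission.

    current  -- bytes already saved, or None
    uploaded -- bytes from the upload field, or None if nothing was sent
    delete   -- True if the "delete" checkbox was ticked
    """
    if uploaded is not None:
        return uploaded      # a new upload replaces whatever was stored
    if delete:
        return None          # checkbox ticked and no upload: clear the field
    return current           # nothing submitted: leave the data untouched
```

The current data is never rendered back into the form, which matches the point that there is no sensible way to show it.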

David Danier

Mar 26, 2007, 6:59:18 AM
to django-d...@googlegroups.com
I think a BinaryField could even help reduce the amount of data saved in
the database: some binary data fits well in the database, but currently
has to be saved as ASCII (converted, or by using a different format).

For example, the full-history branch uses pickle (cPickle) to serialize
the data of an object. pickle knows multiple methods to store the data.
By default it creates an ASCII dump, but you can change this behavior.
From the docs (http://docs.python.org/lib/node316.html):
"By default, the pickle data format uses a printable ASCII
representation. This is slightly _more voluminous_ than a binary
representation."
Besides, the other methods provide some optimization:
"Protocol version 2 was introduced in Python 2.3. It provides much _more
efficient_ pickling of new-style classes."
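
A rough sketch of the size difference those quotes describe, using only the standard `pickle` module. The `record` dict is a made-up stand-in for a serialized object; how much is saved depends heavily on the data (integers shrink a lot, short strings hardly at all).

```python
import pickle

# Hypothetical history record; any picklable object would do.
record = {"object_id": 42, "title": "Hello", "votes": list(range(100))}

# Protocol 0 is the printable ASCII format, so it fits in a text column.
ascii_dump = pickle.dumps(record, protocol=0)

# Protocol 2 is a binary format; it needs a column that accepts raw bytes.
binary_dump = pickle.dumps(record, protocol=2)

print(len(ascii_dump), len(binary_dump))
```

The binary dump begins with a non-ASCII byte, which is why it cannot simply be dropped into a text field without an encoding step.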

Just my 2 cents,
David Danier

Noah Slater

Mar 26, 2007, 7:14:58 AM
to Django developers
Saving binary data as printable ASCII seems extremely hack^H^H^H
suboptimal to me.

To make a case for binary data - how about when you want to store a
small image for a UserProfile?

Saving to the local file-system doesn't work when you are clustering
your application servers.

mario__

Mar 26, 2007, 9:45:25 AM
to Django developers
On 26 mar, 07:14, "Noah Slater" <nsla...@gmail.com> wrote:
> Saving binary data as printable ASCII seems extremely hack^H^H^H
> suboptimal to me.
>

You can have a TextField and save the data as base64, or save the data
as binary. In Postgres, binary data stored as a large object is not
saved inside the table itself; only a reference is saved, and I'm not
sure whether this kind of data can be replicated through slony-1 or
similar.

> To make a case for binary data - how about when you want to store a
> small image for a UserProfile.
>
> Saving to the local file-system doesn't work when you are clustering
> your application servers.

It depends on the FS you're using.

Noah Slater

Mar 26, 2007, 10:04:16 AM
to Django developers

> It depends on the FS you're using.

To get round this problem I am using NFS. Do you have any other
suggestions?

Either way, I still think it would be nice to be able to store binary
blobs directly via models.

mario__

Mar 26, 2007, 10:18:21 AM
to Django developers
On 26 mar, 10:04, "Noah Slater" <nsla...@gmail.com> wrote:
> > It depends on the FS you're using.
>
> To get round this problem I am using NFS. Do you have any other
> suggestions?
>

Hmm, you could do some tests using GFS.

> Either way, I still think it would be nice to be able to store binary
> blobs directly via models.

I agree with you. However, Django must be database independent, so are
SQLite, MySQL and the others able to handle binary data? One solution is
to write a Field that converts everything _to_ base64 on write and
_from_ base64 on read. That way it doesn't matter which database you
are using.
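
A minimal sketch of that write/read conversion with the standard `base64` module. The two helper names are hypothetical and this is not a Django Field, just the encoding step such a field would wrap.

```python
import base64


def to_db_value(raw):
    """Encode raw bytes as ASCII text that any text column can hold."""
    return base64.b64encode(raw).decode("ascii")


def from_db_value(text):
    """Decode the stored base64 text back into the original bytes."""
    return base64.b64decode(text.encode("ascii"))


payload = bytes(range(256)) * 4   # 1024 bytes of arbitrary binary data
stored = to_db_value(payload)

# base64 emits 4 characters for every 3 input bytes, so the stored text
# is roughly a third larger than the original data.
print(len(payload), len(stored))
```

The trade-off is portability against size: the text works on any backend, but every value costs about 33% more storage than a native binary column would.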

Jacob Kaplan-Moss

Mar 26, 2007, 11:03:38 AM
to django-d...@googlegroups.com
On 3/26/07, Simon G. <d...@simon.net.nz> wrote:
> So - if we do want a BinaryField we could use #2417 and make it
> suitable for larger binary stores (e.g. the VARBINARY used for MySQL
> has a max length of 255 bytes - perfect for the small bin. chunks
> wanted in #2417, but not for larger data), and then hook it up to
> Image/FileFields for #652.
>
> An alternate solution is to check in #2417 for small binary chunks,
> and then hold #652 back until we decide if we want a LargeBinaryField
> for large binary chunks suitable for file uploads.

+1 on having a BinaryField. I'd actually like to see BinaryField be
the "larger" binary field, and have a SmallBinaryField alongside for
databases with those types.

-1 on allowing File/ImageField to be stored in the database. That's
bad design 99% of the time, and will needlessly complicate file upload
code.

Jacob

Todd O'Bryan

Mar 26, 2007, 11:22:15 AM
to django-d...@googlegroups.com
On Mon, 2007-03-26 at 10:03 -0500, Jacob Kaplan-Moss wrote:
> -1 on allowing File/ImageField to be stored in the database. That's
> bad design 99% of the time, and will needlessly complicate file upload
> code.
>
If people want to do it themselves, it's pretty easy to create a DBFile
model with name and data members, so leaving it out of core doesn't
preclude people from doing it if they're set on the idea. (I just want a
BinaryField; I'll take it however I can get it.)
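
A rough sketch of the "name and data" idea, written against the standard `sqlite3` module instead of a Django model so that it runs on its own. The `dbfile` table, its columns and both helper functions are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbfile (name TEXT PRIMARY KEY, data BLOB NOT NULL)")


def save_file(name, data):
    """Insert or overwrite the named file's contents."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO dbfile (name, data) VALUES (?, ?)",
            (name, sqlite3.Binary(data)),
        )


def load_file(name):
    """Return the stored bytes, or None if no such file exists."""
    row = conn.execute(
        "SELECT data FROM dbfile WHERE name = ?", (name,)
    ).fetchone()
    return bytes(row[0]) if row else None


save_file("avatar.png", b"\x89PNG\r\n\x1a\n" + b"\x00" * 16)
```

A model version would have the same two members; the BLOB column here plays the role a BinaryField would play.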

Marc Fargas Esteve

Mar 26, 2007, 7:01:39 PM
to django-d...@googlegroups.com
Hi,
If you provide a BinaryField, it's just a matter of time before "hacks" start to appear on blogs, the wiki or even django-users to get ImageField and FileField into the database (there's a hack for this already). Maybe it's bad 99% of the time, but if those fields are provided inside Django it will be much better than having lots of hackish workarounds.

And anyway, there's still the 1% of cases in which it's good design, normally big applications.

An argument for supporting Image/FileField in the DB:
    Consider a case of multiple frontends with a big, big database. Having File and Image fields on the filesystem forces you to keep the filesystem in sync among frontends. Now imagine you upload a file which is, e.g., the image for an article. The article is inserted into the database and the file onto the filesystem. All frontends will **immediately** show the article, but only one will have the image, unless you start playing around with NFS or other networked filesystems.

  It can also be a bit messy to do point-in-time recoveries. With everything in the database you can do a nice PITR without any trouble; if there are things on the filesystem, you must make sure both get recovered to the same point in time. It's rare to see filesystems backed up **permanently**, while point-in-time recoveries in databases (at least PostgreSQL) are heavily documented and a good resource for some kinds of applications.

  Third case: imagine having one single directory holding a project, but you run multiple instances of it over different databases (yes, doing tricky things to settings). Having things on the filesystem makes things a bit harder.

I'm +1 on providing database-backed File and Image fields while heavily discouraging their use in the documentation, by providing clear examples of the 99% and 1% sides of the thing so users are aware of which storage method they should choose.

Also +1 on the BinaryField; then at least if one **really** needs to store things in the DB it could be done :)

Cheers,
Marc

Honza Král

Mar 26, 2007, 7:26:59 PM
to django-d...@googlegroups.com
On 3/27/07, Marc Fargas Esteve <tele...@gmail.com> wrote:
> Hi,
> If you provide a BinaryField it's just a matter of time that "hacks" will
> start to go out on blogs, the wiki or even django-users to get ImageField
> and FileField on the database (there's a hack on this already), maybe it's
> 99% bad but if those fields are provided inside django it will be much
> better than having lots of hackish ways around.
>
> And anyway, there's still a 1% of cases on which it's good design, normally
> cases of big applications.

For big applications it's especially important not to have their DB
serving static files. Some of the first optimization advice that comes
with Django is:
use a separate web server for serving static files and
use a separate box for your DB

This would be doing the exact opposite - putting the load of serving
static files onto the already busy DB server.

>
> An argument for supporting Image/Field on DB:
> Consider a case of multiple frontends with a big big database, having
> File and Image fields on filesystem forces you to keep the filesystem in
> sync among frontends. Now imagine you upload a file which is i.e. the image
> for an article; The article is inserted on the database and the file on the
> filesystem. All frontends will **immediately** show up the article, but
> only one will have the image! unless you start playing around with NFS or
> other networked filesystems.

A network filesystem or NAS is the 'right solution' here. Even if NFS is
not the best solution performance-wise (NFS is slow), it's still WAY
faster than the DB and can be on a separate box. Again, this is important
especially for large projects that need their DB server to be
unburdened by static junk.

> It can also be a bit messy to do Point In Time recoveries, with everything
> on the database you can do a nice PITR without any trouble, if there are
> things on the filesystem you must make sure both things get recovered to the
> same point in time, and it's rare to see filesystems backed up
> **permanently** while point in time recoveries in databases (at least
> postgresql) are heavily documented and a good resource for some kind of
> applications.

OK, good point here, but it's a very minor issue. Basically only
media-rich servers will use this, and they would completely kill their
DB server with the traffic (if it's so high that ''rsync'' cannot be
used every 2 minutes).

> Third case; Imagine having one single directory holding a project but you
> run multiple instances of it over different databases (yes, doing tricky
> things to settings), having things on the filesystem makes things a bit
> harder.

Just put the DIR in the settings as well; I see no point here.

>
> I'm +1 on providing database backed File and Image fields while heavily
> discouraging its use on the documentation by providing clear examples of
> the 99% and 1% sides of the thing so users are aware of which storage method
> should they choose.
>
> Also +1 on the BinaryField, then at least if one **really** needs to store
> things on DB it could be done :)

Even though you raised some good arguments, I still believe BLOBs are
mostly evil, and will be misused.

That said, I am -0 on this. It's inviting people to shoot themselves in the foot.

I would be +1 on a BIG RED SIGN saying not to use it unless you REALLY
KNOW what you are doing... ;)

>
> Cheers,
> Marc
>
>

--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

Noah Slater

Mar 27, 2007, 5:17:49 AM
to Django developers
> This would be doing the exact opposite - putting the load of serving
> static files onto the already busy DB server.

All of your arguments are based on the assumption that files would be
served from the database.

The only sane setup would be to store the files in the database
as the reference copy but serve them from a filesystem/memcached/squid cache.
Only when the cached resource expires do you make a round-trip to the
database again.
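
A minimal sketch of that arrangement with a filesystem cache. The `cached_path` helper, its parameters and the `fake_fetch` stand-in for a database query are all hypothetical, and expiry is judged by the cached file's modification time, which is only one of several possible policies.

```python
import os
import tempfile
import time


def cached_path(name, fetch_from_db, cache_dir, max_age=300):
    """Return a filesystem path for `name`, refreshing it from the
    database only when the cached copy is missing or older than max_age."""
    path = os.path.join(cache_dir, name)
    try:
        fresh = time.time() - os.path.getmtime(path) < max_age
    except OSError:          # no cached copy yet
        fresh = False
    if not fresh:
        data = fetch_from_db(name)          # the only database round-trip
        fd, tmp = tempfile.mkstemp(dir=cache_dir)
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp, path)               # atomic swap into place
    return path


db_hits = []


def fake_fetch(name):
    """Stand-in for a real database query; records each call."""
    db_hits.append(name)
    return b"image bytes for " + name.encode("ascii")


cache_dir = tempfile.mkdtemp()
first = cached_path("avatar.png", fake_fetch, cache_dir)
second = cached_path("avatar.png", fake_fetch, cache_dir)
```

The second call is answered from disk, so the web server can serve the file from `cache_dir` without touching the database until the copy expires.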

mario__

Mar 27, 2007, 8:24:12 AM
to Django developers
On Mar 26, 7:26 pm, "Honza Král" <honza.k...@gmail.com> wrote:
>
> network filesystem or NAS is the 'right solution' here, even if NFS is
> not the best solution performance-wise (NFS is slow) it's still WAY
> faster than DB and can be on a separate box. again, this is important

GFS should be better.

>

Simon G.

Mar 28, 2007, 6:57:59 AM
to Django developers
Ok, my reading of the general consensus here is that everyone thinks a
BinaryField is a good idea, so I've kept #2417 as accepted, with the
patch needing a few improvements.

The second issue - a (say) BinaryStorageField for large bin. data
hooked up to Image/File uploads - seems to be one that's wanted by a few
people and strongly disliked by a number of others (about 50/50), so
I've marked #652 as wontfix for the time being. As Todd suggested, it
shouldn't be too hard to implement yourself if you do need this
functionality. However, if someone wants to write up a proposal /
initial code for this, then please do so and re-open #652.

-- Simon

Todd O'Bryan

Mar 28, 2007, 8:17:47 AM
to django-d...@googlegroups.com
On Wed, 2007-03-28 at 03:57 -0700, Simon G. wrote:
> Ok, my reading of the general consensus here is that everyone thinks a
> BinaryField is a good idea, so I've kept #2417 as accepted, with the
> patch needing a few improvements.
>
Is this BinaryField the BLOB variety or the BINARY/VARBINARY variety?

I think Jacob expressed a preference for BinaryField allowing lots of
data and SmallBinaryField allowing the user to set the max size with a
parameter, and I think that's probably the right way to go to remain
consistent with what's already there.

Todd
