Digging through the tickets, it appears we have a few near-duplicates
requesting some form of binary storage inside the database. Whilst I'm
-0 (RDBMSes are not for binary data!), I can see the use for them
in some circumstances (#2417 lists a few), and these are likely to be
a fairly commonly requested feature.
We have an ancient ticket at #250 from Jacob detailing the need for
this, #652 wants an upload-to-database facility for Image- and
FileFields. Finally, there's a patch in #2417 which implements a small
binary field which JKM says is "very very good". Marc Fargas has added
docs, etc.
So - if we do want a BinaryField we could use #2417 and make it
suitable for larger binary stores (e.g. the VARBINARY used for MySQL
has a max length of 255 bytes - perfect for the small bin. chunks
wanted in #2417, but not for larger data), and then hook it up to
Image/FileFields for #652.
An alternate solution is to check in #2417 for small binary chunks,
and then hold 652 back until we decide if we want a LargeBinaryField
for large binary chunks suitable for file uploads.
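To make the backend dependence concrete: a field along these lines has to pick a different column type per database, which is exactly where the small/large split shows up. The mapping below is purely illustrative (the names are made up, not taken from the #2417 patch):

```python
# Hypothetical per-backend column types for a small vs. a large binary
# field. Illustrative only -- not the actual #2417 implementation.
SMALL_BINARY_COLUMN = {
    'mysql': 'VARBINARY(255)',   # the historic 255-byte cap mentioned above
    'postgresql': 'bytea',       # bytea has no comparable small limit
    'sqlite3': 'BLOB',           # SQLite blobs can be any length
}

LARGE_BINARY_COLUMN = {
    'mysql': 'LONGBLOB',         # large binary type on MySQL
    'postgresql': 'bytea',
    'sqlite3': 'BLOB',
}
```

A single BinaryField can only paper over these differences up to the smallest backend limit, which is why the thread keeps splitting the feature in two.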
Relevant Tickets:
http://code.djangoproject.com/ticket/250
http://code.djangoproject.com/ticket/652
http://code.djangoproject.com/ticket/2417
Cheers,
Simon
IMO the patch for #2417 is sub-optimal, as it:
1) subclasses CharField, and
2) does not provide an intelligent manipulator.
The solution to this problem is to provide a form upload field with
the addition of a checkbox to signify deletion of the currently saved
binary data. There is no sensible way to SHOW the current binary data
- we can leave that up to the application designer.
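The save logic that checkbox implies can be sketched framework-free; the function name and signature here are purely illustrative:

```python
def resolve_binary_value(current, uploaded, delete_checked):
    """Decide what ends up in the binary column after a form submit:
    a new upload replaces the stored data, the delete checkbox clears
    it, and otherwise the current value is left untouched."""
    if uploaded is not None:
        return uploaded        # a new upload wins, even if delete is ticked
    if delete_checked:
        return None            # user asked to drop the stored binary data
    return current             # no change requested
```

An intelligent manipulator would feed the upload field and the checkbox value through something like this before saving the model.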
For example, the full-history branch uses pickle (cPickle) to serialize
the data of an object. pickle knows multiple protocols for storing the
data. By default it creates an ASCII dump, but you can change this
behaviour.
From the docs (http://docs.python.org/lib/node316.html):
"By default, the pickle data format uses a printable ASCII
representation. This is slightly _more voluminous_ than a binary
representation."
Besides that, the other protocols provide some optimization:
"Protocol version 2 was introduced in Python 2.3. It provides much _more
efficient_ pickling of new-style classes."
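The size difference is easy to see with any new-style class (exact byte counts vary by Python version, so none are claimed here):

```python
import pickle

class Point(object):           # new-style class, which protocol 2 handles best
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(3, 4)
ascii_dump = pickle.dumps(p, protocol=0)   # the printable-ASCII default
binary_dump = pickle.dumps(p, protocol=2)  # compact binary form

# The ASCII representation is the more voluminous of the two, and both
# round-trip back to an equivalent object.
assert len(ascii_dump) > len(binary_dump)
assert pickle.loads(binary_dump).x == 3
```

Either way, the serialized output is bytes, which is precisely the kind of data a BinaryField would have to hold.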
Just my 2 cents,
David Danier
To make a case for binary data - how about when you want to store a
small image for a UserProfile.
Saving to the local file-system doesn't work when you are clustering
your application servers.
You can have a TextField and save the data as base64, or save the data
as binary. In Postgres, (large object) binary data is not saved inside
the table itself - only a reference is saved - and I'm not sure whether
that kind of data can be replicated through Slony-I or similar.
> To make a case for binary data - how about when you want to store a
> small image for a UserProfile.
>
> Saving to the local file-system doesn't work when you are clustering
> your application servers.
It depends on the FS you're using.
To get round this problem I am using NFS. Do you have any other
suggestions?
Either way, I still think it would be nice to be able to store binary
blobs directly via models.
Hmm, you could do some tests using GFS.
> Either way, I still think it would be nice to be able to store binary
> blobs directly via models.
I agree with you. However, Django must be database independent - are
SQLite, MySQL and the others able to handle binary data? One solution
is to write a Field that converts everything _to_ base64 on write and
everything _from_ base64 on read. That way it doesn't matter which DB
you are using.
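A minimal sketch of that idea, independent of any particular Field API (the helper names are made up):

```python
import base64

def encode_for_db(raw_bytes):
    """Turn arbitrary bytes into ASCII-safe text for a TextField."""
    return base64.b64encode(raw_bytes).decode('ascii')

def decode_from_db(stored_text):
    """Recover the original bytes on read."""
    return base64.b64decode(stored_text)

blob = b'\x89PNG\r\n\x1a\n'   # e.g. the first bytes of a PNG header
assert decode_from_db(encode_for_db(blob)) == blob
```

The cost is the usual base64 overhead: the stored text is roughly a third larger than the raw data, but it survives any backend that can store text.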
+1 on having a BinaryField. I'd actually like to see BinaryField be
the "larger" binary field, and have a SmallBinaryField alongside for
databases with those types.
-1 on allowing File/ImageField to be stored in the database. That's
bad design 99% of the time, and will needlessly complicate file upload
code.
Jacob
For big applications it's especially important not to have their DB
serving static files. One of the first pieces of optimization advice
that comes with Django is:
use a separate web server for serving static files, and
use a separate box for your DB.
This would be doing the exact opposite - putting the load of serving
static files onto the already busy DB server.
>
> An argument for supporting Image/Field on DB:
> Consider a case of multiple frontends with a big big database, having
> File and Image fields on filesystem forces you to keep the filesystem in
> sync among frontends. Now imagine you upload a file which is, e.g., the image
> for an article; the article is inserted into the database and the file onto the
> filesystem. All frontends will **immediately** show the article, but
> only one will have the image! unless you start playing around with NFS or
> other networked filesystems.
A network filesystem or NAS is the 'right solution' here; even if NFS
is not the best solution performance-wise (NFS is slow), it's still WAY
faster than the DB and can be on a separate box. Again, this is
especially important for large projects that need their DB server to be
unburdened by static junk.
> It can also be a bit messy to do Point In Time recoveries; with everything
> in the database you can do a nice PITR without any trouble, but if there are
> things on the filesystem you must make sure both get recovered to the
> same point in time, and it's rare to see filesystems backed up
> **permanently**, while point-in-time recoveries in databases (at least
> PostgreSQL) are heavily documented and a good resource for some kinds of
> applications.
OK, good point here, but it's a very minor issue; basically only
media-rich servers will use this, and they would completely kill their
DB server with the traffic (if it's so high that ''rsync'' cannot be
used every 2 minutes).
> Third case; Imagine having one single directory holding a project but you
> run multiple instances of it over different databases (yes, doing tricky
> things to settings), having things on the filesystem makes things a bit
> harder.
Just put the dir in the settings as well - I see no point here.
>
> I'm +1 on providing database backed File and Image fields while heavily
> discouraging it's use on the documentation by providing clear examples of
> the 99% and 1% sides of the thing so users are aware of which storage method
> should they choose.
>
> Also +1 on the BinaryField; then at least if one **really** needs to
> store things in the DB it can be done :)
Even though you raised some good arguments, I still believe BLOBs are
mostly evil and will be misused.
That said, I am -0 on this; it's inviting people to shoot themselves in the foot.
I would be +1 on a BIG RED SIGN saying not to use it unless you REALLY
KNOW what you are doing... ;)
--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585
All of your arguments are based on the assumption that files would be
served from the database.
The only sane environment would be to store the files in the database
as the reference copy, but serve them from a filesystem/memcache/Squid
cache. Only when the cached resource expires do you make a round-trip
to the database again.
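That round-trip pattern can be modelled in a few lines; the class below is a toy in-process stand-in (a real deployment would use memcached or Squid in front of the DB), with made-up names:

```python
import time

class CachedBlobServer:
    """Serve binary data from a cache; only a missing or expired entry
    triggers a round-trip to the database."""

    def __init__(self, fetch_from_db, ttl=300):
        self.fetch_from_db = fetch_from_db   # the expensive DB round-trip
        self.ttl = ttl                       # seconds a cached copy lives
        self._cache = {}                     # key -> (expires_at, data)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > time.time():
            return entry[1]                  # cache hit: DB not touched
        data = self.fetch_from_db(key)       # miss or expired: hit the DB
        self._cache[key] = (time.time() + self.ttl, data)
        return data
```

With this shape, the DB holds the authoritative copy while almost all read traffic is absorbed by the cache, which is the point being argued above.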
GFS should be better.
The second issue - a (say) BinaryStorageField for large binary data
hooked up to Image/File uploads - seems to be one that's wanted by a
few people and strongly disliked by a number of others (about 50/50),
so I've marked it as wontfix for the time being. As Todd suggested, it
shouldn't be too hard to implement yourself if you do need this
functionality; however, if someone wants to write up a proposal /
initial code for this, then please do so and re-open #652.
-- Simon
I think Jacob expressed a preference for BinaryField allowing lots of
data and SmallBinaryField allowing the user to set the max size with a
parameter, and I think that's probably the right way to go to remain
consistent with what's already there.
Todd