how to secure documents in server

RAZZ

Jul 18, 2008, 6:05:52 AM
Hello, can anyone suggest a solution?

I need to manage different types of documents (doc, xls, ppt, etc.) on a
server. I have a folder structure to hold these documents on the server.

Say folder1 holds all the doc files, folder2 holds all the xls files,
and so on.

These documents should not be accessible by typing the path directly
into the URL. E.g. if I go directly to www.mywebsite.com/folder1/xyz.doc,
the document opens in the browser itself.
The documents should be accessible only through our website, once the
user has logged in. But right now anyone who knows the path can get the
documents without logging in. How should I prevent that?

How can I secure these documents on the server?

GArlington

Jul 18, 2008, 6:08:59 AM

Depending on the web server, you should look at .htaccess for Apache or
httpd.ini for IIS...

RAJ

Jul 18, 2008, 6:14:11 AM

Well, we are using a Yahoo server, and it doesn't allow developers to
upload or modify .htaccess.
So is there any other way? I just want the doc or xls files not to
open directly unless the person has properly logged in.

Captain Paralytic

Jul 18, 2008, 6:32:14 AM
> not be able to open directly unless person has properly login.

You're not going to be able to do much on a Yahoo server, I'm afraid. The
most common way to do this is to store the files outside of the web
root and use a PHP script to deliver the file.

I suggest you change hosts. There are much better value ones out there.

RAZZ

Jul 18, 2008, 6:36:39 AM

> You're not going to be able to do much on yahoo server I'm afraid. The
> most common way to do this is to store the files outside of the web
> root and use a php script to deliver the file.
>
> I suggest you change hosts. There are much better value ones out there.

Thank you for the response. Can you explain in a bit more detail what
you mean by "storing files outside of the web root and use a php script
to deliver the file"?

Captain Paralytic

Jul 18, 2008, 6:50:59 AM

Actually, another way to do it is to store the files in a BLOB field in
a database and deliver them from there. Here is a tutorial for that,
and you could adapt it for the file system version:
http://www.php-mysql-tutorial.com/php-mysql-upload.php

RAZZ

Jul 18, 2008, 7:31:16 AM

That was a really good option, but I have doc files that contain images
and tables. After downloading, the text is fine, but the images and
tables come out in some kind of encrypted-looking format?

Captain Paralytic

Jul 18, 2008, 7:41:16 AM

I don't understand??? What difference does it make what the document
contains? A binary file is a binary file is a binary file! It can
contain anything whatsoever???

The Natural Philosopher

Jul 18, 2008, 11:10:36 AM

Put ALL these documents as large BLOB objects in a database: that's one
easy place to store them, and only one access method is needed to
restrict access to what you want.

The Natural Philosopher

Jul 18, 2008, 11:11:59 AM


As long as they are encapsulated IN the file, that doesn't matter. A
database will store any file.

Bart Van der Donck

Jul 18, 2008, 11:52:47 AM
Captain Paralytic wrote:

> Actually another way to do it is to store the files in a BLOB field in
> a database and delivering them from there. Here is a tutorial for that
> and you could adapt it for the file system version:
> http://www.php-mysql-tutorial.com/php-mysql-upload.php

I'm surprised this document doesn't mention how disastrous it can be
for the performance of a database. Only use it for tiny binary data and a
limited number of records, I'd say... I would even vote to dismiss
LONGBLOB; it often creates more problems than it solves.

--
Bart

Pugi!

Jul 18, 2008, 1:05:20 PM
I do not know the details of your provider or host, but if you can
store your documents outside of your document root, no one can access
your files directly. You can use PHP to store and retrieve them.
I store the filename and MIME type in a database (along with some other
information); the files are stored in a directory outside the document
root where Apache has read/write access (because users are allowed to
upload documents). In my case the documents are even stored on another
server via an NFS share. Once you have obtained the filename and MIME
type from the database and the path from the config file:

// $filemime and $filename come from the database, $filepath from the config file.
header('Cache-Control: max-age=60');
header('Content-Type: ' . $filemime);
header('Content-Disposition: attachment; filename="' . $filename . '"');
readfile($filepath . $filename);

It not only downloads the file but also asks if you want to open it
with the associated program (MS Word or OO Writer for *.doc, ...)
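
A slightly fuller sketch of the same idea, with the login check the original question asks for. The storage directory, the `file` query parameter and the `user_id` session key are assumptions made for illustration, not details from this thread; the MIME type would normally come from the database as described above.

```php
<?php
// Hypothetical location outside the document root; adjust to your host.
const STORAGE_DIR = '/home/example/private_docs';

/**
 * Resolve a requested name to a path inside $storageDir, or null if the
 * name is empty, tries to escape the directory, or the file does not exist.
 */
function resolveDocument(string $storageDir, string $requested): ?string
{
    // basename() drops any directory part, so "../../etc/passwd" becomes "passwd".
    $name = basename($requested);
    if ($name === '' || $name === '.' || $name === '..') {
        return null;
    }
    $path = rtrim($storageDir, '/') . '/' . $name;
    return is_file($path) ? $path : null;
}

/** Send the file as a download; call only after the login check has passed. */
function sendDocument(string $path, string $mime): void
{
    header('Content-Type: ' . $mime);
    header('Content-Disposition: attachment; filename="' . basename($path) . '"');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}

// Handle a request only when running under a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    session_start();
    // 'user_id' is an assumed session key set by your own login code.
    if (empty($_SESSION['user_id'])) {
        http_response_code(403);
        exit('Please log in first.');
    }
    $requested = isset($_GET['file']) && is_string($_GET['file']) ? $_GET['file'] : '';
    $path = resolveDocument(STORAGE_DIR, $requested);
    if ($path === null) {
        http_response_code(404);
        exit('No such document.');
    }
    // A generic type is used here; a stored MIME type would be better.
    sendDocument($path, 'application/octet-stream');
}
```

Because the files live outside the document root, a URL such as www.mywebsite.com/folder1/xyz.doc no longer maps to anything; the only route to a document is through this script.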

Pugi!

J.O. Aho

Jul 18, 2008, 1:47:00 PM
RAZZ wrote:
> Hello, Can anyone suggest me solution?

Others have already mentioned storing the files outside the web
server's "document root"; this is the most secure method (depending, of
course, on the security of the script/application that supplies the file;
in the worst case this can endanger the security of the whole server).

.htaccess and similar web server restrictions have the drawback that not
every host offers them, and it is easy to get them wrong if you are
inexperienced with web server configuration.

The idea of storing binary files in a database is quite good, but it will
affect the SQL server negatively, especially as the binary files get
larger.

A fourth method is to encrypt the files and store them in the "document root";
the special download script then decrypts the file when it is downloaded by
someone with access. (This can be combined with all the other methods too.)
This way someone accessing the file directly can't use it.
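
A rough sketch of the encrypt-at-rest idea, assuming PHP's OpenSSL extension is available. The cipher (AES-256-GCM), the stored layout and the function names are choices made for this example only; where the 32-byte key is kept (somewhere outside the document root) is left open.

```php
<?php
const CIPHER = 'aes-256-gcm';

/** Encrypt file contents before writing them into the document root. */
function encryptDocument(string $plain, string $key): string
{
    $iv = random_bytes(12);            // 96-bit nonce, the usual size for GCM
    $tag = '';
    $cipherText = openssl_encrypt($plain, CIPHER, $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($cipherText === false) {
        throw new RuntimeException('Encryption failed');
    }
    return $iv . $tag . $cipherText;   // stored layout: nonce | tag | ciphertext
}

/** Decrypt inside the download script, after the access check. */
function decryptDocument(string $stored, string $key): string
{
    $iv = substr($stored, 0, 12);
    $tag = substr($stored, 12, 16);
    $cipherText = (string) substr($stored, 28);
    $plain = openssl_decrypt($cipherText, CIPHER, $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($plain === false) {
        throw new RuntimeException('Decryption failed (wrong key or corrupted file)');
    }
    return $plain;
}
```

Anyone who fetches the stored file by URL gets only the ciphertext; the download script calls decryptDocument() and sends the result with the usual headers.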

A much simpler way is to rename the files to something quite random (MD5-hash
the name; don't forget to salt it) and store the hashed filename in a database
table alongside the original filename. The download script then takes the
original filename as an argument, looks up the hashed name in the database,
and sends the file to the user (using a header to send it under the original
name). This way the file can't be fetched by direct download unless you know
the hashed filename. If you combine this with the previous method, you
should have a quite good false security on the files.
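
The renaming scheme could look roughly like this. Note two substitutions: a keyed SHA-256 hash (hash_hmac) stands in for the salted MD5 mentioned above, and a plain PHP array stands in for the database table. The salt value and function names are hypothetical.

```php
<?php
// Hypothetical secret salt; keep it outside the document root.
const NAME_SALT = 'replace-with-a-long-random-secret';

/** Derive the on-disk name for an uploaded file. */
function storedName(string $originalName, string $salt = NAME_SALT): string
{
    return hash_hmac('sha256', $originalName, $salt);
}

/** Stand-in for the database table: original name => hashed name. */
function registerFile(array &$table, string $originalName): string
{
    $table[$originalName] = storedName($originalName);
    return $table[$originalName];
}

/** What the download script does: look up the hashed name, or null if unknown. */
function lookupStoredName(array $table, string $originalName): ?string
{
    return $table[$originalName] ?? null;
}

$table = [];
$onDisk = registerFile($table, 'xyz.doc');
echo "xyz.doc is stored as $onDisk\n";
```

As the post says, this is obscurity rather than access control: anyone who learns the hashed name can still fetch the file directly.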

--

//Aho

The Natural Philosopher

Jul 18, 2008, 2:58:33 PM
J.O. Aho wrote:
>
> The idea of storing binary files in a database is quite good, but it
> will affect the sql server in a negative way, specially the larger the
> binary files are.
>
OK, why should it take longer to pull a large file out of one location in
a database than out of one location in a filesystem?

IME the things that slow databases down are not getting data out of
them; it's performing complex relational queries.

Paul Lautman

Jul 18, 2008, 3:36:53 PM
Bart Van der Donck wrote:
> Captain Paralytic wrote:
>
>> Actually another way to do it is to store the files in a BLOB field
>> in a database and delivering them from there. Here is a tutorial for
>> that and you could adapt it for the file system version:
>> http://www.php-mysql-tutorial.com/php-mysql-upload.php
>
> I'm surprised this document doesn't mention how disastrous it can be
> for the performance of a database.
It doesn't because it isn't

> Only use for tiny binary data and a
> limited amount of records, I'ld say... I would even vote to dismiss
> LONGBLOB; it often creates more problems than it solves.

I usually chunk the files into BLOBs
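
The post gives no details, so the following is only one guess at what chunking might look like: the file is cut into fixed-size pieces, each of which would become one BLOB row keyed by (file id, sequence number). The 64 KiB chunk size and the function names are assumptions.

```php
<?php
const CHUNK_SIZE = 65536; // assumed chunk size: 64 KiB per row

/** Split file contents into a list of chunks, indexed by sequence number. */
function splitIntoChunks(string $data, int $chunkSize = CHUNK_SIZE): array
{
    if ($data === '') {
        return [];
    }
    return str_split($data, $chunkSize);
}

/** Reassemble the chunks in sequence order, as a download script would. */
function joinChunks(array $chunks): string
{
    ksort($chunks);
    return implode('', $chunks);
}
```

On download, the rows would be selected ordered by sequence number and echoed one after another, so the whole file never has to sit in a single BLOB value.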

Michael Fesser

Jul 18, 2008, 3:38:04 PM
.oO(The Natural Philosopher)

>J.O. Aho wrote:
>>
>> The idea of storing binary files in a database is quite good, but it
>> will affect the sql server in a negative way, specially the larger the
>> binary files are.
>>
>Ok, why should it take longer to pull a large file out of one locatin in
>a database than one location in a filesssytem?

Just think about what steps are required in order to get a file 1) from
a DBMS, 2) from a location outside the doc root, 3) directly with a URL:

1. Storage file -> DBMS -> Socket -> Script -> Webserver -> Browser
2. File -> Script -> Webserver -> Browser
3. File -> Webserver -> Browser

>IME the things that slow databases down are not getting data out of
>them, its performing complex relational queries.

The DB also has to access the disk. Additional overhead is caused by the
SQL processing itself and the transfer of the data to the requesting
script.

Micha

Paul Lautman

Jul 18, 2008, 3:48:48 PM

I have tested this and I have found it slightly slower to get files from a
database table than from the file system. Then again, it is slightly slower
building pages dynamically with php/MySQL than it is to serve fixed html
pages. So basically, when I find that storing files in a database is the
best way to handle the application I am writing, that's the way I do it.


Jerry Stuckle

Jul 18, 2008, 4:09:46 PM
Bart Van der Donck wrote:

You're just using the database for what it's made for - storing and
accessing data. It's not at all disastrous - in fact, if you get enough
files in the database, performance may actually improve over the file
system's.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstu...@attglobal.net
==================

Jerry Stuckle

Jul 18, 2008, 4:12:07 PM
J.O. Aho wrote:
> RAZZ wrote:
>> Hello, Can anyone suggest me solution?
>
> There been those who already mentioned to store the files outside the
> web servers "document root", this is the most secure method (of course
> depending on the security of the script/application that supplies the
> file, in worst case this can endanger the security of the whole server).
>
> .htaccess and similar web server restrictions has the draw back that not
> everyone offers this and it can be easy to do it the wrong way when
> unexperienced with web server configuration.
>
> The idea of storing binary files in a database is quite good, but it
> will affect the sql server in a negative way, specially the larger the
> binary files are.
>

A common misconception by those who haven't used databases for storing
large amounts of data. Properly configured, the database will have
excellent performance.

> A fourth method is to encrypt the files and store them in the "document
> root", and the special download script decodes the file when downloaded
> by someone with access to get the decrypted file. (this can be combined
> with all the other methods too), this way someone accessing the file
> directly can't use it.
>
> A lot simpler way is to rename the files to something quite random (md5
> hash the name, don't forget to salt it), store the hashed filename in a
> database table where you have the original filename too. The download
> script in this case will take an argument of the original filename, look
> in the database for the hashed name, provides the file to the user (with
> header you send it as the original name), this way you can't get the
> file with direct download unless you know the hashed file name. If you
> combine this one with the previous method, you should have a quite good
> false security on the files.
>
>
>

Even worse performance than storing the data in the database in the
first place. More overhead for the scripting language, while no
significant savings on the database end.

Jerry Stuckle

Jul 18, 2008, 4:14:07 PM

Paul,

But try putting 100K files in a directory on the file system and see how
much it slows things down. Whereas the database will hardly notice any
performance decrease.

Jorge

Jul 18, 2008, 4:17:09 PM
On Jul 18, 8:58 pm, The Natural Philosopher <a...@b.c> wrote:
> J.O. Aho wrote:
>
> > The idea of storing binary files in a database is quite good, but it
> > will affect the sql server in a negative way, specially the larger the
> > binary files are.
>
> Ok, why should it take longer to pull a large file out of one locatin in
> a database than one location in a filesssytem?
>

I think the point is that retrieving such a large data chunk from a db
might momentarily impact the performance of forthcoming db operations,
think about what happens to the sql database caches.

--Jorge.

Joost Diepenmaat

Jul 18, 2008, 4:23:20 PM
Jerry Stuckle <jstu...@attglobal.net> writes:

> But try putting 100K files in a directory on the file system and see
> how much it slows things down. Whereas the database will hardly
> notice any performance decrease.

That really depends on the filesystem. But yeah, most common file
systems don't like that. In any case, neither relational databases nor
normal file systems are optimized for this kind of use - especially
not if the blobs are large.

In other words, your mileage may vary. See also
http://perspectives.mvdirona.com/2008/06/30/FacebookNeedleInAHaystackEfficientStorageOfBillionsOfPhotos.aspx

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/

The Natural Philosopher

Jul 18, 2008, 4:23:29 PM

Yes, but that is pretty insignificant compared with, e.g., the disk speed
issues and download bandwidth, which are the same in both cases.

> Micha

Bart Van der Donck

Jul 18, 2008, 4:25:43 PM
Jerry Stuckle wrote:

> Bart Van der Donck wrote:
>
>> Captain Paralytic wrote:
>
>>> http://www.php-mysql-tutorial.com/php-mysql-upload.php
>
>> I'm surprised this document doesn't mention how disastrous it can be
>> for the performance of a database. Only use for tiny binary data and a
>> limited amount of records, I'ld say... I would even vote to dismiss
>> LONGBLOB; it often creates more problems than it solves.
>

> You're just using the database for what it's made for - storing and
> accessing data.  It's not at all disastrous - in fact, if you get enough
> files in the database, performance may actually improve over that file
> system's.

I would be interested to see some articles or benchmarks about this
issue. Got any? In my experience I've always encountered the opposite
(with MySQL and MS Access): performance decreases dramatically with
larger BLOBs. I'm working with many GBs of pictures, for which I store
nothing in tables (ID of the record = name of the picture; the
application ties pics to IDs). I've had good experiences with this
approach, even under heavy load. But I'm always interested to
learn how this strategy could be improved.

--
Bart

Paul Lautman

Jul 18, 2008, 4:39:48 PM
Jerry Stuckle wrote:
> Paul Lautman wrote:
>> The Natural Philosopher wrote:
>>> J.O. Aho wrote:
>>>> The idea of storing binary files in a database is quite good, but
>>>> it will affect the sql server in a negative way, specially the
>>>> larger the binary files are.
>>>>
>>> Ok, why should it take longer to pull a large file out of one
>>> locatin in a database than one location in a filesssytem?
>>>
>>> IME the things that slow databases down are not getting data out of
>>> them, its performing complex relational queries.
>>
>> I have tested this and I have found it slightly slower to get files
>> from a database table than from the file system. Then again, it is
>> slightly slower building pages dynamically with php/MySQL than it is
>> to serve fixed html pages. So basically, when I find that storing
>> files in a database is the best way to handle the application I am
>> writing, that's the way I do it.
>>
>>
>
> Paul,
>
> But try putting 100K files in a directory on the file system and see
> how much it slows things down. Whereas the database will hardly
> notice any performance decrease.

I have always found it slightly slower to get the equivalent file from the
database rather than from the file system. But as I say, it doesn't bother
me. If the application is generally better with the files in a database,
that's where they go. If the application is easier with them on disk, then I
put them there. Likewise, if something works better with static html pages I
will use them. When it comes down to it, we have a vast range of
technologies at our disposal. I look upon my role as being good at picking
the right one for the right task. There is always a balance to be struck
between speed of processing, functionality, ease of maintenance, ...


Jerry Stuckle

Jul 18, 2008, 8:33:31 PM

Over 20 years of experience doing it, starting with DB2 on mainframes.

But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

BTW - benchmarks tell exactly one thing - how a database runs UNDER
THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
more.

With that said, under live conditions, I've seen virtually no slowdown
when accessing blob data in a database. And in some cases it actually
runs faster.

Jerry Stuckle

Jul 18, 2008, 8:34:58 PM

Yes, but with that many files in a directory, even Linux slows down
quite a bit. It isn't made to handle that many different files.

But for a good database, you're just starting.

Jerry Stuckle

Jul 18, 2008, 8:35:47 PM

Not at all, if the database is properly configured.

The Natural Philosopher

Jul 19, 2008, 1:38:45 AM
Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
is to.."

Advantages of the database...

- one point backup of all data
- definitely not directly accessible via HTTP
- has much better indexing and searching than a flat file system in a
directory.
- possibly simpler integration with other bits of data associated with the
file to be served (i.e. you MIGHT want a description of what it is).

On the downside, it's a few more machine cycles and possibly a lot more
RAM to serve it up.


HOWEVER it is perfectly possible to have a separate database, even on a
separate machine, to do the serving, if it gets too onerous.

Paul Lautman

Jul 19, 2008, 3:32:57 AM
The Natural Philosopher wrote:
> Advantages of the database...
>
> - one point backup of all data
> - definitely not directly accessible via HTML
> - has much better indexing and searching than a flat file system in a
> directory.
> - possibly simpler integration with other bits of data assciated with
> te file to be served )i.e. you MIGHT want a decsription of what it
> is).
Also, and this is the bit I really like, when you delete the record the file
automatically goes with it.


Paul Lautman

Jul 19, 2008, 3:36:15 AM
Jerry Stuckle wrote:
> Paul Lautman wrote:
>> The Natural Philosopher wrote:
>>> J.O. Aho wrote:
>>>> The idea of storing binary files in a database is quite good, but
>>>> it will affect the sql server in a negative way, specially the
>>>> larger the binary files are.
>>>>
>>> Ok, why should it take longer to pull a large file out of one
>>> locatin in a database than one location in a filesssytem?
>>>
>>> IME the things that slow databases down are not getting data out of
>>> them, its performing complex relational queries.
>>
>> I have tested this and I have found it slightly slower to get files
>> from a database table than from the file system. Then again, it is
>> slightly slower building pages dynamically with php/MySQL than it is
>> to serve fixed html pages. So basically, when I find that storing
>> files in a database is the best way to handle the application I am
>> writing, that's the way I do it.
>>
>>
>
> Paul,
>
> But try putting 100K files in a directory on the file system and see
> how much it slows things down. Whereas the database will hardly
> notice any performance decrease.

Actually I guess I ought to qualify my timings comment. I have no proof that
it was the database that was slowing things down per se. Serving the images
required invoking a load of script, which wasn't going to help, and of course
the MySQL installation was on a shared server, so there was no opportunity to
optimise the settings for this task.


Michael Fesser

Jul 19, 2008, 5:04:42 AM
.oO(The Natural Philosopher)

>Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
>is to.."
>
>Advantages of the database...
>
>- one point backup of all data
>- definitely not directly accessible via HTML
>- has much better indexing and searching than a flat file system in a
>directory.
>- possibly simpler integration with other bits of data assciated with te
>file to be served )i.e. you MIGHT want a decsription of what it is).
>
>On the downside, its a few more machine cycles and possibly a lot more
>RAM to serve it up.

Some more pros and cons:

http://groups.google.com/group/alt.php.sql/msg/c0e4dd4f90eafa84

Micha

Jorge

Jul 19, 2008, 5:45:29 AM
On Jul 19, 7:38 am, The Natural Philosopher <a...@b.c> wrote:
>
> Yes. Exactly. The key is to not get religious about it ..."the RIGHT way
> is to.."
>

In fact, a filesystem is a ~DBMS that handles just one type of data
(files). But the amount of metadata that a filesystem (easily) keeps/
provides about its data (the files) is limited, while there's no limit
to the amount of metadata that can be (easily) saved/retrieved in a
DBMS. Both are (most likely) equally well optimized to do their jobs
efficiently. The APIs to get to the data are completely different. One
is pretty familiar and the other is not so much. I love the idea of
single file backups (as in a DBMS). OTOH, the filesystem approach
is better suited to incremental backups.

--Jorge.

Jerry Stuckle

Jul 19, 2008, 9:16:05 AM

Which is not entirely accurate...

The Natural Philosopher

Jul 19, 2008, 11:21:22 AM
Good point.

The Natural Philosopher

Jul 19, 2008, 11:24:31 AM
Shows a lot of bias there and many unsupported assertions. Some of which
ARE wrong.


Bart Van der Donck

Jul 21, 2008, 4:23:37 AM
Jerry Stuckle wrote:

> [...]


> But don't count MS Access in there.  Use a real database.  MySQL
> qualifies.  And it has to be configured properly.

Not the real communism ? [*] I partly agree for MS Access [**], but I
have reasons to believe that my MySQL databases are set up properly.
This is not a thing I do myself, but sysadmins in one of the giant
datacenters who stick to one config for the entire park.

> BTW - benchmarks tell exactly one thing - how a database runs UNDER
> THOSE CONDITIONS.  Change the conditions and benchmarks aren't valid any
> more.
>
> With that said, under live conditions, I've seen virtually no slowdown
> when accessing blob data in a database.  And in some cases it actually
> runs faster.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+ intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from database.
- Application uses a var holding the system path (usr/my/path/to/pics/),
adds the ID to it, adds .jpg to it, tests if file exists (-e).
- If yes, use URL path instead of system path and output inside an
<IMG> to screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence. But even this could be skipped
with a workaround.

(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
- Output with <img>.

(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affects file system only.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affects DB only

It is my experience that (1) has huge memory benefits compared to
(2).

The difference between (3) and (4) is not so clear, especially because
MySQL probably optimizes this process. I think in practice you would
see that (3) is faster for environment A, and (4) for environment B,
but never with really considerable differences.

And (1) and (2) are much more important since they account for 99.x% of
the queries in my case.

[*] -"Communism is great." -"But look how things went in the USSR."
-"That was not the real communism."
[**] Many tendencies in MS Access are a good thermometer for general
database issues; MS Access is just the first that fails :-)

--
Bart

The Natural Philosopher

Jul 21, 2008, 4:53:18 AM
Bart Van der Donck wrote:

Unnecessary: Just..

> - Output with <img>.

..pointing to a second php script that loads the BLOB and spits it out.


>
> (3) Update & delete actions without BLOB:
> - Update/delete instructions stay out of DB, affects file system only.
>
> (4) Update & delete actions with BLOB:
> - Update/delete instructions stay out of file system, affects DB only
>
> It is my experience that (1) has huge memory benefits compared to
> (2).

Well, the way you have it, it duplicates the file in its entirety, which
is inefficient.

The way I do it, it streams off the database via the Unix socket into
PHP memory space, and is output from there via the web server to the
network.

VERY little extra PHP or CPU activity is required, but I grant you it's
probably held in PHP and SQL type memory areas as well as disk cache
memory. It's probably NOT held in e.g. Apache memory though: Apache or
whatever will read the stdout of the CGI script that spits it out, and just
pass the bytes along... and memory is cheap. Cheaper than CPU anyway.

Reading a record has to be something a database is highly optimised for.
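
For illustration, the "second PHP script" that an <img> tag points to might look something like the sketch below. It uses PDO; the table and column names (files, id, mime, data), the script name image.php and the connection details are all invented for the example. A login check would go before the fetch where access needs restricting.

```php
<?php
/**
 * Fetch one stored file as ['mime' => ..., 'data' => ...], or null if absent.
 * Table and column names are assumptions for this sketch.
 */
function fetchBlob(PDO $db, int $id): ?array
{
    $stmt = $db->prepare('SELECT mime, data FROM files WHERE id = ?');
    $stmt->execute([$id]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row === false ? null : $row;
}

/** Emit the image bytes; this is what <img src="image.php?id=42"> receives. */
function sendBlob(array $row): void
{
    header('Content-Type: ' . $row['mime']);
    header('Content-Length: ' . strlen($row['data']));
    echo $row['data'];
}

// Handle a request only when running under a web server, not from the CLI.
if (PHP_SAPI !== 'cli') {
    // Hypothetical connection details.
    $db = new PDO('mysql:host=localhost;dbname=example', 'user', 'password');
    $row = fetchBlob($db, (int) ($_GET['id'] ?? 0));
    if ($row === null) {
        http_response_code(404);
        exit;
    }
    sendBlob($row);
}
```

Nothing is written to a folder on the way out: the bytes go from the query result straight to the response.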

Bart Van der Donck

Jul 21, 2008, 5:21:13 AM
The Natural Philosopher wrote:

> Bart Van der Donck wrote:
>> (1) Read actions without BLOB:
>> - Application does not load any BLOB data from database.
>> - Application uses a var holding the system-path (usr/my/path/to/
>> pics/), adds the ID to it, adds .jpg to it, tests if file exists (-e).
>> - If yes, use URL-path in stead of system-path and output inside an
>> <IMG> to screen.
>> - No binary data has to be handled; the major memory use here (if any)
>> is the -e check for file existance. But even this could be skipped
>> with a workaround.
>
>> (2) Read actions with BLOB:
>> - Load BLOB from column (already a memory-intensive task of its own).
>> - Store in some folder (id.).

>> It is my experience that (1) has huge memory benefits compared to
>> (2).
>


> The way I do it, it streams off the database via the unix socket into
> PHP memory space, and is outputted from there via the web server to the
> network.
>
> VERY little extra PHP  or CPU activity is required, but I grant you its
> probably held in PHP and SQL type memory areas as well as disk cache
> memory. Its probably NOT held i e.g.apache memory though..apache or
> whatever will read the  stdout of the CGI script that spits it, and juts
> pass the bytes...and memory is cheap. Cheaper than CPU anyway.

All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.
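
Spelled out in PHP, the approach from steps (1) might look like the sketch below: build the path from the record ID, check that the file exists, and emit a plain <img> tag pointing at the public URL. The function name and the two directory parameters are made up for the example.

```php
<?php
/**
 * Return an <img> tag for record $id, or '' if no picture exists.
 * $fsDir is the directory on disk, $urlDir the matching public URL prefix.
 */
function imageTag(int $id, string $fsDir, string $urlDir): string
{
    $file = $id . '.jpg';
    if (!is_file(rtrim($fsDir, '/') . '/' . $file)) {
        return '';
    }
    $src = rtrim($urlDir, '/') . '/' . $file;
    return '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="">';
}
```

The web server then delivers the picture itself, which is why no script, socket or BLOB read is involved; the trade-off is that the image URL is public.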

--
Bart

AlmostBob

Jul 21, 2008, 5:35:53 AM
"Bart Van der Donck" <ba...@nijlen.com> wrote in message
news:591ca336-5b7d-4c06...@c65g2000hsa.googlegroups.com...
The Natural Philosopher wrote:

--
Bart


But Bart,
View Source
shows the true path to your image. Not good.


The Natural Philosopher

Jul 21, 2008, 6:05:50 AM

There is no better: it depends on the requirements.

Your way there is no chance to protect the image directory from random
downloads for example.

In my case the user may be a user with far greater access than the
general public, and have access to internal data - like plans drawings
and specifications.

I don't want script kiddies stealing vital info: Putting them in a
database is one giant leap in that sense.

execution speed and efficiency is only one of many many issues.

In my case the above, plus a general requirement to try and get all
important corporate data in the data base, under one backup regime, were
more significant. I especially did NOT want user accessible image files
that might get deleted by accident. I could protect the database area by
making it only accessible by root or the mysql daemon: direct access to
download areas had to be at least readable, and if uploaded, writeable, by
the permissions the web server and php ran at.


In practice at moderate loads the download speeds are far more dominant
than CPU or RAM limitations. And indeed the ability to make a special
download script that re-sizes the images on the fly, turned out to be a
better way to go than storing thumbnails of varying sizes. One trades
disk space for processing overhead.
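
The size arithmetic behind such an on-the-fly resize can be sketched as follows. Only the dimension calculation is shown; the pixel work would be done by an image library, which is not covered here. The function name and the "fit inside a square box" rule are assumptions.

```php
<?php
/**
 * Scale ($width, $height) to fit inside a $maxSide x $maxSide box,
 * keeping the aspect ratio and never enlarging the image.
 * Returns [newWidth, newHeight].
 */
function fitWithin(int $width, int $height, int $maxSide): array
{
    if ($width <= 0 || $height <= 0 || $maxSide <= 0) {
        throw new InvalidArgumentException('Dimensions must be positive');
    }
    $scale = min(1.0, $maxSide / max($width, $height));
    return [
        max(1, (int) round($width * $scale)),
        max(1, (int) round($height * $scale)),
    ];
}
```

The download script could take the box size as a request parameter, so one stored original serves every thumbnail size.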

As a practicing engineer all my working life, it still amazes me that
people will always come up with what amounts to a religious statement
about any particular implementation, that it is universally 'better'.

If that were the case, it would be universally adopted instantly.

Jerry has (for once) made an extremely valid point about directory sizes
as well. Databases are far better at finding things quickly in large
amounts of data: far better than a crude directory search. Once the
overhead in scanning the directory exceeds the extra download
efficiency, you are overall on a loser with flat files.

AND if you run into CPU or RAM limitations, its a lot easier to - say -
move your database to a honking new machine, or upgrade the one you have
than completely re-write all your applications to use the database, that
used to use a file.

I am NOT claiming that a database is the 'right' answer in all cases,
just pointing out that it may be a decision you want to make carefully,
as it is somewhat hard to change later on, and in most cases the extra
overhead on using it is more than compensated by the benefits,
particularly in access control.

Which was the primary concern of the OP.

Jerry Stuckle

Jul 21, 2008, 6:46:33 AM
Bart Van der Donck wrote:
> Jerry Stuckle wrote:
>
>> [...]
>> But don't count MS Access in there. Use a real database. MySQL
>> qualifies. And it has to be configured properly.
>
> Not the real communism ? [*] I partly agree for MS Access [**], but I
> have reasons to believe that my MySQL databases are set up properly.
> This is not a thing I do myself, but sysadmins in one of the giant
> datacenters who stick to one config for the entire park.
>

Not necessarily. Sysadmins cannot correctly set up a system in the
dark. They need communications from the developers on what data is
being stored, how it is being handled, etc.

Unfortunately, most sysadmins know very little about how to tune a
database (not just MySQL) and the result is poor response.

>> BTW - benchmarks tell exactly one thing - how a database runs UNDER
>> THOSE CONDITIONS. Change the conditions and benchmarks aren't valid any
>> more.
>>
>> With that said, under live conditions, I've seen virtually no slowdown
>> when accessing blob data in a database. And in some cases it actually
>> runs faster.
>
> I think the question is how BLOBs are handled. My situation is a
> browser-based application that consists of many read actions (public
> +intranet) and few update/delete actions (admin). Now suppose:
>
> (1) Read actions without BLOB:
> - Application does not load any BLOB data from database.
> - Application uses a var holding the system-path (usr/my/path/to/pics/),
> adds the ID to it, adds .jpg to it, tests if file exists (-e).
> - If yes, use URL-path instead of system-path and output inside an
> <IMG> to screen.
> - No binary data has to be handled; the major memory use here (if any)
> is the -e check for file existence. But even this could be skipped
> with a workaround.
>

Wrong - binary data is still handled. The web server still has to read
the image file and send it to the browser.

> (2) Read actions with BLOB:
> - Load BLOB from column (already a memory-intensive task of its own).
> - Store in some folder (id.).
> - Output with <img>.
>

Not very intensive at all. And you don't store it in some folder.
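
As a rough sketch of what a BLOB read path could look like: the script
named in the <img> src pulls the bytes from the database and writes them
straight into the HTTP response, with no intermediate file. This is
Python/WSGI with SQLite as a stand-in. The images table, the id query
parameter and the names make_demo_db and make_app are hypothetical, not
anything from your setup.

```python
import sqlite3
from urllib.parse import parse_qs


def make_demo_db():
    # Stand-in database with one fake image, keyed by a text id.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id TEXT PRIMARY KEY, jpeg BLOB NOT NULL)")
    conn.execute("INSERT INTO images VALUES (?, ?)", ("5551234", b"\xff\xd8fake"))
    return conn


def make_app(conn):
    def app(environ, start_response):
        # Read ?id=... from the query string.
        query = parse_qs(environ.get("QUERY_STRING", ""))
        image_id = query.get("id", [""])[0]
        row = conn.execute(
            "SELECT jpeg FROM images WHERE id = ?", (image_id,)
        ).fetchone()
        if row is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        body = row[0]
        # The BLOB goes directly into the response; nothing touches disk.
        start_response(
            "200 OK",
            [("Content-Type", "image/jpeg"), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app
```

Assuming the app is mounted at /image, the page would simply contain
<img src="/image?id=5551234">.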

> (3) Update & delete actions without BLOB:
> - Update/delete instructions stay out of DB, affects file system only.
>

Yep.

> (4) Update & delete actions with BLOB:
> - Update/delete instructions stay out of file system, affects DB only
>

Yep.

> It is my experience that (1) has huge memory benefits compared to
> (2).
>

Memory is nothing nowadays. Sure, you need more memory for the database
to effectively handle large blobs. But a few more megabytes is nothing.


> The difference between (3) and (4) is not so clear; especially because
> MySQL probably optimizes this process. I think in practice you would
> see that (3) is faster for environment A, and (4) for environment B;
> but never with really considerable differences.
>
> And (1) and (2) are much more important since they account for 99.x% of
> the queries in my case.
>

And the difference is much less than you claim.

> [*] -"Communism is great." -"But look how things went in the USSR."
> -"That was not the real communism."
> [**] Many tendencies in MS Access are a good thermometer for general
> database issues; MS Access is just the first that fails :-)
>
> --
> Bart
>

Databases are optimized for retrieving data - especially from large
groups of data. File systems are just low level databases which handle
small amounts of data (a few files) very well.

One of the big differences is that as your data grows, the database
efficiency remains fairly static. However, file system performance
degrades. Eventually, the file system will actually perform worse than
the database does. Try putting 100K files in one directory. Good luck.
But a database handles 100M rows with ease.
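
If you want to see how the two compare on your own hardware, something
like the following could be a starting point. It is only a sketch:
SQLite is a stand-in and says nothing about a tuned MySQL or DB2, the
data will be sitting in the OS cache, and the function names and sizes
are arbitrary. As with any benchmark, it tells you how things run under
exactly these conditions and nothing more.

```python
import os
import sqlite3
import tempfile
import time


def build_fixtures(directory, count, size):
    # Write the same payload as `count` files and as `count` BLOB rows.
    conn = sqlite3.connect(os.path.join(directory, "blobs.sqlite"))
    conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")
    payload = os.urandom(size)
    for i in range(count):
        with open(os.path.join(directory, f"{i}.bin"), "wb") as fh:
            fh.write(payload)
        conn.execute("INSERT INTO blobs (id, body) VALUES (?, ?)", (i, payload))
    conn.commit()
    return conn


def read_files(directory, count):
    # Read every file once and return the total number of bytes read.
    total = 0
    for i in range(count):
        with open(os.path.join(directory, f"{i}.bin"), "rb") as fh:
            total += len(fh.read())
    return total


def read_blobs(conn, count):
    # Read every BLOB once and return the total number of bytes read.
    total = 0
    for i in range(count):
        (body,) = conn.execute(
            "SELECT body FROM blobs WHERE id = ?", (i,)
        ).fetchone()
        total += len(body)
    return total


def timed(fn, *args):
    # Return (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = build_fixtures(tmp, count=500, size=10_000)
        _, file_seconds = timed(read_files, tmp, 500)
        _, blob_seconds = timed(read_blobs, conn, 500)
        conn.close()
        print(f"files: {file_seconds:.4f}s  blobs: {blob_seconds:.4f}s")
```

Raise count until the directory gets uncomfortable and watch how the two
numbers move relative to each other.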

And no, MS Access is not a real database, and is not a good thermometer
for anything other than how bad it really is. Real databases work in an
entirely different way and perform much differently.

Jerry Stuckle

unread,
Jul 21, 2008, 6:52:45 AM7/21/08
to

It's easier for YOU. And you THINK your way is better. But you've
never really tried with lots of images, have you? In fact, I suspect
you've never really checked it at all with a real database which has
been designed and configured to do this type of operation.

So all you really have to go on is your opinion.

OTOH, some of us have been doing it for years (over 20, in my case,
starting with DB2 on mainframes), and have both designed databases and
configured RDBMSs to handle these operations efficiently. We've seen
the difference in performance, and it isn't what you claim.

Bart Van der Donck

unread,
Jul 21, 2008, 8:01:44 AM7/21/08
to
Jerry Stuckle wrote:

> Bart Van der Donck wrote:
>
>>    SELECT id FROM table;
>>    print "<img src=url/to/$id.jpg>";
>

> It's easier for YOU. And you THINK your way is better. But you've
> never really tried with lots of images, have you?

Yes I have, and the tests with BLOBs were disastrous for my case
(although I must admit this study was already done 9 years ago).

Perhaps you're right that my requirements were a bit particular; I'm
facing a read load of a few MB/sec and a modest update/delete load
only peaking at nightly cronjobs. Images are spread on the machine
over 57 directories, the largest directory is holding 22,241 images at
this moment. Maybe it's BSD or the running shell that is optimal (?);
one thing I know -and tested well enough- is that my MySQL cannot
handle this kind of BLOB "abuse" under such conditions.
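
For reference, the read path I'm describing boils down to roughly the
following. It is sketched in Python here rather than in what actually
runs on the machine, and img_tag, system_root and url_root are
placeholder names. The digits-only check is an extra assumption, based
on the file names being telephone numbers.

```python
import html
import os


def img_tag(image_id, system_root, url_root):
    # IDs are assumed to be plain digits; reject anything else so the
    # id can never walk out of the picture directory.
    if not (image_id.isascii() and image_id.isdigit()):
        return ""
    # Existence test on the system path (the "-e" check).
    if not os.path.exists(os.path.join(system_root, image_id + ".jpg")):
        return ""
    # Output uses the URL path, never the system path.
    src = url_root.rstrip("/") + "/" + image_id + ".jpg"
    return '<img src="' + html.escape(src, quote=True) + '">'
```

The script itself never opens the JPG; it only tests for the file and
prints the tag.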

I can understand it might be desirable that the URL to the image must
be unknown, like Natural Philosopher said, or that other requirements
make this or that approach preferable. In my case the binaries
are hotel photos, with the hotel's telephone number as the name of
each JPG. This level of protection is acceptable here; performance
criteria are more crucial.

> In fact, I suspect you've never really checked it at all with
> a real database which has been designed and configured to do
> this type of operation.
> So all you really have to go on is your opinion.

It's unwise to draw a conclusion from something you only suspect.

But you're right, it's my opinion, though one based on experience and
preceded by quite some study and benchmarks. I think that, for my
case, it was the best possible design under the given requirements.

--
Bart

Jerry Stuckle

unread,
Jul 24, 2008, 9:49:42 PM7/24/08
to
Jones wrote:

> On Mon, 21 Jul 2008 06:46:33 -0400, Jerry Stuckle <jstu...@attglobal.net>
> wrote:
>
>> Not necessarily. Sysadmins cannot correctly set up a system in the
>> dark. They need communications from the developers on what data is
>> being stored, how it is being handled, etc.
>
> Once upon a time the term "system analyst" actually meant something.
> And then Alan Sugar started selling desktop PCs to everyone and now
> everyone thinks they're a "software engineer" just because they can hack
> a few lines of PHP or type ./configure.
>
> The "developers" should have worked it all out before the project even started.
> That's the REAL problem - here presumably and elsewhere for certain.
>

No, there are still sysadmins, who are responsible for system tuning.
It isn't just the needs of the database developers which need to be
taken into consideration - there are others, also.

Of course, you're right - nowadays there are too many "system
administrators" who only hold that title because they failed Programming
101.

Jerry Stuckle

unread,
Jul 24, 2008, 10:49:55 PM7/24/08
to
Bart Van der Donck wrote:
> Jerry Stuckle wrote:
>
>> Bart Van der Donck wrote:
>>
>>> SELECT id FROM table;
>>> print "<img src=url/to/$id.jpg>";
>> It's easier for YOU. And you THINK your way is better. But you've
>> never really tried with lots of images, have you?
>
> Yes I have, and the tests with BLOBs were disastrous for my case
> (although I must admit this study was already done 9 years ago).
>

How many is a lot? I've done it with over 50M images (several terabytes
- but that was a mainframe) in a database with no performance
degradation. But the database and RDBMS were designed to do it, also.

And this was under live conditions, averaging > 10K queries/second.

> Perhaps you're right that my requirements were a bit particular; I'm
> facing a read load of a few MB/sec and a modest update/delete load
> only peaking at nightly cronjobs. Images are spread on the machine
> over 57 directories, the largest directory is holding 22,241 images at
> this moment. Maybe it's BSD or the running shell that is optimal (?);
> one thing I know -and tested well enough- is that my MySQL cannot
> handle this kind of BLOB "abuse" under such conditions.
>

Do it all in one directory. That's what the database effectively does.
And it means you don't need to sort images into different directories,
create new directories when the existing ones get too full...
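
To illustrate the kind of bookkeeping I mean: with files, something has
to decide which directory each image lands in. A sketch of one way that
might be done, in Python, assuming a fixed number of buckets (57 only
because that's the count you mention) and a CRC32 of the id. None of
this is your actual scheme, and image_path is a made-up name.

```python
import zlib
from pathlib import PurePosixPath


def image_path(root, image_id, buckets=57):
    # Only plain digits are accepted, on the assumption that ids are
    # telephone numbers.
    if not (image_id.isascii() and image_id.isdigit()):
        raise ValueError("image id must consist of digits only")
    # CRC32 is stable across runs, so an id always maps to the same
    # bucket as long as the bucket count does not change.
    bucket = zlib.crc32(image_id.encode("utf-8")) % buckets
    return PurePosixPath(root) / f"{bucket:02d}" / f"{image_id}.jpg"
```

Change the bucket count and most files have to move. With the images in
a table, the RDBMS does that housekeeping for you.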

> I can understand it might be desirable that the URL to the image must
> be unknown, like Natural Philosopher said, or that other requirements
> make this or that approach preferable. In my case the binaries
> are hotel photos, with the hotel's telephone number as the name of
> each JPG. This level of protection is acceptable here; performance
> criteria are more crucial.
>
>> In fact, I suspect you've never really checked it at all with
>> a real database which has been designed and configured to do
>> this type of operation.
>> So all you really have to go on is your opinion.
>
> It's unwise to draw a conclusion from something you only suspect.
>
> But you're right, it's my opinion, though one based on experience and
> preceded by quite some study and benchmarks. I think that, for my
> case, it was the best possible design under the given requirements.
>
> --
> Bart

Yep, but your "study" and "benchmarks" were not necessarily accurate.
So neither are your conclusions.

Tune the RDBMS and design the database correctly, and there is virtually
no overhead. After all, all a file system is is a dumb dbms.

Geoff Berrow

unread,
Jul 26, 2008, 7:20:29 AM7/26/08
to
Message-ID: <g6bf1b$rm5$1...@registered.motzarella.org> from Jerry Stuckle
contained the following:

> After all, all a file system is is a dumb dbms.

Don't you mean, a file system is a database?

--
Geoff Berrow 0110001001101100010000000110
001101101011011001000110111101100111001011
100110001101101111001011100111010101101011
http://slipperyhill.co.uk

Jerry Stuckle

unread,
Jul 26, 2008, 10:15:12 AM7/26/08
to
Geoff Berrow wrote:
> Message-ID: <g6bf1b$rm5$1...@registered.motzarella.org> from Jerry Stuckle
> contained the following:
>
>> After all, all a file system is is a dumb dbms.
>
> Don't you mean, a file system is a database?
>

No, the files are a database. A file system is a dump database
management system.

Jerry Stuckle

unread,
Jul 26, 2008, 10:18:30 AM7/26/08
to
Jerry Stuckle wrote:
> Geoff Berrow wrote:
>> Message-ID: <g6bf1b$rm5$1...@registered.motzarella.org> from Jerry Stuckle
>> contained the following:
>>
>>> After all, all a file system is is a dumb dbms.
>>
>> Don't you mean, a file system is a database?
>>
>
> No, the files are a database. A file system is a dump database
> management system.
>

Whoops - mistype. That should be "A file system is a dumB database
management system". But come to think of it, it is kind of a dump, also :-)