Hi,
Thanks for all the feedback - much appreciated.
@Bevan:
> For my problem, we are storing many chunks of XML monitoring data in a MySQL database (recently switched
> engine from legacy use of MyISAM to InnoDB). As volume on our system has increased, the disk volume
> needed is growing at a ridiculous rate.
Have you considered partitioning, e.g. one database table per month of data? Then you can simply drop old tables once they are no longer required.
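A minimal sketch of the table-per-month idea, assuming a hypothetical base table name `xml_monitoring`, a hypothetical `xml_monitoring_template` table to copy the structure from, and MySQL's `CREATE TABLE ... LIKE` syntax. It only builds the SQL strings; actually running them over JDBC is left out.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class MonthlyTables {
    // Hypothetical base name for the per-month XML tables.
    static final String BASE = "xml_monitoring";
    static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuu_MM");

    // Name of the table holding one month of data, e.g. xml_monitoring_2024_03.
    static String tableFor(YearMonth month) {
        return BASE + "_" + month.format(SUFFIX);
    }

    // DDL to create a month's table with the same columns/indexes as the (hypothetical) template table.
    static String createLike(YearMonth month) {
        return "CREATE TABLE IF NOT EXISTS " + tableFor(month) + " LIKE " + BASE + "_template";
    }

    // DROP statements for every month from 'oldest' up to, but not including,
    // the first month we still want to keep (the last 'keepMonths' months incl. current).
    static List<String> dropStatements(YearMonth oldest, YearMonth current, int keepMonths) {
        YearMonth firstKept = current.minusMonths(keepMonths - 1);
        List<String> sql = new ArrayList<>();
        for (YearMonth m = oldest; m.isBefore(firstKept); m = m.plusMonths(1)) {
            sql.add("DROP TABLE IF EXISTS " + tableFor(m));
        }
        return sql;
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.of(2024, 3);
        // Prepare next month's table ahead of time.
        System.out.println(createLike(now.plusMonths(1)));
        // Keep Jan-Mar 2024, drop everything older back to Oct 2023.
        dropStatements(YearMonth.of(2023, 10), now, 3).forEach(System.out::println);
    }
}
```

Dropping a whole table is much cheaper than a big `DELETE ... WHERE created < ?`, and with InnoDB in file-per-table mode it also hands the disk space back, which is the main point here.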
@Gary
> Regarding compression, I've used java.util.zip.GZIPOutputStream and it seems pretty efficient.
Thanks, will have a look :)
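For anyone else reading along, a minimal round-trip sketch using java.util.zip.GZIPOutputStream and its counterpart GZIPInputStream on an XML string. The sample document is made up, and `readAllBytes()` assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class XmlGzip {
    // Compress an XML string to GZIP bytes, e.g. before writing it to a BLOB column.
    static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the GZIP trailer, so close it before reading the bytes.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Reverse of compress(): inflate the GZIP bytes back into the original string.
    static String decompress(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical, highly repetitive monitoring document.
        StringBuilder sb = new StringBuilder("<samples>");
        for (int i = 0; i < 1000; i++) {
            sb.append("<sample host=\"web01\" metric=\"cpu\" value=\"").append(i % 100).append("\"/>");
        }
        sb.append("</samples>");
        String xml = sb.toString();

        byte[] packed = compress(xml);
        System.out.println(xml.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + xml.equals(decompress(packed)));
    }
}
```

Repetitive XML like monitoring data tends to compress well, but the actual ratio depends on the data, so it's worth measuring on a real sample.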
@All
I'm thinking of storing the files in the database for now. I will, however, keep them in a separate database so that it can be moved to its own machine later if required.
If you view a database as just a convenient API on top of the filesystem, then this is basically storing the files on the filesystem anyway. Storing them on the filesystem directly would only be worth it if the throughput lost to the database layer turned out to be large enough to justify investigating a different approach.
Storing the files in the database will therefore probably reduce throughput somewhat, but at least I only need one backup strategy for now.
@craigmj
> The BIG downside of storing files in the OS is that you can't shard/scale your server unless you then
> proceed to copy the files as well.
Good point. Although IF the path is stored in the database, that would serve as a kind of lookup table, and one could migrate the files one at a time. But yes, a lot of things would need to happen each time the architecture changes (which I think was the point you were trying to get across).
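A rough sketch of that lookup-table idea, assuming a plain in-memory `Map` as a stand-in for a hypothetical database table mapping document id to file path. Each file is copied to the new location, the lookup entry is switched, and only then is the old copy removed, so readers that always go through the lookup never see a missing file. The class and method names are made up for illustration; `Files.writeString`/`readString` assume Java 11+.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

class FileLocationTable {
    // Stand-in for a hypothetical DB table: document id -> file path.
    private final Map<Long, Path> locations = new HashMap<>();

    void register(long id, Path path) { locations.put(id, path); }

    Path lookup(long id) { return locations.get(id); }

    // Migrate ONE file under a new root (e.g. storage on another machine).
    // Migrated and un-migrated files can coexist because readers use lookup().
    void migrate(long id, Path newRoot) throws IOException {
        Path oldPath = locations.get(id);
        if (oldPath == null) throw new IllegalArgumentException("unknown id " + id);
        Files.createDirectories(newRoot);
        Path newPath = newRoot.resolve(oldPath.getFileName());
        Files.copy(oldPath, newPath, StandardCopyOption.REPLACE_EXISTING);
        // With a real table this would be an UPDATE of the path column for this id.
        locations.put(id, newPath);
        // Delete the old copy only after the lookup points at the new one.
        Files.delete(oldPath);
    }

    public static void main(String[] args) throws IOException {
        Path oldRoot = Files.createTempDirectory("old");
        Path newRoot = Files.createTempDirectory("new");
        FileLocationTable table = new FileLocationTable();
        table.register(42L, Files.writeString(oldRoot.resolve("42.xml.gz"), "payload"));
        table.migrate(42L, newRoot);
        System.out.println(table.lookup(42L) + " -> " + Files.readString(table.lookup(42L)));
    }
}
```

It doesn't remove the work craigmj is pointing at (copying, verifying, handling failures halfway through), it just means the move can happen gradually.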
@David
> We store compressed data in the db where possible (e.g. when the data
> is text and does not need to be searched). I think the CPU time spent
> decompressing the data on read will generally be less than the IO time
> required to fetch more disk blocks + the data will use less buffer
> space on the db server. Also it's easy to add more app servers if you
> need more CPU for the decompression and harder to scale the db.
Thanks, that's my thinking as well. "When the data is text and does not need to be searched" is a good point. Even though I'm sticking with the database for now, I do acknowledge that there isn't really a big functional benefit to having the data in the database rather than on the filesystem. For now it's just for easier scalability and management, and most people would most likely have a database API written already, so nothing new needs to be built.
Many thanks all - appreciated :)
Serdyn du Toit