Mongoengine auto zipped fields

gen chen

Mar 14, 2013, 10:17:24 PM
to mongoeng...@googlegroups.com
Hi guys, I want to use MongoDB as the page store for my crawler, which will crawl and store millions of web pages every day. After some testing, I found that the crawler consumes a lot of disk space each day, so I want to store the crawled HTML in compressed form to reduce disk usage. Is there a good way to zip/unzip a field automatically, or do I have to extend mongoengine myself? If so, how would I add this kind of feature?

Russ Weeks

Mar 14, 2013, 11:24:51 PM
to mongoeng...@googlegroups.com
Subclass BinaryField and override to_mongo and to_python to zip/unzip respectively?
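
A minimal sketch of that suggestion, assuming Python 3, zlib as the compressor, and UTF-8 text. The names `compress_html`, `decompress_html`, and `ZippedTextField` are made up for illustration and are not part of mongoengine. The guarded import only exists so the helpers can be tried without mongoengine installed.

```python
import zlib

try:
    from mongoengine import BinaryField
except ImportError:  # the helpers below still work without mongoengine
    BinaryField = None


def compress_html(html):
    """Encode text as UTF-8 and deflate it with zlib."""
    return zlib.compress(html.encode("utf-8"))


def decompress_html(blob):
    """Inverse of compress_html: inflate the bytes and decode back to text."""
    return zlib.decompress(bytes(blob)).decode("utf-8")


if BinaryField is not None:

    class ZippedTextField(BinaryField):
        """Hypothetical field: text in Python, zlib-compressed bytes in MongoDB."""

        def to_mongo(self, value):
            # Compress text just before it is written to the database.
            if isinstance(value, str):
                value = compress_html(value)
            return super().to_mongo(value)

        def to_python(self, value):
            # Assumption: mongoengine may also call to_python on values
            # passed to the document constructor, so text that is already
            # decompressed has to pass through untouched.
            if isinstance(value, str):
                return value
            try:
                return decompress_html(value)
            except zlib.error:
                # Not zlib data; hand the raw value back unchanged.
                return value
```

A document class could then declare something like `html = ZippedTextField()`. How much space this saves depends on the pages; repetitive markup tends to compress well, but it is worth measuring on real crawl data.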

But I would first explore storing your large collection on a filesystem that supports transparent compression: NTFS on Windows, btrfs on Linux. That way you wouldn't have to do anything in your application, and performance would probably be better.

-Russ



gen chen

Mar 15, 2013, 12:09:29 AM
to mongoeng...@googlegroups.com, rwe...@newbrightidea.com
Thanks, Russ. I will extend BinaryField first and look into btrfs later.

On Friday, March 15, 2013 at 11:24:51 AM UTC+8, Russ Weeks wrote: