On 11/09/2012 3:30 AM, hovavo wrote:
> Thanks for the answer Matt.
> I am no expert on GAE or Whoosh, but I do have a big interest in
> making this work.
> So I guess I am that person...
Cool! Please clone/pull the latest repo and take a look at the current
BlobProperty implementation in src/whoosh/filedb/gae.py.
The current implementation loads the entire property into memory using
BytesIO. Hopefully the Blobstore API provides more file-like access, so
the data doesn't all have to be read in at once.
I've added docstrings for the base Storage class in
src/whoosh/filedb/filestore.py, so hopefully writing a new storage
class is fairly straightforward. If you need to know anything about the
Whoosh side, let me know.
Also, your new storage class should have the following class attribute:
    supports_mmap = False
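As a rough idea of the shape such a class might take, here is a minimal sketch. Everything in it is an assumption made for illustration: the names BlobBackedStorage and _BlobWriter are hypothetical, a plain dict stands in for the Blobstore, and the method names are only meant to mirror the ones documented on the base Storage class, so check them against the docstrings. A real version would subclass the Storage class in src/whoosh/filedb/filestore.py and call the Blobstore API instead of the dict.

```python
from io import BytesIO


class _BlobWriter(BytesIO):
    """In-memory file that hands its bytes to a callback when closed."""

    def __init__(self, on_close):
        super().__init__()
        self._on_close = on_close

    def close(self):
        # Save exactly once, before the buffer is released.
        if not self.closed:
            self._on_close(self.getvalue())
        super().close()


class BlobBackedStorage:
    """Hypothetical stand-in; a real one would subclass Whoosh's Storage."""

    # Tells Whoosh not to try memory-mapping files from this storage.
    supports_mmap = False

    def __init__(self):
        # Assumption: a dict of name -> bytes plays the role of the Blobstore.
        self._blobs = {}

    def create_file(self, name):
        def save(data):
            self._blobs[name] = data
        return _BlobWriter(save)

    def open_file(self, name):
        return BytesIO(self._blobs[name])

    def list(self):
        return sorted(self._blobs)

    def file_exists(self, name):
        return name in self._blobs

    def file_length(self, name):
        return len(self._blobs[name])

    def delete_file(self, name):
        del self._blobs[name]

    def rename_file(self, oldname, newname):
        self._blobs[newname] = self._blobs.pop(oldname)
```

Note that this sketch still buffers each file in memory, just like the current BlobProperty version. The point of moving to the Blobstore would be to replace the BytesIO objects with whatever file-like reader and writer that API offers.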
> But as for the other half of my question -
> Does it sound right to you that I am hitting the 1MB limit after indexing such a small data set?
> It currently happens after around 1500 extremely short documents.
It sounds odd, but I'd have to try the data to see. It might have
something to do with the new "compound segment" format, where Whoosh
writes separate files and then combines them into a single file, then
deletes the original files. If you're using the default branch of the
repo, you can try opening a writer with the compound=False keyword arg
to prevent this, e.g.:
    w = myindex.writer(compound=False)
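To show why the compound step is a plausible suspect, here is a toy sketch of the idea. It is not Whoosh's actual file format: the combine function, the file names, and the sizes are all made up for illustration, and the only thing taken from this thread is the 1MB figure. The point is just that several files which each fit under a size cap can add up to one combined file that does not.

```python
LIMIT = 1024 * 1024  # the 1MB cap mentioned in this thread


def combine(files):
    """Concatenate named byte strings into one blob plus an offset table."""
    directory = {}
    parts = []
    offset = 0
    for name in sorted(files):
        data = files[name]
        directory[name] = (offset, len(data))  # where this file lives in the blob
        parts.append(data)
        offset += len(data)
    return b"".join(parts), directory


# Hypothetical per-file sizes; each one is under the cap on its own.
separate = {
    "seg.terms": b"t" * 400_000,
    "seg.postings": b"p" * 500_000,
    "seg.stored": b"s" * 300_000,
}

compound, directory = combine(separate)

print(all(len(data) <= LIMIT for data in separate.values()))  # True
print(len(compound) > LIMIT)  # True: 1,200,000 bytes combined
```

If something like this is what's happening, compound=False should make the error go away, since each file would then be stored on its own.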
Cheers,
Matt