Full text search for large documents


Rahul Thathoo

Oct 19, 2014, 1:29:25 PM
to mobile-c...@googlegroups.com
Hi,

I am trying to use CBL to store tens of thousands of documents on a mobile device, each between 10 KB and 200 KB in size. Assume they are all text-only documents. I want to enable full-text search on these potentially hundreds of thousands of documents. What would be the best way to go about this? My main constraints are that search must be fast and that the storage must not take up a huge amount of space. Is there a way to store the documents in a zipped format and yet still be able to query against the view? Or how would you suggest solving this problem?

Jens Alfke

Oct 19, 2014, 4:44:38 PM
to mobile-c...@googlegroups.com
On Oct 19, 2014, at 10:29 AM, Rahul Thathoo <rahul....@gmail.com> wrote:

I want to enable full-text search on these potentially hundreds of thousands of documents. What would be the best way to go about this?

Despite implementing the FTS support in CBL/iOS, I'm not an expert on the details of the actual indexing; that's done by the SQLite FTS4 extension. So I don't know exactly how much space it takes up. The SQLite docs might have more info.
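
To get a feel for FTS4 outside of CBL, here is a minimal sketch using Python's built-in sqlite3 module. It is not CBL's API: the table name docs, the column body, and the helpers build_index and search are made up for illustration, and it assumes the SQLite library that Python links against was built with FTS3/FTS4 enabled.

```python
import sqlite3


def build_index(texts):
    """Create an in-memory FTS4 table and index the given texts."""
    conn = sqlite3.connect(":memory:")
    # "docs" and "body" are hypothetical names chosen for this sketch.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
    conn.executemany("INSERT INTO docs(body) VALUES (?)", [(t,) for t in texts])
    conn.commit()
    return conn


def search(conn, query):
    """Return the rowids of documents matching the full-text query."""
    rows = conn.execute(
        "SELECT rowid FROM docs WHERE body MATCH ? ORDER BY rowid", (query,)
    )
    return [row[0] for row in rows]


conn = build_index([
    "Couchbase Lite stores JSON documents",
    "SQLite provides the FTS4 extension",
    "Full text search over large documents",
])
hits = search(conn, "documents")
print(hits)
```

Pointing sqlite3.connect at a file path instead of ":memory:" and indexing a sample of the real documents would give a direct measurement of how much space the index takes on disk.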

Is there a way to store the documents in a zipped format and yet still be able to query against the view?

Hm. I was about to advise storing the text as an attachment to the document, in zipped form, then in the map block retrieving the attachment and unzipping it and emitting the text. But the problem with this is that there's currently no API to get the contents of document attachments from within a map block. :( You could store the zipped text in the document JSON itself, but then you'd have to base64-encode it, which would undo most of the compression.
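
The base64 cost can be checked with a short sketch. It uses only Python's standard zlib and base64 modules; the helper size_report and the sample text are made up for illustration, and the sample is far more repetitive than real prose, so its compression ratio should not be taken as typical. What does carry over is the encoding overhead: base64 emits 4 output bytes for every 3 input bytes, so the encoded form is about a third larger than the compressed bytes.

```python
import base64
import zlib


def size_report(text):
    """Return byte counts for the raw, zlib-compressed, and base64-encoded forms."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, 9)
    # Base64 is what embedding binary data in a JSON string would require.
    encoded = base64.b64encode(compressed)
    return {"raw": len(raw), "compressed": len(compressed), "base64": len(encoded)}


# Hypothetical sample text; real documents will compress less well than this.
sample = "Full text search on a mobile device needs a compact index. " * 200
report = size_report(sample)
print(report)
```

Running this on a few representative documents would show how much of the compression gain survives the encoding for that particular corpus.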

I just filed an issue on this so we don't forget about it.

So right now I don't have a good solution — you'll have to put the text directly in the document. Unfortunately this will be about twice the size (on average) of compressed text.

—Jens

Rahul Thathoo

Oct 19, 2014, 8:39:27 PM
to mobile-c...@googlegroups.com
Thanks for the reply Jens.

Any time estimate on when we could expect the issue you filed to be addressed?

Rahul

Jens Alfke

Oct 20, 2014, 1:45:09 PM
to mobile-c...@googlegroups.com

On Oct 19, 2014, at 5:39 PM, Rahul Thathoo <rahul....@gmail.com> wrote:

Any time estimate on when we could expect the issue you filed to be addressed?

No, sorry. It doesn't seem high priority because it's never come up for anyone before. We have a lot of higher priority stuff in our issue tracker.

—Jens