Retrieve all documents in an index in a speed-efficient manner


Filip Nilsson

Jul 1, 2015, 8:27:23 AM
to google-a...@googlegroups.com
I’m trying to retrieve all documents in a search index in an efficient manner. My current approach is something like this: https://gist.github.com/filleokus/8941f0824bef0fd921a7, but this seems to take about 50 ms per 100-item batch, which is way too slow. In this particular index we have a couple of thousand documents (<5k though), which means the request can sometimes take upwards of 4000 ms.
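For reference, the gist boils down to this kind of cursor-style paging loop. Here is a generic, self-contained sketch of the pattern: `fetch_batch` is a hypothetical stand-in for the Search API's `get_range`, not the real call.

```python
def fetch_batch(all_doc_ids, start_id, limit):
    """Return up to `limit` doc ids that sort at or after start_id.

    Stand-in for the Search API's get_range; a real index would do
    this lookup server-side.
    """
    eligible = sorted(d for d in all_doc_ids if d >= start_id)
    return eligible[:limit]


def get_all_documents(all_doc_ids, batch_size=100):
    """Page through the whole index one batch at a time."""
    results = []
    start_id = ""
    while True:
        batch = fetch_batch(all_doc_ids, start_id, batch_size)
        if not batch:
            break
        results.extend(batch)
        # The next request starts just past the last id we saw.
        start_id = batch[-1] + "\x00"
    return results
```

Each round trip is a separate RPC, which is why the per-batch latency adds up linearly with index size.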

Any suggestions on how to access all items faster, or is it simply impossible? There must be a faster way to just retrieve everything.

Thanks for any help,
Filip

Barry Hunter

Jul 1, 2015, 8:35:04 AM
to google-appengine
On 1 July 2015 at 13:27, Filip Nilsson <fill...@gmail.com> wrote:
I’m trying to retrieve all documents in a search index in an efficient manner.

Why? To be frank, it just sounds like bad design. Try to do whatever you're doing by accessing less data. 

 

Any suggestions on how to access all items faster, or is it simply impossible?

I certainly think it's impractical, but not impossible. 

 
There must be a faster way to just retrieve everything.

A search index is designed for getting a very specific subset quite quickly; it's simply not designed to 'get everything'. 



Filip Nilsson

Jul 1, 2015, 8:51:42 AM
to google-a...@googlegroups.com


On Wednesday, July 1, 2015 at 2:35:04 PM UTC+2, barryhunter wrote:


On 1 July 2015 at 13:27, Filip Nilsson <fill...@gmail.com> wrote:
I’m trying to retrieve all documents in a search index in an efficient manner.

Why? To be frank, it just sounds like bad design. Try to do whatever you're doing by accessing less data. 

Each document is displayed in a web front end, where the user can perform filtering etc. The previous strategy did the search on the server side, but the site was very slow since each filtering operation required a request to the backend, so now I do all the filtering on the client side with excellent performance.  
 

 

Any suggestions on how to access all items faster, or is it simply impossible?

I certainly think it's impractical, but not impossible. 

 
There must be a faster way to just retrieve everything.

A search index is designed for getting a very specific subset quite quickly; it's simply not designed to 'get everything'. 

I guess that, if I decided to continue with this approach, I would need to just access the datastore directly. The problem is that a lot of logic is dependent on the index, so it's no small task to replace all that. If I could just get all the documents out of the index quickly, I could keep everything else the same. 

Barry Hunter

Jul 1, 2015, 9:08:51 AM
to google-appengine
 
so now I do all the filtering on the client side with excellent performance.  

Maybe you could just retrieve all the documents in a low-priority 'batch' process, and store the intermediate result as a blob of text. Perhaps put it as a JSON file into Cloud Storage. 

The front end can just access it directly by URL, and the backend can just fire off a job to rebuild the file whenever the documents in the index update. 

Patrice (Cloud Platform Support)

Jul 1, 2015, 11:18:39 AM
to google-a...@googlegroups.com, barryb...@gmail.com
Hi Filip, Barry.

Thank you Barry for the great help here :). 

Filip, Barry is right: grabbing 4000 docs at once is inefficient at best, and unscalable if you ever grow to more docs.

The blob approach is definitely a good idea; it will be quick to retrieve and display. With a cron job that updates it every day/week, or a bit of code that regenerates the blob whenever the index changes, you'll always have up-to-date info in your file.
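For the cron-job route, a minimal `cron.yaml` entry would look roughly like this (the `/tasks/rebuild_blob` URL is a hypothetical handler you'd write to re-export the index):

```yaml
cron:
- description: "rebuild the cached document blob"
  url: /tasks/rebuild_blob
  schedule: every 24 hours
```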

Then from the display, whenever the customer selects a doc, the search index will work the way it's intended: giving you quick results :).

Cheers!