When/Where to create index and add documents for Search API - Java


Akash Eldo

Dec 29, 2018, 12:34:29 PM
to Google App Engine

I'm creating a Java web server on the Google App Engine to do full text search on my database. Before I can search, I have to add all my database entries to an index. I should only have to do this once because the index is stored in persistent storage. Even if GCP creates a new instance of my Java server, the index should still be there (right?).


My question is: how do I set up my program so that it creates the index only once? I've tried using warmup requests, but as I understand it, the warmup handler is called every time a new instance is created, so there would be redundant calls to my index-creation code.

Jim

Dec 29, 2018, 2:08:59 PM
to Google App Engine
Are you using Cloud Datastore? If so, you can use Datastore Callbacks to define "put" and "delete" callback functions for specific kinds. Whenever an entity of one of those kinds is put or deleted, your callback functions are called, and you can do your index updates there.

Akash Eldo

Dec 29, 2018, 5:02:58 PM
to Google App Engine
I'm using Firebase Firestore, but I think it has similar callbacks.

Tiago (Google Cloud Platform Support)

Jan 7, 2019, 2:48:37 PM
to Google App Engine
Hello Akash, 

One way to achieve this is to wrap your database write calls so that, whenever the database is updated, they also log a timestamp and the associated transactions to persistent storage outside the database. A cron job can then update the index based on the entries in this log.
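
The log-and-cron idea can be sketched roughly as follows. This is only an illustration under assumptions: `ChangeLogIndexer`, `recordPut`, `recordDelete`, and `runCronPass` are hypothetical names, and in-memory collections stand in for both the persistent log and the Search API index.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: all names are illustrative. The list and map below
// stand in for the persistent change log and the search index.
class ChangeLogIndexer {
    enum Op { PUT, DELETE }

    // One log entry: which document changed, how, and when.
    static final class Change {
        final String docId;
        final Op op;
        final String text;      // null for deletes
        final Instant loggedAt;

        Change(String docId, Op op, String text, Instant loggedAt) {
            this.docId = docId;
            this.op = op;
            this.text = text;
            this.loggedAt = loggedAt;
        }
    }

    private final List<Change> log = new ArrayList<>();
    private final Map<String, String> index = new HashMap<>();

    // Call these from the same code path that writes to the database.
    synchronized void recordPut(String docId, String text) {
        log.add(new Change(docId, Op.PUT, text, Instant.now()));
    }

    synchronized void recordDelete(String docId) {
        log.add(new Change(docId, Op.DELETE, null, Instant.now()));
    }

    // Call this from the cron handler. It replays pending changes in the
    // order they were logged, clears the log, and returns how many it applied.
    synchronized int runCronPass() {
        int applied = 0;
        for (Change c : log) {
            if (c.op == Op.PUT) {
                index.put(c.docId, c.text);
            } else {
                index.remove(c.docId);
            }
            applied++;
        }
        log.clear();
        return applied;
    }

    synchronized boolean isIndexed(String docId) {
        return index.containsKey(docId);
    }
}
```

Because the pending work lives in the log rather than in instance startup code, a new instance starting up does not trigger any indexing by itself.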

Alternatively, you could store a checksum of the database externally once it has been fully indexed, and add a conditional check at startup that re-updates the index only when that checksum has changed.
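
A minimal sketch of the checksum check, assuming the database contents can be read as a map of id to text. `ChecksumGate` and `reindexIfChanged` are hypothetical names, the stored checksum is kept in a field here instead of external storage, and the actual index rebuild is reduced to a counter.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: names are illustrative, and a field stands in for
// the externally stored checksum.
class ChecksumGate {
    private String storedChecksum; // null until the first full indexing
    int rebuilds = 0;              // counts how often a rebuild was triggered

    // SHA-256 over all rows in sorted key order, so the result does not
    // depend on iteration order. Assumes keys and values are non-null.
    static String checksum(Map<String, String> rows) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : new TreeMap<>(rows).entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between key and value
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator between rows
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(ex);
        }
    }

    // Run at startup: rebuilds only if the data differs from what was
    // last indexed. Returns true if a rebuild happened.
    boolean reindexIfChanged(Map<String, String> rows) {
        String current = checksum(rows);
        if (current.equals(storedChecksum)) {
            return false;
        }
        rebuilds++;               // real code would rebuild the index here
        storedChecksum = current; // real code would persist this externally
        return true;
    }
}
```

Note that this reads the whole database to compute the checksum on every startup, so it is likely only practical for small databases.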

Another possibility would be to record whether an entity has already been indexed directly in the database itself (for example, with a boolean property).
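
The boolean-property variant might look something like this. It is a sketch under assumptions: `FlaggedIndexer`, `Entity`, and `indexPending` are hypothetical names, and plain lists stand in for the database and the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, and plain Java objects
// stand in for database entities and the search index.
class FlaggedIndexer {
    static final class Entity {
        final String id;
        final String text;
        boolean indexed; // the extra boolean property stored with the entity

        Entity(String id, String text) {
            this.id = id;
            this.text = text;
            this.indexed = false;
        }
    }

    // Indexes only the entities whose flag is still false, marks them as
    // indexed, and returns the ids that were processed in this call.
    static List<String> indexPending(List<Entity> entities, List<String> index) {
        List<String> processed = new ArrayList<>();
        for (Entity e : entities) {
            if (e.indexed) {
                continue; // already in the index, skip
            }
            index.add(e.id + ":" + e.text); // stand-in for adding a document
            e.indexed = true;               // real code would save this back
            processed.add(e.id);
        }
        return processed;
    }
}
```

Running `indexPending` a second time over the same entities does nothing, which is what makes it safe to call from more than one instance start. Any code that updates an entity would need to reset the flag to false so that the entity is picked up again.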

The best choice among these options depends strongly on the particulars of your project (choice and size of database, transaction implementation, readily available access to external storage for the logs, etc.).