What is the storage engine underneath the so-called "Search API"?


PK

Jan 22, 2016, 1:35:48 AM
to Google App Engine
I find it very interesting that while the so-called Search API is really a database for documents with a query engine, it is not presented in the “Storing data” section of the GAE documentation. Also, unlike the Datastore, for which there is a lot of documentation on how Paxos is used underneath to provide high availability even in the event of catastrophic failure of one data center, I have not found much on the architecture of the Search API store.

Does anybody have any pointers to such material?




Nick (Cloud Platform Support)

Jan 22, 2016, 1:51:59 PM
to Google App Engine
Hey PK,

The Search API is not quite a database. The Search API provides a model for indexing documents which contain structured data. A document is simply an object that contains a unique id and user-defined fields. The index is the means by which that document can be quickly interrogated and information retrieved via query. That means that you basically have one pipeline that tokenizes the document at indexing time, and a separate pipeline that matches the tokens of a query against the index to retrieve documents.
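To make the two pipelines concrete, here is a minimal sketch of an in-memory inverted index in Python. It only illustrates the general technique and says nothing about how the Search API is actually implemented. The names `ToyIndex`, `tokenize`, `put`, `get`, `delete` and `search` are hypothetical and chosen for this example, and the tokenizer (lowercase, split on non-alphanumerics) is an assumption.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical in-memory index; NOT the Search API implementation."""

    def __init__(self):
        self.documents = {}               # doc_id -> {field name: text}
        self.postings = defaultdict(set)  # token -> set of doc_ids

    def put(self, doc_id, fields):
        """Indexing pipeline: store the document and tokenize every field."""
        if doc_id in self.documents:
            self.delete(doc_id)  # drop stale postings before re-indexing
        self.documents[doc_id] = dict(fields)
        for value in fields.values():
            for token in tokenize(value):
                self.postings[token].add(doc_id)

    def delete(self, doc_id):
        """Remove a document and its entries from the posting sets."""
        fields = self.documents.pop(doc_id, None)
        if fields is None:
            return
        for value in fields.values():
            for token in tokenize(value):
                self.postings[token].discard(doc_id)

    def get(self, doc_id):
        """Key lookup: return the stored fields, or None if absent."""
        return self.documents.get(doc_id)

    def search(self, query):
        """Query pipeline: tokenize the query and intersect posting sets."""
        tokens = tokenize(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self.postings.get(token, set()) for token in tokens)
        )
        return sorted(matches)
```

For instance, after `put("doc1", {"title": "Paxos and the Datastore"})`, calling `search("datastore paxos")` returns `["doc1"]`, because both query tokens appear in that document's posting sets.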

In the documentation there is quite a lot of information about how the Search API works, but very little about its implementation, and unfortunately we don't reveal such information. It would probably be best to code against the API as an abstraction. Feel free to check out some of the documentation links below (and be sure to use the side-bar in the docs to browse other topics related to the Search API):


[1] https://cloud.google.com/appengine/training/fts_adv/ 
[2] https://cloud.google.com/appengine/docs/python/search/ 
[3] https://cloud.google.com/appengine/training/fts_adv/lesson1 
[4] https://www.youtube.com/watch?v=cE6gb5pqr1k

PK

Jan 22, 2016, 3:49:59 PM
to google-a...@googlegroups.com
I am not going to get religious on what a database is, but the facts seem to be that what GAE calls “Search API” is a key/value store service with APIs to store values (which are records/documents with fields), retrieve records/documents based on keys, and/or retrieve records/documents using a query language very well suited for text-based fields.

Therefore it is really a way to store and retrieve data, and I find it confusing that it stands out there on its own with a misleading name… For a long time, and before looking at the API, I thought it was a service that allows searching over data stored in other storage services (Datastore, Cloud SQL, Cloud Storage, ...).

Anyway, Google can call it whatever it wants. This is just my 2c and an attempt to make sure I am not missing something fundamental, as I plan to start using the “Search API” in addition to the Datastore, which I have been using and understand very well.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/8464dc3e-b636-4f0a-a1a6-ef4831c28b1a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Nick (Cloud Platform Support)

Jan 22, 2016, 3:56:36 PM
to Google App Engine
You know, you make some compelling points. I suppose it could be considered a database of sorts. It sort of raises the question: is a Linux file system equipped with 'find' and 'grep' a database, with regexps being the query language? The answer seems to be, yeah, sort of, it's just not commonly phrased that way. But at any rate, you can definitely store and query data via the Search API. Best of luck going forward!



Kaan Soral

Jan 23, 2016, 4:59:56 AM
to Google App Engine
Search API seems like "a deal with the devil" to me: for some time you get all that you could dream of, but at some point you suffer and die painfully.

I haven't reached the limits of the Search API myself yet, and I haven't seen anyone who did either, but once you reach the limits, you can't add any more documents and the index gets stuck. I'm guessing that, at that point, one must implement another system to start deleting unimportant documents to make room for more important ones. This applies to my use case, but it might not apply to everyone's.

It would be great if the Search API did this on its own: if you reach the limit, you could pass an argument when adding a document to remove the lowest-ranked document automatically.
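Until something like that exists, the eviction could be handled on the application side. Below is a rough Python sketch of the idea, not a Search API feature: `CappedIndex`, its `add` method, the numeric `rank`, and a capacity counted in documents are all hypothetical choices made for this example. It keeps a min-heap of ranks so the lowest-ranked document can be found cheaply when the cap is exceeded.

```python
import heapq


class CappedIndex:
    """Hypothetical wrapper that evicts the lowest-ranked document when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.docs = {}  # doc_id -> (rank, document)
        self.heap = []  # min-heap of (rank, doc_id); may hold stale entries

    def add(self, doc_id, rank, document):
        """Add or replace a document; return the list of evicted doc_ids."""
        self.docs[doc_id] = (rank, document)
        heapq.heappush(self.heap, (rank, doc_id))
        evicted = []
        while len(self.docs) > self.capacity:
            low_rank, low_id = heapq.heappop(self.heap)
            current = self.docs.get(low_id)
            # Skip stale heap entries left behind by replaced documents.
            if current is not None and current[0] == low_rank:
                del self.docs[low_id]
                evicted.append(low_id)
        return evicted
```

In a real application, the returned ids would then be deleted from the actual index before retrying the write. Note that in this sketch the newly added document is itself evicted if it has the lowest rank.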

Anyway, now that I've been using the Search API for some time, I regret some of the things that I did with the Datastore. I have complex and well-calibrated indexes and filters, filled with workarounds and trade-offs.

At one point, I might re-visit them and just re-do things with Search API using a separate index for specific use cases

TL;DR: Search API is awesome, but look out for the limitations; it's definitely not scalable.

Nick (Cloud Platform Support)

Jan 25, 2016, 12:48:53 PM
to Google App Engine
Hey Kaan,

I'm not sure I can comment on some of the colourful analogies employed in your commentary, but I can provide the advice that you can increase your quota with the Search API up to 200GB of storage by submitting a request, and the documentation also mentions that all the quotas can be negotiated by discussing with a support representative.

We certainly aim to make all our services scalable. A general rule to keep in mind is that a limit often reflects not what the service itself can do, but the scale that customers who are willing to communicate with us have expressly told us they are interested in. We have in the past, and will again in the future, "moved mountains" for customers who are interested in and willing to make a foray into a larger-scale deployment of any service, once both technical and billing details have been discussed.

This is of course still very abstract, and should not be taken as any kind of concrete commitment. My advice to anybody who might read this thread in the future, and to those in it right now, is merely to get in touch with a support representative and explore your options. We are generally very flexible :)

Have a great day!

Kaan Soral

Jan 25, 2016, 1:18:24 PM
to Google App Engine
Thanks Nick

I had the same impression, especially since I haven't seen anyone complain about reaching the limits
Sounds good

My largest index seems to be 613MB (although I have millions of items indexed); it's nice to see a standalone index tracker
200GB would probably go a long way when the time comes

Nick (Cloud Platform Support)

Jan 25, 2016, 1:27:08 PM
to Google App Engine
Much obliged, best of luck! I wish your Search API indexes a prosperous and happy life of ease ;)