Announcing the Matcher API for Trusted Testers


Ikai Lan (Google)

Oct 19, 2010, 8:10:28 PM
to google-a...@googlegroups.com
Hey everyone,

I wanted to announce that we are accepting signups for trusted testers for the Python Matcher API, which is available for local testing in the 1.3.8 SDK. The Matcher API allows developers to take advantage of Google's high-performance matching infrastructure. Developers can register a large number of queries, and incoming documents are then matched against them. The API matches these queries against the numeric and text properties of incoming data at a very high rate.

To better illustrate what the Matcher API can do, let’s pretend you are building a site that notifies users of stock price changes. That is, a user of the site might sign up and register to receive alerts any time BRK.A is greater than $500 but lower than $525 (by the way, if Berkshire Hathaway is ever in this price range, sell everything you have and buy. Disclosure: I am not a registered financial advisor). Here’s how this might have been implemented on App Engine before:

1. When a user wants to create a new alert, a new AlertCondition entity is created. This entity records the ticker_symbol, min_price, max_price, and the email address to notify.

2. When a notification arrives that BRK.A's price has changed to, say, $510, we query for the AlertCondition entities for BRK.A whose price range contains the new price. From the matching entities, we create offline tasks to email each of those users about the price change.

This works reasonably well as long as we don’t have many stock price changes or many alerts in the system. As the number of AlertCondition entities goes up, we will need to change our application to break the queries into multiple pages, or even move them into task queues. Unfortunately for us, stock prices change very frequently, and (we hope) we will have many users. Fetching tens of thousands of AlertCondition entities from the datastore can take on the order of seconds, which makes the implementation above difficult to scale to our expected usage.
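To make the scaling problem concrete, here is a plain-Python sketch of that naive approach. A list of hypothetical AlertCondition records stands in for the datastore query, and a list of strings stands in for the offline email tasks; none of this is App Engine code. The point is that every stored alert is examined on every price change:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlertCondition:
    # Hypothetical in-memory stand-in for the AlertCondition entity described above.
    ticker_symbol: str
    min_price: float
    max_price: float
    email: str


def find_matching_alerts(alerts: List[AlertCondition], symbol: str,
                         price: float) -> List[AlertCondition]:
    # Every stored alert is checked on every price change, so the cost of
    # handling a single update grows linearly with the number of alerts.
    return [
        a for a in alerts
        if a.ticker_symbol == symbol and a.min_price < price < a.max_price
    ]


def enqueue_emails(matches: List[AlertCondition]) -> List[str]:
    # Stand-in for creating one offline email task per matching alert.
    return [f"email:{a.email}" for a in matches]


alerts = [
    AlertCondition("BRK.A", 500.0, 525.0, "alice@example.com"),
    AlertCondition("BRK.A", 600.0, 700.0, "bob@example.com"),
    AlertCondition("GOOG", 500.0, 525.0, "carol@example.com"),
]

tasks = enqueue_emails(find_matching_alerts(alerts, "BRK.A", 510.0))
print(tasks)  # ['email:alice@example.com']
```

With three alerts this is trivial; with hundreds of thousands of alerts and frequent price changes, the per-update scan is what becomes the bottleneck.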


How does the Matcher API help us solve this problem?
-------------------

The Matcher API allows us to register a set of queries, then filter incoming documents against these queries in a scalable, high-performance fashion. The stock price notification example is exactly the kind of problem it is designed for. Here’s what we’d do in our application using the Matcher API:

1. When a user wants to create a new alert, we register the user's query with the matcher.

2. On an incoming stock price change, we pass the new price to the matcher, which finds all the queries that match. As matches are found, the Matcher API enqueues tasks so we can process the results offline. Unlike the datastore implementation in the earlier example, the Matcher API performs checks in parallel on a separate service optimized for this use case. A single price change notification could be matched against hundreds of thousands of queries within a few seconds.
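The register-then-match flow in these two steps can be illustrated with a toy, in-process stand-in. This is not the Matcher API: ToyMatcher and compile_query are hypothetical names, the query parser only understands AND-joined `field:value` and `field <op> number` clauses (the subset used in this thread), and it still checks every subscription in turn, so it says nothing about how Google's service achieves its throughput:

```python
import operator
import re
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]

# Comparison operators this toy grammar understands.
_OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}
_NUMERIC = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")
_TEXT = re.compile(r"^\s*(\w+)\s*:\s*(\S+)\s*$")


def compile_query(query: str) -> Callable[[Doc], bool]:
    """Turn e.g. 'symbol:GOOG AND price > 500' into a predicate over dicts.

    Hypothetical helper; NOT the real Matcher API query grammar.
    """
    checks = []
    for clause in query.split(" AND "):
        m = _NUMERIC.match(clause)
        if m:
            field, op, num = m.group(1), _OPS[m.group(2)], float(m.group(3))
            checks.append(lambda doc, f=field, o=op, n=num: f in doc and o(doc[f], n))
            continue
        m = _TEXT.match(clause)
        if m:
            field, value = m.group(1), m.group(2)
            checks.append(lambda doc, f=field, v=value: doc.get(f) == v)
            continue
        raise ValueError(f"unsupported clause: {clause!r}")
    # A document matches only if every clause accepts it.
    return lambda doc: all(check(doc) for check in checks)


class ToyMatcher:
    """subscribe() stores compiled queries; match() returns the IDs of
    every subscription whose query accepts the document."""

    def __init__(self) -> None:
        self._subs: Dict[str, Callable[[Doc], bool]] = {}

    def subscribe(self, query: str, sub_id: str) -> None:
        self._subs[sub_id] = compile_query(query)

    def match(self, doc: Doc) -> List[str]:
        return [sub_id for sub_id, pred in self._subs.items() if pred(doc)]


m = ToyMatcher()
m.subscribe("symbol:GOOG AND price > 500 AND price < 525", "ikai:GOOG")
m.subscribe("symbol:GOOG AND price > 600", "bob:GOOG")
print(m.match({"symbol": "GOOG", "price": 515.0}))  # ['ikai:GOOG']
```

In the real API the matching happens on Google's side and the results arrive later as tasks, rather than being returned from match() as they are here.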

Let’s show this example in code (also posted here: http://pastie.org/1234174):

from google.appengine.api import matcher
from google.appengine.ext import webapp

# We're going to call subscribe. Here's what we're passing:
# dict - this means we are going to match against a Python dictionary. We can also
#          pass a db.Model type to match against. For instance, StockPrice
# "symbol:GOOG AND price > 500 AND price < 525" - this is our query
# "ikai:GOOG" - this is the name of our subscription. We'll use this to map back to our
#          User. This must be unique, so we are using the user ID and ticker combination
# schema and topic - needed when the documents are plain dicts rather than db.Model instances
schema = {str: ["symbol"], float: ["price"]}
matcher.subscribe(dict, "symbol:GOOG AND price > 500 AND price < 525", "ikai:GOOG",
                  schema=schema, topic="StockTopic")


# When a new stock price update comes in, we create a Python dictionary representing
# all the parts we care about (price is a float, matching the schema)
change = {"symbol": "GOOG", "price": 515.0}

matcher.match(change, topic="StockTopic")

# match() doesn't do the matching inline. It makes an API call to Google's
# matcher service, which, upon completion, begins dispatching matches to a
# task queue handler at the URI path /_ah/matcher. You'll need to define that handler:

class ChangeNotificationHandler(webapp.RequestHandler):
    def post(self):
        sub_ids = self.request.get_all('id')                 # e.g. ['ikai:GOOG']
        results_count = self.request.get('results_count')    # total number of results (as a string)
        results_offset = self.request.get('results_offset')  # e.g. '0'

        for sub_id in sub_ids:
            user_id, symbol = sub_id.split(":", 1)
            # now we have user_id and symbol;
            # we'll use user_id to find the User and send them an email!

# ...and map it to /_ah/matcher (the class must be defined before it is referenced here):
application = webapp.WSGIApplication(
    [('/_ah/matcher', ChangeNotificationHandler)])

# Note that subscriptions last, by default, 24 hours, so we'll need to create a
# cron job that re-registers them.
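The re-registration bookkeeping that cron job needs can be sketched in plain Python. Everything here is hypothetical: SubscriptionRegistry is not part of the SDK, the `subscribe` callable stands in for a wrapper around matcher.subscribe, and a real app would persist these records in the datastore rather than in memory. Only the 24-hour default comes from the post:

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

LEASE = timedelta(hours=24)  # default subscription lifetime mentioned in the post


class SubscriptionRegistry:
    """Hypothetical app-side record of when each subscription was last registered."""

    def __init__(self, subscribe: Callable[[str, str], None]) -> None:
        self._subscribe = subscribe            # e.g. a wrapper around matcher.subscribe
        self._queries: Dict[str, str] = {}     # sub_id -> query
        self._expires: Dict[str, datetime] = {}  # sub_id -> estimated expiry time

    def register(self, sub_id: str, query: str, now: datetime) -> None:
        self._subscribe(query, sub_id)
        self._queries[sub_id] = query
        self._expires[sub_id] = now + LEASE

    def renew_expiring(self, now: datetime,
                       margin: timedelta = timedelta(hours=1)) -> List[str]:
        """What a cron job might run: re-register anything expiring within `margin`."""
        renewed = []
        for sub_id, expires in list(self._expires.items()):
            if expires - now <= margin:
                self.register(sub_id, self._queries[sub_id], now)
                renewed.append(sub_id)
        return renewed


calls: List[str] = []
reg = SubscriptionRegistry(lambda query, sub_id: calls.append(sub_id))
t0 = datetime(2010, 10, 19, 20, 0)
reg.register("ikai:GOOG", "symbol:GOOG AND price > 500 AND price < 525", t0)
early = reg.renew_expiring(t0 + timedelta(hours=12))              # 12 hours left: nothing to do
late = reg.renew_expiring(t0 + timedelta(hours=23, minutes=30))   # 30 minutes left: renewed
print(early, late)  # [] ['ikai:GOOG']
```

The margin is there so renewal happens before a subscription lapses; as long as the cron job runs at least once per margin period, each subscription should be picked up in time.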

What makes the Matcher API really powerful is its performance: we can easily return hundreds of thousands of matches in seconds.


Tip of the iceberg
-------------------

It’s possible to filter on many other types of data. Here are a few examples of what this API could be used for:

- matching incoming status updates for specific words or phrases (think Google Alerts or Twitter real-time search updates)
- creating a real-time notification system for location-based services like Google Latitude, allowing users to subscribe to their favorite locations and be notified about users there who match certain criteria
- any kind of notification service with a large number of notifications and incoming data
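The first use case above can be sketched in plain Python with an inverted index from words to subscriptions, one common way to avoid testing every subscription against every update. This is an illustrative technique only: the post doesn't say how Google's matching service works internally, WordSubscriptions is a hypothetical class, and it treats a phrase as a set of required words, ignoring word order and punctuation:

```python
from collections import defaultdict
from typing import Dict, List, Set


class WordSubscriptions:
    """Hypothetical word-alert index: a subscription matches a status update
    when every one of its words appears in the update."""

    def __init__(self) -> None:
        self._required: Dict[str, Set[str]] = {}               # sub_id -> required words
        self._by_word: Dict[str, Set[str]] = defaultdict(set)  # word -> sub_ids

    def subscribe(self, sub_id: str, phrase: str) -> None:
        words = set(phrase.lower().split())
        self._required[sub_id] = words
        for w in words:
            self._by_word[w].add(sub_id)

    def match(self, update: str) -> List[str]:
        words = set(update.lower().split())
        # Only subscriptions sharing at least one word with the update are examined.
        candidates: Set[str] = set()
        for w in words:
            candidates |= self._by_word.get(w, set())
        return sorted(s for s in candidates if self._required[s] <= words)


idx = WordSubscriptions()
idx.subscribe("u1:appengine", "app engine")
idx.subscribe("u2:matcher", "matcher api")
print(idx.match("Trying the new Matcher API on App Engine"))  # ['u1:appengine', 'u2:matcher']
```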

The full API is much richer than the stock price example shows. You can find more documentation here:

http://code.google.com/p/google-app-engine-samples/wiki/AppEngineMatcherService

You'll also want to see the sample application here:

http://code.google.com/p/google-app-engine-samples/source/browse/#svn/trunk/matcher-sample


Sounds cool, what do I have to do?
-------------------

1. Start playing around with the Matcher API in your local SDK!

2. Add yourself to the trusted tester list here:

https://spreadsheets4.google.com/a/google.com/viewform?formkey=dEc5eFp4NmRqdHI5Rk40M0FWdHBCbUE6MQ

Check it out and sign up if this is something you can make use of! If you have any questions about what the API can be used for, let us know and we’ll try to answer them.

- Ikai, posted on behalf of Bob, Bartek and the Matcher API team

Jeff Schwartz

Oct 19, 2010, 8:17:13 PM
to google-a...@googlegroups.com
Sounds great. Any plans for Java support for this API? Please say YES :)

Jeff

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



Nickolas Daskalou

Oct 19, 2010, 8:29:42 PM
to google-a...@googlegroups.com
Hi Ikai,

I've tried accessing the trusted tester list but I get this permission error from Google Docs:


We're sorry, <my email address> does not have permission to access this spreadsheet.

You are signed in as <my email address>, but that email address doesn't have permission to access this spreadsheet. (Sign in as a different user or request access to this document)


where <my email address> is this email address I'm sending from now (it's my Google Apps + Google Account email address).

Nick



Ikai Lan (Google)

Oct 19, 2010, 8:36:22 PM
to google-a...@googlegroups.com
That's my fault, here's a working link:

https://spreadsheets.google.com/a/google.com/viewform?hl=en&formkey=d...

And yes, Java support is coming.

--
Ikai Lan 
Developer Programs Engineer, Google App Engine

Jeff Lindsay

Oct 19, 2010, 8:37:50 PM
to Google App Engine
Yes, I would love to apply since I'm building an app that would be
perfect for this. However, the link is broken as Nick mentioned.


Gaurav Vaish

Oct 19, 2010, 10:27:51 PM
to Google App Engine
Hi Ikai,

That's fantastic news!

BTW, I'm wondering whether a "processing pipeline" architecture is
somewhere down the line on the roadmap. I have been working with FAST
and OpenPipe, to name a couple, for document processing, and it would
be great to have such a feature incorporated. Is something on the
roadmap yet, or is there any visibility on this?



-Gaurav
www.mastergaurav.com



Ikai Lan (Google)

Oct 20, 2010, 12:37:44 PM
to google-a...@googlegroups.com
No, it's not on the roadmap. You can check our roadmap here:

http://code.google.com/appengine/docs/roadmap.html

Note that our roadmap is not an exhaustive list of everything we intend to do, just a list of the highest priority items for us.


--
Ikai Lan 
Developer Programs Engineer, Google App Engine


Ikai Lan (Google)

Oct 20, 2010, 7:12:22 PM
to google-a...@googlegroups.com
I need to post a correction to my earlier example. Here's the code you should use:

    # First, define a schema for our StockTopic
    schema = {str: ["symbol"], float: ["price"]}
    matcher.subscribe(dict,
                      "symbol:GOOG AND price > 500 AND price < 525",
                      "ikai:GOOG",
                      schema=schema,
                      topic="StockTopic")

    # Note: A subscription is active if the state returned by list is: matcher.SubscriptionState.OK
    # After waiting for the subscription to become active, you can do the following:

    change = { "symbol" : "GOOG", "price" : 515.0 }
    matcher.match(change, topic="StockTopic")

You need this version because you can't just pass an arbitrary dictionary: if the first argument to matcher.subscribe() is not a db.Model class, you have to define a schema and pass it to the subscribe() method.
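The schema tells the matcher which type each dictionary key holds, which is presumably also why the corrected example writes the price as 515.0 rather than 515. Here's a plain-Python sketch of the kind of type check a schema makes possible. schema_errors is a hypothetical helper, and the thread doesn't say whether the real service rejects or coerces an int price:

```python
from typing import Any, Dict, List, Type

# Same shape as the schema in the corrected example: type -> field names.
schema: Dict[Type, List[str]] = {str: ["symbol"], float: ["price"]}


def schema_errors(doc: Dict[str, Any], schema: Dict[Type, List[str]]) -> List[str]:
    """Hypothetical check: every declared field present in `doc` must have
    exactly the declared type (so an int is not accepted as a float)."""
    errors = []
    for expected_type, field_names in schema.items():
        for name in field_names:
            if name in doc and type(doc[name]) is not expected_type:
                errors.append(f"{name}: expected {expected_type.__name__}, "
                              f"got {type(doc[name]).__name__}")
    return errors


print(schema_errors({"symbol": "GOOG", "price": 515.0}, schema))  # []
print(schema_errors({"symbol": "GOOG", "price": 515}, schema))    # ['price: expected float, got int']
```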

Note that the subscription needs to become active. This isn't instantaneous, but it's pretty darned fast.
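Since activation isn't instantaneous, code that subscribes and then immediately matches may want to wait for the subscription first. A minimal polling sketch, assuming a hypothetical `get_state` callable; in a real app it would look up the subscription's state and compare it to matcher.SubscriptionState.OK, whereas here it just returns strings. It gives up after a timeout rather than waiting forever:

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str],
                      timeout: float = 10.0,
                      interval: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll get_state() until it reports "OK" or `timeout` seconds pass.

    get_state is hypothetical; "OK" stands in for matcher.SubscriptionState.OK.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == "OK":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


states = iter(["PENDING", "PENDING", "OK"])
print(wait_until_active(lambda: next(states), sleep=lambda s: None))  # True
```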

Anyway, here's the trusted tester signup link for anyone who didn't catch it:

https://spreadsheets.google.com/a/google.com/viewform?hl=en&formkey=d...
--
Ikai Lan 
Developer Programs Engineer, Google App Engine



Bob Wyman

Oct 20, 2010, 10:02:53 PM
to Google App Engine
FYI: The same example implemented using the db.Model methods would
look something like the following:

from google.appengine.api import matcher
from google.appengine.ext import db

class StockTopic(db.Model):
    symbol = db.StringProperty()
    price = db.FloatProperty()

matcher.subscribe(StockTopic, "symbol:GOOG AND price > 500 AND price < 525", "ikai:GOOG")

# Wait some small amount of time for the subscription to become active and then:

quote = StockTopic()
quote.symbol = "GOOG"
quote.price = 515.0
matcher.match(quote)

The neat thing about this is that, if you wanted to, you could then
just do quote.put() to insert the message into the datastore. (That
might not make sense for a stock price alerting app, but *would* make
sense for many other applications.)
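As noted earlier in the thread, a schema is only needed when you don't pass a db.Model; the model class already declares its property types. As a loose plain-Python analogy (a dataclass standing in for db.Model, and schema_for a hypothetical helper, not part of the SDK), the same `{type: [field names]}` mapping used in the dict version can be read straight off a class definition:

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Type


@dataclass
class StockTopic:
    # Plain-Python analogue of the db.Model above; NOT an App Engine model.
    symbol: str = ""
    price: float = 0.0


def schema_for(cls) -> Dict[Type, List[str]]:
    """Group a dataclass's fields by declared type, in the same
    {type: [field names]} shape as the dict-based schema."""
    schema: Dict[Type, List[str]] = {}
    for f in fields(cls):
        schema.setdefault(f.type, []).append(f.name)
    return schema


print(schema_for(StockTopic))  # {<class 'str'>: ['symbol'], <class 'float'>: ['price']}
```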

bob wyman
