Remote realtime FIFO implementation

Grigory Fishilevich

Dec 10, 2011, 3:39:51 PM
to google-a...@googlegroups.com
Hi all, 

I'm new to Google App Engine and I'm looking for advice.

I want to write a remote FIFO that clients outside of GAE can access in real time.
The FIFO should contain only small integers and work as follows:
- client A writes integer values to the FIFO
- client B reads the values 1-5 seconds later
- after that, the values are invalid and can be deleted
- there are always only 2 clients per FIFO
- it should be possible to have x FIFOs in the app

Important for me: real-time capability and concurrency, no deadlocks, etc.

So, as I'm new to GAE: which service can I use for my app?
Pull Queues are limited to 100 active queues, which is not enough for me.

Should I use memcache? DB?

Thanks in advance

Brandon Wirtz

Dec 12, 2011, 12:27:55 AM
to google-a...@googlegroups.com

Memcache won’t work by itself. DataStore writes would make this expensive for what you are looking to do.

Rishi Arora

Dec 12, 2011, 10:08:51 AM
to google-a...@googlegroups.com
How about this:

1.  Rely on an in-memory implementation of FIFO as the primary storage.
2.  Store stuff to memcache as a temporary backup to protect against your front-end instances dying because of over-usage of memory, or other reasons.
3.  Memcache writes should probably be asynchronous (use taskqueues) to avoid incurring unnecessary latency (I think taskqueue.add is faster, but you should try it out)
4.  If your client B doesn't read from the FIFO for some period of time, say 60 seconds, back up your FIFO data into Datastore (asynchronously of course).

Gerald Tan

Dec 12, 2011, 2:38:46 PM
to google-a...@googlegroups.com
There are two ways to do it.

1. Datastore, optionally with memcache. Memcache by itself is not reliable, so you will need the Datastore as a fallback. That means everything written to memcache has to be written to the Datastore too, so memcache will probably not improve anything except saving the Datastore read from client B.

2. Store in Backend Instance RAM. If your traffic is really high, this may become cheaper than using the Datastore.

James X Nelson

Dec 14, 2011, 7:35:31 PM
to google-a...@googlegroups.com
Do not rely on a frontend instance's RAM-cache.  The instances do not share RAM and the only way you can directly address a particular instance is using a front-facing backend, as noted by Gerald.

The trouble is, even if you could safely store all your ints in addressable RAM space, you are going to need synchronization around your variables and maps/lists to avoid ConcurrentModificationExceptions.

This is not only painful to code; if every method is synchronizing on a single map of values, you effectively lose all ability to run in parallel and will probably be slower than single-threaded.

If you want to take advantage of threadsafe {you REALLY want to take advantage of threadsafe}, you need every request to be servable from instance scope.  That is, using as few static vars / servlet instance vars as possible {servlets are singletons, remember!} 

Honestly though, I don't think you even need either ram or ds... Maybe not even memcache.


I will describe for you one possible scenario to do this efficiently; you don't have to take my advice, but here it is:

Instead of having client B poll for the FIFO data, use the channel api / url fetch to push messages directly from client A to client B.
Rather than making client A save to appengine and then having client B get from appengine, just use appengine to let client A save to client B.

If the client is in a web browser, use the channel api to push messages {frontends only, no backend support yet}.
If the client is a web server, use url fetch to push the data.

If client A is writing ints in batches, you can just send those ints in batches.  A serialized CSV string makes very good sense here. 
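
A minimal sketch of what such a CSV batch could look like in Java. The class and method names (IntBatchCsv, encode, decode) are made up for illustration and are not part of any App Engine API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of any App Engine API): converts a batch of
// ints to a CSV string for sending, and back again on the receiving side.
class IntBatchCsv {

    // Joins the values with commas, e.g. for use as a message or request body.
    static String encode(List<Integer> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(values.get(i));
        }
        return sb.toString();
    }

    // Parses a CSV string back into ints; a blank string is an empty batch.
    static List<Integer> decode(String csv) {
        List<Integer> values = new ArrayList<>();
        if (csv.trim().isEmpty()) {
            return values;
        }
        for (String part : csv.split(",")) {
            values.add(Integer.parseInt(part.trim()));
        }
        return values;
    }

    public static void main(String[] args) {
        String wire = encode(Arrays.asList(1, 2, 3, 4, 5));
        System.out.println(wire);
        System.out.println(decode(wire));
    }
}
```

Client A would send the encoded string and client B would call decode on whatever it receives.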



Should you have any trouble with the channel api, you may want to keep copies of your int frames in memcache.

You can rely on messages going through most of the time, but if client B is expecting data and doesn't get it, you should have a backup to request it from the cache.

Because you might need your backup, and because the data has to be FIFO, you will need either a monotonically increasing index or a list of serialized int frames in cache.


To use a monotonically increasing index, use:

MemcacheService cache = MemcacheServiceFactory.getMemcacheService("clientBnamespace");
Long index = cache.increment("fromClientA", 1, 0L);

This acts as a shared long value you can safely increment across all instances.

Use this to timestamp your packet of ints.

cache.put("fromClientA"+index, "1,2,3,4,5", Expiration.byDeltaMillis(5000));

Note that this will always be deleted within 5 seconds, so no cleanup is on you.  You may want to give it 30 seconds or more.

Just make sure you send the index value with the packets, so if clientB has to manually ask, it can do cache get on "fromClientA"+lastReceivedIndex++ until no values are found.

This, of course, could easily drop some of your packets, so maybe not the best solution...
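
The indexed-packet idea above can be sketched without the App Engine SDK. In this sketch a ConcurrentHashMap and an AtomicLong stand in for memcache and its increment() call; that is an assumption made so the example runs anywhere, and it means there is no expiration or eviction here. The class name IndexedPacketFifo and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Stand-in for the indexed-packet scheme. The map and the counter play the
// role of memcache (assumption: entries never expire and are never evicted).
class IndexedPacketFifo {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong(0);

    // Client A side: reserve the next index, then store the packet under it.
    long write(String csvPacket) {
        long index = counter.incrementAndGet();
        cache.put("fromClientA" + index, csvPacket);
        return index;
    }

    // Client B side: fetch every packet after lastReceivedIndex, in index
    // order, stopping at the first index that has no value.
    List<String> readAfter(long lastReceivedIndex) {
        List<String> packets = new ArrayList<>();
        long next = lastReceivedIndex + 1;
        String packet;
        while ((packet = cache.get("fromClientA" + next)) != null) {
            packets.add(packet);
            next++;
        }
        return packets;
    }

    public static void main(String[] args) {
        IndexedPacketFifo fifo = new IndexedPacketFifo();
        long first = fifo.write("1,2,3");
        fifo.write("4,5");
        // Client B already has the first packet and asks for the rest.
        System.out.println(fifo.readAfter(first));
    }
}
```

Because the reader stops at the first missing index, a packet that expired or was stored late would hide everything after it, which is one way packets get dropped in this scheme.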



The other option is to have a list of ints in a single memcache location.

So long as you stick to low-level memcache, you can use .putIfUntouched() to notice a conflicting write, read in the updates from other threads, and only then add to the list.

If .putIfUntouched fails, you need to do a fresh get(), add in the new changes, and try again.
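
The get / modify / retry loop can be sketched as follows. This is only an illustration of the pattern: an AtomicReference and its compareAndSet() stand in for the memcache entry and .putIfUntouched() (on App Engine the matching read would be getIdentifiable() on the low-level MemcacheService). The class name CasListAppender is hypothetical:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for a single memcache location holding a list of ints.
// compareAndSet() plays the role of putIfUntouched() (assumption for the sketch).
class CasListAppender {
    private final AtomicReference<List<Integer>> slot =
            new AtomicReference<>(Collections.<Integer>emptyList());

    // Appends one value, retrying until no other writer got in between.
    void append(int value) {
        while (true) {
            List<Integer> current = slot.get();               // fresh get()
            List<Integer> updated = new ArrayList<>(current); // work on a copy
            updated.add(value);                               // add the new change
            if (slot.compareAndSet(current, Collections.unmodifiableList(updated))) {
                return;                                       // value was untouched
            }
            // Another writer won the race: loop, re-read, and try again.
        }
    }

    List<Integer> snapshot() {
        return slot.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasListAppender fifo = new CasListAppender();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                fifo.append(i);
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // No append is lost, so both writers' 1000 values are present.
        System.out.println(fifo.snapshot().size());
    }
}
```

The retry loop guarantees that no append is lost, but it says nothing about the order in which competing writers land.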

Just remember, the order that gets are processed does not necessarily equal the order that puts are finished.
This is where your monotonically incremented long comes in very handy.

If you can guarantee that client A won't send more data until its previous send has finished, you don't have to worry.

If client A fires off data as soon as it gets it, you will need to timestamp / orderstamp your packets. 

Storing a list of packets {with each packet object having a list of ints and a long order stamp} in memcache will make for easier, safer reads by clientB.

Just remember that memcache serializes/deserializes lists you put into it, so the list you get is NOT backed by the memcache value; you have to commit a successful put before other threads will see your changes.
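
A packet object with an order stamp might look like the sketch below. The class name StampedPacket and the helper drainInOrder are hypothetical; the class is Serializable on the assumption that it would be stored as a memcache value, and the sketch only shows how client B could restore FIFO order from the stamps:

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical packet: a list of ints plus a long order stamp.
class StampedPacket implements Serializable {
    private static final long serialVersionUID = 1L;

    final long order;
    final List<Integer> values;

    StampedPacket(long order, List<Integer> values) {
        this.order = order;
        this.values = new ArrayList<>(values); // defensive copy
    }

    // Returns the ints of all packets, flattened in order-stamp order,
    // regardless of the order in which the packets were stored.
    static List<Integer> drainInOrder(List<StampedPacket> packets) {
        List<StampedPacket> sorted = new ArrayList<>(packets);
        sorted.sort((a, b) -> Long.compare(a.order, b.order));
        List<Integer> out = new ArrayList<>();
        for (StampedPacket p : sorted) {
            out.addAll(p.values);
        }
        return out;
    }

    public static void main(String[] args) {
        // Packets arrived out of order: stamp 2 was stored before stamp 1.
        List<StampedPacket> stored = Arrays.asList(
                new StampedPacket(2, Arrays.asList(3, 4)),
                new StampedPacket(1, Arrays.asList(1, 2)));
        System.out.println(drainInOrder(stored));
    }
}
```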




Don't use DS writes and reads if you don't have to.  Short-lived memcache operations have all of the features you want at the lowest price on appengine.

You may want a ds fallback if (!CapabilitiesServiceFactory.getCapabilitiesService().getStatus(Capability.MEMCACHE).getStatus().equals(CapabilityStatus.ENABLED))

To easily implement a fallback in java, write your .put(), .get(), .delete() methods in an interface.  Have one implementation use only memcache, and have another that uses ds+memcache or just ds.

If the capability service says memcache is having trouble, use the ds backed impl.  

So long as your FIFO code uses this interface, you can send it whatever impl you want.
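
A rough sketch of that interface-plus-fallback layout. Everything named here (FifoStore, MapStore, FallbackDemo.choose) is hypothetical, and both implementations are plain in-memory maps so the example runs without the App Engine SDK. In a real app one would wrap memcache, the other the datastore, and the boolean would come from the capabilities check above:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;

class FallbackDemo {
    // Picks the implementation based on whether memcache is reported healthy.
    static FifoStore choose(BooleanSupplier memcacheEnabled,
                            FifoStore memcacheOnly,
                            FifoStore datastoreBacked) {
        return memcacheEnabled.getAsBoolean() ? memcacheOnly : datastoreBacked;
    }

    public static void main(String[] args) {
        FifoStore memcacheOnly = new MapStore("memcache-only");
        FifoStore datastoreBacked = new MapStore("datastore-backed");
        // Pretend the capability service reports memcache as unavailable.
        FifoStore store = choose(() -> false, memcacheOnly, datastoreBacked);
        store.put("fromClientA1", "1,2,3");
        System.out.println(store + " -> " + store.get("fromClientA1"));
    }
}

// The only type the FIFO code needs to know about.
interface FifoStore {
    void put(String key, String value);
    String get(String key);
    void delete(String key);
}

// In-memory stand-in used for both implementations in this sketch.
class MapStore implements FifoStore {
    private final String name;
    private final Map<String, String> data = new ConcurrentHashMap<>();

    MapStore(String name) {
        this.name = name;
    }

    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public String get(String key) { return data.get(key); }
    @Override public void delete(String key) { data.remove(key); }
    @Override public String toString() { return name; }
}
```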




Anyway, if you give me more details about the levels of control you have over client A and client B, I can likely help you distill a more specific implementation from the preceding rant. 

Grigory Fishilevich

Dec 19, 2011, 2:19:35 PM
to Google App Engine
Hey guys, thanks a lot for the answers. I really have to think about
how I do it.

Problems I see:

- ds is too expensive; I expect about 3 reads/writes per second

- RAM cache will not work (as James X Nelson says, it's not shared
between instances, but I have to learn more about the RAM concept
of GAE)

- as I understand the channel api, it's PUSH only, and I can't push
because both clients are behind NAT, so I can only pull

Sorry, I'm too tired today to understand all of James's post ;) but I
will be back tomorrow ;)

Thanks again for the answers.

Grigory Fishilevich

Dec 19, 2011, 2:24:22 PM
to Google App Engine
> both clients are behind NAT, so I can pull only

I mean, I can only pull from the clients; neither client can be reached
from the GAE application :(
