ColdBox 3.6 - Advice on Query/Caching strategy for large dataset


Nolan Dubeau

Jul 18, 2013, 11:17:58 AM
to col...@googlegroups.com
Hi folks,

I'm working on an AngularJS / ColdBox app and need some advice. The app is a real-time interface that connects to a WebSocket channel.

When the interface is loaded, events come in through the socket and are pushed to a JS array. Before the events go through the WebSocket, they are processed by CF, stored in a DB, and then funnelled to the socket. About 6 different types of events come through. Depending on the situation, the audience generating those events could be extremely large, e.g. 100,000 users or more, and depending on the interaction each user could generate several events. So very quickly both CF and JS will be doing some heavy lifting.

If the interface is reloaded in the browser, I need to get the history of the previous events (possibly several hundred thousand records) and process/filter them for display in the JS interface. I've never built anything that needed a query of this magnitude, and I'm trying to determine how to build a caching strategy. One option is to append each event to a cached object as it comes into CF (before the socket). Another is to build some sort of iterative loader that looks at the total number of records, divides that into pages, and then has JavaScript loop through the pages to load the full data set.
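
To make the iterative loader idea concrete, it could look roughly like the sketch below. This is only a sketch under assumptions: `planPages`, `loadHistory`, `fetchPage`, and `onPage` are hypothetical names, not part of ColdBox or AngularJS, and the server is assumed to report the total record count and to accept offset/limit style requests.

```typescript
// Hypothetical shape of one page request: skip `offset` records, return up to `limit`.
interface PageRequest {
  offset: number;
  limit: number;
}

// Split `total` records into consecutive page requests of at most `pageSize` records.
function planPages(total: number, pageSize: number): PageRequest[] {
  if (pageSize <= 0) {
    throw new Error("pageSize must be positive");
  }
  const pages: PageRequest[] = [];
  for (let offset = 0; offset < total; offset += pageSize) {
    // The last page may be shorter than pageSize.
    pages.push({ offset, limit: Math.min(pageSize, total - offset) });
  }
  return pages;
}

// Load the full history one page at a time.
// `fetchPage` is supplied by the caller (assumed to wrap an HTTP call to the CF server).
// `onPage` lets the UI filter/render each batch as it arrives instead of
// waiting for the whole data set.
async function loadHistory<T>(
  total: number,
  pageSize: number,
  fetchPage: (page: PageRequest) => Promise<T[]>,
  onPage: (events: T[]) => void
): Promise<number> {
  let loaded = 0;
  for (const page of planPages(total, pageSize)) {
    const events = await fetchPage(page);
    onPage(events);
    loaded += events.length;
  }
  return loaded; // number of records actually received
}
```

With 300,000 records and a page size of 5,000, for example, this would issue 60 sequential requests, and the interface could start rendering after the first one returns.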

In terms of architecture, I'm using an external provider for WebSockets, and the interface that displays the events is on a different server than the system that receives the events. Both systems have a shared JDBC JSON cache. I'm currently not using anything like Node.js, which may have some benefit as an added technology in this type of situation.

I would appreciate your thoughts on the above.

Thanks.

Nolan

Tom Miller

Jul 18, 2013, 11:50:41 AM
to col...@googlegroups.com

Nolan,

 

If you’re using JSON-type objects for storage and retrieval of data, I’d highly recommend Elasticsearch over a database.

 

I store similar data, and I’ve found Elasticsearch to be excellent. You just throw it a JSON object, and it stores it and returns it in a pageable format. It’s also highly searchable/filterable, and it’s incredibly fast.
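
As a rough sketch of the paging side: an Elasticsearch search request body can carry `from` and `size` alongside a query and a sort. The `eventType` and `timestamp` field names and the `buildEventSearch` helper below are assumptions for illustration only, not taken from a real mapping. The resulting body would be POSTed as JSON to the index's `_search` endpoint.

```typescript
// Hypothetical request body for the Elasticsearch search endpoint.
interface EventSearchBody {
  from: number; // number of hits to skip
  size: number; // maximum number of hits to return
  query: { term: { [field: string]: string } };
  sort: Array<{ [field: string]: { order: "asc" | "desc" } }>;
}

// Build the body for one page of events of a given type, oldest first.
// `page` is zero-based. The field names are assumed, not from a real index.
function buildEventSearch(
  eventType: string,
  page: number,
  pageSize: number
): EventSearchBody {
  if (page < 0 || pageSize <= 0) {
    throw new Error("page must be >= 0 and pageSize must be > 0");
  }
  return {
    from: page * pageSize,
    size: pageSize,
    query: { term: { eventType } },
    sort: [{ timestamp: { order: "asc" } }],
  };
}
```

Calling `buildEventSearch("vote", 2, 500)` yields a body that skips the first 1,000 matching events and returns the next 500. For walking a very large result set, `from`/`size` paging gets expensive the deeper it goes, so it is worth checking the scroll-style APIs in the docs for the version in use.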

 

http://www.elasticsearch.org/

 

Not sure if it will fit your requirements, but I thought I’d throw that in there for you.

 

Tom.

--
--
You received this message because you are subscribed to the Google Groups "ColdBox Platform" group.
For News, visit
http://blog.coldbox.org
For Documentation, visit
http://wiki.coldbox.org
For Bug Reports, visit
https://ortussolutions.atlassian.net/browse/COLDBOX

Nolan Dubeau

Jul 18, 2013, 11:52:07 AM
to col...@googlegroups.com
Thanks Tom,

I will take a look.  

Nolan

Nolan Dubeau

Jul 18, 2013, 11:59:14 AM
to col...@googlegroups.com
Tom,

This looks very promising, thank you. Could you shed some light on your environment configuration? E.g. is it a dedicated machine, and how much storage/RAM does it have?

Cheers,

Nolan

Tom Miller

Jul 18, 2013, 12:22:26 PM
to col...@googlegroups.com

Yes, I have a dedicated server. Elasticsearch is running on an EC2 large instance (7GB), but I’ve also got other things running on that server: Tomcat (Alfresco) and MySQL, and it’s an NFS server too. Elasticsearch is tuned to use 1GB of RAM, which I’ve found ample. Storage-wise I have 1TB, but Elasticsearch only uses about 500MB at the moment.

 

I’d guess it would run fine on an EC2 small instance…

Nolan Dubeau

Jul 18, 2013, 1:03:00 PM
to col...@googlegroups.com
Hi Tom.

Thanks for the environment details.
Another question for you: did you use it as a ColdBox cache and create a cache provider, or did you simply use the REST API? In other words, how did you interact with Elasticsearch from within your CB app?

Thanks.

Nolan

Tom Miller

Jul 18, 2013, 1:04:22 PM
to col...@googlegroups.com

REST. I have a CFC I use to interface with it. Happy to share it, although it’s really just a simple function which accepts a few parameters. I can show you an example if you’re interested.

br...@bradwood.com

Jul 18, 2013, 1:16:53 PM
to col...@googlegroups.com
That sounds like it would be a good start for a CacheBox provider so anyone can drop in Elasticache support in their ColdBox app :)
 
Aaron Greenlee was working on that at one point in time-- I wonder what he accomplished...

Thanks!

~Brad

ColdBox Platform Evangelist
Ortus Solutions, Corp

E-mail: br...@coldbox.org
ColdBox Platform: http://www.coldbox.org
Blog: http://www.codersrevolution.com 

Tom Miller

Jul 18, 2013, 1:16:45 PM
to col...@googlegroups.com

https://gist.github.com/tgmweb/6031116

 

There’s a simple example

 


Tom Miller

Jul 18, 2013, 1:17:53 PM
to col...@googlegroups.com

That was ElastiCache, the AWS version of Memcached.

Elasticsearch is different; it’s a search server built on top of Lucene, I think…

br...@bradwood.com

Jul 18, 2013, 1:23:47 PM
to col...@googlegroups.com
Ahh sorry-- wrong "elasti..."  :)

Nolan Dubeau

Jul 18, 2013, 6:03:31 PM
to col...@googlegroups.com
This is great. I'm going to look at writing a CB plugin for this and will post it on ForgeBox (with credit to you of course, Tom!).

Thanks!

Nolan