storm VS esper

5,715 views
Skip to first unread message

liulei

unread,
Sep 20, 2011, 4:09:28 AM9/20/11
to storm-user
I need one CEP system, I have researched esper, there are powerful
event processing model in esper, example: select, from, where, group-
by, having, order-by, limit and distinct clauses and so on. How can I
use storm to implement these functions of esper, and how many events
can be handled by storm in second?


Thanks,

LiuLei

nathanmarz

unread,
Sep 20, 2011, 11:27:21 AM9/20/11
to storm-user
Hi LiuLei,

Storm provides the primitives for doing realtime computation in a way
that's horizontally scalable and does not lose data. Esper, as far as
I can tell, is not horizontally scalable. I've been able to process
millions of messages per second with Storm, and there's no reason
Storm can't handle message throughputs higher than that (just add more
machines).

Storm does not have higher-level abstractions for doing streaming
aggregations like Esper does (yet). However, there's no reason you
can't implement those things on top of Storm's API. The really hard
stuff -- parallelization, fault-tolerance, guaranteeing message
processing -- is provided by Storm. Holding onto tuples in a bolt and
doing rolling aggregations is relatively easy.

-Nathan

Nathan Marz

unread,
Sep 26, 2012, 3:52:32 PM9/26/12
to storm...@googlegroups.com
You may find Trident useful, as it is a higher level abstraction that includes powerful facilities for managing state.


On Thu, Sep 20, 2012 at 3:52 PM, Gabi Campos <gabi....@gmail.com> wrote:
Hi Nathan,

I have looked at esper as well as at Erlang for an algorithm I am trying to implement but I think Storm is the way to go, just wondering there are any pointers to examples out there. The algorithm needs to match between three or four requestors (tens of thousands of users) of a resource (thousands of service providers) such that it creates the ideal group of requestors matched to a resource. This requires some math implemented in bolts as well as some running over an index of resources in other bolts and the piece for which I choose Storm is that over time (even 1 second) a decision made on the most suitable resource might change due to change of that resources status, so there's a piece where you need a map-reduce approach that runs continuously until a match of 3-4 requesters and 1 resource has been made.

Let me know if this rings any bells. I will let you know how i progress anyways.
 
Thanks,
Gabi



--
Twitter: @nathanmarz
http://nathanmarz.com

Thomas Hunziker

unread,
Dec 5, 2012, 4:16:29 PM12/5/12
to storm...@googlegroups.com
For those who are interessted in a CEP. Im currently writting my master thesis about this.

I have wrote a SPARQL like implementation for processing triple streams. Currently it supports aggregations (min,max,avg.), joins (also with temporal expressions), functions to calculate new values and filters. 

Matteo Cusmai

unread,
Dec 6, 2012, 4:01:15 PM12/6/12
to storm...@googlegroups.com
Hi all,
i am very interested in CEP discussion.
I have developed a CEP system based on GSN (Global Sensor Network) and DROOLS and WEKA.
GSN has the concept of Wrapper (like Spout) and Processing Class (like Bolt) and it adds tools such as SQL Streaming (like Esper).
Drools fusion is a streaming expert system and WEKA is a machine learning framework.

However GSN doesn't have all mechanism that STORM has, such as parallelism, fault tolerance, etc... so my idea is to move my work on STORM.
There are anyone interested in this topic?

Bye bye,
Matteo.

OG

unread,
Dec 6, 2012, 4:26:33 PM12/6/12
to storm...@googlegroups.com
I'd love to see a non-GPL Esper-like java library.  I don't know of any: https://twitter.com/otisg/status/266642906465570816

And it looks like I'm not alone in wanting this.

One would have to implement all the sliding window stuff and aggregations, but I just looked at Esper source the other day and this monster is close to 3000 Java classes!

Otis

Matteo Cusmai

unread,
Dec 6, 2012, 5:18:23 PM12/6/12
to storm...@googlegroups.com
Hi Otis,
GSN is GPL.... but it has sliding windows and other very nice sql aggregations.
But, from my experience, sql streaming is very limited, because it allows you to use simple sql operators, such as max, avg, min, etc..
but usually you need to do much more computations.
Try to use Drools fusion.

Olga Gorun

unread,
Dec 7, 2012, 1:37:09 AM12/7/12
to storm...@googlegroups.com
Hi Matteo,

We've built some time ago a system working on Storm + Drools Fusion.
Storm is in charge of parallelism (and so scalability) and fault tolerance and Drools Fusion - for streaming processing. Integration key: use fields grouping such a way to be able to distribute work to independent instances of Drools Fusion engine.

What are advantages of GSN? And what is the role of Weka in your system?

Regards,
Olga Gorun

Matteo Cusmai

unread,
Dec 7, 2012, 2:32:30 PM12/7/12
to storm...@googlegroups.com
Hi Olga,
GSN advantages are:
  • persistence of each tuple / message on db
  • sql streaming capabilities such as group by, max, min, avg, etc
  • sql streaming between different stream source with sql operators like join
  • time synchronization of tuple belong all gsn instances
  • others...

But it isn't fault tolerant and it doesn't manage topologies and cluster.

Finally, it is GPL...

Thomas Söhngen

unread,
Jan 16, 2013, 11:49:25 AM1/16/13
to storm...@googlegroups.com
Hi Olga,

we would be very interested in some details about your Drools implementation:

* Do you use separate servers for Drools Fusion and Storm or do both run on the same machines in parallel?
* Do you run an application server to host Drools (like JBoss AS)?
* Are your rules static or do they change over time? How do you manage them?

I think Drools and Strom could be a perfect fit for our needs (Storm for pre-processing and aggregation, Drools for CEP with a large, dynamic rule-base). This could be of interest for a lot of people, maybe together we could make a tutorial or Blog post about the setup.

Thanks in advance,
Thomas
-- 
Thomas Söhngen

Office: +49 221 294 975 20
Mobile: +49 178 732 1202
Email: thomas....@stockpulse.de

www.stockpulse.de
www.facebook.com/stockpulse

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StockPulse UG (haftungsbeschränkt)
Sitz der Gesellschaft: Köln
Amtsgericht: Köln (HRB 145359)
Vertretungsberechtige Geschäftsführer: Stefan Nann, Jonas Krauss 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StockPulse UG (Limited Liability)
Registered Office: Cologne
District Court: Cologne HRB (145359)
Managing Director: Stefan Nann, Jonas Krauss 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Quinton Anderson

unread,
Jan 16, 2013, 5:05:34 PM1/16/13
to storm...@googlegroups.com
Hi,

The way I have done this in the past is simply to embed the knowledge session into the bolt, you don't need JBoss AS at all. You just include the dependancies into your POM and then create the knowledge session. The maven dependancies:
<dependency>
      <groupId>org.drools</groupId>
      <artifactId>drools-core</artifactId>
      <version>5.5.0.Final</version>
    </dependency>
    <dependency>
        <groupId>org.drools</groupId>
        <artifactId>drools-compiler</artifactId>
        <version>5.5.0.Final</version>
    </dependency>

Then in your bolt (this is a very simplistic example of course):
public void prepare(Map stormConf, TopologyContext context,
OutputCollector collector) {
this.collector = collector;
//TODO: load the rule definitions from an external agent instead of the classpath, read the governor docs
KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder();
kbuilder.add( ResourceFactory.newClassPathResource( "/X.drl", 
             getClass() ), ResourceType.DRL );
if ( kbuilder.hasErrors() ) {
   LOG.error( kbuilder.getErrors().toString() );
}
KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
kbase.addKnowledgePackages( kbuilder.getKnowledgePackages() );
ksession = kbase.newStatelessKnowledgeSession();
}

public void execute(Tuple input) {
//you could also extract specific fields and execute against them, will simplify your rules somewhat
ksession.execute( input );
                //get objecst from session and emit as required
}

Hope that helps.

Matteo Cusmai

unread,
Jan 16, 2013, 5:27:47 PM1/16/13
to storm...@googlegroups.com
Hi Quinton,
It seems that you aren't using drools fusion, right?
Drools fusion is the cep version of drools.
Cheers,
Matteo.

Quinton Anderson

unread,
Jan 16, 2013, 6:19:05 PM1/16/13
to storm...@googlegroups.com
Hi,

They aren't actually separate, fusion simply adds the temporal operators, which makes the rules engine a cep. This functionality is already included in the drools jars though.

Sent from my iPhone

Thomas Söhngen

unread,
Jan 18, 2013, 12:10:31 PM1/18/13
to storm...@googlegroups.com
Hi Quinton,

thank you very much for the example!
How will Drools cache the temporal data of the stream? Is it possible to
define a central repository for this? So even if the Drools-Bolts would
be spawned over several executers and workers, could we join over all
the streams? We will need Guvnor anyways, because our rule-base has to
be very flexible, so I think we won't get around an AS.

Regards,
Thomas
--
Thomas S�hngen
Email: thomas....@stockpulse.de

www.stockpulse.de
www.facebook.com/stockpulse

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StockPulse UG (haftungsbeschr�nkt)
Sitz der Gesellschaft: K�ln
Amtsgericht: K�ln (HRB 145359)
Vertretungsberechtige Gesch�ftsf�hrer: Stefan Nann, Jonas Krauss

Quinton Anderson

unread,
Jan 19, 2013, 3:36:10 PM1/19/13
to storm...@googlegroups.com
Hi Thomas,

I would let storm manage the state for you, I have found the stateful knowledge sessions degrade in performance quite quickly once the number of facts gets too high. In terms of joining, you would need to figure out how to segregate your domain then and then use groupBy grouping. I would not try implement distributed state across the bolts personally, particularly when you require strong consistency (HBase uses a consistent hash that works very well for the same thing).

Hope that helps.


On Saturday, January 19, 2013 4:10:31 AM UTC+11, thomas....@stockpulse.de wrote:
Hi Quinton,

thank you very much for the example!
How will Drools cache the temporal data of the stream? Is it possible to
define a central repository for this? So even if the Drools-Bolts would
be spawned over several executers and workers, could we join over all
the streams? We will need Guvnor anyways, because our rule-base has to
be very flexible, so I think we won't get around an AS.

Regards,
Thomas

Am 1/17/2013 12:19 AM, schrieb Quinton Anderson:
> Hi,
>
> They aren't actually separate, fusion simply adds the temporal operators, which makes the rules engine a cep. This functionality is already included in the drools jars though.
>
> Sent from my iPhone
>
> On 17/01/2013, at 9:27, Matteo Cusmai <cusmai...@gmail.com> wrote:
>
>> Hi Quinton,
>> It seems that you aren't using drools fusion, right?
>> Drools fusion is the cep version of drools.
>> Cheers,
>> Matteo.


--
Thomas S�hngen

Office: +49 221 294 975 20
Email: thomas....@stockpulse.de

www.stockpulse.de
www.facebook.com/stockpulse

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
StockPulse UG (haftungsbeschr�nkt)
Sitz der Gesellschaft: K�ln
Amtsgericht: K�ln (HRB 145359)
Vertretungsberechtige Gesch�ftsf�hrer: Stefan Nann, Jonas Krauss

Matteo Cusmai

unread,
Jan 20, 2013, 5:25:27 PM1/20/13
to storm...@googlegroups.com
Hi Quinton,
How many fact do you insert into working memory in order to have degradation?
Thanks,
Matteo.

Quinton Anderson

unread,
Jan 20, 2013, 5:28:33 PM1/20/13
to storm...@googlegroups.com
I wouldn't go over 1000 per node.

If you have highly associative model with low latency requirements and massive volume then I wouldn't go with Drools, I would implement the logic as vanilla, optimized java within the bolts.

Matteo Cusmai

unread,
Jan 21, 2013, 2:13:31 AM1/21/13
to storm...@googlegroups.com
Hi Quinton,
let me clarify my previous question.
When you use drools fusion, you have to distinguish facts (static information) from events (dynamic data representing data stream).
So, how many fact do you have, and how many events per second are you inserting in wm?

Deepak Sharma

unread,
Jan 21, 2013, 2:17:50 AM1/21/13
to storm...@googlegroups.com
I will also be interested to work about this use case.
I will be more interested in fraud detection application using drools and storm.

Thanks
deepak

avirup....@gmail.com

unread,
Sep 10, 2013, 6:30:00 AM9/10/13
to storm...@googlegroups.com
Hi,
    This example is really helpful.But can you help me how to write rule in drl  in this case ? In this case rule will be fired in execute method of the bolt.So there will be no separate execution class for the rule right? so while writing rule in drl from where it will take input ? I had set up drools in eclipse and used to write rules in drl there and fired it using an execution class and a POJO.But what should be the approach to write rules in drl now?
Reply all
Reply to author
Forward
0 new messages