fixed-time-window does not seem to work for me

654 views
Skip to first unread message

Juan Miguel Garcia

unread,
Aug 8, 2013, 5:00:23 AM8/8/13
to rieman...@googlegroups.com
Hi,

I have the following (partial) configuration:

(streams
  (where (service "Search")
       #(info "received search event" %)
       (with {:time (unix-time)}
    (fixed-time-window 60
    (fn [events]
    (let [num_searches (count events)]
    (prn (format "Number of events: %d" num_searches))
    #(info "Generating searches/min event" %)
      (index {:metric num_searches :state "ok" :service "searches/min" :time (unix-time)})
      ;(librato :gauge)
    )      
    )
    #(info "Generating fixed-time-window event" %)      
    )
       )
    )  
  )

I am trying to generate an event every minute, which would be the number of searches in my application. The events are generated, and they seem to be processed by the stream above. The events contain a metric, which is the execution time in milliseconds. So I want to aggregate all the search events and obtain the number of searches per minute. These are the events I receive:

INFO [2013-08-08 18:52:58,309] pool-1-thread-6 - riemann.config - received search event #riemann.codec.Event{:host local, :service Search, :state OK, :description Search Operation, :metric 6074, :tags nil, :time 1375951978307/1000, :ttl nil, :userAgent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36, :postcode 2000, :serviceTypeId 1, :howManyResults 1}

INFO [2013-08-08 18:53:01,746] pool-1-thread-8 - riemann.config - received search event #riemann.codec.Event{:host local, :service Search, :state OK, :description Search Operation, :metric 276, :tags nil, :time 275190396349/200, :ttl nil, :userAgent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36, :postcode 2000, :serviceTypeId 1, :howManyResults 64}

However, fixed-time-window does not send the list of events every 60 seconds. If I change it to moving time window, then it will work, but that is not what I need.

What am I doing wrong?

Thanks!
Juan

Aphyr

unread,
Aug 8, 2013, 1:20:08 PM8/8/13
to rieman...@googlegroups.com
On 08/08/2013 02:00 AM, Juan Miguel Garcia wrote:
> Hi,
>
> I have the following (partial) configuration:

> (with {:time (unix-time)}

Like most languages, Lisp evaluates the deepest-nested terms first, then
left-to-right at each branch.

This line calls (unix-time), which returns a number (e.g. 1234). Then it
makes a map: {:time 1234}. Then it calls (with {:time 1234} ...), which
returns a function that sets every event's time to 1234.

You probably wanted

(smap #(assoc % :time (unix-time)) ...).

By the way, is there a reason you're not using (rate) instead of the
windowing functions? If you use fixed-time-window, you have to hold all
N events in memory. If you use (with :metric 1 (rate 60)), it only
requires constant space for one number; much faster.

--Kyle

Juan Miguel Garcia

unread,
Aug 9, 2013, 2:27:58 AM8/9/13
to rieman...@googlegroups.com
Hi Kyle,

Thanks for your answer. According to the documentation, rate

Take the sum of every event over interval seconds and divide by the interval
size.

So I guess if there are 4 search operations per minute, the generated metric will be 4/60. But what I am interested in is the 4, i.e., the number of search operations.

How can I obtain this with rate?

Thanks,
Juan

Aphyr

unread,
Aug 9, 2013, 12:22:35 PM8/9/13
to rieman...@googlegroups.com
On 08/08/2013 11:27 PM, Juan Miguel Garcia wrote:
> Hi Kyle,
>
> Thanks for your answer. According to the documentation, rate
>
> Take the sum of every event over interval seconds and divide by the interval
> size.
>
>
> So I guess if there are 4 search operations per minute, the generated
> metric will be 4/60. But what I am interested in is the 4, i.e., the
> number of search operations.
>
> How can I obtain this with rate?

Rate is defined this way because it produces values invariant with
respect to the sampling interal; e.g. rate 60 and rate 5 will both be 4,
assuming a constant rate. This makes it easy to combine different rates
correctly, and to adjust precision without altering the scale of your
results.

Assuming this is the behavior you want, and you'd just like to measure
the rate per minute instead of the rate per second, all you have to do
is scale the events coming out of rate:

(rate 5 (scale 60 ...))

Changing the rate number controls the time precision; changing the scale
number controls the units. Make sense?

--Kyle

Juan Miguel Garcia

unread,
Aug 9, 2013, 6:34:05 PM8/9/13
to rieman...@googlegroups.com
It does, and it works perfectly well.

Thanks again, Kyle.

Cheers
Juan

Abhishek Gupta

unread,
Dec 4, 2014, 12:38:36 PM12/4/14
to rieman...@googlegroups.com
Hi Kyle,

Sorry for asking basic question. I have started working on Riemann lately and found very interesting. My requirement is to apply sliding window kind of feature, Here is what i am looking for:
- Any events crosses metric of more than X number,3 times in 3 sec window. We want to send notification. I tried to implement using below but some how its not working...

(let [email (mailer {:from "rie...@localhost.com"})]
(streams
    (where (and (service #"global_inventory_availability$")
          (not (expired? event)))
          #(info "[Inside sliding 1 rule]" %)
          (fixed-time-window 3
            (where (not (nil? event))
                #(info "event" %)
                (moving-event-window 3
                      (where (> metric 100.0)
                       #(info "[TC]" %)
                       (email "a...@localhost.com"))))
)))) 

My second info is not getting printed. Can you please help me on this?

Haim Ashkenazi

unread,
Dec 4, 2014, 2:04:55 PM12/4/14
to rieman...@googlegroups.com
Hi Abhishek,

I didn't really understand your code, but I'm achieving the same thing with the following:

(where (> metric SOMELIMIT)
  (with :metric 1
    (rate 3
      (where (> metric 1)
        prn))))

A few notes:
  • I wrote this directly in the email so I hope the parenthesis are balanced :)
  • Read the rate docs to understand what this does. The second metric is 1 because rate sums the metrics and devide by the number of seconds so 3/3 = 1.
HTH

Haim Ashkenazi

--
You received this message because you are subscribed to the Google Groups "Riemann Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to riemann-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Haim

Abhishek Gupta

unread,
Dec 5, 2014, 2:31:47 AM12/5/14
to rieman...@googlegroups.com
Hi,

Thanks for the quick reply. So i am looking for a window kind of feature like in 5 sec of period 3 times metric goes more than 100ms then send notification. I see your code has rate does it mean that it checks any three times metric goes more than some limit? I also wanted to include a 5 sec of period here. So for example in last 5 sec, 3 or more time metric goes higher than 100ms then send notification. When it comes down to say < 100ms in 5 sec period 1 time then reset it.

Please let me know, how do i achieve it?

Haim Ashkenazi

unread,
Dec 5, 2014, 3:32:59 AM12/5/14
to rieman...@googlegroups.com
Hi Abhishek,

The reset issue is a bit tricky. if you want it to reset if every time it gets metric smaller then 100ms then it's probably more complicated. fixed-time-window is confusing because it only triggers when an event occurs in the next time window. You can achieve this probably by using moving-time-window but it could use a lot of memory if you have lots of events. Here's an example I'm using to reset the count whenever an event with good state arrives. However, this only searches for 3 consecutive "bad" events, regardless of time:

(where (service #"postgres.*")
  (tagged "connection_healthcheck"
    (by :host
      (moving-event-window 3
        (smap (fn [events]
                (let [errors? (every? #(= (:state %) "critical") events)]
                  (merge (first events)
                    {:state (if errors? "critical" "ok")})))
          (changed-state
            do-something...))))))


HTH

Haim Ashkenazi

Abhishek Gupta

unread,
Dec 5, 2014, 12:44:39 PM12/5/14
to rieman...@googlegroups.com
Hi Haim,

You are really a great help here. Yes we are going to have lots of events. I went through rate documentation but i am not well versed with clounjor. So please help me understanding with your pervious suggestion...

(where (> metric SOMELIMIT)
  (with :metric 1
    (rate 3
      (where (> metric 1)
        prn))))


Rate Documentation: Take the sum of every event's metric over interval seconds and divide by the
interval size. Emits one event every interval seconds. Starts as soon as an
event is received, stops when the most recent event expires. Uses the most
recently received event with a metric as a template. Event ttls decrease
constantly if no new events arrive.

So for example my metric limit is 100 and if we get three events within 3 sec interval of metrics 101,101,101 so this code should fire and would return sum of metric/interval as (101+101+101)/3, So it would emit a final event with metric of 101 and nothing in that event. So in your code when you write

 (with :metric 1
    (rate 3
      (where (> metric 1)

What does it mean :metric 1 and > metric 1? Also is there any way i can get data of those three (or no of event) events that caused this? I am storing some attributes there.

Please advise.

Haim Ashkenazi

unread,
Dec 5, 2014, 4:12:27 PM12/5/14
to rieman...@googlegroups.com
On Fri, Dec 5, 2014 at 7:44 PM, Abhishek Gupta <mail2ab...@gmail.com> wrote:
Hi Haim,

You are really a great help here. Yes we are going to have lots of events. I went through rate documentation but i am not well versed with clounjor. So please help me understanding with your pervious suggestion...

(where (> metric SOMELIMIT)
  (with :metric 1
    (rate 3
      (where (> metric 1)
        prn))))


Rate Documentation: Take the sum of every event's metric over interval seconds and divide by the
interval size. Emits one event every interval seconds. Starts as soon as an
event is received, stops when the most recent event expires. Uses the most
recently received event with a metric as a template. Event ttls decrease
constantly if no new events arrive.

So for example my metric limit is 100 and if we get three events within 3 sec interval of metrics 101,101,101 so this code should fire and would return sum of metric/interval as (101+101+101)/3, So it would emit a final event with metric of 101 and nothing in that event. So in your code when you write

 (with :metric 1
    (rate 3
      (where (> metric 1)

What does it mean :metric 1 and > metric 1? Also is there any way i can get data of those three (or no of event) events that caused this? I am storing some attributes there.
Sorry, it is a little confusing :)

The first (with :metric 1 ...) changes all the passing events to have a metric of 1 before it reaches rate. Then rate sums up the 3 event's metric (1 + 1 + 1) and divides by the seconds (3 in this example) which results in 3/3 which is 1. If there were 5 events then the metric would be 5/3 and if there was only one event it would be 1/3. Hope it clears the problem.

Regards



--
Haim

Abhishek Gupta

unread,
Dec 6, 2014, 1:18:18 PM12/6/14
to rieman...@googlegroups.com
Hi Haim,

Thanks, it works fine. Is there any way we can find those n events attributes? Also, in production i want to do HA/Clusters for Riemann servers? I tried reading on riemann.io but didn't understand how does it support HA/Clusters? Can you please help me?

Haim Ashkenazi

unread,
Dec 6, 2014, 1:41:09 PM12/6/14
to rieman...@googlegroups.com
Hi Abhishek,

Regarding attributes, I graph all the events so if I get an alert I just go to the graph and see the real values. You can also copy the attribute to a different key before processing the event. Regarding cluster, sorry, I'm a newbie myself and I have no experience with it.

Abhishek Gupta

unread,
Dec 8, 2014, 9:02:52 PM12/8/14
to rieman...@googlegroups.com
Hi Haim,


Sorry for disturbing one more time, Here you said

The first (with :metric 1 ...) changes all the passing events to have a metric of 1 before it reaches rate. Then rate sums up the 3 event's metric (1 + 1 + 1) and divides by the seconds (3 in this example) which results in 3/3 which is 1. If there were 5 events then the metric would be 5/3 and if there was only one event it would be 1/3. Hope it clears the problem.

The rate sums up 3 events because we said rate 3?

If yes then how does it divide by 3 sec here? If i change my requirement of 3 events in 5 sec then how does it work? Please explain.


On Friday, December 5, 2014 1:12:27 PM UTC-8, Haim Ashkenazi wrote:

Haim Ashkenazi

unread,
Dec 9, 2014, 1:16:39 AM12/9/14
to rieman...@googlegroups.com
Hi Abhishek,

Let me try and change the numbers and maybe it would be clearer. Let's say we want to send email every time we get more then 4 events with metric higher then 100 within 10 seconds. We would do something like this:

(where (> metric 100)
  (with :metric 1
    (rate 10
      (where (> metric 4/10)
       (email...))))

HTH

Abhishek Gupta

unread,
Dec 9, 2014, 3:41:08 AM12/9/14
to rieman...@googlegroups.com
Thanks, This makes more sense.

Abhishek Gupta

unread,
Dec 10, 2014, 7:05:33 PM12/10/14
to rieman...@googlegroups.com
Hi Haim,

Sorry to bother you again. But i learned that rate holds action until 10 seconds expires. So in your code

(where (> metric 100)
  (with :metric 1
    (rate 10
      (where (> metric 4/10)
       (email...))))

it will send email only after 10 sec. Basically we don;t want this. We want in any given sec window if more than n count events happen for specified threshold then send email immediately.

Please advise.

Haim Ashkenazi

unread,
Dec 11, 2014, 12:26:00 AM12/11/14
to rieman...@googlegroups.com
Hi Abhishek,

Not sure how to achieve that, I guess you can use moving event window and calculate the time difference, maybe someone else has an idea. 

Kyle Kingsbury

unread,
Dec 11, 2014, 11:23:56 AM12/11/14
to rieman...@googlegroups.com

Yeah, I don't think either rate nor moving-time-window is gonna solve your problem here. We probably need a new realtime windowing stream. You can back off to defining one using part-time-simple I think.

Abhishek Gupta

unread,
Dec 12, 2014, 11:07:59 PM12/12/14
to rieman...@googlegroups.com
Hi Kyle.

Quick question:
- If i want to send notification on more than 3 occurrences in one hour interval window 
- If i want to send notification on 3 metrics bigger than 100ms in 1 day interval window 

I already tried rate but it doesn't work as expected. We want kind of moving window. But you said it takes lot of memory then there is no solution? Please advise?

Aphyr

unread,
Dec 15, 2014, 1:02:56 PM12/15/14
to rieman...@googlegroups.com
On 12/12/2014 08:07 PM, Abhishek Gupta wrote:
> Hi Kyle.
>
> Quick question:
> - If i want to send notification on more than 3 occurrences in one hour interval
> window
> - If i want to send notification on 3 metrics bigger than 100ms in 1 day
> interval window
>
> I already tried rate but it doesn't work as expected. We want kind of moving
> window. But you said it takes lot of memory then there is no solution? Please
> advise?

Sorry, I don't have time to write this for you, but you can build arbitrary
windowing fns with part-time-simple:

https://github.com/aphyr/riemann/blob/master/src/riemann/streams.clj#L544

And for example windowing streams built with part-time-simple, see

https://github.com/aphyr/riemann/blob/master/src/riemann/streams.clj#L1035-L1115

--Kyle

Abhishek Gupta

unread,
Dec 15, 2014, 8:10:37 PM12/15/14
to rieman...@googlegroups.com
Thanks Kyle. I am very new to Clojour, I would appreciate if you can find some time and give some tips. No hurry by any means. Please help me once you get time with some example.

Aphyr

unread,
Dec 15, 2014, 8:14:10 PM12/15/14
to rieman...@googlegroups.com
On 12/15/2014 05:10 PM, Abhishek Gupta wrote:
> Thanks Kyle. I am very new to Clojour, I would appreciate if you can find some
> time and give some tips. No hurry by any means. Please help me once you get time
> with some example.

You may be waiting indefinitely. Not that I don't care, it's just that writing
one-offs for folks is pretty low on my queue, and I think you're gonna need more
handholding than I can reasonably provide.

Best I can give you at this point is

https://aphyr.com/tags/Clojure-from-the-ground-up

Sorry mate!

--Kyle
Reply all
Reply to author
Forward
0 new messages