How to determine cause of this exception?


Pete GS

Jul 7, 2015, 4:31:47 PM
to rieman...@googlegroups.com
I'm getting an exception and I'm trying to determine what's causing it, but I'm having some trouble working out how to go about this.

Here's what I see in the Riemann log:

INFO [2015-07-08 06:21:39,872] defaultEventExecutorGroup-2-3 - riemann.config - #riemann.codec.Event{:host mulcher, :service total_messages_per_second, :state nil, :description 2015-07-07 20:21:01 W3SVC1 BNE001TP 203.147.172.225 GET /wc_test, :metric 1, :tags [CatchAll], :time 20:21:01, :ttl 60.0, :cs-method GET, :SourceModuleName IISlogIn, :_id c65bbdc1-24e5-11e5-8a79-005056a26fe8, :sc-win32-status 0, :date 2015-07-07, :c-ip 210.247.233.3, :facility IIS, :s-sitename W3SVC1, :gl2_source_input 546e896be4b09f86f5596d67, :cs-version HTTP/1.0, :s-computername BNE001TP, :source BNE001TP, :s-ip 203.147.172.225, :level 6, :time-taken 15, :SourceModuleType im_file, :gl2_source_node f69eb759-ab20-4616-b1e6-334ab8a4ee33, :EventReceivedTime 2015-07-08 06:21:34, :sc-status 200, :sc-bytes 613, :version 1.0, :timestamp 2015-07-07T10:21:01.000Z, :SourceName IIS, :sc-substatus 0, :full_message 2015-07-07 20:21:01 W3SVC1 BNE001TP 203.147.172.225 GET /wc_test/hello.asp - 80 - 210.247.233.3 HTTP/1.0 - - - - 200 0 0 613 34 15, :cs-uri-stem /wc_test/hello.asp, :cs-bytes 34, :message 2015-07-07 20:21:01 W3SVC1 BNE001TP 203.147.172.225 GET /wc_test, :s-port 80}
WARN [2015-07-08 06:21:39,873] defaultEventExecutorGroup-2-3 - riemann.streams - riemann.streams$rate$rate_SINGLEQUOTE___4322@35b961eb threw
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Number
        at clojure.lang.Numbers.add(Numbers.java:126)
        at riemann.streams$periodically_until_expired$stream__4021.invoke(streams.clj:495)
        at riemann.streams$rate$rate_SINGLEQUOTE___4322.invoke(streams.clj:847)
        at riemann.streams$with$stream__4879$fn__4906.invoke(streams.clj:1316)
        at riemann.streams$with$stream__4879.invoke(streams.clj:1316)
        at riemann.core$stream_BANG_$fn__5678.invoke(core.clj:19)
        at riemann.core$stream_BANG_.invoke(core.clj:18)
        at riemann.transport$handle.invoke(transport.clj:159)
        at riemann.transport.tcp$tcp_handler.invoke(tcp.clj:93)
        at riemann.transport.tcp$gen_tcp_handler$fn__5904.invoke(tcp.clj:65)
        at riemann.transport.tcp.proxy$io.netty.channel.ChannelInboundHandlerAdapter$ff19274a.channelRead(Unknown Source)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
        at io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
        at io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
        at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:36)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)

Here's the code:

; This one groups by host to simply graph messages per second for each host
(let [index (index)]
;  (streams
;    (by :host
;      (with {:service "host_messages_per_second" :metric 1}
;        (rate 1
;          ;#(info (:host %) (:service %) (:metric %))
;          ;graph
;          ))))
; This one simply graphs total messages per second
  (streams
    (with {:service "total_messages_per_second" :host "mulcher" :metric 1}
        #(info %)
        (rate 1
          #(info %)
          ;#(info (:host %) (:service %) (:metric %))
          graph
        )))
)

I've commented the first section out just to try to troubleshoot the exception.

I guess to start with, I'm not sure why it's trying to use a string as a number when all I'm trying to do is get a rate.

Any pointers on where to start here? I tried adding (exception-stream #(info %)) just after the (rate 1, but it doesn't show me anything, although I'm probably using it incorrectly.
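For reference, exception-stream in riemann.streams takes the stream that should receive exception events as its first argument and the streams it protects as the remaining arguments, so it needs to wrap the stream that throws. Placed inside rate, it likely never sees the error, because rate' throws before it calls its children. A minimal sketch, assuming the graph stream defined elsewhere in this config and the same info logging used above:

```clojure
; Sketch for riemann.config (not a tested fix).
; exception-stream's first argument receives exception events; the rest are
; the streams it protects. Wrapping rate, rather than sitting inside it,
; means an exception thrown by rate itself should be caught here.
(streams
  (with {:service "total_messages_per_second" :host "mulcher" :metric 1}
    (exception-stream
      #(info "rate threw:" %)   ; log the exception event
      (rate 1
        graph))))               ; graph: Graphite stream assumed defined elsewhere
```

If this behaves as intended, the log should show an event describing the ClassCastException. Whether that event also carries the original event that triggered it depends on the Riemann version, so check before relying on it.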

Cheers, Pete

Pete GS

Jul 8, 2015, 6:45:48 PM
to rieman...@googlegroups.com
I've spent quite a few hours trying to narrow down the cause of the exception, and I'm a little closer but still essentially none the wiser.

I figured the exception related to the data I was sending Riemann from Graylog and this appears to be the case.

We use nxlog to send IIS, FTP, and various other logs on Windows servers to Graylog and if I ignore anything coming in via nxlog's "im_file" module then I get no exceptions.

What I've been trying unsuccessfully to do is to determine which field in which event is causing the exception. Just putting #(info %) before the rate statement doesn't enlighten me with this at all.

I see the exception contains the text "SINGLEQUOTE", which would lead me to believe there is a field containing data with an unterminated quote, but I can't see this in any of the events I've reviewed, so possibly that's a red herring.

My next step is to run some tests, but in the meantime, while I work out how to test this, is there any way to get Riemann to just dump the event causing the exception to the log?
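One way to get a suspect event into the log is to filter for it before it reaches rate. where* passes an event to its children only when the predicate returns true. This sketch assumes the suspect fields are :time and :ttl, which Riemann expects to be numbers (Unix seconds and seconds); bad-time-or-ttl? is a hypothetical helper, not part of Riemann:

```clojure
; Sketch for riemann.config.
; Hypothetical helper: true when :time or :ttl is present but not a number.
(defn bad-time-or-ttl?
  [event]
  (boolean
    (some (fn [k]
            (let [v (get event k)]
              (and (not (nil? v)) (not (number? v)))))
          [:time :ttl])))

; where* forwards only events for which the predicate is true, so only the
; malformed events are logged. pr-str prints strings in double quotes, which
; makes a string :time easy to spot in the log.
(streams
  (where* bad-time-or-ttl?
    #(info "non-numeric :time or :ttl:" (pr-str %))))
```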

Cheers, Pete

Kyle Kingsbury

Jul 8, 2015, 6:51:17 PM
to rieman...@googlegroups.com
> I see the exception contains the text "SINGLEQUOTE" which would lead me
> to believe there is a field containing data that has an unterminated
> quote but I can't see this is any of the events I've reviewed, so
> possibly that's a red herring.

riemann.streams$rate$rate_SINGLEQUOTE___4322@35b961eb is the munged name
of the function which threw: riemann.streams/rate has an inner function
called rate'--the ' is SINGLEQUOTE.
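Clojure does this munging when it compiles a function into a JVM class, because ' is not a legal character in a Java identifier. A small sketch using clojure.core/munge, which applies the same character mapping, and clojure.repl/demunge, which reverses it for stack-trace class names (the exact demunged string may vary between Clojure versions):

```clojure
(require 'clojure.repl)

;; munge maps characters that are not legal in Java identifiers to readable
;; tokens; an apostrophe becomes _SINGLEQUOTE_.
(println (munge "rate'")) ; prints rate_SINGLEQUOTE_

;; demunge goes the other way, turning the class name from the stack trace
;; back into something close to the source name (rate' plus a numeric suffix).
(println (clojure.repl/demunge "riemann.streams$rate$rate_SINGLEQUOTE___4322"))
```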

java.lang.ClassCastException: java.lang.String cannot be cast to
java.lang.Number tells you you're trying to add a string to something.

at riemann.streams$periodically_until_expired$stream__4021.invoke(streams.clj:495)

Well, let's go look at line 495 of streams.clj:

https://github.com/aphyr/riemann/blob/master/src/riemann/streams.clj#L495

Okay, so it's adding the :time and the :ttl of the event. One of those
was a string, not a number. Let's look up at the event:

:time 20:21:01

That doesn't look like a number to me. That's probably the culprit.
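To see the failing step in isolation, here is a plain Clojure sketch (no Riemann needed) of the time + ttl addition described above. Plain maps stand in for Riemann events; expires-at and try-expires-at are hypothetical helpers, and the numeric :time is the Unix time for the event's :timestamp field, 2015-07-07T10:21:01Z:

```clojure
;; Plain maps standing in for Riemann events.
(def good-event {:time 1436264461 :ttl 60.0})  ; numeric Unix time
(def bad-event  {:time "20:21:01" :ttl 60.0})  ; string, as in the logged event

;; Hypothetical helper mirroring the expiry computation: time + ttl.
(defn expires-at [event]
  (+ (:time event) (:ttl event)))

;; Returns the expiry, or a description of the ClassCastException.
(defn try-expires-at [event]
  (try
    (expires-at event)
    (catch ClassCastException e
      (str "ClassCastException: " (.getMessage e)))))

(println (try-expires-at good-event)) ; prints 1.436264521E9
(println (try-expires-at bad-event))  ; prints the ClassCastException message
```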

--Kyle

Pete GS

Jul 8, 2015, 7:00:48 PM
to rieman...@googlegroups.com
Ahah! And there is another "ding" moment, thanks so much Kyle!

I did look at the source earlier but without your explanation of the first couple of steps it was all a blur to me, now I see what I was missing!

I'll add :time (unix-time) to the with map, and that should correct the issue.

Cheers again, Pete

--
You received this message because you are subscribed to a topic in the Google Groups "Riemann Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/riemann-users/cjhJEAtTGGU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to riemann-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pete GS

Jul 8, 2015, 7:32:32 PM
to rieman...@googlegroups.com
And just for completeness, this now works like a charm. Rates are graphed in Graphite and no exceptions in riemann.log.

; This one groups by host to simply graph messages per second for each host
(let [index (index)]
  (streams
    (by :host
      (with {:service "host_messages_per_second" :metric 1 :time (unix-time)}
        (rate 1
          graph
        ))))
; This one simply graphs total messages per second
  (streams
    (with {:service "total_messages_per_second" :host "mulcher" :metric 1 :time (unix-time)}
      (rate 1
        graph
      )))
)
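A possible caveat with the config above, offered as an untested suggestion: with is an ordinary function, so the map passed to it, including (unix-time), is evaluated once when the config loads, and every event gets that same :time. Any later step that compares an event's :time and :ttl against the current time would see these events as steadily older. A per-event alternative is to stamp the time with smap, which applies a function to each event as it arrives. A minimal sketch, assuming the same graph stream as in the config above:

```clojure
; Sketch for riemann.config: stamp each event with the current time as it
; arrives, instead of a :time computed once at config load.
(streams
  (smap (fn [event]
          (assoc event
                 :service "total_messages_per_second"
                 :host    "mulcher"
                 :metric  1
                 :time    (unix-time)))   ; evaluated per event
    (rate 1
      graph)))                            ; graph: assumed defined elsewhere
```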

Cheers, Pete

Aphyr

Jul 8, 2015, 9:11:52 PM
to rieman...@googlegroups.com
On 07/08/2015 04:32 PM, Pete GS wrote:
> And just for completeness, this now works like a charm. Rates are graphed in
> Graphite and no exceptions in riemann.log.

You may also want to dig into the software that's emitting these events and
figure out why it's generating malformed requests.

--Kyle

Pete GS

Jul 8, 2015, 11:17:43 PM
to rieman...@googlegroups.com
Most definitely! I've narrowed it down to a subset of hosts sending data to Graylog and from there on to Riemann.

I'll keep tracking that to its source and get it resolved once and for all.

In the meantime I have the data/graphing I need.

Cheers, Pete