Blank graphs in collector server after more than 64 entries

Feue...@gmx.de
Nov 7, 2012, 11:26:00 AM
to javam...@googlegroups.com
Hello evernat,

It's me and my weird issues again.
We noticed an issue with V1.4.1-SNAPSHOT: after some time, our graphs go blank. Today I did some testing with different properties. I found out that our primary collector server is monitoring 96 entities. Another collector server, which monitors 29 entities, is fine with the same version.
I've done some cross-testing:

63 and 64 entities are monitored fine with JBoss 7 on a Windows machine
65 entities: the graphs went blank after 2 hours with JBoss 7 on Windows
29 entities are fine the whole time with V1.4.1-SNAPSHOT on JBoss 4.2 on RHEL
96 entities: the graphs went blank after 2 weeks with V1.4.1-SNAPSHOT on JBoss 4.2 on RHEL

After restarting the big collector server (96 entities), every entity was polled only once and then was blank.

Thank you for your support.

Best regards
eugen

Vernat Emeric
Nov 7, 2012, 6:32:35 PM
to javam...@googlegroups.com
Hi,

First, thanks for the feedback. It is good to know that javamelody can
scale and is currently used with dozens of monitored "entities"
(applications).


Then, can you explain "every entity was polled only once, then was
blank" for the big collector server? How do you know that every "entity"
was polled only once? From the logs?
And what was blank then: the logs or the graphs? (I understand that
blank graphs mean some data for some time, then NaN values and only the
graph axes.)


The current state of the big collector server is interesting, because it
seems that the issue can be reproduced.
I think that the collector server may have problems reaching some of the
monitored entities and may be waiting a long time for them. So I suggest
taking a thread dump of this collector server and sending it here, so we
can see what it is doing. For example, you can use "jstack <pid>" from the
JDK, where "<pid>" is the OS pid of the collector server.
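jstack inspects a JVM from the outside. For reference, a JVM can also produce a similar dump of its own threads through the standard java.lang.management API. The class below is a hypothetical helper (ThreadDumpSketch is not part of javamelody), and it only dumps the JVM it runs in, so for the collector server jstack remains the simpler option.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: an in-process thread dump, similar in spirit to jstack. */
public class ThreadDumpSketch {

    /** Returns a textual dump (name, state, stack frames) of all live threads of this JVM. */
    public static String dumpThreads() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // Include locked monitors and synchronizers only when the JVM supports it.
        ThreadInfo[] infos = threadMXBean.dumpAllThreads(
                threadMXBean.isObjectMonitorUsageSupported(),
                threadMXBean.isSynchronizerUsageSupported());
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip a thread that terminated meanwhile
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            // ThreadInfo.toString() truncates stacks, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpThreads());
    }
}
```

A thread that is stuck in a socket read would show up in such a dump with frames from java.net in its stack, which is the kind of evidence a thread dump of the collector server could reveal.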

Of course, you can also look at the logs of the collector server. But if I
remember correctly, you don't have much in the logs.

Otherwise, you could start the collector server in debug mode and
connect to it in debug with an IDE (Eclipse for example) and with the
javamelody sources (V1.4.1-SNAPSHOT?).
Then you would probably see what and where the problem is, by running
step by step in the IDE. But let's look at the thread dump of the server first.

bye,
Emeric

Feue...@gmx.de
Nov 8, 2012, 6:42:56 AM
to javam...@googlegroups.com

Hi,


After restarting the JBoss instance, javamelody shows the usual entries in the logs: the time of each request to an application in ms and the amount of data transferred in KB. In the graphs there is a small green line on some graphs, but not even on all of them. Since the list is updated in name order, the instances named first are in the best situation: they have more data than the last ones.

Yes, sometimes there are peaks in the logs like "collecting in 60000ms with 0 KB". But it varies: sometimes it waits a long time, and sometimes it gets the answer from a down instance really fast.

Two main failure messages are shown:


2012-11-08 00:00:45,429 INFO  [STDOUT] 00:00:45,429 INFO  [javamelody] http call done in 60060 ms with 0 KB read for http://.../minimon/monitoring?collector=stop&format=serialized

2012-11-08 00:00:45,430 INFO  [STDOUT] 00:00:45,429 WARN  [javamelody] Read timed out

java.net.SocketTimeoutException: Read timed out


and


2012-11-08 00:00:46,644 INFO  [STDOUT] 00:00:46,644 INFO  [javamelody] http call done in 1 ms with 0 KB read for http://.../minimon/monitoring?collector=stop&format=serialized

2012-11-08 00:00:46,644 INFO  [STDOUT] 00:00:46,644 WARN  [javamelody] Connection refused

java.net.ConnectException: Connection refused


Otherwise the logs are fine. The output is just the same as on the small collector server: an online instance is polled, and the transferred data shows up in the logs.


Greetings,

eugen

Feue...@gmx.de
Nov 8, 2012, 6:44:51 AM
to javam...@googlegroups.com
Hi,
here is the missing file: the jstack output from the big collector server.

Regards,
Eugen
jstack-mon.zip

Vernat Emeric
Nov 9, 2012, 3:00:20 PM
to javam...@googlegroups.com

Hi,

First, your thread dump shows that the "collector" thread of the server was writing files to disk. So it does not prove that the server is waiting a long time for some slow monitored entities (applications).

But given your logs with some long http calls, I still think that the server is often waiting for a response from some of the applications.
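The two log excerpts fit this picture: a "Read timed out" call took 60060 ms, while a "Connection refused" call failed after 1 ms. The difference can be reproduced with a plain HttpURLConnection that has explicit connect and read timeouts. This is a minimal sketch, not javamelody's code: TimedHttpCall, Outcome and the timeout values in main are hypothetical and only illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URI;

/** Hypothetical sketch: one HTTP GET with explicit timeouts, classified by outcome. */
public class TimedHttpCall {

    /** Hypothetical classification of one collect attempt, for illustration only. */
    public enum Outcome { OK, TIMEOUT, CONNECTION_REFUSED, OTHER_ERROR }

    /**
     * Performs one GET request and reports how it ended. A read timeout blocks the
     * caller for the full readTimeoutMs, whereas a refused connection fails at once.
     */
    public static Outcome call(String url, int connectTimeoutMs, int readTimeoutMs) {
        HttpURLConnection connection = null;
        try {
            connection = (HttpURLConnection) URI.create(url).toURL().openConnection();
            connection.setConnectTimeout(connectTimeoutMs); // max time to establish the TCP connection
            connection.setReadTimeout(readTimeoutMs);       // max time to wait for data once connected
            try (InputStream in = connection.getInputStream()) {
                in.readAllBytes(); // drain the response body (Java 9+)
            }
            return Outcome.OK;
        } catch (SocketTimeoutException e) {
            // thrown for both "connect timed out" and "Read timed out"
            return Outcome.TIMEOUT;
        } catch (ConnectException e) {
            // e.g. "Connection refused": nothing is listening on that port
            return Outcome.CONNECTION_REFUSED;
        } catch (IOException e) {
            return Outcome.OTHER_ERROR;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        Outcome outcome = call(args[0], 20_000, 60_000); // illustrative timeouts
        System.out.println(outcome + " after " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

If such calls are made one after another in a single thread, every application that hangs until the read timeout delays all the applications polled after it.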


I have made a change to use parallel threads for collecting data, so that slow applications no longer block the collection from the other applications.
(I have used a pool of 10 threads and no more, because I think that a high number of threads may cause high spikes of memory, cpu, disk or network usage.)
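The general idea of collecting in parallel with a bounded pool can be sketched with a fixed-size ExecutorService. This is a minimal illustration, not the code of the trunk change: ParallelCollectSketch, collectAll and the collector function are hypothetical names, and a failed application is simply reported as null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: collect from many applications with a bounded thread pool. */
public class ParallelCollectSketch {

    /**
     * Runs the collector for every application using at most poolSize threads, so one
     * slow application occupies a single thread instead of delaying all the others.
     * Results are returned in the same order as the applications; null marks a failure.
     */
    public static <R> List<R> collectAll(List<String> applications,
                                         Function<String, R> collector,
                                         int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String application : applications) {
                futures.add(pool.submit(() -> collector.apply(application)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    results.add(null); // this application failed; the others are unaffected
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once the tasks are done
        }
    }
}
```

With a pool of 10, up to 10 applications can hang at the same time before the others start to wait, which is the trade-off against the resource spikes mentioned above. A real server would more likely keep one pool for its lifetime than create one per collection.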

This is in trunk (http://code.google.com/p/javamelody/source/detail?r=3130) and will be in the next release (1.42).
I have made a new build of the collector server from the trunk and it is available at:
http://javamelody.googlecode.com/files/javamelody-20121109.war

I suggest that you try this version by simply replacing the war file of your collector server with this new one. (You can make a backup of your war file first.)

bye,
Emeric