Broken Pipe in Solaris 11.2

tom

Mar 2, 2015, 2:41:37 AM
to dim...@googlegroups.com
Hi,

I am seeing this issue on the dimstat agent when it tries to connect to the dimstat server:
STATsrv-4135:4999> Connect -> [dimstat_Client_hostname_IPAddress] CMD= "STAT_LIST" TIME: Sun Mar  1 23:27:16 2015
STATsrv-4135:4999> Exit -> [dimstat_Client_hostname_IPAddress] (broken pipe) CMD= "STAT_LIST" TIME: Sun Mar  1 23:27:16 2015


The dimstat server is not able to connect to the agent, although I can ping it. The agent status shows green on the web page, but when I start the collect, the above error appears.

I also tried this command on the server:
root@dimstatServerName:/opt/WebX/bin# ./STATcmd -h dimstat_Client_hostname -c  STAT_LIST
STAT *** NO CONNECTION
STAT *** NO CONNECTION

I also upgraded the core to dim_STAT CoreUpdate-14.

I am currently on Solaris 11.2

 cat /etc/release 
                            Oracle Solaris 11.2 SPARC
  Copyright (c) 1983, 2014, Oracle and/or its affiliates.  All rights reserved.
                            Assembled 22 October 2014


Please help. This has never caused an issue on earlier releases of Solaris.

Thanks
Tom

Dimitri

Mar 2, 2015, 2:53:29 AM
to dim...@googlegroups.com
Hi Tom,

Try using a different IP port on the client (different from 5000,
which is the default) -- it's possible that some other OS service is
using it too. To change it for the client's STAT-service, just edit
the /etc/STATsrv/STAT-service file, change port 5000 to 5001 for ex.,
and restart STAT-service. Then retry with ./STATcmd and add the "-p"
option with the new port number. Also try executing STATcmd locally
on the machine first (from /etc/STATsrv/bin) and then remotely from
the dim_STAT server.
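
To check whether something else is already listening on a port before assigning it to STAT-service, a small filter over `netstat -an` output may help. This is only a sketch: `port_is_listening` is a hypothetical helper name, not part of dim_STAT, and it assumes the usual netstat layout in which LISTEN lines show the local address as `*.5000` (Solaris) or `0.0.0.0:5000` (Linux) followed by a space.

```shell
# Hypothetical helper (not part of dim_STAT): reads `netstat -an` output
# on stdin and succeeds if a LISTEN line has an address ending in the port.
# Assumes the address is followed by a space, as in normal netstat output.
port_is_listening() {
    grep LISTEN | grep "[.:]$1 " > /dev/null
}

# Example use on the client, with STAT-service stopped:
#   netstat -an | port_is_listening 5000 && echo "port 5000 is already taken"
```

Run it with STAT-service stopped, since a running STAT-service would itself show up as a listener on its port. STATcmd with the "-p" option remains the real test that the service answers.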

Indeed, there is something odd going on.

Rgds,
-Dimitri

tom

Mar 2, 2015, 3:11:58 AM
to dim...@googlegroups.com
Hi Dimitri,

I am running with port 4999. I had hit 5000 earlier, so I changed the port, but I missed the -p option.
It's now showing the list :-)

On the agent side, the broken pipe message is still appearing.

./STATcmd -h dimstat_Client_hostname -c  STAT_LIST -p 4999
STAT *** OK CONNECTION 0 sec.
STAT *** LIST COMMAND (STAT_LIST)
STAT: vmstat
STAT: mpstat
STAT: netstat
STAT: ForkExec
STAT: MEMSTAT
STAT: tailX
STAT: ioSTAT.sh
STAT: netLOAD.sh
STAT: netLOAD
STAT: psSTAT
STAT: UserLOAD
STAT: ProcLOAD
STAT: bsdlink
STAT: bsdlink.sh
STAT: sysinfo
STAT: SysINFO
STAT: Siostat
STAT: ProjLOAD
STAT: PoolLOAD
STAT: TaskLOAD
STAT: ZoneLOAD
STAT: IOpatt
STAT: CPUSet
STAT: UDPstat
STAT *** LIST END (STAT_LIST)


thanks
Tom

PS: I really love this program. It's been an immense help in many projects.

tom

Mar 2, 2015, 6:24:50 AM
to dim...@googlegroups.com
I have installed the dimstat server on another node and it works without any issues.

This seems to be a hardware issue, particularly with the NIC: I am seeing a lot of NIC failures in the system messages.

Thanks for the prompt help.
Tom

Dimitri

Mar 2, 2015, 9:46:42 AM
to dim...@googlegroups.com
Tom, all is well that ends well ;-))

Regarding the "broken pipe" error messages: no need to worry too much
about them. In fact, these messages were added explicitly to the
STAT-service code to help understand why the collection of stats was
stopped. There are 2 possible reasons: the command line output was
closed unexpectedly (broken pipe), or a network connection between the
servers was closed.

And both states may be either normal or abnormal ;-))

When you stop stats collects via the web interface, all the
corresponding network connections are closed, and STAT-service
receives a "connection lost" error, which is expected and correct.
Now, if the client running STAT-service hangs, resets, or reboots, the
connection is lost on the dim_STAT server side, and this is seen as
abnormal, so you'll automatically get the corresponding errors in your
LOG messages pointing to the lost connection.

Same for "broken pipe": generally each stat command is supposed to
loop forever, but it can be killed, or be buggy and crash (for ex.).
In this case STAT-service recognizes the situation, logs a "broken
pipe" message for the corresponding stat command, and then closes the
connection, allowing the dim_STAT server to restart it in the normal
way. The only exception here is the reserved "STAT_LIST" command: it
is a reserved order to show the list of all available stat commands
from a given STAT-service, and this order is processed like any other
stat command. As STAT_LIST is not supposed to loop forever, there is
an "error" message about a broken pipe once the STAT_LIST output is
finished (just to say that all info was sent to the network connection
and the pipe was closed).
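
As a rough illustration of why a finite command such as STAT_LIST ends with a pipe-closed message: whoever relays a command's output through a pipe sees end of input as soon as the command exits. The sketch below is not STAT-service's code; `finite_cmd`, `relay` and the message text are made up for the example.

```shell
# Hypothetical stand-in for a command with finite output, like STAT_LIST.
finite_cmd() {
    printf 'STAT: vmstat\nSTAT: mpstat\n'
}

# Hypothetical relay: forwards each line of the command's output, then
# reports that the pipe was closed once the command has exited.
relay() {
    "$@" | while IFS= read -r line; do
        printf '%s\n' "$line"
    done
    printf 'pipe closed: %s\n' "$1"
}

relay finite_cmd
```

A command that loops forever would only reach the final message if it were killed or crashed, which is the abnormal case; a finite command reaches it every time, which is harmless.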

Well, I hope this clarifies a little bit what is going on inside ;-)