broken pipe

Greg Lane

Jun 17, 2014, 3:02:44 PM
to dim...@googlegroups.com
I have successfully installed dim_STAT and got the STAT service running on a remote machine.  However, I am having issues with collecting stats and graphing.  Initially it was just graphing, but now it is both.  I edited the access file on the remote server to allow only the dim_STAT server to run the scripts.  The collections show up, but any time I select Start New Collect the access log on the remote server shows:

STATsrv-5714:5000> Connect -> [10.252.3.10] CMD= "STAT_LIST" TIME: Tue Jun 17 18:55:21 2014
STATsrv-5714:5000> Exit -> [10.252.3.10] (broken pipe) CMD= "STAT_LIST" TIME: Tue Jun 17 18:55:21 2014

It also throws that warning when I select the host and hit continue.  When I then select a stat to collect, it doesn't start.  I verified this by going to Analyze and selecting Active only or Active Status.

It also shows the remote host as connected, as indicated by the green LED status.

Dimitri

Jun 18, 2014, 7:55:58 AM
to dim...@googlegroups.com
Hi Greg,

unfortunately I cannot reproduce your issue...

a few words about what is going on within dim_STAT internals:
- every time you request a STAT-service status from any server,
dim_STAT sends the "STAT_LIST" request to this server..
- the STAT_LIST request just involves a scan for all stats available
on the machine and allowed for the requestor's IP address..
- (the same happens when dim_STAT needs to show an LED corresponding
to the server state, etc.)
- what is not normal is that you have a "broken pipe" message.. -
there is still a possibility that in the initial v9 the code did not
read the STAT_LIST to the end, but did you upgrade your dim_STAT
instance to the latest Core Update?..
- and when you're limiting access via the IP address setting, there
still should not be any issue for STAT_LIST, as it'll just return an
empty list if the IP is not matched, or the full list otherwise..

but well, this is indeed strange...
so, let's debug it a little bit..

on the dim_STAT machine:

# cd /opt/WebX/bin
# ./STATcmd -h your_remote_server_IP -c STAT_LIST

what do you see in the output?
what do you see in /etc/STATsrv/log/access.log on the machine with
"your_remote_server_IP"?..

Rgds,
-Dimitri
> --
> You received this message because you are subscribed to the Google Groups
> "dim_STAT" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to dimstat+u...@googlegroups.com.
> To post to this group, send email to dim...@googlegroups.com.
> Visit this group at http://groups.google.com/group/dimstat.
> For more options, visit https://groups.google.com/d/optout.
>

Greg Lane

Jun 18, 2014, 8:24:53 AM
to dim...@googlegroups.com
Here is the output:

[mysql@stat bin]$ sudo ./STATcmd -h IP_address -c STAT_LIST
STAT *** OK CONNECTION 0 sec.
STAT *** LIST COMMAND (STAT_LIST)
STAT: Lvmstat
STAT: Lmpstat
STAT: tailX
STAT: LioSTAT_v10
STAT: LpsSTAT
STAT: LPrcLOAD
STAT: LUsrLOAD
STAT: LnetLOAD
STAT: LcpuSTAT
STAT: sysinfo
STAT: SysINFO
STAT: IObench
STAT: dbSTRESS
STAT *** LIST END (STAT_LIST)
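
A listing like the one above can also be checked mechanically. The following is a minimal Python sketch, assuming the STATcmd output always has the shape pasted here (collector names on lines of the form `STAT: <name>`, status lines starting with `STAT ***`). The helper `parse_stat_list` is hypothetical and not part of dim_STAT. Note that the list contains `LioSTAT_v10`, not `LioSTAT`.

```python
def parse_stat_list(output):
    """Return collector names from lines of the form 'STAT: <name>'."""
    names = []
    for line in output.splitlines():
        line = line.strip()
        # Status lines look like 'STAT *** ...'; only 'STAT: <name>' names a collector.
        if line.startswith("STAT: "):
            names.append(line[len("STAT: "):].strip())
    return names


# Shortened copy of the output pasted in this message.
sample = """\
STAT *** OK CONNECTION 0 sec.
STAT *** LIST COMMAND (STAT_LIST)
STAT: Lvmstat
STAT: LioSTAT_v10
STAT: LpsSTAT
STAT *** LIST END (STAT_LIST)
"""

names = parse_stat_list(sample)
print("LioSTAT" in names)
print("LioSTAT_v10" in names)
```

Comparing the exact name requested with the names in this list is a quick way to spot a mismatch before starting a collect.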

As you can see, it returns the correct list.  I also removed the IP restriction from the access file so that any server could run the scripts, just to eliminate that as a problem.  Now that I think of it, I believe this issue may have started once I applied that update.

As far as the patch goes, I assume I applied it correctly.  I followed the update procedure you provided with Core Update #12 while applying the #13 update.

Greg

Dimitri

Jun 18, 2014, 9:15:05 AM
to dim...@googlegroups.com
Greg,

so the output is correct, but do you still see the "broken pipe"
message within access.log?..

and are you able to run stat collects?..

Rgds,
-Dimitri

Greg Lane

Jun 18, 2014, 9:24:10 AM
to dim...@googlegroups.com
Yes, the error is still being thrown in the access log.  I reapplied the update and it still persists.  I try to run the stat collect by selecting LioSTAT; it lets me continue and says it started collecting, but when I check Analyze and select Active only, single host, nothing comes up on the next screen.

I want to add one bit of info also.  On the dim_STAT server the directories are owned by the user I am logged in as, but everything is run using sudo; on the remote server the directory is owned by root and everything is run with sudo.  I don't think that is an issue, but I wanted to mention it because I have tried it with everything owned by root and it made no difference that I could see.

Greg

Alan Impink

Jun 18, 2014, 9:34:46 AM
to dim...@googlegroups.com
Just throwing this out....   Maybe the socket is getting closed via timeout?   [very large (>2min) collection interval? unflushed output?]

Alan



Greg Lane

Jun 18, 2014, 9:39:22 AM
to dim...@googlegroups.com, aim...@yahoo.com
The error is being thrown before any stat collecting is started.  It shows the list and allows me to select a collector, but nothing starts.

Greg

Greg Lane

Jun 18, 2014, 10:00:42 AM
to dim...@googlegroups.com, aim...@yahoo.com
So when trying to start the STAT collection I selected the show debug option, and this is the output:

/apps/client/start_STAT_db00_ndbc_noaa_gov: line 35: warning: setlocale: LC_NUMERIC: cannot change locale (en): No such file or directory
/apps/client/start_STAT_db00_ndbc_noaa_gov: line 36: warning: setlocale: LC_NUMERIC: cannot change locale (en): No such file or directory
Usage: WebX script [-Var Value [-Var2 Value2 [..]]]
ERROR: "

I'm thinking I may need to just do a fresh install and start over at this point.

Greg

Dimitri

Jun 18, 2014, 12:54:02 PM
to dim...@googlegroups.com, aim...@yahoo.com
Ah, ok, that explains it...

a few points:

1.) to do a manual debugging of STAT-service you may just try to start
a stat command instead of STAT_LIST:

$ /opt/WebX/bin/STATcmd -h hostname -c "LioSTAT 10"

and see whether the output comes through correctly or something breaks..
and whether you still get a broken pipe error message in STAT-service

2.) then, to debug stat collections -- instead of "start" for the
collection choose "create a script", and then run this script
manually (select only one stat to collect to reduce output messages)
-- then you'll see if the collection is working or not (and why)..

NOTE: the LANG and LC_NUMERIC settings are in the script to force an
"expected" format from error messages and numerics..
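
The setlocale warnings in the debug output mean that the locale name the script asks for (`en`) is not installed on that machine. Below is a minimal Python sketch for checking whether a given locale name is usable on a host. The helper `locale_available` is hypothetical and not part of dim_STAT; it only probes the locale and does not modify the generated script.

```python
import locale


def locale_available(name):
    """Return True if LC_NUMERIC can be switched to the named locale."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, name)
        return True
    except locale.Error:
        # Raised when the locale is not installed on this system.
        return False
    finally:
        # Always restore the original setting.
        locale.setlocale(locale.LC_NUMERIC, saved)


for candidate in ("C", "en", "en_US.UTF-8"):
    print(candidate, locale_available(candidate))
```

The `C` locale is always present; whether `en` or `en_US.UTF-8` is available depends on which locales are installed on the machine.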

Rgds,
-Dimitri



Greg Lane

Jun 18, 2014, 2:51:52 PM
to dim...@googlegroups.com, aim...@yahoo.com
Here is the output of running STATcmd as you suggested in #1.


STAT *** OK CONNECTION 0 sec.
STAT *** BAD COMMAND (no access: LioSTAT)

It also threw a broken pipe error on the remote server access log.

Greg

Greg Lane

Jun 18, 2014, 2:54:34 PM
to dim...@googlegroups.com, aim...@yahoo.com
Running #2 manually gives the same error I provided earlier:

./start_STAT_db00_ndbc_noaa_gov: line 35: warning: setlocale: LC_NUMERIC: cannot change locale (en): No such file or directory
./start_STAT_db00_ndbc_noaa_gov: line 36: warning: setlocale: LC_NUMERIC: cannot change locale (en): No such file or directory


<!-- WebX v.8.5-1 (c) Dimitri KRAVTCHUK (dim) 1995-2012 -->


Usage:
        WebX script [-Var Value [-Var2 Value2 [..]]]

ERROR:  "<Timeout> is not filled!"

Greg


Dimitri

Jun 18, 2014, 3:18:22 PM
to dim...@googlegroups.com
Greg,

could you send your file "start_STAT_db00_ndbc_noaa_gov" here?..
indeed, this all looks very strange..

and for #1 -- how is it possible that STAT_LIST prints "LioSTAT",
while when you ask for the "LioSTAT" output you get a no access
message?...

could you also send the /etc/STATsrv/access file here?

and then also both outputs again:
$ /opt/WebX/bin/STATcmd -h hostname -c "STAT_LIST"
$ /opt/WebX/bin/STATcmd -h hostname -c "LioSTAT 10"

Greg Lane

Jun 18, 2014, 4:39:56 PM
to dim...@googlegroups.com
You would like the script file from the /apps/client directory, the outputs of those two commands, and the access.log from the remote server, correct?
The access log is rather small because I truncate the file each time I restart the STATsrv, to see if anything new happens during that time.
Lastly, would you like me to email an attachment or post the text here?

I surely hope I'm not wasting your time with something I have done.  I have given you all the info from the install.  Do you think it might be better to do a clean install, check that it runs, and then apply the new patch?  I stumbled through the install a little initially, trying to get my Linux administrator to install the correct 32-bit libraries.

Greg

Dimitri

Jun 18, 2014, 5:15:08 PM
to dim...@googlegroups.com
Hi Greg,

yes, all the mentioned files and command outputs, while for the "access"
file I rather need the /etc/STATsrv/access one (just to check why
you're getting the no access message)..

is it possible you have some strange access permissions on your files
on your Linux box?..

Greg Lane

Jun 19, 2014, 7:49:37 AM
to dim...@googlegroups.com
Here you go.  I talked to my Linux admin and there don't seem to be any unusual permissions that we can think of that might affect this.
One thing I do want to mention about the permissions, though, is that I currently have the dim_STAT install directories owned by the user I run it under, mysql.  I did at one point chown -R them to root, and when I ran the start scripts as root the mysql instance wouldn't start.
DIM_STAT.ZIP

Dimitri

Jun 19, 2014, 8:26:01 AM
to dim...@googlegroups.com
Hi Greg,

seems like there are several points...

1.) I was wrong when I asked you to start "LioSTAT" via STATcmd, as
currently it's banned in the access file, and "LioSTAT_v10" is
proposed instead ;-)

so, the following should work for you:
$ /opt/WebX/bin/STATcmd -h hostname -c "LioSTAT_v10 5"

and I think you'll see the Linux iostat output every 5 sec ;-)

2.) I suspect that you've deleted all existing stat collects in
your database right now.. - is it so?.. -- because just last week
Matthieu and Alain reported to me an issue with the way the auto-ID
is assigned for new stat collects (currently max(ID)+1 from a SELECT
is used, and if there are no existing stats in the database, then
the max value will not be assigned, and neither will the stat ID..)
-- the issue is fixed in the next CoreUpdate, but not in the code
you have..

to check if you're hitting this issue:
- create a new database in dim_STAT (via Preferences)
- select this new database as your current database
- start a new collect from this new database..
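
The max(ID)+1 behaviour described in point 2 can be reproduced in isolation. This is only a sketch under stated assumptions: the real dim_STAT schema and query are not shown in this thread, the table name `collects` is hypothetical, and an in-memory SQLite database stands in for the MySQL instance that dim_STAT uses. The `COALESCE` guard is one common way to handle the empty-table case and is not necessarily what the next CoreUpdate does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the actual dim_STAT schema is not shown in the thread.
conn.execute("CREATE TABLE collects (id INTEGER, name TEXT)")

# On an empty table MAX(id) is NULL, so MAX(id) + 1 is NULL as well.
naive = conn.execute("SELECT MAX(id) + 1 FROM collects").fetchone()[0]

# COALESCE substitutes 0 for the NULL, so the first ID becomes 1.
guarded = conn.execute(
    "SELECT COALESCE(MAX(id), 0) + 1 FROM collects"
).fetchone()[0]

print(naive, guarded)  # prints: None 1

# Once a row exists, both forms agree.
conn.execute("INSERT INTO collects VALUES (?, ?)", (guarded, "first"))
after = conn.execute("SELECT MAX(id) + 1 FROM collects").fetchone()[0]
print(after)  # prints: 2
```

This matches the symptom in the thread: with every existing collect deleted, the next ID comes back empty and the new collect never gets one.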

Rgds,
-Dimitri

Greg Lane

Jun 19, 2014, 8:30:03 AM
to dim...@googlegroups.com
I ran it with "LioSTAT_v10 5" and you are correct, it did start outputting.  Also, yes, I did delete the old stats after I had run them previously.  I will create the new database and check again.  This will hopefully solve it all.  Initially I wasn't getting graphs, but then all this started to happen.  I will report back shortly.

Greg

Dimitri

Jun 19, 2014, 8:36:40 AM
to dim...@googlegroups.com
If data are collected but graphs are not shown, check whether ploticus
is missing some of the libs it needs to run:
$ ldd /opt/WebX/bin/pl
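
For larger outputs, the `ldd` check can be scripted. The sketch below assumes the usual glibc `ldd` output, where an unresolved dependency appears as `<library> => not found`. The helper names and the sample library names are hypothetical and only illustrate the parsing.

```python
import subprocess


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "not found" in line:
            # Lines look like: 'libexample.so.1 => not found'
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and return its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libs(result.stdout)


# Illustrative ldd output; the library names here are made up.
sample = (
    "\tlibm.so.6 => /lib/libm.so.6 (0xf7700000)\n"
    "\tlibexample.so.1 => not found\n"
)
print(missing_libs(sample))
```

On the dim_STAT server this would be called as `check_binary("/opt/WebX/bin/pl")`; an empty list means every shared library was resolved.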

Rgds,
-Dimitri



Greg Lane

Jun 19, 2014, 8:52:36 AM
to dim...@googlegroups.com
I created the database and ran show tables, which lists all of them, but I'm getting errors saying the tables do not exist.  lol, it's like a snowball.

Greg

Greg Lane

Jun 19, 2014, 10:27:59 AM
to dim...@googlegroups.com
It's odd, because I checked the information_schema and all the tables are there, and I can run queries against some of the tables but not against others.  At first I thought maybe it was the engine type (some are MyISAM and some are InnoDB), but that isn't the case.

Greg

Greg Lane

Jun 19, 2014, 12:36:15 PM
to dim...@googlegroups.com
After some more frustration I finally did a clean install, and after clearing the cookies I was able to get things up and running; it is currently graphing.  I will keep the point about deleting data in mind in the future, because that seems to have created some issues.  I will say, however, that the original broken pipe error is still being thrown in the access.log.

Greg

Dimitri

Jun 19, 2014, 1:16:43 PM
to dim...@googlegroups.com
Hi Greg,

I think all these issues will be gone as of CoreUpdate-14,
but I still don't know why you have a broken pipe error..

the update tgz will soon be available on my site, and once
you've tested it I'll make an "official" announcement ;-)

Greg Lane

Jun 19, 2014, 1:28:16 PM
to dim...@googlegroups.com
Sounds good to me and thank you for all the help.  Can't wait to test it out when it is available.

Greg

Dimitri

Jun 20, 2014, 9:44:12 AM
to dim...@googlegroups.com
Hi Greg,

please try the dim_STAT CoreUpdate-14 now, from:
- http://dimitrik.free.fr/Core_Updates/WebX_apps-v90-u14.tgz

you may deploy it live on your dim_STAT server..

then, regarding the "broken pipe" error message mystery.. - in fact
this is a feature ;-))
on every EXIT of any STAT command there will be one of 2 messages in
the access.log:
- lost connection : means the command EXIT was due to a closed socket
on the dim_STAT server side..
- broken pipe : means the command EXIT came from the STAT-service
side (the pipe reading the command output reached EOF)

these messages were added to avoid any doubt about why the collection
from a STAT command was interrupted (ex. killed by sysadmin, other)..

indeed, we could manage the STAT_LIST command a little bit differently
here (as once it arrives at its end message we know it was not
killed).. - but now that you know why it's so - who cares? ;-)) (NOTE:
the code is open and you may change it as you like.. - while if I
change it myself it'll no longer be GPL and will automatically belong
to Oracle Corp. from then on)..
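
The two exit reasons described above can be tallied from an access.log with a few lines of Python. This is a minimal sketch, assuming every Exit line has the same layout as the "broken pipe" lines quoted at the start of this thread. The layout of a "lost connection" line is assumed to match, since none is shown here. The helper `summarize_exits` is hypothetical and not part of dim_STAT.

```python
import re
from collections import Counter

# Matches: Exit -> [ip] (reason) CMD= "command"
EXIT_RE = re.compile(
    r'Exit -> \[(?P<ip>[^\]]+)\] \((?P<reason>[^)]+)\) CMD= "(?P<cmd>[^"]*)"'
)

# Meanings as explained in this message.
MEANING = {
    "broken pipe": "exit came from the STAT-service side (output reached EOF)",
    "lost connection": "socket was closed on the dim_STAT server side",
}


def summarize_exits(log_text):
    """Count Exit lines per (command, reason) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group("cmd"), match.group("reason"))] += 1
    return counts


# The two log lines quoted in the first message of this thread.
sample = (
    'STATsrv-5714:5000> Connect -> [10.252.3.10] CMD= "STAT_LIST" '
    'TIME: Tue Jun 17 18:55:21 2014\n'
    'STATsrv-5714:5000> Exit -> [10.252.3.10] (broken pipe) CMD= "STAT_LIST" '
    'TIME: Tue Jun 17 18:55:21 2014\n'
)

for (cmd, reason), n in summarize_exits(sample).items():
    print(cmd, reason, n, MEANING.get(reason, "unknown reason"))
```

A "broken pipe" exit on STAT_LIST is therefore the normal end of the listing, while "lost connection" entries on long-running collectors are the ones worth a closer look.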

Greg Lane

Jun 20, 2014, 12:15:28 PM
to dim...@googlegroups.com
Hi Dimitri,

That clears up a lot then :).  Thanks for the update I will apply it on Monday when I am back in the office.

Thanks,
Greg

Greg Lane

Jun 23, 2014, 8:14:47 AM
to dim...@googlegroups.com
Implemented the new update and started new collections and was able to delete all data and start new collections again with no issues as of yet.
Again thanks for all the help.  If I notice anything else I will surely visit back here.

Greg