We have successfully integrated the tool in our environment as the
primary collection tool. We are now looking into collecting
additional metrics but have run into the first hurdle. We would
appreciate some help with Add-On STATs. Has anyone added them in
version 8.3-1?
I am trying to collect disk usage statistics via the command "df -k".
On the client node, the script has been added to the /etc/STATsrv/bin
directory (just a simple while true, do /usr/sbin/df -k, sleep $1,
done). The /etc/STATsrv/access file has been modified to have the
command added. The script runs fine, with the expected output.
On the server, the Add-On STAT has been added successfully. The
addition of the new STAT had no errors and the STAT is listed on the
"Currently Installed Add-On STAT(s)" as OK. There is no error
reported anywhere.
I am not seeing the STAT name in the Analyze screen or in the
Bookmarks list. How can I get a bookmark for the new STAT? I have
followed all the steps in the User Guide to add the new STAT but still
do not see a way to see the new data. Have I missed something?
Thanks in advance!
Also, I monitor many systems with dim_STAT, and have been collecting
filesystem stats. In order to reduce the footprint of the collector,
I wrote a program to produce and format the df output. This method
keeps the client from creating context switches and processes every
collection cycle because it's only a single executable that's
continually running.
I can provide the source (Solaris) and my Add-On stat description file
if requested.
HTH,
Alan
Thanks very much for the quick response. I have followed your advice:
1. Checked access file entry, nothing seems amiss. Here are the
entries:
# Added command for disk usage (df -k)
command DiskUsage /etc/STATsrv/bin/DiskUsage.sh
2. Checked "Start New Collect(s)" for that node, the STAT DiskUsage
is not showing up there.
3. Rechecked the command, it ran as expected.
For what it's worth, here is the saved description of the add-on stat
(I am still trying to figure out the digits):
# =======================================================================
# DiskUsage: dim_STAT New STAT Description
# =======================================================================
DiskUsage
6
1
DiskUsage Statistic(s)
DiskUsage %i
Filesystem
# =======================================================================
# Column: filesystem (filesystem)
# =======================================================================
filesystem
8
1
filesystem
df -k output Filesystem
0
# =======================================================================
# Column: kbytes (kbytes)
# =======================================================================
kbytes
2
2
kbytes
df -k output kbytes
0
# =======================================================================
# Column: used (used)
# =======================================================================
used
2
3
used
df -k output used
0
# =======================================================================
# Column: avail (avail)
# =======================================================================
avail
2
4
avail
df -k output avail
0
# =======================================================================
# Column: capacity (capacity)
# =======================================================================
capacity
2
5
capacity
df -k output capacity
0
# =======================================================================
# Column: mountpoint (mountpoint)
# =======================================================================
mountpoint
2
6
mountpoint
df -k output Mount on
0
Thank you for the offer of your df program. I would very much like to
have it; I am always interested in a better way of getting things done.
Regards,
Tom
Another silly thing, but you did restart the client STATsrv, right? I
think it picks up the access file when it starts up.
One more thing to try is using netcat (nc) to query the client
manually:
$ printf DiskUsage | nc -w 10 <hostname> 5000
If your add-on doesn't show up, then the problem is client-side, not
server-side.
In any case, here are the sources I referred to:
----------------
dfinfo.c
----------------
/*
** Custom df output for dim_STAT
**
** Alan Impink
** 31-Mar-2008
*/
#include <stdio.h>
#include <stdlib.h>   /* exit(), atoi() */
#include <string.h>   /* strcmp() */
#include <unistd.h>   /* sleep() */
#include <sys/types.h>
#include <sys/mntent.h>
#include <sys/mnttab.h>
#include <sys/statvfs.h>
#include <sys/stat.h>
#include <signal.h>

/*
** Handle caught signals
*/
void sig_handler(int sig)
{
    fprintf(stderr, "Exiting on Signal %d\n", sig);
    exit(sig);
}

/* Convert x blocks of b bytes each to kilobytes */
#define blk2kb(x, b) \
    ((b) < (fsblkcnt64_t)1024 ? \
    (x) / ((fsblkcnt64_t)1024 / (b)) : (x) * ((b) / (fsblkcnt64_t)1024))

void usage(char *prog)
{
    fprintf(stderr, "Usage: %s Timeout\n", prog);
    fprintf(stderr, "\n");
    fprintf(stderr, " where Timeout is number of seconds to sleep\n");
    exit(1);
}

int main(int argc, char *argv[])
{
    int sleeptime;
    FILE *mntp;
    struct mnttab mnt;
    struct stat device_stat, mount_stat;
    struct statvfs fs;
    unsigned long frag_size, blk_size;
    unsigned long blk_total, blk_used, blk_avail, blk_free,
                  blk_reserved, blk_realtotal;
    unsigned long kb_total, kb_used, kb_avail;
    unsigned long in_total, in_used, in_free;
    double kb_pct, in_pct;

    signal(SIGINT, sig_handler);
    signal(SIGTERM, sig_handler);
    signal(SIGPIPE, sig_handler);
    signal(SIGCHLD, sig_handler);

    if (argc != 2) usage(argv[0]);
    sleeptime = atoi(argv[1]);
    if (sleeptime <= 0) usage(argv[0]);

    while (1) {
        if ((mntp = fopen(MNTTAB, "r")) == NULL) {
            perror(MNTTAB);
            sleep(sleeptime);   /* do not spin if mnttab cannot be opened */
            continue;
        }
        while (getmntent(mntp, &mnt) == 0) {
            /* Only UFS filesystems are reported */
            if (strcmp(mnt.mnt_fstype, MNTTYPE_UFS) != 0) {
                continue;
            }
            if (stat(mnt.mnt_mountp, &mount_stat) != 0)
                continue;
            if (stat(mnt.mnt_special, &device_stat) != 0)
                continue;
            /* Report only if the device is really mounted on this mount point */
            if (device_stat.st_rdev == mount_stat.st_dev) {
                if (statvfs(mnt.mnt_mountp, &fs) < 0) {
                    continue;
                }
                frag_size = fs.f_frsize;
                blk_size = fs.f_bsize;
                blk_total = fs.f_blocks;
                blk_used = fs.f_blocks - fs.f_bfree;
                blk_free = fs.f_bfree;
                blk_avail = fs.f_bavail;
                blk_reserved = fs.f_bfree - fs.f_bavail;
                blk_realtotal = blk_total - blk_reserved;
                kb_total = blk2kb(blk_total, frag_size);
                kb_used = blk2kb(blk_used, frag_size);
                kb_avail = blk2kb(blk_avail, frag_size);
                kb_pct = (blk_realtotal == 0 ? 0 : (double) blk_used /
                    (double) blk_realtotal * 100.0);
                in_total = fs.f_files;
                in_free = fs.f_ffree;
                in_used = fs.f_files - fs.f_ffree;
                in_pct = (in_total == 0 ? 0 : (double) in_used /
                    (double) in_total * 100.0);
                /* The counters are unsigned long, so print them with %lu */
                printf("%s %lu %lu %lu %.0f %s %lu %lu %.0f\n",
                    mnt.mnt_special, kb_total, kb_used, kb_avail, kb_pct,
                    mnt.mnt_mountp, in_used, in_free, in_pct);
            }
        }
        fclose(mntp);
        printf("\n");       /* empty line separates the measurement intervals */
        fflush(stdout);     /* push the interval out when stdout is a pipe */
        sleep(sleeptime);
    }
}
------------------
dfinfo.desc
------------------
# =======================================================================
# dfINFO: dim_STAT New STAT Description
# =======================================================================
dfINFO
8
1
dfINFO Statistic(s)
dfINFO %i
# =======================================================================
# Column: filesystem (filesystem)
# =======================================================================
filesystem
24
6
filesystem
filesystem
0
# =======================================================================
# Column: kbytes (kbytes)
# =======================================================================
kbytes
1
2
kbytes
kbytes
0
# =======================================================================
# Column: kbytesused (kbytesused)
# =======================================================================
kbytesused
1
3
kbytesused
kbytes used
0
# =======================================================================
# Column: kbytesavail (kbytesavail)
# =======================================================================
kbytesavail
1
4
kbytesavail
kbytes avail
0
# =======================================================================
# Column: pctused (pctused)
# =======================================================================
pctused
1
5
pctused
% Used
0
# =======================================================================
# Column: inodeused (inodeused)
# =======================================================================
inodeused
1
7
inodeused
iNodes Used
0
# =======================================================================
# Column: inodefree (inodefree)
# =======================================================================
inodefree
1
8
inodefree
iNodes Free
0
# =======================================================================
# Column: pctinodeused (pctinodeused)
# =======================================================================
pctinodeused
2
9
pctinodeused
% iNode Used
0
--Alan
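A rough illustration of the arithmetic behind the blk2kb macro and the
kb_pct line in dfinfo.c, written as plain functions so the two cases are
easier to see. The function names and the use of unsigned long long
instead of fsblkcnt64_t are assumptions made for this sketch; they are
not part of Alan's program.

```c
#include <assert.h>

/* Convert `blocks` units of `unit_size` bytes each into kilobytes,
   mirroring the two branches of the blk2kb macro. */
unsigned long long blocks_to_kb(unsigned long long blocks,
                                unsigned long long unit_size)
{
    assert(unit_size > 0);
    if (unit_size < 1024ULL)
        return blocks / (1024ULL / unit_size);  /* 512-byte units: halve */
    return blocks * (unit_size / 1024ULL);      /* 8192-byte units: times 8 */
}

/* Percent used as a non-root user sees it: the blocks reserved for
   root (free - avail) are taken out of the total first. */
double pct_used(unsigned long long total, unsigned long long free_blocks,
                unsigned long long avail)
{
    unsigned long long used = total - free_blocks;
    unsigned long long usable = total - (free_blocks - avail);

    return usable == 0 ? 0.0 : (double)used / (double)usable * 100.0;
}
```

For example, 100 units of 512 bytes give 50 KB and 100 units of 8192
bytes give 800 KB. The integer division is exact only when the unit
size divides 1024 or is a multiple of it, which holds for the usual
power-of-two sizes.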
Thank you for the source code. I will have to look into it and see
about implementing it. Regarding netcat, I will have to search for an
x86 version for Solaris 10 since that is what our test node is
running. I still don't know where the problem lies. The access log
on the client side in /etc/STATsrv/log contains entries for
"DiskUsage" as well as "DiskUse" after I shortened the STAT name. I
also checked the start script on the server side at /apps/client and
the entries DiskUsage (and DiskUse) are in the script as well. So for
the time being I am using a shell script to collect what we need.
Thank you for your help. I hope to post findings for a solution soon.
Regards,
Tom
P.S. The service was cycled after each change although the User Guide
states that the service checks the access file "all the time" so there
is no need to restart it.
It was a big pleasure to follow your discussion, and I can only invite
everybody to bring more exchanges of this kind and to share your
tips and experience!! :-))
A few comments for Alan first:
- one of the advantages of the google groups is that you're able to
add files and create your own pages here :-) - then it'll be much
easier for others to download such files directly rather than
copy&paste from emails :-)
- your solution for "df -k" monitoring is great!! - and I'm pretty
sure it will interest other users too! (as well, if it's ok for you -
we may add it to the Solaris STAT-service "officially" :-))
- a small remark on the source code: you cannot guarantee a truly
regular output with a fixed sleep().. I mean, between sleep() calls you
still have other instructions to execute and they will take some time
(small, but not zero :-)) - so after several iterations you'll
probably be 1 sec late.. Then the shift between the number of printed
measurements and the elapsed time grows more and more..
What I usually do to avoid it is replace:
...
while(1)
{
...
sleep( timeout );
}
...
by:
time_t timer, now;
...
time( &timer );
timer += timeout;
...
while( 1 )
{
...
now = time( NULL );
if( timer > now ) sleep( timer - now );
timer += timeout;
}
...
in this case your sleep() will be auto-adapted to the time spent
within your stats instructions :-)
NOTE: there is one negative thing - if somebody starts to play with
the system time on your server, the output regularity will be
broken for sure (and many other system or application programs may get
things wrong as well).. - So the main rule: if you have to change the
system time on your server - stop your STAT-service first! :-))
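For reference, the deadline idea above can be sketched as a small
self-contained C fragment. The names seconds_until and run_periodic are
made up for this illustration and are not part of dim_STAT or dfinfo.c;
the clamp to zero is an added assumption so that a late iteration never
hands a negative value to sleep().

```c
#include <assert.h>
#include <time.h>
#include <unistd.h>

/* Seconds left until `deadline`, clamped to 0 when we are already late.
   sleep() takes an unsigned int, so a negative difference would
   otherwise turn into a very long sleep. */
unsigned int seconds_until(time_t deadline, time_t now)
{
    return (deadline > now) ? (unsigned int)(deadline - now) : 0U;
}

/* Hypothetical helper: call collect() `count` times, once per `period`
   seconds. The deadline advances by a fixed step, so the time spent
   inside collect() does not accumulate as drift. */
void run_periodic(unsigned int period, int count, void (*collect)(void))
{
    time_t deadline;
    int i;

    assert(period > 0);
    deadline = time(NULL) + (time_t)period;
    for (i = 0; i < count; i++) {
        collect();
        sleep(seconds_until(deadline, time(NULL)));
        deadline += (time_t)period;
    }
}
```

With run_periodic(5, 3, collect), three measurements come out roughly
five seconds apart even if collect() itself takes a second. If collect()
takes longer than the period, the loop runs back to back until it has
caught up with the deadline.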
Now, before moving to Tom's problem, a few general things:
- it's true the name of Add-Ons was limited to 8 characters in old
versions, but at least since v.8.3 it's 16 characters now :-)) (as
well you may always check it via MySQL by "desc dim_anySTAT;" table)
- the "access" file in STAT-service is verified for changes on every
iteration, so no need to restart it if you modify entries in the
"access" file - it's true
- then a few undocumented tips (otherwise what is the advantage of the
user group over documentation? :-))
The communication between STAT-service and dim_STAT goes through the
"STATcmd" command. For debug purposes it's shipped with every
STAT-service package (and you may find it under /etc/STATsrv/bin/); it's
also installed on your dim_STAT server under WebX home/bin (usually
/opt/WebX/bin)
The usage of this command is very simple:
# /etc/STATsrv/bin/STATcmd
STATcmd v.2.4 (dim)
Usage: STATcmd [-k] [-t numb] [-b nsec] -h host [-p port] -c command
-b nsec -- break connection after nsec-seconds of inactivity
-t numb -- try to connect only numb times (default: 0 (unlimited))
-p port -- STAT-service port number (default: 5000)
-k -- keep alive
And if you're on Solaris and your STAT-service is up and running,
by typing:
# /etc/STATsrv/bin/STATcmd -h localhost -c pipo
you should get an error message about BAD COMMAND "pipo" :-))
then:
# /etc/STATsrv/bin/STATcmd -h localhost -c vmstat
it'll print the first lines of "vmstat" and stop (same as when you start
"vmstat" without arguments)
then:
# /etc/STATsrv/bin/STATcmd -h localhost -c "vmstat 5"
you'll get a regular "vmstat" output every 5 sec, and you may stop it
by Ctrl+C
And there is a special "reserved" command: STAT_LIST - which prints
the list of all available stat commands published by the STAT-service
on your machine..
For ex. on my Linux laptop:
# /etc/STATsrv/bin/STATcmd -h localhost -c STAT_LIST
STAT *** OK CONNECTION 0 sec.
STAT *** LIST COMMAND (STAT_LIST)
STAT: Lvmstat
STAT: Lmpstat
STAT: tailX
STAT: LioSTAT
STAT: LpsSTAT
STAT: LPrcLOAD
STAT: LUsrLOAD
STAT: LnetLOAD
STAT: LcpuSTAT
STAT: sysinfo
STAT: SysINFO
STAT: IObench
STAT: dbSTRESS
STAT *** LIST END (STAT_LIST)
And now let's take Tom's problem as an example for debugging.
Tom, let's check first:
1. # /etc/STATsrv/bin/STATcmd -h localhost -c STAT_LIST ==> shows
DiskUsage in the list
2. # /etc/STATsrv/bin/STATcmd -h localhost -c "DiskUsage 5" ==>
prints your output every 5 sec
3. you see the same output when you execute "STATcmd" from your
dim_STAT server
(/opt/WebX/bin/STATcmd -h hostname -c ... )
and if all this stuff works correctly - then let's check why the
checkbox for "DiskUsage" is not appearing when you're going to start a
new collect...
Rgds,
-Dimitri
Thank you for your advice. The additional tips are very helpful as
well as the troubleshooting steps. I have uploaded the outputs of the
STATcmd in a zip file. I have also included screen shots of the New
Collect page as well as the Analyzer page.
One clarification: The DiskUse stat checkbox is available in the New
Collect page (and it is checked for the new collect). However, this
DiskUse stat is not available on the Analyzer page (it is not present
at the bottom of the page, and it is not in the STAT Status column).
Also, you may notice that the output of df -k is not the default
output. I shifted the output columns to list the mount point first
(by using /usr/sbin/df -k |awk '{ print $6," ",$2," ",$3," ",$4," ",
$5 }').
I look forward to your input and I also hope to see Alan's "df -k"
solution in the next release of dim_STAT!
Thanks again,
Tom
> One clarification: The DiskUse stat checkbox is available in the New
> Collect page (and it is checked for the new collect).
that's much better so far! :-))
> However, this
> DiskUse stat is not available on the Analyzer page (it is not present
> at the bottom of the page, and it is not in the STAT Status column).
it's another story now :-)
>
> Also, you may notice that the output of df -k is not the default
> output. I shifted the output columns to list the mount point first
> (by using /usr/sbin/df -k |awk '{ print $6," ",$2," ",$3," ",$4," ",
> $5 }').
you don't really need it because when you describe your Add-On you may
always change the order of columns :-) - you may say that the first
value will be "MountPoint" and it's at the 6th position (column) on the
output line :-)
But well, looking at the output examples of your command I think what
you've missed is the *separator* between your outputs! :-) The
easiest way to fix it - simply print an empty line at the end of
"df -k" (echo "") - it'll match the default setting
and I think you'll see your data collected :-)
Then be sure you've filled in the pattern for lines to ignore (ex:
*Filesystem* ) - otherwise you'll get an error on every header
line :-)
>
> I look forward to your input and I also hope to see Alan's "df -k"
> solution in the next release of dim_STAT!
I'll need an agreement from Alan first, and then the source code
modified including Alan's name/copyright and GNU license headers, so
it's up to Alan :-)
Rgds,
-Dimitri
All I can say is that I am thrilled - everything is working perfectly!
There's two important things that I learned with this case:
1. A new STAT will not be available on the Analyzer page until there
is data. (Thanks again to Alan Impink.)
2. There has to be a separator between each data interval. (That was
a good catch from my screen shot!)
Now I can go look at my newly collected data.
Thanks so much for all your help!
Tom
> All I can say is that I am thrilled - everything is working perfectly!
>
great! :-)
> There's two important things that I learned with this case:
>
> 1. A new STAT will not be available on the Analyzer page until there
> is data. (Thanks again to Alan Impink.)
BTW, why do you want to see the button for a new STAT if there is no
data in your collects?? :-) for ex. if you don't collect IOSTAT -
you won't see it either :-) - it's normal :-) if you don't have
any data - you don't have any data :-)
and another thing - the first measurement is always skipped! (as in
many system stats it usually contains some global avg numbers, which
may sometimes be wrong)
> 2. There has to be a separator between each data interval. (That was
> a good catch from my screen shot!)
the default separator is the new line (empty line), however you may
use a pattern for separators - for ex. in your case it was possible to
create a pattern on the headers line and it will be also ok - so it's
up to everyone which solution to choose :-)) but that's why there is
the mail-list and the user group to leave nothing unclear :-))
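To make the rules from this thread concrete (an empty line closes an
interval, header lines matching the ignore pattern are dropped, the
first measurement is skipped), here is a toy model in C. It is only an
assumption about how a consumer of the stream might behave, not
dim_STAT's actual parser; classify_line and count_stored_lines are
hypothetical names, and the ignore pattern is simplified to a plain
substring.

```c
#include <assert.h>
#include <string.h>

enum line_kind { LINE_SEPARATOR, LINE_IGNORED, LINE_DATA };

/* Classify one line of collector output.
   An empty line ends an interval; a line containing `ignore`
   (for example the df header word "Filesystem") is dropped. */
enum line_kind classify_line(const char *line, const char *ignore)
{
    if (line[0] == '\0' || line[0] == '\n')
        return LINE_SEPARATOR;
    if (ignore != NULL && ignore[0] != '\0' && strstr(line, ignore) != NULL)
        return LINE_IGNORED;
    return LINE_DATA;
}

/* Count the data lines this model would store from `n` lines of output.
   The first closed interval is skipped, and a trailing interval that
   was never closed by a separator is not counted at all. */
int count_stored_lines(const char *lines[], int n, const char *ignore)
{
    int intervals_closed = 0, pending = 0, stored = 0, i;

    for (i = 0; i < n; i++) {
        switch (classify_line(lines[i], ignore)) {
        case LINE_SEPARATOR:
            if (intervals_closed > 0)
                stored += pending;  /* keep every interval but the first */
            intervals_closed++;
            pending = 0;
            break;
        case LINE_DATA:
            pending++;
            break;
        case LINE_IGNORED:
            break;
        }
    }
    return stored;
}
```

Fed two df intervals, this model stores nothing when the empty lines are
missing and stores the second interval once they are present, which is
consistent with the symptom Tom described.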
Rgds,
-Dimitri
> --
> You received this message because you are subscribed to the Google Groups
> "dim_STAT" group.
> To post to this group, send email to dim...@googlegroups.com.
> To unsubscribe from this group, send email to
> dimstat+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/dimstat?hl=en.
>
>
I expected to see the button for the new STAT because the new STAT was
created successfully on the server AND I was expecting to see data. I
did not realize that the data had not been processed due to the
missing separator. Now that I know about STATcmd, I will use it to
verify collections for each new STAT.
dim_STAT is more useful than I thought and it has become a very
important tool for me. Keep up the great work!
Thank you,
Tom