Hi
I decided to use LMT version 2.6.3 to monitor my Lustre cluster.
Following the manual at http://code.google.com/p/lmt/, I set up the server, agents, and client.
The system runs cerebro-1.11 on CentOS 5.4.
I will illustrate my problem with the output of the following commands.
On the server (which runs on the MGS/MDS machine):
MGS-MDS # cerebro-stat -l | grep lmt
lmt_mds
MGS-MDS # cerebro-stat -m cluster_nodes
The same results are shown on the client machine, too.
On the OSS agent server (which has two OSTs):
OSS # cerebro-stat -l | grep lmt
lmt_oss
lmt_ost
OSS # cerebro-stat -m cluster_nodes
The results for the OSS are shown when I set the following in cerebro.conf:
cerebrod_listen on
However, when I use "cerebrod_speak on" without "cerebrod_listen on", the commands above return nothing.
I think there is a problem with the network configuration between the OSS and the LMT server.
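For anyone who wants to compare the speak/listen settings between nodes, something like the following could be run on each machine. It is only a sketch: the function name show_cerebrod_roles is made up for illustration, and the path /etc/cerebro.conf is an assumption about where the configuration file lives.

```shell
#!/bin/sh
# Sketch: list the cerebrod_speak / cerebrod_listen lines that are
# active (not commented out) in a cerebro configuration file.
# show_cerebrod_roles is a hypothetical helper, not part of cerebro.
show_cerebrod_roles() {
    grep -E '^[[:space:]]*cerebrod_(speak|listen)[[:space:]]' "$1"
}

# Assumption: the configuration lives at /etc/cerebro.conf.
# show_cerebrod_roles /etc/cerebro.conf
```

Commented-out lines and the longer cerebrod_speak_message_config style directives are deliberately not matched, so only the two on/off switches are printed.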
Both mysqld and cerebrod are running on all machines. The database also lists all components, but it only contains data in MDS_DATA; there is no data in OSS_DATA or OST_DATA. (The other tables show appropriate output.)
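The empty tables can be confirmed with a row count per table. The snippet below is a rough sketch, not something taken from the LMT manual: count_rows is a hypothetical helper, and it assumes the database is called filesystem_lustre and that the lwatchclient user can read it without a password.

```shell
#!/bin/sh
# Sketch: print the row count of each LMT raw-data table.
# Assumptions: read-only user "lwatchclient" with no password;
# adjust the user and database name to the local setup.
count_rows() {
    db="$1"
    shift
    for table in "$@"; do
        # -N drops the column header, -B gives plain batch output
        n=$(mysql -N -B -u lwatchclient -e "SELECT COUNT(*) FROM $table" "$db")
        echo "$table: $n"
    done
}

# Usage (database name is an assumption):
# count_rows filesystem_lustre MDS_DATA OSS_DATA OST_DATA
```

A count of 0 for OSS_DATA and OST_DATA next to a non-zero count for MDS_DATA would match what I am seeing.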
The following is the output produced by running the cron job:
MGS-MDS # sh -x lmt_agg.cron
#####################
# OST - filesystem_lustre
#####################
Updating hourly ost agg table for filesystem_lustre
Updating OST_AGGREGATE_HOUR for filesystem_lustre...
Determining starting and ending points in various tables...
Final timestamp from OST_AGGREGATE_HOUR: 2006-01-01 00:00:00, id=0
Final timestamp from OST_DATA (raw): , id=
First timestamp from OST_DATA: No matching timestamp in raw data
Updating other ost agg tables for filesystem_lustre
Updating aggregate tables from OST_AGGREGATE_HOUR for filesystem_lustre...
First timestamp to use from hourly data: ()
Final timestamp to use from hourly data: ()
Updating OST_AGGREGATE_DAY
bad date '' at /usr/lib/perl5/vendor_perl/5.8.8/Date/Manip.pm line 4436.
Updating filesys-level ost tables for filesystem_lustre
Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_HOUR for filesystem_lustre...
Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_DAY for filesystem_lustre...
Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_WEEK for filesystem_lustre...
Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_MONTH for filesystem_lustre...
Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_YEAR for filesystem_lustre...
Another relevant log is /var/log/messages. When I run "service cerebrod restart" on the OSS machine, the following lines appear in the messages log:
Sep 13 16:57:53 OSS /usr/sbin/cerebrod[5418]: lmt_monitor_get_filesystem_info(): problem accessing file /usr/share/lmt/cron/lmtrc
Sep 13 16:57:53 OSS /usr/sbin/cerebrod[5418]: lmt_monitor_setup(): problem getting filesystem info
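To tell whether the file named in that message is missing or merely unreadable, a small check like this could be run on the OSS as the same user that runs cerebrod. It is a sketch only; check_readable is a hypothetical helper name, and the path is copied from the log line above.

```shell
#!/bin/sh
# Sketch: report whether a file exists and is readable by the current user.
# check_readable is a hypothetical helper, not part of LMT or cerebro.
check_readable() {
    if [ -r "$1" ]; then
        echo "readable: $1"
    elif [ -e "$1" ]; then
        echo "exists but not readable: $1"
    else
        echo "missing: $1"
    fi
}

# Path taken from the cerebrod log message above.
check_readable /usr/share/lmt/cron/lmtrc
```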
Should I create an lmtrc file on the OSS machine? I think this error is due to setting "cerebrod_listen on" in cerebro.conf. Am I right? If so, what should the dbuser be: lwatchadmin or lwatchclient?
Does anyone know what I should do? I would really appreciate any reply :(
Regards
kian