LMT agent on OSS system cannot send information to LMT server

tay kian

Sep 14, 2010, 1:14:07 AM

to lmt-d...@googlegroups.com

Hi

I decided to use LMT version 2.6.3 to monitor my Lustre cluster.

Following the manual at http://code.google.com/p/lmt/, I set up the server, agents and client.

The systems run cerebro-1.11 on CentOS 5.4.

I will describe my problem by showing the results of the following commands.

On the server (which runs on the MGS/MDS machine):

MGS-MDS # cerebro-stat -l | grep lmt

lmt_mds

MGS-MDS # cerebro-stat -m cluster_nodes

MGS-MDS.l.com

Client.l.com

The same results are shown on the client machine, too.

On the OSS agent server (which has two OSTs):

OSS # cerebro-stat -l | grep lmt

lmt_oss

lmt_ost

OSS # cerebro-stat -m cluster_nodes

OSS.l.com

The results above for the OSS appear when I set the following in cerebro.conf:

cerebrod_listen on

When I use “cerebrod_speak on” without “cerebrod_listen on”, the commands above return nothing.

I think there is a problem with the network configuration between the OSS and the LMT server.

Both mysqld and cerebrod are running on all machines. The database lists all components, but it only contains data in MDS_DATA; there is no data in OSS_DATA or OST_DATA. (The other tables show appropriate output.)

The following output is produced when I run the cron job:

MGS-MDS # sh -x lmt_agg.cron

#####################

# OST - filesystem_lustre

#####################

Updating hourly ost agg table for filesystem_lustre

Updating OST_AGGREGATE_HOUR for filesystem_lustre...

Determining starting and ending points in various tables...

Final timestamp from OST_AGGREGATE_HOUR: 2006-01-01 00:00:00, id=0

Final timestamp from OST_DATA (raw): , id=

First timestamp from OST_DATA: No matching timestamp in raw data

Updating other ost agg tables for filesystem_lustre

Updating aggregate tables from OST_AGGREGATE_HOUR for filesystem_lustre...

First timestamp to use from hourly data: ()

Final timestamp to use from hourly data: ()

Updating OST_AGGREGATE_DAY

bad date '' at /usr/lib/perl5/vendor_perl/5.8.8/Date/Manip.pm line 4436.

Updating filesys-level ost tables for filesystem_lustre

Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_HOUR for filesystem_lustre...

Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_DAY for filesystem_lustre...

Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_WEEK for filesystem_lustre...

Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_MONTH for filesystem_lustre...

Updating FILESYSTEM_AGGREGATE from OST_AGGREGATE_YEAR for filesystem_lustre...

Another log is /var/log/messages. When I run “service cerebrod restart” on the OSS machine, the following lines appear in the messages log file:

Sep 13 16:57:53 OSS /usr/sbin/cerebrod[5418]: lmt_monitor_get_filesystem_info(): problem accessing file /usr/share/lmt/cron/lmtrc

Sep 13 16:57:53 OSS /usr/sbin/cerebrod[5418]: lmt_monitor_setup(): problem getting filesystem info

Should I create an lmtrc file on the OSS machine? I think this error is caused by setting “cerebrod_listen on” in cerebro.conf. Am I right? If so, what should the dbuser be: lwatchadmin or lwatchclient?

Does anyone know what I should do? I'm really waiting for any reply :(

Regards

kian

Jim Garlick

Sep 15, 2010, 1:59:15 PM

to lmt-discuss

Hi Kian,

Sounds like cerebro is not communicating. You may need to force
cerebro to use a network that all the servers have in common. Also,
the network switches the servers are connected to must support
multicast. All lustre servers should be configured to speak, and the
lmt server at least (where mysql collects your data) must listen.
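As a rough sketch of that speak/listen split, using only the two directives already quoted in this thread (the per-host comments are an illustration, not a complete cerebro.conf, and any network or interface selection is left out because it depends on your setup):

```
# /etc/cerebro.conf on every Lustre server (MDS, OSS): send metrics
cerebrod_speak on

# /etc/cerebro.conf on the LMT server (the host where mysql collects the data):
# receive metrics from the other servers
cerebrod_listen on
```

If the LMT server is itself a Lustre server, as with the MGS/MDS host here, it would presumably need both lines.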

On the lmt server you should see all the lmt metrics listed with
cerebro-stat -l (e.g. lmt_ost, lmt_oss, lmt_mds).

You may need to restart cerebrod after changing /etc/cerebro.conf.
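A small shell sketch of that check, to run on the lmt server after restarting cerebrod. The helper name missing_lmt_metrics is made up for illustration, and it assumes cerebro-stat -l prints one metric name per line, as in the output quoted in this thread:

```shell
# Hypothetical helper: read metric names on stdin (one per line) and
# print each expected LMT metric that is NOT present.
missing_lmt_metrics() {
    input=$(cat)
    for metric in lmt_mds lmt_oss lmt_ost; do
        # grep -x matches whole lines only, so "lmt_ost" does not match "lmt_ost2"
        echo "$input" | grep -x "$metric" >/dev/null || echo "$metric"
    done
}

# Usage on the lmt server (commented out so the sketch has no side effects):
#   service cerebrod restart
#   cerebro-stat -l | missing_lmt_metrics
```

No output means all three metrics are visible; any name that is printed is a metric the lmt server is not receiving.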

Jim

On Sep 13, 10:14 pm, tay kian <kianpi...@gmail.com> wrote:
> Hi
>
> I decide to use LMT version 2.6.3 to monitor my Lustre cluster.
>

> Depend on manual in http://code.google.com/p/lmt/, I setup server, agents
