ltop not working

Andreas Davour

Feb 10, 2011, 8:16:37 AM
to lmt-d...@googlegroups.com

Hi

I have installed lmt-server on my monitoring host, and lmt-server-agent on
my Lustre servers. Still, something is odd with ltop. I have verified that data
collection works as intended, as I can see the MySQL tables getting populated. But
ltop shows this error:
$ ltop
MODULE DIR = /usr/lib64/cerebro
MODULE DIR = /usr/lib64/cerebro
ltop: no data found for file system `cfs-l'

Following the installation instructions on the googlecode page for "Get ltop
Working" I have:

1. Installed cerebro on the Lustre servers and the management node
2. Installed lmt-server on the management node
3. Installed lmt-server-agent on the servers
4. Confirmed that I can get live data with lmtmetric -m ost|mdt|osc
5. Restarted cerebrod
6. Confirmed that I can still get live data with lmtmetric -m ost|mdt|osc
7. Tried ltop and got the error mentioned above

Is there any step I have missed, or some way to manually verify the
steps ltop tries to do? Using -f changes nothing.

BTW, the "usage" message for ltop is not very useful. It's very terse.

/andreas
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"

Jim Garlick

Feb 10, 2011, 9:11:15 AM
to lmt-d...@googlegroups.com
Your procedure sounds OK.

ltop is probably seeing MDS data but not OST data, since it reports the file
system name. When you look at the database, do you see OST data being updated?

Also: ltop ignores data older than 12s. Maybe a time sync problem?
Try ltop -s 60 (or some number >12) and see if data starts appearing?
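
The staleness rule described above can be sketched in a few lines. This is only an illustration in Python (ltop itself is written in C), and the function and parameter names are hypothetical. It assumes each sample carries a timestamp and that the value given to -s plays the role of `stale_secs`:

```python
import time

DEFAULT_STALE_SECS = 12  # threshold mentioned in this thread


def is_fresh(sample_time, now=None, stale_secs=DEFAULT_STALE_SECS):
    """Return True if a sample is no older than stale_secs seconds.

    sample_time and now are seconds since the epoch. The names here are
    illustrative and are not taken from the ltop source.
    """
    if now is None:
        now = time.time()
    return (now - sample_time) <= stale_secs


if __name__ == "__main__":
    now = 1000.0
    # A sample stamped by a server whose clock runs 30 s behind looks 30 s old.
    skewed_sample = now - 30
    print(is_fresh(skewed_sample, now=now))                 # default threshold
    print(is_fresh(skewed_sample, now=now, stale_secs=60))  # relaxed threshold
```

If raising the threshold makes data appear, clock skew between the servers and the monitoring host is a likely cause.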

Agreed the Usage message is useless. I'll fix that (easy one!)

Jim

Andreas Davour

Feb 10, 2011, 10:00:17 AM
to lmt-d...@googlegroups.com, Jim Garlick
On Thursday, February 10, 2011 15:11:15 Jim Garlick wrote:
> Your procedure sounds OK.

Check!



> ltop is probably seeing mds data but not OST since it reports the file
> system name. When you look at the database do you see OST data being
> updated?

I guess there are no debug flags available? (Yeah, I know I can look at the
source, but I'm not fluent in C, sorry...)

lmtsh -f cfs-l
cfs-l> t
Available tables for cfs-l:
Table Name                    Row Count
EVENT_DATA                    0
EVENT_INFO                    0
FILESYSTEM_AGGREGATE_DAY      9
FILESYSTEM_AGGREGATE_HOUR     45
FILESYSTEM_AGGREGATE_MONTH    9
FILESYSTEM_AGGREGATE_WEEK     9
FILESYSTEM_AGGREGATE_YEAR     9
FILESYSTEM_INFO               1
MDS_AGGREGATE_DAY             5
MDS_AGGREGATE_HOUR            25
MDS_AGGREGATE_MONTH           5
MDS_AGGREGATE_WEEK            5
MDS_AGGREGATE_YEAR            5
MDS_DATA                      3047
MDS_INFO                      1
MDS_OPS_DATA                  63987
MDS_VARIABLE_INFO             7
OPERATION_INFO                81
OSS_DATA                      6087
OSS_INFO                      2
OSS_INTERFACE_DATA            0
OSS_INTERFACE_INFO            0
OSS_VARIABLE_INFO             7
OST_AGGREGATE_DAY             54
OST_AGGREGATE_HOUR            270
OST_AGGREGATE_MONTH           54
OST_AGGREGATE_WEEK            54
OST_AGGREGATE_YEAR            54
OST_DATA                      18261

Looks like MDS, OSS and OST data is arriving in the DB all right, and those
row counts keep increasing, so it's live data.
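
Checking that the counts really grow can be done by saving the table listing twice and comparing. Below is a rough Python sketch of that comparison. It assumes each data line of the listing is a table name followed by an integer row count, as in the output above; the function names are made up for this example and are not part of LMT.

```python
def parse_row_counts(text):
    """Extract {table_name: row_count} from a saved table listing.

    Only lines of the form '<NAME> <integer>' are kept, so prompts and
    header lines are skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def growing_tables(before, after):
    """Return the sorted names of tables whose row count increased."""
    return sorted(
        name for name, count in after.items() if count > before.get(name, 0)
    )


if __name__ == "__main__":
    first = "Table Name Row Count\nMDS_DATA 3047\nOST_DATA 18261\nEVENT_DATA 0\n"
    second = "Table Name Row Count\nMDS_DATA 3050\nOST_DATA 18270\nEVENT_DATA 0\n"
    print(growing_tables(parse_row_counts(first), parse_row_counts(second)))
```

Tables that never appear in the result between two snapshots taken a minute apart are the ones worth a closer look.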



> Also: ltop ignores data older than 12s. Maybe a time sync problem?
> Try ltop -s 60 (or some number >12) and see if data starts appearing?

No change after trying ltop -s 60, sadly.

All machines are using Kerberos, and thus highly dependent on synced clocks. I
have confirmed that NTP is running and the times are all synced. Time shouldn't
be an issue.

> Agreed the Usage message is useless. I'll fix that (easy one!)

One step at a time towards perfection!

Jim Garlick

Feb 10, 2011, 11:39:32 AM
to lmt-d...@googlegroups.com, Jim Garlick
Couple of other things to check:

- has cerebrod been restarted on all Lustre servers since the update?
(old data format should still populate db and work with lwatch/lstat,
but not with ltop)

- on the MDS, does lmtmetric -m osc show data?
(I seem to recall a past debugging session where the proc files held
content we had not anticipated)

Jim

Andreas Davour

Feb 10, 2011, 11:52:20 AM
to lmt-d...@googlegroups.com, Jim Garlick, Jim Garlick
On Thursday, February 10, 2011 17:39:32 Jim Garlick wrote:
> Couple of other things to check:
>
> - has cerebrod been restarted on all lustre servers since the update?
> (old data format should still populate db and work with lwatch/lstat,
> but not with ltop)

Yes, multiple times.



> - on the mds, does lmtmetric -m osc show data?
> (I seem to recall there being some content in proc files not anticipated
> in some past debugging session)

It does show something

# lmtmetric -m osc
osc: 1;cfs-mds-mgs-l.pdc.kth.se;cfs-l-OST0000;F;cfs-l-OST0001;F;cfs-l-OST0002;F;cfs-l-OST0003;F;cfs-l-OST0004;F;cfs-l-OST0005;F

But "lmtmetric -m ost" doesn't.

Jim Garlick

Feb 10, 2011, 12:52:57 PM
to Andreas Davour, lmt-d...@googlegroups.com, Jim Garlick
I am looking through the code for another place where the hyphen in your file
system name causes a problem. Meanwhile, I've opened issue 50 to track this:

http://code.google.com/p/lmt/issues/detail?id=50

If you have a moment would you mind adding to that issue the output of:

on your MDS: lmtmetric -m mdt
on an OSS: lmtmetric -m ost

(BTW: it is expected that -m ost would not produce output on the MDS.
The osc metric represents the MDS's OST client state, i.e. the MDS's
view of the OST state. The ost metric represents OST server state.)
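
To make the hyphen concern concrete, here is a small Python sketch (not ltop's C code) that parses the osc line posted earlier in this thread and recovers the file system name from each OST target name. The field layout (a version, the MDS host name, then target/state pairs) is inferred from that single sample rather than from LMT documentation, and the function names are hypothetical. The thread does not say where the actual bug was; the point is only that splitting a target name at its first hyphen gives `cfs` instead of `cfs-l`.

```python
SAMPLE = (
    "osc: 1;cfs-mds-mgs-l.pdc.kth.se;cfs-l-OST0000;F;cfs-l-OST0001;F;"
    "cfs-l-OST0002;F;cfs-l-OST0003;F;cfs-l-OST0004;F;cfs-l-OST0005;F"
)


def parse_osc(line):
    """Split an osc metric line into (version, mds_host, [(target, state)]).

    The layout is an assumption based on the one sample in this thread.
    """
    body = line.split(":", 1)[1].strip()
    fields = body.split(";")
    version, mds_host, rest = fields[0], fields[1], fields[2:]
    if len(rest) % 2 != 0:
        raise ValueError("expected target/state pairs")
    return version, mds_host, list(zip(rest[0::2], rest[1::2]))


def fsname_of(target):
    """Return the file system part of a target name like 'cfs-l-OST0000'.

    rpartition splits at the LAST hyphen, so hyphens inside the file
    system name are preserved.
    """
    fsname, sep, _ = target.rpartition("-")
    if not sep:
        raise ValueError("no hyphen in target name: %r" % target)
    return fsname


if __name__ == "__main__":
    _, _, targets = parse_osc(SAMPLE)
    for target, state in targets:
        print(fsname_of(target), target, state)
```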

Thanks,

Jim

Jim Garlick

Feb 10, 2011, 1:02:06 PM
to Andreas Davour, lmt-d...@googlegroups.com, Jim Garlick
Oh never mind, I found a really obvious one in ltop.
Let me give you another tarball (this time I will not spam the list with it!)
Jim

Andreas Davour

Feb 11, 2011, 6:34:31 AM
to lmt-d...@googlegroups.com
On Thursday, February 10, 2011 07:02:06 pm Jim Garlick wrote:
> Oh never mind, I found a really obvious one in ltop.
> Let me give you another tarball (this time I will not spam the list with
> it!) Jim

Now ltop works just fine!

Thanks!

/andreas

Jim Garlick

Feb 11, 2011, 12:49:31 PM
to lmt-d...@googlegroups.com
Great, I've released lmt-3.1.2 on the Google Code site:

http://code.google.com/p/lmt/

Cheers,

Jim

========================================================================
Release Notes for LMT 3.1.2 11 Feb 2011
========================================================================

* Issue 48: lmtinit cannot handle hyphen in database name

* Issue 50: ltop not working with hyphenated file system name

* ltop: added options description to Usage message.
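
The notes do not say how Issue 48 was fixed. As general background, MySQL accepts a hyphen in an identifier such as a database name only when the identifier is quoted with backticks. A minimal Python sketch of that quoting follows; the helper and the database name are hypothetical and this is not the lmtinit code.

```python
def quote_mysql_identifier(name):
    """Wrap a MySQL identifier in backticks, doubling embedded backticks."""
    if not name or "\x00" in name:
        raise ValueError("invalid identifier: %r" % name)
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    db_name = "example_cfs-l"  # made-up name containing a hyphen
    print("CREATE DATABASE " + quote_mysql_identifier(db_name))
```

Unquoted, the hyphen in such a name would be read as a minus sign and the statement would fail to parse.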
