I have installed lmt-server on my monitoring host, and lmt-server-agent on
my Lustre servers. Still, something is odd with ltop. I have verified that data
collection works as intended, since I can see the MySQL tables getting populated.
But ltop shows this error:
$ ltop
MODULE DIR = /usr/lib64/cerebro
MODULE DIR = /usr/lib64/cerebro
ltop: no data found for file system `cfs-l'
Following the installation instructions on the googlecode page for "Get ltop
Working" I have:
1. installed cerebro on the Lustre servers and the management node
2. installed lmt-server on the management node
3. installed lmt-server-agent on the servers
4. Confirmed that I can get live data with lmtmetric -m ost|mdt|osc
5. Restarted cerebrod
6. Confirmed again that I can get live data with lmtmetric -m ost|mdt|osc
7. Tried ltop and got the error mentioned above
Is there any step I have missed, or some way to manually verify the
steps ltop goes through? Using -f changes nothing.
BTW, the "usage" message for ltop is not very useful. It's very terse.
/andreas
--
Systems Engineer
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
Phone: 087906658
"A satellite, an earring, and a dust bunny are what made America great!"
ltop is probably seeing MDS data but not OST data, since it reports the file
system name. When you look at the database, do you see OST data being updated?
Also: ltop ignores data older than 12 s. Maybe a time sync problem?
Try ltop -s 60 (or some number > 12) and see if data starts appearing.
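The 12 s rule above amounts to a freshness filter: a sample is displayed only if its timestamp falls within a window of the current time, and -s 60 widens that window. A minimal C sketch of such a filter, assuming that reading of the behaviour; STALE_SECS_DEFAULT, sample_is_fresh and count_fresh are hypothetical names, not taken from the ltop source:

```c
#include <stddef.h>
#include <time.h>

/* Default window from the thread ("ltop ignores data older than 12s").
 * Hypothetical name; not taken from the ltop source. */
#define STALE_SECS_DEFAULT 12

/* Return 1 if a sample stamped sample_time may be displayed at time now
 * under a window of stale_secs seconds, else 0. A sample stamped in the
 * future (negative age, e.g. a sender clock running ahead) counts as fresh. */
int sample_is_fresh(time_t sample_time, time_t now, int stale_secs)
{
    double age = difftime(now, sample_time);
    return age <= (double)stale_secs;
}

/* Count how many of n sample timestamps pass the freshness filter.
 * Zero means a display built on this filter has nothing to show. */
size_t count_fresh(const time_t *stamps, size_t n, time_t now, int stale_secs)
{
    size_t fresh = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (sample_is_fresh(stamps[i], now, stale_secs))
            fresh++;
    return fresh;
}
```

For a sample 30 s old, sample_is_fresh returns 0 under the 12 s default and 1 with a 60 s window; if count_fresh is zero for every target, a filter like this leaves nothing to display.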
Agreed the Usage message is useless. I'll fix that (easy one!)
Jim
> --
> You received this message because you are subscribed to the Google Groups "lmt-discuss" group.
> To post to this group, send email to lmt-d...@googlegroups.com.
> To unsubscribe from this group, send email to lmt-discuss...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lmt-discuss?hl=en.
Check!
> ltop is probably seeing mds data but not OST since it reports the file
> system name. When you look at the database do you see OST data being
> updated?
I guess there are no debug flags available? (Yeah, I know I can look at the
source, but I'm not fluent in C, sorry...)
lmtsh -f cfs-l
cfs-l> t
Available tables for cfs-l:
Table Name                  Row Count
EVENT_DATA                  0
EVENT_INFO                  0
FILESYSTEM_AGGREGATE_DAY    9
FILESYSTEM_AGGREGATE_HOUR   45
FILESYSTEM_AGGREGATE_MONTH  9
FILESYSTEM_AGGREGATE_WEEK   9
FILESYSTEM_AGGREGATE_YEAR   9
FILESYSTEM_INFO             1
MDS_AGGREGATE_DAY           5
MDS_AGGREGATE_HOUR          25
MDS_AGGREGATE_MONTH         5
MDS_AGGREGATE_WEEK          5
MDS_AGGREGATE_YEAR          5
MDS_DATA                    3047
MDS_INFO                    1
MDS_OPS_DATA                63987
MDS_VARIABLE_INFO           7
OPERATION_INFO              81
OSS_DATA                    6087
OSS_INFO                    2
OSS_INTERFACE_DATA          0
OSS_INTERFACE_INFO          0
OSS_VARIABLE_INFO           7
OST_AGGREGATE_DAY           54
OST_AGGREGATE_HOUR          270
OST_AGGREGATE_MONTH         54
OST_AGGREGATE_WEEK          54
OST_AGGREGATE_YEAR          54
OST_DATA                    18261
Looks like MDS, OSS and OST data are arriving in the database all right, and
those row counts increase, so the data is live.
> Also: ltop ignores data older than 12s. Maybe a time sync problem?
> Try ltop -s 60 (or some number >12) and see if data starts appearing?
No change after trying ltop -s 60, sadly.
All machines use Kerberos and are thus highly dependent on synced clocks. I
have confirmed that ntp is running and that the clocks are all in sync. Time
shouldn't be an issue.
> Agreed the Usage message is useless. I'll fix that (easy one!)
One step at a time towards perfection!
- Has cerebrod been restarted on all Lustre servers since the update?
(Data in the old format should still populate the database and work with
lwatch/lstat, but not with ltop.)
- On the MDS, does lmtmetric -m osc show data?
(I seem to recall, from a past debugging session, some content in the proc
files that we had not anticipated.)
Jim
Yes, multiple times.
> - on the mds, does lmtmetric -m osc show data?
> (I seem to recall there being some content in proc files not anticipated
> in some past debugging session)
It does show something:
# lmtmetric -m osc
osc: 1;cfs-mds-mgs-l.pdc.kth.se;cfs-l-OST0000;F;cfs-l-OST0001;F;cfs-l-OST0002;F;cfs-l-OST0003;F;cfs-l-OST0004;F;cfs-l-OST0005;F
But "lmtmetric -m ost" doesn't.
http://code.google.com/p/lmt/issues/detail?id=50
If you have a moment, would you mind adding to that issue the output of:
on your MDS: lmtmetric -m mdt
on an OSS: lmtmetric -m ost
(BTW: it is expected that -m ost produces no output on the MDS.
The osc metric represents the MDS's OST client state; we are capturing the
MDS's view of the OST state there. The ost metric represents OST server state.)
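Judging only from the single sample posted above, an osc line looks like a ';'-separated record: a leading field ("1"), the reporting host, then alternating OST name and state fields. A minimal C sketch that parses a line under that assumption; the layout is inferred from one sample, not from LMT documentation, and osc_record, parse_osc and the size limits are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits for this sketch. */
#define OSC_MAX_TARGETS 64
#define OSC_NAME_LEN    128

struct osc_target {
    char name[OSC_NAME_LEN];   /* e.g. "cfs-l-OST0000" */
    char state[OSC_NAME_LEN];  /* e.g. "F" (meaning not interpreted here) */
};

struct osc_record {
    char version[16];          /* leading field, "1" in the sample */
    char host[OSC_NAME_LEN];   /* reporting host (the MDS) */
    size_t ntargets;
    struct osc_target targets[OSC_MAX_TARGETS];
};

/* Copy the next ';'-delimited field from *cursor into dst, advance *cursor
 * past the delimiter (NULL after the last field), and return 0; return -1
 * if no field is left or it does not fit in dstlen bytes. */
static int next_field(const char **cursor, char *dst, size_t dstlen)
{
    const char *start = *cursor;
    const char *end;
    size_t len;

    if (start == NULL)
        return -1;
    end = strchr(start, ';');
    len = end ? (size_t)(end - start) : strlen(start);
    if (len >= dstlen)
        return -1;
    memcpy(dst, start, len);
    dst[len] = '\0';
    *cursor = end ? end + 1 : NULL;
    return 0;
}

/* Parse "1;host;fs-OST0000;F;fs-OST0001;F...". An optional "osc: " label,
 * as printed by lmtmetric, is skipped. Returns 0 on success, -1 on error. */
int parse_osc(const char *line, struct osc_record *rec)
{
    const char *cur = line;
    const char *label = "osc: ";

    if (strncmp(cur, label, strlen(label)) == 0)
        cur += strlen(label);

    memset(rec, 0, sizeof *rec);
    if (next_field(&cur, rec->version, sizeof rec->version) != 0 ||
        next_field(&cur, rec->host, sizeof rec->host) != 0)
        return -1;

    /* Remaining fields come in (name, state) pairs. */
    while (cur != NULL && *cur != '\0') {
        struct osc_target *t;

        if (rec->ntargets == OSC_MAX_TARGETS)
            return -1;
        t = &rec->targets[rec->ntargets];
        if (next_field(&cur, t->name, sizeof t->name) != 0 ||
            next_field(&cur, t->state, sizeof t->state) != 0)
            return -1;
        rec->ntargets++;
    }
    return 0;
}
```

Fed the MDS line from this thread, parse_osc should report host cfs-mds-mgs-l.pdc.kth.se and six targets, cfs-l-OST0000 through cfs-l-OST0005, each with state F. A line with an unpaired trailing name is rejected rather than half-parsed.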
Thanks,
Jim
Now ltop works just fine!
Thanks!
/andreas
Cheers,
Jim
========================================================================
Release Notes for LMT 3.1.2 11 Feb 2011
========================================================================
* Issue 48: lmtinit cannot handle hyphen in database name
* Issue 50: ltop not working with hyphenated file system name
* ltop: added options description to Usage message.
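The release notes do not say how Issues 48 and 50 were fixed. As an illustration only of why a hyphen in a file system name can cause trouble: Lustre target names have the form <fsname>-OSTxxxx, so a parser that cuts at the first hyphen turns cfs-l-OST0000 into file system "cfs", which never matches "cfs-l"; and MySQL does not permit '-' in unquoted identifiers, so a database name containing one must be quoted with backticks. A minimal C sketch of both points, assuming these are the relevant failure modes; split_target, quote_identifier and the example database name filesystem_cfs-l are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Split a Lustre target name such as "cfs-l-OST0000" into the file system
 * name ("cfs-l") and the target label ("OST0000") by cutting at the LAST
 * hyphen. Cutting at the first hyphen would yield "cfs" instead.
 * Returns 0 on success, -1 if there is no usable hyphen or a buffer is
 * too small. */
int split_target(const char *target, char *fs, size_t fslen,
                 char *label, size_t labellen)
{
    const char *dash = strrchr(target, '-');
    size_t n;

    if (dash == NULL || dash == target)
        return -1;
    n = (size_t)(dash - target);
    if (n >= fslen || strlen(dash + 1) >= labellen)
        return -1;
    memcpy(fs, target, n);
    fs[n] = '\0';
    strcpy(label, dash + 1);
    return 0;
}

/* Quote a MySQL identifier with backticks, doubling any embedded backtick.
 * Needed because '-' is not allowed in unquoted MySQL identifiers.
 * Returns 0 on success, -1 if out (outlen bytes) is too small. */
int quote_identifier(const char *name, char *out, size_t outlen)
{
    size_t j = 0;
    size_t i;

    if (outlen < 3)
        return -1;
    out[j++] = '`';
    for (i = 0; name[i] != '\0'; i++) {
        size_t need = (name[i] == '`') ? 2 : 1;

        if (j + need + 2 > outlen)   /* keep room for closing ` and NUL */
            return -1;
        if (name[i] == '`')
            out[j++] = '`';
        out[j++] = name[i];
    }
    out[j++] = '`';
    out[j] = '\0';
    return 0;
}
```

Splitting at the last hyphen works here because the target label itself (OSTxxxx, MDTxxxx) contains no hyphen, whereas the file system name may.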