# lmtinit -l
dc
dcwan
rack5
# ltop -f dc
ltop: no data found for file system `dc'
I get the same for dcwan. I'm running lmt 3.1.2. My test cluster is on
Lustre 1.8.5, but my other two filesystems are 1.8.1.1. I restarted
cerebro on my headnode after doing the lmtinit for the new systems.
Suggestions?
Thanks,
-Nathan
It may help to know that filesystem dc started out as Lustre 1.4, so it
uses some outdated naming conventions for its UUIDs and other components.
dcwan started out as Lustre 1.6, but I still can't get ltop to do
anything with it.
Thanks,
-Nathan
> --
> You received this message because you are subscribed to the Google
> Groups "lmt-discuss" group.
> To post to this group, send email to lmt-d...@googlegroups.com.
> To unsubscribe from this group, send email to lmt-discuss...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/lmt-discuss?hl=en.
Note that (to the best of my knowledge) lmtinit and ltop are unrelated
in lmt 3. ltop now gets its information directly from cerebro and does
not use the MySQL database at all. lmtinit, on the other hand, is only
used to initialize the MySQL database.
Here's something you could try to diagnose the problem:
On the node where you want to run ltop, do:
/usr/sbin/cerebro-stat -m lmt_mdt
What do you see?
On a test cluster where we have two filesystems, I see this:
tycho-mds1:
1;tycho-mds1;0.000000;1.281737;lc1-MDT0000;414108881;463677625;1656435524;1688473892;18;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;5;0;0;1;0;0;1163432;0;0;0;0;0;0;0;0;0;0;0;1;0;0;96;0;0;493;0;0;0;0;0
tycho-mds2:
1;tycho-mds2;0.000000;20.907624;lc2-MDT0000;309666155;309666834;1238664620;1238815968;113125622;0;0;56507168;0;0;599585;0;0;558998;0;0;48266055;0;0;34438276;0;0;34365413;0;0;231593;0;0;0;0;0;2;0;0;277;0;0;134;0;0;273;0;0;4276276;0;0;1502;0;0;1462;0;0;968285;0;0;93182986;0;0;48;0;0;217;0;0;0;0;0
Note the "lc1-MDT0000" and "lc2-MDT0000" strings that label the MDTs.
In lmt, the "lc1" and "lc2" parts are assumed to be the names of the
filesystems.
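To make that naming rule concrete, here is a minimal Python sketch that pulls the target name out of an lmt_mdt value like the ones above and treats the text before the first hyphen as the filesystem name. The field layout (version, host, two utilization figures, then the MDT name) is inferred from this output, not taken from LMT documentation, and `parse_mdt_record` and `fsname_from_target` are hypothetical helpers, not part of LMT.

```python
from typing import NamedTuple


class MdtRecord(NamedTuple):
    version: int
    host: str
    target: str


def parse_mdt_record(value: str) -> MdtRecord:
    """Split one lmt_mdt value (the text after "hostname: ").

    Assumed layout, inferred from cerebro-stat output:
    version;host;cpu;mem;target;counters...
    """
    fields = value.strip().split(";")
    return MdtRecord(version=int(fields[0]), host=fields[1], target=fields[4])


def fsname_from_target(target: str) -> str:
    """Hypothetical helper: take everything before the first '-' as the
    filesystem name (e.g. 'lc1-MDT0000' -> 'lc1')."""
    return target.split("-", 1)[0]


rec = parse_mdt_record(
    "1;tycho-mds1;0.000000;1.281737;lc1-MDT0000;414108881;463677625"
)
print(rec.host, rec.target, fsname_from_target(rec.target))
```

Only the target field matters for the name; the hostname ("tycho-mds1") may itself contain hyphens without affecting the result.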
You appear to be specifying a filesystem named "lustre" with the command
"ltop -f lustre", but then saying that the filesystem is named "dc". What
is the actual name of your filesystem? Note that the name of the mount
point on the clients is completely irrelevant here; the mount point
could be named anything.
After you figure that out, you can also use cerebro-stat to see which
nodes you are getting information from for the metric "lmt_ost".
Chris
/usr/sbin/cerebro-stat -m lmt_mdt
mds01: 1;mds01;2.816180;99.443103;mds-dc;27887837;217526400;742639604;761246852;1116524878;0;0;452067805;0;0;865784;0;0;4402;0;0;26225127;0;0;1170466;0;0;1105360;0;0;1660957;0;0;7832;0;0;2;0;0;8824;0;0;7981;0;0;7419;0;0;6196157;0;0;492;0;0;509;0;0;550165989;0;0;394191904;0;0;149;0;0;442;0;0;0;0;0
mds03: 1;mds03;0.349825;95.817841;mds-wan;57396060;142082048;486786200;497184424;183939442;0;0;86699346;0;0;471;0;0;30;0;0;564650;0;0;109585;0;0;30075;0;0;659897;0;0;232939;0;0;0;0;0;10557;0;0;16344;0;0;7015;0;0;5493522;0;0;48;0;0;56;0;0;1895203;0;0;192098572;0;0;132;0;0;269;0;0;0;0;0
oss19: 1;oss19;0.000000;35.634165;rack5-MDT0000;214167;214238;856668;874872;412;0;0;206;0;0;0;0;0;0;0;0;201;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;1;0;0;1;0;0;0;0;0;1537532;0;0;4;0;0;0;0;0;1;0;0;220;0;0;4;0;0;25;0;0;0;0;0
So by your methodology, dc is really named mds-dc and dcwan is mds-wan;
both of those agree with the filesystem name used when mounting the
MDT. However, the output from a command like tunefs.lustre does not
agree: when examining an OST, it says the filesystem I refer to as dc is
named "lustre" and dcwan is named "client". Of course, my test system
(rack5/oss19) was freshly built on Lustre 1.8.5 and conforms to what
you are saying.
I removed the lmtinit definitions for dc and dcwan and recreated one
for mds-dc; however, that isn't working either:
# ltop -f mds-dc
ltop: no data found for file system `mds-dc'
# ltop -f mds-wan
ltop: no data found for file system `mds-wan'
Lastly, "cerebro-stat -m lmt_ost" shows output from all of my
OSSes in all three filesystems. Here is one OSS from the dc filesystem:
cerebro-stat -m lmt_ost
oss01: 2;oss01;0.715746;94.795803;dc-ost1-t01-sdb;261375068;263103748;1045500272;3784651784;3756176770763;28863622242661;41758735;1415;0;0;0;4820;2915;COMPLETE 4/4 0s remaining;dc-ost1-t02-sdt;260874673;262708867;1043498692;3784651784;4331518473912;32251512881465;45210196;1415;0;0;0;4818;2913;COMPLETE 4/4 0s remaining;dc-ost1-t03-sdc;240891607;242671012;963566428;3784651784;2741219238556;24900907294852;36673914;1415;0;0;0;4818;2920;COMPLETE 4/4 0s remaining;dc-ost1-t04-sdu;254211260;255954932;1016845040;3784651784;3100483272556;26655848666906;39400430;1415;0;0;0;4818;2914;COMPLETE 4/4 0s remaining;dc-ost1-t05-sdd;266762617;268605543;1067050468;3784651784;2784307827359;26934858953840;39328771;1415;0;0;0;4818;2911;COMPLETE 4/4 0s remaining;dc-ost1-t06-sdv;270301030;272136024;1081204120;3784651784;3449114698240;31016276644389;44388107;1415;0;0;0;4820;2982;COMPLETE 4/4 0s remaining;
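Each lmt_ost value packs several OSTs into one record. Counting fields in the output above suggests four header fields (version, host, two utilization figures) followed by 15 fields per OST: the OST name, 13 numeric counters, and a recovery status string. That layout is an inference from this one record, not documented LMT behavior. A minimal Python sketch, using a hypothetical `ost_names` helper, that lists the OST names in such a record:

```python
from typing import List


def ost_names(value: str, group_size: int = 15) -> List[str]:
    """Return the OST names in one lmt_ost value (text after "host: ").

    Assumed layout, inferred from cerebro-stat output:
    version;host;cpu;mem; then per OST:
    name; 13 numeric counters; recovery status;
    """
    # Drop the trailing ';' so split() does not yield an empty last field.
    fields = value.strip().rstrip(";").split(";")
    body = fields[4:]  # skip the four header fields
    if len(body) % group_size != 0:
        raise ValueError("record does not match the assumed per-OST layout")
    # The name is the first field of each per-OST group.
    return [body[i] for i in range(0, len(body), group_size)]


# First two OSTs from the oss01 record above.
sample = (
    "2;oss01;0.715746;94.795803;"
    "dc-ost1-t01-sdb;261375068;263103748;1045500272;3784651784;"
    "3756176770763;28863622242661;41758735;1415;0;0;0;4820;2915;"
    "COMPLETE 4/4 0s remaining;"
    "dc-ost1-t02-sdt;260874673;262708867;1043498692;3784651784;"
    "4331518473912;32251512881465;45210196;1415;0;0;0;4818;2913;"
    "COMPLETE 4/4 0s remaining;"
)
print(ost_names(sample))
```

Raising on a field count that is not a multiple of the group size makes a wrong layout guess fail loudly instead of silently returning counters as names.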
Thanks,
-Nathan
Ah, I see. I think lmt is making a bad assumption about device naming.
It assumes that the mdt or ost device name always begins with the
filesystem name. That may be the default naming scheme of the newer
Lustre filesystem creation tools, but I am guessing that it is by no
means a requirement.
So this sounds like an LMT bug.
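A minimal Python sketch of why a prefix-based assumption breaks on these filesystems. It assumes, hypothetically, that the filesystem name is the text before the first hyphen of each target name; the thread does not say exactly how LMT splits the name, so treat this as an illustration rather than LMT's actual logic. `group_by_prefix` is a hypothetical helper, and the target names are the ones cerebro-stat reported earlier in this thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_prefix(targets: Iterable[str]) -> Dict[str, List[str]]:
    """Group Lustre target names by the text before their first '-'.

    The first-hyphen rule is an assumption for illustration only; it is
    not taken from the LMT source.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in targets:
        groups[name.split("-", 1)[0]].append(name)
    return dict(groups)


mdt_groups = group_by_prefix(["mds-dc", "mds-wan", "rack5-MDT0000"])
ost_groups = group_by_prefix(["dc-ost1-t01-sdb", "dc-ost1-t02-sdt"])

print(sorted(mdt_groups))  # ['mds', 'rack5']
print(sorted(ost_groups))  # ['dc']
# Prefixes that have OSTs but no MDT cannot be matched to a filesystem.
print(sorted(set(ost_groups) - set(mdt_groups)))  # ['dc']
```

Under this rule the two MDTs of different filesystems collapse into one "mds" group, while the dc OSTs land in a "dc" group with no MDT, so neither "mds-dc" nor "dc" would match a complete filesystem. That would be consistent with the "no data found" errors above, but it is only a guess.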
> I removed the lmtinit definitions for dc and dcwan and recreated one
> for mds-dc; however, that isn't working either:
Don't waste your time with lmtinit for ltop. As I said, lmtinit is
only used to configure the MySQL database, which ltop does not use at all.