For the OSSs, from what I can see, this can be achieved on a per-volume
basis using LVM and snapshotting. However, I feel uncomfortable with
this approach and would prefer something where I can restore an
individual file from backup if it is lost. Is there another way, other
than find with mtime, to find the list of modified files in the filesystem?
For the MDSs, I cannot see how to back up an online MDS. If I do a
split-mirror without quiescing and back up the ext3 filesystem of the
split-off mirror, I get an inconsistent state of the database.
I've seen various references to Amanda-based systems, but I've not been
able to find the code that implements them, so that I could evaluate
implementing the same thing with our TSM system.
Tim
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Yes.
> However, I feel uncomfortable with
> this approach and would prefer something where I can restore an
> individual file from backup if it is lost.
Then you need to do filesystem backups, not device backups. You do
filesystem backups from a Lustre client rather than on the individual
server nodes. This is no different from how you would back up any other
filesystem. I believe the manual covers filesystem-level backups. You
need to pay attention to backing up the EAs used for striping and so on
if you wish to restore files with their striping attributes intact.
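A rough sketch of such a client-side backup follows; the paths are made up, and the exact EA handling for your version is described in the Lustre manual, so treat this as an outline rather than a recipe:

```shell
# Run on a Lustre client.  getfattr/setfattr come from the attr package.
cd /mnt/lustre/projectdir
# 1. Save all extended attributes (including the striping EA) first:
getfattr -R -d -m '.*' -e hex -P . > /backup/ea.bak
# 2. Then archive the file data:
tar czf /backup/files.tgz .
# To restore: untar the data, then replay the saved EAs:
#   cd /mnt/lustre/projectdir && setfattr --restore=/backup/ea.bak
```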
> Is there another way, other than
> find with mtime, to find the list of modified files in the filesystem?
Some tools, such as GNU tar, have a "find all files newer than" feature
built in.
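For example, a self-contained sketch of GNU tar's --newer-mtime option (the dates, file names, and temp directory are made up for the demonstration; on a real system you would point tar at the Lustre mount):

```shell
set -e
workdir=$(mktemp -d)                 # stand-in for a real Lustre mount
cd "$workdir"
mkdir data
echo old > data/old.txt
touch -d "2008-01-01" data/old.txt   # pretend this predates the cutoff
echo new > data/new.txt              # mtime is "now"
# Archive only files whose mtime is newer than the cutoff date:
tar czf incr.tgz --newer-mtime="2008-11-01" data
tar tzf incr.tgz                     # should list data/ and data/new.txt
```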
> For the MDSs, I cannot see how to back up an online MDS. If I do a
> split-mirror without quiescing and back up the ext3 filesystem of the
> split-off mirror, I get an inconsistent state of the database.
Right. The closest you can come is to synchronize, as best you can,
taking LVM snapshots of the MDT and all of the OSTs, and then backing
each of the snapshots up at your leisure.
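As a sketch, with volume group, snapshot sizes, and mount points all made up for illustration, the per-target sequence would look something like:

```shell
# Take the snapshots as close together in time as possible:
lvcreate -s -L 1G  -n mdt-snap  /dev/vg0/mdt
lvcreate -s -L 10G -n ost0-snap /dev/vg0/ost0
# ...repeat for each OST...

# Then back each snapshot up at leisure and drop it:
mount -o ro /dev/vg0/mdt-snap /mnt/snap
tar czf /backup/mdt-$(date +%F).tgz -C /mnt/snap .
umount /mnt/snap
lvremove -f /dev/vg0/mdt-snap
```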
> I've seen various references to Amanda based systems but I've not been
> able to find the code that implements it so I could evaluate
> implementing the same thing with our TSM system.
Indeed, there was an announcement here a while ago from Zmanda about
supporting Lustre, but I have not seen nor heard anything since.
b.
Our Lustre installation started exhibiting some curious performance issues today ... basically, it slowed down dramatically and reliable I/O performance became impossible. I looked through the output of dmesg and saw a number of kernel 'oops' messages, but not being a Lustre kernel expert, I'm not exactly sure what they indicate. I stopped the OSTs on the node in question and ran e2fsck on the OST drives, but they came up clean, so I don't think it's a hardware problem. I don't have physical access to the machine right now, so it may in fact be something on the back end, but I'm working on verifying that with a technician on site. In the meantime ... can anyone help me decipher this? There are a couple of messages like this one:
-- cut --
ll_ost_215 S 00000100d2141808 0 8584 1 8585 8583 (L-TLB)
00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8
00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0
0000000000000013 0000000000000000
Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
<ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
<ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932}
<ffffffffa054227d>{:ptlrpc:l_has_lock+77} <ffffffffa056f76c>{:ptlrpc:ldlm_add_waiting_lock+2156}
<ffffffff80137be0>{default_wake_function+0} <ffffffffa0592620>{:ptlrpc:expired_request+0}
<ffffffffa05926e0>{:ptlrpc:interrupted_request+0} <ffffffffa0574ae8>{:ptlrpc:ldlm_server_blocking_ast+4072}
<ffffffffa0547f5a>{:ptlrpc:ldlm_run_ast_work+234} <ffffffffa05421f8>{:ptlrpc:l_unlock+248}
<ffffffffa055ac73>{:ptlrpc:ldlm_process_extent_lock+995}
<ffffffffa054bb4f>{:ptlrpc:ldlm_lock_enqueue+1087}
<ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa0575b50>{:ptlrpc:ldlm_server_glimpse_ast+0}
<ffffffffa0578694>{:ptlrpc:ldlm_handle_enqueue+8836}
<ffffffffa0677485>{:obdfilter:filter_fmd_expire+53}
<ffffffffa0611dc0>{:kviblnd:kibnal_cq_callback+0} <ffffffffa0573b00>{:ptlrpc:ldlm_server_blocking_ast+0}
<ffffffffa0574bb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa062cbb2>{:ost:ost_handle+20738} <ffffffff801f78d9>{number+233}
<ffffffff801f7f09>{vsnprintf+1321} <ffffffffa048e022>{:libcfs:libcfs_debug_msg+1554}
<ffffffffa05a593a>{:ptlrpc:ptlrpc_server_handle_request+3066}
<ffffffff80116d35>{do_gettimeofday+101} <ffffffffa049014e>{:libcfs:lcw_update_time+30}
<ffffffff801454bd>{__mod_timer+317} <ffffffffa05a6d17>{:ptlrpc:ptlrpc_main+2375}
<ffffffff80137be0>{default_wake_function+0} <ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa05a63c0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff801114ab>{child_rip+8}
<ffffffffa05a63d0>{:ptlrpc:ptlrpc_main+0} <ffffffff801114a3>{child_rip+0}
-- cut --
I can include debug logs as well, if needed. Any help is greatly appreciated.
cheers,
Klaus
These kinds of messages are of relatively little use unless they include
some of the preceding lines. Are you sure this is an oops, and not a
watchdog timeout that is dumping the stack?
> -- cut --
> ll_ost_215 S 00000100d2141808 0 8584 1 8585 8583 (L-TLB)
> 00000101184233e8 0000000000000046 000000000000000f ffffffffa059c3b8
> 00000000005c2616 0000000100000000 0000000000000000 00000100d21418b0
> 0000000000000013 0000000000000000
> Call Trace:<ffffffffa059c3b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
> <ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
> <ffffffffa0596e84>{:ptlrpc:ptlrpc_queue_wait+6932}
This looks like a network problem, but it is hard to say without more
information. If you are a supported customer, you will get better
service by filing a Bugzilla bug. This list only gets "as available"
replies, and that is doubly true for old 1.4 Lustre installations.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Thanks for the info. Here's some more information (basically cut and pasted from before and after a typical stack dump message):
-- cut --
LustreError: 20438:0:(service.c:648:ptlrpc_server_handle_request()) request 4777659 opc 101 from 12345-10.0.0.249@vib processed in 118s trans 0 rc 0/0
Lustre: 20438:0:(watchdog.c:302:lcw_update_time()) Expired watchdog for pid 20438 disabled after 118.9719s
LustreError: 20257:0:(ldlm_lockd.c:579:ldlm_server_completion_ast()) ### enqueue wait took 118984007us from 1225493396 ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85be5c78b145 lrc: 2/0,0 mode: PW/PW res: 15229751/0 rrc: 3 type:
EXT [0->18446744073709551615] (req 8482816->8503295) flags: 20 remote: 0x84d9c08a99ff4b23 expref: 113 pid: 20438
LustreError: 20500:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225493514, 20s ago) req@00000100cb900e00 x1151/t0 o104->@NET_0xb00000a0000fa_UUID:15 lens 176/64 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: A client on nid 10.0.0.250@vib was evicted from service ost5.
LustreError: 20500:0:(ldlm_lockd.c:423:ldlm_failed_ast()) ### blocking AST failed (-110): evicting client 36604_tiger_lov_7a82cb6234@NET_0xb00000a0000fa_UUID NID 10.0.0.250@vib (10.0.0.250@vib) ns: filter-ost5_UUID lock: 000001008f840c
40/0x7bef85be5c78c80c lrc: 2/0,0 mode: PW/PW res: 15158480/0 rrc: 57 type: EXT [4651483136->4802478079] (req 4651483136->4654473215) flags: 20 remote: 0x789bd2e5e1ada437 expref: 120 pid: 20579
Lustre: 20107:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from 12345-10.0.0.250@vib portal 16 match 1151 offset 0 length 64: 2
LustreError: 20269:0:(ldlm_lockd.c:1417:ldlm_cancel_handler()) operation 103 from 12345-10.0.0.250@vib with bad export cookie 8930503638740276498
LustreError: 20580:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req@0000010060cc0000 x7055836/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 0:0:(ldlm_lockd.c:205:waiting_locks_callback()) ### lock callback timer expired: evicting client 7cb21_tiger_lov_7afb88fa94@NET_0xb00000a0000f9_UUID nid 10.0.0.249@vib ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85
be5c78b145 lrc: 1/0,0 mode: PW/PW res: 15229751/0 rrc: 2 type: EXT [0->18446744073709551615] (req 8482816->8503295) flags: 20 remote: 0x84d9c08a99ff4b23 expref: 117 pid: 20438
LustreError: 20452:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req@0000010064e22200 x4778674/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
Lustre: 0:0:(watchdog.c:121:lcw_cb()) Watchdog triggered for pid 20559: it was inactive for 100000ms
Lustre: 0:0:(linux-debug.c:155:libcfs_debug_dumpstack()) showing stack for process 20559
ll_ost_220 S 000001006e854008 0 20559 1 20560 20558 (L-TLB)
000001009739f3e8 0000000000000046 000000000000000f ffffffffa05b33b8
0000000000000548 0000000100000000 0000000000000000 000001006e8540b0
0000000000000013 0000000000000000
Call Trace:<ffffffffa05b33b8>{:ptlrpc:ptl_send_buf+824} <ffffffff801454bd>{__mod_timer+317}
<ffffffff8033860d>{schedule_timeout+381} <ffffffff801460a0>{process_timeout+0}
<ffffffffa05ade84>{:ptlrpc:ptlrpc_queue_wait+6932}
<ffffffffa055927d>{:ptlrpc:l_has_lock+77} <ffffffffa058676c>{:ptlrpc:ldlm_add_waiting_lock+2156}
<ffffffff80137be0>{default_wake_function+0} <ffffffffa05a9620>{:ptlrpc:expired_request+0}
<ffffffffa05a96e0>{:ptlrpc:interrupted_request+0} <ffffffffa058bae8>{:ptlrpc:ldlm_server_blocking_ast+4072}
<ffffffffa055ef5a>{:ptlrpc:ldlm_run_ast_work+234} <ffffffffa05591f8>{:ptlrpc:l_unlock+248}
<ffffffffa0571c73>{:ptlrpc:ldlm_process_extent_lock+995}
<ffffffffa0562b4f>{:ptlrpc:ldlm_lock_enqueue+1087}
<ffffffffa058bbb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa058cb50>{:ptlrpc:ldlm_server_glimpse_ast+0}
<ffffffffa058f694>{:ptlrpc:ldlm_handle_enqueue+8836}
<ffffffffa068c485>{:obdfilter:filter_fmd_expire+53}
<ffffffffa058ab00>{:ptlrpc:ldlm_server_blocking_ast+0}
<ffffffffa058bbb0>{:ptlrpc:ldlm_server_completion_ast+0}
<ffffffffa0641bb2>{:ost:ost_handle+20738} <ffffffff801f787e>{number+142}
<ffffffff801f7f09>{vsnprintf+1321} <ffffffffa049f022>{:libcfs:libcfs_debug_msg+1554}
<ffffffffa05bc93a>{:ptlrpc:ptlrpc_server_handle_request+3066}
<ffffffff80116d35>{do_gettimeofday+101} <ffffffffa04a114e>{:libcfs:lcw_update_time+30}
<ffffffff801454bd>{__mod_timer+317} <ffffffffa05bdd17>{:ptlrpc:ptlrpc_main+2375}
<ffffffff80137be0>{default_wake_function+0} <ffffffffa05bd3c0>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa05bd3c0>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff801114ab>{child_rip+8}
<ffffffffa05bd3d0>{:ptlrpc:ptlrpc_main+0} <ffffffff801114a3>{child_rip+0}
LustreError: dumping log to /tmp/lustre-log-tiger-oss-0-0.local.1225493885.20559
LustreError: 20526:0:(client.c:940:ptlrpc_expire_one_request()) @@@ timeout (sent at 1225493865, 20s ago) req@000001001986c000 x1346/t0 o106->@NET_0xb00000a0000ed_UUID:15 lens 176/112 ref 2 fl Rpc:/0/0 rc 0/0
LustreError: A client on nid 10.0.0.237@vib was evicted from service ost1.
LustreError: 20526:0:(ldlm_lockd.c:423:ldlm_failed_ast()) ### glimpse AST failed (-110): evicting client 82ed7_tiger_lov_eea68aa0e0@NET_0xb00000a0000ed_UUID NID 10.0.0.237@vib (10.0.0.237@vib) ns: filter-ost1_UUID lock: 000001010f17dac
0/0x7bef85be5c792519 lrc: 2/0,0 mode: PW/PW res: 15168088/0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0 remote: 0x3dd1a37046787550 expref: 182 pid: 20464
LustreError: 20702:0:(ost_handler.c:955:ost_brw_write()) @@@ Eviction on bulk GET req@0000010061543400 x4033002/t0 o4->82ed7_tiger_lov_eea68aa0e0@NET_0xb00000a0000ed_UUID:-1 lens 328/288 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 20866:0:(ost_handler.c:747:ost_brw_read()) @@@ timeout on bulk PUT req@0000010055099050 x4032963/t0 o3->82ed7_tiger_lov_eea68aa0e0@NET_0xb00000a0000ed_UUID:-1 lens 328/280 ref 0 fl Interpret:/0/0 rc 0/0
LustreError: 20866:0:(ost_handler.c:747:ost_brw_read()) Skipped 4 previous similar messages
Lustre: 20866:0:(ost_handler.c:813:ost_brw_read()) ost13: ignoring bulk IO comm error with 82ed7_tiger_lov_eea68aa0e0@NET_0xb00000a0000ed_UUID id 12345-10.0.0.237@vib - client will retry
Lustre: 20866:0:(ost_handler.c:813:ost_brw_read()) Skipped 4 previous similar messages
Lustre: 20702:0:(ost_handler.c:1050:ost_brw_write()) ost1: ignoring bulk IO comm error with 82ed7_tiger_lov_eea68aa0e0@NET_0xb00000a0000ed_UUID id 12345-10.0.0.237@vib - client will retry
Lustre: 20106:0:(lib-move.c:1728:lnet_parse_reply()) 10.0.0.230@vib: Dropping REPLY from 12345-10.0.0.237@vib for invalid MD 0x45a932db4532a.0xb4ef9
Lustre: 20106:0:(lib-move.c:1624:lnet_parse_put()) Dropping PUT from 12345-10.0.0.237@vib portal 16 match 1346 offset 0 length 112: 2
LustreError: 20592:0:(ldlm_lib.c:1272:target_send_reply_msg()) @@@ processing error (-107) req@000001010209d800 x4033039/t0 o101-><?>@<?>:-1 lens 176/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 20559:0:(service.c:648:ptlrpc_server_handle_request()) request 4510790 opc 101 from 12345-10.0.0.232@vib processed in 109s trans 0 rc 0/0
-- cut --
I'm working on urging the clients to upgrade to 1.6, but it's slow going.
Any insight would be very helpful ... I can probably get on top of the fix if I have an inkling of where to start looking. At this point I don't; I just get reports that it's "slow".
thanks,
Klaus
PS Sorry for the massive paste to everyone else on the list.
I would strongly recommend filesystem-level backups instead of doing
device-level backups. This gives you full control over which files to
back up, and allows restoring individual files as needed. It is also
the easiest to integrate into existing backup solutions.
It is probably also desirable to do a device-level backup of the MDS,
because it is the central part of the Lustre filesystem, and if this
device fails permanently then the entire filesystem would need to be
restored. Lustre can generally handle the MDS backup being slightly
out of date w.r.t. the OSTs, though care must be taken when restoring
in such a scenario.
> Is there another way, other than
> find with mtime, to find the list of modified files in the filesystem?
There is the "e2scan" program included with recent versions of the
Lustre e2fsprogs RPM. It is a fast scanner of the MDS filesystem
that returns a list of files that have changed, like "find" does. You
can also use "find" or "lfs find" to traverse the filesystem, and this
can be very fast if run on a client mounted on the MDS node.
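For the find-based variants, something like the following (the mount point is illustrative, and -mtime support in "lfs find" depends on your Lustre version, so verify against your lfs manpage):

```shell
# List regular files modified in the last 24 hours.  "lfs find" takes
# familiar find-style options but scans much faster on Lustre:
lfs find /mnt/lustre -type f -mtime -1 > modified-files.txt
# Plain find works as a fallback, just slower on a big tree:
find /mnt/lustre -type f -mtime -1 > modified-files.txt
```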
In the 2.0 release there will be a changelog exported from Lustre that
will catalog the files changed in the filesystem since the last time
an application processed the log.
> For the MDSs, I cannot see how to back up an online MDS. If I do a
> split-mirror without quiescing and back up the ext3 filesystem of the
> split-off mirror, I get an inconsistent state of the database.
You should use LVM mirroring to ensure the filesystem is coherent
when the snapshot is made. There are patches going into the upstream
kernel from Takashi Sato of NEC that allow a userspace process to freeze
the filesystem while doing this sort of hardware mirror split
(http://marc.info/?l=linux-fsdevel&m=122511248322724&w=4), but as
yet these are not in the upstream kernel or any Lustre kernel.
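Once such a freeze interface is available, the split would conceptually be bracketed like this (the fsfreeze(8) command and the mount point are assumptions about the eventual userspace tooling, not something current Lustre kernels support):

```shell
# Conceptual only: freeze, split the mirror, thaw.
fsfreeze -f /mnt/mdt          # block new writes and flush the filesystem
# ...split the hardware mirror / take the snapshot here...
fsfreeze -u /mnt/mdt          # resume normal operation
```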
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
This is a watchdog timeout, and should not be thought of as an oops or
panic as you suggested in your earlier message. It is debugging output
from Lustre.
> LustreError: 20438:0:(service.c:648:ptlrpc_server_handle_request()) request 4777659 opc 101 from 12345-10.0.0.249@vib processed in 118s trans 0 rc 0/0
> Lustre: 20438:0:(watchdog.c:302:lcw_update_time()) Expired watchdog for pid 20438 disabled after 118.9719s
> LustreError: 20257:0:(ldlm_lockd.c:579:ldlm_server_completion_ast()) ### enqueue wait took 118984007us from 1225493396 ns: filter-ost13_UUID lock: 0000010036031dc0/0x7bef85be5c78b145 lrc: 2/0,0 mode: PW/PW res: 15229751/0 rrc: 3 type:
This looks like slow request processing on the OST. I would suggest
increasing the Lustre timeout value to 150s or more. Timeout values of
300s are fairly common in larger installations (1000 nodes or more). In
the 1.8 release, Lustre will automatically tune this value so that it
can adapt better to larger-scale systems.
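On 1.4/1.6 this is a /proc tunable that should be set consistently on every server and client (the value of 300 is illustrative):

```shell
# Raise the Lustre RPC timeout to 300 seconds on this node:
echo 300 > /proc/sys/lustre/timeout
cat /proc/sys/lustre/timeout     # verify the new value
```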
Hi Andreas,
That is good to know. I thought it was an oops message, not normal
debugging output. I'm used to Lustre "just working" without this kind of
output, so I assumed the worst when I saw it.
As far as the issue itself, apparently the Voltaire switch needed a restart,
which has cleared whatever condition was causing the problem.
thanks,
Klaus