The errors in dmesg are:
LustreError: 11-0: an error occurred while communicating with
141.212.30.181@tcp. The obd_ping operation failed with -107
Lustre: nobackup-OST0001-osc-000001007d548400: Connection to service
nobackup-OST0001 via nid 141.212.30.181@tcp was lost; in progress
operations using this service will wait for recovery to complete.
LustreError: 167-0: This client was evicted by nobackup-OST0001; in
progress operations using this service will fail.
LustreError: 29595:0:(file.c:1052:ll_glimpse_size()) obd_enqueue
returned rc -5, returning -EIO
LustreError: 29629:0:(file.c:1052:ll_glimpse_size()) obd_enqueue
returned rc -5, returning -EIO
OST0000 also lives at 141.212.30.181, so it's strange that only one of
the two OSTs evicted the client. Is there a way to ask Lustre to restore
this connection? Up until now the client would recover quickly, but this
time it's just waiting.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
_______________________________________________
Lustre-discuss mailing list
Lustre-...@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
You could try "lctl --device {OSC device in question} recover".
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
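To fill in the "{OSC device in question}" placeholder from the suggestion above, the device index can be read from "lctl dl" on the client. What follows is only a minimal sketch: the helper name osc_devno is hypothetical, and it assumes "lctl dl" prints one device per line as "index status type name uuid refcount", which may differ between Lustre versions.

```sh
#!/bin/sh
# osc_devno (hypothetical helper): read `lctl dl`-style lines on stdin and
# print the device index of every OSC whose name contains the given substring.
# Assumed line layout: <index> <status> <type> <name> <uuid> <refcount>
osc_devno() {
    # $1: substring of the OSC name, e.g. nobackup-OST0001-osc
    awk -v pat="$1" '$3 == "osc" && index($4, pat) { print $1 }'
}

# Intended use on the affected client (as root); left commented out so the
# sketch can be sourced without touching a live filesystem:
#   dev=$(lctl dl | osc_devno nobackup-OST0001-osc)
#   lctl --device "$dev" recover
```

If the helper prints nothing, the name substring did not match any OSC line, and the "lctl dl" output should be inspected by hand before running the recover command.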
I ended up having to umount and mount again, which finally reconnected the OST.
--
}}}===============>> LLNL
James E. Harm (Jim); jh...@llnl.gov
System Administrator, ICCD Clusters
(925) 422-4018 Page: 423-7705x57152
You can use "echo_client" to perform operations on a single OST. See
the obdfilter-survey script in lustre-iokit for usage details.