FlexProtect Failed

SRK

Jun 26, 2018, 9:50:34 AM
to Isilon Technical User Group
Hi all,

I added a new node to the cluster. Somehow, apart from the boot disk, the remaining data disks had not been wiped properly (I only found this out after the node had joined the cluster).

I rebooted the node and found the messages below, which recommend reformatting the disks. I picked one disk and tried, but the process failed with "corrupt or invalid GPT detected.
GEOM: da1: GPT rejected -- may not be recoverable"

So I planned to SmartFail the added node out of the cluster, but FlexProtect keeps failing and never completes.

Can someone please suggest the next steps? Thanks for the help.



mount_efs: Skipping unrecognized IFS drive da8s1e with guid [5011e822000e5d3c 40c97c7970d4f334] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da7s1e with guid [5011e822000ea80f 32d6cbff243fcdac] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da6s1e with guid [5011e822000ef2db 1470547c1265a60e] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da5s1e with guid [5011e822000f3da2 3dcb559a54ca9df8] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da4s1e with guid [5011e82300004693 24c37fec0361e731] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da3s1e with guid [5011e8230000915b 0501b5f0197ffaf1] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da2s1e with guid [5011e8230000dc14 5b993a2376216aac] (reformat with newfs_efs -p)

mount_efs: Skipping unrecognized IFS drive da1s1e with guid [5011e823000126b0 34b0976835c52429] (reformat with newfs_efs -p)

mount_efs: no drives available for mounting ifs
machdep.isilon_hw_monitor: 1046271 -> 1046271
Configuring console port baudrate...  115200
Final IFS daemons:  isi_mcp
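
For anyone triaging a similar boot log, here is a minimal Python sketch that lists the partitions mount_efs skipped. The regular expression is inferred only from the messages quoted above, and the function name `skipped_drives` is hypothetical, so treat both as assumptions rather than a documented log format.

```python
import re

# Pattern inferred from the boot messages quoted in this post (an assumption);
# other OneFS releases may word this line differently.
SKIP_RE = re.compile(
    r"mount_efs: Skipping unrecognized IFS drive (\S+) "
    r"with guid \[([0-9a-f]+) ([0-9a-f]+)\]"
)


def skipped_drives(boot_log):
    """Return (partition, guid) pairs for every drive mount_efs skipped."""
    return [
        (m.group(1), m.group(2) + " " + m.group(3))
        for m in SKIP_RE.finditer(boot_log)
    ]


# Two of the lines from the log above, plus the final summary line.
sample = """\
mount_efs: Skipping unrecognized IFS drive da8s1e with guid [5011e822000e5d3c 40c97c7970d4f334] (reformat with newfs_efs -p)
mount_efs: Skipping unrecognized IFS drive da1s1e with guid [5011e823000126b0 34b0976835c52429] (reformat with newfs_efs -p)
mount_efs: no drives available for mounting ifs
"""

for partition, guid in skipped_drives(sample):
    print(partition, guid)
```

This only reads text that has already been captured; it does not touch the node or its disks.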

isi devices
Node 10, [DOWN]
  Bay 1        Lnum N/A     [USED]         SN:MK0371YHJKDPUD      /dev/da1
  Bay 2        Lnum N/A     [USED]         SN:MK0371YHJNYPTA      /dev/da4
  Bay 3        Lnum N/A     [USED]         SN:MK0361YHJNUGVD      /dev/da7
  Bay 4        Lnum N/A     [USED]         SN:MK0371YHJHG5UA      /dev/da10
  Bay 5        Lnum N/A     [USED]         SN:MK0361YHJJA39D      /dev/da2
  Bay 6        Lnum N/A     [USED]         SN:MK0361YHJ79T7D      /dev/da5
  Bay 7        Lnum N/A     [USED]         SN:MK0361YHJMT3PD      /dev/da8
  Bay 8        Lnum N/A     [USED]         SN:MK0371YHJM2MXA      /dev/da11
  Bay 9        Lnum N/A     [USED]         SN:MK0361YHJNT43D      /dev/da3
  Bay 10       Lnum N/A     [USED]         SN:MK0371YHJHHX2D      /dev/da6
  Bay 11       Lnum N/A     [USED]         SN:MK0371YHJM8KRA      /dev/da9
  Bay 12       Lnum N/A     [USED]         SN:MK0371YHJL20AA      /dev/da12
  Bay 13       Lnum N/A     [USED]         SN:MK0371YHJM8H9A      /dev/da19
  Bay 14       Lnum N/A     [USED]         SN:MK0371YHJP2KLA      /dev/da22
  Bay 15       Lnum N/A     [USED]         SN:MK0361YHJP85AD      /dev/da25
  Bay 16       Lnum N/A     [USED]         SN:MK0361YHJNMPKD      /dev/da28
  Bay 17       Lnum N/A     [USED]         SN:MK0361YHJJHGZD      /dev/da20
  Bay 18       Lnum N/A     [USED]         SN:MK0371YHJNYHYA      /dev/da23
  Bay 19       Lnum N/A     [USED]         SN:MK0361YHJN895D      /dev/da26
  Bay 20       Lnum N/A     [USED]         SN:MK0361YHJKTAJD      /dev/da29
  Bay 21       Lnum N/A     [USED]         SN:MK0371YHJJ6ZAD      /dev/da21
  Bay 22       Lnum N/A     [USED]         SN:MK0371YHJNYNXA      /dev/da24
  Bay 23       Lnum N/A     [USED]         SN:MK0371YHJNYJYA      /dev/da27
  Bay 24       Lnum N/A     [USED]         SN:MK0361YHJH5ESD      /dev/da30
  Bay 25       Lnum N/A     [USED]         SN:MK0371YHJKDLKD      /dev/da13
  Bay 26       Lnum N/A     [USED]         SN:MK0361YHJNPZ3D      /dev/da16
  Bay 27       Lnum N/A     [USED]         SN:PN2231P8HB0HVR      /dev/da31
  Bay 28       Lnum N/A     [USED]         SN:MK0371YHJNYH6A      /dev/da34
  Bay 29       Lnum N/A     [USED]         SN:MK0361YHJ960ZD      /dev/da14
  Bay 30       Lnum N/A     [USED]         SN:MK0371YHJJ8VZD      /dev/da17
  Bay 31       Lnum N/A     [USED]         SN:MK0371YHJK5K5A      /dev/da32
  Bay 32       Lnum N/A     [USED]         SN:MK0371YHJNYWUA      /dev/da35
  Bay 33       Lnum N/A     [USED]         SN:MK0371YHJMUTMA      /dev/da15
  Bay 34       Lnum N/A     [USED]         SN:MK0371YHJP0THA      /dev/da18
  Bay 35       Lnum N/A     [USED]         SN:MK0361YHJNX0GD      /dev/da33
  Bay 36       Lnum N/A     [USED]         SN:MK0371YHJNYMGA      /dev/da36


isi job status -v

Running jobs:
Job                        Impact Pri Policy     Phase Run Time
-------------------------- ------ --- ---------- ----- ----------
FlexProtect[166673]        Medium 1   MEDIUM     1/4   2:40:25
        Progress: Drives: 751 done, 4 in progress; last updated 25:11; Processed
        6500023 LINs and 4561431608423 KB; 1 ECC and 0 errors

Paused and waiting jobs:
Job                        Impact Pri Policy     Phase Run Time   State
-------------------------- ------ --- ---------- ----- ---------- -------------
FlexProtect[166674]        Medium 1   MEDIUM     1/4   0:00:00    Waiting
        Progress: n/a
SnapshotDelete[166668]     Medium 2   MEDIUM     1/1   0:00:00    Waiting
        Progress: n/a

Failed jobs:
Job                        Errors Run Time   End Time        Retries Left
-------------------------- ------ ---------- --------------- ------------
FlexProtect[166672]        1      3:05:29    06/26 03:40:15  2
        Progress: Processed 44340 lins; 0 zombies and 0 errors
        06/26 03:38:06 Node 1: 1:5086:a1b2: Input/output error
MultiScan[165607]          1      9:23:17    06/15 02:09:11  0
        Progress: Collect: 15307683 LINs, 0 errors; AutoBalance: 15307683 LINs,
        0 errors; 1 error total
        06/15 02:09:03 Node 6: Invalid argument
MediaScan[105329]          1      21:54:18   01/11 15:11:35  0
        Progress: Started
        01/11 15:11:35 Node 4: Devid 9 left group after MediaScan[105329]
        started; job must be retried: Operation canceled

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
06/25 11:19:54  SnapshotDelete[166664]     Succeeded (MEDIUM)
06/25 11:39:58  SnapshotDelete[166665]     Succeeded (MEDIUM)
06/25 11:49:57  SnapshotDelete[166666]     Succeeded (MEDIUM)
06/25 15:06:37  FlexProtect[166667]        Failed
06/25 18:13:09  FlexProtect[166669]        Failed
06/25 21:23:32  FlexProtect[166670]        Failed
06/26 00:32:44  FlexProtect[166671]        Failed
06/26 03:40:15  FlexProtect[166672]        Failed
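
To see how regularly the job is failing, the "Recent job results" table can be tallied with a short script. This is a minimal sketch that assumes the column layout shown above. The function name `failed_runs` and the `year` parameter are hypothetical, and the year has to be supplied because the timestamps in the output do not carry one.

```python
import re
from datetime import datetime

# Matches rows such as "06/25 15:06:37  FlexProtect[166667]  Failed".
# The layout is assumed from the output pasted above.
ROW_RE = re.compile(
    r"^(\d\d/\d\d \d\d:\d\d:\d\d)\s+(\w+)\[(\d+)\]\s+(\w+)", re.MULTILINE
)


def failed_runs(status_text, job="FlexProtect", year=2018):
    """Return (job_id, end_time) for each failed run of `job`, in listed order."""
    runs = []
    for stamp, name, job_id, event in ROW_RE.findall(status_text):
        if name == job and event == "Failed":
            when = datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")
            runs.append((int(job_id), when))
    return runs


sample = """\
06/25 11:49:57  SnapshotDelete[166666]     Succeeded (MEDIUM)
06/25 15:06:37  FlexProtect[166667]        Failed
06/25 18:13:09  FlexProtect[166669]        Failed
06/25 21:23:32  FlexProtect[166670]        Failed
"""

runs = failed_runs(sample)
# Time between consecutive failures.
gaps = [later[1] - earlier[1] for earlier, later in zip(runs, runs[1:])]
for (job_id, when), gap in zip(runs[1:], gaps):
    print(job_id, when, "gap since previous failure:", gap)
```

Comparing the gaps with the run time reported under "Failed jobs" shows whether each retry is failing at about the same point.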

Jean-Didier stefaniak

Jun 26, 2018, 6:54:18 PM
to Isilon Technical User Group
Can you confirm that you unboxed the added node yourself?

Siva SRK

Jun 26, 2018, 9:38:24 PM
to isilon-u...@googlegroups.com
Hi Jean,

The node is one of our decommissioned nodes (a spare), not brand new. When we powered it on, it came up at the wizard, ready to join a cluster, so I added it to the cluster.
After realizing the problem, I tried to remove the node from the cluster with SmartFail, since it had already been added. Now FlexProtect keeps failing.
Can you help me if you have any information?
Thanks,
SRK.

--
You received this message because you are subscribed to the Google Groups "Isilon Technical User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isilon-user-group+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jean-Didier Stefaniak

Jun 27, 2018, 5:01:52 AM
to isilon-u...@googlegroups.com
Okay, so it sounds like the node still held some data and membership information from the cluster it was previously part of.

It should have been formatted/reimaged prior to adding it to another cluster.

You need to open a case with Isilon Support, as this is quite a delicate situation and must be dealt with in a very controlled fashion, which cannot be achieved over informal emails.

Kind Regards,
JD

--
You received this message because you are subscribed to a topic in the Google Groups "Isilon Technical User Group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/isilon-user-group/Kf9BSkLxShk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to isilon-user-gr...@googlegroups.com.

Siva SRK

Jun 27, 2018, 10:43:03 AM
to isilon-u...@googlegroups.com
Hi Jean/All,

Thanks for the information. I am now seeing messages such as "The coordinator is not connected to all nodes."
Have you run into this before? Please let me know.


 isi job status -v

The coordinator is not connected to all nodes.
Unconnected nodes: 1, 2, 3, 5, 7, 9, 11, 12, 13, 14, 16, 17, 18, 19, 21, 22

No running jobs.

Paused and waiting jobs:
Job                        Impact Pri Policy     Phase Run Time   State
-------------------------- ------ --- ---------- ----- ---------- -------------
FlexProtect[166679]        Medium 1   MEDIUM     1/4   0:00:00    Waiting
        Progress: n/a
SnapshotDelete[166668]     Medium 2   MEDIUM     1/1   0:00:00    Waiting
        Progress: n/a

Failed jobs:
Job                        Errors Run Time   End Time        Retries Left
-------------------------- ------ ---------- --------------- ------------
FlexProtect[166678]        1      3:08:49    06/26 22:34:41  3
        Progress: Processed 68833 lins; 0 zombies and 0 errors
        06/26 22:32:31 Node 1: 1:5086:a1b2: Input/output error
MultiScan[165607]          1      9:23:17    06/15 02:09:11  0
        Progress: Collect: 15307683 LINs, 0 errors; AutoBalance: 15307683 LINs,
        0 errors; 1 error total
        06/15 02:09:03 Node 6: Invalid argument
MediaScan[105329]          1      21:54:18   01/11 15:11:35  0
        Progress: Started
        01/11 15:11:35 Node 4: Devid 9 left group after MediaScan[105329]
        started; job must be retried: Operation canceled

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
06/26 00:32:44  FlexProtect[166671]        Failed
06/26 03:40:15  FlexProtect[166672]        Failed
06/26 06:49:59  FlexProtect[166673]        Failed
06/26 09:58:46  FlexProtect[166674]        Failed
06/26 13:10:58  FlexProtect[166675]        Failed
06/26 16:18:05  FlexProtect[166676]        Failed
06/26 19:24:00  FlexProtect[166677]        Failed
06/26 22:34:41  FlexProtect[166678]        Failed


Jean-Didier Stefaniak

Jun 27, 2018, 2:14:14 PM
to isilon-u...@googlegroups.com
Siva,

I do not know how to make it any clearer to you: you *must* sort out that node in your cluster before anything else even remotely related to the FS can be looked into.
Sorting out your cluster with that 'rogue' node in it requires precise steps, so as *not* to induce any kind of [silent] corruption in your OneFS file system, for which you and your users would pay dearly later. Act now by opening a case with Isilon Support. Do not delay.
