Gary A. Pope
B.Bus(ACC)
DIRECTOR
Alchester Business Systems
m:0408-994799
anytime.
p: 03-97626293 f: 03-97626293
e: g...@alchester.com.au
Why us? alchester.com.au/whyus.html
"We take care of
everything!"
Now take the RAID-1 catchup scenario, where a similar flaw exists. In a RAID-1 mirror, when one disk fails, the failed disk is replaced and automated CATCHUP begins (if it is a H/W RAID). During that critical time (which could be hours for a 1 TB disk), the initial phase of catchup will already have copied the boot blocks, and probably most of the system, to the target disk. Remember, you've just had a disk failure for reasons unknown. If the CAUSE of the failure was in any way motherboard/controller/power/environment related, there remains a secondary risk of a consequential failure at this point. If catchup has NOT completed, and some incident causes the machine to REBOOT unattended while the surviving RAID-1 disk is suddenly offline, there is a chance the system could boot from the half-recovered replacement disk. The end result is that the server could go live with users unknowingly running on half their data.
So the scenario becomes more contrived and less likely, but it's still there. Once you move to RAID-5/6 the problem goes away (although RAID-5 has its own data integrity issues).
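As a rough illustration of guarding against going live mid-catchup, here is a minimal sketch that assumes Linux software RAID (md), not the H/W RAID case discussed above, where the controller vendor's own tools would be needed instead. It reads text in the /proc/mdstat layout and reports any array that is still rebuilding or is missing a member. The function names are made up for this example, and the check only runs once the system is already up, so it can hold services back but cannot stop the boot itself.

```python
import re

def array_problems(mdstat_text):
    """Map each md array name to a list of problems found in mdstat-style text."""
    problems = {}
    current = None
    for line in mdstat_text.splitlines():
        # A line such as "md0 : active raid1 sdb1[2] sda1[0]" starts a new array.
        start = re.match(r"^(md\d+)\s*:", line)
        if start:
            current = start.group(1)
            continue
        if current is None:
            continue
        # "[U_]" means one member is missing or not yet in sync.
        status = re.search(r"\[([U_]+)\]", line)
        if status and "_" in status.group(1):
            problems.setdefault(current, []).append("degraded")
        # "recovery = 12.6%" or "resync = 40.0%" means catchup is still running.
        rebuild = re.search(r"(recovery|resync)\s*=\s*([\d.]+)%", line)
        if rebuild:
            problems.setdefault(current, []).append(
                "%s at %s%%" % (rebuild.group(1), rebuild.group(2)))
    return problems

def safe_to_go_live(mdstat_text):
    """True only if no array is degraded or mid-catchup."""
    return not array_problems(mdstat_text)

if __name__ == "__main__":
    # Assumption: run on a Linux box that uses md software RAID.
    with open("/proc/mdstat") as handle:
        found = array_problems(handle.read())
    for name, issues in found.items():
        print(name, ", ".join(issues))
    raise SystemExit(1 if found else 0)
```

A boot-time script could call this before starting user-facing services and refuse to continue on a non-zero exit, so an unattended reboot during catchup leaves the server up but not live.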
Cheers,
Darren
----- Original Message -----
From: Darren
Sent: Thursday, July 26, 2012 9:03 PM
Subject: [MLUG] Re: disk raid, clone or simple 24-hr copying can have consequential boot issues during unattended hardware failures. Design flaw.
Now take the RAID-1 catchup scenario, where a similar flaw exists. In a RAID-1 mirror, when one disk fails, the failed disk is replaced and automated CATCHUP begins (if it is a H/W RAID). During that critical time (which could be hours for a 1 TB disk), the initial phase of catchup will already have copied the boot blocks, and probably most of the system, to the target disk. Remember, you've just had a disk failure for reasons unknown. If the CAUSE of the failure was in any way motherboard/controller/power/environment related, there remains a secondary risk of a consequential failure at this point. If catchup has NOT completed, and some incident causes the machine to REBOOT unattended while the surviving RAID-1 disk is suddenly offline, there is a chance the system could boot from the half-recovered replacement disk. The end result is that the server could go live with users unknowingly running on half their data.