You can configure a free disk as a hot spare for all RAID groups within a single enclosure (NAS or expansion unit). Under normal circumstances, the enclosure spare disk is unused and does not store any data. When a disk in any RAID group fails, the spare disk automatically replaces the faulty disk.
I quickly got a new disk. The old disk was a WD Red 6TB WD60EFRX, the new one is a WD Red 6TB WD60AFEX. I hot-swapped the disks. According to the documentation, the new disk should be detected automatically, and the storage pool should automatically start rebuilding ("Rebuilding" state). But nothing happened.
I checked the Storage & Snapshots tool in the UI. The storage pool was still in a degraded state, but all four disks were now shown as green and healthy. However, disk 3 was listed as "not a member" of the storage pool. When I opened Manage for the pool, I could do nothing: the only action that was not disabled was "Rebuild RAID Group", but when I tried it there were no free disks to add to the RAID group.
So the problem appeared to be that disk 3 had been detected and was in use, yet it was still listed as "not a member" of the storage pool. No actions were available in the UI to fix the situation. Pulling the disk out and reinserting it did not change anything. Googling for help showed that others have encountered similar situations, but none of the suggested solutions helped me.
For some reason the NAS did not correctly add the /dev/sdc3 disk partition to the storage pool. The disk had been correctly partitioned and the partitions formatted, and the other RAID arrays had apparently recovered, but not /dev/md1. Adding /dev/sdc3 manually to /dev/md1 fixed the problem.
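The manual fix above can be sketched as a short SSH session. The device names /dev/md1 and /dev/sdc3 are the ones from this particular unit; on another NAS, verify your own array and partition names with the status commands before running the add.

```shell
# Check which RAID arrays exist and their state; the degraded array
# (/dev/md1 here) should show a missing member.
cat /proc/mdstat
mdadm --detail /dev/md1

# Confirm the replacement disk's data partition (/dev/sdc3 here) is
# partitioned but not yet part of the array.
mdadm --examine /dev/sdc3

# Add the partition back into the degraded array; the rebuild then
# starts automatically.
mdadm --manage /dev/md1 --add /dev/sdc3

# Watch the rebuild progress.
cat /proc/mdstat
```

These commands modify the live array, so they should only be run once the situation matches the one described above (array degraded, replacement partition present but unused).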
One more thing: it looks like /etc/config/mdadm.conf and /etc/config/raidtab are missing. /etc/mdadm.conf and /etc/raidtab existed as symbolic links pointing to these non-existent files. I'm not sure they are needed, but as a precaution I created them. mdadm.conf is created like this:
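The exact command is not shown here, but the standard way to regenerate mdadm.conf from the currently assembled arrays is a sketch like this (the /etc/config path follows the symlink targets described above; treat it as an assumption about this QNAP model):

```shell
# Write a configuration line for each currently assembled array
# to the file that /etc/mdadm.conf points at.
mdadm --detail --scan > /etc/config/mdadm.conf
```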
I hit the same problem with my RAID 1 (two SSDs). When I restarted my TS-551, one SSD failed. The light was on and the system could see the disk, but it raised a "not a member" error when I tried to rebuild the RAID 1, and I could not erase it either. Then I followed these instructions and it worked!
Thanks for your blog!!!
I had all the same issues as described in the post, on a QNAP 431P-1. One of the disks (disk 2) had failed, so the storage pool was in a "Degraded" state. Disk 2 should have been OK; its SMART info was all fine. I ran a scan for bad blocks overnight; it came back OK and the disk was green. However, the RAID would not rebuild, and there were no free disks to add to the RAID group. I would have followed the instructions here, but I was unable to enable SSH in step 3 below (unable to get a browser connection to enable SSH). By the time I followed steps 4 through 6, it had started rebuilding.
QNAP support helped get my degraded RAID 5 array back online, but didn't fix the replacement disk showing as not a member.
Your information has been excellent, allowing me to add the disk back in as described.
Sincerely grateful for the clear description. All the best, Damian
I had the same issue with the RAID rebuilding. My disk 3 failed and I swapped it with a new drive, but the rebuild is not happening. Now my second disk is already in a warning condition, so I am trying hard to get this RAID rebuilt with the new disk 3.
I encountered the same problem as described above and executed all the steps indicated in the SSH window. All results are the same, apart from the fact that it concerns my sda disk instead of sdc as in the example. When trying to add the sda disk, I am getting the following error message:
mdadm --manage /dev/md1 --add /dev/sda3
mdadm: add new device failed for /dev/sda3 as 4: Invalid argument
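One possible cause of "Invalid argument" on an add is stale RAID metadata left on the replacement partition from a previous array. A hedged diagnostic sketch (this is an assumption about the cause, not a confirmed fix; --zero-superblock destroys any old RAID metadata on that partition, so examine the output first):

```shell
# Compare the partition's metadata and size against what the array expects.
mdadm --examine /dev/sda3
mdadm --detail /dev/md1

# If a superblock from an old array is present on the partition,
# clear it and retry the add.
mdadm --zero-superblock /dev/sda3
mdadm --manage /dev/md1 --add /dev/sda3
```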
Thx for the tip but I could never get the cap to mould the way I wanted.
I did it by using a straw with sticky tape around it to reinforce it; took off the faceplate to expose the locking arm; then juggled the straw around and pushed the arm in when it was unlocked.
Removing the cover voids your warranty tho:(
Totally saved me! I felt kind of strange sitting in my car at the Office Depot parking lot with a pack of multicolor Bic pens in my lap, a lighter in one hand, and an allen wrench with a purple pen cap on the end trying to melt it without making a mess. The lady that pulled up next to me looked over and gave a look of disgust. Ask me if I care. No sir! I was able to unlock the tray of a clients failed disk and replace it before they lost another disk! Thank you!
With the prevalence of cloud computing and online services, enterprises rely heavily on data centers to serve users. Uninterrupted and reliable data storage has become indispensable for providing stability and continuity of services. However, disk failure can occur without warning, which is why Preventive Maintenance and Predictive Maintenance are indispensable tools for every data center.
Preventive Maintenance guards against disk failure through regular checks and replacement of drives based on age or usage. The main challenge with this approach is the waste of valuable resources: apart from the time spent on these maintenance activities, a replaced drive could often have worked effectively for much longer. There is also a chance of a drive failing before its scheduled replacement.
On the other hand, predictive maintenance is a more individualized approach that maximizes the effectiveness of the drive. It predicts disk failure based on historical data of millions of data points collected over years. When a pre-failure pattern is detected by the AI algorithm, it is flagged as an unhealthy or at-risk drive. Data centers can immediately replace the drive. This is the idea behind the ULINK DA Drive Analyzer.
Predictive Maintenance takes advantage of advanced analytics and machine learning to increase reliability and reduce costly outages. It can prevent drive failures that would not be caught in time by common preventive maintenance schedules such as annual drive replacement.
Both Preventive Maintenance and Predictive Maintenance have their merits. The best way forward is to use both approaches, to minimize failures as much as possible while using resources effectively.
I spent some time online, and there are a few posts on one website where someone suggests that a firmware update can cause the issue. The suggestion was to go back to a previous version, which I see no means to do. I do not understand the fan error, as the fan seems to be whirring away quite happily.
I may be wrong, but I also understand that one cannot take the usable HD from the old NAS and just plug it into a new NAS. That means if I choose to buy a new NAS, or a NAS is beyond repair, one has to buy new and HAVE BACKUPS (which I have).
In my case the /tmp directory was getting filled up. I discovered that QNAP implements /tmp as a ramdisk of only 64 MB. An application using this temp space was filling it up, which caused the temperature and fan monitoring software to fail, giving the warning that the fan is not spinning when in fact it is.
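The ramdisk situation described above is easy to check over SSH; a quick sketch (the tmpfs size and exact mount point may differ per model and firmware):

```shell
# Show the size and current usage of /tmp; a full tmpfs here can
# break the fan/temperature monitoring described above.
df -h /tmp

# List the ten largest files and directories consuming the space.
du -ah /tmp 2>/dev/null | sort -rh | head -n 10
```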
Have you tried removing the failed HD and seeing if it works with the one remaining drive? I had one disk fail and the NAS still worked just fine; I put a replacement in and the new disk was rebuilt. This will only be possible if they are set up as RAID 1 mirrored drives. I am not sure that, if you had the backplane renewed, the existing disks would still work and retain your files, as the serial number of the NAS may have changed and it would see them as new disks. It may be better to cut your losses and get a replacement. Good that you have backups.
Thanks very much for quick & informative reply.
I cannot access the section which allows me to alter the /tmp size (i.e. autorun); not sure if there is any other way to access it?
I will try to see if I can download older firmware, but I am pretty sure there is something fundamentally wrong with the unit, losing faith in it.
Thanks for reply - I have tried to boot up with just one (working) disk installed - no luck
I bought a new disk and tried to rebuild; it is set up as RAID 1. Again it would not have it: it does not recognise a 2nd disk, old or new.
I think I will probably just cut my losses
Emphasises the value of backups!!!
I was just looking at a 4-disk RAID 5 solution, but wondered whether, using QNAP's HBS Sync backup, one would just get copies to an external drive as I do now (i.e. files in folders that one can copy/edit), or whether they are stored in some QNAP-specific format that one cannot access in Windows.
It looks as though you have plenty to go on here but I would recommend posting the issue on the Qnap forums to see if there are other solutions.
It seems too much of a coincidence that replacing a HD and a hardware failure happen at the same time.
Thanks for the reply - that is exactly what I now think: the fan/circuit board has failed and therefore the system does not work. It is just tough to decide the best course of action. It is not easy getting questions answered; not sure where one learns more about NAS operation?
Based on Kommisar's post we gave it a shot today and it looks to be working. Disk 3 is on, and raid is rebuilding. If you are going to attempt this on your own boards, read and understand Kommisar's post so you can trace your own board. Verify your problem is identical before trying this solution. Here is what was needed for my specific scenario (disk 3 failure, eventually leading to no power).
That IC contains two MOSFETs. Tracing the pins, it looks like one feeds the 12V SATA pins and the other feeds the 5V SATA pins (at least for bay 3; other bays may need a different fix). Shorting the source and drain of each will bypass the MOSFET and supply power permanently to the SATA ports. One or both could be faulty; in my case it looks like only the 5V fix was needed, but we did both. You could also just replace the IC with another 4957AGM, although it may still fail again (something caused it to fail originally).