OK, there is something I still need to mention, plus some remarks.
This is about a DNS-320L with 256 MB of memory.
- First, I rebooted the box and immediately saved the system log; see log "1. Mon Feb 16 20:24:15."
There are a few __nand errors in it:
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error<3>end_request: I/O error, dev mtdblock0, sector 0
Feb 16 20:21:19 terra user.info kernel: md: bind<sda2>
Feb 16 20:21:19 terra user.info kernel: Adding 524284k swap on /dev/sdb1. Priority:1 extents:1 across:524284k
Feb 16 20:21:19 terra user.notice root: Starting sslcert: OK.
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error<3>end_request: I/O error, dev mtdblock0, sector 0
The Alt-F folder is not there, and I see this line:
Feb 16 20:21:19 terra user.info kernel: NAND device: Manufacturer ID: 0xad, Chip ID: 0xf1 (Hynix H27U1G8F2BTR-BC), 128MiB, page size: 2048, OOB size: 64
which surprises me, as the box should have 256 MB of memory.
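Since the number of these kernel messages is what gets compared from one reboot to the next, a small helper can count them in a saved copy of the system log. This is only a sketch: the function name and the log file name in the usage comment are hypothetical, and it assumes the message text is exactly as it appears in the log excerpt above.

```shell
#!/bin/sh
# Count the log lines that report an uncorrectable NAND ECC error.
# Usage (file name is a placeholder): count_nand_errors saved-system-log.txt
count_nand_errors() {
    # grep -c prints the number of matching lines (0 when there are none).
    # grep exits non-zero on zero matches, so "|| true" keeps callers
    # that run under "set -e" from aborting.
    grep -c '__nand_correct_data: uncorrectable ECC error' "$1" || true
}
```

Note that this counts lines, not messages: a line where a second kernel message was appended without a newline (the `<3>end_request` case above) still counts once.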
On Monday, February 16, 2015 at 8:25:27 PM UTC, Erik J wrote:
...
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error<3>end_request: I/O error, dev mtdblock0, sector 0
Please perform the tests referred to in this post.
On Monday, February 16, 2015, 22:49:51 (UTC+1), João Cardoso wrote:
...
Please perform the tests referred to in this post.
See attached file.
Looks quite ok to me in general.
Feb 16 20:21:19 terra user.err kernel: __nand_correct_data: uncorrectable ECC error<3>end_request: I/O error, dev mtdblock0, sector 0
[root@DNS-325]# nanddump -f /tmp/mtd0-dump /dev/mtd0
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00100000...
See attached file; yes, lots of "ECC: 8 uncorrectable bitflip(s) at offset xxx" errors.
It looks terribly bad... :((
Yes... I don't have any advice for you. Your options are:
- leave it as is, as you can still boot the box, and hope that no further errors develop
- flash the D-Link fw back...
- reflash u-boot. But for that you would need a known-good u-boot image from an identical box.
OK, so what happens if I:
- leave it as it is: not really an option, I think; the Alt-F system is deteriorating. After the last reboot there was no ssh again, Transmission lost its settings, and no Alt-F folder is to be found. What will be next? Is it not possible to move the mtd0 location to an error-free zone?
- flash the D-Link firmware back, which can be done through the Alt-F interface if I remember well: does D-Link also use that part of the memory, so that I will have problems again?
Then I can try to claim warranty. But memory problems are difficult to prove...
I prefer option 3, the most difficult one (I am that way), so a request:
- Can anybody help me with a u-boot image for a DNS-320L rev A3?
João, do you have a link at hand on how to make that image and how to flash it?
nanddump -f filename /dev/mtd0 # dumps all mtd0 contents to file named filename
nanddump -l size -f filename /dev/mtd0 # dumps size bytes of mtd0 to file named filename
[root@DNS-325]# nanddump -f dns-325-A1-mtd0-dump.bin /dev/mtd0
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00100000...
[root@DNS-325]# ls -l dns-325-A1-mtd0-dump.bin
-rw-r--r-- 1 root root 1048576 Feb 18 16:25 dns-325-A1-mtd0-dump.bin
[root@dns-320l]# nanddump -f dns-320l-A1-mtd0-dump.bin /dev/mtd0
ECC failed: 1024
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x00100000...
ECC: 8 uncorrectable bitflip(s) at offset 0x000a0000
ECC: 8 uncorrectable bitflip(s) at offset 0x000a0800
...
ECC: 8 uncorrectable bitflip(s) at offset 0x000bf000
ECC: 8 uncorrectable bitflip(s) at offset 0x000bf800
[root@dns-320l]# l dns-320l-A1-mtd0-dump.bin
-rw-r--r-- 1 root root 1048576 Feb 18 16:28 dns-320l-A1-mtd0-dump.bin
ECC: 8 uncorrectable bitflip(s) at offset 0x00000000
[root@dns-320l]# nanddump -l 524272 -f dns-320l-A1-mtd0-dump.bin /dev/mtd0
ECC failed: 0
ECC corrected: 0
Number of bad blocks: 0
Number of bbt blocks: 0
Block size 131072, page size 2048, OOB size 64
Dumping data starting at 0x00000000 and ending at 0x0007fff0...
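To relate the numbers in these dumps to each other, it can help to convert between byte offsets, erase block indexes, and hex sizes. The sketch below assumes the 131072-byte block size that nanddump reports; the function names are made up for illustration. With it, the first and last failing offsets listed for the full DNS-320L dump (0x000a0000 and 0x000bf800) both map to erase block 5, and 524272 bytes prints as 0x0007fff0, the end address of the shorter dump.

```shell
#!/bin/sh
# Map a byte offset inside an MTD partition to its erase block index.
# $1: offset (decimal or 0x-prefixed hex; a leading 0 would mean octal)
# $2: erase block size in bytes (defaults to 131072, as reported above)
offset_to_block() {
    echo $(( $1 / ${2:-131072} ))
}

# Print a decimal byte count as an 8-digit hex value, in the same form
# nanddump uses in its "Dumping data starting at ... ending at ..." line.
to_hex() {
    printf '0x%08x\n' "$1"
}

# Example calls:
#   offset_to_block 0x000a0000
#   offset_to_block 0x000bf800
#   to_hex 524272
```

A dump limited to 524272 bytes stops inside erase block 3, so it never reads block 5, which would be consistent with that shorter dump reporting no ECC failures.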
OK, so what happens if I:
- leave it as it is: not really an option, I think; the Alt-F system is deteriorating. After the last reboot there was no ssh again, Transmission lost its settings, and no Alt-F folder is to be found. What will be next? Is it not possible to move the mtd0 location to an error-free zone?
No. The bootloader has to reside in a fixed memory location: that is where the processor starts executing program instructions at power-up. Also, NAND chips have a special robust area intended specifically for the bootloader.
- flash the D-Link firmware back, which can be done through the Alt-F interface if I remember well: does D-Link also use that part of the memory, so that I will have problems again?
Probably yes. All firmware, be it D-Link, Alt-F, or whatever, needs the bootloader to start. The bootloader is the first program that starts on a computer; then the bootloader starts an operating system. That is what happens on all computers, be it a PC, a Mac, the DNS, a toaster...
Then I can try to claim warranty. But memory problems are difficult to prove...
But D-Link has a back door on all its NAS boxes, the funplug script, which allows running ffp and other programs. That is something we all must thank D-Link for. So, if you flash the D-Link fw back and install ffp, and if mtd-utils is available for ffp, then you have an argument to claim warranty... if you want to go that way.
I prefer option 3, the most difficult one (I am that way), so a request:
Much more difficult, and subject to errors and incompatibilities. Touching u-boot is out of my comfort zone. Notice that I didn't even recommend running nandtest on it...
- Can anybody help me with a u-boot image for a DNS-320L rev A3?
I can supply mine for a DNS-320L-A1. I have reasons to believe that all rev-Ax boards are identical. But having one for the rev-A3 is safer.
João, do you have a link at hand on how to make that image and how to flash it?
I have reached my level of competence on this subject. I have a good understanding of it, but the gory details can make the difference.
Summarizing and concluding: I think that your box is dying, but I can't recommend doing something that I'm not completely sure of (and I have not yet addressed the u-boot NAND writing procedure). If I had similar issues with my box I would research this for a few more days and would eventually flash u-boot. Notice that flashing u-boot from within u-boot itself poses no problems for me, but that requires a serial adapter, which you don't have.
Good luck!
On Wednesday, February 18, 2015, 19:07:35 (UTC+1), João Cardoso wrote:
From the u-boot start message, it looks like its size is 524272 bytes, and if I only dump that amount I get no errors.
Quite out of my range, yes. But I cannot see why it "looks like its size is 524272 bytes".
On Thursday, February 19, 2015 at 6:43:52 AM UTC, Erik J wrote:
On Wednesday, February 18, 2015, 19:07:35 (UTC+1), João Cardoso wrote:
See attached file; yes, lots of "ECC: 8 uncorrectable bitflip(s) at offset xxx" errors.
It looks terribly bad... :((
Yes, but what is odd is that the other flash partitions read and test without any issues. Only mtd0, the u-boot partition, shows those errors.
And I doubt that they are really errors, because if they were, the system wouldn't even boot (u-boot, the bootloader, would have errors in it and wouldn't execute correctly).
I have done some further research (I just can't let it go, that's why I can't be happy :-) and found, e.g., this post. I have checked the value for the 320L in the Linux kernel version that Alt-F is using, and it is 40, big enough. But some other hardware uses 25, 30, or 35 (25 is the general value for the Kirkwood SoC that the box is using). And the value used (40) applies to the whole flash, not only to mtd0, so if it were a timing error it would affect all flash partitions. Conflicting information; another hypothesis is needed.
I also found that there are several ways to use ECC (Error Correction Code) on NANDs. It comes in several (incompatible) flavours and can be implemented in software or hardware.
I think that is the reason why mtd0 also shows errors on my system. I discovered that the errors appear only in a zone of the flash memory used by u-boot to save variables, and I think that the ECC algorithm used by u-boot is not identical to the one used by mtd-utils and the Linux kernel.
In your case, however, the errors are spread over the whole mtd0.
Can you please run the following commands and attach the generated mtd0-dump.bin, mtd0-dump.log, and mtd0-dump.hex files? It would also be helpful if other DNS-320L users could execute the commands and post (attaching) the files.
nanddump -f mtd0-dump.bin /dev/mtd0 2> mtd0-dump.log
nanddump -ocf mtd0-dump.hex /dev/mtd0
...
On Saturday, February 21, 2015, 19:40:00 (UTC+1), João Cardoso wrote:
...
With pleasure!
nanddump -f /tmp/mtd0-dump.bin /dev/mtd0 2> /tmp/mtd0-dump.log
nanddump -ocf /tmp/mtd0-dump.hex /dev/mtd0 2> /dev/null
mtdinfo -M /dev/mtd0 > /tmp/mtd0-info.txt
tar -czf /tmp/mtd0.tgz /tmp/mtd0-*
rm /tmp/mtd0-*
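A sanity check that could be slotted in before the `rm` step above: confirm that the binary dump has the size the partition is supposed to have, so that a truncated file is not sent around. This is a sketch only; the function name is hypothetical, and the default of 1048576 bytes is taken from the `ls -l` and nanddump output for mtd0 earlier in this thread.

```shell
#!/bin/sh
# Succeed only if the file has exactly the expected size.
# $1: path to the dump file
# $2: expected size in bytes (defaults to 1048576, the mtd0 size seen above)
dump_size_ok() {
    # wc -c may pad its output with spaces on some systems; strip them.
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "${2:-1048576}" ]
}

# Example call:
#   dump_size_ok /tmp/mtd0-dump.bin || echo "unexpected dump size" >&2
```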
After running the above commands I tried to shut down the box through the web interface, in order to take a picture of the memory; see attachment.
The shutdown procedure did not work, although the temperature was fine (remember the problem I mentioned a while ago). Even the front power button did not turn it off. The LEDs kept blinking at a pace of about 3 times a second. I unplugged the cord and took some pictures. Here is the memory... it looks cheap. (But what did I expect? It is a cheap box.)
Yes, cheap box, cheap disk holding frame, and cheap thermal design.
But according to your previous 'nandtest' results there were no issues in the area of the flash chip that holds the kernel and rootfs, so unless there is a RAM issue there is no explanation for those other issues.
In this post I'm addressing *only* the "__nand_correct_data: uncorrectable ECC error" subject, and that comes from the part of the flash chip holding the box's boot loader (u-boot).
Again, I think that your box is ill or dying. Remove the disks (and keep them safe) and flash the D-Link fw back...
Hello João,
Back again. I have the D-Link firmware running on my box again. (So slow, and the web interface has to go full screen on my netbook to show everything; it is a drag...)
The funplug is installed, but none of the repos has mtd-utils available, so I cannot run nanddump and check the state of the mtd0 memory partition. I know this is getting a little off-topic, but do you know of any way to run nanddump? Do I have to compile it?
Or can it be run stand-alone? I would just like to know if the errors are still there. Almost certainly yes, but it is also a way to claim warranty. I hope...
Thanks again, and don't burn out. Take it easy!
...
Do I have to compile it?
If it is not available as an ffp package, yes, you have to compile it...
You might also search for it under 'optware'; there are thousands of packages, but I'm not sure it exists for your box.
I have analyzed the last files you posted, and I can confirm that your mtd0 and mine are identical with the exception of 3 bytes.
These three bytes can cause no problems if they lie in an area of the bootloader code or data that is not used in the normal boot sequence. And that seems to be your case, as you can boot without issues.
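For anyone who wants to repeat this kind of comparison between two mtd0 dumps, one possible approach is a byte-wise `cmp`. This is a sketch, not necessarily how the comparison above was done; the function name and the file names in the usage comment are hypothetical, and it assumes a `cmp` that supports `-l` (the dumps can also be copied to a PC and compared there).

```shell
#!/bin/sh
# Count the bytes that differ between two dump files of equal size.
count_diff_bytes() {
    # cmp -l prints one line (offset and both byte values) per differing
    # byte. cmp exits non-zero when the files differ, which is expected
    # here; the pipeline's exit status comes from the last command.
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Example call (placeholder file names):
#   count_diff_bytes my-mtd0-dump.bin other-mtd0-dump.bin
```

Running `cmp -l` directly, without the count, also shows the offsets of the differing bytes, which tells which erase block they sit in.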
Warning: technical content below (read: dragons flying below)
On Sunday, March 8, 2015, 19:57:26 (UTC+1), João Cardoso wrote:
...
If it is not available as an ffp package, yes, you have to compile it...
But, thanks to "mijzelf" on the ffp forum, it has already been compiled for me. No time now, but I will dive into it soon.
http://downloads.zyxel.nas-central.org/Users/Mijzelf/FFP-Stick/packages/0.7/arm/testing/
I have analyzed the last files you posted, and I can confirm that your mtd0 and mine are identical with the exception of 3 bytes. These three bytes can cause no problems if they lie in an area of the bootloader code or data that is not used in the normal boot sequence. And that seems to be your case, as you can boot without issues.
Yes, it boots. But remember, the __nand errors are few just after boot and grow in number, until they reach a maximum, after the box has been running for a few minutes.
Would it be an infringement of privacy if I, or you, collected other owners of the DNS-320L from this forum and asked them to run the test as you "prescribed"?
I am really fed up with the D-Link firmware already; I just noticed that my entire USB drive was open and public on the FTP server. (Luckily it was only on a big intranet, guifi.net.)
Warning: technical content below (read: dragons flying below)
Very technical, but you surely made a few people happy with this explanation. (And just a thought: maybe there is a fault in the nanddump program, and it cannot handle this kind of ECC. As you said before:
I also found that there are several ways to use ECC (Error Correction Code) on NANDs. It comes in several (incompatible) flavours and can be implemented in software or hardware.)
[root@dns-320l]# mtdinfo -M /dev/mtd0
mtd0
Name: u-boot
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 8 (1048576 bytes, 1024.0 KiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 512 bytes
OOB size: 64 bytes
Character device major/minor: 90:0
Bad blocks are allowed: true
Device is writable: true
Eraseblock map:
0: 00000000 1: 00020000 2: 00040000 3: 00060000
4: 00080000 5: 000a0000 6: 000c0000 7: 000e0000
[root@dns-320l]# mtdinfo -M /dev/mtd3
mtd3
Name: image
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 800 (104857600 bytes, 100.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 512 bytes
OOB size: 64 bytes
Character device major/minor: 90:6
Bad blocks are allowed: true
Device is writable: true
Eraseblock map:
0: 00000000 1: 00020000 2: 00040000 3: 00060000
4: 00080000 5: 000a0000 6: 000c0000 7: 000e0000
8: 00100000 9: 00120000 10: 00140000 11: 00160000
...
124: 00f80000 125: 00fa0000 126: 00fc0000 127: 00fe0000
128: 01000000 BAD 129: 01020000 130: 01040000 131: 01060000
132: 01080000 133: 010a0000 134: 010c0000 135: 010e0000
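The eraseblock maps above can be checked mechanically. The sketch below assumes the `mtdinfo -M` output has been saved to a text file and that bad blocks are flagged with a standalone `BAD` token, as in the mtd3 map; both function names are hypothetical. The second helper does the ceiling division that says how many erase blocks an image needs: an image of 524272 bytes (the u-boot size estimated in this thread) would need 4 of the 8 blocks that mtd0 reports.

```shell
#!/bin/sh
# Count erase blocks flagged BAD in saved "mtdinfo -M" output.
# $1: path to a text file holding that output
count_bad_blocks() {
    # Put every whitespace-separated token on its own line, then count the
    # lines that are exactly "BAD". The "Bad blocks are allowed" header
    # does not match because the comparison is case-sensitive.
    tr -s ' \t' '\n\n' < "$1" | grep -c '^BAD$' || true
}

# Number of erase blocks needed to hold an image (ceiling division).
# $1: image size in bytes
# $2: erase block size in bytes (defaults to 131072, as reported above)
blocks_needed() {
    bs=${2:-131072}
    echo $(( ($1 + bs - 1) / bs ))
}

# Example calls (placeholder file name):
#   count_bad_blocks mtd3-info.txt
#   blocks_needed 524272
```

The difference between the partition's block count and `blocks_needed` gives a rough idea of how many bad blocks a write could skip before it would run out of room in the partition.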
On Monday, March 9, 2015 at 5:11:05 PM UTC, Erik J wrote:
Yes, it boots. But remember, the __nand errors are few just after boot and grow in number, until they reach a maximum, after the box has been running for a few minutes.
Yes, but by then the bootloader's role has already been accomplished, and it is no longer relevant once the kernel starts booting. What must be happening is that the Linux mtd driver checks the whole flash chip, searching for bad blocks, and the errors appear during that check.
Would it be an infringement of privacy if I, or you, collected other owners of the DNS-320L from this forum and asked them to run the test as you "prescribed"?
I don't think there is any problem, as there is no user-related data in that flash area, so feel free to ask for users' collaboration.
The only place where user data is stored on the DNS-320/325 is the mtd5 flash partition, which Alt-F and D-Link use to save "settings".
I am really fed up with the D-Link firmware already; I just noticed that my entire USB drive was open and public on the FTP server. (Luckily it was only on a big intranet, guifi.net.)
That's the result of doing things "automagically". It is easier for the user, who doesn't have to configure anything, but one never knows what the consequences are.
I try to avoid that kind of automagic, but I'm aware that under Alt-F at least the NFS server exports all filesystem mount points as shares when no user-defined share exists. It's a leftover from the ffp NFS server...
Warning: technical content below (read: dragons flying below)
Very technical, but you surely made a few people happy with this explanation. (And just a thought: maybe there is a fault in the nanddump program, and it cannot handle this kind of ECC.)
Possible, but not very probable, as it works for me.
The standard says that the ECC "algorithm" is specified in the flash chip itself and is retrievable through some specific commands. But users (read: board manufacturers) are not obliged to follow standards ;-)
That is another reason for me not to feel comfortable flashing the bootloader: when writing to the flash chip, bad blocks (which naturally develop) are detected, marked as bad, and skipped. Up to a point, where the whole erase block is marked as bad. While a data block has 2 KiB, an erase block has 128 KiB. When an erase block is marked as bad, the next erase block is used instead. If the flash "partition" is small, the newly used erase block may belong to the next "partition", ruining the system.
This is not very likely to happen, as the initial portion of a flash chip (where the bootloader typically lies) is more rugged and guaranteed by the chip manufacturer to be free of defects. But nonetheless...
My box has no bad erase blocks in mtd0, but has one in mtd3, as the mtdinfo eraseblock maps above show.
Hello again.
Just FYI, and maybe out of curiosity: attached are another bin and hex file, this time from a 320L-A2. No NAND problems as far as I can see.
This is the result of writing to 10 owners of a 320L box.
Greetings.
Hello.
It was me who started this thread, and I can quickly tell you that the box is still running. Only on weekends, though, so it is not having a hard working life.
None of the problems I had before have returned; who knows why. I don't.
So for now I am giving this problem a rest, as I no longer experience the old problems.
Thanks for following up!