Alt-F 0.1RC5 on DNS320L: Buffer I/O error on dev mtdblock0 (pls help!)

Daniele Strollo

Oct 25, 2016, 10:41:31 AM

to Alt-F

I have problems with Alt-F 0.1RC5 on a DNS320L rev A3.

I don't know how to solve this. Can someone please help me?

Thanks.

Daniele

Here are the logs:

> dmesg | grep error | grep -v "uncorrectable ECC error"

Buffer I/O error on dev mtdblock0, logical block 0, async page read

Buffer I/O error on dev mtdblock0, logical block 66, async page read

blk_update_request: I/O error, dev mtdblock0, sector 248

blk_update_request: I/O error, dev mtdblock0, sector 0

blk_update_request: I/O error, dev mtdblock0, sector 8

blk_update_request: I/O error, dev mtdblock0, sector 16

blk_update_request: I/O error, dev mtdblock0, sector 24

blk_update_request: I/O error, dev mtdblock0, sector 32

blk_update_request: I/O error, dev mtdblock0, sector 40

blk_update_request: I/O error, dev mtdblock0, sector 48

blk_update_request: I/O error, dev mtdblock0, sector 56

blk_update_request: I/O error, dev mtdblock0, sector 64

Buffer I/O error on dev mtdblock0, logical block 0, async page read

Buffer I/O error on dev mtdblock0, logical block 66, async page read

blk_update_request: I/O error, dev mtdblock0, sector 248

blk_update_request: I/O error, dev mtdblock0, sector 0

blk_update_request: I/O error, dev mtdblock0, sector 8

blk_update_request: I/O error, dev mtdblock0, sector 16

blk_update_request: I/O error, dev mtdblock0, sector 24

blk_update_request: I/O error, dev mtdblock0, sector 32

blk_update_request: I/O error, dev mtdblock0, sector 40

blk_update_request: I/O error, dev mtdblock0, sector 48

blk_update_request: I/O error, dev mtdblock0, sector 56

blk_update_request: I/O error, dev mtdblock0, sector 64

Buffer I/O error on dev mtdblock0, logical block 0, async page read

Buffer I/O error on dev mtdblock0, logical block 66, async page read

> nandtest /dev/mtd1 -k

ECC corrections: 0

ECC failures : 0

Bad blocks : 0

BBT blocks : 0

004e0000: checking...

Finished pass 1 successfully

> nandtest /dev/mtd5 -k

ECC corrections: 0

ECC failures : 1

Bad blocks : 0

BBT blocks : 0

004e0000: checking...

Finished pass 1 successfully

> nandtest /dev/mtd6 -k

ECC corrections: 5

ECC failures : 3

Bad blocks : 0

BBT blocks : 0

001e0000: checking...

Finished pass 1 successfully

João Cardoso

Oct 25, 2016, 11:32:16 AM

to Alt-F

On Tuesday, 25 October 2016 15:41:31 UTC+1, Daniele Strollo wrote:

I have problems with Alt-F 0.1RC5 on a DNS320L rev A3.
I don't know how to solve this. Can someone please help me?

There is a recent topic addressing this or a related issue; see https://groups.google.com/d/msg/alt-f/xBULQIdEax8/UMqZv0HsAgAJ

Thanks.
Daniele

Here are the logs:

> dmesg | grep error | grep -v "uncorrectable ECC error"

Do the errors appear before or after you run the nand* commands?

To show the errors in context, attach (not inline) the full dmesg output after a reboot.

In any case, there is nothing we can do.

mtd0/mtdblock0 is where the box bootloader resides. Errors on it mean that the box will not boot or, if the errors are sporadic at boot time, that it might only boot after several power-on/power-cut attempts.

mtd0/mtdblock0 is not used by Alt-F. The errors can be caused by the Linux kernel (errors related to mtd0), or by a filesystem such as JFFS2 or a filesystem utility such as blkid (errors related to mtdblock0) reading or scanning the flash chip to determine its bad block map or filesystem signature. A 'nanddump' on mtd0 also triggers errors on my system.
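
To see at a glance which flash block devices the kernel messages name, the log can be tallied per device. This is only a minimal sketch: the function name count_mtd_errors is hypothetical (it is not an Alt-F command), and it assumes a grep that supports the -o option.

```shell
# count_mtd_errors: read kernel log text on stdin and print how many
# "I/O error" lines mention each mtdblockN device.
# Hypothetical helper name, for illustration only.
count_mtd_errors() {
    grep 'I/O error' \
        | grep -o 'mtdblock[0-9][0-9]*' \
        | sort \
        | uniq -c
}
```

A possible use is `dmesg | count_mtd_errors`, which prints one count per mtdblock device seen in the error lines.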

Daniele Strollo

Oct 25, 2016, 12:00:21 PM

to al...@googlegroups.com

First of all, thanks for your support.

Now I have attached the full dmesg after a restart.

These errors appear at every reboot.

The system seems to work, since it boots completely with no problems, but these lines appear at every reboot.

Second question; I don't know if it is related...

I discovered weird behaviour with files shared over Samba.

I had some pictures that, when accessed from the laptop, seemed to be corrupted (displayed with strange bands of lines and corrupted colors).

After this I was a bit scared, so I copied all the files to an external HD via USB, and there the pictures were OK. So was it probably a sync problem between the two hard disks?

That's why I checked the log messages.

I hope I have explained the problem in an understandable manner :D

Daniele

dmesg.log

João Cardoso

Oct 25, 2016, 1:44:24 PM

to Alt-F

On Tuesday, 25 October 2016 17:00:21 UTC+1, Daniele Strollo wrote:

First of all, thanks for your support.

Now I have attached the full dmesg after a restart.

OK. I hope you have meanwhile read the other topic I referred to, regarding uncorrectable ECC errors.

These errors appear at every reboot.
The system seems to work, since it boots completely with no problems, but these lines appear at every reboot.

Besides what I said in the other topic regarding NAND flash chip bitflip errors, read perturbation, etc., it is also possible that those mtdblockX errors appear because some programs try to discover what is contained on all block devices (usually disk partitions), to determine whether they hold RAID setups, filesystems, etc.

Those programs consult a file named /proc/partitions, which lists all block devices known to the box. The NAND flash chip partitions also (legitimately) appear in that file, so those programs try to read the flash chip partitions as well as the disk partitions.

On a NAND flash chip with borderline specs, that might trigger those uncorrectable ECC errors.
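
To check which flash partitions such scanning programs would encounter, the mtdblock entries can be filtered out of /proc/partitions. This is a minimal sketch that assumes the usual four-column layout of that file (major, minor, #blocks, name); the function name list_mtd_partitions is hypothetical.

```shell
# list_mtd_partitions [FILE]: print the mtdblock entries found in a
# /proc/partitions-style file (default: /proc/partitions).
# Assumes the device name is the fourth whitespace-separated column.
# Hypothetical helper name, for illustration only.
list_mtd_partitions() {
    awk '$4 ~ /^mtdblock[0-9]+$/ { print $4 }' "${1:-/proc/partitions}"
}
```

Called without arguments, it reads the live /proc/partitions; every name it prints is a flash partition that a block-device scanner may try to read.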

Second question; I don't know if it is related...
I discovered weird behaviour with files shared over Samba.
I had some pictures that, when accessed from the laptop, seemed to be corrupted (displayed with strange bands of lines and corrupted colors).

You (everybody!) should read the "WARNING: Network data corruption on RC5" top-posted topic and apply the fix.

Daniele Strollo

Oct 26, 2016, 4:13:56 AM

to Alt-F

Besides what I said in the other topic regarding NAND flash chip bitflip errors, read perturbation, etc., it is also possible that those mtdblockX errors appear because some programs try to discover what is contained on all block devices (usually disk partitions), to determine whether they hold RAID setups, filesystems, etc.
Those programs consult a file named /proc/partitions, which lists all block devices known to the box. The NAND flash chip partitions also (legitimately) appear in that file, so those programs try to read the flash chip partitions as well as the disk partitions.
On a NAND flash chip with borderline specs, that might trigger those uncorrectable ECC errors.

Is there a way to check this?

Second question; I don't know if it is related...
I discovered weird behaviour with files shared over Samba.
I had some pictures that, when accessed from the laptop, seemed to be corrupted (displayed with strange bands of lines and corrupted colors).

You (everybody!) should read the "WARNING: Network data corruption on RC5" top-posted topic and apply the fix.

Sorry, I didn't realize that it was related to my problem; I assumed the cause was a broken sync between the two disks in RAID1.

I have now applied the patch.

Thanks a lot again for your time and your patience.

Daniele

João Cardoso

Oct 26, 2016, 12:24:13 PM

to Alt-F

On Wednesday, 26 October 2016 09:13:56 UTC+1, Daniele Strollo wrote:

Besides what I said in the other topic regarding NAND flash chip bitflip errors, read perturbation, etc., it is also possible that those mtdblockX errors appear because some programs try to discover what is contained on all block devices (usually disk partitions), to determine whether they hold RAID setups, filesystems, etc.
Those programs consult a file named /proc/partitions, which lists all block devices known to the box. The NAND flash chip partitions also (legitimately) appear in that file, so those programs try to read the flash chip partitions as well as the disk partitions.
On a NAND flash chip with borderline specs, that might trigger those uncorrectable ECC errors.

Is there a way to check this?

I see two interpretations of your question:

1- Whether the NAND chip has marginal specs can only be inferred (outside an electronics lab) from the fact that some users report the errors while others don't, and that some users have issues in some NAND areas while other users report errors in other NAND areas.

The DNS-325, DNS-320 rev A and DNS-320L all have NAND chips, use the same Linux kernel and are handled identically by Alt-F, yet errors have only been reported on the 320L. Notice also that the 320 rev B board seems to be identical to the 320L rev A.

I own a 325 myself and see no errors; its NAND chip maker is Samsung, while for the 320L the maker is Hynix.

So all clues point to a NAND flash chip with marginal specs on the 320L.

2- My hypothesis that the errors are triggered by some programs scanning all block devices can be verified.

Issue the following commands

logger -t ME1 "XXXX" # add a marker entry to syslog; verify it with:
logread | tail
blkid  # run one of the commands that scan all block devices by reading /proc/partitions
logread | sed -n '/ME1/,$p' # check whether the errors appear after your syslog marker
logger -t ME2 "YYYY" # add another marker entry to syslog
mdadm --examine --scan --config=partitions # another command that scans all block devices
logread | sed -n '/ME2/,$p' # check for errors generated after the new marker

Additionally, when "settings" are first loaded at bootup, or when they are saved, cleared, etc., a full JFFS2 filesystem "check" is performed on mtd5, and that can also trigger the errors (in my case on mtd6!).

Note that the errors do not always appear the second or subsequent time that you issue the commands.
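
The marker technique used in the commands above can be wrapped in a small filter, which makes it easier to repeat the test several times. This is a minimal sketch: the function name errors_since_marker is hypothetical, and it assumes the marker contains no characters that are special to sed (such as "/").

```shell
# errors_since_marker MARKER: read syslog text on stdin and print the
# lines containing "error" (case-insensitive), starting from the first
# line that contains MARKER.
# Assumes MARKER has no sed-special characters such as "/".
# Hypothetical helper name, for illustration only.
errors_since_marker() {
    sed -n "/$1/,\$p" | grep -i 'error'
}
```

A possible sequence is `logger -t ME1 "XXXX"`, then `blkid`, then `logread | errors_since_marker ME1`; no output means no error was logged after the marker on that run.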

So there is nothing that we can do about it. The errors are erratic and seem to be harmless (which is very odd, and their being harmless puzzles me).

Daniele Strollo

Oct 27, 2016, 4:34:29 AM

to Alt-F

Yes, these commands produce the errors you described.

So it's not a real problem that I have to worry about?

Thanks again for the support!

The logs follow:

Oct 27 10:27:11 raid user.notice ME1: STEP1--