Error message - "miscompare on hdUnknown"


Nihir Parikh

Sep 12, 2013, 7:28:16 PM
to stressappt...@googlegroups.com
Hello guys,
I would like some information on the errors below, which we are seeing while running stressapptest with the following parameters.
 
command -

stressapptest -s 18000 -m 8 -C 8 -W  -d /dev/sda  -f /dev/sda1 --cc_test --listen -i 1 --stop_on_errors 2>&1>/tmp/stresstest.log &

errors -

Report Error: miscompare : hdUnknown : 1 : 20s

Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e408(0x0:DIMM Unknown): read:0x0b71076155555555, reread:0x0b71076155555555 expected:0x5555555555555555
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e410(0x0:DIMM Unknown): read:0xaaaaaaaa02e81ff5, reread:0xaaaaaaaa02e81ff5 expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e430(0x0:DIMM Unknown): read:0xaaaaaaaa52213de7, reread:0xaaaaaaaa52213de7 expected:0xaaaaaaaaaaaaaaaa
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e438(0x0:DIMM Unknown): read:0x5555555555575555, reread:0x5555555555575555 expected:0x5555555555555555
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e558(0x0:DIMM Unknown): read:0x5555555500000000, reread:0x5555555500000000 expected:0x5555555555555555
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e578(0x0:DIMM Unknown): read:0x000000000013969d, reread:0x000000000013969d expected:0x5555555555555555
Report Error: miscompare : hdUnknown : 1 : 20s
Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e590(0x0:DIMM Unknown): read:0xaaaaaaabaaaaaaaa, reread:0xaaaaaaabaaaaaaaa expected:0xaaaaaaaaaaaaaaaa
 
My question is: is there any way to tell definitively whether this is a memory error or a hard disk error? Also, is the hard disk test that this command runs sequential or random?
 
Thanks a lot in advance.
 
 

Nick Sanders

Sep 12, 2013, 11:13:39 PM
to stressappt...@googlegroups.com
Generally, the only way to be 100% sure which component is failing is to swap out the parts and see which one continues to fail in a new machine.

"-d" is random access, and "-f" is more sequential.

However, in this case the problem may be that you are using "-d" and "-f" on the same device. "-d" is actively using all of /dev/sda while "-f" is actively using /dev/sda1, so they are probably overwriting each other's data. If sda also holds your root partition, that would interfere with "-d" as well. Luckily "-d" defaults to non-destructive, but it does require exclusive access to the block device under test.

I'd recommend rerunning without "-d" to see whether the issue still reproduces; if it does not, your hardware is probably fine.
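
To see how each reported value differs from the expected pattern, one option is to XOR the "read" and "expected" fields of every miscompare line. The following is a minimal Python sketch, not part of stressapptest: the regular expression is modeled only on the log lines quoted in this thread, and the function name `diff_bits` is hypothetical.

```python
import re

# Pattern modeled on the "Hardware Error" lines quoted in this thread;
# other stressapptest versions may format the line differently (assumption).
MISCOMPARE = re.compile(
    r"miscompare on (?P<tag>\S+) at (?P<addr>0x[0-9a-fA-F]+)\(.*?\): "
    r"read:(?P<read>0x[0-9a-fA-F]+), reread:(?P<reread>0x[0-9a-fA-F]+) "
    r"expected:(?P<expected>0x[0-9a-fA-F]+)"
)


def diff_bits(line):
    """Return (xor_mask, differing_bit_count) for one miscompare line, or None."""
    m = MISCOMPARE.search(line)
    if m is None:
        return None
    mask = int(m.group("read"), 16) ^ int(m.group("expected"), 16)
    return mask, bin(mask).count("1")


# One of the lines from the log above.
sample = (
    "Hardware Error: miscompare on hdUnknown at 0x7f7d9d66e438(0x0:DIMM Unknown): "
    "read:0x5555555555575555, reread:0x5555555555575555 "
    "expected:0x5555555555555555"
)

result = diff_bits(sample)
if result is not None:
    mask, count = result
    print(f"xor=0x{mask:016x} bits_differing={count}")
```

In the log above, some lines differ from the expected pattern by a single bit while others have an entire 32-bit half replaced. The mask only describes the difference; on its own it does not identify the failing component.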


--
 
---
You received this message because you are subscribed to the Google Groups "stressapptest-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stressapptest-di...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Saurin S

Sep 13, 2013, 12:50:08 AM
to stressappt...@googlegroups.com
Hi Nick,

The command line passes -f /dev/sda1, which is the hard drive's first partition. Immediately after running this command I saw that it corrupts the partition, and even though the OS has mounted the partition it can no longer access it. I took a dump of the entire partition before and after running the test; starting from the very first block, all the data had changed after running stressapptest.

My thinking is that since the OS has mounted this partition, it is still accessing it while stressapptest is also accessing it. In that case, could this type of error occur?

According to the help description, should the -f parameter be given a file rather than an entire device? Could you please elaborate on that? Should we use a file instead of a device? And can that file be empty, or does it need to contain some data?

Thanks,
Saurin

Nick Sanders

Sep 13, 2013, 2:41:25 AM
to stressappt...@googlegroups.com
Yes, you can pass either a file or a block device to "-f". It is a read/write test that requires exclusive access to the file or partition under test, and it will wipe a partition passed to it. If you want to test a block device that is currently mounted, you would ideally use "-f /path/to/device/filename.1 -f /path/to/device/filename.2", which runs two file read/write threads on that device through the filesystem and avoids overwriting other data. The threads create the files themselves, although the path should already exist. Two threads put more load on the disk than one, and that seems to be the optimal number to keep the disk fully busy.

If your OS was on /dev/sda1 then that definitely explains the failure you were seeing. stressapptest was reading/writing the disk directly in two aliased ways while the OS was reading/writing the same aliased locations through the filesystem layer. 
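
As an illustration of the two-file approach described above, here is a hedged Python sketch that assembles the "-f" arguments and refuses to proceed if the target turns out to be a block device. The helper names and the `/home/username` directory are hypothetical; only the "-s", "-W" and "-f" flags come from this thread, and the command is printed rather than executed.

```python
import os
import stat


def is_block_device(path):
    """True if path exists and is a block device node such as /dev/sda1."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        return False


def file_test_args(directory, threads=2):
    """Build '-f' arguments for test files inside an existing directory."""
    args = []
    for i in range(1, threads + 1):
        args += ["-f", os.path.join(directory, f"filename.{i}")]
    return args


# Hypothetical directory on the mounted filesystem under test.
target_dir = "/home/username"
if is_block_device(target_dir):
    raise SystemExit(f"{target_dir} is a block device; -f would wipe it")

# 60-second run; the duration is an arbitrary value for illustration.
command = ["stressapptest", "-s", "60", "-W"] + file_test_args(target_dir)
print(" ".join(command))
```

Passing the resulting list to something like `subprocess.run` would start the test; that step is left out here so the sketch has no side effects.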



Nick Sanders

Sep 13, 2013, 2:48:52 AM
to stressappt...@googlegroups.com
To clarify, say you run df -h and get:

rootfs                                                       1008M  934M   74M  93% /
devtmpfs                                                      1.9G  520K  1.9G   1% /dev
tmp                                                           1.9G  2.5M  1.9G   1% /tmp
/dev/sda1                                                      25G   17G  7.1G  70% /home

You could for example use "/home/username/file.1"
as a file that would run the test on the drive at /dev/sda
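
The lookup from device to mount point shown with df above can also be scripted. This is a small Python sketch under stated assumptions: it parses text in the Linux /proc/mounts layout, the sample table is made up to mirror the df output (the filesystem types in particular are invented), and `mount_points` is a hypothetical helper name.

```python
def mount_points(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each entry is: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


# Sample text mirroring the df output above (fstype values are invented).
# On a live Linux system the real table is available in /proc/mounts.
sample_mounts = """\
rootfs / rootfs rw 0 0
devtmpfs /dev devtmpfs rw 0 0
tmp /tmp tmpfs rw 0 0
/dev/sda1 /home ext4 rw 0 0
"""

for point in mount_points("/dev/sda1", sample_mounts):
    # A file under this mount point exercises the drive through the filesystem.
    print(f"{point}/username/file.1")
```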

Nihir Parikh

Sep 13, 2013, 12:51:56 PM
to stressappt...@googlegroups.com
Nick,
Thanks for the clarification. We will rerun the test without the -d option to see whether the issue reproduces. One more question: even if we use -d and -f together without exclusive access, is there a chance that the test finishes without any errors?
 
Thanks a lot again.

Nick Sanders

Sep 13, 2013, 1:36:36 PM
to stressappt...@googlegroups.com
Both -d and -f require exclusive access, so I believe -d and -f together without exclusive access would be expected to fail. I've personally found that -d hasn't been particularly effective at discovering hardware faults, so I'd recommend just running -f.
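
A quick pre-flight check can catch the case from the original command, where "-d /dev/sda" and "-f /dev/sda1" point at the same disk. The Python sketch below is only a name-based heuristic with hypothetical helper names: it assumes sdX-style device names and does not detect a "-f" file that lives on a mounted filesystem of the same disk.

```python
import re


def parent_disk(device):
    """Strip a trailing partition number: /dev/sda1 -> /dev/sda.

    Assumes sdX-style names; schemes such as nvme0n1p1 need different handling.
    """
    return re.sub(r"\d+$", "", device)


def aliased(d_target, f_target):
    """True if a -d device and a -f device name resolve to the same disk."""
    return parent_disk(d_target) == parent_disk(f_target)


# The combination from the original command, then a non-overlapping pair.
print(aliased("/dev/sda", "/dev/sda1"))
print(aliased("/dev/sda", "/dev/sdb1"))
```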



Nihir Parikh

Sep 13, 2013, 8:44:30 PM
to stressappt...@googlegroups.com
Nick,
Appreciate your help and clarification.
 
Thanks

Vijay Kumar

Feb 27, 2021, 4:46:05 AM
to stressapptest-discuss
If you are booting through PXE, make sure the filename is different for each system.