How to use GSAT for finding Disk controller issues and ECC level issues?

anuj singh

Jul 5, 2018, 7:03:06 AM
to stressapptest-discuss
Hi 

I went through the GSAT documentation and found that GSAT claims to find the following types of issues:
·        catching memory signal integrity problems
·        memory controller and bus interface issues
·        disk controller issues
·        catching bad memory cells 
·        cache coherency issues
I ran a few tests, but the only error I saw was of type miscompare. Can someone explain how to find 'disk controller related' and 'ECC level' issues? It would also be helpful if some sample logs were shared.

Thanks
Anuj






Nick Sanders

Jul 5, 2018, 3:58:29 PM
to stressappt...@googlegroups.com
On Thu, Jul 5, 2018 at 4:03 AM anuj singh <anujsi...@gmail.com> wrote:
> I ran a few tests, but the only error I saw was of type miscompare. Can someone explain how to find 'disk controller related' and 'ECC level' issues? It would also be helpful if some sample logs were shared.

Disk controller related issues will usually show up as read/write failures, or data corruption. You'll need to do some custom diagnosis to figure this out as there's a wide variety of storage controller types with different failure mechanisms. 
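
For readers who want to exercise the storage path while keeping a log to inspect afterwards, a run can be scripted along these lines. This is a minimal sketch, not part of the reply above: it assumes the flags described in the stressapptest README (`-s` seconds, `-M` megabytes of RAM, `-f` to add a disk thread backed by a temp file, `-l` to write a log file), so check them against your build's usage output. The helper names and the file paths are hypothetical.

```python
import shutil
import subprocess


def build_sat_command(seconds, mbytes, disk_file, logfile, binary="stressapptest"):
    """Return an argv list for a run with one file-backed disk thread.

    Flag meanings are assumed from the stressapptest README; verify locally.
    """
    return [
        binary,
        "-s", str(seconds),  # run time in seconds
        "-M", str(mbytes),   # megabytes of RAM to test
        "-f", disk_file,     # add a disk thread using this temp file
        "-l", logfile,       # also write the log to this file
    ]


def run_sat(argv):
    """Run the command if the binary is on PATH; return its exit code, else None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode
```

Calling `run_sat(build_sat_command(60, 256, "/tmp/sat_disk", "/tmp/sat.log"))` would start a one-minute run. Placing the temp file on the device behind the controller under suspicion is what directs the I/O at that controller.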

ECC level failures should show up in system logs or elsewhere, wherever ECC errors would be reported on your system. Since this varies considerably between chipsets and reliability monitoring software, you'd need to check separately as part of your test procedure, or add a custom "os layer" module to stressapptest code to provide an interface into your particular system.
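
One way to do that separate check on Linux is to compare the kernel's EDAC counters before and after a run. This is a sketch under stated assumptions: an EDAC driver is loaded for your memory controller and exposes `ce_count` (corrected errors) and `ue_count` (uncorrected errors) under `/sys/devices/system/edac/mc/mc*/`. Other platforms report ECC events elsewhere, and the function names here are hypothetical.

```python
from pathlib import Path

# Assumed location of the Linux EDAC memory-controller counters.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {"mcN/ce_count": int, "mcN/ue_count": int, ...}; empty if EDAC is absent."""
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        for name in ("ce_count", "ue_count"):
            try:
                counts[f"{mc.name}/{name}"] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                continue  # counter missing or unreadable on this system
    return counts


def diff_counts(before, after):
    """Return only the counters that changed, mapped to their increase."""
    return {
        key: after[key] - before.get(key, 0)
        for key in after
        if after[key] != before.get(key, 0)
    }
```

Take one snapshot with `read_edac_counts()` before starting stressapptest and another after it finishes; a non-empty `diff_counts(before, after)` means ECC events were recorded during the run.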

anuj singh

Jul 6, 2018, 6:41:23 AM
to stressapptest-discuss
Hi Nick

Thanks for the response. 
So, if I understood correctly, stressapptest lets us generate the stress but does not provide any mechanism to analyze or classify the resulting errors.
The logs are also generic (corruption, miscompare, ...), so we need some other way to analyze the errors.

Nick Sanders

Jul 6, 2018, 2:08:56 PM
to stressappt...@googlegroups.com
On Fri, Jul 6, 2018 at 3:41 AM anuj singh <anujsi...@gmail.com> wrote:
> Hi Nick
>
> Thanks for the response.
> So, if I understood correctly, stressapptest lets us generate the stress but does not provide any mechanism to analyze or classify the resulting errors.
> The logs are also generic (corruption, miscompare, ...), so we need some other way to analyze the errors.

Yes, that's correct. Storage is too complex to provide a universal diagnosis/analysis. stressapptest is often useful, though, for reproducing bugs quickly in order to debug issues and verify potential fixes.
 

anuj singh

Jul 9, 2018, 1:56:03 AM
to stressapptest-discuss
Thanks for clarifying.

Regards
Anuj

Maqsood Thaned

Aug 1, 2018, 4:55:33 AM
to stressapptest-discuss
Hi Nick, can you tell me another way to find the stressapptest log file?

Nick Sanders

Aug 1, 2018, 7:26:08 PM
to stressappt...@googlegroups.com
dmesg and /var/log/messages will often have informative error messages for storage failures.
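
A small filter can help pick storage-related lines out of those logs. This is only a sketch: the regular expressions below are illustrative guesses at common kernel message shapes, not an authoritative list, and the real wording varies by driver and kernel version. The function names are hypothetical.

```python
import re

# Illustrative patterns only; extend them for your controller and driver.
STORAGE_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"\bata\d+(\.\d+)?: .*(error|exception|failed)", re.IGNORECASE),
    re.compile(r"\bnvme\w*.*(timeout|reset)", re.IGNORECASE),
    re.compile(r"medium error", re.IGNORECASE),
]


def find_storage_errors(lines):
    """Return the lines (newline stripped) that match any storage pattern."""
    hits = []
    for line in lines:
        if any(p.search(line) for p in STORAGE_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits


def scan_file(path):
    """Scan a saved log, e.g. the output of `dmesg > dmesg.txt`."""
    with open(path, errors="replace") as fh:
        return find_storage_errors(fh)
```

For example, save the kernel log with `dmesg > dmesg.txt` right after a stressapptest run and call `scan_file("dmesg.txt")`; the same function works on `/var/log/messages` if your user can read it.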
