power spike and then pause...

Tucker

Jun 13, 2013, 9:02:17 PM

to stressappt...@googlegroups.com

I've implemented a ramfs for my machine stress testing and have been trying to use stressapptest as the second stage in the execution. For some reason, when stressapptest hits the first power spike, it stalls. There's nothing in dmesg, messages or mcelog that indicates the machine saw a problem. Is there a way to disable the power spikes, to see if it'll complete without them? I thought about just making the time between spikes longer than the total execution time but haven't had a chance to try that yet. For what it's worth, this is currently what I'm doing for burn in:

* dd if=/dev/zero on all disks

* create GPT labels with parted on all disks

* create 100% partitions on all disks

* mkfs ext4 on all disks

* run stressapptest like so:

/usr/bin/stressapptest -s 7200 -M 42420.2 -m 24 -W -C 24 -l /var/log/stressapptest.log --destructive -f /mnt/sdb1/sat1 -f /mnt/sdc1/sat1 -f /mnt/sdd1/sat1 -f /mnt/sde1/sat1 -f /mnt/sdg1/sat1 -f /mnt/sdf1/sat1 -f /mnt/sdh1/sat1 -f /mnt/sda1/sat1
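For reference, the preparation steps in the list above can be sketched as a dry run. This is only an illustration: it prints commands rather than running them, and the device names, the dd block size, and the mount commands are assumptions that do not come from the original post.

```shell
#!/bin/sh
# Dry-run sketch of the disk preparation steps listed above.
# It only PRINTS the commands; nothing touches a real device.
# Device names, the dd block size, and the mount steps are assumptions.

prep_commands() {
    # $1 is a whole-disk device name such as "sdb" (hypothetical).
    disk="$1"
    printf '%s\n' "dd if=/dev/zero of=/dev/${disk} bs=1M"
    printf '%s\n' "parted -s /dev/${disk} mklabel gpt"
    printf '%s\n' "parted -s /dev/${disk} mkpart primary ext4 0% 100%"
    printf '%s\n' "mkfs.ext4 /dev/${disk}1"
    printf '%s\n' "mkdir -p /mnt/${disk}1"
    printf '%s\n' "mount /dev/${disk}1 /mnt/${disk}1"
}

# Print the plan for two example disks.
for d in sdb sdc; do
    prep_commands "$d"
done
```

Printing the plan first makes it easy to review exactly which devices would be wiped before anything destructive runs.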

The log ends with this:

Log: Seconds remaining: 6600

Log: Pausing worker threads in preparation for power spike (6600 seconds remaining)

Log: Seconds remaining: 6590

Log: Resuming worker threads to cause a power spike (6585 seconds remaining)

Beyond that, there is nothing.

--tucker

Nick Sanders

Jun 18, 2013, 4:52:39 AM

to stressappt...@googlegroups.com

You can set --pause_delay to a value greater than the total runtime (for example, --pause_delay 100000), which will prevent any power spike from happening. Sorry, the interface isn't great for this.
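A minimal sketch of that workaround, assuming the 7200-second runtime from the original command. In the log above, the first pause arrives 600 seconds into the run and the resume follows 15 seconds later, so any delay longer than the runtime should keep the first pause from ever being scheduled. The variable and function names are illustrative only, and the -f arguments are omitted for brevity.

```shell
#!/bin/sh
# Sketch: pick a --pause_delay longer than the run so that no
# power spike is scheduled. RUNTIME comes from "-s 7200" in the thread.

RUNTIME=7200                    # seconds of total test time
PAUSE_DELAY=$((RUNTIME + 1))    # any value above RUNTIME should do

build_cmd() {
    # Print the command instead of running it; the -f arguments from the
    # original invocation are left out here.
    printf '%s\n' "/usr/bin/stressapptest -s ${RUNTIME} --pause_delay ${PAUSE_DELAY} -M 42420 -m 24 -W -C 24 -l /var/log/stressapptest.log"
}

build_cmd
```

Deriving the delay from the runtime avoids having to revisit a hard-coded value if the test length changes.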

Couple notes about the commandline:

-M 42420.2: this is integer only, so the .2 will be ignored. Probably doesn't make much difference.

--destructive: this is for the "-d" option only and won't be used.

-f /mnt/sdb1/sat1: I'd recommend also trying two file threads per disk, you may see an increase in IO.
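One way to generate two file threads per disk, as a rough sketch. The second file name ("sat2") is a hypothetical choice; the mount points are taken from the original command.

```shell
#!/bin/sh
# Sketch: emit two -f arguments for every mount point given.
# "sat2" is a hypothetical name for the second file on each disk.

file_args() {
    args=""
    for mnt in "$@"; do
        args="${args} -f ${mnt}/sat1 -f ${mnt}/sat2"
    done
    # Drop the leading space before printing.
    printf '%s\n' "${args# }"
}

file_args /mnt/sdb1 /mnt/sdc1
```

The printed string can be appended to the rest of the stressapptest command line in place of the single -f per disk.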

If the test is just hung, it's also worth running gdb and getting a backtrace of each thread. I don't know offhand of any software bug that would cause this, but if there's one I'd like to fix it.
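As a sketch of how that per-thread backtrace could be collected, assuming gdb is installed and the caller is allowed to attach to the process. The helper name and the PID are placeholders.

```shell
#!/bin/sh
# Sketch: build the gdb command that attaches to a running process,
# dumps a backtrace of every thread, and exits.

backtrace_cmd() {
    # $1 is the PID of the hung stressapptest process.
    printf '%s\n' "gdb -p $1 -batch -ex 'thread apply all bt'"
}

# Placeholder PID; on a live system "$(pgrep -o stressapptest)" is one
# way to look it up.
backtrace_cmd 12345
```

Redirecting the output of the printed command to a file gives something that can be attached to a reply on the list.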

Thanks,

Nick

--

---
You received this message because you are subscribed to the Google Groups "stressapptest-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stressapptest-di...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Tucker

Jun 28, 2013, 7:19:27 PM

to stressappt...@googlegroups.com

Inline.

On Tuesday, June 18, 2013 1:52:39 AM UTC-7, Nick wrote:

You can set --pause_delay to a value greater than the total runtime (for example, --pause_delay 100000), which will prevent any power spike from happening. Sorry, the interface isn't great for this.

I thought about that but assumed I was just being dumb and that it was a silly idea. Thanks for confirming that this will work.

Couple notes about the commandline:

-M 42420.2: this is integer only, so the .2 will be ignored. Probably doesn't make much difference.

That's an artifact of the way the test command line is generated. I could force it to be an int but assumed it wouldn't make a difference.
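Forcing the value to an integer could look like the sketch below, which simply drops everything from the first decimal point onward. The helper name is hypothetical, and the result is truncated rather than rounded.

```shell
#!/bin/sh
# Sketch: truncate a decimal megabyte figure to a whole number
# before passing it to -M.

to_int() {
    # Remove the first "." and everything after it.
    printf '%s\n' "${1%%.*}"
}

MEM_MB=$(to_int 42420.2)
printf '%s\n' "-M ${MEM_MB}"
```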

--destructive: this is for the "-d" option only and won't be used.

This is an artifact of a life cycle of testing but good to know. I'll probably pull it out of the next iteration.

-f /mnt/sdb1/sat1: I'd recommend also trying two file threads per disk, you may see an increase in IO.

I'll give that a shot. Thanks for the suggestion.

If the test is just hung, it's also worth running gdb and getting a backtrace of each thread. I don't know offhand of any software bug that would cause this, but if there's one I'd like to fix it.

Since this is an initramfs image used for machine testing, it doesn't have any dev tools on it. Once I have my current bugs sorted out, I'll find some time to whip up a custom build and try to reproduce it. We'll see if we can't get to the bottom of this.

Thanks,
Nick

Nikhil Dixit

Oct 9, 2015, 12:38:58 PM

to stressapptest-discuss

Hi. I am trying to execute stressapptest on Ubuntu 14.04 and I am facing the same issue. It says resuming workers after power spike and then nothing happens after that. Here are the logs when I run with gdb. Any idea about why it is stuck?

gdb-stressapptest.txt

Nick Sanders

Oct 9, 2015, 12:43:48 PM

to stressappt...@googlegroups.com

From the stack trace it looks like a deadlock bug. Is it readily reproducible? I'll try it on my side, but if you can build with debug info and get another backtrace, it should hopefully be clear where the problem is.
