Problem running long endurance load test using bzt


Jimmi Kristensen

Oct 7, 2016, 7:30:14 AM
to codename-taurus
Hi,

I'm trying to run a 12-hour endurance load test that sends test data to the BlazeMeter web interface. After some time it stops sending data and the graph receives no further data points. In the last run, the graph ended after 8 hours.
What I see is that kpi.jtl becomes quite large (400 MB) and bzt.log keeps writing this:

[DEBUG Engine.consolidator] Processed datapoint: 1475837284/None
[DEBUG Engine.consolidator] Merging into 1475837285
[DEBUG Engine.consolidator] Merging 1475837285
[DEBUG Engine.consolidator] Processed datapoint: 1475837285/None
[DEBUG Engine.consolidator] Merging into 1475837286
[DEBUG Engine.consolidator] Merging 1475837286

Meanwhile, the load test keeps logging (INFO: Changed data analysis delay to 61s) and seems to stall.

I tried letting it run for 15 hours, but it still didn't send data to the BlazeMeter web interface.
The server running the load test has:
CPUs: 4x Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
Memory: 16 GB

The server is not under high load and the CPU does not seem overloaded. It seems to have problems handling the amount of test data it collects.

My config looks like this:

---
execution:
- distributed:
  - slave1
  - slave2
  - slave3
  - slave4
  concurrency: 60
  hold-for: 12h
  write-xml-jtl: error
  scenario: load-test

scenarios:
  load-test:
    default-address: http://xxxxxx.com
    headers:
      Content-Type: application/json
      Accept: application/json
    requests:
    - url: '/tokens'
      method: POST
      body:
         '{
             "credential": {
                 "key": "username",
                 "secret": "password"
             }
          }'
      label: 'Endurance Test'

reporting:
- module: blazemeter
  report-name: Endurance Test
  test: Test

modules:
  console:
    disable: true
  blazemeter:
    token: xxxxxxxxxxxxxxxx
    data-address: https://data.blazemeter.com
    browser-open: none

    send-interval: 30s
    timeout: 5s
    artifact-upload-size-limit: 5

Dmitri Pribysh

Oct 19, 2016, 5:28:30 AM
to Jimmi Kristensen, codename-taurus
Hi, Jimmi.

Sorry for the delay, it took quite a bit of time to reproduce and
investigate this.

From what I can tell, this was a combination of multiple issues:
- HTTP requests to BlazeMeter (to upload real-time test stats) take
longer as the test progresses (0.5-1s at the start, 3-4s after a few
hours, sometimes going up to 10s).
- The stats reader that reads raw stats from kpi.jtl uses a dynamic
speed adjustment algorithm (so it can adapt to a fast-growing kpi.jtl
in very intense load tests) with no upper limit on the speed.
So my guess is that at some point, when one of the HTTP requests to
BlazeMeter took too long, kpi.jtl grew substantially. This made the
stats reader extract more and more data per iteration. At some point
it got stuck in a loop where it read, say, 20 MB of data in one
iteration, then took 10 seconds to aggregate and process it, and in the
meantime another 30 MB of stats got written to kpi.jtl. So the read
speed kept increasing, and it took more and more time to aggregate all
the stats read from kpi.jtl in one iteration.

Long story short, I've set an upper limit for the read speed, which
should fix the issue. I'm afraid I can't do anything about the HTTP
request timing, though, so you may want to increase
`settings.check-interval` to something like 10s for long tests.
I've left a very intense test running on my computer overnight a few
times, and Taurus seems to survive that just fine.
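
For reference, a sketch of how that setting might look in the config
file. The 10s value is only the ballpark mentioned above, not a tuned
recommendation, and the block would be merged into the existing config
rather than used on its own:

```yaml
settings:
  # Interval between the engine's periodic status checks.
  # 10s is the rough value suggested above for long tests; adjust as needed.
  check-interval: 10s
```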

Can you try to install Taurus from a snapshot (download
http://gettaurus.org/snapshots/bzt-1.7.2.974.tar.gz, and do
`pip install bzt-1.7.2.974.tar.gz`) and run your test again?


Dmitri
> --
> You received this message because you are subscribed to the Google
> Groups "codename-taurus" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to codename-taur...@googlegroups.com
> <mailto:codename-taur...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/codename-taurus/9be7da72-3a53-45d6-81b9-c29929d73aee%40googlegroups.com
> <https://groups.google.com/d/msgid/codename-taurus/9be7da72-3a53-45d6-81b9-c29929d73aee%40googlegroups.com?utm_medium=email&utm_source=footer>.
> For more options, visit https://groups.google.com/d/optout.

Jimmi Kristensen

Oct 19, 2016, 12:42:00 PM
to codename-taurus, pic...@gmail.com
Hi Dmitri,

Thank you very much for your detailed description, I'll try the snapshot.

Meanwhile I have been running multiple other 12-hour tests, which all completed without any problems. The difference is that in the tests that ran all the way to the end, the application under load responded in a timely manner (within 2-4 seconds).

In the test that I have problems with, I was simulating a slow response from the application, which means the application would take 25-50 seconds to respond to each request. For some reason the slow responses seem to affect Taurus, maybe because of the blocked threads?

Andrey Pokhilko

Oct 19, 2016, 12:50:44 PM
to codenam...@googlegroups.com

(I guess our resp times histogram is huuuge for this case...)


Andrey Pohilko
Chief Scientist
P: +7 (909) 631-21-69
BlazeMeter Inc.