WarpScript performance


Fabien S.

Dec 9, 2020, 1:48:08 PM
to Warp 10 users
Hello Warp 10 users!

First of all, I'm sorry for my English.

I wonder about the performance of Warp 10, and especially of WarpScript.
I ran a test of database access and computation to check the performance,
then compared it with a Cassandra database storing the same time series and a Java program.

The test takes 2 days of data, resamples them, then performs a filter operation between two values with 2 MAPs.
I used the script from https://www.cerenit.fr/blog/timeseries-state-duration/ to do this test.
I used the Warp 10 configuration of the Docker image with the ci suffix.

Here is the command that sets up the data set on Warp 10.
It generates a time series 'test' with one year of data shaped as a ramp.

echo "" | awk '
{
    # My system uses microsecond timestamps.
    startTs = 1577836800000000 # Start ts at 2020-01-01T00:00:00Z
    print startTs"// test{name=oneyeardata,signal=voltage} 0.0"
    
    oneDayInSec = 24 * 60 * 60
    oneYearInSec = 365 * oneDayInSec

    # We insert 4 ramps in one day
    for (nbSec = 1; nbSec < oneYearInSec; nbSec++)
    {
        print "=" startTs + nbSec * 1000000 "// " nbSec % (oneDayInSec / 4) ".0"
    }
}' | curl -v -H 'X-Warp10-Token: writeTokenCI' -H 'Transfer-Encoding: chunked' -X POST --data-binary @- "http://localhost:8080/api/v0/update"
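For readers unfamiliar with the payload the awk loop prints: the Warp 10 /api/v0/update endpoint accepts one data point per line as `TS/LAT:LON/ELEV class{labels} value` (the geo fields may be left empty, hence the `//`), and a leading `=` reuses the class and labels of the previous line — which is why the loop only prints a timestamp and a value after the first line. A minimal two-point payload would look like:

```
1577836800000000// test{name=oneyeardata,signal=voltage} 0.0
=1577837800000000// 1.0
```

The `=` continuation lines keep the payload small when pushing many points of the same series.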



Here is the WarpScript that performs the computation:

// This script is based on the script found here : https://www.cerenit.fr/blog/timeseries-state-duration/
 
// Storing the token into a variable
'readTokenCI' 'token' STORE
$token AUTHENTICATE
2147483647 LIMIT
2147483647 MAXBUCKETS
 
1577836800 1 s * 'start' STORE // Seconds from epoch to 2020-01-01T00:00:00Z, converted to platform time units
86400 1 s * 'one_day' STORE
 
 
// Step 1 - Start the FETCH with the token as first parameter
[
    $token
    // Here you must put the classname and label selectors...
    'test'
    { 'signal' 'voltage' 'name' 'oneyeardata' }
    $start 2 d +
    $start
] FETCH  0 GET
 
// Step 2 - Resampling
[ SWAP bucketizer.last 0 10000 0 ] BUCKETIZE 0 GET
 
DUP // Keep data we fetch and work on a clone of these data.
 
// Step 3 - Apply the filter with MAP function
// Filter values from 20 to 2000
[ SWAP 20.0 mapper.ge 0 0 0 ] MAP
[ SWAP 2000.0 mapper.lt 0 0 0 ] MAP

 

Executing the WarpScript in WarpStudio, I got:
  • Step 1 - 300 ms to fetch 2 days of data
  • Step 2 - 900 ms to resample
  • Step 3 - 6 s to filter
Doing the same thing with Cassandra + Java, I got:
  • Step 1 - 600 ms (the Cassandra server runs on the same machine as the Java code)
  • Step 2 - 131 ms to resample
  • Step 3 - 1 ms to filter

I ran the test on my laptop:
  • i7-8650U (8th generation)
  • 32 GB RAM
  • SSD


I think I didn't do a good job with the WarpScript.
Could you validate these results with me?
From your point of view, is my WarpScript OK?
Is the Warp 10 configuration optimal for this kind of job?

Or maybe this is not the right way to do the job.
Is it better to split the work into more, smaller jobs?


Thank you in advance for your answers or comments :-D
Best regards,
Fabien.

Fabien S.

Dec 10, 2020, 5:46:52 AM
to Warp 10 users
Hello, in fact the problem was in [ SWAP bucketizer.last 0 1000 0 ] BUCKETIZE 0 GET.
My platform uses nanoseconds,
which means I needed to change the bucket span:

[ SWAP bucketizer.last 0 1 s 0 ] BUCKETIZE 0 GET
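To make the unit issue explicit (a hedged sketch, assuming a nanosecond platform as stated above): the `s` and `ms` functions push the number of platform time units in a second or millisecond, so a bare numeric bucket span is interpreted in raw time units:

```warpscript
// Hedged sketch: on a nanosecond platform, 1 second = 1e9 time units
1 s    // pushes 1000000000
1 ms   // pushes 1000000
// A bare bucket span of 10000 therefore meant only 10 µs, far below
// the 1 s resolution of the data, while '1 s' gives 1-second buckets.
```

Writing spans with the unit functions keeps a script correct regardless of the platform's time unit.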

Then it works like a charm:
  • 300 ms to fetch data
  • 140 ms to BUCKETIZE
  • 90 ms to filter

Thanks for reading all the messages :-D

Best regards,
Fabien

Pierre

Dec 10, 2020, 7:10:48 AM
to Warp 10 users
Hello,

That's the purpose of the 's' and 'ms' functions — well done finding the problem.
You might save a few more ms by replacing both MAP functions with just a > or < operator. Operators work on GTS objects too and should be faster.
// Step 3 - Apply the filter with MAP function
// Filter values from 20 to 2000
'compare' CHRONOSTART
20.0 >=  
2000.0 <=
'compare' CHRONOEND

CHRONOSTATS

Operators on GTS are quite useful for simple operations where the full map or reduce frameworks are not mandatory.
The CHRONO functions are useful too, for comparing several implementations.

Fabien S.

Dec 11, 2020, 4:48:31 AM
to Warp 10 users
Ohh, CHRONOSTATS is so nice!

I updated the metrics here:
  • 300 ms to fetch data
  • 49 ms to BUCKETIZE
  • 10 ms to filter
For future reference, here is the WarpScript to run to get the metrics.


// This script is based on the script found here : https://www.cerenit.fr/blog/timeseries-state-duration/
// Updated with the help of Pierre in Warp10 Google group : https://groups.google.com/g/warp10-users/c/9VAm4Dvflo0
// Storing the token into a variable
'readTokenCI' 'token' STORE
$token AUTHENTICATE
2147483647 MAXBUCKETS


'2020-01-01T00:00:00Z' TOTIMESTAMP 'start' STORE // Platform timestamp of 2020-01-01T00:00:00Z
86400 1 s * 'one_day' STORE


// Step 1 - Start the FETCH with the token as first parameter
'signalFech' CHRONOSTART
[
$token
'test' // Class and label selectors
{ 'signal' 'voltage' 'name' 'oneyeardata' }
$start 2 d + // end of the query
$start // Start of the query
] FETCH 0 GET
'signalRaw' STORE
'signalFech' CHRONOEND

// Step 2 - Resampling
'signalResampling' CHRONOSTART
$signalRaw
[ SWAP bucketizer.last 0 1 s 0 ] BUCKETIZE 0 GET
'signal' STORE
'signalResampling' CHRONOEND

// Step 3 - Apply the filter with operator
// Filter values from 20 to 100
'compare' CHRONOSTART
$signal
20.0 >=
2000.0 <=
'compare' CHRONOEND

'compareWithMap' CHRONOSTART
$signal
// Step 3bis - Apply the filter with MAP function
// Filter values from 20 to 2000
[ SWAP 20.0 mapper.ge 0 0 0 ] MAP
[ SWAP 2000.0 mapper.lt 0 0 0 ] MAP
'compareWithMap' CHRONOEND

CHRONOSTATS



It gives :
{
   "signalFech": {"total_calls": 1,"total_time": 329401215 },
   "signalResampling": { "total_calls": 1, "total_time": 42694317},
   "compare": {"total_calls": 1,"total_time":10475490},
   "compareWithMap": {"total_calls": 1,"total_time":107274887}
}
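One note on reading these numbers (an assumption worth stating: total_time appears to be reported in platform time units, nanoseconds here, which matches the millisecond figures above):

```warpscript
// Hedged sketch: divide a total_time value by 1 ms worth of
// time units to read it as milliseconds
329401215 1 ms /   // ≈ 329, i.e. the ~300 ms spent in the fetch
```

This is consistent with the 300 ms / 49 ms / 10 ms summary given earlier in the thread.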
