FETCH one metric over a big time range vs many metrics over short time range

Mehran Shakeri

Oct 21, 2016, 9:27:47 AM
to Warp 10 users
Recently I was trying to read one year of data for one metric and perform some interpolation and bucketizing. With data logged at 15-second intervals, that is almost 365*24*60*4 data points over the last year. My query was:

// fetching data
[ 'READ' 'metric.1.1' { } '2015-01-01T00:00:00' '2016-01-01T00:00:00' ] FETCH


// down sampling and interpolation
[ SWAP bucketizer.mean 0 0 10 ]  BUCKETIZE INTERPOLATE
// scaling
[ SWAP 10.0 mapper.mul 0 0 0 ] MAP
// sort ascending
SORT

I ran this on the distributed version, with 3 data nodes in Hadoop and one region server in HBase. The FETCH part alone takes almost 4 seconds!
Then I tried to read all the metrics (466) with the last 500 data points each, and it took 100 ms!

So I assume the Warp 10 data model (most probably at the HBase layer) is designed so that the latest data points are considered more valuable. (Maybe that is also why reading the first data point is considerably slow.)

Now my question is: is there any workaround or better solution to read the data for one metric faster? My current option would be to store a down-sampled version of the same metric as another metric, like a cache, and query that instead.

Mathias Herberts

Oct 21, 2016, 9:46:59 AM
to Warp 10 users
If your data is pushed at 15s intervals, a year's worth of data contains 2,102,400 datapoints, so fetching that many datapoints in 4s corresponds to a read rate of 525,600 datapoints per second. That is not bad per se, given your data might be residing on disk on your HDFS cluster.

Nevertheless, there are a few things you can check to try to speed things up:

egress.hbase.data.blockcache.gts.threshold = XXX


Set this parameter to a value > 1 to ensure that the block cache is populated when you are fetching data for fewer than XXX series. Successive reads should then be faster, as the blocks will already be in the block cache and the region server will not need to fetch data from HDFS.


Is your HDFS setup on bare metal machines or on VMs? I/Os on VMs are known to impact HDFS performance.


For fetching multiple series you can enable parallel scans:


// Number of threads to use for scheduling parallel scanners. Use 0 to disable parallel scanners.
egress.hbase.parallelscanners.poolsize = 16

// Maximum number of parallel scanners per fetch request. Use 0 to disable parallel scanners.
egress.hbase.parallelscanners.maxinflightperrequest = 4

// Minimum number of GTS to assign to a parallel scanner. If the number of GTS to fetch is below
// this limit, no parallel scanners will be spawned. Defaults to 4.
egress.hbase.parallelscanners.min.gts.perscanner = 1

// Maximum number of parallel scanners to use when fetching datapoints for a batch of GTS (see EGRESS_FETCH_BATCHSIZE).
egress.hbase.parallelscanners.max.parallel.scanners = 4


With parallel scanning enabled we can saturate a 1Gbps uplink of a region server.


Also, when fetching multiple series, make sure you have the HBase filter deployed on your region servers and enabled in egress.


What level of read performance are you aiming for? An in-memory standalone might be better suited for some use cases, depending on your target fetch time.

Mathias.

Mehran Shakeri

Oct 21, 2016, 11:11:35 AM
to Warp 10 users
Hi Mathias,

This is the new configuration:

egress.hbase.data.blockcache.gts.threshold = 1024
egress.hbase.parallelscanners.poolsize = 16
egress.hbase.parallelscanners.maxinflightperrequest = 8
egress.hbase.parallelscanners.min.gts.perscanner = 1
egress.hbase.parallelscanners.max.parallel.scanners = 16

The HBase filter was already enabled and deployed. The Hadoop cluster machines have 3.3 GHz Xeon CPUs, SSDs, and 16 GB RAM.

Now one metric takes almost 3 seconds, and it doesn't get any faster after the first run. For 3 metrics it goes up to almost 9 seconds.

My query is 

[
  'READ'
  '~agent.8.[1,2,6]'
  {}
  //NOW -10
  '2014-01-01T00:00:00'
  '2017-01-01T00:00:00'
] FETCH
SIZE

So if there were a cache, I would expect to get the size back much faster the second time. But it takes the same time as the first run.

I can see that the allocated RAM is not freed, so something must be cached (I am not sure whether by HBase or Warp 10). When I stop Warp 10 it frees almost 2 GB of RAM. But then why does it take the same time again to return the size, if the GTS is already in RAM?

Since this is supposed to be a cloud deployment, users will access it remotely, and with network latency added the response should not exceed 1-2 seconds.

The current state is that, with a fast network connection, we can display a line chart over the last year within a few hundred milliseconds. Our database is a customized file-based database which stores only values, and we do all the down-sampling, interpolation and scaling in Java.

I'm not sure whether expecting that kind of performance is realistic with the Hadoop stack and the data model of HBase and Warp 10.

Maybe it's better to ask: assuming 2M data points per metric, what is the best way to down-sample them to 800 points, scale by a factor of X, and interpolate if necessary, so they can be shown as a line chart? This can be for one or several metrics.
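To make it concrete, here is a rough sketch of the kind of query I have in mind (the class name, the bucket count of 800 and the scale factor of 10.0 are just placeholders):

// fetching data
[ 'READ' 'metric.1.1' {} '2015-01-01T00:00:00' '2016-01-01T00:00:00' ] FETCH
// down sampling to at most 800 buckets over the fetched range
[ SWAP bucketizer.mean 0 0 800 ] BUCKETIZE
// filling the gaps between buckets
INTERPOLATE
// scaling
[ SWAP 10.0 mapper.mul 0 0 0 ] MAP
// sort ascending
SORT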

Mathias Herberts

Oct 21, 2016, 11:26:38 AM
to Warp 10 users
There are various ways to enhance performance by encoding the data differently.

Can you share privately the result of:

[ 'READ' '~agent.8.[1,2,6]' {}
  '2014-01-01T00:00:00'
  '2017-01-01T00:00:00'
] FETCH SNAPSHOT

I'll post a summary of possible optimizations.

Thanks,

Mathias.

Mathias Herberts

Oct 21, 2016, 6:24:19 PM
to Warp 10 users
Back from the lab :-)

I've generated a 'packed' version of the GTS you sent me privately; in this packed GTS, each value is an encoded form of a day's worth of data (5760 datapoints) from the original GTS.

Fetching a year's worth of data is then a matter of fetching 365 datapoints from this packed GTS, expanding each value into the original datapoints, and merging those expanded values into a single GTS.

On a test setup using a standalone Warp 10, this process fetches a year's worth of data (2,188,904 datapoints) in less than 1s, with an observed peak rate of 3M datapoints per second.

I'll post the scripts used to pack the original GTS and retrieve the original datapoints from the packed GTS.

This process can be automated so that each day of data is packed after it has been fully stored in Warp 10. On the fetch side, any fetch longer than 1 day can access the packed GTS and complete the result with the current day's worth of data.
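To give a rough idea of the pack step (this is only a sketch, not the actual script I'll post; 'metric.1.1' and 'metric.1.1.packed' are example class names and 'WRITE' stands for a valid write token):

// fetch one full day of raw data
[ 'READ' 'metric.1.1' {} '2015-06-01T00:00:00Z' '2015-06-02T00:00:00Z' ] FETCH
// serialize the fetched GTS into compact wrapped strings and keep the first one
WRAP 0 GET 'wrapped' STORE
// build a packed GTS holding that whole day as a single value at the start of the day
NEWGTS 'metric.1.1.packed' RENAME
'2015-06-01T00:00:00Z' TOTIMESTAMP NaN NaN NaN $wrapped ADDVALUE
// persist the packed GTS
'WRITE' UPDATE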

Would that be an acceptable performance?

Mathias.

Mehran Shakeri

Oct 22, 2016, 8:00:41 AM
to Warp 10 users
Well, that sounds good enough. 

What is "packed" GTS. Would you please give me some links to read more about it?

If it's flexible enough, it could improve performance a lot, for example by having 1h and 4h packed data for different resolutions.

Mathias Herberts

Oct 22, 2016, 8:48:53 AM
to Warp 10 users
Packed GTS are GTS whose values are themselves wrapped GTS (see http://www.warp10.io/reference/functions/function_WRAP/).

Each wrapped sub-GTS stores a whole chunk of datapoints as a single, compactly encoded value, so you can 'pack' lots more data into a GTS. The only constraint is a simple computation on the time range to fetch, so that you retrieve the correct data, which you can later MERGE and then TIMECLIP.
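For example, reading back could go along these lines (again just a sketch, not the exact scripts; 'metric.1.1.packed' is an example class name and the fetched range is rounded out to whole days):

// fetch enough whole days from the packed GTS to cover the requested range
[ 'READ' 'metric.1.1.packed' {} NOW 10 d ] FETCH 0 GET
// unwrap each daily chunk back into a GTS and collect the chunks into a list
[ SWAP VALUES <% UNWRAP %> FOREACH ]
// merge the daily chunks into a single GTS
MERGE
// clip to the exact range you are interested in, here the last 7 days
NOW 7 d TIMECLIP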

Mehran Shakeri

Oct 22, 2016, 10:23:10 AM
to Warp 10 users
I think I'm missing something. This WRAP function manipulates data in memory. Can I also persist it, or does it act like a cache?

Is the process: FETCH once from the persistence layer (the time-consuming part), then WRAP the data and keep it in memory for further queries? Or can I also persist this WRAPped version so that at any time in the future I can just access it?

"any fetch longer than 1 day can access the packed GTS and complete the result with the current day worth of data."
Is attaching the current data to the WRAPped data also done by me (in WarpScript), or is it handled by Warp 10?

Another question: even if it is an in-memory functionality, once it is done and the WarpScript execution has finished, how can I keep the result in memory so I can access it again from another WarpScript query? Is that possible at all?

Mathias Herberts

Oct 23, 2016, 6:30:34 AM
to Warp 10 users
The enclosed macros will transform a GTS into a packed form (pack.mc2) and unpack a packed GTS (unpack.mc2).

They work in memory, but from within WarpScript you could call UPDATE after using pack to persist the packed GTS.
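For instance, assuming pack leaves the packed GTS on top of the stack (the exact stack effect depends on pack.mc2), persisting it boils down to:

// 'WRITE' stands for a valid write token
'WRITE' UPDATE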

A more advanced caching system can also be put together rather simply by extending WarpScript with UDFs that write to and read from an external cache such as memcached.

Mathias.
pack.mc2
unpack.mc2

Mehran Shakeri

Oct 27, 2016, 9:22:29 AM
to Warp 10 users
Hi Mathias,

Sorry for the delay, I was a bit busy with another urgent task.

I tried your macros. I have to say I was impressed by the result. Fetching the packed GTS and unpacking it took around 220 ms (for ~1.2M data points), which is far better than before.

But I have to include the macros in my WarpScript each time I want to pack or unpack data! I hope there is a way to store these macros somewhere on the Warp 10 server so I can just call them by name when I need them. Am I right? I couldn't find anything on the website.

Mathias Herberts

Oct 27, 2016, 9:47:52 AM
to Warp 10 users
Hi Mehran,

The feature you are looking for is called the macro repository. It is a directory on the server side (on standalone or egress nodes) in which you can drop macros; those macros are then available to all scripts.

To enable the macro repository simply configure the following parameters:

// Path to the macro repository root directory
warpscript.repository.directory=/opt/warp/macros

// Number of ms between rescans of the macro repository
warpscript.repository.refresh=60000


Then, under the repository directory, create a 'gts' directory and put the pack and unpack mc2 files in it.


The macros will then be available via:

@gts/pack and @gts/unpack
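For example, a script could then simply do something like this (assuming the packed series is called 'metric.1.1.packed' and that @gts/unpack expects the fetched packed GTS on the stack, which depends on how unpack.mc2 is written):

[ 'READ' 'metric.1.1.packed' {} '2015-01-01T00:00:00' '2016-01-01T00:00:00' ] FETCH
@gts/unpack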


Mathias.

Mehran Shakeri

Oct 27, 2016, 9:59:19 AM
to Warp 10 users
Thanks. It works perfectly.