Most Recent Value for Many Points Efficiently


Steven Harding

Nov 19, 2019, 3:23:42 PM
to Warp 10 users
We are loving Warp 10 as a historian (it fixes many of the issues that persisted with InfluxDB), but I'm curious about something.

Our product tends to make 'wide' queries - queries about many (i.e. thousands of) individual points, but for relatively short time frames. This seems to be OK for history requests, which are irregular and, from a user-experience perspective, are allowed to take some time, but we also like to keep up with 'live' (or relatively recent) data, which usually means polling.

Is there any efficient way (via WarpScript or the Fetch endpoint) of asking for the most recent value across every series?

I'm aware that I could do something along the lines of...

{
'token' $token
'selectors' [ '.*{}' ]
'end' NOW
'count' 1
}
FETCH

...but I was wondering if there is anything more efficient (i.e. does the server happen to cache the most recent value of every series, and if so, can I obtain it directly from the in-memory cache?). A 'FETCHLATEST', perhaps?

Many industrial historians (Wonderware / OSI PI) include such an in-memory current-value database to reduce load on the main database for services that only poll for the most recent information.

Alternatively, we might run a Redis database alongside our Warp 10, so that Redis provides 'live' data and Warp 10 provides historical information. Perhaps we can explore an extension to Warp 10 that automatically forwards values inserted at the current timestamp to Redis (unless anyone can point me to someone who has already done this, of course!)

Mathias Herberts

Nov 20, 2019, 5:18:35 AM
to Warp 10 users
Hi Steven,

The syntax you mentioned for FETCH is the one to use to retrieve the latest datapoint of a set of series.

Depending on the number of series and their spread across the key space, the storage layer may have to open quite a number of files, thus possibly increasing retrieval time. There are some caching mechanisms in both the standalone (LevelDB) and distributed (HBase) versions, but they cannot do magic. One way to improve retrieval time when fetching from many series is to enable parallel scanners in the configuration.
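For reference, parallel scanners are enabled through configuration keys along these lines. The exact key names and sensible values depend on your version and deployment, so treat this as an illustrative sketch and check the configuration template shipped with your Warp 10 release:

# Standalone (LevelDB) version
standalone.parallelscanners.poolsize = 16
standalone.parallelscanners.maxinflightperrequest = 4
standalone.parallelscanners.min.gts.perscanner = 100

# Distributed (HBase) version
egress.hbase.parallelscanners.poolsize = 16
egress.hbase.parallelscanners.maxinflightperrequest = 4
egress.hbase.parallelscanners.min.gts.perscanner = 100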

If that does not lead to sufficient performance, maybe an in-memory instance of Warp 10 could act as a fast cache in front of your current setup. The datalog replication mechanism could be set up so updates made to your cache are forwarded to your current Warp 10 instance. Note that an in-memory instance keeps chunks of each series over a given time span, so it is not exactly what you want if you are only interested in the most recent datapoint of each series, regardless of when it was recorded.
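For the record, a standalone instance can be switched to chunked in-memory storage with settings along these lines (illustrative values; chunk lengths are expressed in platform time units, microseconds by default, so 3600000000 is one hour):

in.memory = true
in.memory.chunked = true
# keep 2 chunks of 1 hour each
in.memory.chunk.count = 2
in.memory.chunk.length = 3600000000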

Tell us more, maybe we could extend the existing in-memory version to only retain the most recent datapoint. We would also love to learn more about how you use Warp 10; could our marketing manager contact you directly?

Pierre

Dec 9, 2019, 11:25:02 AM
to Warp 10 users
Hi Steven,

In one project, I manage a cache myself with SHM. Instead of calling the update endpoint, I call the exec endpoint with the data embedded in a WarpScript program.
SHM allows you to store any object in memory. See this article.
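As a sketch, assuming the SHM extension's SHMSTORE / SHMLOAD functions and its MUTEX section to guard concurrent access (the symbol names 'latest' and 'latest.lock' are made up for the example), such a cache could look like this:

<%
  // store a map of classname => most recent value under the 'latest' symbol
  { 'temp.sensor.1' 42.0 'temp.sensor.2' 17.3 } 'latest' SHMSTORE
%> 'latest.lock' MUTEX

<%
  // later, from a polling request on the exec endpoint, read it back
  'latest' SHMLOAD
%> 'latest.lock' MUTEX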


Steven Le Roux

Feb 4, 2020, 10:56:23 AM
to Warp 10 users
Something that crossed my mind and could work, though it really depends on your workload...

If you're dealing with a reasonable volume of data, you could update the attributes of a series each time you have a datapoint to be pushed. Depending on how you push, you could also keep the most recent value per series in a buffer and flush them regularly. This way, you could get the most recent value from the index directly without fetching any data.
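For illustration, the reader side could look like the sketch below, touching only the Directory (index) and never the Store. The 'last' attribute name is made up for the example, and it is assumed to be maintained at write time, e.g. via the /api/v0/meta endpoint; check the FIND / ATTRIBUTES / LMAP documentation for your Warp 10 version:

// find all matching series, metadata only
[ $readToken '~.*' {} ] FIND
// for each series, keep its classname and its 'last' attribute
<% DROP DUP NAME SWAP ATTRIBUTES 'last' GET 2 ->LIST %> LMAP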

Keep in mind that it could be very stressful for your index/directory depending on the volume.