Forcing views of head/tail/submaps?


Alex Cruise

Jan 6, 2021, 9:25:29 PM
to fast...@googlegroups.com
Hey folks,

I'm implementing a time series buffer, where values, keyed by timestamps, are appended (normally one at a time) and evicted (normally in blocks).

My current design uses a Long2DoubleRBTreeMap per projection. When time passes and I want to evict some old data, I'd like to say maps[i] = maps[i].tailMap(timestamp) but tailMap returns a view, which I can't append to the next time a value arrives.

I guess I could do something like this, but it feels likely to invite boxing:

val map = maps[i]
// Snapshot the keys first; removing from the backing map while iterating
// the headMap view directly would throw ConcurrentModificationException.
map.headMap(timestamp).keys.toLongArray().forEach { map.remove(it) }
maps[i] = map

Any suggestions?

Thanks!

-0xe1a

Alex Cruise

Jan 6, 2021, 9:29:44 PM
to fast...@googlegroups.com
I guess I can say map.headMap(timestamp).clear() ... it still creates at least one garbage object per operation, but likely not one per value in the collection. :)
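[Editor's note: the write-through behavior relied on here is part of the SortedMap contract, so it can be sketched with a plain JDK TreeMap standing in for Long2DoubleRBTreeMap; this is an illustrative sketch, not fastutil-specific code.]

```java
import java.util.TreeMap;

public class EvictDemo {
    public static void main(String[] args) {
        // TreeMap as a stand-in for Long2DoubleRBTreeMap: headMap() returns
        // a live view, so clear() on the view removes the entries from the
        // backing map itself -- no copying, no reassignment needed.
        TreeMap<Long, Double> buffer = new TreeMap<>();
        for (long t = 0; t < 10; t++) buffer.put(t, t * 0.5);

        buffer.headMap(5L).clear();  // evict everything with timestamp < 5

        System.out.println(buffer.size());      // prints 5
        System.out.println(buffer.firstKey());  // prints 5
    }
}
```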

-0xe1a

Sebastiano Vigna

Jan 7, 2021, 4:23:37 AM
to fastutil
On Thursday, January 7, 2021 at 02:29:44 UTC al...@cluonflux.com wrote:
I guess I can say map.headMap(timestamp).clear() ... it still creates at least one garbage object per operation, but likely not one per value in the collection. :)

Yes. The next release (you can use the GitHub version if you wanna try) has primitive streams, spliterators, forEach, etc., so even your first solution should work without boxing/unboxing.

But why do you need a dictionary for a time series? Are you interrogating by timestamp? 

Ciao,

seba 

Alex Cruise

Jan 7, 2021, 1:26:09 PM
to fast...@googlegroups.com
On Thu, Jan 7, 2021 at 1:23 AM Sebastiano Vigna <sebastia...@gmail.com> wrote:
Yes. The next release (you can use the GitHub version if you wanna try) has primitive streams, spliterators, forEach, etc., so even your first solution should work without boxing/unboxing.

Nice!
 
But why do you need a dictionary for a time series? Are you interrogating by timestamp? 

My windowing semantics are stepless, and strictly time-based, not count-based, so I need record-at-a-time eviction.

-0xe1a

Sebastiano Vigna

Jan 7, 2021, 1:28:39 PM
to fastutil
On Thursday, January 7, 2021 at 18:26:09 UTC al...@cluonflux.com wrote:
 
But why do you need a dictionary for a time series? Are you interrogating by timestamp? 

My windowing semantics are stepless, and strictly time-based, not count-based, so I need record-at-a-time eviction.



I'm still not fully understanding. You seem to be evicting everything up to a timestamp. If you keep everything in a linked list, you can do the same eviction with less work, unless you use other kinds of dictionary-based accesses.
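[Editor's note: since appends arrive in timestamp order, the list idea amounts to popping from the head until the cutoff is reached. A minimal sketch in plain Java, using an ArrayDeque and a hypothetical Sample record rather than any fastutil type:]

```java
import java.util.ArrayDeque;

public class DequeEviction {
    // Hypothetical record for one time-series sample.
    record Sample(long timestamp, double value) {}

    public static void main(String[] args) {
        // Appends arrive in timestamp order, so the deque stays sorted and
        // evicting everything older than a cutoff costs O(entries evicted).
        ArrayDeque<Sample> buffer = new ArrayDeque<>();
        for (long t = 0; t < 10; t++) buffer.addLast(new Sample(t, t * 0.5));

        long cutoff = 5L;
        while (!buffer.isEmpty() && buffer.peekFirst().timestamp() < cutoff) {
            buffer.pollFirst();
        }

        System.out.println(buffer.size());                  // prints 5
        System.out.println(buffer.peekFirst().timestamp()); // prints 5
    }
}
```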

Alex Cruise

Jan 7, 2021, 2:43:22 PM
to fast...@googlegroups.com
On Thu, Jan 7, 2021 at 10:28 AM Sebastiano Vigna <sebastia...@gmail.com> wrote:
I'm still not fully understanding. You seem to be evicting everything up to a timestamp. If you keep everything in a linked list, you can do the same eviction with less work, unless you use other kinds of dictionary-based accesses.

Thanks for your questions!

My job configuration supports multiple window sizes at the same time, and I want to run all the windows over the same buffer. I evict based on the largest window size (i.e. headMap(maxWindowEarliest).clear()), then take time-based submaps for the smaller windows, i.e. tailMap(eachWindowEarliest).
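[Editor's note: that scheme, roughly, in plain Java with a JDK TreeMap standing in for Long2DoubleRBTreeMap; the now and windowSizes values are made up for illustration.]

```java
import java.util.TreeMap;

public class MultiWindow {
    public static void main(String[] args) {
        TreeMap<Long, Double> buffer = new TreeMap<>();
        for (long t = 0; t < 100; t++) buffer.put(t, t * 1.0);

        long now = 99L;
        long[] windowSizes = {10L, 25L, 50L};  // illustrative window sizes
        long maxWindow = 50L;                  // largest configured window

        // Evict once, based on the largest window...
        buffer.headMap(now - maxWindow).clear();

        // ...then each smaller window is just a tailMap view: no copying,
        // the view is backed by the shared buffer.
        for (long w : windowSizes) {
            System.out.println(w + " -> " + buffer.tailMap(now - w).size());
        }
    }
}
```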

The subsetting is presumably accelerated by using a sorted map, although there will certainly be a memory/compute tradeoff compared to using a simpler data structure, depending on the window size. At the moment I'm trying to optimize for heap usage, because this is per-session state in a streaming job, and we want to carry a lot of active sessions on each node when we go live.

I guess in the degenerate case when there's only 1-2 window sizes in a job, I might be better off with 1-2 linked lists. 🤔

BTW, I've heard rumours over the years that linked lists don't always suffer from memory-locality issues, largely thanks to thread-local allocators. But in my case, since I'm running in Apache Flink and there are typically multiple seconds between appends, it's pretty unlikely that the same JVM thread that created the buffer will be the one appending to it, so I'd probably suffer from memory-bandwidth issues if I used linked lists.

It's really easy to burn a lot of time on these micro-optimizations... I call it a Dangerously Fun Engineering Problem :)

-0xe1a
