"high performance" MRU-list ...


Andreas Leitgeb

Jan 21, 2015, 9:49:30 AM
In a script I'm processing some loooong sequence of items, and each
item contains (among other stuff) a number.

I need to keep track of the last 4 MRU (most recently used) numbers.

e.g.
MRU: 1 2 3 4 (from oldest to newest)
incoming: 2
MRU: 1 3 4 2 (moved to top)
incoming: 5
MRU: 3 4 2 5 (bottom removed, new added to top)

What would be the best approach (in terms of efficiency)?

I'm *not* asking for a plain implementation, but for tricks to
improve throughput beyond the obvious [lsearch,lreplace,lappend]
and ensuring refcount==1 for the lists before modifications.

The number of actual retrievals from the MRU will be dwarfed by the
number of items passing through it. It wouldn't matter if the MRU
structure occasionally grew beyond 4 items, if that helps performance,
but it must not have duplicates among the relevant last 4 items.

PS: my script parses IBM Java's "portable heap dump"

Christian Gollwitzer

Jan 21, 2015, 11:13:54 AM
On 21.01.15 at 15:48, Andreas Leitgeb wrote:
> In a script I'm processing some loooong sequence of items, and each
> item contains (among other stuff) a number.
>
> I need to keep track of the last 4 MRU (most recently used) numbers.
>
> e.g.
> MRU: 1 2 3 4 (from oldest to newest)
> incoming: 2
> MRU: 1 3 4 2 (moved to top)
> incoming: 5
> MRU: 3 4 2 5 (bottom removed, new added to top)
>
> What would be the best approach (in terms of efficiency)?
>
> I'm *not* asking for a plain implementation, but for tricks to
> improve throughput beyond the obvious [lsearch,lreplace,lappend]
> and ensuring refcount==1 for the lists before modifications.

How about using a dict as an order-preserving unique set? A simple
implementation would be:

(chris) 52 % proc pushmru {v} {
    global mru
    dict unset mru $v
    dict set mru $v {}
    if {[dict size $mru]>4} { dict unset mru [lindex [dict keys $mru] 0] }
    puts "MRU: [dict keys $mru]"
}
(chris) 53 %
(chris) 53 %
(chris) 53 % set mru {}
(chris) 54 % pushmru 1
MRU: 1
(chris) 55 % pushmru 2
MRU: 1 2
(chris) 56 % pushmru 3
MRU: 1 2 3
(chris) 57 % pushmru 4
MRU: 1 2 3 4
(chris) 58 % pushmru 2
MRU: 1 3 4 2
(chris) 59 % pushmru 5
MRU: 3 4 2 5

>
> The number of actual retrievals from the MRU will be dwarfed by the
> number of items passing through it. It wouldn't matter if the MRU
> structure occasionally grew beyond 4 items, if that helps performance,
> but it must not have duplicates among the relevant last 4 items.

If there are not too many distinct elements, you can leave out that if
construct and save lots of accesses; e.g. prune the MRU only after every
1000 processed items. A further micro-optimization would be inlining that
procedure, via an alias or manually, to get rid of the global lookup.
Then you'll have just two dict operations per item.

Still, I guess the gain over lists - if any - would be a small factor,
below 5. For larger MRU sizes it will pay off eventually. If these small
integers come from a restricted set, one could think about some bitset
trickery, or use tcc to compile a simple C data structure.
Hypnotoad has some sets in C.
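For comparison, the order-preserving-dict trick can be sketched in Python (illustrative only, not the Tcl code from this thread; plain Python dicts preserve insertion order since 3.7):

```python
def push_mru(mru: dict, v) -> None:
    """Move v to the most-recent end of an order-preserving dict."""
    mru.pop(v, None)               # remove if present (no-op otherwise)
    mru[v] = None                  # re-insert at the newest position
    if len(mru) > 4:               # prune the oldest entry (first key)
        mru.pop(next(iter(mru)))

# Mirror the (chris) session above: 1 2 3 4, then a repeat 2, then 5.
mru = {}
for v in [1, 2, 3, 4, 2, 5]:
    push_mru(mru, v)
print(list(mru))  # [3, 4, 2, 5]
```

As in the Tcl version, the dict keys are the MRU entries (oldest first) and the values are unused placeholders.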


Christian

Christian Gollwitzer

Jan 21, 2015, 11:43:49 AM
On 21.01.15 at 17:13, Christian Gollwitzer wrote:
> (chris) 52 % proc pushmru {v} {
>     global mru
>     dict unset mru $v
>     dict set mru $v {}
>     if {[dict size $mru]>4} { dict unset mru [lindex [dict keys $mru] 0] }
>     puts "MRU: [dict keys $mru]"
> }

Actually, some timing with global etc. shows my fastest version is this:

interp alias {} pushmru {} apply {{v} {
    global mru
    dict unset mru $v
    dict set mru $v {}
}}

It takes ~1.5µs to process one item on my machine, which comes quite
close to optimal (being bound by Tcl's command dispatch).

Christian

Christian Gollwitzer

Jan 22, 2015, 10:56:07 AM
Hi Andreas,


have you tried it? Does it help?

Christian

On 21.01.15 at 17:43, Christian Gollwitzer wrote:

Alexandre Ferrieux

Jan 22, 2015, 11:33:17 AM
On Thursday, January 22, 2015 at 4:56:07 PM UTC+1, Christian Gollwitzer wrote:
> Hi Andreas,
>
>
> have you tried it? Does it help?


I'm just a side observer, but I find it *beautiful*.
Thanks, Christian!

-Alex

Andreas Leitgeb

Jan 22, 2015, 12:24:49 PM
Christian Gollwitzer <auri...@gmx.de> wrote:
> have you tried it? Does it help?
> Christian
> On 21.01.15 at 17:43, Christian Gollwitzer wrote:
>> On 21.01.15 at 17:13, Christian Gollwitzer wrote:
>>> (chris) 52 % proc pushmru {v} {
>>>     global mru
>>>     dict unset mru $v
>>>     dict set mru $v {}
>>>     if {[dict size $mru]>4} { dict unset mru [lindex [dict keys $mru] 0] }
>>>     puts "MRU: [dict keys $mru]"
>>> }

I've only now seen this reply. It looks good.
I didn't think of the order-preserving feature of dicts.

Reducing the size seems a bit more complicated with dicts,
although still straightforward. I suppose that if I were
dealing with longer MRUs (like 100 items), yours would
surely beat mine by lengths (mostly for avoiding the
lsearch), but for 4 items the difference probably(*)
won't be significant. (*): actual measurements will follow.

After posting my question, I needed an immediate
solution to start with, and came up with this:

set C "c3 c2 c1 c0"
proc addMRU {it} {
    global C
    set idx [lsearch -integer $C $it]
    if {$idx < 0} { set idx 0 }  ;# if not there, remove the oldest instead
    set C [lreplace $C[set C ""] $idx $idx]
    lappend C $it
}
proc getMRU {idx} {
    global C
    lindex $C end-$idx
}

I start with an MRU of four dummy elements to avoid the special
cases for when it isn't yet filled, and then just make sure that
the size stays the same. This may not be suitable for a general
MRU implementation, but it works for my task.
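The same list-based scheme can be sketched in Python for comparison (illustrative only; the dummy entries play the same role as in the Tcl version):

```python
def add_mru(c: list, item) -> None:
    """Move item to the newest end; if absent, evict the oldest (index 0)."""
    try:
        idx = c.index(item)
    except ValueError:
        idx = 0                  # not present: drop the oldest entry instead
    del c[idx]
    c.append(item)

def get_mru(c: list, idx: int):
    """idx 0 is the most recently added item."""
    return c[-1 - idx]

c = ["c3", "c2", "c1", "c0"]     # dummy entries so the size never changes
for item in [1, 2, 3, 4, 2, 5]:
    add_mru(c, item)
print(c)  # [3, 4, 2, 5]
```

`list.index` is the analogue of `lsearch` here: a linear scan, cheap at size 4 but the part that an ordered dict avoids at larger sizes.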

>> Actually, some timing with global etc. shows my fastest version is this:
>> interp alias {} pushmru {} apply {{v} {global mru; dict unset mru $v;
>> dict set mru $v {}}}

But you're cheating here, as you left out the size reduction. ;-)

>> Christian

Thanks very much. I suppose MRU implementations are generally
useful and may also help others later on.

Christian Gollwitzer

Jan 22, 2015, 1:03:20 PM
On 22.01.15 at 18:24, Andreas Leitgeb wrote:
> I've only now seen this reply. It looks good.
> I didn't think of the order-preserving feature in dicts.
>
> Reducing size seems a bit more complicated with dicts,
> although still straightforward. I suppose, if I was
> dealing with longer MRUs (like 100 items) then yours
> would surely beat mine by lengths (mostly for avoiding
> the lsearch), but for 4 items the difference probably(*)
> won't be significant. (*): actual measurements will follow.

I agree, but I think it may also be limited by the bytecode interpreter
more than by the actual work on the CPU. See the timings below.

> After posting my question, I needed an immediate
> solution to start with, and came up with this:

Premature optimization? You still need to feed in the data from
somewhere. It will only be significantly faster if you have a compiled
command that takes a list of items at once, instead of one by one. How
many items do you have?

> set C "c3 c2 c1 c0"
> proc addMRU {it} {
>     global C
>     set idx [lsearch -integer $C $it]
>     if {$idx < 0} { set idx 0 }  ;# if not there, remove the oldest instead
>     set C [lreplace $C[set C ""] $idx $idx]
>     lappend C $it
> }
> proc getMRU {idx} {
>     global C
>     lindex $C end-$idx
> }
>
> I start with an MRU of four dummy elements to avoid the special
> cases for when it isn't yet filled, and then just make sure that
> size stays the same. This may not be suitable for a general MRU-
> implementation, but works for my task.
>
>>> Actually, some timing with global etc. shows my fastest version is this:
>>> interp alias {} pushmru {} apply {{v} {global mru; dict unset mru $v;
>>> dict set mru $v {}}}
>
> But you're cheating here, as you left out size-reduction. ;-)
>

Yes, I left it out deliberately. The code was benchmarked by pushing
1,000,000 distinct (i.e. increasing) items as well as 1,000,000 identical
ones. It turned out that it's not necessary to reduce the size. At the
end, for retrieval, you can just do [lrange [dict keys $mru] end-3 end]
and create a new dict from those keys.
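That lazy-pruning retrieval, sketched in Python for illustration (same idea: never prune while pushing, shrink only on the rare read):

```python
def get_mru(mru: dict) -> list:
    """Return the last four keys and shrink the dict back to them."""
    keys = list(mru)[-4:]          # the four most recently inserted keys
    mru.clear()
    for k in keys:
        mru[k] = None              # rebuild the dict from just those keys
    return keys

mru = {}
for v in range(10):                # push 0..9 without ever pruning
    mru.pop(v, None)               # (all distinct here, so this is a no-op)
    mru[v] = None
print(get_mru(mru))  # [6, 7, 8, 9]
print(len(mru))      # 4
```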


This is my timing code:
=============================== >8 ==========
# Andreas' method
set C "c3 c2 c1 c0"
proc addMRU {it} {
    global C
    set idx [lsearch -integer $C $it]
    if {$idx < 0} { set idx 0 }  ;# if not there, remove the oldest instead
    set C [lreplace $C[set C ""] $idx $idx]
    lappend C $it
}

proc getMRU {idx} {
    global C
    lindex $C end-$idx
}

proc getAndreas {} {
    global C
    set C [lrange $C end-3 end]
}

# Christian's method
set mru {}
interp alias {} pushmru {} apply {{v} {
    global mru
    dict unset mru $v
    dict set mru $v {}
}}

proc getChristian {} {
    global mru
    set rv [lrange [dict keys $mru] end-3 end]
    set mru {}
    foreach v $rv { dict set mru $v {} }
    return $rv
}

proc andreas N {
    for {set i 0} {$i<$N} {incr i} {
        addMRU $i
    }
}
proc christian N {
    for {set i 0} {$i<$N} {incr i} {
        pushmru $i
    }
}

proc empty N {
    for {set i 0} {$i<$N} {incr i} {
    }
}

# timing code
# run them once first so they get byte-compiled
foreach x {1 2 3 4} {
    addMRU $x
    pushmru $x
}

# insert the same item 1,000,000 times
puts "Andreas [time {addMRU 2} 1000000]"
puts "Christian [time {pushmru 2} 1000000]"
puts [getAndreas]
puts [getChristian]

# add 1,000,000 distinct items
puts "Andreas [time {andreas 1000000}]"
puts "Christian [time {christian 1000000}]"
puts [getAndreas]
puts [getChristian]

=============================================

It seems that they are pretty much equivalent in speed; mine is only
marginally faster on my machine:

(Tests) 49 % source mrutime.tcl
Andreas 1.322123284 microseconds per iteration
Christian 1.267695078 microseconds per iteration
1 3 4 2
1 3 4 2
Andreas 1496512 microseconds per iteration
Christian 1465378 microseconds per iteration
999996 999997 999998 999999
999996 999997 999998 999999
(Tests) 49 % empty 5
(Tests) 50 % time {empty 1000000}
55946 microseconds per iteration


Christian

Christian Gollwitzer

Jan 22, 2015, 1:07:45 PM
On 22.01.15 at 19:03, Christian Gollwitzer wrote:
> It seems that they are pretty equivalent in speed, mine is only
> marginally faster on my machine:
>
> (Tests) 49 % source mrutime.tcl
> Andreas 1.322123284 microseconds per iteration
> Christian 1.267695078 microseconds per iteration
> 1 3 4 2
> 1 3 4 2
> Andreas 1496512 microseconds per iteration
> Christian 1465378 microseconds per iteration
> 999996 999997 999998 999999
> 999996 999997 999998 999999
> (Tests) 49 % empty 5
> (Tests) 50 % time {empty 1000000}
> 55946 microseconds per iteration
>

And it turns out that procs are faster than aliases, which I didn't expect:

proc pushmru v {
    global mru
    dict unset mru $v
    dict set mru $v {}
}

Using this proc instead of the alias cuts the times down by more than a third:

(Tests) 50 % source mrutime.tcl
Andreas 1.259782289 microseconds per iteration
Christian 0.801872726 microseconds per iteration
1 3 4 2
1 3 4 2
Andreas 1489505 microseconds per iteration
Christian 907244 microseconds per iteration
999996 999997 999998 999999
999996 999997 999998 999999
(Tests) 50 %

Christian

heinrichmartin

Jan 23, 2015, 4:39:29 AM
> And it turns out that procs are faster than aliases, which I didn't expect:

Are you sure that the alias to apply pre-compiles the body?
The overhead might be in apply rather than in the alias.
Martin

Andreas Leitgeb

Jan 23, 2015, 9:58:28 AM
Unfortunately (for this thread) there has been a "plot twist" in my task.
I had picked a wrong interpretation from the specification of the
value that indexes into the cache. (Nothing else in the whole document
mentions the "cache".)

"The value represents an index into a cache of the last four classes used."

With an MRU - so it turned out - it occasionally picked the wrong class,
and some experimentation on a sample file showed that this wording
doesn't really imply uniqueness among "the last four".

This is what I ended up with, and now it seems to be correct, even though
it is not really an MRU in the usual sense:
set C "c0 c1 c2 c3"; set Cc 0
proc addMRU {it} {
    global C Cc
    lset C $Cc $it
    set Cc [expr {($Cc+1)%4}]
}
proc getMRU {idx} {
    lindex $::C $idx  ;# index 0 isn't always the last one added
}
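This fixed-size, round-robin cache (not a true MRU) can be sketched in Python for illustration; the slot names c0..c3 are the same dummy placeholders as in the Tcl version:

```python
class ClassCache:
    """Four-slot circular cache: slots are overwritten round-robin.
    Unlike an MRU, duplicates may occur, and slot 0 is not always
    the most recently added entry."""
    def __init__(self):
        self.slots = ["c0", "c1", "c2", "c3"]
        self.cursor = 0

    def add(self, item):
        self.slots[self.cursor] = item          # overwrite current slot
        self.cursor = (self.cursor + 1) % 4     # advance round-robin

    def get(self, idx):
        return self.slots[idx]                  # lookup by plain index

c = ClassCache()
for item in ["A", "B", "C", "D", "E"]:
    c.add(item)
print(c.slots)  # ['E', 'B', 'C', 'D'] - the fifth add wrapped to slot 0
```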

Sorry for the effort spent on helping me.
I hope that real MRUs are used often enough that this thread will
eventually help someone else down the road...

Thanks!