Re: Why would running the same command (SINTERSTORE) successively on Redis produce faster and faster results?

141 views
Skip to first unread message

AcidZombie24

unread,
Oct 10, 2012, 6:20:04 PM10/10/12
to redi...@googlegroups.com
You should wait for someone who actually knows what he is talking about to answer but my comment is do you have any idea how busy your machine was at the time? My initial thought is maybe its in the cache

On Wednesday, October 10, 2012 4:38:18 PM UTC-4, Royi Hagigi wrote:
Apologies for the cross-post from StackOverflow, but I figured I'd get a faster response here than there. I don't believe that redis caches the results of commands, correct? If so, then why would I see the following on my redis server for the same query. For reference, this is redis running in a VM. I checked the smaps file as described in http://redis.io/topics/latency and see no swapping on the OS level (all 0kb in Swap for the process), but is it possible that running redis in a VM pages memory to disk and back? Or... are these results expected due to some kind of redis optimization for commonly run commands? 


redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(4.46s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(3.77s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(0.92s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(0.64s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(0.67s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(0.73s)
redis 127.0.0.1:6379[4]> scard ClientId:1637
(integer) 796529
redis 127.0.0.1:6379[4]> scard PublisherId:1
(integer) 311092
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(1.88s)
redis 127.0.0.1:6379[4]> sinterstore testdb ClientId:1637 PublisherId:1
(integer) 240001
(0.69s)

Josiah Carlson

unread,
Oct 10, 2012, 6:39:09 PM10/10/12
to redi...@googlegroups.com
Your answer is closer than you think. My answers below are educated guesses founded on knowledge and experience of equivalent behavior in other systems.

CPU caching, branch prediction, memory allocator structure caching, and a variety of other interesting interactions, combined with other processes getting the core are probably what the OP is seeing.

When the sinterstore occurs the first time, the cache is "cold" for reading the data and for allocating the structures necessary to support the subsequent result set. The second time, reading is as cached as it can be, but all of the allocations still need to happen on potentially unused structures. But when the key is overwritten at the end of the second sinterstore, the first result is 'freed', but important pieces of the structures involved for the memory allocator that held that data have just been put into cache as part of the freeing process. Then the 3rd time, allocations are traversing many of the previously-cached memory allocation structures, combined with being able to read the somewhat-cached data better, etc.

The behavior the OP is seeing when they do something else after the 6th sinterstore is likely the result of letting the caches get cold as the processor, VM, and other layers do other things to dirty the cache, reduce the VM/Redis server process priority, etc.

Regards,
 - Josiah

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/nfnjiPZcCoYJ.

To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Didier Spezia

unread,
Oct 11, 2012, 5:03:37 AM10/11/12
to redi...@googlegroups.com

I second Josiah's answer.

Large sets are implemented with hash tables involving double-linked lists.
A large intersection will likely be dominated by the cost of cache misses (i.e. memory accesses).
So things like the L1, L2, L3 CPU caches, and the TLB (translation lookaside buffer) are probably
critical for the performance of such operations.

On top of this, the fact you use a VM probably also has a serious impact.
On a VM you have 3 layers for memory: virtual guest, physical guest and physical host memory.
At the VM level, you can only evaluate the swapping activity between virtual guest memory
and physical guest memory. But on modern hypervisors, you can also have memory compression,
transparent page sharing, and hypervisor memory swapping. These features tend to increase
the volatility of latency of memory bound operations.

Regards,
Didier.

M. Edward (Ed) Borasky

unread,
Oct 11, 2012, 12:52:38 PM10/11/12
to redi...@googlegroups.com
On Wed, Oct 10, 2012 at 3:39 PM, Josiah Carlson <josiah....@gmail.com> wrote:
Your answer is closer than you think. My answers below are educated guesses founded on knowledge and experience of equivalent behavior in other systems.

CPU caching, branch prediction, memory allocator structure caching, and a variety of other interesting interactions, combined with other processes getting the core are probably what the OP is seeing.

When the sinterstore occurs the first time, the cache is "cold" for reading the data and for allocating the structures necessary to support the subsequent result set. The second time, reading is as cached as it can be, but all of the allocations still need to happen on potentially unused structures. But when the key is overwritten at the end of the second sinterstore, the first result is 'freed', but important pieces of the structures involved for the memory allocator that held that data have just been put into cache as part of the freeing process. Then the 3rd time, allocations are traversing many of the previously-cached memory allocation structures, combined with being able to read the somewhat-cached data better, etc.

The behavior the OP is seeing when they do something else after the 6th sinterstore is likely the result of letting the caches get cold as the processor, VM, and other layers do other things to dirty the cache, reduce the VM/Redis server process priority, etc.

Regards,
 - Josiah

Branch prediction failure is the "nasty little secret" of low-level virtual machine / language interpreter design. Anton Ertl of gForth and David Gregg did a few papers on the subject. I can hunt them down if anyone's interested. On x86_64, there's also http://www.agner.org/optimize/

--
Twitter: http://twitter.com/znmeb; Computational Journalism Publishers Workbench: http://znmeb.github.com/Computational-Journalism-Publishers-Workbench/

How the Hell can the lion sleep with all those people singing "A weem oh way!" at the top of their lungs?

Javier Guerra Giraldez

unread,
Oct 11, 2012, 1:02:02 PM10/11/12
to redi...@googlegroups.com
On Thu, Oct 11, 2012 at 11:52 AM, M. Edward (Ed) Borasky
<zn...@znmeb.net> wrote:
> Branch prediction failure is the "nasty little secret" of low-level virtual
> machine / language interpreter design.

i think in this context, VM means virtual machine as in server
virtualization, ie Xen, VMWare, KVM, etc. i don't know if those
affect branch prediction... (maybe VMWare's old dynamic patching, but
not likely current hardware virtualzation features used by modern
hypervisors)

on language VM/JITs (ie, JVM, V8, LuaJIT...), definitely!

--
Javier

Dvir Volk

unread,
Oct 11, 2012, 1:11:12 PM10/11/12
to redi...@googlegroups.com
branch prediction affects native code, too.
here's an interesting discussion of it I just read the other day:




--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.




--
Dvir Volk
Chief Architect, Everything.me

Javier Guerra Giraldez

unread,
Oct 11, 2012, 1:14:02 PM10/11/12
to redi...@googlegroups.com
On Thu, Oct 11, 2012 at 12:11 PM, Dvir Volk <dvi...@gmail.com> wrote:
> branch prediction affects native code, too.

sure it affects!

but is that effect altered by virtualization? i don't think too
much... at least not as heavily as cache shortages

--
Javier

M. Edward (Ed) Borasky

unread,
Oct 11, 2012, 5:16:55 PM10/11/12
to redi...@googlegroups.com
Yes I was referring to language VMs
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>

Royi Hagigi

unread,
Oct 30, 2012, 11:54:46 PM10/30/12
to redi...@googlegroups.com
Just an update, reran the same test on a physical box. Running the same test on the same "hardware" as the virtual hardware (8gb dataset on 16gb memory) gave us significantly improved performance. For reference sake, it never showed a timing, even from the first time I ran it, and continued to give us results that showed much more consistent timings for similar commands.
Reply all
Reply to author
Forward
0 new messages