PFMERGE memory leaks?


Антон Кочерин

unread,
Apr 6, 2015, 4:53:10 PM4/6/15
to redi...@googlegroups.com
Hi all.

I'm trying to run a simple Python script:

import redis
import os
import sys
import time

print(time.strftime('%X %x %Z'))

rr = redis.Redis(host='localhost', port=6379, db=0)

r = rr.pipeline()

i = 0
while i < 1000000:
    r.pfadd('FPX1:' + str(i), "ola1")
    r.pfadd('FPX2:' + str(i), "ola2")
    i = i + 1
r.execute()

print(time.strftime('%X %x %Z'))

j = 0
for key in rr.scan_iter("FPX1:*"):
    j = j + 1
    if j % 10000 == 0:
        print(j)
    r.pfmerge(str(key).replace('FPX1:', 'FPX3:'), str(key), str(key).replace('FPX1:', 'FPX2:'))
r.execute()

print(time.strftime('%X %x %Z'))

The redis-server process eats up all available memory and my computer gets stuck. I have 8 GB of RAM installed.
When I try the command "pfcount FPX1:1 FPX2:1" for all the keys, everything is OK.
This reproduces on Redis 2.8 through 3.0. What am I doing wrong?




Itamar Haber

unread,
Apr 6, 2015, 5:09:04 PM4/6/15
to redi...@googlegroups.com
Your code creates 1,000,000 x 2 HLL keys (each counting a single element, BTW).

Each HLL is about 12,296 bytes, or ~12KB, so overall you have 1,000,000 x 2 x 12KB ≈ 22.9GB of data.

Perhaps you're using it wrong - I'm guessing you want to count the same items (0 <= i < 1000000) under two keys? If so, read again about pfadd and the order of arguments it expects.
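To make the estimate concrete, here is the same back-of-envelope arithmetic as a quick sketch (pure Python, no Redis needed), using the ~12,296-byte dense-HLL size cited above. Note this is an upper bound: small HLLs use a compact sparse encoding and take far less than 12KB each.

```python
# Rough memory estimate for the script above: 2,000,000 HLL keys,
# each assumed to be in the dense representation (~12,296 bytes in Redis).
keys = 2 * 1_000_000
dense_hll_bytes = 12_296

total_bytes = keys * dense_hll_bytes
total_gib = total_bytes / (1024 ** 3)
print(round(total_gib, 1))  # 22.9 -- far more than the 8 GB of installed RAM
```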




--

Itamar Haber | Chief Developers Advocate
Redis Watch Newsletter - Curator and Janitor
Redis Labs - Enterprise-Class Redis for Developers

Mobile: +1 (415) 688 2443
Mobile (IL): +972 (54) 567 9692
Email: ita...@redislabs.com
Skype: itamar.haber

Blog  |  Twitter  |  LinkedIn


Josiah Carlson

unread,
Apr 6, 2015, 7:22:36 PM4/6/15
to redi...@googlegroups.com
Small HyperLogLogs use less than the full 12KB of memory, but otherwise Itamar is right that you're creating 2 million keys.

But it's not just that you're creating 2 million keys: you've batched up 2 million commands on the Python side before sending anything to Redis, and the Python client waits until it has sent all 2 million commands before reading a single response. The pfmerge phase does the same thing: 1 million commands are batched, sent all at once, and the responses are read only after everything has been sent.

Assuming you're using Python 3.x, you should probably use the following instead, which keeps batch sizes to ~2000 commands in the first loop and ~1000 in the second. This also assumes it's testing what you want to test.

 - Josiah

import redis
import os
import sys
import time

print(time.strftime('%X %x %Z'))

rr = redis.Redis(host='localhost', port=6379, db=0)

r = rr.pipeline()

for i in range(1000000):
    r.pfadd('FPX1:' + str(i), "ola1")
    r.pfadd('FPX2:' + str(i), "ola2")
    if i and not i % 1000:
        r.execute()
r.execute()

print(time.strftime('%X %x %Z'))

for j, key in enumerate(rr.scan_iter("FPX1:*")):
    if not (j + 1) % 10000:
        print(j)
    r.pfmerge(str(key).replace('FPX1:', 'FPX3:'), str(key), str(key).replace('FPX1:', 'FPX2:'))
    if j and not j % 1000:
        r.execute()
r.execute()

print(time.strftime('%X %x %Z'))

Антон Кочерин

unread,
Apr 7, 2015, 2:48:59 AM4/7/15
to redi...@googlegroups.com
Thanks for the reply.

HLL structure sizes (N = number of elements added, M = size in bytes):

N=10, M=47
N=100, M=290
N=500, M=1046
N=1000, M=1894
N=4000, M=12304

Now I understand what's wrong.
I thought PFMERGE preserved the HLL structure size when merging structures of the same size, but I'm getting 12KB keys after merging two 47-byte keys.
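That observation matches how Redis stores HLLs: keys stay in a compact sparse encoding while cardinality is low, but in the Redis versions discussed here PFMERGE materializes its destination in the dense (~12KB) encoding regardless of the source sizes. A rough sketch of what that means across the 1,000,000 merged keys, using the sizes measured above (47 bytes sparse, 12,296 bytes dense):

```python
# Contrast sparse vs dense HLL footprints across 1,000,000 merged keys,
# using the measured sizes: ~47 bytes sparse (N=10), ~12,296 bytes dense.
n_keys = 1_000_000
sparse_bytes = 47       # sparse encoding, tiny cardinality
dense_bytes = 12_296    # dense encoding, what PFMERGE produces here

print(f"sparse: {n_keys * sparse_bytes / 1024**2:.0f} MiB")  # sparse: 45 MiB
print(f"dense:  {n_keys * dense_bytes / 1024**3:.1f} GiB")   # dense:  11.5 GiB
```

So the FPX3 destination keys alone account for roughly 11.5 GiB, which is why the merge phase exhausts 8 GB of RAM even though the source keys are tiny.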


On Tuesday, April 7, 2015 at 2:22:36 AM UTC+3, Josiah Carlson wrote: