Bulk performance through redis-py


Matt Keranen

unread,
Jun 12, 2012, 1:50:11 PM6/12/12
to redi...@googlegroups.com
Using a contrived example at https://gist.github.com/2914601 , I am seeing a performance bottleneck in the Python driver (redis-py), where the script takes about 90 seconds to load a 150MB csv file.

On my test system, executing the script without calling HINCRBY on the Redis server, I get a 5 second runtime when simply parsing csv files and creating the key and increment values. Profiling the script confirms that most of the execution time is within the driver. Widening the pipeline set has minimal impact.

In cases such as this, does it make sense to call into a C module to write to Redis via the C driver, or perhaps to write the process in a faster, compiled language entirely? I am working under the assumption that drivers for "faster" languages will run faster.

Or is there a flaw in my approach?


Andy McCurdy

unread,
Jun 12, 2012, 1:53:47 PM6/12/12
to redi...@googlegroups.com
Curious, have you tried not using multiprocessing and doing all the work in the main process?

Matt Keranen

unread,
Jun 12, 2012, 2:08:41 PM6/12/12
to redi...@googlegroups.com
Yes, my first approach was single-process and took about 4X the time. Parsing a single file with multiple processes is something I have not needed to do before; usually I have only needed to load multiple files concurrently.

My test case here is that I have these csv files arriving one per minute, so I need to load them in less than one minute each. I would prefer to keep this in Python, as it is the only language other than SQL I am currently (somewhat) adept at.


Pieter Noordhuis

unread,
Jun 12, 2012, 2:11:28 PM6/12/12
to redi...@googlegroups.com
If you're that concerned with performance, you can always write out the
protocol-formatted commands to a file and then run that file directly
against Redis.

Also see the --pipe option to redis-cli in 2.6.

Cheers,
Pieter
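
For reference, a minimal sketch of the approach Pieter describes: serialize each HINCRBY as a raw Redis protocol (RESP) command into a file, then feed that file to redis-cli --pipe. The (key, field, amount) rows below are placeholders, not the poster's actual schema, and ASCII data is assumed so len() matches the byte length.

    # Write HINCRBY commands as raw Redis protocol, suitable for:
    #   cat commands.txt | redis-cli --pipe

    def resp_command(*args):
        """Encode one command in the Redis unified request protocol."""
        cmd = "*%d\r\n" % len(args)
        for arg in args:
            arg = str(arg)
            cmd += "$%d\r\n%s\r\n" % (len(arg), arg)
        return cmd

    sample_rows = [("dim0:201206121200", "valueA", 42)]  # placeholder (key, field, amount)

    with open("commands.txt", "wb") as fh:  # binary mode so \r\n survives untouched
        for key, field, amount in sample_rows:
            fh.write(resp_command("HINCRBY", key, field, amount).encode("ascii"))

Loading the generated file is then a single shell step: cat commands.txt | redis-cli --pipe.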

Josiah Carlson

unread,
Jun 12, 2012, 2:11:42 PM6/12/12
to redi...@googlegroups.com
You are not pipelining anything:

    pl = rr.pipeline(transaction=False)
    for f in fields:
        rr.hincrby("%s:%s" % (key, f), samp[f], data)
    cc = cc + 1
    pl.execute()

You should be using pl.hincrby()

Regards,
- Josiah
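
In other words, a minimal self-contained sketch of the fix (all values below are placeholders): commands must be queued on the pipeline object and sent in one round trip by execute(); calling hincrby() on the client itself sends each command immediately.

    import redis

    rr = redis.StrictRedis()                     # assumes a local Redis on the default port
    key = "201206121200"                         # placeholder time key
    fields = ["dim0", "dim1"]                    # placeholder column names
    samp = {"dim0": "valueA", "dim1": "valueB"}  # placeholder values for this row
    data = 42                                    # placeholder increment amount

    pl = rr.pipeline(transaction=False)
    for f in fields:
        pl.hincrby("%s:%s" % (key, f), samp[f], data)  # queued on the pipeline, not sent yet
    pl.execute()                                       # all commands sent in one round trip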

Andy McCurdy

unread,
Jun 12, 2012, 2:14:19 PM6/12/12
to redi...@googlegroups.com
How many rows are in the sample CSV file you're testing with? How many HINCRBY commands are called for each row on average? I'd like to get a sense for how many calls you're making to Redis from the data in this sample file.

Matt Keranen

unread,
Jun 12, 2012, 2:14:58 PM6/12/12
to redi...@googlegroups.com
Oops, that is an error in the Gist.

I can confirm that pipelining the HINCRBY calls results in a significant performance improvement, from about 200 seconds down to 90.


Andy McCurdy

unread,
Jun 12, 2012, 2:15:01 PM6/12/12
to redi...@googlegroups.com
Ahh, good catch Josiah!

Matt Keranen

unread,
Jun 12, 2012, 2:17:20 PM6/12/12
to redi...@googlegroups.com
I have some code to put the Python dict into Redis protocol, and will give that a shot.

One of my goals, however, is to eliminate the filesystem and push data directly from the source into Redis.



Josiah Carlson

unread,
Jun 12, 2012, 2:23:37 PM6/12/12
to redi...@googlegroups.com
How many items are you updating at once per pipeline execute() call?
From your numbers, it would seem it is only roughly 2-3 per call. Try
this one, which will only run execute() once every 256 lines:
https://gist.github.com/2919174

- Josiah
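
The linked gist has since been deleted, so the following is only a reconstruction of the pattern it described, with a placeholder file name and column layout: keep queuing on one pipeline and call execute() every 256 input lines instead of once per line.

    import csv
    import redis

    rr = redis.StrictRedis()
    pl = rr.pipeline(transaction=False)
    pending = 0

    with open("sample.csv", "r") as fh:                # placeholder file and layout
        for row in csv.reader(fh):
            key, amount = row[0], int(row[-1])         # time key in col 0, fact in the last col
            for col, value in enumerate(row[1:-1]):    # one HINCRBY per dimension column
                pl.hincrby("dim%d:%s" % (col, key), value, amount)
            pending += 1
            if pending == 256:                         # flush every 256 lines
                pl.execute()
                pending = 0
    if pending:
        pl.execute()                                   # flush whatever is left over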

Matt Keranen

unread,
Jun 12, 2012, 2:32:36 PM6/12/12
to redi...@googlegroups.com
My Gist calls execute() for every 20 operations (20 columns to increment).

I previously ran a test wrapping a whole file chunk (1000 lines of 20 columns) in a single execute(), and the performance change was negligible.

However, a 100-line chunk runs in ~52 seconds, so there may be an optimal transaction size to find.



Andy McCurdy

unread,
Jun 12, 2012, 2:33:56 PM6/12/12
to redi...@googlegroups.com
How many lines are you processing in that file? Does this mean there's 20 increments per line?

Josiah Carlson

unread,
Jun 12, 2012, 2:40:46 PM6/12/12
to redi...@googlegroups.com
At 20 per, that should have given you a fairly substantial speedup,
far better than 200 -> 90 seconds.

- Josiah

Matt Keranen

unread,
Jun 12, 2012, 2:52:21 PM6/12/12
to redi...@googlegroups.com
800K lines, 14 increments each. As mentioned, this is a contrived / learning example, but close to a real implementation.

In RDBMS terms, I have source data with 14 possible dimensions other than the time key, and one numeric fact. Cutting down the number of fields does reduce load time proportionately, but I am testing the worst-case scenario.

From the RDBMS perspective, I can bulk-load the source files into unlogged PostgreSQL tables in about 5 seconds, but that is through a loader written in C. Using my same test case in Python, writing to MongoDB or PostgreSQL via Python drivers has similar performance characteristics, although my discovery of pipelining in Redis put it ahead of MongoDB (I don't know yet if there is a corresponding client option).

Dvir Volk

unread,
Jun 12, 2012, 3:21:45 PM6/12/12
to redi...@googlegroups.com
That's roughly 11M increments. Not sure what machine this is, but on my machine redis-benchmark, with its highly optimized C code, can do about 170k increments/sec; that's roughly 65 seconds for 11M increments, so maybe this is just too much for a single Redis instance. How much does your machine do in redis-benchmark?

Also, are you using the hiredis Python module as an accelerator for redis-py? If not, just "sudo easy_install hiredis"; it increases throughput considerably.


Andy McCurdy

unread,
Jun 12, 2012, 3:42:52 PM6/12/12
to redi...@googlegroups.com
In addition to what Dvir mentioned, one other point: Are you running Redis and the Python scripts on the same host? If so, you may be competing for CPU resources.

Matt Keranen

unread,
Jun 12, 2012, 3:44:58 PM6/12/12
to redi...@googlegroups.com
I have installed hiredis, but the README in the GitHub repo mentions a patched version of redis-py that detects and uses hiredis. Are there specific steps to enable hiredis in redis-py?

Part of my redis-benchmark output (this is 2.6 RC4; are there optimizations not yet applied?):

====== INCR ======
  10000 requests completed in 0.11 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

100.00% <= 0 milliseconds
88495.58 requests per second



Andy McCurdy

unread,
Jun 12, 2012, 3:46:30 PM6/12/12
to redi...@googlegroups.com
There's no need to patch redis-py. If hiredis is installed, redis-py will automatically use it.
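
A quick way to confirm that, without assuming anything about redis-py internals, is simply to check that the hiredis module imports in the same environment the loader runs in:

    # If this import succeeds, redis-py will pick up the hiredis reply parser
    # automatically; if it fails, redis-py silently falls back to its
    # pure-Python parser.
    try:
        import hiredis  # noqa
        print("hiredis available")
    except ImportError:
        print("hiredis not installed")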

Dvir Volk

unread,
Jun 12, 2012, 3:59:10 PM6/12/12
to redi...@googlegroups.com
I forgot to pipeline my benchmark, so now, pipelining every 1000 INCRs, I'm getting
INCR: 729927.06 requests per second
which should be enough to do what you want in 15-16 seconds. So even if your machine is 50% slower than mine, as it appears from your data, you should be able to do this in 30 seconds. Of course, longer keys can make it much slower, but you should still be fine.

Just out of curiosity, did you try doing it all in one process and one pipeline? This should be close to what C can do.
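
For anyone reproducing the pipelined numbers Dvir describes, redis-benchmark in 2.6 has a -P option to pipeline requests; something along these lines (the request and pipeline counts here are arbitrary) approximates his test:

    redis-benchmark -t incr -n 1000000 -P 1000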


Matt Keranen

unread,
Jun 12, 2012, 4:10:15 PM6/12/12
to redi...@googlegroups.com
I am splitting the work across 4-6 processes.

Your question has me looking at how I am creating and using pipelines. Would the recommended approach be one persistent pipeline per process?



Josiah Carlson

unread,
Jun 12, 2012, 4:16:30 PM6/12/12
to redi...@googlegroups.com
That was one of the changes I made in the forked version I sent
earlier. It shouldn't matter significantly, but every little bit
counts. Incidentally, the script as posted shouldn't run, as there is
a NameError due to the samp[f] reference; samp isn't defined in the
code.

pl.hincrby("%s:%s" % (key, f), samp[f], data)

- Josiah

Dvir Volk

unread,
Jun 12, 2012, 4:17:08 PM6/12/12
to redi...@googlegroups.com
As long as it's simple code like this, then yes: just create a pipeline and execute it whenever you need.
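
To make the "one persistent pipeline per process" idea concrete, a rough sketch of the layout being discussed (placeholder data and key scheme, not the poster's gist): each worker opens its own connection, keeps one pipeline, and flushes every BATCH queued commands.

    import multiprocessing
    import redis

    BATCH = 1000  # commands per execute(); tune in the 500-10k range suggested above

    def load_rows(rows):
        """Worker: rows is a list of (key, field, amount) tuples (placeholder layout)."""
        rr = redis.StrictRedis()                 # one connection per worker process
        pl = rr.pipeline(transaction=False)      # one persistent pipeline per worker
        pending = 0
        for key, field, amount in rows:
            pl.hincrby(key, field, amount)
            pending += 1
            if pending >= BATCH:
                pl.execute()                     # flush a full batch
                pending = 0
        if pending:
            pl.execute()                         # flush the remainder

    if __name__ == "__main__":
        rows = [("dim0:201206121200", "valueA", 42)] * 10000  # placeholder data
        workers = 4
        chunks = [rows[i::workers] for i in range(workers)]
        pool = multiprocessing.Pool(workers)
        pool.map(load_rows, chunks)
        pool.close()
        pool.join()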

Matt Keranen

unread,
Jun 12, 2012, 4:32:44 PM6/12/12
to redi...@googlegroups.com
I have updated https://gist.github.com/2914601 with the fastest revision so far (and fixed the typos that resulted from stripping business-specific information out).



Josiah Carlson

unread,
Jun 12, 2012, 4:39:41 PM6/12/12
to redi...@googlegroups.com
You are only executing the pipeline once at the end of each chunk,
not periodically as I showed with my gist (which I deleted before
checking yours). Try sending somewhere between 500 and 10k HINCRBY
calls per pipeline.execute(). I suspect that your peak performance
will be somewhere in that range.

Regards,
- Josiah

Dvir Volk

unread,
Jun 12, 2012, 4:42:05 PM6/12/12
to redi...@googlegroups.com
Can you send some sample data so I can run it locally and try playing with it?

Matt Keranen

unread,
Jun 12, 2012, 5:05:14 PM6/12/12
to redi...@googlegroups.com
I have been adjusting the frequency of executes by adjusting the chunk size (number of csv lines per process). Thus, 1000 lines * 14 increments/line gives me 14K increments per execute().

I have added the enumerator from your gist to see if I can dial in the optimal commit size.



Matt Keranen

unread,
Jun 12, 2012, 7:23:40 PM6/12/12
to redi...@googlegroups.com
And finally, FWIW, my PostgreSQL comparison load rate was off by about an order of magnitude: since I am executing one HINCRBY per dimension, I am actually performing 14X as many updates as a single bulk table insert. When adjusted to a timing comparison with an equal number of operations, the execution times are much closer (multiprocessed Python compared to single-threaded C).