Storing large CSV values in Redis


Bala M

Jun 20, 2016, 8:52:08 AM
to Redis DB
Hi,

I am working on storing values from large CSV files in Redis. I am splitting the data into chunks (say, 200 rows of data) and storing each chunk as a separate field in the same hash. Please see the structure below.


Redis_Hash_Key : Redis_Hash_field_key1: [Values1]
               : Redis_Hash_field_key2: [Values2]
               : Redis_Hash_field_key3: [Values3]
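
For example (key and field names here are just placeholders), each chunk goes in with a plain HSET:

> HSET csv:file1 chunk:1 "row1\nrow2\n...\nrow200"
(integer) 1
> HSET csv:file1 chunk:2 "row201\nrow202\n...\nrow400"
(integer) 1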

The value fields are sometimes very large, causing memory spikes in Redis.

I clear the data fairly quickly, but even then, is there any way to store this data more efficiently?

I tried playing with "hash-max-ziplist-entries" and "hash-max-ziplist-value" in the Redis conf, but that did not give any significant memory improvement.
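
For example, I bumped the thresholds along these lines (the exact values are just what I experimented with; the defaults are 128 entries and 64 bytes):

> CONFIG SET hash-max-ziplist-entries 512
OK
> CONFIG SET hash-max-ziplist-value 4096
OK

As far as I understand, a hash stays ziplist-encoded only while every field and value is under both thresholds, so once a single chunk exceeds hash-max-ziplist-value the whole hash converts to a regular hashtable anyway.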

Any suggestions would be very much appreciated. 

Thanks.



Stefano Fratini

Jun 20, 2016, 11:54:42 PM
to Redis DB
Redis and "big data" are not best friends.
Storing files in Redis is almost certainly not the best idea.
A service like S3 would be better suited for this.

If you host on AWS and stream the file from S3, you get ms latencies.

How big is your CSV altogether?

Bala M

Jun 21, 2016, 12:33:45 AM
to redi...@googlegroups.com
Yeah, I understand Redis is primarily a key-value store that is best suited for caching. I chose Redis because this data is actually cleared after a certain process completes, which takes at most an hour. Also, writing to a file is not a good option, as it would have its own read and write latency.

The CSV file can be up to 10 MB. But when I parse a 10 MB compressed file such as an xlsx and convert it to CSV, it expands to as much as 50 MB. I have optimised the hash structure so that all the small chunks of hash data are treated as ziplists by Redis. However, I see no visible improvement in storage: 50 MB of data still takes around 45 MB of Redis memory. If Redis does not fit my need, what would be a robust alternative for my use case?
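
For what it's worth, this is how I check that the chunks really are ziplist-encoded (key name is a placeholder):

> OBJECT ENCODING csv:file1:part1
"ziplist"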

AlexanderB

Jun 21, 2016, 2:09:03 PM
to Redis DB
One option might be to keep the data compressed in Redis using msgpack.

I ran across this great write-up that would be a good read: http://blog.backslash.fr/optimizing-redis-memory-usage-with-messagepack-optimistic-locking-transaction-and-lua-script/
Essentially, they send and receive uncompressed data from Redis, but are able to use the msgpack library from a Lua script to pack/unpack the data before storing it in memory. If your data already has a 5:1 compression ratio as a compressed file, you might be able to get some big gains from something like this.
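
In sketch form (untested; key and field names made up), the round trip would look something like this: pack before the HSET, unpack after the HGET:

> EVAL "return redis.call('HSET', KEYS[1], ARGV[1], cmsgpack.pack(ARGV[2]))" 1 csv:file1 chunk:1 "a,b,c"
(integer) 1
> EVAL "return cmsgpack.unpack(redis.call('HGET', KEYS[1], ARGV[1]))" 1 csv:file1 chunk:1
"a,b,c"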

Bala M

Jun 24, 2016, 5:50:18 AM
to Redis DB
Thanks for the suggestion. That solves the problem of data storage. But I have one issue using msgpack: when I pass the JSON object as the value to the Lua script that uses cmsgpack to pack the data, it is treated as a plain string and is not encoded to the fullest.

Here's a sample of what I am trying to do.

>SCRIPT LOAD "local elem = cmsgpack.pack(ARGV[1]); redis.call('HSET',KEYS[1],KEYS[2],elem); return elem;"
SHA_DIGEST_FOR_THE_SCRIPT_ABOVE

>EVALSHA "SHA_DIGEST_FOR_THE_SCRIPT_ABOVE" 2 k1 k2 '{\"jsonkey\":\"jsonvalue\"}'

"\xb3{\"jsonkey\":\"jsonvalue\"}"


If you look at the packed data produced by the Lua script, it is the same as the JSON string, with just a single extra byte inserted at the start.


Can you identify what's wrong with this?

AlexanderB

Jun 24, 2016, 2:31:29 PM
to Redis DB
Your best bet might be to avoid JSON and come up with a solution that passes the keys/values in as separate arguments, msgpacks them, but still stores the results of the pack in ziplist-sized hashes in Redis. This would be a fair bit more work, but would likely pack better than trying to pack the entire JSON string.
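
Roughly like this (an untested sketch; converting numeric fields with tonumber is what actually buys you anything, since msgpack encodes numbers far more compactly than their string forms):

> EVAL "local row = {}; for i = 2, #ARGV do row[i-1] = tonumber(ARGV[i]) or ARGV[i] end; return redis.call('HSET', KEYS[1], ARGV[1], cmsgpack.pack(row))" 1 csv:file1 row:1 42 3.14 hello
(integer) 1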