Large data set insertion Error


Stephen Jones

May 1, 2014, 3:23:47 PM
to pycassa...@googlegroups.com
Hey there - 

I keep running into an error that I'm not sure how to resolve. I have a large JSON string that I'd like to insert as the value of a single column in a given row. Written out to a file on local disk, the JSON data is about 20 MB. When I call insert to write the JSON string into the column, the error below is all I get. I've tried setting the number of retries to -1 for unlimited retries, but no matter what, nothing is inserted, and the listeners I use to track the connection report a "Connection reset by peer" error. I don't have much database experience, but I'm hoping someone can shed some light on this issue and how it might be resolved. Thanks in advance for your help!

Traceback:
  File "/home/user/CreateSequenceIndex.py", line 43, in AddSequenceData
    idx = self.cf.insert(self.SEQ_KEY,{self.version: data})
  File "/usr/lib/python2.6/site-packages/pycassa/columnfamily.py", line 977, in insert
    allow_retries=self._allow_retries)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 554, in execute
    return getattr(conn, f)(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 150, in new_f
    return new_f(self, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/pycassa/pool.py", line 145, in new_f
    (self._retry_count, exc.__class__.__name__, exc))
pycassa.pool.MaximumRetryException: Retried 6 times. Last failure was error: [Errno 104] Connection reset by peer



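Example code:
import json

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

POOL_NAME = "mypool"
HOSTS = ['host001', 'host002']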

class StatsIndex(object):
    def __init__(self,SEQ_KEY):
        self.SEQ_KEY = SEQ_KEY
        self.version = 1
        connection = ConnectionTools(pool_name=POOL_NAME,host=HOSTS)
        self.cf = connection.GetColumnFamily(family='SequenceData')

    def AddSequenceData(self,SEQ_STATS):
        data = json.dumps(SEQ_STATS)
        idx = self.cf.insert(self.SEQ_KEY,{self.version: data})

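class ConnectionTools(object):
    def __init__(self, pool_name=None, host=None):
        self.pool_name = pool_name
        self.host = host or []  # avoid a shared mutable default argument
        # ConnectionPool's first positional argument is the keyspace name.
        # MyListener (not shown) is the listener used to track the connection.
        self.POOL = ConnectionPool(self.pool_name, server_list=self.host,
                                   listeners=[MyListener()], timeout=120)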

    def GetColumnFamily(self, pool=None, family=None):
        # Fall back to this object's own pool when none is passed in.
        if pool is None:
            return ColumnFamily(self.POOL, family)
        return ColumnFamily(pool, family)

SEQ_KEY = "some-key-name"
SEQ_STATS = {}  # placeholder: dictionary of my data
si = StatsIndex(SEQ_KEY)
si.AddSequenceData(SEQ_STATS)


Tyler Hobbs

May 1, 2014, 8:04:09 PM
to pycassa...@googlegroups.com
You're probably hitting this limit, which is set in cassandra.yaml:

# Frame size for thrift (maximum message length).
thrift_framed_transport_size_in_mb: 15

Instead of raising that, you may want to consider breaking up large values into several columns and inserting/fetching them separately.
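A minimal sketch of the "several columns" approach, not a definitive implementation. It assumes that `cf` is a pycassa ColumnFamily (anything with compatible `insert(key, columns)` and `get(key, columns=[...])` methods will do), that the column family uses a text comparator so string column names are accepted, and that the value is ASCII, as `json.dumps` produces by default, so one character is one byte. The helper names, the `chunk_NNNNNN` and `chunk_count` column names, and the 1 MB chunk size are all made up for illustration.

```python
import json

# Assumed chunk size: 1 MB per column, well under the 15 MB frame limit.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(value, chunk_size=CHUNK_SIZE):
    """Split a string into an ordered list of (column_name, piece) pairs."""
    pieces = [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]
    # Zero-padded names keep the pieces in order under a text comparator.
    return [("chunk_%06d" % n, piece) for n, piece in enumerate(pieces)]


def insert_chunked(cf, key, value, chunk_size=CHUNK_SIZE):
    """Insert each piece with its own insert() call so no request is large."""
    chunks = split_into_chunks(value, chunk_size)
    for name, piece in chunks:
        cf.insert(key, {name: piece})
    # Record how many pieces exist so the reader knows when to stop.
    cf.insert(key, {"chunk_count": str(len(chunks))})
    return len(chunks)


def fetch_chunked(cf, key):
    """Fetch the pieces one column at a time and join them back together."""
    count = int(cf.get(key, columns=["chunk_count"])["chunk_count"])
    names = ["chunk_%06d" % n for n in range(count)]
    return "".join(cf.get(key, columns=[name])[name] for name in names)
```

With the code from the original post, this would be called as `insert_chunked(self.cf, self.SEQ_KEY, json.dumps(SEQ_STATS))` and read back with `json.loads(fetch_chunked(self.cf, self.SEQ_KEY))`. The pieces are fetched one column per request on purpose: reading all of them in a single get would produce a response as large as the original value. The writes are not atomic, so a reader may see a partially written value while an insert is in progress.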


--
You received this message because you are subscribed to the Google Groups "pycassa-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pycassa-discu...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Tyler Hobbs
DataStax

Stephen Jones

May 2, 2014, 12:10:00 PM
to pycassa...@googlegroups.com
Seems that I'm asking a bit too much of Cassandra, and paring things down to the bare bones is the best option. Learning curve of a noob. Thanks for your help and guidance, Tyler. Cheers

- S


