We're running a single node. I know that's wrong, but I don't have a
choice right now. We'll be moving to a 16-node cluster shortly.
In the meantime:
I get lots of these in my log:
2011-07-25 03:45:34,791 Connection 7767968 (localhost:9160) in
ConnectionPool (id = 375856016) failed: Could not connect to localhost:9160
2011-07-25 03:45:34,792 Connection 7767968 (localhost:9160) in
ConnectionPool (id = 375856016) failed: Could not connect to localhost:9160
2011-07-25 03:45:34,803 Connection 7767968 (localhost:9160) in
ConnectionPool (id = 375856016) failed: Could not connect to localhost:9160
1: Is there a way to tell pycassa NOT to write these to the log at
level INFO?
2: This message is sort of confusing. It seems like sometimes the
message comes in, but the connection is eventually successful, since
writes happen. What does this message actually mean?
3: Is there a way to reduce the number of connection failures I get?
I'm running a multiprocess script, and sometimes all connections are
fine, sometimes many of them fail, sometimes all of them fail. I can't
relate it to anything else going on. Nothing else runs on the machine,
it doesn't relate to compactions, it doesn't show up in the cassandra log.
Thanks! Any help would be appreciated!
2011/7/25 Ernst D Schoen-René <er...@peoplebrowsr.com>
Regarding #3 - I'm not sharing the connection pool between worker processes. Each worker process spawns its own pool and uses it, while events are piped to it from the parent via a queue.
Is there any way to diagnose the cause of these failures? Should I up cassandra's log level?
I suspect that you might be hitting an open file limit on the Cassandra node. What's the output of 'cat /proc/<pid>/limits' where <pid> is the Cassandra process?
If that's not the problem, then increasing the Cassandra log level may help to uncover the issue.
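The check above can be scripted; the following is a minimal sketch (helper names are illustrative, and it assumes a Linux `/proc` filesystem):

```python
# Sketch: read /proc/<pid>/limits for the Cassandra process and pull
# out the "Max open files" soft/hard limits. Helper names are
# illustrative; requires a Linux /proc filesystem.

def parse_max_open_files(limits_text):
    """Return (soft, hard) open-file limits from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns after the label: soft limit, hard limit, units
            soft, hard = line[len("Max open files"):].split()[:2]
            return int(soft), int(hard)
    raise ValueError("no 'Max open files' line found")

def cassandra_open_file_limit(pid):
    with open("/proc/%d/limits" % pid) as f:
        return parse_max_open_files(f.read())
```

To compare against the live count, the open descriptors of the process can be counted with `ls /proc/<pid>/fd | wc -l`.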
2011/7/25 Ernst D Schoen-René <er...@peoplebrowsr.com>
I already raised the ulimit on cassandra significantly. The number of open files for cassandra is beneath that limit.
The cassandra logs show nothing relating to connections. That's part of the mystery.
Can you verify the open file limit through the method I described? It's common to misconfigure this limit because of several oddities with the limits module.
Are the connection failures happening quickly after startup (of the client), or do they take a while to appear? Can you describe any patterns with how the log messages show up?
Sorry I haven't been able to reply more quickly -- I unexpectedly had to fly out to OSCON :)
Did you make any progress on this?
If not, could you increase the pycassa log level to DEBUG and paste the logs here when the problem shows up? You can see an example of changing the log level here: http://pycassa.github.com/pycassa/api/pycassa/logging/pycassa_logger.html
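For reference, pycassa logs through the standard `logging` module, so both silencing the INFO messages (question 1) and the DEBUG request above come down to setting the level on its logger. A minimal sketch (the logger name `'pycassa'` is assumed; the pycassa_logger API linked above offers finer, per-subsystem control):

```python
import logging

# pycassa logs through the stdlib logging module. Raising the level on
# its logger hides the INFO-level connection-failure messages; lowering
# it to DEBUG produces the verbose output requested above.
log = logging.getLogger('pycassa')

log.setLevel(logging.WARNING)   # silence the INFO chatter
# ... or, when debugging the connection failures:
log.setLevel(logging.DEBUG)
```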
"timed out" indicates that you're hitting the client-side (pycassa) timeout. Depending on the application, it may be desirable to retry the query on a different node if it hasn't completed by this timeout, but it's worth looking at that timeout to make sure it matches your application needs.
In any case, this is definitely a different failure case than the previous log messages.
2011/7/27 Ernst D Schoen-René <er...@peoplebrowsr.com>
I'm doing batch writes. Maybe that's the problem
On 7/28/11 4:13 PM, Tyler Hobbs wrote:
The "timeout" kwarg in the ConnectionPool constructor controls the timeout; the default is 0.5 seconds. You're probably thinking of "pool_timeout", which controls how long a thread will wait when trying to get a connection from the connection pool (this only matters in multi-threaded environments).
Are you doing really large reads or writes that might take longer than 0.5 seconds to complete? Another possibility is that one particular node might be undergoing a large GC or compaction which could slow down the operation.
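As a sketch, raising that per-operation timeout for batch-heavy workloads looks like this (the 5-second value is illustrative, not a recommendation, and the guarded import just keeps the snippet self-contained when pycassa isn't installed):

```python
# Per-operation "timeout" raised for batch mutations. "pool_timeout"
# (how long a thread waits for a free connection) is separate and is
# left at its default here.
pool_kwargs = dict(
    keyspace='creds',
    server_list=['localhost:9160'],
    timeout=5.0,        # default is 0.5 s; batch writes can exceed that
    max_retries=5,
    pool_size=1,
    prefill=False,
)

try:
    from pycassa.pool import ConnectionPool
    connection_pool = ConnectionPool(**pool_kwargs)
except ImportError:
    connection_pool = None   # pycassa not available in this environment
```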
2011/7/28 Ernst D Schoen-René <er...@peoplebrowsr.com>
pycassa is *not* multi-process-safe. If one process forks itself and
shares the same cass connection, you'll have problems.
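The safe pattern, matching what Ernst describes earlier in the thread, is to build the pool inside each worker after the fork. A minimal sketch, with a stand-in `make_pool` since the point is the process structure rather than the pycassa call itself:

```python
import multiprocessing
import os

def make_pool():
    # Stand-in for pycassa.pool.ConnectionPool(...): the important part
    # is that it runs inside the child, after the fork, so no sockets
    # are shared across processes.
    return {'owner_pid': os.getpid()}

def worker(task_queue):
    pool = make_pool()            # private to this process
    while True:
        task = task_queue.get()
        if task is None:          # sentinel from the parent: shut down
            break
        # ... use `pool` to write `task` to Cassandra ...

if __name__ == '__main__':
    tasks = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(tasks,))
             for _ in range(2)]
    for p in procs:
        p.start()
    for item in ['event-1', 'event-2']:
        tasks.put(item)           # parent pipes events in, as described
    for _ in procs:
        tasks.put(None)
    for p in procs:
        p.join()
```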
> On Jul 28, 4:24 pm, Ernst D Schoen-René <er...@peoplebrowsr.com>
> wrote:
>> that would make sense. Thanks!
>>
>> On 7/28/11 4:20 PM, Tyler Hobbs wrote:
>>> Oops, meant to reply to the list on that last one.
>>> Batch writes could certainly take longer than half a second in some cases.
>>> 2011/7/28 Ernst D Schoen-René <er...@peoplebrowsr.com>
>>>> is the timeout set in the connection? I've got it set to
>>>> default, which I read as being 30 seconds. 30 seconds seems
>>>> like an awfully long time to have so many timeouts on all 8
>>>> of my cassandra nodes when this is the only app connecting to
>>>> it. I'm using ConnectionPool:
>>>>     connection_pool = pycassa.pool.ConnectionPool(keyspace='creds',
>>>>         server_list=CASSANDRA_NODE, max_retries=5, pool_size=1, prefill=False)
>>>> is there anything there that would cause it to timeout so much?
>>>> Thanks!
If reads seem to be heavy, make sure you're running a repair/compact
regularly.