One of my AppScale apps, which had been working fine for a long time, suddenly stopped working, and I am seeing a lot of exceptions. Running queries manually through remote_api gives similar exceptions: the same query sometimes works and sometimes fails. In addition, four datastore_server processes are continuously using about 100% CPU, and rebooting the machine doesn't help. Can someone look at these exceptions and suggest what might be wrong? Is my Cassandra database corrupted? How can I debug or fix this? Here is one failing query from the remote_api shell:
app-queues> QueueStatus.all(projection=['bot_id'], distinct=True).fetch(1)
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/root/appscale/AppServer/google/appengine/ext/db/__init__.py", line 2157, in fetch
return list(self.run(limit=limit, offset=offset, **kwargs))
File "/root/appscale/AppServer/google/appengine/ext/db/__init__.py", line 2326, in next
return self.__model_class.from_entity(self.__iterator.next())
File "/root/appscale/AppServer/google/appengine/datastore/datastore_query.py", line 2892, in next
next_batch = self.__batcher.next()
File "/root/appscale/AppServer/google/appengine/datastore/datastore_query.py", line 2754, in next
return self.next_batch(self.AT_LEAST_ONE)
File "/root/appscale/AppServer/google/appengine/datastore/datastore_query.py", line 2791, in next_batch
batch = self.__next_batch.get_result()
File "/root/appscale/AppServer/google/appengine/api/apiproxy_stub_map.py", line 615, in get_result
return self.__get_result_hook(self)
File "/root/appscale/AppServer/google/appengine/datastore/datastore_query.py", line 2528, in __query_result_hook
self._batch_shared.conn.check_rpc_success(rpc)
File "/root/appscale/AppServer/google/appengine/datastore/datastore_rpc.py", line 1222, in check_rpc_success
rpc.check_success()
File "/root/appscale/AppServer/google/appengine/api/apiproxy_stub_map.py", line 581, in check_success
self.__rpc.CheckSuccess()
File "/root/appscale/AppServer/google/appengine/api/apiproxy_rpc.py", line 155, in _WaitImpl
self.request, self.response)
File "/root/appscale/AppServer/google/appengine/ext/remote_api/remote_api_stub.py", line 285, in MakeSyncCall
handler(request, response)
File "/root/appscale/AppServer/google/appengine/ext/remote_api/remote_api_stub.py", line 334, in _Dynamic_Next
self._Dynamic_RunQuery(query, query_result, cursor_id)
File "/root/appscale/AppServer/google/appengine/ext/remote_api/remote_api_stub.py", line 295, in _Dynamic_RunQuery
'datastore_v3', 'RunQuery', query, query_result)
File "/root/appscale/AppServer/google/appengine/ext/remote_api/remote_api_stub.py", line 200, in MakeSyncCall
self._MakeRealSyncCall(service, call, request, response)
File "/root/appscale/AppServer/google/appengine/ext/remote_api/remote_api_stub.py", line 234, in _MakeRealSyncCall
raise pickle.loads(response_pb.exception())
ProtocolBufferReturnError: 500
Logs from datastore_server-4000.log:
ERROR:root:Lock /appscale/apps/appscaledashboard/locks/appscaledashboard%00%00RequestLogLine%3Aapp-queues146.148.38.9915691894024000000%01 in use by /appscale/apps/appscaledashboard/txids/tx0027118185
WARNING:root:Concurrent transaction exception for app id appscaledashboard with info acquire_additional_lock: There is already another transaction using /appscale/apps/appscaledashboard/locks/appscaledashboard%00%00RequestLogLine%3Aapp-queues146.148.38.9915691894024000000%01 lock
WARNING:root:Trying again to acquire lockinfo acquire_additional_lock: There is already another transaction using /appscale/apps/appscaledashboard/locks/appscaledashboard%00%00RequestLogLine%3Aapp-queues146.148.38.9915691894024000000%01 lock with retry #5
ERROR:root:Doing a rollback on transaction id 83796396 for app id app-queues
ERROR:root:((), {})
Traceback (most recent call last):
File "/root/appscale/AppDB/zkappscale/zktransaction.py", line 1142, in notify_failed_transaction
for item in self.run_with_retry(self.handle.get_children, txpath):
File "/usr/local/lib/python2.7/dist-packages/kazoo/client.py", line 267, in _retry
return self._retry.copy()(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/kazoo/retry.py", line 123, in __call__
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/kazoo/client.py", line 1031, in get_children
return self.get_children_async(path, watch, include_data).get()
File "/usr/local/lib/python2.7/dist-packages/kazoo/handlers/threading.py", line 102, in get
raise self._exception
NoNodeError: ((), {})
Logs from datastore_server-4001.log:
WARNING:root:Concurrent transaction exception for app id appscaledashboard with info acquire_additional_lock: There is already another transaction using /appscale/apps/appscaledashboard/locks/appscaledashboard%00%00RequestLogLine%3Aapp-queues146.148.38.9915691893915000000%01 lock
WARNING:root:Trying again to acquire lockinfo acquire_additional_lock: There is already another transaction using /appscale/apps/appscaledashboard/locks/appscaledashboard%00%00RequestLogLine%3Aapp-queues146.148.38.9915691893915000000%01 lock with retry #5
ERROR:root:Notify failed transaction removing lock: /appscale/apps/appscaledashboard/txids/tx0027118272
ERROR:root:Notify failed transaction removing lock: /appscale/apps/appscaledashboard/txids/tx0027118273
ERROR:root:Notify failed transaction removing lock: /appscale/apps/appscaledashboard/txids/tx0027118248
ERROR:root:Notify failed transaction removing lock: /appscale/apps/appscaledashboard/txids/tx0027118275
ERROR:root:Doing a rollback on transaction id 83796438 for app id app-queues
ERROR:root:((), {})
Traceback (most recent call last):
File "/root/appscale/AppDB/zkappscale/zktransaction.py", line 1142, in notify_failed_transaction
for item in self.run_with_retry(self.handle.get_children, txpath):
File "/usr/local/lib/python2.7/dist-packages/kazoo/client.py", line 267, in _retry
return self._retry.copy()(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/kazoo/retry.py", line 123, in __call__
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/kazoo/client.py", line 1031, in get_children
return self.get_children_async(path, watch, include_data).get()
File "/usr/local/lib/python2.7/dist-packages/kazoo/handlers/threading.py", line 102, in get
raise self._exception
NoNodeError: ((), {})
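The lock names in these logs are percent-encoded ZooKeeper node names. Decoding one shows that the contended lock belongs to a RequestLogLine entity of the appscaledashboard app, not to app-queues itself. Below is a minimal sketch to help inspect this, assuming the lock node name is simply a URL-quoted key (as the %00, %01 and %3A escapes suggest) and that each app's locks and transaction IDs live under /appscale/apps/<app_id>/locks and /appscale/apps/<app_id>/txids, the paths seen in the logs. The function names are hypothetical helpers, not part of AppScale.

```python
from urllib.parse import unquote


def decode_lock_path(path):
    """Percent-decode the last component of a ZooKeeper lock path.

    Assumption: the node name is a URL-quoted key, as the %00/%01/%3A
    escapes in the datastore_server logs suggest.
    """
    name = path.rsplit("/", 1)[-1]
    # Make the NUL (\x00) and SOH (\x01) separators visible instead of invisible bytes.
    return unquote(name).replace("\x00", "<00>").replace("\x01", "<01>")


def lock_summary(zk, app_id):
    """Count lock and transaction-id znodes for one app (hypothetical helper).

    `zk` is any object with a kazoo-style get_children(path) method,
    for example a started kazoo.client.KazooClient. The path layout is
    assumed from the log lines, not from AppScale documentation.
    """
    base = "/appscale/apps/%s" % app_id
    return {
        "locks": len(zk.get_children(base + "/locks")),
        "txids": len(zk.get_children(base + "/txids")),
    }
```

To use the second helper, create a kazoo client with something like KazooClient(hosts="127.0.0.1:2181") (the host and port are an assumption; point it at your ZooKeeper node), call start() on it, and pass it together with "appscaledashboard". Taking a few readings a minute apart may show whether lock and transaction nodes keep piling up. If a path does not exist, kazoo raises NoNodeError, the same exception seen in the tracebacks.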
It would be great if someone could help quickly, as this is a live server and the outage affects many users on my team.