On 16.12.20 at 09:30, Botond Dénes wrote:
> On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
>>
>> On 16.12.20 at 08:35, Botond Dénes wrote:
>>> Hi Michael,
>>>
>>>
>>> Is this a recently added node by any chance? Can you tell a bit about
>>> your cluster? What version are you running, do nodes have the same
>>> shard count? What is your schema? Is this happening with all tables or
>>> just one?
No, the node is not new; it has always been an 8-node cluster.
>>
>>
>> I checked the shard count and it's 1-7 on 7 nodes and 0-7 on one node,
>> does this matter?
>
> Yes, this means the new node has a different shard count. This is
> supported, but apparently there is a bug related to this that causes the
> symptoms you mention above. We couldn't find the bug yet.
The node with one shard less was not the node with the "non null" exception.
And it's only one table (out of 8). Is it possible that, with one node
having one shard less, other nodes throw compaction errors?
If so, I will stop the node, change the cpuset, and restart it.
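
For reference, the ranges quoted above ("1-7" and "0-7") can be turned into
CPU counts with a small sketch like the one below. It assumes one shard per
CPU listed in the cpuset, which I have not confirmed for this setup, and
count_cpus is a hypothetical helper, not part of any Scylla tooling.

```python
def count_cpus(cpuset: str) -> int:
    """Count the CPUs named in a cpuset string such as "0-7" or "1-3,5"."""
    total = 0
    for part in cpuset.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # Inclusive range, e.g. "1-7" covers CPUs 1 through 7.
            low, high = part.split("-", 1)
            total += int(high) - int(low) + 1
        else:
            int(part)  # raises ValueError if this is not a CPU number
            total += 1
    return total


if __name__ == "__main__":
    # Assumption: one shard per CPU in the set.
    for cpuset in ("1-7", "0-7"):
        print(cpuset, "->", count_cpus(cpuset), "CPUs")
```
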
>> After removing the mc-* files with the non null error and issuing a
>> repair, the non null errors are gone. But the first repair failed, so now
>> I will try a second "repair -pr" and see what happens.
>
> What error did repair fail with (if any)?
I will look through the logs.
> A cell is a value for a certain column in a table. We have detectors
> for large cells in place, as they often cause problems. This is not
> fatal, but you might want to check which cells are large.
Yes, but what counts as large (in bytes)?
The table has a text field which is mostly short, but can degenerate and
be over 1 MB in size.
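
To see which values are affected, the byte size of a text value can be
measured on the client side. This is only a minimal sketch: it assumes the
text is encoded as UTF-8 and uses 1 MiB as a placeholder threshold (the real
threshold of the large-cell detector is exactly what I am asking about).
cell_size_bytes and find_large_values are hypothetical helpers.

```python
# Placeholder threshold, NOT the detector's actual setting.
ASSUMED_THRESHOLD_BYTES = 1024 * 1024


def cell_size_bytes(value: str) -> int:
    """Return the size of a text value in bytes, assuming UTF-8 encoding."""
    return len(value.encode("utf-8"))


def find_large_values(values, threshold=ASSUMED_THRESHOLD_BYTES):
    """Return (index, size_in_bytes) for every value larger than threshold."""
    large = []
    for index, value in enumerate(values):
        size = cell_size_bytes(value)
        if size > threshold:
            large.append((index, size))
    return large


if __name__ == "__main__":
    sample = ["short text", "x" * (1024 * 1024 + 1)]
    for index, size in find_large_values(sample):
        print(f"value {index} is {size} bytes")
```

Note that the byte size can exceed the character count, since non-ASCII
characters take more than one byte in UTF-8.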
Michael