compaction failed: seastar::internal::backtraced<marshal_exception>, scylla 4.2

Michael

<micha-1@fantasymail.de>

Dec 15, 2020, 6:18:04 AM

to scylladb-users@googlegroups.com

Hi,

I get a massive number of messages like this in the log, mainly on 2 out
of 8 nodes, and more so if I run a repair.

Any hints what to do?

Thanks,
Michael

[2020-12-15 00:18:59,193] Repair session 5 failed
[2020-12-15 00:18:59,194] Repair session 5 finished
error: Repair job has failed with the error message: [2020-12-15 00:18:59,193] Repair session 5 failed
-- StackTrace --
java.lang.RuntimeException: Repair job has failed with the error message: [2020-12-15 00:18:59,193] Repair session 5 failed
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:124)
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

repair failed. aborting.

Scylla log:

Dez 15 09:01:25 mw-scylla08 scylla[7326]: [shard 3] compaction_manager - compaction failed: seastar::internal::backtraced<marshal_exception> (marshaling error: composite iterator - not enough bytes, expected 29801, got 2558 Backtrace: 0x334e5ad
0x334e8c0
0x334ed49
0xf1dbe9
0xf1de06
0xf1e051
0x12f6bfa
0x138ff72
0x13a3b80
0x13b8693
0x13b947a
0x13b9ea4
0x13bcad9
0x13bd635
0x116f281
0x117171d
0x1171c9a
0x140ae4e
0x140b993
0x140bf43
0x315bdfc
--------

N7seastar12continuationINS_8internal22promise_base_with_typeIJEEEZNS_5asyncIZZN8sstables10compaction5setupI33noop_compacted_fragments_consumerEENS_6futureIJEEET_ENUl20flat_mutation_readerE_clESC_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayISB_E4typeEDpNSH_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSB_DpOSK_EUlvE0_ZZNSA_14then_impl_nrvoISX_SA_EET0_SU_ENKUlvE_clEvEUlRS3_RSX_ONS_12future_stateIJEEEE_JEEE
--------

N7seastar12continuationINS_8internal22promise_base_with_typeIJEEENS_6futureIJEE12finally_bodyIZNS_5asyncIZZN8sstables10compaction5setupI33noop_compacted_fragments_consumerEES5_T_ENUl20flat_mutation_readerE_clESD_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayISC_E4typeEDpNSI_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSC_DpOSL_EUlvE1_Lb0EEEZZNS5_17then_wrapped_nrvoIS5_SZ_EENSG_ISC_E4typeEOT0_ENKUlvE_clEvEUlRS3_RSZ_ONS_12future_stateIJEEEE_JEEE
--------
seastar::(anonymous namespace)::thread_wake_task
--------

N7seastar12continuationINS_8internal22promise_base_with_typeIJN8sstables15compaction_infoEEEEZNS_5asyncIZNS3_10compaction3runI33noop_compacted_fragments_consumerEENS_6futureIJS4_EEESt10unique_ptrIS7_St14default_deleteIS7_EET_EUlvE_JEEENS_8futurizeINSt9result_ofIFNSt5decayISG_E4typeEDpNSK_IT0_E4typeEEE4typeEE4typeENS_17thread_attributesEOSG_DpOSN_EUlvE0_ZZNSA_IJEE14then_impl_nrvoIS10_SB_EET0_SX_ENKUlvE_clEvEUlRS5_RS10_ONS_12future_stateIJEEEE_JEEE
--------

seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_info>,
seastar::future<sstables::compaction_info>::finally_body<seastar::async<sstables::compaction::run<noop_compacted_fragments_consumer>(std::unique_ptr<sstables::compaction,
std::default_delete<sstables::compaction> >,
noop_compacted_fragments_consumer)::{lambda()#1}>(seastar::thread_attributes,
sstables::compaction::run<noop_compacted_fragments_consumer>(std::unique_ptr<sstables::compaction,
std::default_delete<sstables::compaction> >,
noop_compacted_fragments_consumer)::{lambda()#1}&&,
(std::decay<sstables::compaction::run<noop_compacted_fragments_consumer>(std::unique_ptr<sstables::compaction,
std::default_delete<sstables::compaction> >,
noop_compacted_fragments_consumer)::{lambda()#1}>::type&&)...)::{lambda()#3},
false>,
seastar::future<sstables::compaction_info>::then_wrapped_nrvo<seastar::future<sstables::compaction_info>,
{lambda()#3}>({lambda()#3}&&)::{lambda()#1}::operator()()
const::{lambda(seastar::internal::promise_base_with_type<sstables::compaction_info>&,
{lambda()#3}&, seastar::future_state<sstables::compaction_info>&&)#1},
sstables::compaction_info>
--------

seastar::continuation<seastar::internal::promise_base_with_type<>,
table::compact_sstables(sstables::compaction_descriptor)::{lambda(auto:1)#3},
seastar::future<sstables::compaction_info>::then_impl_nrvo<{lambda(auto:1)#3},
table::compact_sstables(sstables::compaction_descriptor)::{lambda(auto:1)#3}<>
>({lambda(auto:1)#3}&&)::{lambda()#1}::operator()()
const::{lambda(seastar::internal::promise_base_with_type<>&,
{lambda(auto:1)#3}&, seastar::future_state<seastar::future>&&)#1},
seastar::future>
--------

seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag>
>,
compaction_manager::submit(table*)::{lambda()#1}::operator()()::{lambda()#1}::operator()()::{lambda()#1}::operator()()::{lambda(seastar::future<>)#2},
{lambda()#1}::then_wrapped_nrvo<{lambda()#1}<seastar::bool_class<seastar::stop_iteration_tag>
>, {lambda()#1}>({lambda()#1}&&)::{lambda()#1}::operator()()
const::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag>
>&, {lambda()#1}&, seastar::future_state<>&&)#1}>
--------

seastar::continuation<seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag>
>,
seastar::with_lock<seastar::rwlock_for_read<std::chrono::_V2::steady_clock>,
compaction_manager::submit(table*)::{lambda()#1}::operator()()::{lambda()#1}>(seastar::rwlock_for_read<std::chrono::_V2::steady_clock>&,
compaction_manager::submit(table*)::{lambda()#1}::operator()()::{lambda()#1}&&)::{lambda({lambda()#1}&&)#2},
seastar::future<seastar::bool_class<seastar::stop_iteration_tag>
>::then_wrapped_nrvo<{lambda({lambda()#1}&&)#2},
compaction_manager::submit(table*)::{lambda()#1}::operator()()::{lambda()#1}&&>(seastar::rwlock_for_read<std::chrono::_V2::steady_clock>&)::{lambda()#1}::operator()()
const::{lambda(seastar::internal::promise_base_with_type<seastar::bool_class<seastar::stop_iteration_tag>
>&, auto:2&,
seastar::future_state<seastar::bool_class<seastar::stop_iteration_tag>
>&&)#1}, seastar::bool_class<seastar::stop_iteration_tag> >
--------

seastar::internal::repeater<compaction_manager::submit(table*)::{lambda()#1}>
--------

seastar::continuation<seastar::internal::promise_base_with_type<>,
seastar::future<>::finally_body<compaction_manager::submit(table*)::{lambda()#2},
false>, seastar::future<>::then_wrapped_nrvo<seastar::future<>,
compaction_manager::submit(table*)::{lambda()#2}>(compaction_manager::submit(table*)::{lambda()#2}&&)::{lambda()#1}::operator()()
const::{lambda(seastar::internal::promise_base_with_type<>&,
compaction_manager::submit(table*)::{lambda()#2}&,
seastar::future_state<>&&)#1}>
): retrying
Dez 15 09:01:25 mw-scylla08 scylla[7326]: [shard 3] compaction_manager - compaction task handler sleeping for 300 seconds
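
To see which nodes and shards are affected, failures like the one above can be tallied from the exported journal. This is only a rough sketch: it assumes the log was saved as plain text with each entry on a single (unwrapped) line, and the regular expression and function name are illustrative guesses based on the excerpt, not part of any Scylla tool.

```python
import re
from collections import Counter

# Matches journal lines such as:
#   Dez 15 09:01:25 mw-scylla08 scylla[7326]: [shard 3] compaction_manager - compaction failed: ...
# The pattern is an assumption derived from the excerpt, not an official log grammar.
FAILURE = re.compile(
    r"^\S+\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+scylla\[\d+\]:\s+"
    r"\[shard (?P<shard>\d+)\]\s+compaction_manager\s+-\s+compaction failed"
)

def count_compaction_failures(lines):
    """Return a Counter keyed by (host, shard) for 'compaction failed' entries."""
    counts = Counter()
    for line in lines:
        match = FAILURE.match(line)
        if match:
            counts[(match["host"], int(match["shard"]))] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Dez 15 09:01:25 mw-scylla08 scylla[7326]: [shard 3] compaction_manager - "
        "compaction failed: seastar::internal::backtraced<marshal_exception> (...)",
        "Dez 15 09:01:25 mw-scylla08 scylla[7326]: [shard 3] compaction_manager - "
        "compaction task handler sleeping for 300 seconds",
    ]
    for (host, shard), n in sorted(count_compaction_failures(sample).items()):
        print(f"{host} shard {shard}: {n}")
```

Running the same count on each node's log gives a quick picture of whether the failures cluster on particular shards.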

Michael

<micha-1@fantasymail.de>

Dec 15, 2020, 10:53:57 AM

to scylladb-users@googlegroups.com

On one of the two nodes I get errors for one mc-...-big-Data.db file
like "malformed_sstable_exception" (non-null component after null
component in sstable).

The other 7 nodes don't show this.
Is it OK to remove the sstable with rm mc-xxxx-* (on the stopped node),
then restart it and repair the table?

Thanks

Michael


Botond Dénes

<bdenes@scylladb.com>

Dec 16, 2020, 2:35:14 AM

to scylladb-users@googlegroups.com

Hi Michael,

On Tue, 2020-12-15 at 16:53 +0100, Michael wrote:
>
>
> On one of the two nodes I get get errors for one mc-...-big-Data.db
> file
> like "malformed_sstable_exception" (non-null component after null
> component in sstable)
>
> The other 7 nodes don't show this.
> Is it ok to remove the sstable with rm mc-xxxx-* (on the stopped
> node)
> then restart it and repair the table?

Yes, this should be fine if the other nodes don't have errors.
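
A slightly more cautious variant of the rm step is to move the files aside instead of deleting them, so they can still be inspected later. The sketch below is an illustration only: it assumes all component files of the sstable follow the mc-<generation>-big-<Component> naming seen in the file name quoted above, and the function name and directory arguments are hypothetical. It should only be run while the node is stopped.

```python
import shutil
from pathlib import Path

def quarantine_sstable(table_dir, generation, quarantine_dir):
    """Move every component file of one 'mc' sstable out of the table directory.

    Assumes component files are named mc-<generation>-big-<Component>,
    as in the mc-...-big-Data.db name quoted in the thread.
    """
    table_dir = Path(table_dir)
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # The "-big-" suffix keeps generation 12 from also matching generation 123.
    for component in sorted(table_dir.glob(f"mc-{generation}-big-*")):
        target = quarantine_dir / component.name
        shutil.move(str(component), str(target))
        moved.append(target)
    return moved
```

After the node is restarted, the repair of the table proceeds as described in the question.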

Is this a recently added node by any chance? Can you tell a bit about
your cluster? What version are you running, do nodes have the same
shard count? What is your schema? Is this happening with all tables or
just one?

FYI another user has seen these errors, see
https://github.com/scylladb/scylla/issues/7623


Michael

<micha-1@fantasymail.de>

Dec 16, 2020, 3:16:59 AM

to scylladb-users@googlegroups.com, Botond Dénes

On 16.12.20 at 08:35, Botond Dénes wrote:
> Hi Michael,

>
>
> Is this a recently added node by any chance? Can you tell a bit about
> your cluster? What version are you running, do nodes have the same
> shard count? What is your schema? Is this happening with all tables or
> just one?

Just one table.

It's an 8-node cluster running Scylla version 4.2.1, just updated.
Each node has 1 CPU with 8 cores, 64GB RAM and a 7TB HDD.

I checked the shard count and it's 1-7 on 7 nodes and 0-7 on one node,
does this matter?

After removing the mc-* files with the non-null error and issuing a
repair, the non-null errors are gone. But the first repair failed; now I
am trying a second "repair -pr" to see what happens.

Other nodes throw many of the "compaction failed.... not enough bytes,
expected .... but got ..... " errors.

I also see "Writing large cell" for this table. What is a large cell?
Does this abort compaction?

Michael

Botond Dénes

<bdenes@scylladb.com>

Dec 16, 2020, 3:30:53 AM

to micha-1@fantasymail.de, scylladb-users@googlegroups.com

On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
>
>
> On 16.12.20 at 08:35, Botond Dénes wrote:
> > Hi Michael,
> >
> >
> > Is this a recently added node by any chance? Can you tell a bit
> > about
> > your cluster? What version are you running, do nodes have the same
> > shard count? What is your schema? Is this happening with all tables
> > or
> > just one?
>
> Just one table.
>
> It's a 8 node cluster using scylla version 4.2.1, just updated.
> Each node has 1 CPU with 8 cores and 64GB RAM, 7TB hdd.
>
> I checked the shard count and it's 1-7 on 7 nodes and 0-7 on one
> node,
> does this matter?

Yes, this means the new node has a different shard count. This is
supported, but apparently there is a bug related to this that causes the
symptoms you mention above. We haven't found the bug yet.
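
To make the mismatch concrete: if the reported values are cpuset ranges and Scylla runs one shard per listed CPU (both are assumptions here, not something confirmed in this thread), "1-7" gives 7 shards while "0-7" gives 8. A minimal sketch with a hypothetical helper name:

```python
def cpus_in_cpuset(spec):
    """Count the CPUs named by a cpuset string such as '1-7' or '0,2-3'."""
    count = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            count += last - first + 1
        else:
            int(part)  # raises ValueError on malformed input
            count += 1
    return count

print(cpus_in_cpuset("1-7"))  # 7
print(cpus_in_cpuset("0-7"))  # 8
```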

>
> After removing the mc-* files with the non null error and issuing a
> repair, the non null errors are gone. But the first repair failed,
> now I
> try a second "repair -pr" and see what happens.

What error did repair fail with (if any)?

>
>
> Other node throw many of the "compaction failed.... not enough bytes,
> expected .... but got ..... "

This is probably the symptom of the same bug.

>
> I also see "Writing large cell" for this table, what is a large cell?
> Does this abort compactiion?

A cell is a value for a certain column in a table. We have detectors
for large cells in place, as they often cause problems. This is not
fatal, but you might want to check which cells are large.

>
>
>
> Michael

Michael

<micha-1@fantasymail.de>

Dec 16, 2020, 3:54:37 AM

to scylladb-users@googlegroups.com, Botond Dénes

On 16.12.20 at 09:30, Botond Dénes wrote:

> On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
>>
>> On 16.12.20 at 08:35, Botond Dénes wrote:
>>> Hi Michael,
>>>
>>>
>>> Is this a recently added node by any chance? Can you tell a bit
>>> about
>>> your cluster? What version are you running, do nodes have the same
>>> shard count? What is your schema? Is this happening with all tables
>>> or
>>> just one?

no, the node is not new, it was always an 8 node cluster.

>>
>>
>> I checked the shard count and it's 1-7 on 7 nodes and 0-7 on one
>> node,
>> does this matter?
>
> Yes, this means the new node has different shard count. This is
> supported but apparently there is a bug related to this that causes the
> symptoms you mention above. We couldn't find the bug yet

The node with one shard fewer was not the node with the "non null" exception.

And it's only one table (out of 8). Is it possible that with one node
having one shard fewer, other nodes throw compaction errors?

If so, I will stop the node, change the cpuset and restart it.

>> After removing the mc-* files with the non null error and issuing a
>> repair, the non null errors are gone. But the first repair failed,
>> now I
>> try a second "repair -pr" and see what happens.
>
> What error did repair fail with (if any)?

I will look through the logs.

> A cell is a value for a certain column in a table. We have detectors
> for large cells in place, as they often cause problems. This is not
> fatal, but you might want to check which cells are large.

Yes, but what is large (in bytes)?

The table has a text field which is mostly short, but can degenerate and
be over 1MB in size.

Michael

Botond Dénes

<bdenes@scylladb.com>

Dec 16, 2020, 6:14:08 AM

to micha-1@fantasymail.de, scylladb-users@googlegroups.com

On Wed, 2020-12-16 at 09:54 +0100, Michael wrote:
>
> On 16.12.20 at 09:30, Botond Dénes wrote:
> > On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
> > >
> > > On 16.12.20 at 08:35, Botond Dénes wrote:
> > > > Hi Michael,
> > > >
> > > >
> > > > Is this a recently added node by any chance? Can you tell a bit
> > > > about
> > > > your cluster? What version are you running, do nodes have the
> > > > same
> > > > shard count? What is your schema? Is this happening with all
> > > > tables
> > > > or
> > > > just one?
>
> no, the node is not new, it was always an 8 node cluster.

So did you replace one of the nodes?

>
>
> > >
> > >
> > > I checked the shard count and it's 1-7 on 7 nodes and 0-7 on one
> > > node,
> > > does this matter?
> >
> > Yes, this means the new node has different shard count. This is
> > supported but apparently there is a bug related to this that causes
> > the
> > symptoms you mention above. We couldn't find the bug yet
>
>
> The node with one shard less was not the node with the "non null"
> exception.
>
> And, it's only one table (out of 8). Is it possible that with one
> node
> having one shard less, other nodes throw compaction errors?

We don't know exactly what is causing this yet. An investigation is
ongoing, as another user has already reported these errors.

>
>
> If so, I stop the node, change the cpuset and restart it.

I think that should help prevent the error from happening in new sstables.

>
>
> > > After removing the mc-* files with the non null error and issuing
> > > a
> > > repair, the non null errors are gone. But the first repair
> > > failed,
> > > now I
> > > try a second "repair -pr" and see what happens.
> >
> > What error did repair fail with (if any)?
>
> I will look through the logs.
>
> > A cell is a value for a certain column in a table. We have
> > detectors
> > for large cells in place, as they often cause problems. This is not
> > fatal, but you might want to check which cells are large.
>
>
> yes, but what is large (in bytes)?
>
> The table has a text field which mostly is short, but can degenerate
> and
> be over 1MB in size

The default warning limit is 1MB. You can change this via the
compaction_large_cell_warning_threshold_mb config item.

>
>
> Michael
>
>

Michael

<micha-1@fantasymail.de>

Dec 16, 2020, 7:53:30 AM

to scylladb-users@googlegroups.com, Botond Dénes

On 16.12.20 at 12:14, Botond Dénes wrote:

> On Wed, 2020-12-16 at 09:54 +0100, Michael wrote:
>> On 16.12.20 at 09:30, Botond Dénes wrote:
>>> On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
>> no, the node is not new, it was always an 8 node cluster.
>
> So did you replace one of the nodes?

no, we did update to 4.1, then to 4.2

>> The node with one shard less was not the node with the "non null"
>> exception.
>>
>> And, it's only one table (out of 8). Is it possible that with one
>> node
>> having one shard less, other nodes throw compaction errors?
>
> We don't know exactly what is causing this yet. An investigation is
> ongoing as another user already reported these errors.
>
>>
>> If so, I stop the node, change the cpuset and restart it.
>
> I think that should help the error from happening in new sstables.

ok, this takes some time as all tables get resharded.

>
>> yes, but what is large (in bytes)?
>> The table has a text field which mostly is short, but can degenerate
>> and
>> be over 1MB in size
>
> The default warning limit is 1MB. You can change this via
> compaction_large_cell_warning_threshold_mb config item.

OK, thanks for the answers.

I will see if those non-null errors go away. They are still there, and some
errors of the kind "end of input, but not end of partition" have appeared. I
don't know yet why this happens on only one table, thankfully a small one (40GB).

Michael

Botond Dénes

<bdenes@scylladb.com>

Dec 16, 2020, 10:09:32 AM

to micha-1@fantasymail.de, scylladb-users@googlegroups.com

On Wed, 2020-12-16 at 13:53 +0100, Michael wrote:
>
> On 16.12.20 at 12:14, Botond Dénes wrote:
> > On Wed, 2020-12-16 at 09:54 +0100, Michael wrote:
> > > On 16.12.20 at 09:30, Botond Dénes wrote:
> > > > On Wed, 2020-12-16 at 09:16 +0100, Michael wrote:
> > > no, the node is not new, it was always an 8 node cluster.
> >
> > So did you replace one of the nodes?
>
> no, we did update to 4.1, then to 4.2

Oh, so the cluster didn't change and this started happening after the
update? Which version did you update from?

Michael

<micha-1@fantasymail.de>

Dec 16, 2020, 12:02:20 PM

to scylladb-users@googlegroups.com, Botond Dénes

On 16.12.20 at 16:09, Botond Dénes wrote:

>
>>> So did you replace one of the nodes?
>> no, we did update to 4.1, then to 4.2
>
> Oh, so the cluster didn't change and this started happening after the
> update? Which version did you update from?

from 4.0

> ok, this takes some time as all tables get resharded.
>>

I restarted the node and it resharded the tables; the last one took over
8000 sec for 540GB. Now the Scylla log has been quiet for 2 hours and the
node is not up.

Now, what to do? Just kill it and restart it again? The resharding seems
complete...

Top shows 6 threads running with 30% CPU and some more with less than 1%
CPU. As nothing gets logged now, I have no idea what the database is doing.

Busy spinning and waiting for.... something?

Michael

Michael

<micha-1@fantasymail.de>

Dec 17, 2020, 4:08:30 AM

to scylladb-users@googlegroups.com, Botond Dénes

On 16.12.20 at 18:02, Michael wrote:

After silence in the log for nearly 3 hours, the next message appeared
and 7 hours later the startup was complete.

I think there should be a periodic hint in the log about what the
database is doing.

As for the table with the non-null error: a repair on one of the nodes
fails instantly:

repair id 10 failed: std:out_of_range ()

What does this indicate? Is this an effect of the resharding? Isn't the
sharding something private to a node? How can it affect the other nodes?

Is this somehow recoverable?

Michael

Botond Dénes

<bdenes@scylladb.com>

Dec 17, 2020, 4:25:55 AM

to scylladb-users@googlegroups.com, Raphael S. Carvalho, Asias He

What was the next message?

>
> I think there should be some hint periodically in the log what the
> database does.

Raphael, is this a reshape?

>
> As for the table with the non null error: a repair on one of the
> nodes
> fails instantly:
>
> repair id 10 failed: std:out_of_range ()
>
> What does this indicate? Is this an effect of the resharding? Isn't
> the
> sharding something private to a node? How can it effect the other
> nodes?
>
> Is this somehow recoverable?

Asias?

>
>
> Michael
>

Michael

<micha-1@fantasymail.de>

Dec 17, 2020, 6:16:36 AM

to scylladb-users@googlegroups.com, Botond Dénes, Raphael S. Carvalho, Asias He

On 17.12.20 at 10:25, Botond Dénes wrote:

> On Thu, 2020-12-17 at 10:08 +0100, Michael wrote:
>>
>>

>>
>> After silence in the log for nearly 3 hours, the next message
>> appeared
>> and 7 hours later the startup was complete.
>
>
> What was the next message?

Here you see the gap in the log for 2.5 hours:

Dez 16 15:03:53 scylla01 scylla[8272]: [shard 0] compaction - Resharding...
Dez 16 15:03:53 scylla01 scylla[8272]: [shard 0] database - Resharded 546 GB in 8766.15 secconds, 62 MB/s

Dez 16 17:21:16 scylla01 scylla[8272]: [shard 6] compaction - Resharded 3 sstables to...
Dez 16 17:21:16 scylla01 scylla[8272]: [shard 6] compaction - Resharding...

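
As a sanity check on these numbers, the reported rate and the length of the silent period can be recomputed from the excerpt. This is a small sketch with hypothetical helper names; it assumes the log uses decimal units (1 GB = 1000 MB) and that both timestamps fall on the same day. Under those assumptions the rate comes out at about 62 MB/s, matching the log, and the gap between the two timestamps shown is roughly 2.3 hours.

```python
from datetime import datetime

def throughput_mb_per_s(gigabytes, seconds):
    """Average rate in MB/s, taking 1 GB = 1000 MB (an assumption about the log's units)."""
    return gigabytes * 1000 / seconds

def gap_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day timestamps taken from the log."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

rate = throughput_mb_per_s(546, 8766.15)
silence = gap_seconds("15:03:53", "17:21:16")
print(f"{rate:.1f} MB/s")
print(f"{silence / 3600:.2f} h of silence")
```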
Michael

Raphael S. Carvalho

<raphaelsc@scylladb.com>

Dec 17, 2020, 11:02:36 AM

to micha-1@fantasymail.de, ScyllaDB users, Botond Dénes, Asias He

On Thu, Dec 17, 2020 at 8:16 AM Michael <mic...@fantasymail.de> wrote:
>
>
>
> On 17.12.20 at 10:25, Botond Dénes wrote:
> > On Thu, 2020-12-17 at 10:08 +0100, Michael wrote:
> >>
> >>
>
> >>
> >> After silence in the log for nearly 3 hours, the next message
> >> appeared
> >> and 7 hours later the startup was complete.
> >
> >
> > What was the next message?
>
> Here you see the gap in the log for 2.5 hours:
>
> Dez 16 15:03:53 scylla01 scylla[8272]: [shard 0] compaction - Resharding...
> Dez 16 15:03:53 scylla01 scylla[8272]: [shard 0] database - Resharded
> 546 GB in 8766.15 secconds, 62 MB/s
>
> Dez 16 17:21:16 scylla01 scylla[8272]: [shard 6] compaction - Resharded
> 3 sstables to...

Looks like shard 6 was resharding while the log was silent. I'd love to
take a look at the full log; could you please upload it?

How many tables and shards do you have? I am also interested in the
schema of your tables.

Michael

<micha-1@fantasymail.de>

Dec 18, 2020, 4:52:49 AM

to scylladb-users@googlegroups.com, Raphael S. Carvalho, Botond Dénes, Asias He

On 17.12.20 at 17:02, Raphael S. Carvalho wrote:

>
> looks like shard 6 was resharding when log was silent. I'd love to
> take a look at the full log, could you please upload it?
>
> how many tables and shards do you have? i am also interested in the
> schema of your tables.
>

The log is not that small; how do I upload it?

I can filter out the "writing large cell / row" messages.
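
One possible way to do that filtering is sketched below. The message wording is assumed from this thread ("writing large cell" / "writing large row"), and the script name in the usage comment is hypothetical.

```python
import re
import sys

# Case-insensitive match for the large-data warnings mentioned in the thread.
# The exact wording of the messages is an assumption.
LARGE_DATA = re.compile(r"writing large (cell|row)", re.IGNORECASE)

def drop_large_data_warnings(lines):
    """Yield only the lines that are not large cell/row warnings."""
    for line in lines:
        if not LARGE_DATA.search(line):
            yield line

if __name__ == "__main__":
    # Usage: python filter_log.py < scylla.log > scylla-filtered.log
    sys.stdout.writelines(drop_large_data_warnings(sys.stdin))
```

Reading from standard input keeps memory use flat, which matters for a log that is "not that small".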

8 nodes, 7 shards per node, 16 tables, replication 3

The schema of the table which has the errors is:

id blob ; < 100 bytes
id2 int
f1 text ; < 100 bytes
f2 text ; < 10000 bytes (90%), some > 1MB
f3 int static
primary key ((id, id2), f1)

with SizeTieredCompactionStrategy, LZ4Compressor

Most schemas are similar to this one.

The f3 field size often causes "writing large cell" messages, but I'm not
sure if or how to fix this at the moment.

Thanks,

Michael

Moreno Garcia

<moreno@scylladb.com>

Dec 18, 2020, 10:18:58 AM

to scylladb-users@googlegroups.com, Raphael S. Carvalho, Botond Dénes, Asias He

On Fri, Dec 18, 2020 at 6:52 AM Michael <mic...@fantasymail.de> wrote:

> On 17.12.20 at 17:02, Raphael S. Carvalho wrote:
> >
> > looks like shard 6 was resharding when log was silent. I'd love to
> > take a look at the full log, could you please upload it?
> >
> > how many tables and shards do you have? i am also interested in the
> > schema of your tables.
> >
> The log is not that small, how to upload this?

https://docs.scylladb.com/troubleshooting/report_scylla_problem/#send-files-to-scylladb-support

Maybe it would be a good idea to open a GitHub issue to track the problem? https://github.com/scylladb/scylla/issues

> I can filter out the "writing large cell / row" messages.

https://docs.scylladb.com/troubleshooting/large_partition_table/

https://docs.scylladb.com/troubleshooting/large_rows_large_cells_tables/

> 8 nodes, 7 shards per node, 16 tables, replication 3
>
> The schema of the table which has the errors is:
>
> id blob ; < 100 bytes
> id2 int
> f1 text ; < 100 bytes
> f2 text ; < 10000 bytes (90%), some > 1MB
> f3 int static
> primary key ((id, id2), f1)
>
> with SizeTieredCompactionStrategy, LZ4Compressor
>
> Most schemas are similar to this one.
>
> The f3 field size often cause "writing large cell" messages, but I'm not
> sure if or how to fix this at the moment.
>
> Thanks,
>
> Michael

--

Moreno Garcia

Solutions Architect
