How to remove stuck DROP STREAM when ksql states it is not ready to serve requests?

Daniel

Jun 22, 2021, 8:58:02 AM
to ksqldb-users
Hi, we are stuck with our three ksqlDB servers (0.15 Docker images) after a failed DROP STREAM issued by a CI/CD pipeline. The associated INSERTQUERY should have been terminated first, but that failed due to a timeout. Unfortunately, the script then attempted the DROP STREAM anyway, which of course failed. Since then the server has been stuck and does not respond to REST API calls. Restarting the containers did not help; if anything, the restarts may have broken all of the containers rather than just one. Currently all three respond only with "KSQL is not yet ready to serve requests". How do we get out of this situation when the KSQL REST API is not accessible via the CLI or curl? How do we terminate the INSERTQUERY for good? The logs repeatedly show:

ERROR Stack trace: io.confluent.ksql.util.KsqlStatementException: Cannot drop <STREAMNAME>.
The following queries read from this source: [INSERTQUERY_1761].
The following queries write into this source: [].
You need to terminate them before dropping <STREAMNAME>.
Statement: drop stream  <STREAMNAME>  ;
        at io.confluent.ksql.engine.EngineExecutor.executeDdl(EngineExecutor.java:466)
        at io.confluent.ksql.engine.EngineExecutor.lambda$execute$0(EngineExecutor.java:113)
        at java.base/java.util.Optional.map(Optional.java:265)
        at io.confluent.ksql.engine.EngineExecutor.execute(EngineExecutor.java:113)
        at io.confluent.ksql.engine.KsqlEngine.execute(KsqlEngine.java:217)
        at io.confluent.ksql.rest.server.computation.InteractiveStatementExecutor.executePlan(InteractiveStatementExecutor.java:243)
        at io.confluent.ksql.rest.server.computation.InteractiveStatementExecutor.handleStatementWithTerminatedQueries(InteractiveStatementExecutor.java:198)
        at io.confluent.ksql.rest.server.computation.InteractiveStatementExecutor.handleRestore(InteractiveStatementExecutor.java:135)
        at io.confluent.ksql.rest.server.computation.CommandRunner.lambda$null$3(CommandRunner.java:276)
        at io.confluent.ksql.util.RetryUtil.retryWithBackoff(RetryUtil.java:89)
        at io.confluent.ksql.util.RetryUtil.retryWithBackoff(RetryUtil.java:60)
        at io.confluent.ksql.util.RetryUtil.retryWithBackoff(RetryUtil.java:41)
        at io.confluent.ksql.rest.server.computation.CommandRunner.lambda$processPriorCommands$4(CommandRunner.java:272)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at io.confluent.ksql.rest.server.computation.CommandRunner.processPriorCommands(CommandRunner.java:269)
        at io.confluent.ksql.rest.server.KsqlRestApplication.initialize(KsqlRestApplication.java:459)
        at io.confluent.ksql.rest.server.KsqlRestApplication.startKsql(KsqlRestApplication.java:385)
        at io.confluent.ksql.rest.server.KsqlRestApplication.startAsync(KsqlRestApplication.java:367)
        at io.confluent.ksql.rest.server.KsqlServerMain.tryStartApp(KsqlServerMain.java:89)
        at io.confluent.ksql.rest.server.KsqlServerMain.main(KsqlServerMain.java:64)
Caused by: io.confluent.ksql.util.KsqlReferentialIntegrityException: Cannot drop <STREAMNAME>.
The following queries read from this source: [INSERTQUERY_1761].
The following queries write into this source: [].
You need to terminate them before dropping <STREAMNAME>.
        at io.confluent.ksql.engine.EngineContext.throwIfInsertQueriesExist(EngineContext.java:307)
        at io.confluent.ksql.engine.EngineContext.executeDdl(EngineContext.java:266)
        at io.confluent.ksql.engine.EngineExecutor.executeDdl(EngineExecutor.java:462)
        ... 19 more

Cheers, Daniel

Sergio Pena Anaya

Jun 22, 2021, 6:07:16 PM
to Daniel, ksqldb-users
Hey Daniel,

Sorry you're experiencing this. I'm not sure how you got into this situation, though. I think the only way to fix it is to remove the stream and insert query from the command topic (which involves regenerating the command topic), but that could cause other issues if not done correctly.

First, you will need to stop the ksql servers, then back up the command topic to a file.
Second, you can remove the bad statements from the file and restore the command topic.
Third, start the ksql servers.

The only problem with the above steps is that queries have a number based on the topic offset and/or transaction. But I assume the TERMINATE and DROP statements were the last ones executed, so that problem should not apply in this case.

Here's a PR where we added the restore command in 0.15 - https://github.com/confluentinc/ksql/pull/6361
The backup file is just a file with the data shown in the command topic.  You can see the format of the backup file here (https://github.com/confluentinc/ksql/blob/master/design-proposals/klip-31-metastore-backups.md). Basically, it is just lines with KEY:VALUE format (as it appears in the command topic).
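
As a rough illustration of the "remove the bad statements from the file" step, here is a minimal Python sketch. It assumes only what is described above: the backup is a text file with one KEY:VALUE record per line. The function name `split_backup_lines` and the sample record layout in the usage note are hypothetical and are not taken from ksqlDB itself. The function does plain text matching on each line, so review the removed lines by hand before restoring anything. It does not address the query-numbering caveat mentioned above.

```python
import re


def split_backup_lines(lines, stream_name):
    """Split backup lines into (kept, removed).

    A line is removed when it contains a DROP STREAM statement for
    `stream_name`. Matching is case-insensitive and tolerates extra
    whitespace and optional backticks around the name. This is a
    text-level filter only; it does not parse the record.
    """
    pattern = re.compile(
        r"drop\s+stream\s+(?:if\s+exists\s+)?`?"
        + re.escape(stream_name)
        + r"`?\s*;",
        re.IGNORECASE,
    )
    kept, removed = [], []
    for line in lines:
        if pattern.search(line):
            removed.append(line)
        else:
            kept.append(line)
    return kept, removed
```

A possible way to use it is to read the backup file with `open(path).read().splitlines()`, call `split_backup_lines(lines, "<STREAMNAME>")`, inspect `removed`, and write `kept` to a new file so the original backup stays untouched.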

I'm happy to help on this issue in Slack if you join.


Daniel

Jun 22, 2021, 6:47:50 PM
to ksqldb-users

Hey Sergio, thanks for the reply! I was hoping for an action plan like this. We'll try this first thing in the morning. We already suspected that unless we somehow get this statement out of the command topic, it will always be fed again into any new ksql server with the same ksql_cluster name. I wasn't around when it got stuck, but the logs I got from the DROP STREAM showed the correct order: first TERMINATE INSERTQUERY_1761, which timed out and failed. Then the pipeline continued with DROP STREAM, which should not have been attempted after the failed termination of the persistent query. Maybe there was a bit too much load on the system at the time.
Feel free to continue there.
Thanks a lot! 
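
One way a pipeline step could avoid the sequence described above is to refuse to send the DROP STREAM unless the TERMINATE clearly succeeded. The following Python sketch illustrates that guard under stated assumptions: `execute` is a hypothetical callable supplied by the pipeline (for example, a wrapper around a POST to the server's `/ksql` endpoint) that returns True on success and may raise on a timeout. `StatementFailed` and `terminate_then_drop` are illustrative names, not part of ksqlDB.

```python
class StatementFailed(Exception):
    """Raised when a statement did not complete successfully."""


def terminate_then_drop(execute, query_id, stream_name):
    """Terminate a persistent query, then drop the stream it reads from.

    `execute` is a hypothetical callable that sends one statement and
    returns True on success. If the TERMINATE raises or returns a falsy
    value, the DROP STREAM is never sent.
    """
    terminate = f"TERMINATE {query_id};"
    try:
        ok = execute(terminate)
    except Exception as exc:  # e.g. a timeout from the HTTP client
        raise StatementFailed(f"{terminate} failed: {exc}") from exc
    if not ok:
        raise StatementFailed(f"{terminate} was not successful")

    drop = f"DROP STREAM {stream_name};"
    if not execute(drop):
        raise StatementFailed(f"{drop} was not successful")
    return [terminate, drop]
```

With a guard like this, a timed-out TERMINATE stops the script with an error instead of letting the DROP STREAM reach the command topic.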

Daniel

Jun 24, 2021, 6:11:04 AM
to ksqldb-users
Hi Sergio, we were not aware that an automated metadata backup is available in KSQL. That would have helped a lot, I guess. I did not see it mentioned in https://docs.ksqldb.io/en/latest/reference/server-configuration/, especially in the parameter documentation, so is this not yet available for general use? We currently run the 0.15 Docker image, with the brokers running on the 5.5.x rpm.
We tried to reconstruct the command topic, but failed to retain the streams. Maybe it was an issue with the offset, since we saw that there were a few statements after the DROP STREAM. The TERMINATE, however, did not make it into the command topic. I guess that was due to the timeout that was shown in KSQL. However, the last activity on the command topic files was around the time the DROP STREAM was issued.
Do you have any documentation links or blog entries regarding the metadata backup? We need to be able to recover gracefully from a metadata disaster which, unfortunately, neither we nor you can explain.

Jim Galasyn

Jun 25, 2021, 1:52:23 PM
to ksqldb-users
Re docs for ksql-restore-metadata, I've opened internal ticket DOCS-8831 to track this issue. Thank you for the feedback!

Jim Galasyn
Staff Technical Writer at Confluent