REPLACE OVERWRITE clarification

29 views
Skip to first unread message

richarde

unread,
Apr 18, 2024, 11:40:49 PMApr 18
to Druid User
Hi All,

I want to cleanup some data from a table and executed this to create a Test dataset. Worked just fine with the data removed.

REPLACE INTO Test OVERWRITE ALL
SELECT *
FROM TelegrafEvent
WHERE metric_group <> 'Lldp'
PARTITIONED BY MONTH

Is it safe to replace into the same table if there is a supervisor ingesting (Kafka) into the table? So if I execute this:

REPLACE INTO TelegrafEvent OVERWRITE ALL
SELECT *
FROM TelegrafEvent
WHERE metric_group <> 'Lldp'
PARTITIONED BY MONTH

The docs are not clear and I am not sure what will happen to the currently open segment held by the supervisor. I don't want to break anything or loose data.

Thanks.

John Kowtko

unread,
Apr 19, 2024, 8:54:10 AMApr 19
to Druid User
Hi Richard,

Unless you have concurrent append/replace enabled (HERE), the two ingestions will content for time interval locks, and streaming ingestion will win, and this REPACE job should fail/abort with a "lock revoked" error.

I suggest try it on a test table, set up streaming ingestion to it so you can see what happens to the job and how to identify from the logs and/or job report output that this is a lock revocation and not some other processing related error.

Let us know how it goes.

Thanks.  John

richarde

unread,
Apr 19, 2024, 8:57:42 PMApr 19
to Druid User
Ah ok, understood. Thanks for pointing out the new feature. What I may do is go slow and target only segments that are not being ingested into, wait for that segment to close and then do the final segment.

So my next question, if I use the REPLACE OVERWRITE with a TIMESTAMP do I need to use the corresponding date range in the SELECT query or can I still use the generic

SELECT *
FROM TelegrafEvent
WHERE metric_group <> 'Lldp'

REPLACE INTO <target table>
OVERWRITE WHERE __time >= TIMESTAMP '<lower bound>' AND __time < TIMESTAMP '<upper bound>'
< SELECT query >
PARTITIONED BY <time granularity>
[ CLUSTERED BY <column list> ]

John Kowtko

unread,
Apr 20, 2024, 10:37:09 AMApr 20
to Druid User
Hi Richard, it should fail with an "out of bounds" error ... e.g. ...

Screenshot 2024-04-20 at 7.36.59 AM.png

richarde

unread,
Apr 20, 2024, 5:47:59 PMApr 20
to Druid User
Awesome, thanks John. So I cannot break anything which is good.
Reply all
Reply to author
Forward
0 new messages