What is the preferred method for inserting 1 million rows of data each minute?


tdingus

Jun 2, 2017, 11:33:27 AM
to ClickHouse
Hi,

I would like to try ClickHouse out. The use case is inserting 1 million records each minute; each row has 30 columns (about 300 bytes per row).

a. What is the preferred method to do this insertion?

b. I may also need to delete old historical data to make room for new data. Is there such a thing as a bulk delete?

Thanks,

 - tdingus

tdingus

Jun 2, 2017, 11:47:22 AM
to ClickHouse
For question a, I meant to ask which interface is preferred:
the command-line client, the HTTP interface, JDBC, or the native interface?

Alex Zatelepin

Jun 2, 2017, 1:48:54 PM
to ClickHouse
Hi, tdingus!

a. 

First, the JDBC driver formats data as TSV and inserts it via HTTP, so in that regard the two are the same.
The command-line client parses the data and then transmits it via the native interface, which is more efficient.
BTW, which connector library do you mean when you say "native interface"? If your data is not originally in textual form, then transmitting it via the native interface without intermediate serialization into something like TSV is preferable, but actual performance may depend on the maturity of the library.

If your data is originally in TSV form, just use the command-line client.
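For the HTTP route, a minimal Python sketch of a batched insert follows. The server address `localhost:8123`, the table name `events`, and the column values are illustrative assumptions, not from the thread; ClickHouse generally prefers few large inserts over many tiny ones.

```python
import urllib.parse
import urllib.request

def rows_to_tsv(rows):
    """Serialize an iterable of row tuples into TabSeparated format,
    the text format ClickHouse accepts over HTTP."""
    return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)

def insert_batch(rows, table="events", url="http://localhost:8123/"):
    """POST one batch as a single INSERT statement.
    Assumes a reachable ClickHouse server and an existing table."""
    query = urllib.parse.urlencode(
        {"query": f"INSERT INTO {table} FORMAT TabSeparated"})
    body = rows_to_tsv(rows).encode("utf-8")
    req = urllib.request.Request(url + "?" + query, data=body)
    return urllib.request.urlopen(req).status

# Serialization alone needs no server:
payload = rows_to_tsv([(1, "a"), (2, "b")])
```

At 1 million rows/minute, one or two such POSTs per second with tens of thousands of rows each would be a reasonable starting batch size.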

b.

Yes, you can delete a whole month of data with the ALTER TABLE ... DROP PARTITION command. See the documentation: https://clickhouse.yandex/docs/en/query_language/queries.html#manipulations-with-partitions-and-parts
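As a sketch: in ClickHouse versions of that era, MergeTree tables are partitioned by month, and partitions are named YYYYMM. A hypothetical helper that builds the statement (the table name `events` is an assumption):

```python
def drop_partition_sql(table, year, month):
    """Build an ALTER ... DROP PARTITION statement for a monthly
    partition; old-style MergeTree partitions are named YYYYMM."""
    return f"ALTER TABLE {table} DROP PARTITION {year}{month:02d}"

# e.g. drop June 2017 from a hypothetical `events` table:
stmt = drop_partition_sql("events", 2017, 6)
```

Dropping a partition is a cheap metadata operation, which is why it is the idiomatic bulk-delete in ClickHouse.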

tdingus

Jun 2, 2017, 2:56:16 PM
to ClickHouse
Thanks, Alex.

a. Could you point me to the online documentation for the native interface?
    My data can be in any format, as long as it is fast.

b. From the link, it seems that I can only delete a whole partition, not individual part(s). Is that right?
    For my use case there would be a lot of data: 1 million (per minute) × 60 (minutes) × 24 (hours) × 30 (days).
    But if that is the only way available, it is probably OK.
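That retention figure works out as follows (a quick check, assuming a flat 1 million rows/minute and the ~300 bytes/row stated earlier; real on-disk size would be much smaller after compression):

```python
# 1M rows/minute, held for 30 days:
rows_per_month = 1_000_000 * 60 * 24 * 30   # minutes -> hours -> days

# At ~300 bytes per row, before compression:
bytes_per_month = rows_per_month * 300

print(rows_per_month)           # 43200000000 rows (43.2 billion)
print(bytes_per_month / 1e12)   # 12.96 (TB, uncompressed)
```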

Thanks again,

 - tdingus

Alex Zatelepin

Jun 2, 2017, 3:51:13 PM
to ClickHouse
We regard the native interface as internal, so it is undocumented. But it is actually not very complicated, and forward/backward compatibility is maintained, so other developers have reimplemented it in their libraries, using the ClickHouse source code as documentation.

Here are some examples:

You can try using them.

b. Yes, you can only drop whole partitions.

mvav...@cloudflare.com

Jun 7, 2017, 2:19:00 PM
to ClickHouse
We use https://github.com/kshvakov/clickhouse for inserts, using the direct mode; each inserter can write around 100k events/s.
It reads from a Kafka topic with 81 partitions, so in total it can insert up to ~8M events per second.
The important things are using the direct mode, to avoid database/sql driver conversion costs, and choosing a good batch size.
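The pattern described here (read from Kafka, accumulate, flush in large inserts) is language-agnostic; the poster uses the Go driver, but the accumulator itself can be sketched neutrally in Python. The batch size of 100,000 is an illustrative assumption, not a figure from the thread:

```python
def batches(events, batch_size=100_000):
    """Group a stream of events into fixed-size batches; the last
    batch may be smaller. Each yielded batch becomes one INSERT."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:          # flush the partial final batch
        yield batch

# e.g. 7 events with batch_size=3 -> batches of sizes 3, 3, 1:
sizes = [len(b) for b in batches(range(7), batch_size=3)]
```

In a real consumer you would also flush on a timer (e.g. once per second) so a slow topic does not hold a partial batch indefinitely, and commit Kafka offsets only after a successful insert.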

tdingus

Jun 7, 2017, 3:52:12 PM
to ClickHouse
It is good to know that ClickHouse works with Kafka. Could you provide more info, or examples, on the Kafka setup for ClickHouse?

Thanks

mvav...@cloudflare.com

Jun 7, 2017, 5:09:04 PM
to ClickHouse
I don't know what you're looking for specifically; see https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
There isn't anything special about the Kafka setup: you write a service that reads events from Kafka and writes them to ClickHouse in batches.
See the https://github.com/kshvakov/clickhouse repo for examples.