JDBC driver


Dmitry Zaytsev

Jun 15, 2016, 11:38:33 AM
to ClickHouse
Hi there, will the JDBC driver be available for download?

man...@gmail.com

Jun 15, 2016, 4:30:35 PM
to ClickHouse
It is not available yet, but I hope it will be available soon.
'Soon' is about a month.

Ivan Žužak

Jun 17, 2016, 7:18:33 PM
to ClickHouse
+1. It would be great if the JDBC driver supported real-time stats for the queries being executed. The HTTP API doesn't seem to support this, and the TCP API is not documented, but if you open-source the JDBC driver, I'm willing to help with that.

man...@gmail.com

Jun 18, 2016, 12:00:55 PM
to ClickHouse
By the way, information about currently executing queries is available via the SHOW PROCESSLIST query or the system.processes table.

Also, there is a query log for all queries that started or finished executing. It is enabled with the 'log_queries' setting and written to the system.query_log table. To enable it:
- for the current session: SET log_queries = 1;
- for a single HTTP request: add log_queries=1 as a URL parameter, as sketched below;
- for all new sessions: SET GLOBAL log_queries = 1;
- globally: add <log_queries>1</log_queries> under profiles/default in users.xml.
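For example, a minimal sketch in Java (assuming a server with the default HTTP interface on localhost:8123; adjust to your setup) that enables the query log for a single HTTP request:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class QueryLogExample {
    public static void main(String[] args) throws Exception {
        // log_queries=1 as a URL parameter makes the server record this query
        // in system.query_log. Encode spaces as %20 rather than '+'.
        String query = URLEncoder.encode("SELECT 1", "UTF-8").replace("+", "%20");
        URL url = new URL("http://localhost:8123/?log_queries=1&query=" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}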

Ivan Žužak

Jun 19, 2016, 5:44:12 PM
to ClickHouse
The HTTP API doesn't return the query id; how can I match the query ids returned from SHOW PROCESSLIST with the query that I execute via the HTTP client? Maybe you could set a query_id header and flush the socket buffer for the HTTP connection, wait for the query to finish, and then write the response body, but I'm not sure whether HTTP clients can read the response headers before the server writes the body and closes the connection.

Also, users will probably want to INSERT data into ClickHouse efficiently, and I guess there's a huge difference between the Values format and the Native format. Which formats are the most efficient for inserting data into TinyLog tables? If it's worth it for performance, maybe you could consider documenting the Native format? It seems that the JSON format also returns the column names and types and is probably used for JDBC, am I right?

Ivan Žužak

Jun 19, 2016, 9:33:55 PM
to ClickHouse
Alternatively, the HTTP API could support a server-sent-events content type and stream the query stats when that content type is requested. I think either the TCP API should be well documented or the HTTP API should be feature-rich, so that users can pick one of the APIs and write client libraries for different programming languages.

man...@gmail.com

Jun 19, 2016, 11:35:30 PM
to ClickHouse
The HTTP API doesn't return the query id; how can I match the query ids returned from SHOW PROCESSLIST with the query that I execute via the HTTP client?

It is possible to provide your own query_id when using the HTTP interface.
If a query with an identical query_id already exists among the queries of the same user, there are two options:
- an exception is thrown (the default);
- the existing query is cancelled (with the parameter replace_running_query set to 1).

https://clickhouse.yandex/reference_en.html#replace_running_query
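From Java over the HTTP interface, that could look roughly like this (a sketch; the host, port and query id value are placeholders of my own):

import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class QueryIdExample {
    public static void main(String[] args) throws Exception {
        // Provide our own query_id so the query can be matched against
        // SHOW PROCESSLIST / system.processes. replace_running_query=1 cancels
        // a still-running query with the same id (the default is an exception).
        String query = URLEncoder.encode("SELECT 1", "UTF-8").replace("+", "%20");
        URL url = new URL("http://localhost:8123/?query_id=my-query-42"
                + "&replace_running_query=1&query=" + query);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("HTTP " + conn.getResponseCode());
    }
}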

man...@gmail.com

Jun 19, 2016, 11:57:41 PM
to ClickHouse
Also, users will probably want to INSERT data into ClickHouse efficiently, and I guess there's a huge difference between the Values format and the Native format.

Yes, the Native format is more efficient.
Also, the RowBinary format is more efficient than Values or TabSeparated.

How big the difference is depends on:
- where the data comes from: whether your application can generate data at the limiting speed;
- how much time the application spends escaping strings and formatting numbers as text;
- what kind of tables you INSERT into: INSERTing into MergeTree is slower than into TinyLog or Log, because MergeTree sorts blocks of data.

A rough estimate of the expected INSERT performance on the server side is provided here:
https://clickhouse.yandex/reference_en.html#Performance%20on%20data%20insertion

If you write data in real time and never delete it, throughput per single server cannot be too high: otherwise the servers would fill up in a few days.

For example, suppose you have an advertisement network with 1,000,000 events/second, each event 1 KiB uncompressed (incoming traffic is 8 Gbit/s), and the data compresses 10 times: that is 86 TB/day uncompressed, 8.6 TB/day compressed. If your servers have 6 HDD × 8 TB in RAID-5 and a 2× replication factor, you need about 160 servers to store a year of data. And if the incoming data is evenly distributed across those servers, it is just 12,500 rows/second per server, which is a very small number. If you have servers with 24-HDD disk shelves, traffic per server will be about 50,000 rows/second, which is also rather small.
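Working through those numbers: 1,000,000 rows/s × 1 KiB ≈ 1 GB/s ≈ 8 Gbit/s; × 86,400 s/day ≈ 86 TB/day uncompressed, or 8.6 TB/day at 10× compression. A year of data is 8.6 TB/day × 365 days × 2 replicas ≈ 6.3 PB; with 6 × 8 TB in RAID-5 giving roughly 40 TB usable per server, that is about 160 servers. And 1,000,000 rows/s × 2 replicas / 160 servers ≈ 12,500 rows/s per server.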

INSERT performance becomes the limiting factor when you need to INSERT data for a long period at once, or when you store data for a short period and the incoming traffic is much higher.

man...@gmail.com

Jun 20, 2016, 12:03:48 AM
to ClickHouse

It seems that the JSON format also returns the column names and types and is probably used for JDBC, am I right?

Yes, the JDBC driver that we use internally uses the HTTP interface and the JSONCompact format.
The redundancy of the JSON format doesn't matter much if you are reading 'ready' reports for display.
But when reading intermediate data for postprocessing, it's better to use another format.
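For reference, JSONCompact output looks roughly like this (shape only; the column names and values are my own illustration): column names and types in 'meta', rows as arrays of arrays in 'data':

{
        "meta": [ { "name": "id", "type": "UInt32" }, { "name": "s", "type": "String" } ],
        "data": [ [ 1, "a" ], [ 2, "b" ] ],
        "rows": 2,
        "statistics": { "elapsed": 0.001, "rows_read": 2, "bytes_read": 20 }
}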

Ivan Žužak

Jun 20, 2016, 8:27:56 AM
to ClickHouse
Is there any documentation on the RowBinary format? The other formats are human-readable and therefore easy to generate, but RowBinary most probably uses custom binary encoding. For example, we're using Java, which has no unsigned numeric types, so we don't know how to encode UInt64 and the other unsigned types.

We're planning to create a StripeLog table for each API server and insert data into those tables in micro-batches (1 second); each API server will INSERT into its own table, so there will be only one writer per table. Another process will periodically (every hour) fetch the data from these StripeLog tables, write it to a ReplicatedMergeTree table, and delete them. We're also considering creating a new table per micro-batch (1 second per API server) in order to make the tables immutable, but we will probably need to test the performance of that.

When querying data, we will merge the StripeLog and ReplicatedMergeTree tables with UNION ALL so that the system is real-time. Most of the queries will look like this one:

SELECT ... FROM (SELECT * FROM distributedstripelog UNION ALL SELECT * FROM distributedmergetree) ...

However, we couldn't find out how to replicate the StripeLog tables. We need to replicate them so that if a ClickHouse worker fails, we don't lose data. The ReplicatedStripeLog and ReplicatedTinyLog table engines don't seem to be valid, but the documentation states:

The 'Replicated' prefix is added to the table engine name. For example, ReplicatedMergeTree.

However, the Replicated prefix is available only for MergeTree tables. If replication is not available for StripeLog tables and writing data to MergeTree tables in micro-batches is expensive, we need to find another way to make the system real-time. Maybe we could use a distributed file system such as GlusterFS and not use the Replicated table engines, but we would rather stick with ClickHouse replication, since it's well tested and probably performs better. Do you have any suggestions? How do you handle this at Yandex?

Thanks for the answers BTW, you're really helpful.

man...@gmail.com

Jun 20, 2016, 4:20:02 PM
to ClickHouse
A quick description of the RowBinary format:

1. Data is written in row order, without any headers or delimiters.
2. Numbers are written little endian, fixed width. For example, UInt64 takes 8 bytes.
3. DateTime is written as a UInt32 containing the unix timestamp.
4. Date is written as a UInt16 containing the number of days since 1970-01-01.
5. String is written as its length in varint format, followed by the bytes of the string. The varint is written as in Protobuf; see "Base 128 varints" here: https://developers.google.com/protocol-buffers/docs/encoding
6. FixedString is written as just the bytes.
7. Array is written as its length in varint format, followed by all elements, contiguously.

Example:

$ clickhouse-client --query="SELECT toUInt64(1), '1'" --format=RowBinary | xxd
0000000: 0100 0000 0000 0000 0131                 .........1
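Since Java has no unsigned types, here is a minimal sketch (my own illustration, not code from the driver) that writes the same (UInt64, String) row from Java following the rules above; a signed long carries an unsigned 64-bit value bit for bit, so writing its 8 bytes little endian is enough:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RowBinaryWriter {
    private final OutputStream out;

    public RowBinaryWriter(OutputStream out) { this.out = out; }

    // Rule 2: numbers are little endian, fixed width; UInt64 takes 8 bytes.
    // Java's long is signed, but its 64 bits are written as-is, so values
    // up to 2^64 - 1 round-trip correctly when treated as unsigned.
    public void writeUInt64(long value) throws IOException {
        for (int i = 0; i < 8; i++) {
            out.write((int) (value >>> (8 * i)) & 0xFF);
        }
    }

    // Rule 5: String is a varint length followed by the raw bytes.
    public void writeString(String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeVarInt(bytes.length);
        out.write(bytes);
    }

    // Base 128 varint, as in Protocol Buffers.
    private void writeVarInt(long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) (value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write((int) value);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        RowBinaryWriter w = new RowBinaryWriter(buf);
        w.writeUInt64(1);     // same row as the clickhouse-client example above
        w.writeString("1");
        for (byte b : buf.toByteArray()) {
            System.out.printf("%02x ", b);  // 01 00 00 00 00 00 00 00 01 31
        }
    }
}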

man...@gmail.com

Jun 20, 2016, 4:26:58 PM
to ClickHouse
Most of the queries will look like this one: SELECT ... FROM (SELECT * FROM distributedstripelog UNION ALL SELECT * FROM distributedmergetree) ...

I have just checked: removing unneeded columns from a subquery doesn't work when there is a UNION ALL inside.

When you write

SELECT avg(RegionID) FROM (SELECT * FROM test.hits)

it works fine: only the RegionID column is read from the test.hits table.

And if you write

SELECT avg(RegionID) FROM (SELECT * FROM test.hits UNION ALL SELECT * FROM test.hits)

all columns are read, and the query is slow.

It is a flaw.
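Until that is fixed, a workaround is to name only the needed columns in each part of the UNION ALL instead of using SELECT *:

SELECT avg(RegionID) FROM (SELECT RegionID FROM test.hits UNION ALL SELECT RegionID FROM test.hits)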

man...@gmail.com

Jun 20, 2016, 4:38:16 PM
to ClickHouse
However, we couldn't find out how to replicate the StripeLog tables. We need to replicate them so that if a ClickHouse worker fails, we don't lose data. The ReplicatedStripeLog and ReplicatedTinyLog table engines don't seem to be valid, but the documentation states

I recommend, for development, first writing data in batches directly to ReplicatedMergeTree, and only then thinking about how to store intermediate data.

StripeLog is not replicated.
Our application uses immutable StripeLog tables that are manually written to 3 different locations; it then writes information about where the tables are located to ZooKeeper.

Also, we have 'cloud databases' functionality in development (scheduled for 1-2 months). Its intent is to provide exactly these capabilities in ClickHouse.
In short, it is a special kind of database that presents an identical view across the 'cloud', where tables are stored in N locations and CREATE/DROP/RENAME operations are atomic.

man...@gmail.com

Jun 20, 2016, 4:41:15 PM
to ClickHouse
Maybe we could use a distributed file system such as GlusterFS and not use the Replicated table engines

It's not recommended to use any distributed filesystem, because of the higher network traffic and slower IO.

Ivan Žužak

Jun 20, 2016, 6:40:43 PM
to ClickHouse
Currently we use the Raptor connector of Prestodb, which is quite similar to ClickHouse. It uses the ORC format and stores the files in the local filesystem; however, instead of replication, it provides backup stores such as AWS S3 (a document store) and GlusterFS. I actually think that using another layer for backup data is a cleaner and cheaper solution. We use SSD for hot data and S3 for backups. Since the files are immutable, similar to ClickHouse, object stores such as S3 are a good alternative for backups. If a failure occurs, it fetches the data from S3 (cost-efficient but slow compared to SSD); since the data is otherwise always in the local filesystem (SSD), it works great. However, we're not experienced C++ developers, so I guess it would be hard for us to implement this mechanism in ClickHouse.

When we evaluated ClickHouse, the results were great compared to Prestodb. Even though the columnar storage optimizations of ORC and ClickHouse are quite similar, ClickHouse uses CPU and memory more efficiently (Presto also uses vectorized execution but cannot take advantage of hardware-level optimizations such as SIMD instruction sets because it's written in Java, so that's fair), so we also wanted to add ClickHouse support to our open-source analytics platform Rakam (https://github.com/rakam-io/rakam).

ClickHouse is capable of executing complex analytical queries such as funnel and retention queries, which is great, but we also need to make the system real-time. I think manually writing data to multiple StripeLog tables would be hard for us, so for now we may use Kafka and write the data to ReplicatedMergeTree tables each minute in micro-batches.

My understanding is that we should wait for you to make replication and scaling (adding/removing nodes) easy to configure and convenient. Otherwise, when you release new features such as 'cloud databases', we will need to adapt to them anyway.

One last question: the OPTIMIZE query doesn't seem to affect MergeTree tables. Each insert into a MergeTree table creates a new directory and a new set of files for the columns, so if we perform too many INSERT queries, I guess it will eventually become inefficient. Does ClickHouse automatically merge the files in order to make them efficient for querying?

BTW, we already developed a prototype ClickHouse client using the HTTP API for Rakam, but if you open-source the JDBC client, we would be happy to contribute. I think ClickHouse may become a widely adopted technology in the near future with the help of the community.

Ivan Žužak

Jun 20, 2016, 11:06:43 PM
to ClickHouse
Never mind, it seems ClickHouse automatically merges the files eventually. It turns out that the INSERT performance of MergeTree is not that bad even if we push data at small intervals. Also, I'm able to use the RowBinary format from Java; it's more compact, and the performance is better than with the human-readable formats. Thanks for the guidance.

We evaluated some AWS S3-backed filesystems that write data to both the local filesystem and S3 and read from the local filesystem, but since ClickHouse creates a single file for each column of each batch, it didn't work as we expected: we had to put each file to S3 separately, which takes some time since each operation is a single network request. ClickHouse replication seems to be well tested and used in production, so we decided to stick with that.

man...@gmail.com

Jun 21, 2016, 10:55:43 AM
to ClickHouse
Since the files are immutable, similar to ClickHouse, object stores such as S3 are a good alternative for backups. If a failure occurs, it fetches the data from S3

Just for clarification: the files are immutable, but they are constantly merged in the background (as you have seen), and old files get removed (not instantly, but about 8 minutes after the merge). So if you simply back up all the files, the backup will be much larger than the dataset itself.

When we evaluated ClickHouse, the results were great compared to Prestodb. Even though the columnar storage optimizations of ORC and ClickHouse are quite similar, ClickHouse uses CPU and memory more efficiently (Presto also uses vectorized execution but cannot take advantage of hardware-level optimizations such as SIMD instruction sets because it's written in Java, so that's fair), so we also wanted to add ClickHouse support to our open-source analytics platform Rakam (https://github.com/rakam-io/rakam)

Nice to hear! I would appreciate it if it were used in Rakam.


use Kafka and write the data to ReplicatedMergeTree tables each minute in micro-batches.

Reasonable.


We already developed a prototype ClickHouse client using the HTTP API for Rakam, but if you open-source the JDBC client, we would be happy to contribute.

We are going to open-source the JDBC driver.

man...@gmail.com

Jul 18, 2016, 3:47:11 PM
to ClickHouse
JDBC driver has been open-sourced: https://github.com/yandex/clickhouse-jdbc

Just the sources right now; pre-built binaries are expected in a few days.