Cannot parse input (and it doesn't seem to be the data itself)


Aleksey Studnev

Feb 2, 2018, 3:32:00 AM
to ClickHouse
A strange error, apparently somehow related to the data, though it's unclear how.
Through experimentation I found that a small change to the data makes the error disappear.
For example, it's enough to simply reorder the records or exclude a few (2 or more) of the first ones.
Those same first records load without any problems on their own.

I'm attaching the file; the trace and schema are below.

Can anyone advise how to debug this further?

I get the error:

<Error> HTTPHandler: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: 76043,2017,7,12,3,16,1,4013462,\'0x3148a5385906ff85b007b47dd71cdce3e8bc5fcb7bdd17b8406664b3d39f6aac\',\'0xd997e1beb7450d6d07085a6357451364b992844e\',\'0xd997e1beb745: (at row 722)


when loading data with the query:


http://clichouse?query=INSERT%20INTO%20references(tx_date,tx_time,year,month,mday,wday,hour,blockchain_id,block,tx_hash,tx_sender,tx_from,tx_to,address,smart_contract_address,smart_contract_id,signature_id,argument,external,internal)%20VALUES%20

table schema:

CREATE TABLE references
(
tx_date Date,
tx_time DateTime,
year UInt16,
month UInt8,
mday UInt8,
wday UInt8,
hour UInt8,


blockchain_id UInt32,
block UInt32,
tx_hash String,
tx_sender String,
tx_from String,
tx_to String,
address String,
smart_contract_address String,
smart_contract_id UInt32,
signature_id UInt32,
argument String,

external UInt32,
internal UInt32

) ENGINE = MergeTree(tx_date, tx_date, 8192);
trace.log
references.txt

Vitaliy Lyudvichenko

Feb 6, 2018, 9:33:58 AM
to ClickHouse

The VALUES format assumes that DateTime values are passed in the format '2017-07-12 19:14:03'.

If you pass tx_time in that format, everything will insert (and it will even parse faster).

Just in case, it's also worth setting input_format_values_interpret_expressions=0.


A fair question: why does it still work every other time when tx_time is passed as a unix timestamp?

Initially CH tries to parse the data with a fast streaming parser specialized for the VALUES format.

If that parsing stumbles, by default it switches to the general parser (which can parse arbitrary SQL expressions).

This is done so that small queries like INSERT INTO table VALUES (2 + 2 * 2) can work.

Unfortunately, the general parser cannot yet handle large volumes of data (more than 1 MB, because in that case the data is split into several blocks).

That is exactly what happens in your case: when the general parser manages to cope, everything gets inserted; when it doesn't, you get this cryptic error.

To disable the general parser, set input_format_values_interpret_expressions=0.
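As a sketch of the fix suggested above (the function name and the columns in the example row are my own illustration, not from this thread), a unix timestamp can be pre-rendered into the 'YYYY-MM-DD hh:mm:ss' literal that the fast VALUES parser accepts for DateTime columns:

```python
from datetime import datetime, timezone

def ts_to_values_literal(unix_ts):
    """Render a unix timestamp as the quoted 'YYYY-MM-DD hh:mm:ss'
    string that the fast VALUES parser accepts for DateTime."""
    dt = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return dt.strftime("'%Y-%m-%d %H:%M:%S'")

# Example row for a VALUES clause (columns are illustrative):
row = "({},{})".format(ts_to_values_literal(1499886843), 2017)
# ts_to_values_literal(1499886843) → "'2017-07-12 19:14:03'"
```

Whether to render in UTC or the server's timezone depends on how the DateTime column is interpreted on your server, so treat the tz choice here as an assumption.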



On Friday, February 2, 2018 at 11:32:00 AM UTC+3, Aleksey Studnev wrote:

Aleksey Studnev

Feb 6, 2018, 9:54:14 AM
to ClickHouse
thanks, I'll try it

Aleksey Studnev

Feb 6, 2018, 9:54:54 AM
to ClickHouse
though this explanation doesn't account for why the error disappears when the data is shuffled

On Tuesday, February 6, 2018 at 5:54:14 PM UTC+3, Aleksey Studnev wrote:
thanks, I'll try it

Vitaliy Lyudvichenko

Feb 6, 2018, 10:41:55 AM
to ClickHouse
Actually, it does account for it (it's just long and pointless to explain all the implementation details).
If some tx_time value happens to land on the boundary of a 1 MB block, everything falls over.

On Tuesday, February 6, 2018 at 5:54:54 PM UTC+3, Aleksey Studnev wrote:

kaushik subramanian

Oct 23, 2019, 2:59:39 PM
to ClickHouse
Hello,
I saw the same issue, and based on the recommendation in this thread I used the YYYY-MM-DD hh:mm:ss format for timestamps along with setting input_format_values_interpret_expressions = 0. In spite of this, I continue to see the issue. I am using build 19.13.1.11. Can you please help me understand why the workaround is not working?

Kaushik

Denis Zhuravlev

Oct 23, 2019, 4:32:14 PM
to ClickHouse
>I saw the same issue and based on recommendation in this thread,
>I used YYYY-MM-DD hh:mm:ss format for timestamps along with setting input_format_values_interpret_
>expressions = 0. In spite of this I continue to see the issue. I am using build 19.13.1.11.
>Can you please help to understand why the workaround is not working?

What is the error? How are you inserting the data?

There are thousands of possible reasons for this error, and they may not be related to DateTime at all.

create table Y (A Int64, B String) Engine=Memory;
insert into Y(A, B) Format TSV (5, ('a')
Exception on client:
Code: 27. DB::Exception: Cannot parse input: expected \t before: (5, (\'a\'): (at row 1)


input_format_values_interpret_expressions=0 -- this setting forces the fast streaming parser.
Sometimes it is needed for performance. It is not a workaround for arbitrary errors.

kaushik subramanian

Oct 23, 2019, 4:44:56 PM
to ClickHouse
Hi Denis,
Here is the error returned back from the database,
Error while inserting - test: { Error: Cannot parse input: expected , before: 180.28.44.179,92.168.133.152,443,www.saas30268.com,12,2,240,1,5,46474,36393,829,8333,51,89,87:22:71:62:32:46,TestPhoton,TestPhone,Kaushik,\n11,22,1,33,2019-10-21: (at row 9966)Could not print diagnostic info because two last rows aren't in buffer (rare case) (version 19.13.1.11 (official build))
    at parseError (/home/vagrant/work/velocloud.src/vco/src/server/node/node_modules/@apla/clickhouse/src/parse-error.js:2:15)
    at errorHandler (/home/vagrant/work/velocloud.src/vco/src/server/node/node_modules/@apla/clickhouse/src/clickhouse.js:29:13)
    at IncomingMessage.<anonymous> (/home/vagrant/work/velocloud.src/vco/src/server/node/node_modules/@apla/clickhouse/src/clickhouse.js:97:11)
    at emitNone (events.js:111:20)
    at IncomingMessage.emit (events.js:208:7)
    at endReadableNT (_stream_readable.js:1064:12)
    at _combinedTickCallback (internal/process/next_tick.js:139:11)
    at process._tickCallback (internal/process/next_tick.js:181:9) code: 27, scope: 'DB::Exception:' }


Here is the error logged in the database log file:
2019.10.23 20:35:36.780895 [ 164 ] {a72fbd57-52f3-4d40-9c51-4431c373fecc} <Error> HTTPHandler: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: 180.28.44.179,92.168.133.152,443,www.saas30268.com,12,2,240,1,5,46474,36393,829,8333,51,89,87:22:71:62:32:46,TestPhoton,TestPhone,Kaushik,\n11,22,1,33,2019-10-21: (at row 9966)
Could not print diagnostic info because two last rows aren't in buffer (rare case)
, Stack trace:

0. clickhouse-server(StackTrace::StackTrace()+0x30) [0x7f58f30]
1. clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x3c427b5]
2. clickhouse-server() [0x37b4139]
3. clickhouse-server(DB::CSVRowInputStream::read(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x5d6) [0x7580536]
4. clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x170) [0x757d1c0]
5. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
6. clickhouse-server(DB::AddingDefaultsBlockInputStream::readImpl()+0x36) [0x6bfb4e6]
7. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
8. clickhouse-server(DB::InputStreamFromASTInsertQuery::readImpl()+0x28) [0x6e899f8]
9. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
10. clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x6b) [0x6c1c9db]
11. clickhouse-server(DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)+0x290) [0x6e889c0]
12. clickhouse-server(DB::HTTPHandler::processQuery(Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&)+0x1a5e) [0x3c8fc3e]
13. clickhouse-server(DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&)+0x45a) [0x3c91fba]
14. clickhouse-server(Poco::Net::HTTPServerConnection::run()+0x2a9) [0x7932009]
15. clickhouse-server(Poco::Net::TCPServerConnection::start()+0x10) [0x792cde0]
16. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0xed) [0x792d4fd]
17. clickhouse-server(Poco::PooledThread::run()+0x81) [0x8057e21]
18. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x3c) [0x8055bcc]
19. clickhouse-server() [0xba2baa0]
20. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f3cdb5f6184]
21. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f3cdae1903d]
 (version 19.13.1.11 (official build))


We are inserting from a CSV file. I have attached a sample file for you to see. Also, I have set input_format_values_interpret_expressions to 0 as per the recommendation (assuming I am seeing the same problem).

- Kaushik
sample_insert.csv

Denis Zhuravlev

Oct 23, 2019, 5:14:41 PM
to ClickHouse
Something is wrong at row 9966.

try 
sed '9965q;d' sample_insert.csv
sed '9966q;d' sample_insert.csv
sed '9967q;d' sample_insert.csv

and provide the CREATE TABLE statement.
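The `sed 'Nq;d' file` idiom above prints only line N of the file. For reference, the same single-row lookup can be sketched in Python (the function name is illustrative, not from this thread):

```python
def csv_line(path, n):
    """Return the n-th line (1-based) of a file without the trailing
    newline, like `sed 'Nq;d' path`; None if the file is shorter."""
    with open(path) as f:
        for i, line in enumerate(f, start=1):
            if i == n:
                return line.rstrip("\n")
    return None
```

Reading lazily this way avoids loading a multi-gigabyte CSV into memory just to inspect one suspect row.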

Denis Zhuravlev

Oct 23, 2019, 6:23:18 PM
to ClickHouse
Row 9966 came from the error message:


Error while inserting - test: { Error: Cannot parse input: expected , before: 180.28.44.179,92.168.133.152,
443,www.saas30268.com,12,2,240,1,5,46474,36393,829,8333,51,89,87:22:71:62:32:46,TestPhoton,TestPhone,Kaushik,\n11,22,1,33,2019-10-21: (at row 9966)


kaushik subramanian

Oct 23, 2019, 6:27:46 PM
to ClickHouse
Hi Denis,
I have attached the CREATE TABLE command as per your request. Please note that when I rearrange the data, the insert goes through just fine. Sometimes the data gets inserted without any issues and sometimes it fails, even though the data set I am trying to insert is the same. Researching the issue got me to this thread, where Aleksey also saw the data get processed simply by rearranging it. Thoughts?

Kaushik
sample_create

kaushik subramanian

Oct 23, 2019, 6:44:42 PM
to ClickHouse
Hi Denis,
Here is more information on the issue. I just caught an insert that failed; the clickhouse-server log file complained about row 5050.

2019.10.23 22:30:23.052581 [ 129 ] {b41290ae-61eb-4bf5-9f26-3a974aac2e19} <Error> executeQuery: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: 1,8,51571,96625,7185,4627,49,100,64:33:35:55:32:84,TestPhoton,TestPhone,Kaushik,\n11,22,4,33,2019-10-21 07:43:58,2019-10-21 07:48:58,1,1234,54.193.11.251,148.81.: (at row 5050)
Could not print diagnostic info because two last rows aren't in buffer (rare case)
 (version 19.13.1.11 (official build)) (from 127.0.0.1:46113) (in query: INSERT INTO velocloud_stats.VELOCLOUD_FLOW_STATS FORMAT CSV ), Stack trace:

0. clickhouse-server(StackTrace::StackTrace()+0x30) [0x7f58f30]
1. clickhouse-server(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x3c427b5]
2. clickhouse-server() [0x37b4139]
3. clickhouse-server(DB::CSVRowInputStream::read(std::vector<COW<DB::IColumn>::mutable_ptr<DB::IColumn>, std::allocator<COW<DB::IColumn>::mutable_ptr<DB::IColumn> > >&, DB::RowReadExtension&)+0x5d6) [0x7580536]
4. clickhouse-server(DB::BlockInputStreamFromRowInputStream::readImpl()+0x170) [0x757d1c0]
5. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
6. clickhouse-server(DB::AddingDefaultsBlockInputStream::readImpl()+0x36) [0x6bfb4e6]
7. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
8. clickhouse-server(DB::InputStreamFromASTInsertQuery::readImpl()+0x28) [0x6e899f8]
9. clickhouse-server(DB::IBlockInputStream::read()+0x187) [0x6c04667]
10. clickhouse-server(DB::copyData(DB::IBlockInputStream&, DB::IBlockOutputStream&, std::atomic<bool>*)+0x6b) [0x6c1c9db]
11. clickhouse-server(DB::executeQuery(DB::ReadBuffer&, DB::WriteBuffer&, bool, DB::Context&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>)+0x290) [0x6e889c0]
12. clickhouse-server(DB::HTTPHandler::processQuery(Poco::Net::HTTPServerRequest&, HTMLForm&, Poco::Net::HTTPServerResponse&, DB::HTTPHandler::Output&)+0x1a5e) [0x3c8fc3e]
13. clickhouse-server(DB::HTTPHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&)+0x45a) [0x3c91fba]
14. clickhouse-server(Poco::Net::HTTPServerConnection::run()+0x2a9) [0x7932009]
15. clickhouse-server(Poco::Net::TCPServerConnection::start()+0x10) [0x792cde0]
16. clickhouse-server(Poco::Net::TCPServerDispatcher::run()+0xed) [0x792d4fd]
17. clickhouse-server(Poco::PooledThread::run()+0x81) [0x8057e21]
18. clickhouse-server(Poco::ThreadImpl::runnableEntry(void*)+0x3c) [0x8055bcc]
19. clickhouse-server() [0xba2baa0]
20. /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184) [0x7f3cdb5f6184]
21. /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f3cdae1903d]

2019.10.23 22:30:23.053150 [ 129 ] {b41290ae-61eb-4bf5-9f26-3a974aac2e19} <Debug> MemoryTracker: Peak memory usage (total): 4.04 MiB.
2019.10.23 22:30:23.053190 [ 129 ] {b41290ae-61eb-4bf5-9f26-3a974aac2e19} <Debug> MemoryTracker: Peak memory usage (for query): 4.04 MiB.
2019.10.23 22:30:23.053229 [ 129 ] {b41290ae-61eb-4bf5-9f26-3a974aac2e19} <Error> HTTPHandler: Code: 27, e.displayText() = DB::Exception: Cannot parse input: expected , before: 1,8,51571,96625,7185,4627,49,100,64:33:35:55:32:84,TestPhoton,TestPhone,Kaushik,\n11,22,4,33,2019-10-21 07:43:58,2019-10-21 07:48:58,1,1234,54.193.11.251,148.81.: (at row 5050)


Here are the rows 5049 to 5051:
11,22,2,33,2019-10-21 07:43:58,2019-10-21 07:48:58,3,1234,54.193.11.251,57.4.38.141,162.137.98.146,443,www.saas96434.com,12,7,835,1,4,89623,82408,4379,6296,152,74,98:30:94:69:80:80,TestPhoton,TestPhone,Kaushik,
11,22,4,33,2019-10-21 07:43:58,2019-10-21 07:48:58,2,1234,54.193.11.251,11.12.33.141,88.176.69.26,443,www.saas48599.com,15,3,69,1,8,51571,96625,7185,4627,49,100,64:33:35:55:32:84,TestPhoton,TestPhone,Kaushik,
11,22,4,33,2019-10-21 07:43:58,2019-10-21 07:48:58,1,1234,54.193.11.251,148.81.53.165,111.189.102.180,443,www.saas99654.com,13,6,70,1,4,90090,61159,7887,7383,31,38,23:49:19:62:38:21,TestPhoton,TestPhone,Kaushik,

Also, please note that when I inserted the same file again, it went through. Just wanted to mention this additional data point.

- Kaushik

Denis Zhuravlev

Oct 23, 2019, 6:46:37 PM
to ClickHouse
Your sample_insert.csv has an extra newline at the end.

cat sample_insert.csv |cl -q "insert into VELOCLOUD_FLOW_STATS format CSV"
Code: 27. DB::Exception: Cannot parse input: expected , before: \n: (at row 101)

I removed it and everything loaded just fine:

cat sample_insert.csv |cl -q "insert into VELOCLOUD_FLOW_STATS format CSV"
Ok
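If stray trailing blank lines keep creeping into generated CSVs, they can be stripped before upload; a minimal sketch (the function name is my own, not from this thread):

```python
def strip_trailing_blank_lines(path):
    """Remove blank lines at the end of a file in place, since an
    extra final newline makes the CSV parser expect one more row."""
    with open(path) as f:
        lines = f.readlines()
    # Drop empty/whitespace-only lines from the end only.
    while lines and lines[-1].strip() == "":
        lines.pop()
    with open(path, "w") as f:
        f.writelines(lines)
```

This rewrites the whole file, which is fine for pre-upload cleanup of modest files; for very large CSVs a truncate-based approach would be preferable.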

kaushik subramanian

Oct 23, 2019, 7:00:39 PM
to ClickHouse
Sure. I manually created a sample to attach in the thread, and it accidentally had a newline at the end. The files actually used for inserts are clean and do not have any extra newlines.

Denis Zhuravlev

Oct 23, 2019, 7:11:50 PM
to ClickHouse
Well, you're probably experiencing https://github.com/ClickHouse/ClickHouse/issues/6426
Try to upgrade to 19.13.5.44

kaushik subramanian

Oct 23, 2019, 7:38:17 PM
to ClickHouse
Ah, great! That explains it all. I moved to the latest stable version and the problem disappeared. Thank you so much, Denis, for taking the time to help me through this. Much appreciated.

Regards,
Kaushik