
load data infile somewhat flaky


DFS

Aug 17, 2021, 1:11:18 PM

Finally got a big file (3 columns x 612K rows) of messy data loaded.

load data infile 'file.csv'
into table mytable   -- table name omitted in the original post; 'mytable' is a placeholder
fields terminated by ','
enclosed by '"';

I noticed it reports an error at a certain line, but the real issue, a
text field ending in \ or "\", was five lines after the reported line.

On one run five lines didn't load, and it turns out they were the next
five lines after a field ending in "\". Those five lines got
concatenated with the offending line, so I had one large clump of data
in one row. The cause: \ is LOAD DATA's default escape character, so a
field ending in \ escapes the closing quote and the parser keeps
reading the following lines as part of the same field.

Moral: watch for fields ending in \ or "\".
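
One workaround is to turn the escape character off entirely, so a
trailing \ is treated as plain data; a sketch, reusing the placeholder
table name from above:

load data infile 'file.csv'
into table mytable
fields terminated by ','
enclosed by '"'
escaped by '';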

Johann Klammer

Aug 18, 2021, 4:10:41 AM

I seem to recall it also has problems with whitespace around the commas.


DFS

Aug 18, 2021, 9:43:23 AM

Yes, I saw that too.

It was a fiasco, and it wasted a fair amount of my time. A couple of
times it would spend 10 minutes reading a file (Stage 1 of 1...) and
then tell me it bombed on row 1 (which was valid data).
Near the end of my db cloning process (15 tables, 30M rows total, SQLite
to MariaDB) I wrote a little Python program to copy the data.

The Python code did, no joking, 2,700+ inserts/second into
MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.

'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
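
One pattern that reaches rates like that is batching many rows per
INSERT inside a single transaction; a minimal sketch (hypothetical
table and columns, not the actual script):

start transaction;
insert into mytable (id, txt) values
  (1, 'first row'),
  (2, 'second row'),
  (3, 'third row');  -- in practice, hundreds or thousands of rows per statement
commit;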

Dr Eberhard Lisse

Aug 19, 2021, 6:33:53 AM

I find that using tools like CSVQ helps with checking CSV data
integrity.

Turning off indexes and such during the load and regenerating them
afterwards speeds up the loading of data significantly.
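
For MyISAM tables that looks roughly like this (a sketch, assuming a
hypothetical table mytable; DISABLE KEYS only skips non-unique indexes,
and for InnoDB the session settings unique_checks=0 and
foreign_key_checks=0 serve a similar purpose):

alter table mytable disable keys;  -- stop maintaining non-unique indexes
load data infile 'file.csv' into table mytable
  fields terminated by ',' enclosed by '"';
alter table mytable enable keys;   -- rebuild the indexes in one pass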

However,

https://www.google.com/search?q=sqlite+to+mariadb

has 1.5 Million results :-)-O

I have used

https://pypi.org/project/sqlite3-to-mysql/

in the past successfully


el
--
To email me replace 'nospam' with 'el'

DFS

Aug 19, 2021, 8:54:15 AM

On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:

> I find that using tools like CSVQ helps with checking CSV data
> integrity.

The data came out of a SQLite db and was fairly clean, but encoding
issues meant the SQLite data had 'dupes':

this field has a question mark ? in it
this field has a question mark � in it
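
One way to flag such rows on the MariaDB side is to search for the
Unicode replacement character U+FFFD directly; a sketch, assuming a
utf8mb4 column txt in a hypothetical table mytable:

select * from mytable
where txt like concat('%', char(0xef, 0xbf, 0xbd using utf8mb4), '%');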


> Turning off indexes and such during the load and generating them
> afterwards speeds loading of data up significantly.
>
> However,
>
>      https://www.google.com/search?q=sqlite+to+mariadb
>
> has 1.5 Million results :-)-O


www.google.com/search?q=mariadb+to+sqlite

has 1.4M results

SQLite has been very robust for this db of 30M rows. Easy to back up and
administer, never crashed, fast querying... but it's very slow to alter
big tables, and I wanted to see what all the fuss about MySQL/MariaDB
was about.

It (MariaDB) is a nice dbms, I must say. Especially for a no-cost/open-
source product.

I also looked at Oracle Express Edition, but it was limited to 10GB of
data, and my database with indexes is at least 15GB.



> I have used
>
>      https://pypi.org/project/sqlite3-to-mysql/
>
> in the past successfully


Thanks.

I'll look at it. I wrote my own source-to-destination Python code, and
the INSERT rate into MariaDB was fast, as high as 8,600 per second on
some tables.

Overall: 30.6M rows copied in 5770 seconds = 5304 per second (on my
11-year-old i5-750 computer). That's still 96 minutes, so I wonder if
there's a better way.

J.O. Aho

Aug 19, 2021, 9:23:24 AM

On 19/08/2021 14.54, DFS wrote:
> On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:
>
>> I find that using tools like CSVQ helps with checking CSV data
>> integrity.
>
> The data came out of a SQLite db and was fairly clean, but encoding
> issues meant the SQLite data had 'dupes':
>
> this field has a question mark ? in it
> this field has a question mark � in it

As long as the same charset is used throughout the SQLite data, it
shouldn't be too much work to fix; it's a completely different thing if
it's a mix.


>> Turning off indexes and such during the load and generating them
>> afterwards speeds loading of data up significantly.
>>
>> However,
>>
>>       https://www.google.com/search?q=sqlite+to+mariadb
>>
>> has 1.5 Million results :-)-O
>
>
> www.google.com/search?q=mariadb+to+sqlite
>
> has 1.4M results
>
> SQLite has been very robust for this db of 30M rows.  Easy to backup and
> administer, never crashed, fast querying... but it's very slow to alter
> big tables, and I wanted to see what all the fuss about MySQL/MariaDB
> was about.
>
> It (MariaDB) is a nice dbms, I must say.  Especially for no cost/open
> source.

PostgreSQL is worth mentioning (I know I shouldn't say it here); it's
also open source and has some nice features missing in MariaDB/MySQL.


--

//Aho

Axel Schwenke

Aug 19, 2021, 10:49:03 AM

On 19.08.2021 14:54, DFS wrote:
> On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:
>
> The data came out of a SQLite db and was fairly clean, but encoding issues
> meant the SQLite data had 'dupes':

That makes no sense. If the data source has a different encoding
(charset) than the destination, that's the DBA's fault. He has to make
sure the destination encoding allows all characters from the source.

>>>>> On one run 5 lines didn't post, and it turns out they were the next
>>>>> 5 lines after a field ending in "\".  Those 5 lines got concatenated
>>>>> with the offending line, so I had one large clump of data in one
>>>>> row.

Invalid data in exported files is also the DBA's fault. He must use
proper quoting if the data contains special characters like '\'.
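
When the export is done from MySQL/MariaDB itself, the quoting can be
declared on the way out; a sketch with a hypothetical table mytable,
using an escape character that matches LOAD DATA's defaults:

select * from mytable
into outfile '/tmp/file.csv'
fields terminated by ',' enclosed by '"' escaped by '\\';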

The Natural Philosopher

Aug 19, 2021, 12:00:54 PM

On 19/08/2021 13:54, DFS wrote:
> and my database with indexes is at least 15GB.
...indices.... ^^^^^^^

--
Outside of a dog, a book is a man's best friend. Inside of a dog it's
too dark to read.

Groucho Marx


DFS

Aug 19, 2021, 12:31:56 PM

On 8/19/2021 12:00 PM, The Natural Philosopher wrote:
> On 19/08/2021 13:54, DFS wrote:
>> and my database with indexes is at least 15GB.
> ...indices....         ^^^^^^^


or indexes.


Jerry Stuckle

Aug 20, 2021, 1:58:57 PM

On 8/19/2021 12:00 PM, The Natural Philosopher wrote:
> On 19/08/2021 13:54, DFS wrote:
>> and my database with indexes is at least 15GB.
> ...indices....         ^^^^^^^
>

Both are acceptable. https://www.dictionary.com/browse/index

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================