I'm running the s32imdbpy.py script to import the gz files into my SQL database.
I'm seeing this error a lot, for example when processing name.basics.tsv.gz:
ERROR:<username>:error processing data: 10000 entries lost: 'charmap' codec can't encode characters in position 0-9: character maps to <undefined>
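For what it's worth, I can reproduce the same kind of message in plain Python by encoding non-Latin text with a charmap-based codec. This sketch rests on my own assumption that something in the pipeline (the script itself or the database driver) falls back to a Windows default encoding such as cp1252 instead of UTF-8. I haven't confirmed where that happens, and the sample name and the `try_encode` helper are just for illustration:

```python
# Hypothetical reproduction: encode a name containing characters outside cp1252
# (a charmap-based codec and a common Windows default) and then as UTF-8.
name = "黒澤明"  # example name; none of these characters exist in cp1252


def try_encode(text: str, encoding: str) -> str:
    """Return "ok" if text encodes cleanly, otherwise the error message."""
    try:
        text.encode(encoding)
        return "ok"
    except UnicodeEncodeError as exc:
        return str(exc)


print(try_encode(name, "cp1252"))  # fails with a 'charmap' codec error
print(try_encode(name, "utf-8"))   # succeeds
```

If this is what's happening, the table's collation wouldn't matter, because the failure occurs in Python before the data reaches the database.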
My database table uses the utf8 character set with the utf8_unicode_ci collation, as per the instructions.
The obvious question is how I can prevent this. But also: have I really lost 10,000 database entries? Or are those 10,000 entries in my database with some problematic Unicode characters missing, making the message misleading?
TIA