[Imdbpy-help] s32imdbpy.py Error message: "'charmap' codec can't encode character"


Ambrose Chapel

Sep 17, 2020, 6:21:35 AM
to imdbp...@lists.sourceforge.net
I'm running the s32imdbpy.py script to import the gz files into my SQL database.

I'm seeing this error a lot, for example when processing name.basics.tsv.gz:

ERROR:<username>:error processing data: 10000 entries lost: 'charmap' codec can't encode characters in position 0-9: character maps to <undefined>
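
For readers hitting the same message: the wording "'charmap' codec can't encode" is what Python itself prints when text is encoded with a table-based legacy code page, so the failure most likely happens on the Python side before the data reaches the table. The sketch below reproduces the wording under two assumptions that are not from this thread: `cp1252` stands in for whatever legacy encoding is in effect, and the sample name is made up.

```python
def try_encode(text, encoding):
    """Return (True, encoded bytes) on success or (False, error message) on failure."""
    try:
        return True, text.encode(encoding)
    except UnicodeEncodeError as exc:
        return False, str(exc)


# Hypothetical sample: a name written in Cyrillic, which cp1252 cannot represent.
sample = "Достоевский"

# Assumption: cp1252 stands in for the legacy code page actually in use.
ok_legacy, detail_legacy = try_encode(sample, "cp1252")
ok_utf8, detail_utf8 = try_encode(sample, "utf-8")

print("cp1252:", ok_legacy, detail_legacy)
print("utf-8: ", ok_utf8, len(detail_utf8), "bytes")
```

The same text encodes cleanly as UTF-8, which is why the table collation alone does not settle the matter. Whether the encoding step sits in the console, the logging handler, or the database driver cannot be told from the message alone.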


My database table uses the utf8_unicode_ci collation as per instructions.

My obvious question is how I can prevent this. But also, have I really lost 10,000 database entries? Or are those 10,000 entries in my database with some problem Unicode characters missing, and the message is misleading?

TIA

Davide Alberani

Sep 21, 2020, 4:12:01 PM
to Ambrose Chapel, imdbp...@lists.sourceforge.net
Hi Ambrose,

Can you specify the complete command line and the database you are using?

Yes, I fear you have lost the entries reported in each error message (10000 in the line you quoted).

I'm not sure about the root cause of the problem; maybe you need to specify
some additional parameter to the database URI?
See https://imdbpy.readthedocs.io/en/latest/usage/s3.html for an example.
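
As an illustration of what an "additional parameter" could look like, here is a minimal sketch that builds a database URI carrying an explicit `charset` query parameter. Everything specific in it is an assumption, since the thread does not say which database or driver is in use: the `mysql+pymysql` driver prefix, the `utf8mb4` value, the credentials, and the helper name `build_db_uri` are all hypothetical.

```python
from urllib.parse import quote_plus, urlencode


def build_db_uri(user, password, host, database,
                 charset="utf8mb4", driver="mysql+pymysql"):
    """Assemble a SQLAlchemy-style URI with an explicit connection charset.

    Hypothetical helper: the driver and charset defaults are assumptions.
    """
    query = urlencode({"charset": charset})
    # Percent-encode the credentials so special characters do not break the URI.
    return (f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
            f"@{host}/{database}?{query}")


# Placeholder credentials and database name, for illustration only.
uri = build_db_uri("imdb_user", "s3cret!", "localhost", "imdb")
print(uri)
```

The resulting string would be passed as the database URI argument to s32imdbpy.py. If the parameter is accepted by the driver, the connection itself is asked to use a Unicode encoding instead of a legacy default, independently of the table collation.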

Another obvious source of information is the logs of the database.
Anything useful there?

Hope this helps,
> _______________________________________________
> Imdbpy-help mailing list
> Imdbp...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help



--
Davide Alberani <davide....@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/

