[Imdbpy-help] imdbpy2sql 4.7 - invalid byte sequence for encoding "UTF8"


darklow

Apr 11, 2011, 12:35:23 PM
to imdbp...@lists.sourceforge.net
Hello,

I keep getting the same error, always at the same place:
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320

System:
IMDbPY 4.7 (also tried with the latest version from SVN: 4.8dev20110317)
Python 2.6.6
PostgreSQL: 8.4.7 (Database encoding is en_US.UTF8)
IMDb data: tried both the latest files and the ones from February

Commands I tried:
./imdbpy2sql.py -d /www/imdb/data/ -u postgres://imdb:imdb@localhost/imdb2
I also tried:
./imdbpy2sql.py -d /www/imdb/data/ -u postgres://imdb:imdb@localhost/imdb2 -e 'AFTER_CREATE:SET client_encoding TO utf8'

Some facts to help diagnose the problem:
IMDbPY is not installed; I'm running it from source.
Dependencies like SQLObject are installed (SQLObject-0.12.4).
Running on Debian Linux.
Some time ago we used an installed version of IMDbPY and everything went fine, even with the same data files as now; but since there is no stable 4.7 package for Debian yet, we uninstalled it and are now running from source.

After about 30 minutes of running, the script fails with the following error:

Error:
SCANNING actor: Havel, Jir?
 * FLUSHING CharactersCache...
Traceback (most recent call last):
  File "./imdbpy2sql.py", line 2959, in <module>
    run()
  File "./imdbpy2sql.py", line 2820, in run
    castLists(_charIDsList=characters_imdbIDs)
  File "./imdbpy2sql.py", line 1584, in castLists
    doCast(f, roleid, rolename)
  File "./imdbpy2sql.py", line 1543, in doCast
    cid = CACHE_CID.addUnique(role)
  File "./imdbpy2sql.py", line 966, in addUnique
    else: return self.add(key, miscData)
  File "./imdbpy2sql.py", line 959, in add
    self[key] = c
  File "./imdbpy2sql.py", line 869, in __setitem__
    self.flush()
  File "./imdbpy2sql.py", line 892, in flush
    self._toDB(quiet)
  File "./imdbpy2sql.py", line 1194, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

Any suggestions? I found a similar topic, but it had no solution either.
I've run out of ideas :/
Could anyone help?
Thank you.

Davide Alberani

Apr 11, 2011, 3:46:39 PM
to darklow, imdbp...@lists.sourceforge.net
On Mon, Apr 11, 2011 at 18:35, darklow <dar...@gmail.com> wrote:
>
>   File "./imdbpy2sql.py", line 1194, in _toDB
>     CURS.executemany(self.sqlstr, self.converter(l))
> psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> Any suggestions? I found similar topic, but there were also no solutions.

Yes, I've had other reports about this bug.
It seems to be related to some garbage in the actors.list.gz file.
I hope to have time to investigate the problem within a week or two.

Thanks for the bug report!

--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

_______________________________________________
Imdbpy-help mailing list
Imdbp...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help

darklow

Apr 13, 2011, 2:45:17 AM
to Davide Alberani, imdbp...@lists.sourceforge.net
Since I am not familiar with Python, could you suggest a quick fix so that the script doesn't stop with this error?
Maybe this helps: in PHP we get exactly the same encoding error when importing badly decoded data. When we have no control over the data, we can't always call utf8_encode, since it could encode a string twice; to work around this I use the following function, which at least prevents the PostgreSQL error:

function  fix_encoding($in_str) {
        $cur_encoding = mb_detect_encoding($in_str) ;
        if($cur_encoding == "UTF-8" && mb_check_encoding($in_str,"UTF-8")){
            return $in_str;
        }else{
            return utf8_encode($in_str);
        }
}

Could you help adapt this function to Python, if similar functions are available, so we can use it as a quick fix?
Thanks a lot.
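A rough Python 3 counterpart of the PHP function above might look like the sketch below. It assumes, as discussed later in this thread, that the raw *.list data is ISO-8859-1 (Latin-1), so Latin-1 is used as the fallback; `fix_encoding` is just the PHP name reused here, not an IMDbPY function. It shares the PHP version's blind spot: Latin-1 text that happens to also be valid UTF-8 is returned unchanged.

```python
def fix_encoding(raw: bytes) -> str:
    """Decode raw as UTF-8 if it is valid UTF-8, otherwise as Latin-1.

    Latin-1 maps every byte 0x00-0xFF to a character, so the fallback
    can never raise; the result can always be re-encoded as UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")
```

Because the fallback never fails, the returned text is always safe to send to a UTF-8 database, but a genuinely damaged value such as b"Jir\xc3 " comes out as "JirÃ " rather than being repaired.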

darklow

Apr 13, 2011, 2:46:30 AM
to Davide Alberani, imdbp...@lists.sourceforge.net
Does anyone know a quick and dirty fix, at least to skip strings with invalid byte sequences until there is an official fix, so I can finish the import?
Can we detect invalid byte sequences? Maybe we can somehow replace or remove the 0xc320 sequence, which appears most often, or skip these rows.

I analyzed the error a bit more. The errors mostly occur with Japanese actors (actors.list); strange characters appear in their filmographies:

Hayakawa, Yuzo
Burai hij8)

I tried to delete these rows manually, but there are too many of them :/
Thank you.
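To detect and skip such rows instead of crashing, one option is to try decoding every byte-string value as UTF-8 just before it is sent to the database, for example around the `CURS.executemany(...)` call in `_toDB` shown in the traceback. The helper below is a hypothetical Python 3 sketch (`split_valid_rows` is not part of imdbpy2sql.py); it only reports what PostgreSQL would reject, it does not repair anything.

```python
def split_valid_rows(rows):
    """Split rows (tuples of column values) into those PostgreSQL should
    accept and those containing a byte string that is not valid UTF-8.

    Non-bytes values (int, None, already-decoded str) are not checked.
    Each rejected row is paired with a short slice of the data around
    the problem: the invalid byte(s) plus the byte that follows.
    """
    good, bad = [], []
    for row in rows:
        try:
            for value in row:
                if isinstance(value, bytes):
                    value.decode("utf-8")  # raises on invalid sequences
        except UnicodeDecodeError as exc:
            bad.append((row, exc.object[exc.start:exc.end + 1]))
        else:
            good.append(row)
    return good, bad
```

The rejected rows can be logged while the import continues with `good`. If the bytes behind "0xc320" are 0xc3 followed by a space, the reported slice for such a row would be b'\xc3 '.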

Davide Alberani

Apr 13, 2011, 4:56:42 PM
to imdbp...@lists.sourceforge.net, IMDbPY development, Thomas Stewart
On Mon, Apr 11, 2011 at 18:35, darklow <dar...@gmail.com> wrote:
>
>   File "./imdbpy2sql.py", line 1194, in _toDB
>     CURS.executemany(self.sqlstr, self.converter(l))
> psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".

Hi all,
I'm writing regarding the recent "0xc320" problem with IMDbPY.
The above notice is extremely interesting, and should be investigated:
how can it be that 0xc320 is not UTF8 encodable?
It should work; from the Python prompt:
>>> unichr(0xc320).encode('utf8')
'\xec\x8c\xa0'

Anyway, as a very fast and dirty fix (the main problem is probably some
crap in the data files), try this: after line 1181 of imdbpy2sql.py, add:
k = k.replace('\xec\x8c\xa0', '')

So that the nearby lines will become:
try:
    k = k.replace('\xec\x8c\xa0', '')
    t = analyze_name(k)
except IMDbParserError:

Please be aware that this fix was not tested at all, but I'm
almost sure that, at the above point, 'k' is a string encoded in utf8.

Anyway, beside the "garbage theory", I have another idea
about the source of the error, but I have to verify it later...

Bye, and let me know if it works!
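On the question above of how 0xc320 can fail: PostgreSQL's message most likely lists the raw bytes it rejected (0xc3 followed by 0x20) rather than a Unicode code point. 0xc3 is the lead byte of a two-byte UTF-8 sequence and must be followed by a continuation byte in the range 0x80 to 0xbf; 0x20 is a space, so the pair can never be valid UTF-8. The Python 3 sketch below contrasts the two readings.

```python
# Reading 1: 0xc320 as a Unicode code point (U+C320, a Hangul syllable).
# It encodes to UTF-8 without trouble, as the Python prompt above shows.
as_code_point = chr(0xC320).encode("utf-8")
print(as_code_point)  # b'\xec\x8c\xa0'

# Reading 2: 0xc3 0x20 as two raw bytes received by the server.
# 0xc3 announces a two-byte sequence, but 0x20 (a space) is not a
# continuation byte, so strict decoding fails.
as_bytes = bytes([0xC3, 0x20])
try:
    as_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)  # rejected: invalid continuation byte
```

Under the second reading, searching for '\xec\x8c\xa0' (the UTF-8 form of U+C320) cannot find the problem bytes.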


darklow

Apr 14, 2011, 3:54:54 AM
to Davide Alberani, IMDbPY development, Thomas Stewart, imdbp...@lists.sourceforge.net
Unfortunately, adding the line
k = k.replace('\xec\x8c\xa0', '')
in the place you mentioned doesn't help.

Still the same error in the same place :(

SCANNING actor: Havel, Jir?
 * FLUSHING CharactersCache...
Traceback (most recent call last):
 .........
    self.flush()
  File "./imdbpy2sql.py", line 1195, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320

Davide Alberani

Apr 16, 2011, 11:01:38 AM
to imdbp...@lists.sourceforge.net, tho...@stewarts.org.uk
On Wed, Apr 13, 2011 at 08:46, darklow <dar...@gmail.com> wrote:
> Maybe someone knows some fast dirty fix at least how to skip such invalid
> byte sequence strings while there are no official fix, so i can finish the
> import?
> Can we detect invalid byte characters?

Hi again,
Actually, my problem is that I'm unable to reproduce this bug. :-)
Using PostgreSQL and SQLObject, my run goes smoothly.

I have downloaded the 'actors.list.gz' file today, so it's possible that some
garbage was removed.

Anyway, the previously proposed solution was obviously flawed, since
the problem was on _character_ names.

So, let's edit the imdbpy2sql.py file again and change the lines around 1540
so that they become:

movieid = CACHE_MID.addUnique(title)
if role is not None:
    roles = filter(None, [x.strip() for x in role.split('/')])
    for role in roles:
        role = role.replace('\xec\x8c\xa0', '') # TEMPORARY FIX
        cid = CACHE_CID.addUnique(role)
        sqldata.add((pid, movieid, cid, note, order))

Maybe this will help... who knows? :-)


darklow

Apr 17, 2011, 8:04:53 AM
to Davide Alberani, tho...@stewarts.org.uk, imdbp...@lists.sourceforge.net
I updated to the latest data files this morning: no change, and unfortunately this fix doesn't work either.
I even tried adding
self.sqlstr = self.sqlstr.replace('\xec\x8c\xa0', '')
in the _toDB function, and I still get the same error.
Maybe this Unicode character replacement method is wrong?

This error started when we uninstalled IMDbPY (leaving all the dependency libraries in place) and started running it without installation. Could there be some problem with hidden Unicode-related dependencies? Maybe you could try running it without installation, just from source?

Also, every time I start the script I get two warnings:
2011-04-17 11:13:37,398 WARNING [imdbpy.parser.sql.aux] /data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:125: Unable to import the cutils.ratcliff function.  Searching names and titles using the "sql" data access system will be slower.
2011-04-17 11:13:37,399 WARNING [imdbpy.parser.sql.aux] /data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:332: Unable to import the cutils.soundex function.  Searches of movie titles and person names will be a bit slower.
IMPORTING psyco... FAILED (not a big deal, everything is alright...)

Could that be related somehow?
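If the rejected bytes really are 0xc3 0x20 (a truncated UTF-8 sequence followed by a space), the replace() calls tried above have nothing to match, since the three-byte sequence '\xec\x8c\xa0' never occurs in that data. A lossy but safe stop-gap, sketched below in Python 3 under that assumption, is to decode with errors="replace" so every invalid byte becomes U+FFFD before the value reaches the database. `sanitize` and the sample value are hypothetical, not part of imdbpy2sql.py.

```python
def sanitize(raw: bytes) -> bytes:
    """Return valid UTF-8 bytes, replacing each invalid byte with U+FFFD."""
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Hypothetical damaged value: 'ô' cut after its first byte, then a space.
broken = b"Yuz\xc3 (1959)"

# The replacement suggested in the thread leaves the value untouched...
assert broken.replace(b"\xec\x8c\xa0", b"") == broken

# ...while decoding with errors="replace" yields bytes PostgreSQL accepts.
cleaned = sanitize(broken)
cleaned.decode("utf-8")  # no longer raises
```

The replacement character makes the damage visible in the database, which is usually preferable to silently dropping bytes; it is still data loss, so it only buys time until the real source of the broken sequence is found.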

Petite Abeille

Apr 17, 2011, 9:31:39 AM
to imdbpy-help

On Apr 13, 2011, at 8:46 AM, darklow wrote:

> Ananlyzed error a bit more. Mostly these errors occur in Japanese actors
> (actors.list), in filmography there apperars strange characters:

Sounds like a character set encoding issue.

Originally, files like actors.list are ISO-8859-1 encoded. IMDbPY converts them to UTF-8 internally:

http://imdbpy.sourceforge.net/docs/README.utf8.txt

You can check if actors.list is properly encoded by converting it to UTF-8 outside of IMDbPY.

For example, using iconv:

iconv -f ISO-8859-1 -t UTF-8 < actors.list > actors.list.txt

This should result in a proper UTF-8 encoded file. If anything goes wrong, iconv should point out the issue.

For example, the entries for Hayakawa, Yuzo should look like the following:

A, zerosen (1965) [Tokunaga]
Abunai Deka ritaanzu (1996)
Akumyo ichidai (1967)
Aru joshi kôkôi no kiroku: shisshin (1969)
Chijin no ai (1967) [Namikawa]
Dai akutô (1968)
Daikaijû kettô: Gamera tai Barugon (1966) [Kawajiri] <3>
Dorodarake no junjô (1977) [Det. Seki]
Furin (1965) [Saruoka] <6>
Genkai yûkyôden: Yabure kabure (1970) [Yanagawa]
Haru kôrô no hana no en (1958) [Sata]
Hiroshima (1995) (TV) [Koshiro Oikawa] <70>
Jet F-104 dassyutsu seyo (1968)
Kaidan otoshiana (1968) [Sakabe]
Kawaki (1958) <4>
Kimimachi-bune (1954) (as Yûji Hayakawa) [Tomii]
Konki (1961)
Malenkiy beglets (1966)
Mi wa jukushitari (1959) [Chef at Mizumi]
Mushukunin Mikogami no Jôkichi: Kiba wa hikisaita (1972) <9>
Nagasugita haru (1957) [Student]
Nihonkai daikaisen: Umi yukaba (1983) [Kataoka]
Nippon chinbotsu (1974) [SDF General]
Nobi (1959) (as Yuji Hayakawa)
Obi o toku Natsuko (1965) [Kwashima] <6>
Okoto to Sasuke (1961) (as Yûzô Hayakawa)
Onna ga aishite nikumu toki (1963) [Iwashita]
Onna tobakushi (1967)
Rikugun Nakano gakko (1966) [Colonel Iwakura] <6>
Rikugun Nakano gakko: Ryu-sango shirei (1967)
Sakura no ki no shita de (1989)
Salary man donto bushi - Kiraku na kagyô to kita monda (1962) (as Yûzô Hayakawa) [Shibayama]
Satsujinsha (1966)
Seisaku no tsuma (1965) [Sergeant]
Sekkusu chekku: Daini no sei (1968) [Sasanuma] <5>
Shiroi Kyotou (1966) <14>
Shuntou (1989) (TV) <15>
Tokyo no josei (1960)
Tokyo onigiri musume (1961) (as Yûzô Hayakawa) [Draper]
Uchu kaijû Gamera (1980) [Policeman]
Yoru no wana (1967) [Fumikichi Hayashi]
Zatôichi rôyaburi (1967)
Zoku sex doctor no kiroku (1968) <5>
"Kôya no surônin" (1972) <11>
"Sukeban Deka" (1985) {Nerawareta atakkâ (#1.10)} (as Yûzô Hayakawa) <14>
"Zoku zoku jiken: Tsuki no keshiki" (1980) {(#1.2)} [Dr. Arai] <11>

There shouldn't be any "strange" characters in sight :)
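One caveat about the iconv check above: ISO-8859-1 assigns a character to every byte value, so a Latin-1 to UTF-8 conversion cannot fail, and a clean iconv run does not by itself prove the file is free of garbage. A slightly stronger test, sketched below in Python 3, also counts non-ASCII lines that already decode as UTF-8, which would hint that part of the file was already UTF-8 and is being double-encoded. The function name and this heuristic are assumptions for illustration, not something IMDbPY does.

```python
import gzip


def convert_and_check(src, dst):
    """Convert an ISO-8859-1 *.list (or *.list.gz) file to UTF-8 and
    return how many non-ASCII lines were *already* valid UTF-8.

    A high count suggests the input is not plain Latin-1, so converting
    it would produce mojibake such as 'Ã´' instead of 'ô'.
    """
    opener = gzip.open if src.endswith(".gz") else open
    suspicious = 0
    with opener(src, "rb") as fin, open(dst, "wb") as fout:
        for raw in fin:
            if not raw.isascii():
                try:
                    raw.decode("utf-8")
                    suspicious += 1
                except UnicodeDecodeError:
                    pass  # the normal case for genuine Latin-1 text
            # Latin-1 decoding never fails: each byte maps to U+0000-U+00FF.
            fout.write(raw.decode("iso-8859-1").encode("utf-8"))
    return suspicious
```

For a genuine Latin-1 actors.list the expected count is zero or close to it; lines it flags are good candidates for closer inspection.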



Davide Alberani

Apr 17, 2011, 10:13:20 AM
to imdbp...@lists.sourceforge.net, tho...@stewarts.org.uk
On Sun, Apr 17, 2011 at 14:04, darklow <dar...@gmail.com> wrote:
> Updated this morning to latest data files, no change and unfortunately this
> fix also doesn't work.

Hmm... debugging a problem like this without being able to reproduce it
is extremely difficult. :-/

> This error started when we uninstalled imdbpy (left all the dependency libs)
> and started run it without installation. Maybe there is some kind of problem
> and some kind of hidden unicode dependencies? Maybe you can try to run
> without installation, jus from source?

Do you have a very good reason to do so? :-)
Can't you try to purge every reference to IMDbPY left on the
system (search for the scripts in /usr/bin/ and /usr/local/bin/ and
be sure that "import imdb" fails, at the python prompt) and see
if the problem is solved, after IMDbPY 4.7 is reinstalled?

If you have problems locating the IMDbPY package, just open
the Python prompt and:
>>> import imdb
>>> print imdb

> Also every time i start the script i receive two warnings:
> 2011-04-17 11:13:37,398 WARNING [imdbpy.parser.sql.aux]
> /data/web/imdb/imdbpy4.7-159671/imdb/parser/sql/__init__.py:125: Unable to
> import the cutils.ratcliff function.  Searching names and titles using the
> "sql" data access system will be slower.

This will force IMDbPY to use some pure-Python fallback functions.
It's entirely possible that there are some bugs in these functions, even
though a run without cutils.so works fine for me (so far).

> IMPORTING psyco... FAILED (not a big deal, everything is alright...)

That's not a problem for sure.

Right now, my first guess is that somewhere, after the *.list files are
read and turned into UTF-8 encoded strings, the imdbpy2sql.py
script does Something Very Wrong(tm) to a string (like cutting it at a certain
place, ending up splitting a single UTF-8 encoded character in two: this could
explain the error).

I've tried the conversion suggested by Petite Abeille, and it works fine.

Please, could you cut a small piece (few kilobytes) of the actors.list file,
and attach it (no cut-and-paste)?
It goes without saying that you should choose a portion where you see
(or guess are) the "strange chars" :-)

Thanks!
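The truncation hypothesis above can be reproduced in a few lines of Python 3. Cutting a UTF-8 *byte* string at a fixed offset can land between the two bytes of a character such as 'ô' (0xc3 0xb4); if a space follows the cut, the result contains exactly a 0xc3 0x20 pair. This does not show where, or whether, imdbpy2sql.py makes such a cut; the sample name, the 5-byte width and the `truncate_utf8` helper are invented for illustration.

```python
def truncate_utf8(text, max_bytes):
    """Shorten text so its UTF-8 form fits in max_bytes without ever
    splitting a multi-byte character (an incomplete trailing character
    is dropped by errors="ignore")."""
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


name = "Yûzô Hayakawa"
raw = name.encode("utf-8")  # 'û' and 'ô' take two bytes each

# Risky: cut the *bytes* at a fixed width, then append more text.
# A width of 5 splits 'ô' (0xc3 0xb4) right after its first byte.
bad = raw[:5] + b" (1961)"
print(bad)  # b'Y\xc3\xbbz\xc3 (1961)' contains the 0xc3 0x20 pair

# Safer: cut on character boundaries, encode once at the end.
good = (truncate_utf8(name, 5) + " (1961)").encode("utf-8")
print(good.decode("utf-8"))  # Yûz (1961)
```

The general rule is to do all slicing on decoded text (or with a boundary-aware helper like the one above) and encode to UTF-8 only once, right before the data goes to the database.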


darklow

Apr 18, 2011, 2:53:13 AM
to Davide Alberani, tho...@stewarts.org.uk, imdbp...@lists.sourceforge.net
On Sun, Apr 17, 2011 at 5:13 PM, Davide Alberani <davide....@gmail.com> wrote:
> Have you some very good reason to do so? :-)

Our server runs Debian Linux, and our sysadmin only allows stable packages. However, the latest version of IMDbPY has the md5 checksums, which are quite important in our situation; that is why I have to run it from source.

> Can't you try to purge every reference to IMDbPY left on the
> system (search for the scripts in /usr/bin/ and /usr/local/bin/ and
> be sure that "import imdb" fails, at the python prompt) and see
> if the problem is solved, after IMDbPY 4.7 is reinstalled?

Unfortunately, right now I can't reinstall; I can only run it from source. However, if this turns out to be the cause and there is no other way to fix it, I'll try to convince our sysadmin to install this version from unofficial Debian packages.

> Please, could you cut a small piece (few kilobytes) of the actors.list file,
> and attach it (no cut-and-paste)?

I attached a small part of the actors.list file, taken right at the place with the broken characters (I used the Unix sed command to cut out the problematic lines).
actors.list.small

Davide Alberani

Apr 18, 2011, 3:30:26 AM
to imdbp...@lists.sourceforge.net, tho...@stewarts.org.uk
On Mon, Apr 18, 2011 at 08:53, darklow <dar...@gmail.com> wrote:
>
> We have Debian linux on our server and our sysadmin allows only stable
> packs. However latest version of imdbpy has these md5 checksum that are
> quite important in our situation, that is why i have to run it from source.

Ehhh... what about a virtual machine or, even easier, virtualenv [0]?

Thanks for the file, I hope to look at it within a day or two.


+++
[0] http://pypi.python.org/pypi/virtualenv



Davide Alberani

Apr 19, 2011, 4:11:55 PM
to imdbp...@lists.sourceforge.net, tho...@stewarts.org.uk
On Mon, Apr 18, 2011 at 09:30, Davide Alberani
<davide....@gmail.com> wrote:
>
> Thanks for the file, I hope to look at it within a day or two.

Ok: the file is correctly encoded in iso8859-1, as expected, and contains
no garbage.

Using it as the only input for imdbpy2sql.py (putting the attached file in
a directory by itself), I can run the script with no errors (besides
the expected warnings about missing files).

I'm using the version from the Mercurial repository, without the cutils.so
library.

Please, if you can't install IMDbPY on your system, consider using
virtualenv.
Given the result of that test, I have to recommend double-checking the
settings of your PostgreSQL server for any inconsistencies
in encodings and collations.

HTH,

actors.list.gz

darklow

Apr 20, 2011, 8:08:17 AM
to Davide Alberani, tho...@stewarts.org.uk, imdbp...@lists.sourceforge.net
Still no luck :/ Maybe the problem lies in some environment variables or settings that are present with the installed version but missing or incorrect when running from source?

What about this? I printed out some variables:

print sys.stdout.encoding -> UTF-8
print sys.stdin.encoding   -> UTF-8
print sys.getdefaultencoding() -> ascii

Is it OK that sys.getdefaultencoding() == 'ascii'?

Are there any other variables I should check?

Davide Alberani

Apr 23, 2011, 8:23:06 AM
to imdbpy-help, tho...@stewarts.org.uk
On Wed, Apr 20, 2011 at 14:08, darklow <dar...@gmail.com> wrote:
> Still no luck :/ maybe the problem is in some environmental variables or
> settings, which on installed version are present, but running from source
> are missing or incorrect?

Seems unlikely to me.

> What about this, i printed out some variables:
> print sys.stdout.encoding -> UTF-8
> print sys.stdin.encoding   -> UTF-8
> print sys.getdefaultencoding(); -> ascii
> Is it ok that  sys.getdefaultencoding(); == ascii ?

These are fine.

I've reproduced your environment to the best of my ability:
- no IMDbPY installed in the system.
- IMDbPY from source (the latest version in the Mercurial repository),
setting the PYTHONPATH environment variable to point to the
source directory.
- the cutils C module was not compiled.
- the last actors.list.gz file.
- PostgreSQL 8.4; my database was created with these settings:
    CREATE DATABASE imdb
      WITH OWNER = postgres
           ENCODING = 'UTF8'
           TABLESPACE = pg_default
           LC_COLLATE = 'it_IT.utf8'
           LC_CTYPE = 'it_IT.utf8'
           CONNECTION LIMIT = -1;

I've run it with your and other portions of the actors.list.gz file, and
everything went fine.

Now... if I were you, I'd:
- create a virtualenv environment with:
virtualenv --no-site-packages
- install IMDbPY in it, using easy_install or pip (the executables in
your virtualenv, I mean), so that you'll have all the correct dependencies
available.
- run the imdbpy2sql.py within your virtualenv.

If it still fails:
- check your postgres settings.
- try using SQLite (just for a test) - see notes in README.sqldb


HTH,

Thomas Stewart

Apr 24, 2011, 2:03:32 PM
to Davide Alberani, imdbpy-help
* Davide Alberani <davide....@gmail.com> [2011-04-23 13:23:18]:

> If it still fails:
> - check your postgres settings.
> - try using SQLite (just for a test) - see notes in README.sqldb

Hi,

I've just had a try using sqlite with fresh lists and on my Debian
system and I get this:

thomas@ikaite:~$ /tmp/imdbpy2sql.py -d /home/thomas/Desktop/imdb/lists -u sqlite:///home/thomas/Desktop/imdb/imdb.db --sqlite-transactions
IMPORTING psyco... DONE!

RUNNING imdbpy2sql.py
EXECUTING "BEGIN:PRAGMA synchronous = OFF;"...
EXECUTING "PRAGMA synchronous = OFF;"... DONE!
# TIME BEGIN command : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
SAVING imdbID values for movies... SKIPPING: no data.
SAVING imdbID values for people... SKIPPING: no data.
SAVING imdbID values for characters... SKIPPING: no data.
SAVING imdbID values for companies... SKIPPING: no data.
DROPPING current database... DONE!
CREATING new tables... DONE!
# TIME dropping and recreating the database : 0min, 2sec (wall) 0min, 0sec (user) 0min, 0sec (system)
SCANNING movies: !Women Art Revolution (2010) (movieID: 1)
SCANNING movies: A Child's Garden of Verses (1992) (TV) (movieID: 10001)
SCANNING movies: A Strict Affair... Lessons in Discipline and Obedience (1992) (V) (movieID: 20001)
SCANNING movies: AIDS: What Everyone Needs to Know (1987) (movieID: 30001)
SCANNING movies: Amour (1922) (movieID: 40001)
SCANNING movies: Arktinen lumous (1997) (TV) (movieID: 50001)
SCANNING movies: Baby's Storytime (1989) (V) (movieID: 60001)
SCANNING movies: Bei jiu gao ge (1974) (movieID: 70001)
SCANNING movies: Black in the Ass 8 (2005) (V) (movieID: 80001)
SCANNING movies: Breaking In (1925) (movieID: 90001)
EXECUTING "BEFORE_MOVIES_TODB:BEGIN TRANSACTION;"...
EXECUTING "BEGIN TRANSACTION;"... DONE!
* FLUSHING MoviesCache...


Traceback (most recent call last):
  File "/tmp/imdbpy2sql.py", line 2950, in <module>
    run()
  File "/tmp/imdbpy2sql.py", line 2786, in run
    readMovieList()
  File "/tmp/imdbpy2sql.py", line 1467, in readMovieList
    mid = CACHE_MID.addUnique(title, yearData)
  File "/tmp/imdbpy2sql.py", line 1073, in addUnique
    else: return self.add(key, miscData)
  File "/tmp/imdbpy2sql.py", line 950, in add
    self[key] = c
  File "/tmp/imdbpy2sql.py", line 860, in __setitem__
    self.flush()
  File "/tmp/imdbpy2sql.py", line 883, in flush
    self._toDB(quiet)
  File "/tmp/imdbpy2sql.py", line 1057, in _toDB
    self._runCommand(l)
  File "/tmp/imdbpy2sql.py", line 1061, in _runCommand
    CURS.executemany(self.sqlstr, self.converter(dataList))
pysqlite2.dbapi2.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.
thomas@ikaite:~$

Regards
--
Tom
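The pysqlite message itself names the two ways out: pass Unicode strings to the driver, or configure a text_factory. A minimal sketch of the first, using Python 3's built-in sqlite3 module and assuming the values arrive as UTF-8 byte strings; the `title` table here is a made-up stand-in, not the IMDbPY schema. (Python 3's sqlite3 would not raise this error for bytes; it would silently store them as BLOBs instead, which the typeof() check below would reveal.)

```python
import sqlite3


def to_text(value):
    """Decode UTF-8 byte strings; leave every other value untouched."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


rows = [(1, "Amour".encode("utf-8")), (2, "Dai akutô".encode("utf-8"))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO title (id, title) VALUES (?, ?)",
    (tuple(to_text(v) for v in row) for row in rows),
)
stored = conn.execute("SELECT typeof(title), title FROM title ORDER BY id").fetchall()
print(stored)  # [('text', 'Amour'), ('text', 'Dai akutô')]
```

Decoding at the boundary keeps the rest of the code free to work with bytes if it must, while the driver only ever sees text.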

darklow

Apr 24, 2011, 3:03:45 PM
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
There have never been any issues with our PostgreSQL database; we have always used UTF-8, and we are using it this time as well.
So far I have tried plenty of scripts and workarounds, including many decode().encode() combinations, but nothing helps; I just get different errors.
I also tried adding the following lines, to make sure everything is fine with the database connection:

import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

import codecs
sys.setdefaultencoding('utf-8')

CURS.execute("SET NAMES 'utf8'")
CURS.execute("SET CLIENT_ENCODING TO 'utf8'")

But still nothing helps.
I tried reinstalling all the installed dependencies and running from clean sources, but no luck.
I tried running the scripts with SQLAlchemy instead of SQLObject, but got the same error, so the problem is not there.

I would like to ask you one thing.
Every test takes about an hour, because the error occurs in the actors cast list.
Could you please tell me the exact list of steps that convert a line from the file into a line of SQL?
Then I could write a new script that takes a small version of actors.list with only the problematic lines, runs the same unicode() and decode() calls in the correct order, and tries to insert these lines into a test table in the database. That way I could test much faster, instead of waiting an hour for every try...

What I already tried is opening the actors.list file with PHP, reading every line, converting each string to UTF-8 with iconv and inserting it into the PostgreSQL database, and everything worked fine. That makes me think the problem might be in the code that cuts lines into pieces: maybe it does something wrong and cuts a valid Unicode character in two, so an invalid byte sequence appears. If I had the correct list of functions in Python, I could run more tests.

PS: I'm running a test with version 4.6 right now, to see whether it still works; if it does, it will be easier to diagnose the problem by looking at the changes between the versions.
I'll post the results.

Thank you.

darklow

Apr 24, 2011, 4:44:48 PM
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Yes, I can confirm: version 4.6 of the script works perfectly on the same server with the same files.
I think this brings us closer to a solution.
Maybe this helps to identify the problem; this is what we did on our server.
(Remember, we are copying files around because only stable versions are allowed on our Debian server, but we need the md5 hashes from version 4.7.)

1. We installed IMDbPY 4.6 with all the dependencies (python-psycopg2, python-dns, python-formencode, python-pkg-resources, python-sqlobject)
2. I downloaded version 4.7 and overwrote the following directories with files from the 4.7 source:

cp -r imdbpy4.7/docs/* /usr/share/doc/python-imdb/
cp -r imdbpy4.7/imdb/* /usr/share/pyshared/imdb/

3. Now I run imdbpy2sql.py from the 4.7 source as before, and it fails with the invalid byte sequence error.
4. I copied the version 4.6 files back to the same directories, and the import with version 4.6 works again.

Looking at the install log, I didn't see any other relevant files that I should overwrite, so the problem might be in the dependencies.
Do you have any idea where the problem could be, and what else we should overwrite or update so that v4.7 works?
Thank you.

Davide Alberani

Apr 24, 2011, 5:36:38 PM
to Thomas Stewart, imdbpy-help
On Sun, Apr 24, 2011 at 20:03, Thomas Stewart <tho...@stewarts.org.uk> wrote:
>
> I've just had a try using sqlite with fresh lists and on my Debian
> system and I get this:
>
> thomas@ikaite:~$ /tmp/imdbpy2sql.py -d /home/thomas/Desktop/imdb/lists -u sqlite:///home/thomas/Desktop/imdb/imdb.db --sqlite-transactions
> IMPORTING psyco... DONE!
[...]

>    CURS.executemany(self.sqlstr, self.converter(dataList))
> pysqlite2.dbapi2.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

This specific bug (a bad interaction between SQLObject and SQLite) should
be fixed in the version in the Mercurial repository, shouldn't it?



Davide Alberani

Apr 24, 2011, 5:48:35 PM
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Sun, Apr 24, 2011 at 21:03, darklow <dar...@gmail.com> wrote:
>
> I tried reinstalling all installed dependancies and run from clean sources,
> but no luck.
> I tried to run scripts with SQLAlchemy instead of SQLObject, but same error,
> so the problem is not there.

Perfect - these tests are really important to spot the problem.

> Every test takes about 1h, because error takes place in Actors Cast list.

Wait: I'll read the rest of your mails tomorrow, but this can help you
do things faster: you don't need the other files at all.
Simply put the actors.list.gz file in a directory by itself, and run
imdbpy2sql.py with this directory as the "-d" argument.
You can even use a shorter version of actors.list.gz; just remember to keep
the lines at the beginning and at the end (various separators are used to identify
where the data begin), as I did with the actors.list.gz file that I attached
some days ago.

In the 'docs/goodies' directory you'll find the 'reduce.sh' script, which
takes a whole directory of *.list.gz files and reduces them to 1% of
their length.
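For readers without a shell handy, here is a hypothetical Python alternative. It is not reduce.sh, and it does not reproduce that script's logic. The sketch assumes that actor entries in the body of actors.list are separated by blank lines, and the header/footer line counts (head, tail) have to be chosen by hand by looking at the file, since the separators vary.

```python
import gzip

def reduce_list(lines, head, tail, keep_every=100):
    """Keep the first `head` and last `tail` lines untouched and roughly one in
    `keep_every` of the blank-line-separated blocks in between."""
    end = max(head, len(lines) - tail)
    blocks, current = [], []
    for line in lines[head:end]:
        current.append(line)
        if not line.strip():              # a blank line closes the current block
            blocks.append(current)
            current = []
    if current:                           # trailing block without a blank line
        blocks.append(current)
    kept = [line for i, block in enumerate(blocks) if i % keep_every == 0
            for line in block]
    return lines[:head] + kept + lines[end:]

def reduce_file(src, dst, head, tail, keep_every=100):
    # Binary mode keeps the original bytes and line endings exactly as they are.
    with gzip.open(src, "rb") as f:
        lines = f.readlines()
    with gzip.open(dst, "wb") as f:
        f.writelines(reduce_list(lines, head, tail, keep_every))
```

Keeping whole blocks, rather than every n-th line, avoids splitting an actor from their credits.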

> It makes me think that the problem might be
> somewhere in cutting the line into pieces: maybe it does something wrong,
> cuts some valid Unicode character into pieces, and so an invalid byte
> sequence appears.

My guess, too... it's just that I can't see where it happens... :-/

Thanks for your tests!

--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------

Davide Alberani

Apr 24, 2011, 6:19:42 PM4/24/11
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Sun, Apr 24, 2011 at 22:44, darklow <dar...@gmail.com> wrote:
> Yes, I can confirm: version 4.6 of the script works perfectly on the same
> server with the same files.
> And I think this brings us closer to a solution.

Excellent! (well, it still baffles me why I'm absolutely unable to
reproduce the problem on my system, but that's another story...)

> Maybe this helps to identify the problem; this is what we did on our server.
> (Remember, we are copying files like this because only stable Debian packages
> are allowed on the server, but we need the md5 hashes from version 4.7.)

I'll look at your setup tomorrow. I'll surely sound pedantic, but... seriously:
why don't you use a virtualenv environment? It's easy to install and
doesn't require root privileges.


--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------

darklow

Apr 26, 2011, 3:36:04 AM4/26/11
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Thanks, let me know if you have any ideas on how to fix the problem...
About virtualenv: I was also quite pedantic in ignoring the virtualenv solution. I am a programmer, not a system administrator, and I am not familiar with Python; I understand the code logic, but I haven't written any application in it so far, just one test parser to diagnose the error.
I looked at the virtualenv documentation and didn't understand how to use it. The problem is my limited knowledge of Python and its components; I think you need to be more familiar with Python, its libraries, and the way they are installed and configured before installing and configuring virtualenv.
Also, our sysadmin is quite a pain in the a.. It is hard to prove the need for this or that new tool. If it has a stable Debian package, it is easier, but for all other packages it is almost impossible. Also, I am not sure I want to intrude on the sysadmin's environment and install things myself, even if it doesn't require root access.

Davide Alberani

Apr 26, 2011, 2:40:26 PM4/26/11
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Tue, Apr 26, 2011 at 09:36, darklow <dar...@gmail.com> wrote:
> Thanks, let me know if you have any ideas, how to fix the problem...

Eh... As usual, right now I'm really busy. :-(

> I looked at virtualenv documentation, i didn't understand how to use it,

Ok, let's try:
- download virtualenv from http://pypi.python.org/pypi/virtualenv#downloads
- tar xvfz virtualenv-1.6.tar.gz
- cd virtualenv-1.6
- python virtualenv.py --no-site-packages ~/myvenv
- cd ~/myvenv
- . ./bin/activate # notice the initial dot
- pip install formencode # bug with the dependencies. :(
- pip install IMDbPY # or download from the Mercurial repository and run 'python setup.py install'

The most important step is the activation of the virtualenv: your prompt
should change to something like "(myvenv)$" to denote that your virtualenv
is active.

Now, still from inside the virtualenv, you can run the imdbpy2sql.py script:
everything was installed locally to your ~/myvenv/ directory (the local python
interpreter is in ~/myvenv/bin/python).
If you need to deactivate the virtualenv, simply run the deactivate command.
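The prompt change is the usual sign that the virtualenv is active. A second check is to ask the interpreter itself. Below is a minimal sketch, assuming the attribute conventions of the classic virtualenv tool (sys.real_prefix) and of the later stdlib venv module (sys.base_prefix).

```python
import sys

def in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtualenv."""
    # The classic virtualenv tool (1.x, as used in this thread) sets
    # sys.real_prefix; the stdlib venv module (Python 3.3+) instead makes
    # sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

print(sys.executable)      # when activated, this should point into ~/myvenv/bin/
print(in_virtualenv())
```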

HTH,


--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------

Davide Alberani

Apr 28, 2011, 5:00:56 PM4/28/11
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Thu, Apr 28, 2011 at 22:52, darklow <dar...@gmail.com> wrote:
>
> However, the last command, pip install IMDbPY, didn't succeed so well; it looks like
> I got exactly the same error that another user reported some days ago in
> this same discussion, and he also has the UTF-8 encoding problem:

Sure: you don't have the python-dev package installed
in your system. :-/
A per-user installation is possible, but a little tricky...

> By running python setup.py install  I receive the same error. I also tried
> latest version (4.8dev20110425) but got same error.

Using the latest version sources, run (after you've activated your
virtualenv!):
python setup.py install --without-cutils
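To confirm afterwards whether the compiled module was built, one option is to ask Python's import system. This is a sketch only: the dotted module name imdb.parser.sql.cutils is an assumption inferred from the build log path imdb/parser/sql/cutils.c.

```python
import importlib.util

def module_available(name):
    """Return True if `name` can be located (its parent packages get imported)."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):   # e.g. a parent package is not installed
        return False

# Module path assumed from the build log (imdb/parser/sql/cutils.c).
print("compiled cutils present:", module_available("imdb.parser.sql.cutils"))
```

After installing with --without-cutils, this would be expected to print False, with the pure-Python code taking over.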

> Maybe this explains why the script doesn't handle UTF-8 in the first
> place: some strange incompatibility with cutils.c

I've run some tests without the compiled C module, so I think this
is not the cause, but at this point... who knows. :-)

darklow

Apr 28, 2011, 4:52:29 PM4/28/11
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Thanks, virtualenv is not so hard at all :)
I did everything you said and so far it went fine.

However, the last command, pip install IMDbPY, didn't succeed so well; it looks like I got exactly the same error that another user reported some days ago in this same discussion, and he also has the UTF-8 encoding problem:

creating build/temp.linux-i686-2.6/imdb/parser/sql
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.6 -c imdb/parser/sql/cutils.c -o build/temp.linux-i686-2.6/imdb/parser/sql/cutils.o
imdb/parser/sql/cutils.c:54:20: error: Python.h: No such file or directory
imdb/parser/sql/cutils.c: In function ‘strings_check’:
imdb/parser/sql/cutils.c:74: warning: implicit declaration of function ‘strlen’
imdb/parser/sql/cutils.c:74: warning: incompatible implicit declaration of built-in function ‘strlen’
imdb/parser/sql/cutils.c:82: warning: implicit declaration of function ‘strcmp’
imdb/parser/sql/cutils.c: In function ‘ratcliff’:

I also attached the full install log file, which contains all the error lines and a log of everything I did since I started the last command to install IMDbPY.
Running python setup.py install gives the same error. I also tried the latest version (4.8dev20110425) but got the same error.
Maybe this explains why the script doesn't handle UTF-8 in the first place: some strange incompatibility with cutils.c



pip.log

darklow

Apr 28, 2011, 6:55:46 PM4/28/11
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Thanks. This time I was lucky: the sysadmin just installed the python-dev package.
To install IMDbPY without errors, we also needed to install
libxml2-dev and libxslt-dev,
and afterwards psycopg was missing too, so I ran, from the virtualenv:
pip install psycopg

Only now was I able to run the imdbpy2sql script without errors.
Also, since I now have a virtualenv, I installed psyco too.
Now the script is running, but unfortunately I got the same error :((((

Traceback (most recent call last):
  File "./bin/imdbpy2sql.py", line 2950, in <module>
    run()
  File "./bin/imdbpy2sql.py", line 2811, in run
    castLists(_charIDsList=characters_imdbIDs)
  File "./bin/imdbpy2sql.py", line 1575, in castLists
    doCast(f, roleid, rolename)
  File "./bin/imdbpy2sql.py", line 1534, in doCast
    cid = CACHE_CID.addUnique(role)
  File "./bin/imdbpy2sql.py", line 957, in addUnique
    else: return self.add(key, miscData)
  File "./bin/imdbpy2sql.py", line 950, in add
    self[key] = c
  File "./bin/imdbpy2sql.py", line 860, in __setitem__
    self.flush()
  File "./bin/imdbpy2sql.py", line 921, in flush
    raise
  File "./bin/imdbpy2sql.py", line 883, in flush
    self._toDB(quiet)
  File "./bin/imdbpy2sql.py", line 1185, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

(myvenv)darklow@moon:~/myvenv$ 
 
I've just run out of ideas :((((
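The two bytes in the error message can be read by hand. In UTF-8, 0xc3 is the lead byte of a two-byte character (0xc3 0xa9 is "é", for example), so it must be followed by a continuation byte in the range 0x80-0xbf. Here it is followed by 0x20, a plain space. The small classifier below is only an illustration of the UTF-8 byte ranges.

```python
def describe_utf8_byte(b):
    """Classify a single byte by the role it can play in UTF-8."""
    if b < 0x80:
        return "ASCII"
    if b < 0xC0:
        return "continuation byte"                 # 10xxxxxx
    if 0xC2 <= b <= 0xDF:
        return "lead byte of a 2-byte sequence"
    if 0xE0 <= b <= 0xEF:
        return "lead byte of a 3-byte sequence"
    if 0xF0 <= b <= 0xF4:
        return "lead byte of a 4-byte sequence"
    return "never valid in UTF-8"                  # 0xC0, 0xC1, 0xF5-0xFF

reported = bytes.fromhex("c320")   # the bytes from the psycopg2 error message
for b in reported:
    print(hex(b), describe_utf8_byte(b))
```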

darklow

May 1, 2011, 5:54:28 AM5/1/11
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Finally! By removing portions of actors.list one at a time, I found the exact line that causes the error.
This is the following line from actors.list:

Guillaume, Fran?s (I) 23 d?mbre 2008: le jour orance s'est arr?e (2005) (TV)
                        "Michel, l'enfant-roi" (1972)  (uncredited)  [(1972) Le rescapé
                        "Une Su?ise ?aris" (1975)  [Le photographe]

As you can see, the brackets don't match.
I also found where these invalid bytes appear.
If I remove the following code from imdbpy2sql.py, the script works without errors:

Lines 1506-1513 of version 4.7:

    if role[-1:] == ']':
        role = role[:-1]
    if role[-1:] == ')':
        nidx = role.find('(')
        if nidx != -1:
            note = role[nidx:]
            role = role[:nidx].rstrip()
            if not role: role = None
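The excerpt can be run in isolation by wrapping it in a function. In this sketch, the function name split_role_note is hypothetical. It also assumes that the caller has already removed the leading "[" of the role, which the surrounding code in imdbpy2sql.py appears to do but which is not shown above.

```python
def split_role_note(role):
    """The 4.7 excerpt above, wrapped so it can be run alone (hypothetical name).

    Assumes the leading '[' has already been stripped by the caller.
    """
    note = None
    if role[-1:] == ']':
        role = role[:-1]
    if role[-1:] == ')':
        nidx = role.find('(')
        if nidx != -1:
            note = role[nidx:]
            role = role[:nidx].rstrip()
            if not role:
                role = None
    return role, note

print(split_role_note("(1972) Le rescapé]"))   # ('(1972) Le rescapé', None)
print(split_role_note("Himself (voice)]"))     # ('Himself', '(voice)')
```

On the problem role, this excerpt only drops the closing bracket and leaves the text intact. This suggests the corruption happens in a later step.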

So it seems that, because of these brackets, this splitting code does something wrong and the invalid bytes appear.
Also, it is hard for me to understand why this is happening only to me. One idea was that there may be something wrong with the unzip function used to decompress actors.list.gz.

I attached sample files: the exact character line that causes the error, and also a trimmed version of actors.list with the correct head and foot plus the failing lines.
I created these files with gunzip, sed and cat, to decompress the list, cut out the exact lines, and combine the head, middle and foot parts.

Any suggestions on how to fix these lines so the error doesn't appear?
I am afraid that removing these lines will cause some wrong data to be imported.

But finally I have the feeling we are very close to discovering the real problem :)
actors.list
actors.list.middle
actors.list.gz

Davide Alberani

May 1, 2011, 8:48:54 AM5/1/11
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Sun, May 1, 2011 at 11:54, darklow <dar...@gmail.com> wrote:
> Finally! removing portion by portion from actors.list i found the exact line
> that creates error.

Very good, thanks!
And now, the long awaited news: this bug is fixed! :D

The problem was fired by this role:
[(1972) Le rescapé]

The imdbpy2sql.py script did nothing wrong: it stripped the first and last
square brackets; after that, the string is considered to be a character
name (remember that we're parsing a role) and parsed using the
imdb.utils.analyze_name function.
This function also parses people's names, and in some circumstances
the name contains a reference to the dates of birth and death.
Unfortunately, when stripping these notes, I made some wrong assumptions (like
the fact that a name can't begin with a parenthesis), and this led to the
last character of the name being stripped, whatever it was.
In our case, the name started with an open parenthesis and ended with
a character that occupies two bytes in UTF-8... the rest of the story is known. :-)
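The failure mode can be reproduced without IMDbPY. This Python 3 sketch does not reproduce analyze_name's actual stripping logic. It only shows what happens when the last *byte* of a UTF-8 string is dropped instead of the last *character*, which is what slicing a Python 2 byte string does. The trailing space stands in for whatever ASCII byte followed in the real import, which is an assumption.

```python
name = "(1972) Le rescapé"
raw = name.encode("utf-8")        # 'é' is encoded as the two bytes 0xC3 0xA9

# Dropping the last *byte* leaves a lone lead byte 0xC3 at the end...
chopped = raw[:-1]
# ...and once any ASCII byte follows it (here a space), we get C3 20.
payload = chopped + b" "
print(payload[-2:].hex())         # c320

try:
    payload.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8 at byte", exc.start)

# Slicing decoded text removes a whole character instead.
print(name[:-1])                  # (1972) Le rescap
```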

The fix is already in the mercurial repository; thank you very much for
the extensive tests and debug!

--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------

darklow

May 2, 2011, 2:47:25 AM5/2/11
to Davide Alberani, tho...@stewarts.org.uk, imdbpy-help
Thank you very much! Yesterday evening I downloaded the latest version, and by this morning the script had finished successfully! :)
Thank you for your patience and for guiding me through the tests; I'm really glad we finally found the problem and fixed it.
Just curious: why did only I and one other user encounter this problem, while you didn't see the error when you ran the same tests? :)

Davide Alberani

May 2, 2011, 2:32:03 PM5/2/11
to darklow, tho...@stewarts.org.uk, imdbpy-help
On Mon, May 2, 2011 at 08:47, darklow <dar...@gmail.com> wrote:
>
> Thank you for your patience and for guiding me through the tests; I'm really glad we
> finally found the problem and fixed it.

Yep, even if it took a little too long. :-)

> Just curious: why did only I and one other user encounter this problem, while
> you didn't see the error when you ran the same tests? :)

It may have something to do with the Python library used to connect to
Postgres. Maybe some libraries handle this kind of error gracefully; I have
to take a closer look at the versions installed on my system and in the
virtualenv I've used to reproduce the bug.
In fact, the right thing to do in such cases is to raise an exception (as in
our case); other databases, or libraries used to connect to them, like MySQL,
simply ignore these errors with a warning (not a great idea).
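The difference between failing loudly and degrading silently is visible in plain Python decoding as well. Below is a minimal sketch of a hypothetical pre-flight check (find_invalid_utf8 is not part of IMDbPY or psycopg2). It uses strict decoding to report each bad value with its position, and contrasts that with the lenient "replace" error handler.

```python
def find_invalid_utf8(values):
    """Yield (index, value, error) for every bytes value that is not valid UTF-8.

    Hypothetical helper: run it over a batch before handing it to the database,
    so a bad value is reported with its position instead of aborting a long
    executemany() call.
    """
    for i, value in enumerate(values):
        if isinstance(value, bytes):
            try:
                value.decode("utf-8")          # strict: raises on bad input
            except UnicodeDecodeError as exc:
                yield i, value, exc

rows = [b"Le photographe", b"(1972) Le rescap\xc3 ", "already text"]
for i, value, exc in find_invalid_utf8(rows):
    print(i, value, exc.reason)

# Lenient decoding hides the problem instead: the bad byte becomes U+FFFD.
print(rows[1].decode("utf-8", errors="replace"))
```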

--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------

darklow

Sep 19, 2011, 3:36:11 PM9/19/11
to imdbpy-help
Hi,

This fix worked for about some months and unfortunately there is similar encoding error in latest data files (16.sep.2011)
Using latest DEV version on virtualenv: IMDbPY==4.8dev-20110822 and python 2.6
This configuration worked perfectly with previous data files. So it means there must be some kind of a trash again for actor and characters files.
Here is the full log for error:

SCANNING actor: Ribeiro, Freddy
SCANNING actor: Richard, Darryl
 * FLUSHING SQLData...
SCANNING actor: Richardson, Ian (I)
SCANNING actor: Richter, Friedrich
 * FLUSHING SQLData...
SCANNING actor: Riebisi, Romeo
SCANNING actor: Rignault, Alexandre
 * FLUSHING CharactersCache...
Traceback (most recent call last):
  File "./bin/imdbpy2sql.py", line 5, in <module>
    pkg_resources.run_script('IMDbPY==4.8dev-20110822', 'imdbpy2sql.py')
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/pkg_resources.py", line 489, in run_script
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg/pkg_resources.py", line 1207, in run_script
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2959, in <module>
    run()
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 2820, in run
    castLists(_charIDsList=characters_imdbIDs)
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1584, in castLists
    doCast(f, roleid, rolename)
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1543, in doCast
    cid = CACHE_CID.addUnique(role)
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 966, in addUnique
    else: return self.add(key, miscData)
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 959, in add
    self[key] = c
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 869, in __setitem__
    self.flush()
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 892, in flush
    self._toDB(quiet)
  File "/usr/share/nginx/store/imdb/virtualenv/lib/python2.6/site-packages/IMDbPY-4.8dev_20110822-py2.6-linux-x86_64.egg/EGG-INFO/scripts/imdbpy2sql.py", line 1194, in _toDB
    CURS.executemany(self.sqlstr, self.converter(l))
psycopg2.DataError: invalid byte sequence for encoding "UTF8": 0xc320


Any ideas?
Thanks.
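One way to hunt for a similar culprit in the new files, instead of removing portions by hand, is to scan the credit lines for roles that begin with a parenthesis. This is a heuristic only: it looks for the pattern behind the May bug (e.g. "[(1972) Le rescapé]"), and the September failure may have a different cause. The function names are hypothetical. The file is read as bytes, so no encoding has to be assumed.

```python
import gzip
import re

# A role is the text between square brackets on a credit line.
ROLE_RE = re.compile(rb"\[([^\]]*)\]")

def suspicious_roles(lines):
    """Yield (line_number, role) for bracketed roles that start with '('."""
    for lineno, line in enumerate(lines, 1):
        for match in ROLE_RE.finditer(line):
            role = match.group(1)
            if role.startswith(b"("):
                yield lineno, role

def scan(path):
    # e.g. scan("actors.list.gz"); latin-1 is used only to print any byte safely.
    with gzip.open(path, "rb") as f:
        for lineno, role in suspicious_roles(f):
            print(lineno, role.decode("latin-1"))
```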

Davide Alberani

Sep 20, 2011, 3:16:20 PM9/20/11
to darklow, imdbpy-help
On Mon, Sep 19, 2011 at 21:36, darklow <dar...@gmail.com> wrote:
>
> This fix worked for about some months and unfortunately there is similar
> encoding error in latest data files (16.sep.2011)

I just tried it, and everything went fine for me using
the same version. :-(

Are you using SQLObject or SQLAlchemy?
Version of PostgreSQL?
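To answer questions like these in one go, a small report of the installed Python packages can help. This is a sketch for Python 3.8+ using importlib.metadata. The package list is an assumption based on the libraries mentioned in this thread. The database server version still has to be checked separately (for PostgreSQL, with SELECT version(); in psql).

```python
import platform
from importlib import metadata

def report_versions(packages=("IMDbPY", "SQLObject", "SQLAlchemy", "psycopg2")):
    """Collect installed versions of the packages relevant to this thread."""
    versions = {"Python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

for name, version in report_versions().items():
    print(f"{name}: {version}")
```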

--
Davide Alberani <davide....@gmail.com>  [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

------------------------------------------------------------------------------