[Imdbpy-help] imdbpy2sql Duplicate entry

Bjørn von Rimscha

Feb 1, 2010, 6:18:53 AM

to imdbp...@lists.sourceforge.net

Hi there,

I'm trying to build a local copy of the IMDb database so that I can
import query results into a statistics software package, and I'm using
imdbpy to create that local database.

I followed the instructions in README.sqldb but ran into problems.
I got the following error message:

* FLUSHING MoviesCache...
* TOO MANY DATA (100000 items in MoviesCache), recursion: 1
* SPLITTING (run 1 of 2), recursion: 1
* FLUSHING MoviesCache...
Traceback (most recent call last):
  File "C:\Python26\Scripts\imdbpy2sql.py", line 2786, in <module>
    run()
  File "C:\Python26\Scripts\imdbpy2sql.py", line 2634, in run
    readMovieList()
  File "C:\Python26\Scripts\imdbpy2sql.py", line 1428, in readMovieList
    mid = CACHE_MID.addUnique(title, yearData)
  File "C:\Python26\Scripts\imdbpy2sql.py", line 1036, in addUnique
    else: return self.add(key, miscData)
  File "C:\Python26\Scripts\imdbpy2sql.py", line 915, in add
    self[key] = c
  File "C:\Python26\Scripts\imdbpy2sql.py", line 825, in __setitem__
    self.flush()
  File "C:\Python26\Scripts\imdbpy2sql.py", line 877, in flush
    self.flush(quiet=quiet, _recursionLevel=_recursionLevel)
  File "C:\Python26\Scripts\imdbpy2sql.py", line 848, in flush
    self._toDB(quiet)
  File "C:\Python26\Scripts\imdbpy2sql.py", line 1020, in _toDB
    self._runCommand(l)
  File "C:\Python26\Scripts\imdbpy2sql.py", line 1024, in _runCommand
    CURS.executemany(self.sqlstr, self.converter(dataList))
  File "C:\Python26\lib\site-packages\MySQLdb\cursors.py", line 205, in executemany
    r = r + self.execute(query, a)
  File "C:\Python26\lib\site-packages\MySQLdb\cursors.py", line 173, in execute
    self.errorhandler(self, exc, value)
  File "C:\Python26\lib\site-packages\MySQLdb\connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
_mysql_exceptions.IntegrityError: (1062, "Duplicate entry '262918' for key 'PRIMARY'")

Does that mean splitting created duplicates? Or is there a mistake in the
IMDb data? I doubt the latter, since I also tried it with last week's
plain text data files and got the same error message, only with a
different duplicate number.
I also increased the max_allowed_packet value as suggested in the README
file, with the same result.

Any help is much appreciated!

Thanks
Bjørn

System:
MySQL 5.1
Python 2.6.4
Windows 7
Intel Core2 Duo P8400
4 GB RAM

_______________________________________________
Imdbpy-help mailing list
Imdbp...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help

Davide Alberani

Feb 1, 2010, 11:17:02 AM

to imdbp...@lists.sourceforge.net, b.vonr...@gmail.com

On Feb 01, Bjørn von Rimscha <b.vonr...@gmail.com> wrote:

> * SPLITTING (run 1 of 2), recursion: 1

[...]

> File "C:\Python26\lib\site-packages\MySQLdb\connections.py", line 36, in defaulterrorhandler
>     raise errorclass, errorvalue
> _mysql_exceptions.IntegrityError: (1062, "Duplicate entry '262918' for key 'PRIMARY'")
>
> Does that mean splitting created duplicates?

Something like that. :-/
It goes without saying that it shouldn't. :-)

> Or is there a mistake in the imdb data

Improbable: the plain text data files contain some crap, but
imdbpy2sql.py is supposed to work around these problems.
Moreover, the IDs used as primary key in the 'title' (and other)
tables are made up by imdbpy2sql.py itself, and not taken from
the plain text data files.
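
To illustrate the point about locally generated IDs, here is a minimal
sketch. It is not the actual imdbpy2sql.py code: IdCache and add_unique
are hypothetical names, and the real script's cache does considerably
more. It only shows the idea of a cache that assigns its own sequential
integer key to each new title instead of reading one from the data files.

```python
from itertools import count


class IdCache(dict):
    """Hypothetical stand-in for a cache that invents its own integer IDs."""

    def __init__(self):
        super().__init__()
        self._counter = count(1)  # IDs start at 1 and only ever grow

    def add_unique(self, key):
        # Return the existing ID for a known key, otherwise assign a new one.
        if key in self:
            return self[key]
        new_id = next(self._counter)
        self[key] = new_id
        return new_id


cache = IdCache()
first = cache.add_unique("Some Title (2009)")
second = cache.add_unique("Another Title (2010)")
again = cache.add_unique("Some Title (2009)")  # same key, same ID as `first`
```

With IDs produced this way, a duplicate primary key would point at the
insertion step (the same row being sent twice) rather than at the source
data.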

> I also increased the max_allowed_packet value as suggested in the README
> file, with the same result.

[...]

> MySQL 5.1
> Python 2.6.4
> Windows 7

Honestly, I'm getting a lot of bug reports about MySQL on Windows 7
or Vista.
It's crazy that nobody is able to make it work: splitting the data
we're sending to the database is just a fail-safe measure, and as
you can imagine it would be better if it never happened.
Under Linux I've never seen these problems: tomorrow I'll try to
set up my own MySQL to reproduce the problem, if I can (as you can
guess the code used to split the data set is not extensively tested).
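
As a rough illustration of the split-and-retry idea, here is a
self-contained sketch. It is an assumption-laden toy, not the code in
imdbpy2sql.py: it uses sqlite3 from the standard library instead of
MySQL, and BatchTooLarge, send and flush are hypothetical names, with
max_rows standing in for whatever limit the server enforces.

```python
import sqlite3


class BatchTooLarge(Exception):
    """Hypothetical error standing in for the database rejecting a batch."""


def send(conn, rows, max_rows):
    # Pretend the server refuses batches above max_rows (an assumed limit).
    if len(rows) > max_rows:
        raise BatchTooLarge(len(rows))
    conn.executemany("INSERT INTO title (id, title) VALUES (?, ?)", rows)


def flush(conn, rows, max_rows):
    """Try to insert rows; on rejection, split the batch in two and retry."""
    try:
        with conn:  # transaction: rolled back if the batch fails
            send(conn, rows, max_rows)
    except BatchTooLarge:
        if len(rows) <= 1:
            raise  # a single row cannot be split any further
        mid = len(rows) // 2
        flush(conn, rows[:mid], max_rows)
        flush(conn, rows[mid:], max_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE title (id INTEGER PRIMARY KEY, title TEXT)")
rows = [(i, "movie %d" % i) for i in range(1, 11)]
flush(conn, rows, max_rows=3)
stored = conn.execute("SELECT COUNT(*) FROM title").fetchone()[0]
```

In a scheme like this, a retry is only safe if the failed attempt left
nothing behind; otherwise re-sending the same rows would collide on the
primary key. Whether that is what happens in the reported case is not
established in this thread.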

I have some solutions you can try (besides moving to a sane Unix
environment, I mean ;-) :

1. use CSV files, as described in the README.sqldb file.
Basically, add to your command line something like:
-c C:/path/to/an/empty/directory/
But also read the notices in README.sqldb about CSV files
and Windows paths.
PS: right now in the SVN, I'm improving support for CSV handling,
so that you can decouple the creation of the CSV file from the
insertion of the data in the database.

2. use PostgreSQL or another supported database.
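
For option 1, a small sketch of how the command line might be assembled
from Python. The -c option is the one mentioned above; -d (data files
directory) and -u (database URI) are the options I recall from
README.sqldb, so check them against your copy. The paths and the URI are
placeholders, and build_command is a hypothetical helper.

```python
from pathlib import PureWindowsPath


def build_command(data_dir, db_uri, csv_dir):
    # Forward slashes plus a trailing slash, matching the form shown above.
    csv_arg = PureWindowsPath(csv_dir).as_posix() + "/"
    return ["python", "imdbpy2sql.py", "-d", data_dir, "-u", db_uri, "-c", csv_arg]


cmd = build_command(
    "C:/imdb/plain/",                        # placeholder data directory
    "mysql://user:password@localhost/imdb",  # placeholder database URI
    r"C:\imdb\csv",                          # placeholder empty CSV directory
)
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True);
README.sqldb remains the reference for the Windows path caveats.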

In the meantime, if anyone has any idea why MySQL on Vista/7
accepts so little data at a time, please share your thoughts. :-)

Let me know if/how you fix the problem!

--
Davide Alberani <davide....@gmail.com> [GPG KeyID: 0x465BFD47]
http://www.mimante.net/
