[Imdbpy-help] Duplicate titles, other issues


Joseph H

Dec 31, 2021, 9:50:04 AM
to imdbp...@lists.sourceforge.net, michae...@gmail.com
Hello--

I am putting together a project that uses IMDbPY. I've described the early issues I'm having on StackOverflow, although not all of what I mention there is specific to IMDbPY. One of the big problems I've encountered is duplicate titles appearing, as you can see here:

Screenshot 2021-12-29 233304.png
I'm using Django templating language to create the output, like so:

Screenshot 2021-12-31 080218.png

I just don't understand why there are so many redundant titles for the same movie ID. I would like to remove them if possible. I was thinking about using the fuzzywuzzy package to detect similar titles and remove the redundant ones, but I also don't fully understand why they exist in the first place. When I print the small "slice" of the movie title database that I've created, it shows 15 objects, just as I'd anticipated. And when I later try to display only unique titles, many of the titles ARE similar but not identical, e.g. "Animal House" and "Animal House (1978)".

I can provide further background on the project if you desire.

Many thanks for any insight that you can provide!

Joseph Hooker

Davide Alberani

Jan 1, 2022, 7:49:07 AM
to imdbp...@lists.sourceforge.net, michae...@gmail.com, Joseph H
Hi Joseph,

You are iterating over the keys and values (k1, v1) of a dict with the format
{'movie_1100000': <Movie Object>}
Then you iterate over the keys and values (k2, v2) of each Movie instance
(which behaves like a dict).
From there, you keep only the k2 keys that contain 'title'; there
are several, such as 'title', 'canonical title', and
'long imdb canonical title' (see the
_additional_keys method of the Movie class).
And then you print each value, so the same movie shows up once per matching key.

You don't need the second for loop.
Just print v1['title'] or whichever key you need (you can of course
check that it exists beforehand).
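
A minimal sketch of the difference, assuming plain dicts as stand-ins for the Movie objects (the key names are the ones mentioned above; the values and the variable names movies, nested and direct are made up for illustration):

```python
# Stand-in data: plain dicts instead of real Movie objects.
# Key names are taken from the explanation above; values are illustrative.
movies = {
    "movie_1100000": {
        "title": "Animal House",
        "canonical title": "Animal House",
        "long imdb canonical title": "Animal House (1978)",
        "year": 1978,
    },
}

# Nested loop: every key whose name contains 'title' contributes one entry,
# so a single movie produces several near-identical lines.
nested = [
    v2
    for k1, v1 in movies.items()
    for k2, v2 in v1.items()
    if "title" in k2
]

# Single loop: read only the key you need, skipping movies that lack it.
direct = [v1["title"] for v1 in movies.values() if "title" in v1]

print(nested)
print(direct)
```

With the stand-in data, nested holds three entries for the one movie while direct holds one. The same single-loop pattern should carry over to real Movie objects, since they behave like dicts, but that has not been checked here against a live IMDbPY session.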


Hope this helps,







--
Davide Alberani <davide....@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/


_______________________________________________
Imdbpy-help mailing list
Imdbp...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help

Davide Alberani

Jan 1, 2022, 7:56:29 AM
to imdbp...@lists.sourceforge.net, michae...@gmail.com, Joseph H
Hi,
As a minor update, I've revised my answer at
https://stackoverflow.com/a/70549011/253358
to account for how Django templates work.