[Imdbpy-help] Urgent: Retrieving all movies with specific genre,


Camille Sanchez

Jul 17, 2020, 9:33:42 AM
to imdbp...@lists.sourceforge.net
Hi team,

For a Python class, I am trying to filter the IMDb films by different criteria (such as title type, genre, keywords, plot, etc. - similar to this page of the IMDb website) and get a list of movies that match these criteria. However, I am not quite sure how to start; could you help? I have downloaded IMDbPY but I am not sure what to do next.

I saw in other posts, such as "Retrieving all movies and csv file" and "Retrieving a List of Movies in a Given Year", that it is possible to retrieve the data without having a specific movie ID or movie title.

Logically, I would assume that it is easy to query the data from a database. I have done it on smaller projects with a SQL database, for instance, but I am not sure what the best approach is for such a task. Is it:
- to store all the movies into a server like AWS and then do the searches directly from it
- to access directly the database using this README.sqldb.txt file
Or is there another way?

Are there steps anywhere I can follow? Sorry I am a beginner at all this.

Thank you for your help!

Best.
Camille


Davide Alberani

Jul 17, 2020, 4:32:01 PM
to Camille Sanchez, imdbp...@lists.sourceforge.net
Hi Camille,
Your idea to import the data you need into a SQL database sounds good,
and IMDbPY could help.

There are, however, some caveats: first of all, the document you found
(README.sqldb.txt)
refers to an obsolete dataset that IMDb stopped updating some years ago.
A new, up-to-date dataset exists ( https://www.imdb.com/interfaces/ )
and IMDbPY is able to import it
into a SQL database of your choice; you can find the documentation
here: https://imdbpy.readthedocs.io/en/latest/usage/s3.html

But you may face another problem: IMDb includes very little
information in this new dataset.
Look at it, and decide if it's okay for your project.
If it is, you can proceed.
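
One way to look at the dataset before importing anything is to print the header and a few rows of one downloaded file. The sketch below is only an illustration: it assumes the files are gzipped, tab-separated text (as IMDb distributes them), and the helper name peek_tsv and the file name in the usage note are assumptions, not part of IMDbPY.

```python
import csv
import gzip
from itertools import islice


def peek_tsv(path, n=5):
    """Return (header, rows) for the first n data rows of a gzipped TSV file.

    Assumption: the file is UTF-8, tab-separated, with one header line.
    Missing values appear as the literal text \\N and are left untouched here.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters inside titles as ordinary text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(islice(reader, n))
    return header, rows
```

For example, header, rows = peek_tsv("title.basics.tsv.gz") (file name assumed) shows which columns are available, so you can check whether genre, title type, and the other criteria you need are actually present.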

More or less, the workflow would be as follows:

1. Install the latest version of IMDbPY from https://github.com/alberanid/imdbpy/ -
see https://imdbpy.readthedocs.io/en/latest/#installation

2. Download the dataset.
You can do it manually or, if you prefer, you can use the
"download-from-s3" script you'll find in the docs/goodies directory
(it requires a Linux system).

3. Import the dataset; as an example, to import the data into a SQLite
database, you can do something like:
s32imdbpy.py /path/to/the/imdb-dataset-2020-07-17/ sqlite:///imdb.db --verbose

(notice the three / in sqlite:///imdb.db - they are all needed)
After a while, you will have an "imdb.db" file in the current
directory, containing the imported data.

4. You can now search and analyze the data in this file using
Python's "sqlite3" module.
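
As a rough sketch of step 4, the code below lists the tables in a SQLite database and selects titles by genre with the "sqlite3" module. The table name title_basics and the columns titleType, primaryTitle, startYear and genres are assumptions based on the names used in the IMDb dataset files; list the tables and columns of your own imdb.db before relying on them. The make_demo_db helper and its placeholder rows are hypothetical and exist only so the example runs without the real data.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def titles_by_genre(conn, genre, title_type="movie", limit=20):
    """Return (primaryTitle, startYear) rows whose genres text contains genre.

    ASSUMPTION: a table title_basics exists with these column names and
    genres is stored as comma-separated text. LIKE matches substrings, so
    "Music" would also match "Musical".
    """
    query = (
        "SELECT primaryTitle, startYear FROM title_basics "
        "WHERE titleType = ? AND genres LIKE ? "
        "ORDER BY startYear, primaryTitle LIMIT ?"
    )
    return conn.execute(query, (title_type, f"%{genre}%", limit)).fetchall()


def make_demo_db():
    """Build a tiny in-memory stand-in for imdb.db (placeholder rows only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE title_basics ("
        "tconst INTEGER, titleType TEXT, primaryTitle TEXT, "
        "startYear INTEGER, genres TEXT)"
    )
    conn.executemany(
        "INSERT INTO title_basics VALUES (?, ?, ?, ?, ?)",
        [
            (1, "movie", "Example Film A", 2001, "Comedy,Drama"),
            (2, "movie", "Example Film B", 2003, "Horror"),
            (3, "movie", "Example Film C", 2005, "Drama"),
            (4, "tvSeries", "Example Series D", 2004, "Drama"),
        ],
    )
    return conn
```

Against the real file, you would replace make_demo_db() with sqlite3.connect("imdb.db"), call list_tables(conn) to see what was imported, and then adapt the query to the columns you find.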


Let me know if you have questions or something is not clear.

Hope this helps.
> _______________________________________________
> Imdbpy-help mailing list
> Imdbp...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/imdbpy-help



--
Davide Alberani <davide....@gmail.com> [PGP KeyID: 0x3845A3D4AC9B61AD]
http://www.mimante.net/

