[Imdbpy-devel] Fixes to title2

darklow

Mar 28, 2013, 11:04:48 AM
to Imdbpy...@lists.sourceforge.net
Hi Davide,

I fixed a few bugs in title2imdbid():
https://github.com/alberanid/imdbpy/pull/8

Not sure GitHub is the right place for this, but I don't have
Mercurial, so I did it in git.
I also listed a few ideas and proposals there; I decided it would be
easier to discuss them here, so that all -devel list users can participate.

Tell me what you think about separating search strings from match strings.


darklow


------------------------------------------------------------------------------
Own the Future-Intel® Level Up Game Demo Contest 2013
Rise to greatness in Intel's independent game demo contest.
Compete for recognition, cash, and the chance to get your game
on Steam. $5K grand prize plus 10 genre and skill prizes.
Submit your demo by 6/6/13. http://p.sf.net/sfu/intel_levelupd2d
_______________________________________________
Imdbpy-devel mailing list
Imdbpy...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-devel

Davide Alberani

Mar 28, 2013, 3:18:48 PM
to darklow, IMDbPY development
On Thu, Mar 28, 2013 at 4:04 PM, darklow <dar...@gmail.com> wrote:
>
> I fixed a few bugs in title2imdbid():
> https://github.com/alberanid/imdbpy/pull/8

Great, thank you very much!
The patch seems fine; no problem at all if you use github:
I can do the pull/push between the various repositories.

> I also listed a few ideas and proposals there; I decided it would be
> easier to discuss them here, so that all -devel list users can participate.
>
> Tell me what you think about separating search strings from match strings.

Sounds reasonable.
In recent years, IMDb has stopped using the canonical forms almost
everywhere, I think (even in the plain text data files). We can probably
greatly simplify our code by using just what we read from the web or from
the files, leaving some simple functions to convert from the normal
format to the "canonical" one when a user requires it (I think it's
still useful in many cases).
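A minimal sketch of the kind of "simple functions" mentioned above, assuming English articles only; `to_canonical`, `to_normal` and `ARTICLES` are hypothetical names, not part of IMDbPY's API:

```python
# Hypothetical helpers converting between the "normal" title form
# ("The Title") and the older "canonical" form ("Title, The").
# Assumption: only English articles are handled.
ARTICLES = ("the", "a", "an")


def to_canonical(title: str) -> str:
    """Move a leading article to the end: 'The Matrix' -> 'Matrix, The'."""
    first, sep, rest = title.partition(" ")
    if sep and rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title  # no leading article: nothing to move


def to_normal(title: str) -> str:
    """Move a trailing article to the front: 'Matrix, The' -> 'The Matrix'."""
    head, sep, last = title.rpartition(", ")
    if sep and head and last.lower() in ARTICLES:
        return f"{last} {head}"
    return title  # the part after the last comma is not an article
```

Keeping the normal form as the stored value and converting only on request means the comma check in `to_normal` never has to guess on titles that merely contain a comma.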

As anybody may have noticed, lately I've had very little time for IMDbPY,
and I would certainly appreciate any help (especially code and testing).
So, if you already have something, or you plan to make some fixes and
introduce improvements, I'll be more than glad!


Thanks!

--
Davide Alberani <davide....@gmail.com> [PGP KeyID: 0x465BFD47]
http://www.mimante.net/

darklow

Mar 28, 2013, 3:26:43 PM
to Davide Alberani, IMDbPY development
I have been using IMDbPY for a long time, but only a few parts of it, as we mostly worked with PHP.
Now we have moved all our development to Python, so we can use this nice library with all its features.
I'll be happy to help.

I can start with some refactoring of the search code: as I mentioned before, episode search does not work at all, so we need to separate search strings from match strings.
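One possible way to separate the two strings, sketched under assumptions: the episode long-title layout `"Series" (Year) {Episode (#S.E)}` follows IMDb's plain text data, and `split_query` and `Query` are hypothetical names, not anything in IMDbPY. The series title becomes the search string, while the bare episode title is kept for matching the results:

```python
import re
from typing import NamedTuple

# Assumed episode long-title layout, as in IMDb's plain text data:
#   "Series Title" (2005) {Episode Title (#1.2)}
EPISODE_RE = re.compile(r'^"(?P<series>[^"]+)" \((?P<year>[^)]*)\) \{(?P<episode>.*)\}$')
# Trailing episode number such as " (#1.2)".
EPISODE_NUMBER_RE = re.compile(r"\s*\(#[^)]*\)$")


class Query(NamedTuple):
    search: str  # string sent to the search engine
    match: str   # string compared against each search result


def split_query(long_title: str) -> Query:
    """Hypothetical helper: derive separate search and match strings."""
    m = EPISODE_RE.match(long_title)
    if m is None:
        # Not an episode: search for and match on the same string.
        return Query(search=long_title, match=long_title)
    # Episode: search for the series title only, and keep the episode
    # title (without its "(#S.E)" number) for matching the results.
    episode = EPISODE_NUMBER_RE.sub("", m.group("episode"))
    return Query(search=m.group("series"), match=episode)
```

For `"Some Series" (2005) {Some Episode (#1.2)}` this searches for `Some Series` and matches results against `Some Episode`; any other title is passed through unchanged for both roles.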

A question about canonical names:
do you know why searching in the browser returns normalized names, while results from searching with the IMDbPY library are still in canonical format? If we could find a way to force results to appear with normal names, we could slowly move canonical names out of search altogether, and the refactoring would be much smaller.

Davide Alberani

Mar 29, 2013, 5:19:38 PM
to darklow, IMDbPY development
On Thu, Mar 28, 2013 at 8:26 PM, darklow <dar...@gmail.com> wrote:
>
> I can start with some refactoring of the search code: as I mentioned
> before, episode search does not work at all, so we need to separate
> search strings from match strings.

Perfect!

> A question about canonical names:
> do you know why searching in the browser returns normalized names, while
> results from searching with the IMDbPY library are still in canonical
> format? If we could find a way to force results to appear with normal
> names, we could slowly move canonical names out of search altogether, and
> the refactoring would be much smaller.

Hmm... I can't find a case in which we use the canonical format; do
you have an example?

It makes sense to me to store the data internally in the provided format.
As long as it was in the "Title, The" format (or at least as long as that
form was available somewhere on the web page), that was the format of
choice (it also provides some more information: which part of the title
is the article).

Now, from what I see, we already store the "The Title" format internally,
but due to the recent changes to the imdbIndex, the functions that output
it in canonical format are somewhat broken (I can fix them).
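A sketch of why the imdbIndex complicates canonical output, assuming the long-title layout `Title (Year)` or `Title (Year/Index)` (e.g. `(2005/II)`): the article has to move inside the bare title, not to the end of the whole string, or the result would read `Title (2005/II), The`. `canonical_long_title` is a hypothetical helper, not IMDbPY's implementation:

```python
import re

# Assumed long-title layout: "Title (Year)" or "Title (Year/Index)",
# where Index is a Roman numeral and an unknown year is written "????".
LONG_TITLE_RE = re.compile(
    r"^(?P<title>.+?) \((?P<year>\d{4}|\?{4})(?:/(?P<index>[IVXLCDM]+))?\)$"
)
ARTICLES = ("the", "a", "an")  # assumption: English articles only


def canonical_long_title(long_title: str) -> str:
    """Hypothetical helper: 'The Title (2005/II)' -> 'Title, The (2005/II)'."""
    m = LONG_TITLE_RE.match(long_title)
    if m is None:
        return long_title  # unknown layout: leave it untouched
    title, year, index = m.group("title"), m.group("year"), m.group("index")
    first, sep, rest = title.partition(" ")
    if sep and rest and first.lower() in ARTICLES:
        title = f"{rest}, {first}"
    # The article moves within the bare title; the year/imdbIndex
    # suffix must stay at the very end of the long title.
    suffix = f"{year}/{index}" if index else year
    return f"{title} ({suffix})"
```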

For the 'sql' data: that is now also stored in the "The Title" format (I'm
not sure whether it had any impact on the md5sum column, which was
previously calculated on the canonical format and now, *I guess*, is
calculated on the normal format; by the way... this should have made it
impossible to update the imdb_id column during an update of the db :-/ )
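To illustrate the md5sum concern: the same title hashed in its two forms gives two different digests, so a key computed on the canonical form at import time cannot be found again by hashing the normal form. A minimal sketch, assuming the key is a plain md5 of the title string (the actual column computation is not shown in this thread) and using hypothetical helper names, normalizes every title to one form before hashing:

```python
import hashlib

ARTICLES = ("the", "a", "an")  # assumption: English articles only


def to_canonical(title: str) -> str:
    """Move a leading article to the end: 'The Title' -> 'Title, The'."""
    first, sep, rest = title.partition(" ")
    if sep and rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return title


def title_key(title: str) -> str:
    """Hypothetical md5 key, always computed on the canonical form."""
    return hashlib.md5(to_canonical(title).encode("utf-8")).hexdigest()
```

With this, `title_key("The Title")` and `title_key("Title, The")` produce the same digest, whereas hashing the raw strings directly would not.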


--
Davide Alberani <davide....@gmail.com> [PGP KeyID: 0x465BFD47]
http://www.mimante.net/
