Perl port

2 views
Skip to first unread message

Stanis

unread,
Dec 21, 2008, 4:15:52 PM12/21/08
to Mimeparse-dev
Hi all,
I've recently ported mimeparse.py 1.0.2 to Perl, because I didn't find
anything suitable on CPAN (but maybe I just didn't search hard
enough).
I've uploaded mimeparse-perl-0.1.2.tar.gz to this discussion group,
feel free to add it to this project.

cheers,
Stanis

Joe Gregorio

unread,
Dec 21, 2008, 7:02:52 PM12/21/08
to mimepa...@googlegroups.com
Excellent! I've added you as an administrator to the project. Feel
free to add your code
and please update the main wiki page with a Perl example.

Thanks,
-joe

>
> cheers,
> Stanis
> >
>



--
Joe Gregorio http://bitworking.org

Stanis

unread,
Dec 21, 2008, 8:09:23 PM12/21/08
to Mimeparse-dev
On Dec 22, 1:02 am, "Joe Gregorio" <j...@bitworking.org> wrote:
> Excellent! I've added you as an administrator to the project.

Thank you :)

> Feel free to add your code and please update the main wiki page with a Perl example.

Done.

Steve Vinoski

unread,
Dec 21, 2008, 9:14:22 PM12/21/08
to stanis.tre...@gmail.com, mimepa...@googlegroups.com

Nice work, Stanis, I was about to start writing one in Perl but you've
saved me the trouble. :-)

--steve

Aristotle Pagaltzis

unread,
Dec 21, 2008, 10:39:26 PM12/21/08
to mimepa...@googlegroups.com
* Steve Vinoski <vin...@gmail.com> [2008-12-22 03:15]:

> Nice work, Stanis, I was about to start writing one in Perl but
> you've saved me the trouble. :-)

Argh, I just spent last night writing a port. I took some time
to refactor a bit to make it easier to read, though. Both
`fitness_and_quality_parsed` and `best_match` are overly dense,
IMO. I factored out bits of the `best_match` type match
conditional in and rewrote both loops in a slightly more
straight-line fashion.

Note that the way Stanis wrote `parse_media_range` will throw “is
not numeric” warnings when the `q` parameter happens to contain
garbage rather than a numeric value; I took care to avoid that.

The original code was missing a function `parse_media_range_list`
which admittedly is just one line, but was copypasted into both
`quality` and `best_match` and left the user having to that by
themselves, rendering `fitness_and_quality_parsed` a lot less
useful.

I interpreted the interfaces to functions that take a pre-parsed
list of media ranges differently as well, having them take a flat
list rather than a list of arrays. It’s no harder to consume but
easier to produce.

In terms of docs, I took care to use POD links for crossrefs
between functions. I also shunted the docs to a single block at
the bottom of the code because I find the interleaved style very
distracting to read, regardless of whether I am looking at the
docs or the code. Of course there is debate about which style is
preferrable…

As well, I factored the copypasting out of the tests, rewriting
them in a table-driven fashion.

I also split them out into separate `.t` files, as is the custom
in the Perl world. In fact I went a bit further and cast the
whole thing into CPAN module structure. That does mean it’s no
longer a single file, though. Also in deference to CPAN customs,
I used the name Parse::MIME rather than MIMEParse.

I’ve put the code up for grabs on github [1]. How do we proceed
from here?

[1]: http://github.com/ap/parse--mime

--
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle Pagaltzis // <http://plasmasturm.org/>

Steve Vinoski

unread,
Dec 21, 2008, 11:15:08 PM12/21/08
to paga...@gmx.de, mimepa...@googlegroups.com
Hi Aristotle, good ol' Perl -- there's always more than one way to do
it. ;-) Comments below.

On Sun, Dec 21, 2008 at 10:39 PM, Aristotle Pagaltzis <paga...@gmx.de> wrote:
> Argh, I just spent last night writing a port.

Hmm, I guess this mimeparse-dev team is too efficient at taking in ports! :-)

> I took some time
> to refactor a bit to make it easier to read, though. Both
> `fitness_and_quality_parsed` and `best_match` are overly dense,
> IMO. I factored out bits of the `best_match` type match
> conditional in and rewrote both loops in a slightly more
> straight-line fashion.
>
> Note that the way Stanis wrote `parse_media_range` will throw "is
> not numeric" warnings when the `q` parameter happens to contain
> garbage rather than a numeric value; I took care to avoid that.

I agree, that should be handled.

> The original code was missing a function `parse_media_range_list`
> which admittedly is just one line, but was copypasted into both
> `quality` and `best_match` and left the user having to that by
> themselves, rendering `fitness_and_quality_parsed` a lot less
> useful.
>
> I interpreted the interfaces to functions that take a pre-parsed
> list of media ranges differently as well, having them take a flat
> list rather than a list of arrays. It's no harder to consume but
> easier to produce.
>
> In terms of docs, I took care to use POD links for crossrefs
> between functions. I also shunted the docs to a single block at
> the bottom of the code because I find the interleaved style very
> distracting to read, regardless of whether I am looking at the
> docs or the code. Of course there is debate about which style is
> preferrable…

My preferences here match yours. In fact when I first looked at the
current version I was confused; it looked like it was all docs and no
code!

> As well, I factored the copypasting out of the tests, rewriting
> them in a table-driven fashion.

Did you happen to put the test data in language-neutral external text
files? That was something Joe mentioned to me at one point, as he
wanted to prevent duplication throughout the various language
versions, but neither of us ever got to making that change. If you did
this, let me know so I can switch Erlang over to use the files. (I'm
also willing to switch the other languages over too.)

> I also split them out into separate `.t` files, as is the custom
> in the Perl world. In fact I went a bit further and cast the
> whole thing into CPAN module structure. That does mean it's no
> longer a single file, though. Also in deference to CPAN customs,
> I used the name Parse::MIME rather than MIMEParse.

I also wondered about how this relates to CPAN modules.

> I've put the code up for grabs on github [1]. How do we proceed
> from here?

Well, I think you've raised some important points, so hopefully you
and Stanis can work together and reach consensus on the Perl version.
My own preference would be to add you as a contributor to allow you
and Stanis to do that, but I always defer to Joe when it comes to
adding new developers.

--steve

Joe Gregorio

unread,
Dec 21, 2008, 11:24:11 PM12/21/08
to mimepa...@googlegroups.com, paga...@gmx.de
Aristotle, let me know a gmail account I can use to add you
as a developer to the project.

Thanks,
-joe

>
> --steve

Chris Anderson

unread,
Dec 21, 2008, 11:25:10 PM12/21/08
to mimepa...@googlegroups.com
On Sun, Dec 21, 2008 at 8:15 PM, Steve Vinoski <vin...@gmail.com> wrote:
>
> Did you happen to put the test data in language-neutral external text
> files? That was something Joe mentioned to me at one point, as he
> wanted to prevent duplication throughout the various language
> versions, but neither of us ever got to making that change. If you did
> this, let me know so I can switch Erlang over to use the files. (I'm
> also willing to switch the other languages over too.)
>

If I may suggest JSON as a test data format? It's pretty
straightforward and all the languages have a facility for consuming
it.

Converting the JavaScript tests so they just operate over a list of
static data should be easy. Maybe I'll get inspired if the
snow-weekend we're having in Portland keeps up it's promise to become
a snow week.

I really like the order of implementation the tests encourage, even if
it's a bit funny to have so many public methods when most people just
want best_match. Anyway, one concern we'll need to address is how to
specify (in the test fixtures) which functions the [expected_result,
function_args] tuples are addressed as. It won't be that hard but it
does lean toward me changing the JavaScript library to use Python's
method-name spellings.

Having declarative tests is good in that it lets us standardize the
requirements, hopefully we can keep the barrier to entry low. Seeing
more implementations will always be nice.

--
Chris Anderson
http://jchris.mfdz.com

Aristotle Pagaltzis

unread,
Dec 21, 2008, 11:31:43 PM12/21/08
to mimepa...@googlegroups.com
* Joe Gregorio <j...@bitworking.org> [2008-12-22 05:25]:

> Aristotle, let me know a gmail account I can use to add you as
> a developer to the project.

Do I need a GMail account or will any Google account do? My
Google account would be the same as this mail: paga...@gmx.de

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

Joe Gregorio

unread,
Dec 21, 2008, 11:35:53 PM12/21/08
to mimepa...@googlegroups.com
On Sun, Dec 21, 2008 at 11:31 PM, Aristotle Pagaltzis <paga...@gmx.de> wrote:
>
> * Joe Gregorio <j...@bitworking.org> [2008-12-22 05:25]:
>> Aristotle, let me know a gmail account I can use to add you as
>> a developer to the project.
>
> Do I need a GMail account or will any Google account do? My
> Google account would be the same as this mail: paga...@gmx.de

OK, you should be all set, added as project admin.

Thanks,
-joe

>
> Regards,
> --
> Aristotle Pagaltzis // <http://plasmasturm.org/>
>
> >
>



Aristotle Pagaltzis

unread,
Dec 21, 2008, 11:37:57 PM12/21/08
to mimepa...@googlegroups.com
Hi Chris,

* Chris Anderson <jch...@apache.org> [2008-12-22 05:30]:


> If I may suggest JSON as a test data format? It's pretty
> straightforward and all the languages have a facility for
> consuming it.

that sounds reasonable, although I think that none of the other
languages include JSON libraries in their core distributions, so
it would require relying on external dependencies. (Actually this
is true even of Javascript if you don’t like the `eval` approach
of parsing JSON.) Are we OK with that?

(I don’t have a strong opinion either way, just pointing out the
consequence.)

> Anyway, one concern we'll need to address is how to specify (in
> the test fixtures) which functions the [expected_result,
> function_args] tuples are addressed as.

Take a look at what I did:
http://github.com/ap/parse--mime/tree/master/t

Those are just a few search&replaces away from JSON, btw.

Steve Vinoski

unread,
Dec 21, 2008, 11:45:06 PM12/21/08
to mimepa...@googlegroups.com
On Sun, Dec 21, 2008 at 11:37 PM, Aristotle Pagaltzis <paga...@gmx.de> wrote:
>
> Hi Chris,
>
> * Chris Anderson <jch...@apache.org> [2008-12-22 05:30]:
>> If I may suggest JSON as a test data format? It's pretty
>> straightforward and all the languages have a facility for
>> consuming it.
>
> that sounds reasonable, although I think that none of the other
> languages include JSON libraries in their core distributions, so
> it would require relying on external dependencies. (Actually this
> is true even of Javascript if you don't like the `eval` approach
> of parsing JSON.) Are we OK with that?

I also had been planning to suggest JSON for this, due to its ubiquity
and simplicity. I think it's OK to have extra JSON dependencies for
testing purposes.

--steve

Steve Vinoski

unread,
Dec 22, 2008, 12:02:44 AM12/22/08
to mimepa...@googlegroups.com
On Sun, Dec 21, 2008 at 11:25 PM, Chris Anderson <jch...@apache.org> wrote:
>
> If I may suggest JSON as a test data format? It's pretty
> straightforward and all the languages have a facility for consuming
> it.

As mentioned in my other email, I agree completely.

> I really like the order of implementation the tests encourage, even if
> it's a bit funny to have so many public methods when most people just
> want best_match. Anyway, one concern we'll need to address is how to
> specify (in the test fixtures) which functions the [expected_result,
> function_args] tuples are addressed as. It won't be that hard but it
> does lean toward me changing the JavaScript library to use Python's
> method-name spellings.

I'd prefer that you keep the JavaScript function names in the style
you already have because that's what JavaScript programmers expect.
That was one of the first things I checked in your code, in fact, and
was glad to see you had used mixedCase for the function names. You
could either use a static table to convert from python style, given
there's just a few of them, or just

targetFunctionName.replace(/_([a-z])/g, function(m, g1) { return
g1.toUpperCase(); });

to convert the function names.

> Having declarative tests is good in that it lets us standardize the
> requirements, hopefully we can keep the barrier to entry low. Seeing
> more implementations will always be nice.

Agreed.

--steve

Aristotle Pagaltzis

unread,
Dec 22, 2008, 12:26:56 AM12/22/08
to mimepa...@googlegroups.com

OK, I’ve now converted the testcases to JSON.

There are no function names in there, so that won’t be a concern.

Stanis

unread,
Dec 22, 2008, 9:03:33 AM12/22/08
to Mimeparse-dev
Hi Aristotle,
I wondered why noone had ported this to Perl yet, and always assumed
that there
was a module on CPAN that offered this functionality, hidden among all
the
MIME::* modules that deal with MIME Email, and I just couldn't find
it.
Seems I was not the only one. :)

On Dec 22, 4:39 am, Aristotle Pagaltzis <pagalt...@gmx.de> wrote:

> Note that the way Stanis wrote `parse_media_range` will throw “is
> not numeric” warnings when the `q` parameter happens to contain
> garbage rather than a numeric value; I took care to avoid that.

Hm, I can't reproduce this behaviour. What input generates this
warning? I
thought Scalar::Util::looks_like_number() takes care of that.

Your points are all valid, though. In my defense, I wanted the
functionality
quickly, and after a CPAN search did not turn up anything, I decided
to do a line-by-line port of the original python version, including
the inline
documentation and tests, to make backporting changes as easy as
possible.
Also, in the spirit of the other language versions released so far, I
tried to
not change the API and put it all into a single file with no
dependencies that
works out-of-the-box.

I eventually planned to do a CPAN release of my code, but I'm not very
familiar
with the CPAN release process, and your code seems to have all the
necessary
bits and pieces already in place.
So please, go ahead and upload your code, and also please release it
on CPAN.

regards,
Stanis
Reply all
Reply to author
Forward
0 new messages