
Re: Programmatically discovering encoding types supported by codecs module

Benjamin Kaplan

Mar 24, 2010, 1:18:32 PM
to pytho...@python.org
On Wed, Mar 24, 2010 at 12:17 PM, <pyt...@bdurham.com> wrote:
> Is there a way to programmatically discover the encoding types supported by
> the codecs module?
>
> For example, the following link shows a table with Codec, Aliases, and
> Language columns.
> http://docs.python.org/library/codecs.html#standard-encodings
>
> I'm looking for a way to programmatically generate this table through some
> form of module introspection.
>
> Ideas?
>
> Malcolm
> --

According to my brief messing around with the REPL,
encodings.aliases.aliases is a good place to start. I don't know of
any way to get the Language column, but at the very least that will
give you most of the supported encodings and any aliases they have.
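
As a rough illustration of that starting point: encodings.aliases.aliases is a plain dict mapping each alias to a codec module name. The sketch below checks which of those target names codecs.lookup() actually accepts on the running interpreter. The helper name resolvable_codecs is made up for this example, and the counts it prints will differ between Python versions and platforms.

```python
import codecs
import encodings.aliases

# alias -> codec module name, for example 'latin1' -> 'latin_1'
alias_map = encodings.aliases.aliases


def resolvable_codecs(alias_map):
    """Split alias targets into names codecs.lookup() accepts and names it rejects.

    Hypothetical helper, written only for this illustration.
    """
    found, missing = set(), set()
    for target in set(alias_map.values()):
        try:
            codecs.lookup(target)
        except LookupError:
            # The alias table names a codec this interpreter cannot load.
            missing.add(target)
        else:
            found.add(target)
    return found, missing


found, missing = resolvable_codecs(alias_map)
print(len(found), "alias targets resolve;", len(missing), "do not:", sorted(missing))
```

Any names reported as unresolvable are worth dropping before presenting the table to users.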

Gabriel Genellina

Mar 24, 2010, 1:39:20 PM
to pytho...@python.org
On Wed, 24 Mar 2010 13:17:16 -0300, <pyt...@bdurham.com> wrote:

> Is there a way to programmatically discover the encoding types
> supported by the codecs module?
>
> For example, the following link shows a table with Codec,
> Aliases, and Language columns.
> http://docs.python.org/library/codecs.html#standard-encodings
>
> I'm looking for a way to programmatically generate this table
> through some form of module introspection.

After looking at how things are done in codecs.c and encodings/__init__.py
I think you should enumerate all modules in the encodings package that
define a getregentry function.
Aliases come from encodings.aliases.aliases.
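
To make the getregentry criterion concrete, here is a small sketch (not taken from the original message) using encodings.utf_8 as the example codec module. A codec module's getregentry() returns a codecs.CodecInfo, while a support module such as encodings.aliases does not define the function at all.

```python
import codecs
import encodings.aliases
import encodings.utf_8

# A codec module exposes getregentry(), which returns a codecs.CodecInfo
# describing the codec (its name plus encoder and decoder callables).
info = encodings.utf_8.getregentry()
print(type(info).__name__, info.name)

# encodings.aliases only holds the alias dict; it is not a codec module,
# so the getregentry test filters it out.
print(hasattr(encodings.aliases, "getregentry"))
```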

--
Gabriel Genellina

pyt...@bdurham.com

Mar 24, 2010, 1:55:32 PM
to Benjamin Kaplan, pytho...@python.org
Benjamin,

> According to my brief messing around with the REPL, encodings.aliases.aliases is a good place to start. I don't know of any way to get the Language column, but at the very least that will give you most of the supported encodings and any aliases they have.

Thank you - that's exactly the type of information I was looking for.

I'm including the following for anyone browsing the mailing list
archives in the future.

Here's the snippet we're using to dynamically generate a codec table
like the one posted on the docs.python.org website.

import encodings.aliases

# Map each codec name to the list of aliases that point to it.
encodingDict = encodings.aliases.aliases
encodingType = dict()
for key, value in encodingDict.items():
    if value not in encodingType:
        encodingType[value] = list()
    encodingType[value].append(key)

# Print one row per codec: name, then its sorted aliases.
for key in sorted(encodingType.keys()):
    aliases = ', '.join(sorted(encodingType[key]))
    print('%-20s%s' % (key, aliases))

Regards,
Malcolm

pyt...@bdurham.com

Mar 24, 2010, 1:58:47 PM
to Gabriel Genellina, pytho...@python.org
Gabriel,

> After looking at how things are done in codecs.c and encodings/__init__.py I think you should enumerate all modules in the encodings package that define a getregentry function. Aliases come from encodings.aliases.aliases.

Thanks for looking into this for me. Benjamin Kaplan made a similar
observation. My reply to him included the snippet of code we're using to
generate the actual list of encodings that our software will support
(thanks to Python's codecs and encodings modules).

Your help is always appreciated :)

Regards,
Malcolm



Gabriel Genellina

Mar 24, 2010, 6:50:11 PM
to pytho...@python.org
On Wed, 24 Mar 2010 14:58:47 -0300, <pyt...@bdurham.com> wrote:

>> After looking at how things are done in codecs.c and
>> encodings/__init__.py I think you should enumerate all modules in the
>> encodings package that define a getregentry function. Aliases come from
>> encodings.aliases.aliases.
>
> Thanks for looking into this for me. Benjamin Kaplan made a similar
> observation. My reply to him included the snippet of code we're using to
> generate the actual list of encodings that our software will support
> (thanks to Python's codecs and encodings modules).

I was curious as to whether both methods would give the same results:

py> import glob, os, encodings.aliases
py> modules=set()
py> for name in glob.glob(os.path.join(encodings.__path__[0], "*.py")):
...     name = os.path.basename(name)[:-3]
...     try: mod = __import__("encodings."+name,
...                           fromlist=['ilovepythonbutsometimesihateit'])
...     except ImportError: continue
...     if hasattr(mod, 'getregentry'):
...         modules.add(name)
...
py> fromalias = set(encodings.aliases.aliases.values())
py> fromalias - modules
set(['tactis'])
py> modules - fromalias
set(['charmap',
'cp1006',
'cp737',
'cp856',
'cp874',
'cp875',
'idna',
'iso8859_1',
'koi8_u',
'mac_arabic',
'mac_centeuro',
'mac_croatian',
'mac_farsi',
'mac_romanian',
'palmos',
'punycode',
'raw_unicode_escape',
'string_escape',
'undefined',
'unicode_escape',
'unicode_internal',
'utf_8_sig'])

The alias table points to a 'tactis' encoding that has no module (?), and
about twenty modules have no alias.
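
For readers on Python 3, a rough adaptation of the comparison above might look like the following. It swaps the glob-based scan for pkgutil.iter_modules and importlib.import_module, which is a substitution made for this sketch and not what the session above used. The function name codec_modules is hypothetical, and the two printed sets will not match the Python 2 output shown above.

```python
import importlib
import pkgutil

import encodings
import encodings.aliases


def codec_modules():
    """Return names of modules in the encodings package that define getregentry()."""
    names = set()
    for mod_info in pkgutil.iter_modules(encodings.__path__):
        try:
            mod = importlib.import_module("encodings." + mod_info.name)
        except ImportError:
            # Platform-specific codecs (for example Windows-only ones)
            # may fail to import elsewhere; skip them.
            continue
        if hasattr(mod, "getregentry"):
            names.add(mod_info.name)
    return names


modules = codec_modules()
from_alias = set(encodings.aliases.aliases.values())

print("aliased but no importable module:", sorted(from_alias - modules))
print("modules without any alias:", sorted(modules - from_alias))
```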

--
Gabriel Genellina

pyt...@bdurham.com

Mar 28, 2010, 6:48:52 AM
to Gabriel Genellina, pytho...@python.org
Gabriel,

Thank you for your analysis - very interesting. Enjoyed your fromlist
choice of names. I'm still in my honeymoon phase with Python so I only
know the first part :)

Regards,
Malcolm


