Unicode only?


berser...@gmail.com

Jan 29, 2009, 12:18:08 PM
to Quod Libet Development
Hi! I use mutagen; it is a good library for Python. But mutagen seems
to work only with Unicode ID3 tags:

code:
import mutagen.id3
m = mutagen.id3.Open('x.mp3')
print m

output:
{u'WXXX:': WXXX(encoding=0, desc=u'', url=u'russianmp3.qps.ru'),
 'TPE1': TPE1(encoding=0, text=[u' \xc5\xe1\xe0\xed\xfc\xea\xee']),
 'TDRC': TDRC(encoding=0, text=[u'2003']),
 'TALB': TALB(encoding=0, text=[u'\xd2\xe0\xed\xf6\xfb \xf1 \xec\xe0\xf2\xee\xec']),
 'TRCK': TRCK(encoding=0, text=[u'6']),
 'TPUB': TPUB(encoding=0, text=[u'by Romikay']),
 u"COMM::'\\x00\\x00\\x00'": COMM(encoding=1, lang='\x00\x00\x00', desc=u'', text=[u'www.rus.6x.to BEST RUSSIAN MUSIC']),
 'TIT2': TIT2(encoding=3, text=[u'An example']),
 'TCON': TCON(encoding=0, text=[u'Other']),
 'TPE2': TPE2(encoding=1, text=[u'\xc5\xe1\xe0\xed\xfc\xea\xee'])}

These tags are not Unicode, but mutagen always adds the 'u' prefix.

Michael Urman

Jan 29, 2009, 1:10:57 PM
to quod-libet-...@googlegroups.com

All text tags are Unicode. Some text may be a convenient subset of
Unicode which has a smaller name. If you know a priori that your text
will use it, and are trying to process text as bytestrings, that's
what .encode() is for. Is there a different problem here?


--
Michael Urman
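
A minimal sketch of the .encode() point above, written for Python 3 (the thread itself uses Python 2, where tag text is a unicode object). The title comes from the tag dump; the Cyrillic sample word and the choice of UTF-8 and cp1251 as target encodings are assumptions for illustration only.

```python
# Tag text is Unicode; encode it only when a byte string is actually needed.
title = 'An example'  # ASCII-only text, a subset of Unicode

as_utf8 = title.encode('utf-8')
as_cp1251 = title.encode('cp1251')

# For ASCII-only text both encodings produce identical bytes.
print(as_utf8 == as_cp1251)

# Non-ASCII text yields different bytes per encoding, so the target
# encoding has to be chosen explicitly. Sample word is an assumption.
word = '\u0442\u0435\u0441\u0442'
print(word.encode('utf-8'))
print(word.encode('cp1251'))
```

The byte strings differ for the Cyrillic word because UTF-8 uses two bytes per Cyrillic letter while cp1251 uses one.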

berser...@gmail.com

Jan 29, 2009, 1:35:59 PM
to Quod Libet Development
The problem is here:
m = mutagen.easyid3.Open('x.mp3')
z = m.get('artist')
[u' \xc5\xe1\xe0\xed\xfc\xea\xee']


I do not know how to correctly get a string from the list
[u' \xc5\xe1\xe0\xed\xfc\xea\xee'].
With [' \xc5\xe1\xe0\xed\xfc\xea\xee'] there is no problem:

z = ['\xe1\xe0\xed\xfc\xea\xee']
print z[0].decode('cp1251')



Michael Urman

Jan 29, 2009, 2:26:03 PM
to quod-libet-...@googlegroups.com
2009/1/29 berser...@gmail.com <berser...@gmail.com>:

>
> Problem is here:
> m = mutagen.easyid3.Open('x.mp3')
> z = m.get('artist')
> [u' \xc5\xe1\xe0\xed\xfc\xea\xee']
>
>
> I do not know how correctly get string from the list [u'
> \xc5\xe1\xe0\xed\xfc\xea\xee']
> With [' \xc5\xe1\xe0\xed\xfc\xea\xee'] no problem:
>
> z = ['\xe1\xe0\xed\xfc\xea\xee']
> print z[0].decode('cp1251')

Ah. Chances are your file's tag is (technically) corrupt, as there is
no indicator that allows an ID3 tag to use cp1251 encoding. Many
Windows-based taggers would stuff the active codepage encoding in what
is really latin1. To recover your text, you could try
artist = badly_encoded_artist.encode('latin1').decode('cp1251')
--
Michael Urman
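
The recovery step suggested above can be sketched as follows in Python 3 (the thread uses Python 2, where the same calls apply to a unicode object). The helper name fix_cp1251_mojibake is hypothetical; the album value is copied from the tag dump earlier in the thread.

```python
def fix_cp1251_mojibake(text):
    """Undo cp1251 bytes that were read as latin1.

    Hypothetical helper: latin1 maps code points 0-255 one-to-one onto
    bytes, so encoding recovers the original bytes, and decoding those
    bytes as cp1251 yields the intended Cyrillic text.
    """
    return text.encode('latin1').decode('cp1251')


# Album value as it appears in the tag dump.
album = '\xd2\xe0\xed\xf6\xfb \xf1 \xec\xe0\xf2\xee\xec'
print(fix_cp1251_mojibake(album))
```

This only makes sense for text known to be cp1251 in disguise; applied to genuine latin1 text it would produce wrong characters.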

berser...@gmail.com

Jan 29, 2009, 2:37:17 PM
to Quod Libet Development
Thanks,

but it returns:
>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)







Steven Robertson

Jan 30, 2009, 1:25:06 AM
to quod-libet-...@googlegroups.com
2009/1/29 berser...@gmail.com <berser...@gmail.com>:

>
> Thanks,
>
> but, return:
>>>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 0: ordinal not in range(128)

Please post the code you used that triggered this error, as well as
the tags in the file (using

>>> from mutagen.easyid3 import EasyID3
>>> x = EasyID3('filename')
>>> for (k, v) in x.items(): print(k, v)

or similar).

Steven
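
One plausible cause of the error quoted above, offered as an assumption since the failing code was not posted: in Python 2, calling .encode() on a byte string first decodes it implicitly as ASCII, which fails on byte 0xe1. Python 3 has no such implicit step, so this sketch performs the ASCII decode explicitly to reproduce the same message. The bytes are the ones from the earlier cp1251 example in the thread.

```python
# Byte string from the earlier cp1251 example.
raw = b'\xe1\xe0\xed\xfc\xea\xee'

message = None
try:
    # Python 2's str.encode('latin1') would run this ASCII decode implicitly.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)

# Decoding the raw bytes directly as cp1251 succeeds.
recovered = raw.decode('cp1251')
```

If this is what happened, the encode('latin1') step belongs on the unicode value returned by mutagen, not on a hand-typed byte string.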

berser...@gmail.com

Jan 30, 2009, 5:21:59 AM
to Quod Libet Development
All tags:

('album', [u'\xd2\xe0\xed\xf6\xfb \xf1 \xec\xe0\xf2\xee\xec'])
('date', [u'2003'])
('title', [u'An example'])
('genre', [u'Other'])
('tracknumber', [u'6'])
('artist', [u' \xc5\xe1\xe0\xed\xfc\xea\xee'])

My code:
m = mutagen.easyid3.Open('x.mp3')
art = m.get('artist')[0]
print art
>>> Åáàíüêî

m = mutagen.easyid3.Open('x.mp3')
art = m.get('artist')[0]
artist = art.encode('latin1').decode('cp1251')
print artist
>>> Ебанько

Oh, sorry! I was wrong yesterday.
Thanks!!!
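
To apply the same fix to every tag in a listing like the one above, something along these lines might work. It is a sketch in Python 3 operating on a plain dict of lists; repair_tags and its parameter names are hypothetical, and it does not read or write any file.

```python
def repair_tags(tags, wrong='latin1', right='cp1251'):
    """Return a copy of a {key: [values]} mapping with mojibake undone.

    Hypothetical helper. Values that cannot be round-tripped (for
    example text that is already real Cyrillic and so has no latin1
    encoding) are left unchanged.
    """
    repaired = {}
    for key, values in tags.items():
        fixed = []
        for value in values:
            try:
                fixed.append(value.encode(wrong).decode(right))
            except (UnicodeEncodeError, UnicodeDecodeError):
                fixed.append(value)
        repaired[key] = fixed
    return repaired


# Values copied from the tag listing above.
tags = {
    'album': ['\xd2\xe0\xed\xf6\xfb \xf1 \xec\xe0\xf2\xee\xec'],
    'date': ['2003'],
    'title': ['An example'],
}
for key, values in repair_tags(tags).items():
    print(key, values)
```

ASCII-only values such as the date and title pass through unchanged, because latin1 and cp1251 agree on that range. The guard cannot tell genuine latin1 text from disguised cp1251, so this should only be run on files known to have the problem.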

