>> print BS(u'\u201cnext generation\u201d').renderContents()
“next generation”
>> print unicode(BS(u'\u201cnext generation\u201d').renderContents())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position
0: ordinal not in range(128)
Am I missing something too?
C.
On Dec 4 2009, 7:27 pm, Aaron DeVore <aaron.dev...@gmail.com> wrote:
> renderContents converts to a str on Python 2.*. Use this instead:
>
> unicode(BS.BeautifulSoup(u'\u201cnext generation\u201d'))
>
> -Aaron DeVore
>
I don't have a good grasp of Unicode/encodings so I can't give an
authoritative word on this. However, I do know where the error is
coming from. Using unicode(soup.renderContents()) attempts to convert
a str to unicode using the ascii codec, which doesn't include \u201c.
Instead, you need to use the utf-8 encoding:
soup.renderContents().decode('utf-8')
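To see the same failure without BeautifulSoup, here is a rough sketch. It assumes renderContents() hands back a UTF-8 encoded byte string, which matches the 0xe2 byte in the traceback; the names text, rendered, ascii_ok and decoded are only for illustration.

```python
# Stand-in for the byte string renderContents() is assumed to return:
# the original text encoded as UTF-8.
text = u'\u201cnext generation\u201d'
rendered = text.encode('utf-8')

# Decoding with the ascii codec fails on the first byte (0xe2), which is
# what the implicit conversion in unicode(rendered) runs into on Python 2.
try:
    rendered.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the codec explicitly recovers the original text.
decoded = rendered.decode('utf-8')
```

Here ascii_ok ends up False and decoded compares equal to text, so the explicit .decode('utf-8') is what makes the round trip work.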
-Aaron DeVore