> Why does this error occur when writing the prettified html file with
> beautifulsoup4 and how can it be corrected?
If you examine the string before writing it to a file, you should see
that it's a bytestring under Beautiful Soup 3, and a Unicode string
under Beautiful Soup 4. BS3's handling of Unicode was inconsistent.
BS4 won't convert Unicode strings to bytestrings unless you explicitly
tell it to.
Your error happens when Python tries to encode a Unicode character
(EM DASH, in this case) using your system's default encoding, which
can't represent that character. This class of error is discussed
here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#miscellaneous
I've updated the documentation to cover the case where the error
happens while writing to a file.
You can get the BS3 behavior by calling prettify(encoding="utf8"), or
you can encode the Unicode string to a UTF-8 bytestring before writing
it to a file.
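Here is a rough sketch of the second option using only the standard
library. The string "pretty" is a made-up stand-in for what prettify()
returns under BS4, and the file name is arbitrary; neither comes from
your script.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the Unicode string soup.prettify() returns in BS4.
pretty = u"<p>\n Before \u2014 after\n</p>"

# Encoding into a charset that lacks EM DASH reproduces the class of error.
try:
    pretty.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

# Illustrative output path in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pretty.html")

# Option A: encode to a UTF-8 bytestring yourself, then write in binary mode.
with open(path, "wb") as f:
    f.write(pretty.encode("utf-8"))

# Option B: open the file with an explicit encoding and write the Unicode string.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(pretty)

# Read the file back as UTF-8 to confirm the EM DASH survived.
with io.open(path, encoding="utf-8") as f:
    round_trip = f.read()
```

Either option avoids relying on the system encoding, which is what
triggers the error.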
Leonard