� is called REPLACEMENT CHARACTER (U+FFFD). Beautiful Soup may use REPLACEMENT
CHARACTER to replace characters that can't be converted to Unicode, as
described here:
http://www.crummy.com/software/BeautifulSoup/bs4/doc/#encodings
You can do search-and-replace with code like this:
>>> # list() snapshots the strings first, because replace_with() changes the tree
>>> for s in list(soup.strings):
...     s.replace_with(s.replace(u"\N{REPLACEMENT CHARACTER}", ""))
However, it's possible that your terminal is printing � to represent
*other* characters that it can't display. In that case, your document
does not actually contain REPLACEMENT CHARACTER. You'll have to
identify what those characters actually are, and replace them instead.
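One way to find out what those characters are is to list every non-ASCII character in the text along with its code point and Unicode name. This is only a rough sketch using the standard library: describe_non_ascii and the sample string are made up for illustration, and in practice you would pass in something like soup.get_text().

```python
import unicodedata
from collections import Counter


def describe_non_ascii(text):
    """Count each non-ASCII character in text and report its Unicode name."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    return [
        ("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]


# Hypothetical sample text; substitute the text of your own document.
sample = u"caf\u00e9 \u201cquoted\u201d \ufffd"
for code_point, name, count in describe_non_ascii(sample):
    print(code_point, name, count)
```

The output is plain ASCII, so the terminal can display it even when it can't display the characters themselves. If REPLACEMENT CHARACTER doesn't show up in the list, the � you're seeing is coming from the terminal, and the names that do show up are the characters to replace.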
Leonard