.extract() handling characters with accent e.g. â & ó?


MPgoog

Apr 13, 2011, 8:08:19 PM
to scrapy-users
I scrape the word Teresópolis and it reads "Teres\u00f3polis." How can
I avoid that?

Pablo Hoffman

Apr 13, 2011, 11:14:14 PM
to scrapy...@googlegroups.com
Hi MPgoog,

The scraped data is fine; that's just how Python represents Unicode strings.

You can view the readable representation by printing that unicode string in a
Python shell:

>>> print u"Teres\u00f3polis."
Teresópolis.
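
To make the point above concrete, here is a minimal sketch in Python 3 syntax (the session above is Python 2). It assumes the escaped text was seen in ASCII-escaped output such as JSON; the thread does not say where it appeared, so treat that part as a guess. The names word, escaped and readable are made up for illustration.

```python
import json

# The value as scraped. The \u00f3 escape and a literal "ó" are the same character.
word = "Teres\u00f3polis."
assert word == "Teresópolis."

# By default json.dumps escapes non-ASCII characters, which produces
# the quoted, escaped text from the original question.
escaped = json.dumps(word)
print(escaped)   # "Teres\u00f3polis."

# With ensure_ascii=False the character is written out as-is.
readable = json.dumps(word, ensure_ascii=False)
print(readable)  # "Teresópolis."

# Either form decodes back to the same string, so no data is lost.
assert json.loads(escaped) == json.loads(readable) == word
```

Both outputs carry the same data; only the textual representation differs.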

MPgoog

Apr 14, 2011, 10:14:14 PM
to scrapy-users
Pablo,

Thank you very much. You're right, when printing the value, it is
displaying correctly. Now I will just have to figure out how to get
PHP to echo it correctly :P Or maybe it's time to test out Django.

Thanks again,
MP

Pablo Hoffman

Apr 15, 2011, 12:27:18 AM
to scrapy...@googlegroups.com
You can encode it to a byte string, using the same encoding as your PHP
pages. For example, if you use UTF-8:

byte_string = u"Teres\u00f3polis.".encode('utf-8')
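
A rough sketch of why the encoding has to match on both sides, again in Python 3 syntax. The variable names are illustrative, and Latin-1 is only an assumed example of a mismatched encoding, not something stated about the PHP setup in this thread.

```python
word = "Teres\u00f3polis."

# Encode the text to bytes in two different encodings.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")

print(utf8_bytes)    # b'Teres\xc3\xb3polis.'  (ó takes two bytes)
print(latin1_bytes)  # b'Teres\xf3polis.'      (ó takes one byte)

# Decoding with the encoding used to encode restores the original text.
assert utf8_bytes.decode("utf-8") == word

# Decoding UTF-8 bytes as Latin-1 does not fail, but yields garbled text.
garbled = utf8_bytes.decode("latin-1")
print(garbled)       # TeresÃ³polis.
```

If the page that echoes the value declares a different charset than the one used to encode, the garbled form in the last line is the kind of output to expect.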
