Handle character entities in email (e.g. for WordPress)

scarpent

Nov 22, 2006, 9:05:03 PM
to Upgrades, Wishlists, Enhancements
WordPress encodes characters in posts, for example:

' = &#8216;
" = &#8221;
-- = &#8211; (en dash, e.g. a--b)
-- = &#8212; (em dash, e.g. a -- b)

(Unicode encoding)
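
For reference, here is a minimal Python sketch of how encoded characters of this kind decode to typographic characters. It assumes the encoded forms are numeric character references such as &#8216; and &#8212;; the list of entities below is illustrative, not an exhaustive account of what WordPress emits.

```python
import html

# Assumed examples of numeric character references found in a WordPress post.
entities = ["&#8216;", "&#8217;", "&#8220;", "&#8221;", "&#8211;", "&#8212;"]

# html.unescape turns each reference into the Unicode character it names.
decoded = {entity: html.unescape(entity) for entity in entities}

for entity, char in decoded.items():
    print(f"{entity} -> {char!r} (U+{ord(char):04X})")
```

Each reference is just the decimal code point of the character, so 8216 corresponds to U+2018 (left single quotation mark) and 8212 to U+2014 (em dash).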

But when FeedBurner sends out an email for a post, it strips out all these encoded characters, making me look semi-illiterate when something like:

Jack said, "You shouldn't use sparrows--especially European ones--to carry your coconuts."

Becomes:

Jack said, You shouldnt use sparrows especially European ones to carry your coconuts.

My literacy is questionable enough without that kind of help! :-)

Someone from FB acknowledged the issue a while back and mentioned they'd put it on the list. Is there any chance it will bubble up to the top anytime in the near future?

Thanks!

Scott

joek

Nov 23, 2006, 9:30:35 AM
to Upgrades, Wishlists, Enhancements
I'll see if I can get someone smarter than me to get in here and answer this question...

Eric

Nov 28, 2006, 10:18:27 AM
to Upgrades, Wishlists, Enhancements
Hi there Scott ... I'm not any smarter than Joe, but I'd like to not see this issue fall through the cracks. Would you mind giving me two pieces of information: 1) your feed URL, and 2) which fb person you communicated with in the past? We'll see if we can figure this out.

scarpent

Nov 28, 2006, 4:35:21 PM
to Upgrades, Wishlists, Enhancements
Hi, Eric. Thanks for your reply (and to you also, Joe!).

My feed is:

http://feeds.feedburner.com/MovingToFreedom

Arun Kannan from FB wrote me this on Sept. 24:

"Yes as you have observed we do strip out those character codes. But, your suggestion is valid and we will put it on our TODO list. Thanks for your suggestion."

So I didn't take that as a promise for anything to happen soon. It wasn't until I finally got an email subscriber recently that I started thinking about it again.

Another thing that I had mentioned in the email but forgot to say above is that this only happens for plain text emails. In HTML (e.g. Google Mail) the character codes are there and display fine.

(Thanks again -- you guys do great work.)

Scott

Eric

Nov 29, 2006, 3:50:18 PM
to Upgrades, Wishlists, Enhancements
Thanks Scott ... so I think the main issue is that we have to convert HTML entities into reasonable plain text representations. Gotcha. Arun does have this on a todo list, so we won't lose track of it.
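
A minimal sketch of what such a conversion could look like, assuming Python and its standard `html` module. This is not FeedBurner's implementation; the function name `entities_to_plain_text` and the `ASCII_FALLBACKS` mapping are hypothetical and chosen only for illustration. The idea is to decode the entities first, then replace typographic characters with ASCII stand-ins instead of dropping them.

```python
import html

# Hypothetical mapping from typographic characters to ASCII stand-ins.
ASCII_FALLBACKS = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def entities_to_plain_text(text: str) -> str:
    """Decode HTML character references, then swap typographic characters for ASCII."""
    return html.unescape(text).translate(ASCII_FALLBACKS)


sample = (
    "Jack said, &#8220;You shouldn&#8217;t use sparrows&#8212;especially "
    "European ones&#8212;to carry your coconuts.&#8221;"
)
print(entities_to_plain_text(sample))
```

With this approach the sample sentence keeps its quotes, apostrophe, and dashes in the plain text output rather than losing them. Characters not in the mapping pass through unchanged, so a mail sent as UTF-8 could also skip the mapping step entirely and keep the decoded characters.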

Thanks for using FeedBurner!

ClearDebt

Aug 12, 2007, 11:12:49 AM
to Upgrades, Wishlists, Enhancements
I am also having problems with FeedBurner stripping encoded characters from my feed.

My feed at http://www.cleardebt.co.uk/news/feed uses WordPress, which after some tweaking encodes characters in both the item description and title, e.g. ‘ becomes &#8216;

FeedBurner seems to strip these encoded tags in an item title but not in an item description meaning:
Credit cards ‘hugely profitable’ for lenders
becomes
Credit cards ‘hugely profitable’ for lenders

This is giving me a huge headache in both displaying and parsing my feed! At one point the ‘ symbol was coming through as ? or as crazy symbols like Ä€~

The feed itself is correct, as the validation tool http://feedvalidator.org/check.cgi?url=http%3a%2f%2ffeeds.feedburner.com%2fcleardebt shows the encoded characters.

As a workaround I am currently pointing my RSS parser at http://www.cleardebt.co.uk/news/wp-feed.php, avoiding FeedBurner on /feed (which goes to the same location), and it is working perfectly. This is not ideal, though, as I can't track it and I'm worried other users wanting to use the ClearDebt feed will have the same problem!

Regards,
Peter

scarpent

Aug 12, 2007, 11:54:22 AM
to Upgrades, Wishlists, Enhancements
Sounds like a different issue -- it might be a good idea to start a new thread for it.

For the email handling, all of the HTML character entities are still being stripped out. Is this still on a todo list out there? Any estimated dates for a fix/enhancement?

Thanks!

arunk

Aug 15, 2007, 10:48:04 AM
to Upgrades, Wishlists, Enhancements
Scarpent,
You had mentioned that HTML character entities are being stripped in the text version of the email. Can you let me know which email client you used to view the text version? Can you also point us to a particular post that shows this behavior so that we can investigate it further?

Thanks,

scarpent

Aug 15, 2007, 6:23:24 PM
to Upgrades, Wishlists, Enhancements
Hi, Arun.

I use Thunderbird for email. All posts have the HTML character entities stripped out in the plain text email.

Thanks,

Scott