Folding not UTF-8 aware

Fjan

May 27, 2009, 4:49:36 AM
to RiCal, rick.d...@gmail.com
Hi Rick,

I wrote an RFC2445 implementation myself (for www.supersaas.com) which
is not nearly as nice as yours, so I'm looking to replace part of it
with your hard work.

One thing I noticed so far is that your "FoldingStream" class simply
inserts a new line after every 72 bytes to comply with the folding
requirement. The problem with that is that it has a large chance of
splitting a multi-byte UTF-8 character right in the middle for non-US
languages. I can tell you from experience that most iCal software will
then convert that into garbage characters.
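
To make the failure concrete, here is a minimal sketch of byte-count folding. The helper naive_fold is hypothetical and is not RiCal's actual FoldingStream code; it only assumes the behaviour described above (a cut after every 72 bytes) and a Ruby recent enough to have String#byteslice.

```ruby
# Hypothetical helper, not RiCal's FoldingStream: cut every `limit` bytes,
# ignoring character boundaries.
def naive_fold(line, limit = 72)
  (0...line.bytesize).step(limit).map { |i| line.byteslice(i, limit) }
end

line = "x" * 71 + "\u00E9"   # "é" is two bytes in UTF-8 (C3 A9)
chunks = naive_fold(line)

# Report the size of each chunk and whether it is still valid UTF-8.
chunks.each do |c|
  puts "#{c.bytesize} bytes, valid UTF-8: #{c.valid_encoding?}"
end
```

The cut lands between the two bytes of "é", so neither chunk is valid UTF-8 on its own, which is the kind of input that ends up displayed as garbage.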

There are two solutions:
1 - use something like the ActiveSupport
String::mb_chars.insert(50,"\r\n "). Note that you cannot simply insert
at character 78 since a character can take multiple bytes. Not very
elegant and very compute intensive to do it right.
2 - simply don't wrap at all. I have tested it with several iCal-aware
programs and none of them seem to mind. By the way, the RFC says to
insert a CRLF instead of the '\n' you insert, and none of them seem to
mind that either. Also, Outlook and Apple's iCal don't seem to wrap
properly either, so you're probably safe.

Cheers,
Jan

RubyRedRick

May 27, 2009, 10:41:33 AM
to RiCal
On May 27, 4:49 am, Fjan <jmfa...@gmail.com> wrote:
> Hi Rick,
>
> I wrote an RFC2445 implementation myself (for www.supersaas.com) which
> is not nearly as nice as yours, so I'm looking to replace part of it
> with your hard work.
>
> One thing I noticed so far is that your "FoldingStream" class simply
> inserts a new line after every 72 bytes to comply with the folding
> requirement. The problem with that is that it has a large chance of
> splitting a multi-byte UTF-8 character right in the middle for non-US
> languages. I can tell you from experience that most iCal software will
> then convert that into garbage characters.
>
> There are two solutions:
> 1 - use something like the ActiveSupport
> String::mb_chars.insert(50,"\r\n "). Note that you cannot simply insert
> at character 78 since a character can take multiple bytes. Not very
> elegant and very compute intensive to do it right.

Actually, I think this can be done relatively cheaply by using
String.unpack("U*"), and scanning back to find the split point based
on the code points. I'll have a look at that.
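
A rough sketch of one way the code-point idea could work. It scans forward and counts bytes per code point rather than scanning back, so treat it as a variation and not as what ended up in RiCal. The names utf8_len and split_on_code_points are made up for illustration, and the 72-byte limit is taken from the description above.

```ruby
# Number of bytes one code point occupies when encoded as UTF-8.
def utf8_len(cp)
  if cp < 0x80
    1
  elsif cp < 0x800
    2
  elsif cp < 0x10000
    3
  else
    4
  end
end

# Split `line` into chunks of at most `limit` bytes, only ever breaking
# between code points. Assumes `line` is valid UTF-8.
def split_on_code_points(line, limit = 72)
  chunks = [[]]
  used = 0
  line.unpack("U*").each do |cp|
    len = utf8_len(cp)
    if used + len > limit
      chunks << []   # start a new chunk before this code point
      used = 0
    end
    chunks.last << cp
    used += len
  end
  chunks.map { |cps| cps.pack("U*") }
end
```

For example, split_on_code_points("x" * 71 + "\u00E9" + "y") should break before the "é" and not through it, and joining the chunks gives back the original line.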

> 2 - simply don't wrap at all. I have tested it with several iCal-aware
> programs and none of them seem to mind. By the way, the RFC says to
> insert a CRLF instead of the '\n' you insert, and none of them seem to
> mind that either. Also, Outlook and Apple's iCal don't seem to wrap
> properly either, so you're probably safe.

Well, I put in the wrapping because Adam, who was focussed on
exporting calendars, pointed out that I wasn't doing it. I assumed
that he WAS having problems getting it accepted. Maybe he'll see this
and comment.

RubyRedRick

May 27, 2009, 12:55:52 PM
to RiCal
Okay,

I opened a ticket for this.
http://rick_denatale.lighthouseapp.com/projects/30941/tickets/11-export-folding-is-not-utf-8-safe

I've also committed tests and a fix to GitHub.

The basic approach I took was to split the string, and use
String.unpack("U") to determine if the remaining string is valid
UTF-8. If not, I back up one byte and try again.
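
Roughly, that back-up-and-retry idea looks like the sketch below. This is not the committed code: fold_utf8 is a made-up name, String#valid_encoding? stands in for the unpack("U") check, and the 72-byte limit counts content bytes only (the leading space on continuation lines is extra).

```ruby
# Fold `line` into pieces of at most `limit` content bytes, joined by
# CRLF + space, never cutting inside a multi-byte UTF-8 character.
def fold_utf8(line, limit = 72)
  raise ArgumentError, "input must be valid UTF-8" unless line.valid_encoding?

  pieces = []
  rest = line
  while rest.bytesize > limit
    cut = limit
    # If the remainder would start in the middle of a character it is not
    # valid UTF-8, so back up one byte and try again (at most 3 times).
    cut -= 1 until rest.byteslice(cut..-1).valid_encoding?
    pieces << rest.byteslice(0, cut)
    rest = rest.byteslice(cut..-1)
  end
  pieces << rest
  pieces.join("\r\n ")
end
```

With "x" * 71 + "\u00E9" + "y" * 10 as input, the first cut at byte 72 would land inside the "é", so the split moves back to byte 71 and the "é" starts the continuation line.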

The specs show that it works for 1, 2, and 3 byte UTF-8 characters.
I'm not sure that 4 byte UTF-8 characters really exist in the wild,
but if someone wants to provide one I'd be happy to add the specs.
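
As a possible 4-byte test case: any code point above U+FFFF takes four bytes in UTF-8, for example U+1D11E (MUSICAL SYMBOL G CLEF). The snippet below is only an illustration of that character and of what a cut inside it looks like, not one of the RiCal specs.

```ruby
clef = "\u{1D11E}"   # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

puts clef.bytesize                                       # 4
puts clef.bytes.map { |b| format("%02X", b) }.join(" ")  # F0 9D 84 9E

# Cutting at any interior byte offset leaves a remainder that is not
# valid UTF-8, so a fold must not land on offsets 1, 2 or 3.
(1..3).each do |offset|
  tail = clef.byteslice(offset..-1)
  puts "offset #{offset}: valid UTF-8: #{tail.valid_encoding?}"
end
```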

aiwilliams

May 28, 2009, 8:31:26 AM
to RiCal
It is important to Outlook. It complains A LOT when the folding is
done wrong. It does do the import, but the warnings scare our
customers.