Character Set Conversions

sof...@gmail.com

Aug 3, 2006, 5:03:40 AM
I have a PHP routine which parses my incoming emails and extracts
certain key data from the body text.

This normally works fine, but occasionally an email contains the name
and address of somebody in Europe. When that happens, the character set
of the email changes from "us-ascii" to "ISO 8859-1", and this confuses
my parsing code: I get sequences like =E4 and =E5 in place of some of
the accented European characters in people's names or addresses, and
where there would normally be a row of "equals" characters used as a
separator line, I get =3D=3D=3D etc.

The data is saved into a MySQL database, so I want the name and
address to look correct when printed (i.e. the =E4 should be printed
as the correct accented European character).

I've looked into the various PHP functions for decoding strings, as my
ideal plan would be to convert the incoming text before my parser
examines it, but I'm confused about which function to use.

Andy

Alvaro G. Vicario

Aug 3, 2006, 1:39:47 PM
*** sof...@gmail.com wrote (3 Aug 2006 02:03:40 -0700):
> This works fine normally, but occasionally an email may contain the
> name and address of somebody in Europe, and when this happens the
> character set of the email changes from "us-ascii" to "ISO 8859-1" and
> this confuses my parsing code, because I get characters like =E4 and
> =E5 for some of the accented european characters in people's name or
> address, and where there would normally be a row of "equals" characters
> which is used as a separator line in the email, I get =3D=3D=3D etc.

This cute function was published in this group some weeks ago:

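<?php

// Decode one encoded-word match:
// $m[1] = charset, $m[2] = encoding (Q or B), $m[3] = encoded text.
function quoted_word_callback($m) {
    switch (strtoupper($m[2])) {
        case 'Q':
            // In the header "Q" encoding an underscore stands for a space.
            return quoted_printable_decode(str_replace('_', ' ', $m[3]));
        case 'B':
            return base64_decode($m[3]);
    }
    return $m[0]; // unknown encoding: leave the text untouched
}

$s = "OLED & =?ISO-8859-1?Q?br=E4nsleceller?=";
// The "i" modifier lets the pattern match lowercase "q" and "b" as well.
// The output is in the charset named in the encoded word (ISO-8859-1 here).
echo preg_replace_callback('/=\?(.*)\?([BQ])\?(.*)\?=/Ui',
    'quoted_word_callback', $s);

?>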
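
The function above returns the decoded bytes in whatever charset the
encoded word declares. If the database expects UTF-8, one possible
variation also converts the result. This is only a sketch:
decode_header_to_utf8() is a hypothetical name, and it assumes the
mbstring extension is loaded and recognises the declared charset.

```php
<?php
// Hypothetical helper (not from the thread): decode encoded-words of the
// form =?charset?Q?text?= or =?charset?B?text?= and convert each one to
// UTF-8 using the charset it declares.
function decode_header_to_utf8(string $header): string
{
    return preg_replace_callback(
        '/=\?([^?]+)\?([BQ])\?([^?]*)\?=/i',
        function (array $m): string {
            $bytes = strtoupper($m[2]) === 'B'
                ? base64_decode($m[3])
                // In the "Q" encoding an underscore stands for a space.
                : quoted_printable_decode(str_replace('_', ' ', $m[3]));
            // Assumption: mbstring knows the charset named in $m[1].
            return mb_convert_encoding($bytes, 'UTF-8', $m[1]);
        },
        $header
    );
}

echo decode_header_to_utf8("OLED & =?ISO-8859-1?Q?br=E4nsleceller?="), "\n";
```

PHP's iconv extension also ships iconv_mime_decode(), which may be worth
checking before maintaining a hand-written pattern.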

Hope it helps.

http://groups.google.com/group/comp.lang.php/msg/1687691dd266c990
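
The =E4 and =3D sequences described in the question look like
quoted-printable transfer encoding applied to the message body, which is
separate from the encoded-word syntax used in headers. A minimal sketch
for that case follows. It assumes the body has already been separated
from the headers, that the message really is quoted-printable, and that
the charset is known (for example from the Content-Type header);
decode_qp_body() and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): undo quoted-printable
// transfer encoding on a body, then convert it from its charset to UTF-8.
function decode_qp_body(string $body, string $charset = 'ISO-8859-1'): string
{
    // "=E4" becomes the single byte 0xE4, "=3D" becomes "=".
    $bytes = quoted_printable_decode($body);
    // Assumption: the mbstring extension is available.
    return mb_convert_encoding($bytes, 'UTF-8', $charset);
}

// Sample input invented for this example.
$body = "Name: J=E4rvinen\r\n=3D=3D=3D=3D=3D\r\n";
echo decode_qp_body($body);
```

Running the decode step before the parser sees the text should restore
the separator row of "=" characters, so the existing parsing rules may
not need to change.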

--
-+ http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
++ My site about web programming: http://bits.demogracia.com
+- My humor site with UVA rays: http://www.demogracia.com
--
