My issue can be reduced to a difference between the submitted form data and
the fixed string typed into my Perl code, although both versions of the
UTF-8 text appear identical when Perl prints them to a browser window.
One works in an email subject line and the other does not. For example, I
test with a simple HTML form:
<!DOCTYPE html>
<html><head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<form ENCTYPE="multipart/form-data" method="post" action="compare.pl">
<input type="text" name="subject" size="30" value="μερικές ελληνικές λέξεις">
<input type="submit" value="Submit">
</form>
</body>
</html>
It is submitted to the following compare.pl script:
#!/usr/bin/perl -w
use CGI;
use utf8;
use Email::MIME::RFC2047::Encoder;
my $fixed_subject;
# Only the following passes through an email
# subject line intact:
$fixed_subject = "μερικές ελληνικές λέξεις";
my $query = CGI->new;
# This value displays correctly in a web browser
# but not after being sent in the subject
# line of an email via MIME::Lite or similar:
my $submitted_subject = $query->param('subject');
# The following $utf8_encoded_submitted_subject will not display correctly
# in a browser or email subject line:
my $utf8_submitted_subject_encoder = Email::MIME::RFC2047::Encoder->new;
my $utf8_encoded_submitted_subject =
    $utf8_submitted_subject_encoder->encode_text($submitted_subject);
print "Content-type: text/html\n\n";
print "<!DOCTYPE html>\n";
print "<html><head>\n";
print "<title>Compare</title>\n";
print "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n";
print "</head>\n";
print "<body>\n";
print "\$fixed_subject: $fixed_subject\n";
print "<hr>";
print "\$submitted_subject: $submitted_subject\n";
print "<hr>";
print "\$utf8_encoded_submitted_subject: $utf8_encoded_submitted_subject\n";
print "</body></html>\n";
I have left out the email code here, but as I said, the $fixed_subject typed
directly into the Perl code works in the subject line of a mail sent
through MIME::Lite or Mail::Sender, while the $submitted_subject that was
received as a form value through CGI does not.
What exactly happens to $submitted_subject in the process, and how can it
be made identical to the $fixed_subject string?
In a browser, the $fixed_subject prints as:
μερικές ελληνικές λέξεις
And the $submitted_subject prints the same:
μερικές ελληνικές λέξεις
The $utf8_encoded_submitted_subject prints as:
=?utf-8?Q?=c3=8e=c2=bc=c3=8e=c2=b5=c3=8f=c2=81=c3=8e=c2=b9=c3=8e=c2=ba?=
=?utf-8?Q?=c3=8e=c2=ad=c3=8f=c2=82_=c3=8e=c2=b5=c3=8e=c2=bb=c3=8e=c2=bb?=
=?utf-8?Q?=c3=8e=c2=b7=c3=8e=c2=bd=c3=8e=c2=b9=c3=8e=c2=ba=c3=8e=c2=ad?=
=?utf-8?Q?=c3=8f=c2=82_=c3=8e=c2=bb=c3=8e=c2=ad=c3=8e=c2=be=c3=8e=c2=b5?=
=?utf-8?Q?=c3=8e=c2=b9=c3=8f=c2=82?=
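One observation about this output: each Greek letter appears as four encoded
bytes (for example =c3=8e=c2=bc), twice as many as a UTF-8 Greek letter
needs. That pattern is what results when the UTF-8 bytes CE BC are treated as
two Latin-1 characters and UTF-8 encoded a second time. A minimal sketch of
that effect, using only the core Encode module; $raw is a hypothetical
stand-in for the form value, on the assumption that CGI hands it over as
undecoded bytes:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for the form value: the raw UTF-8 bytes of the
# letter mu (CE BC), in a string Perl has not been told to decode.
my $raw = "\xce\xbc";

# Encoding this string as UTF-8 treats each byte as a Latin-1 character,
# so every byte above 0x7F turns into two bytes.
my $double = encode('UTF-8', $raw);

# Show the result in the same =xx notation as the encoded subject line.
my $hex = join '', map { sprintf '=%02x', ord } split //, $double;
print "$hex\n";
```

If that assumption holds, the printed bytes should match the start of the
first encoded word above.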
If I send the "μερικές ελληνικές λέξεις" characters in the subject of an
email using Thunderbird, they display fine in the email program.
In the message source, Thunderbird's subject line appears as follows:
=?UTF-8?B?zrzOtc+BzrnOus6tz4IgzrXOu867zrfOvc65zrrOrc+CIM67zq3Ovs61?=
=?UTF-8?B?zrnPgg==?=
The source of the $fixed_subject line of the Perl-generated mail looks as
follows:
=?utf-8?Q?=ce=bc=ce=b5=cf=81=ce=b9=ce=ba=ce=ad=cf=82_=ce=b5=ce=bb=ce=bb?=
=?utf-8?Q?=ce=b7=ce=bd=ce=b9=ce=ba=ce=ad=cf=82_=ce=bb=ce=ad=ce=be=ce=b5?=
=?utf-8?Q?=ce=b9=cf=82?=
So the $fixed_subject displays fine. How can the $submitted_subject string
be made or kept identical? After all, it's the same set of characters,
but with a somewhat different encoding or copying in Perl, I guess.
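One thing that might be worth trying, assuming the value from
$query->param('subject') really is a string of undecoded UTF-8 bytes, is to
decode it into characters before handing it to the encoder. This is only a
sketch, not a confirmed fix: $submitted_bytes is a hypothetical stand-in for
the form value, built here by encoding the fixed string so the example runs
without a web server.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is read as characters
use Encode qw(encode decode);

my $fixed_subject = "μερικές ελληνικές λέξεις";

# Hypothetical stand-in for $query->param('subject'):
# the same text, but as raw UTF-8 bytes.
my $submitted_bytes = encode('UTF-8', $fixed_subject);

# Decode the bytes into a character string, as "use utf8" does for literals.
my $submitted_subject = decode('UTF-8', $submitted_bytes);

print length($submitted_bytes), " bytes, ",
      length($submitted_subject), " characters\n";
print $submitted_subject eq $fixed_subject ? "identical\n" : "different\n";
```

After decoding, the two strings compare equal with eq, which is the property
the subject encoder would need.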
Thanks in advance for any suggestions.
Tuxedo