On a "cut and paste your resume here" form, some odd characters keep
turning up in the submissions. Right now I replace them all with a dash
(-), since I'm assuming they're some kind of bullet-list character. But
it would be nice to know exactly which characters they are supposed to
represent. Is there a translation chart somewhere?
& # 61623 ;
& # 61608 ;
& # 61656 ;
(Note: the data actually received contains no spaces. I added them only
so the &# sequences wouldn't be interpreted as the original characters
they encode.)
TIA,
Axl
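For reference, the dash substitution described above might look like this in Perl. This is a minimal sketch, assuming the form data arrives as a string containing literal decimal references like &#61623;, and assuming that only references in the private use area (U+E000 to U+F8FF) should become dashes. The sub name dash_private_use is hypothetical, not from the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the post): replace decimal numeric
# character references in the BMP private use area (U+E000..U+F8FF)
# with a dash; every other reference is left exactly as it was.
sub dash_private_use {
    my ($text) = @_;
    $text =~ s{&#(\d+);}{
        ($1 >= 0xE000 && $1 <= 0xF8FF) ? '-' : "&#$1;"
    }ge;
    return $text;
}

print dash_private_use("&#61623; Perl &#8226; CGI"), "\n";
```

Leaving the other references alone means ordinary characters such as &#8226; (U+2022, BULLET) still come through untouched.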
Not a Perl question.
But yes, there are lists available. The number in a numeric character
reference (&#NNNNN;) is the decimal Unicode code point of the character,
and ISO-Latin-1 is a subset of Unicode (its first 256 code points). See
the FTP site at <ftp://ftp.unicode.org/Public/>.
Convert the numbers above to 4-digit hex and look 'em up in the file
available from <ftp://ftp.unicode.org/Public/UNIDATA/NamesList.txt>.
# Print each decimal code point as 4-digit uppercase hex
foreach (61623, 61608, 61656) {
    printf "%04X\n", $_;
}
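Instead of searching NamesList.txt by hand, Perl's core charnames module can do the lookup locally. A sketch, assuming a Perl whose charnames module provides viacode; the names come from whatever Unicode version that Perl bundles. U+2022 (BULLET) is included only for contrast with the three codes from the question.

```perl
use strict;
use warnings;
use charnames ();   # core module; only charnames::viacode is used

# Print the official Unicode name for each code point. viacode returns
# undef when a code point has no name, which is the case for
# private use characters.
foreach my $cp (0x2022, 61623, 61608, 61656) {
    my $name = charnames::viacode($cp);
    printf "U+%04X  %s\n", $cp, defined $name ? $name : '(no name)';
}
```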
However, that gives F0B7, F0A8 and F0D8, which are in the "private use
area" U+E000 to U+F8FF. Gee, that ain't much use either. Code points in
that range have no standard assignment; their meaning is whatever the
font or program that produced them says it is.
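To check programmatically whether a code point is private use, Perl's Unicode property matching can help. This is a sketch, assuming a Perl with Unicode regex property support: \p{Co} matches the General_Category "Co" (Private_Use), which also covers the supplementary private use planes, not just U+E000 to U+F8FF. U+2022 is again included for contrast.

```perl
use strict;
use warnings;

# Test each code point against the Unicode General_Category "Co"
# (Private_Use) by building the character with chr and matching it.
foreach my $cp (61623, 61608, 61656, 0x2022) {
    my $kind = chr($cp) =~ /\p{Co}/ ? 'private use' : 'not private use';
    printf "U+%04X: %s\n", $cp, $kind;
}
```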
--
Bart.