The data is in a unicode table in SQL Server, I believe SQL Server
stores it as UCS-2. I'm trying to read it out of the database and
write it to a file in UTF-8 (I want to do more with it but I'd be
happy just getting that to work for now!). I'm running ActivePerl
5.8.6 on Windows 2000.
Here's the simple test script I'm trying to get working:
use strict;
use utf8;
use Encode;
use DBI;
use DBD::ADO;
use Win32::OLE;
# Change the Code Page to UTF-8
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);
# Connection string
my $strConn = "Provider=SQLOLEDB; Data Source=ISVR01; Initial
Catalog=Wibble; User ID=Wobble; Password=plop";
my $dbConn;
my $prCategori;
my $strSQL;
# Open the connection
$dbConn = DBI->connect("DBI:ADO:$strConn") or die $DBI::errstr;
print "Connection opened.\n";
# Just something to test - yes I know I should use stored procedures
$strSQL = "SELECT * FROM CATEGORIES WHERE CAT_ID = 3672";
# Execute the SQL
$prCategori = $dbConn->prepare($strSQL);
$prCategori->execute();
# Open the test file
open(COFNOD, ">", "prawf.log") or die "ERROR! Can't open the log
file\n";
# Set to use UTF-8
binmode(COFNOD, ":utf8");
# Write the BOM - not really needed for UTF-8 but Windows likes to
have it
print COFNOD "\x{FEFF}";
# Print some test characters to see if they work - w and y with
circumflexes
print COFNOD "\x{0175}\x{0177}\n";
while ( my $rsCategori = $prCategori->fetchrow_hashref )
{
print COFNOD "Disgrifiad: ";
print COFNOD $rsCategori->{DESCRIPTION};
print COFNOD "\n";
}
close COFNOD;
$dbConn->disconnect();
The text returned from the field should be:
Ardal o Harddwch Naturiol Eithriadol Llŷn
[the last word should be Llyn with a ^ over the y]
What I actually get is:
Ardal o Harddwch Naturiol Eithriadol Llŷn
Running a hex editor on the file shows that the bytes where the ŷ
should be are C3 85 C2, but I believe the UTF-8 bytes should be C5 B7.
The test characters the script prints with \x{} work fine so somehow
the data from the field seems to be wrong. When I run the script
without setting the CP it just gets translated to latin1, losing the
circumflex.
use strict;
use utf8;
use Encode;
use DBI;
use DBD::ADO;
use Win32::OLE;
# Change the Code Page to UTF-8
Win32::OLE->Option(CP => Win32::OLE::CP_UTF8);
# Connection string
my $strConn = "Provider=SQLOLEDB; Data Source=ISVR01; Initial
Catalog=wibble; User ID=wobble; Password=plop";
my $dbConn;
my $prCategori;
my $strSQL;
my $fieldname;
# Open the connection
$dbConn = DBI->connect("DBI:ADO:$strConn") or die $DBI::errstr;
print "Connection opened.\n";
# Just something to test - yes I know I should use stored procedures
$strSQL = "SELECT * FROM CATEGORIES WHERE CAT_ID = 3672";
# Execute the SQL
$prCategori = $dbConn->prepare($strSQL);
$prCategori->execute();
# Open the test file
open(COFNOD, ">", "prawf.log") or die "ERROR! Can't open the log
file\n";
# Set to use UTF-8
binmode(COFNOD, ":utf8");
# Write the BOM - not really needed for UTF-8 but Windows likes to
have it
print COFNOD "\x{FEFF}";
# Print some test characters to see if they work - w and y with
circumflexes
print COFNOD "\x{0175}\x{0177}\n";
# Get the field names
my @fieldnames = @{ $prCategori->{NAME} };
while ( my $rsCategori = $prCategori->fetchrow_hashref )
{
# for each field set the utf-8 flag
foreach $fieldname ( @fieldnames )
{
Encode::_utf8_on($rsCategori->{$fieldname});