develooper Front page | perl.dbi.users | Postings from August 2006

Encoding

Thread Next
From:
Daniel Zeman
Date:
August 31, 2006 09:09
Subject:
Encoding
Message ID:
44F70975.4040504@ufal.mff.cuni.cz
Hi,
I wonder if anyone can help me.

I am using
- Debian Linux
- Perl 5.8.8
- DBI (I do not know how to figure out its version)
- MySQL 5.0.22-Debian_3-log

I want to store and handle data in UTF-8 but so far I have not been able 
to force Perl/DBI to do so.

I have created a table using

     my $sql = "CREATE TABLE $tbl (".join(", ", @columns).") CHARACTER 
SET utf8 COLLATE utf8_czech_ci;";
     $dbh->do($sql);

I feeded the table with data using

     my $list_of_columns = join(", ", @names);
     my $list_of_values = join(", ", map{"_utf8'$record->{$_}'"}(@names));
     my $sql = "INSERT INTO $tbl ($list_of_columns) VALUES 
($list_of_values);";
     $dbh->do($sql);

I have looked into the database using phpMyAdmin 2.8.2-Debian-0.1 and it 
really looked like the data were stored in correct UTF-8.

However, when I retrieve the data from Perl/DBI, something in the chain 
(MySQL? the driver? DBI?) decides that another encoding (probably, 
Latin1) would be better for me. It "converts" the strings from UTF-8 to 
that encoding, which means, at the time the data arrives in my Perl 
code, all the non-Latin1 characters have already been irrecoverably 
converted to question marks. I would be happy to decode the data myself 
but there is nothing I can do with the question marks.

I am using the following code to retrieve the data:

     my $sql = "SELECT kod, hry.nazev FROM hry INNER JOIN prodej ON 
hry.kod = prodej.kod_hry GROUP BY kod, hry.nazev";
     my $sqlobj = $dbh->prepare($sql);
     $sqlobj->execute();
     while(my ($kod, $nazev) = $sqlobj->fetchrow_array())
     {
         ...
     }

So far, the only workaround I have, is not to tell the DBI the data is 
UTF-8 when I am inserting it (i.e., drop the "_utf8" part before the 
single quote), and use Encode; decode("utf8", ...) on anything I fetch 
from the database. This way, the database never knows the data was a 
UTF-8 text, treats the bytes as Latin1 characters and returns them 
undisturbed. However, I cannot access the data using phpMyAdmin (unless 
I en/decode UTF in my brain), the string lengths do not reflect the 
reality etc.

Is there a better way to do it? I think there must be some small stupid 
locale-like setting telling the machine that I am a UTF guy. But the 
settings I was able to come up with did not help and I actually have no 
idea which part of the MySQL-driver-DBI-Perl chain is responsible.

Any hints are welcome.
Thanks

Dan

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About