Hello,
Keeping Unicode strings under control in a MySQL database turned out to be a bit of a challenge. I could have sworn that
the character encoding and collation were set up correctly, i.e.
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
Alas, that turned out not to be the case. So now I’m looking at a few databases where random tables and columns have reverted to the latin1 character set. I’m trying to put together a solution that takes a database, checks and (if necessary) switches its charset/collation, then iterates over all tables, checks and switches their charset/collation, and then iterates over the columns and (for the appropriate types) checks and switches theirs as well.
I can do that by issuing SQL statements directly:
SELECT @@character_set_database, @@collation_database;
then if need be
ALTER DATABASE db_name CHARACTER SET "utf8" COLLATE "utf8_general_ci";
Similarly for tables (link) and for table columns (link).
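For reference, here is a minimal sketch of the iteration described above. It only builds the ALTER statements from metadata (e.g. table names as read from information_schema.tables); the database and table names used are placeholders. Note that MySQL’s ALTER TABLE … CONVERT TO CHARACTER SET already converts the table’s textual columns, which may make the per-column step unnecessary:

```python
# Sketch: generate the charset/collation conversion statements for one
# database. The names "my_db", "users", "posts" are placeholders; real
# table names would come from information_schema.tables.

def conversion_statements(db_name, table_names,
                          charset="utf8", collation="utf8_general_ci"):
    """Build the ALTER statements for a database and its tables."""
    stmts = [f"ALTER DATABASE `{db_name}` "
             f"CHARACTER SET {charset} COLLATE {collation}"]
    for table in table_names:
        # CONVERT TO CHARACTER SET also converts the table's textual
        # columns, so separate per-column ALTERs are usually not needed.
        stmts.append(f"ALTER TABLE `{db_name}`.`{table}` "
                     f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}")
    return stmts

for stmt in conversion_statements("my_db", ["users", "posts"]):
    print(stmt)
```

The generated statements could then be executed through an engine/connection, or simply reviewed before running them by hand.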
However, I wonder if there is a more elegant way with SQLAlchemy than issuing SQL directly?
I have also noticed that some string types use Unicode (which maps to varchar) and others use UnicodeText (which maps to mediumtext). I vaguely remember there being size constraints: I think a maximum of 65k bytes across all columns of a row, whereas mediumtext is stored separately and doesn’t count against that limit? Disregarding that, converting from UnicodeText to Unicode should not alter the values themselves, right? While cleaning up the character sets and collations, I thought it might be good to get the string types right as well.
Many thanks!
Jens