utf8 collation error


Adam Weremczuk

May 14, 2020, 11:01:35 AM
to Review Board Community
Hi all,

Following the installation guide for MySQL, I've added the following to /etc/mysql/my.cnf:

[client]
default-character-set=utf8

[mysqld]
character-set-server=utf8

MariaDB fails to start:

May 14 14:01:41 gittest systemd[1]: Starting MariaDB 10.1.44 database server...
May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41 139687784537472 [Note] /usr/sbin/mysqld (mysqld 10.1.44-MariaDB-0+deb9u1) starting as process 10318 ...
May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41 139687784537472 [ERROR] COLLATION 'utf8mb4_general_ci' is not valid for CHARACTER SET 'utf8'
May 14 14:01:41 gittest mysqld[10318]: 2020-05-14 14:01:41 139687784537472 [ERROR] Aborting
May 14 14:01:41 gittest systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
May 14 14:01:41 gittest systemd[1]: Failed to start MariaDB 10.1.44 database server.

When I comment out these two additions, it starts fine and I can retrieve the following:

MariaDB [(none)]> SHOW COLLATION LIKE 'utf8%';
+------------------------------+---------+-----+---------+----------+---------+
| Collation                    | Charset | Id  | Default | Compiled | Sortlen |
+------------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci              | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                     | utf8    |  83 |         | Yes      |       1 |
| utf8_unicode_ci              | utf8    | 192 |         | Yes      |       8 |
| utf8_icelandic_ci            | utf8    | 193 |         | Yes      |       8 |
| utf8_latvian_ci              | utf8    | 194 |         | Yes      |       8 |
| utf8_romanian_ci             | utf8    | 195 |         | Yes      |       8 |
| utf8_slovenian_ci            | utf8    | 196 |         | Yes      |       8 |
| utf8_polish_ci               | utf8    | 197 |         | Yes      |       8 |
| utf8_estonian_ci             | utf8    | 198 |         | Yes      |       8 |
| utf8_spanish_ci              | utf8    | 199 |         | Yes      |       8 |
| utf8_swedish_ci              | utf8    | 200 |         | Yes      |       8 |
| utf8_turkish_ci              | utf8    | 201 |         | Yes      |       8 |
| utf8_czech_ci                | utf8    | 202 |         | Yes      |       8 |
| utf8_danish_ci               | utf8    | 203 |         | Yes      |       8 |
| utf8_lithuanian_ci           | utf8    | 204 |         | Yes      |       8 |
| utf8_slovak_ci               | utf8    | 205 |         | Yes      |       8 |
| utf8_spanish2_ci             | utf8    | 206 |         | Yes      |       8 |
| utf8_roman_ci                | utf8    | 207 |         | Yes      |       8 |
| utf8_persian_ci              | utf8    | 208 |         | Yes      |       8 |
| utf8_esperanto_ci            | utf8    | 209 |         | Yes      |       8 |
| utf8_hungarian_ci            | utf8    | 210 |         | Yes      |       8 |
| utf8_sinhala_ci              | utf8    | 211 |         | Yes      |       8 |
| utf8_german2_ci              | utf8    | 212 |         | Yes      |       8 |
| utf8_croatian_mysql561_ci    | utf8    | 213 |         | Yes      |       8 |
| utf8_unicode_520_ci          | utf8    | 214 |         | Yes      |       8 |
| utf8_vietnamese_ci           | utf8    | 215 |         | Yes      |       8 |
| utf8_general_mysql500_ci     | utf8    | 223 |         | Yes      |       1 |
| utf8_croatian_ci             | utf8    | 576 |         | Yes      |       8 |
| utf8_myanmar_ci              | utf8    | 577 |         | Yes      |       8 |
| utf8_thai_520_w2             | utf8    | 578 |         | Yes      |       4 |
| utf8mb4_general_ci           | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin                  | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci           | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci         | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci           | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci          | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci         | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci            | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci          | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci           | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci           | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci           | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci             | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci            | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci        | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci            | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci          | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci             | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci           | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci         | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci         | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci           | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci           | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_mysql561_ci | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci       | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci        | utf8mb4 | 247 |         | Yes      |       8 |
| utf8mb4_croatian_ci          | utf8mb4 | 608 |         | Yes      |       8 |
| utf8mb4_myanmar_ci           | utf8mb4 | 609 |         | Yes      |       8 |
| utf8mb4_thai_520_w2          | utf8mb4 | 610 |         | Yes      |       4 |
+------------------------------+---------+-----+---------+----------+---------+
59 rows in set (0.00 sec)
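
Reading the table above: every collation is tied to exactly one charset (the Charset column), which is presumably why the server rejects utf8mb4_general_ci once character-set-server is forced to utf8. Below is a minimal Python sketch of that rule, using a few rows copied from the output. The helper name `collation_matches` is made up for illustration, and where the utf8mb4_general_ci default comes from (possibly another config file) is an assumption, not something the log shows.

```python
# A few (collation -> charset) rows copied from the SHOW COLLATION output above.
COLLATIONS = {
    "utf8_general_ci": "utf8",
    "utf8_bin": "utf8",
    "utf8mb4_general_ci": "utf8mb4",
    "utf8mb4_bin": "utf8mb4",
}


def collation_matches(charset: str, collation: str) -> bool:
    """Hypothetical helper: True if the collation belongs to the charset."""
    return COLLATIONS.get(collation) == charset


# The combination from the error message, then the one that starts cleanly.
print(collation_matches("utf8", "utf8mb4_general_ci"))
print(collation_matches("utf8mb4", "utf8mb4_general_ci"))
```

So either the charset or the collation has to change so that the pair matches.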

I've replaced utf8 with utf8mb4 in my.cnf and MariaDB is now starting fine.

Have I done the right thing?

Should the installation documentation be updated?

Thanks,
Adam

Adam Weremczuk

May 15, 2020, 8:31:38 AM
to Review Board Community
I don't think utf8mb4 was a good idea and I believe it's now leading to:
sudo rb-site install /var/www/mysite
(...)
* Installing the site...
(...)
Creating table scmtools_repository

[!] There was an error synchronizing the database. Make sure the
    database is created and has the appropriate permissions, and then
    continue.
[!] Details: (1071, 'Specified key was too long; max key length is 767
    bytes')

Press Enter to continue

Christian Hammond

May 17, 2020, 11:39:46 PM
to revie...@googlegroups.com
Hi Adam,

Yeah... Here's the situation with MySQL/MariaDB and "utf8".

When MySQL introduced the utf8 charset, they went with a sort of "compressed" version of UTF-8, limited to at most three bytes per character, which excluded some character ranges (I am super simplifying this). Emojis and some other character ranges didn't exist at the time, and now cannot be represented by their "utf8".
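
To make the limitation concrete, here is a rough Python sketch that checks whether a string fits into MySQL's three-byte "utf8". The helper name `fits_in_mysql_utf8` is hypothetical, and it only models the byte-length limit, nothing else the server may check.

```python
def fits_in_mysql_utf8(text: str) -> bool:
    """Hypothetical helper: True if every character needs at most 3 UTF-8 bytes."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# Plain ASCII, accented Latin letters, and the thumbs-up emoji (U+1F44D).
for sample in ["review", "żółć", "\U0001F44D"]:
    encoded = sample.encode("utf-8")
    print(repr(sample), len(encoded), fits_in_mysql_utf8(sample))
```

The emoji encodes to four bytes in real UTF-8, which is the case "utf8" cannot store and utf8mb4 can.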

utf8mb4 is the "real" UTF-8 charset type. However, it's not a drop-in replacement. It affects key lengths, amongst other things, and is incompatible with, well, many things.
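
The key-length effect is simple arithmetic. Here is a small Python sketch, assuming an indexed VARCHAR(255) column (the 255 is an assumption; the error earlier in the thread doesn't say which column tripped it) and the 767-byte limit quoted in that error. `index_bytes` is a made-up helper name.

```python
MAX_KEY_BYTES = 767  # limit quoted in the error message earlier in the thread

# Worst-case bytes per character reserved for each charset.
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def index_bytes(charset: str, length: int) -> int:
    """Hypothetical helper: worst-case index size for an indexed VARCHAR(length)."""
    return BYTES_PER_CHAR[charset] * length


for charset in BYTES_PER_CHAR:
    size = index_bytes(charset, 255)  # 255 is an assumed column length
    print(charset, size, "ok" if size <= MAX_KEY_BYTES else "too long")
```

Under that limit, the longest fully indexable utf8mb4 column is 767 // 4 = 191 characters, versus 255 for utf8.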

There is a way to get true UTF-8 support. It requires utf8mb4, and a handful of global settings applied to the server to enable large keys and a different InnoDB file format. It then requires a special command to be set at the beginning of each MySQL/MariaDB session to opt into some better support.

Basically, it's invasive and not something that we can currently tell people to enable, or it'll cause new problems. It also requires full table rebuilds. The instructions also depend on the version of MySQL/MariaDB.

We plan to bake in some level of support for it in Review Board in the future, but Django doesn't natively support it, and it'll require a bunch of special logic to rebuild data.

I can't currently provide the settings you may need, because many of them are dependent on the version of MySQL/MariaDB you're using, and I haven't verified them lately (just working off internal notes). It boils down to:

1) Using utf8mb4 charsets for all databases, tables, and connections/sessions
2) Using utf8mb4_bin collation for all the above
3) Enabling innodb_large_prefix and innodb_file_per_table (might depend on the versions of MySQL/MariaDB)
4) Enabling innodb_file_format=barracuda (not needed on modern versions)

This is not an exhaustive step-by-step.
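
As a rough illustration only, the checklist above could be compared against a server's current variables with something like the Python sketch below. The variable names are the standard MySQL/MariaDB ones as far as I know, but the expected values, the helper name `missing_settings`, and the sample input are all assumptions, and none of this has been verified against a particular server version.

```python
# Settings paraphrased from the list above; names and values may differ by version.
WANTED = {
    "character_set_server": "utf8mb4",
    "collation_server": "utf8mb4_bin",
    "innodb_large_prefix": "ON",
    "innodb_file_per_table": "ON",
    "innodb_file_format": "Barracuda",
}


def missing_settings(current: dict) -> list:
    """Hypothetical helper: names whose current value differs from WANTED."""
    return [
        name
        for name, wanted in WANTED.items()
        if current.get(name, "").lower() != wanted.lower()
    ]


# Example input typed in by hand; in practice it would come from SHOW VARIABLES.
current = {
    "character_set_server": "utf8mb4",
    "collation_server": "utf8mb4_general_ci",
    "innodb_file_per_table": "ON",
}
print(missing_settings(current))
```

On versions where a variable no longer exists (point 4 is not needed on modern versions), it would have to be dropped from `WANTED` first.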

PostgreSQL will do UTF-8 by default, fwiw.

Hoping to revisit this support in MySQL/MariaDB after RB4 wraps up. Should be easier now that MySQL/MariaDB have made progress in this area, and I need to update my knowledge of what that progress looks like.

Christian


--
Christian Hammond
President/CEO of Beanbag
Makers of Review Board

Adam Weremczuk

May 18, 2020, 8:23:53 AM
to revie...@googlegroups.com
Hi Christian,

Thank you for a detailed and thorough reply.

Since it's a fresh installation I've opted for mysql-server 5.7.30-1debian9 and utf8.

Would you expect any issues with it?

Thanks,
Adam


Christian Hammond

May 18, 2020, 8:15:41 PM
to revie...@googlegroups.com
It should work, apart from Emojis (since that's sort of the main pain point with MySQL's "UTF-8" support).

We do support a sort of fake Emoji in Markdown mode, like :+1: and such, which render as Emojis. Only the actual Unicode Emoji characters won't work on MySQL with that charset.

Christian
