Defining a UCA collation for the utf8 charset and using it for an InnoDB fulltext index in Percona Server 5.6


Armin Hopp

Jun 14, 2016, 6:52:03 PM
to Percona Discussion
Hey folks,

I need an InnoDB fulltext index that treats some characters differently from the default. To be exact, I want the "&" (ampersand, U+0026) to be treated like a letter, e.g. like an "a" (U+0061).


To that end I set up a UCA collation in /usr/share/mysql/charsets/Index.xml like this:

<charset name="utf8">
  <family>Unicode</family>
  <description>UTF-8 Unicode</description>
  <alias>utf-8</alias>
  <collation name="utf8_general_ci" id="33">
    <flag>primary</flag>
    <flag>compiled</flag>
  </collation>

  <!-- my code starts here -->
  <collation name="utf8_withampersand_ci" id="1024">
    <rules>
      <reset>a</reset>
      <i>\u0026</i> <!-- ampersand -->
    </rules>
  </collation>
  <!-- my code ends here -->

  <collation name="utf8_bin" id="83">
    <flag>binary</flag>
    <flag>compiled</flag>
  </collation>
</charset>

First question:
How do I define the base collation of a UCA collation? Since the example does not specify one anywhere, I assume the primary collation of the charset is used as the base. Is that correct?

Defining the collation like this and restarting MySQL made the collation available.
I can list it using SHOW COLLATION LIKE 'utf8\_%';
I can use it for columns:

-- Example table using the custom collation
CREATE TABLE `fulltext_search` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `search` varchar(255) CHARACTER SET utf8 COLLATE utf8_withampersand_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_search` (`search`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Alas, the collation has no effect on the fulltext index.
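To narrow down where the rule stops applying, here is a minimal sketch of the checks that can be run. It assumes the collation name utf8_withampersand_ci and id 1024 from the Index.xml snippet above, and the sample row 'R&D department' is purely hypothetical. If the rule is picked up, the comparison in step 2 would be expected to return 1. Step 4 then shows whether the fulltext index behaves the same way.

```sql
-- 1. Is the collation registered under the id given in Index.xml?
SHOW COLLATION WHERE Id = 1024;

-- 2. Does a plain string comparison honour the <i> rule
--    (ampersand sorted as identical to "a")?
SELECT _utf8'&' = _utf8'a' COLLATE utf8_withampersand_ci AS ampersand_equals_a;

-- 3. Show any warnings raised by the previous statement.
SHOW WARNINGS;

-- 4. Does the fulltext index treat the ampersand as part of a word?
--    (hypothetical sample row, for illustration only)
INSERT INTO fulltext_search (search) VALUES ('R&D department');
SELECT id, search
  FROM fulltext_search
 WHERE MATCH(search) AGAINST('R&D' IN NATURAL LANGUAGE MODE);
```

If step 2 matches but step 4 does not, the collation itself is probably fine and the difference would lie in how the fulltext parser splits words, which is exactly the part I am unsure about.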

More questions:
What am I missing?
Is it even possible to control the collation of the fulltext index for innodb this way?
Is my LDML to change the behaviour of the ampersand correct at all?

Thanks in advance

Regards
Armin



