[pmwiki-devel] lower casing diacritics

Simon

Oct 18, 2022, 12:48:09 AM
to PmWiki Devel Mailing List
Can anyone suggest a means of converting diacritic characters to lower case,
e.g. Ā to ā, Ê to ê, Į to į, etc
other than creating a translation table?

thanks

Simon

Petko Yotov

Oct 18, 2022, 2:19:23 AM
to Simon, PmWiki Devel Mailing List
You can use mb_strtolower():

https://php.net/mb_strtolower

Here is an example from the PHP interactive shell:

php > $str = "e.g. Ā to ā, Ê to ê, Į to į, etc";
php > print_r(mb_strtolower($str));
e.g. ā to ā, ê to ê, į to į, etc
php > print_r(mb_strtoupper($str));
E.G. Ā TO Ā, Ê TO Ê, Į TO Į, ETC

Petko

--
If you upgrade : https://www.pmwiki.org/Upgrades



_______________________________________________
pmwiki-devel mailing list
pmwiki...@pmichaud.com
http://www.pmichaud.com/mailman/listinfo/pmwiki-devel

Simon

Oct 18, 2022, 6:19:20 AM
to Petko Yotov, PmWiki Devel Mailing List
Again, thanks heaps for answering these newbie questions; that works.
What I think I have found is that while html_entity_decode() turns the HTML entity for Ē into "Ē",
htmlentities("Ē") doesn't convert Ē back into that entity.

Simon
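
The asymmetry described above can be reproduced with a short sketch. It assumes the text is UTF-8 and that Ē was written as the numeric entity &#274; (the exact entity is not visible in the message, so this is only an illustration). htmlentities() replaces only characters that have a named entity in the selected table, and the HTML 4.01 table has none for Ē, so the character comes back unchanged.

```php
<?php
// Assumption: UTF-8 text, and Ē written as the numeric entity &#274;.
$decoded = html_entity_decode('&#274;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
// $decoded now holds "Ē" (U+0112).

// htmlentities() only substitutes characters that have a named entity
// in the chosen table. HTML 4.01 has no named entity for Ē, so the
// string is returned as-is.
$encoded = htmlentities($decoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

var_dump($encoded === $decoded); // bool(true)
```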


Petko Yotov

Oct 18, 2022, 6:39:46 AM
to Simon, PmWiki Devel Mailing List
You may be able to use:

$entity = mb_convert_encoding($decoded, 'HTML');


You may or may not need to specify a $from_encoding argument. From the
documentation it seems before PHP 8.0 $from_encoding was required.
Documentation:

https://php.net/mb_convert_encoding

Petko

Simon

Oct 19, 2022, 4:35:03 AM
to Petko Yotov, PmWiki Devel Mailing List
Some background.
I am trying to update the SearchCloud recipe.

The recipe grabs the q parameter of a search action.

I want it to:
* make the search terms case-insensitive
* handle characters with diacritics.

Here is some debug output:

2022-10-19 21:26:01 
q="SĀÉÎÖŬ-àęiøűd" 
$SCrq="SĀÉÎÖŬ-àęiøűd" 
tkey1="SÄ€ÉÎÖŬ-àÄ™iøűd" 
tkey2="sÄ ???Å­-?Ä™i?űd" 
tkey3="sÄ ???Å­-?Ä™i?űd"

Generated from this debug code:
      $convmap = array (0x80, 0xffff);
      $q     = strval($_REQUEST['q']); # get search term
      $SCrq  = trim (\stripmagic($q));
      $tkey1 = html_entity_decode($SCrq); # remove html entities to allow lower case conversion
      $tkey2 = mb_strtolower($tkey1); # convert to lower case
      $tkey3 = mb_encode_numericentity ($tkey2, $convmap); # convert non-ascii to htmlentities
      $fwritestatus = fwrite($logfilehandle, $logfiletime
      . 'q="' . $q
      . '" $SCrq="' . $SCrq
      . '" tkey1="' . $tkey1 
      . '" tkey2="' . $tkey2 
      . '" tkey3="' . $tkey3 . '"'

As you can see in the debug output, it seems to fall apart at tkey2.
I'd welcome more suggestions.
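
One detail in the debug code may be worth checking separately. The PHP manual describes the map given to mb_encode_numericentity() as groups of four integers (start code, end code, offset, mask), while $convmap above has only two. That could be why tkey3 is identical to tkey2. A minimal sketch with a four-element map follows; the sample string and the UTF-8 encoding are assumptions for illustration.

```php
<?php
// Each group is: start code point, end code point, offset, mask.
// This map covers the non-ASCII code points up to U+FFFF.
$convmap = array(0x80, 0xffff, 0, 0xffff);

// Assumed sample input: already lower-cased UTF-8 text.
$lower = 'āéîöŭ';

// Passing the encoding explicitly avoids relying on the internal default.
$entity = mb_encode_numericentity($lower, $convmap, 'UTF-8');

echo $entity, "\n"; // &#257;&#233;&#238;&#246;&#365;
```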


Dominique Faure

Oct 22, 2022, 5:52:13 AM
to Simon, Petko Yotov, PmWiki Devel Mailing List
You should perhaps specify the 'UTF-8' encoding to the mb_strtolower call.
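
A minimal sketch of that suggestion, assuming the request value is UTF-8. The variable $q below is a stand-in for the value read from $_REQUEST['q'], using the search term from the debug output.

```php
<?php
// Assumption: the search term arrives as UTF-8.
$q = 'SĀÉÎÖŬ-àęiøűd';

// Decode entities with the charset stated rather than left to a default.
$decoded = html_entity_decode($q, ENT_QUOTES, 'UTF-8');

// Lower-case with an explicit encoding so multibyte letters are treated
// as whole characters instead of separate bytes.
$lower = mb_strtolower($decoded, 'UTF-8');

echo $lower, "\n"; // sāéîöŭ-àęiøűd
```

An alternative would be to call mb_internal_encoding('UTF-8') once near the top of the recipe, so that later mb_* calls pick up UTF-8 without the extra argument.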