[pmwiki-devel] $UploadNameChars - adding unicode characters


Simon

Jul 29, 2019, 4:40:31 AM
to PmWiki Devel Mailing List
https://pmwiki.org/wiki/PmWiki/UploadVariables#UploadNameChars


From the page
The set of characters allowed in upload names. Defaults to "-\w. ", which means alphanumerics, hyphens, underscores, dots, and spaces can be used in upload names, and everything else will be stripped.
$UploadNameChars = "-\\w. !"; # allow dash, letters, digits, dots, spaces and exclamations
$UploadNameChars = "-\\w. \\x80-\\xff"; # allow Unicode

Isn't \\x80-\\xff  just extended ASCII?




I'm trying to do this with no effect

  $UploadNameChars = "-\\w. !=\\+#\\x{014C}\\x{014D}"; # allow exclamations, equals, plus, and hash Ōō

any advice appreciated

thanks

Simon


Petko Yotov

Jul 29, 2019, 5:47:09 AM
to pmwiki...@pmichaud.com
On 29/07/2019 10:38, Simon wrote:
> https://pmwiki.org/wiki/PmWiki/UploadVariables#UploadNameChars
> From the page
> The set of characters allowed in upload names. Defaults to "-\w. ", which
> means alphanumerics, hyphens, underscores, dots, and spaces can be used in
> upload names, and everything else will be stripped.
> $UploadNameChars = "-\\w. !"; # allow dash, letters, digits, dots, spaces and exclamations
> $UploadNameChars = "-\\w. \\x80-\\xff"; # allow Unicode
> Isn't \\x80-\\xff just extended ASCII?

If the charset/encoding of your wiki is ISO-8859-1/Latin-1/Windows-1252
or another 8-bit encoding, \x80-\xff are the characters in the code page
between 128 and 255, see
https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout

If you have enabled UTF-8 (variable-length, 8-32 bits/character) for
your wiki, it is a different encoding: the characters \x20-\x7f are the
same as in most 8-bit code pages (ASCII), and all other characters take
2, 3 or 4 bytes each, with every one of those bytes coming from the
\x80-\xff range.
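
As a quick check of the point above, the following minimal sketch prints the UTF-8 bytes of the two macron letters discussed in this thread. It assumes PHP 7.0 or later (for the "\u{...}" string escape); the array name $chars is only illustrative.

```php
<?php
// "\u{...}" (PHP 7.0+) always yields UTF-8 bytes, no matter how this
// file itself is saved. $chars is an illustrative name.
$chars = array(
    'U+014C' => "\u{014C}",   // capital O with macron
    'U+014D' => "\u{014D}",   // small o with macron
);

foreach ($chars as $label => $utf8) {
    // bin2hex() shows the raw bytes that make up the string
    echo $label, ' => ', bin2hex($utf8), "\n";
}
```

This prints c58c and c58d: two bytes per character, each byte inside the \x80-\xff range.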


> I'm trying to do this with no effect
>
> $UploadNameChars = "-\\w. !=\\+#\\x{014C}\\x{014D}"; # allow exclamations, equals, plus, and hash Ōō

It is strongly recommended NOT to enable exclamation marks, equals
signs, plus signs and hashes, because these characters have special
meanings in URL addresses and in PmWiki.

The exclamation mark is a stop-mark for a link, a hash signifies an
internal anchor or an ajax subpage, a plus is the standard encoding of
a space, and an equals sign starts the value of a URL parameter.

If you do enable these, many other things may break, and we currently
don't have the capacity to support such configurations.

\x{014C} will not work here: in the UTF-8 encoding this character is
the 2 bytes \xc5 and \x8c, and in your character list you would write
these as \\xc5\\x8c. The small letter would be \\xc5\\x8d, so the list
would look like \\xc5\\x8c\\x8d (no need to repeat \\xc5). If the wiki
is not in UTF-8, it depends on whether the current code page contains
this character; for example, the iso8859-4 code page contains the
characters Ōō as the single bytes \xd2 and \xf2:

https://en.wikipedia.org/wiki/ISO/IEC_8859-4

so if your wiki is in iso8859-4 then you could add \\xd2\\xf2 to
$UploadNameChars. Enabling this encoding could be as easy as adding
this to config.php:

$Charset = "ISO-8859-4";

but your local configuration files, if they contain the international
characters, need to be saved in the same encoding, see:

https://www.pmwiki.org/wiki/PmWiki/LocalCustomizations#encoding
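
To illustrate how such byte lists behave, here is a minimal sketch that strips every byte outside an allowed character class, once for a UTF-8 name and once for an ISO-8859-4 name. The helper strip_disallowed() and the variable names are hypothetical and are not PmWiki's own code; the sketch only assumes that the class is used in a PCRE pattern without the /u modifier, so it is matched byte by byte.

```php
<?php
// Hypothetical helper, not PmWiki's own code: remove every byte that is
// not in the allowed character class. There is no /u modifier, so PCRE
// works on single bytes.
function strip_disallowed($name, $allowedChars) {
    return preg_replace("/[^$allowedChars]/", '', $name);
}

// UTF-8 wiki: Ō is the bytes C5 8C and ō is the bytes C5 8D.
$utf8Chars = "-\\w. \\xc5\\x8c\\x8d";
$utf8Name  = "T\xc5\x8dtara!.txt";   // "Tōtara!.txt" as UTF-8 bytes
var_dump(strip_disallowed($utf8Name, $utf8Chars) === "T\xc5\x8dtara.txt");

// ISO-8859-4 wiki: Ō is the single byte D2 and ō is the single byte F2.
$latin4Chars = "-\\w. \\xd2\\xf2";
$latin4Name  = "T\xf2tara!.txt";     // the same name as ISO-8859-4 bytes
var_dump(strip_disallowed($latin4Name, $latin4Chars) === "T\xf2tara.txt");
```

In both cases the exclamation mark is removed and the macron letter survives, so both comparisons print bool(true). The names are written with "\x.." escapes so that the result does not depend on the encoding in which the script file is saved.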

If the international characters are not in the code page of the wiki,
they cannot be enabled, because browsers cannot post such files
correctly. The 2 characters are not in the Latin-1/iso8859-1 code page.

If this is a vital requirement for file names, you may try enabling
UTF-8 for your wiki, then browsers will be able to both post files and
pages (wikitext, pagenames, categories) with the international
characters without transforming these to HTML entities.

However, moving a wiki to UTF-8 is not easy if you already have uploaded
files with international characters, or pagenames with these, and you
may have some difficulties if the file system of the server is not
Unicode.

Or, you could try enabling some 8-bit encoding which does contain these
characters, but again, if it is not the same as the encoding on your
file system, using a file/ftp browser may not show the correct
characters, and a file uploaded via FTP with such characters in the name
may not be visible on the wiki.

If it is not a fatally important requirement to have these characters in
the filenames on the server, but you are annoyed when people upload
files which appear with broken names, I can suggest a custom
$MakeUploadNamePatterns array that will replace Ōō with Oo in the file
name (not the text inside the file) when a file is uploaded. Enabling
this will probably break existing links in the wiki to already uploaded
files with these characters, and these may need to be renamed.
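
The replacement idea can be sketched as an ordered list of pattern => replacement pairs, in the spirit of $MakeUploadNamePatterns. This is only an illustration under stated assumptions: the wiki is in UTF-8, the names $patterns and normalize_upload_name() are hypothetical, and how a custom array combines with PmWiki's default patterns should be checked in scripts/upload.php of the installed version.

```php
<?php
// Sketch only: $patterns and normalize_upload_name() are hypothetical
// names, and a UTF-8 wiki is assumed. Patterns are applied in order.
$patterns = array(
    "/\xc5\x8c/"  => 'O',   // Ō (UTF-8 bytes C5 8C) becomes O
    "/\xc5\x8d/"  => 'o',   // ō (UTF-8 bytes C5 8D) becomes o
    '/[^-\\w. ]/' => '',    // then strip anything outside the default class
);

function normalize_upload_name($name, $patterns) {
    foreach ($patterns as $pattern => $replacement) {
        $name = preg_replace($pattern, $replacement, $name);
    }
    return $name;
}

// "Tōtara!.txt" given as UTF-8 bytes
var_dump(normalize_upload_name("T\xc5\x8dtara!.txt", $patterns) === 'Totara.txt');
```

The order matters: the macron letters have to be replaced before the strip pattern runs, otherwise their bytes would simply be removed and the name would come out as "Ttara.txt".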

There is no easy solution unfortunately.

Petko

_______________________________________________
pmwiki-devel mailing list
pmwiki...@pmichaud.com
http://www.pmichaud.com/mailman/listinfo/pmwiki-devel

Simon

Jul 29, 2019, 6:43:03 AM
to Petko Yotov, PmWiki Devel Mailing List
Thank you very much indeed.
I have an old PmWiki install (no doubt using an 8-bit character set), so I have not dared to change to UTF-8 (I had looked at https://pmwiki.org/wiki/PmWiki/UTF-8).

I've set $Charset = "ISO-8859-4"; and added the range \\xd2\\xf2 to no effect, so I'll do more work.

It looks like I have some research and work to do.
The files live on my own (Windows) server, so I know the server supports these characters in filenames.
The reason is that characters with macrons are part of Māori - an official NZ language, so I want to support it.

cheers and thanks again

Simon
