I am looking at TiddlyWiki as a possible replacement for my current
homepage. However, I have one small problem:
I tried TiddlyWiki and found it has no problems with non-ASCII Tiddler
names, e.g. "FrantišekFuka" (the character after "i" is non-ASCII).
However, "FrantišekFuka" is not automatically recognized as a WikiWord
("FrantisekFuka" is) so I have to use double square brackets all the
time.
Can I somehow change the configuration so that WikiWords are
automatically defined as "String of non-whitespace Unicode characters
containing at least one uppercase letter in the middle of the string"
instead of the current definition ("String of non-whitespace ASCII
characters...")?
If this is not configurable, can you at least point me to the part of
the JavaScript that handles this? It seems to me it shouldn't be
hard to change the relevant code myself.
Thanks.
var upperLetter = "[A-Z\u00c0-\u00de\u0150\u0170]";
var lowerLetter = "[a-z\u00df-\u00ff_0-9\\-\u0151\u0171]";
var anyLetter =
"[A-Za-z\u00c0-\u00de\u00df-\u00ff_0-9\\-\u0150\u0170\u0151\u0171]";
and then a couple of lines further down there's:
var wikiNamePattern = "(~?)((?:" + upperLetter + "+" + lowerLetter +
"+" + upperLetter + anyLetter + "*)|(?:" + upperLetter + "{2,}" +
lowerLetter + "+))";
So, the rules at the moment are that a WikiWord can take one of two forms:
- one or more upper case letters, followed by one or more lower case
letters, followed by a single upper case letter, followed by any
combination of upper and lower case letters
- two or more upper case letters followed by one or more lower case letters
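As a quick check of how the pattern behaves, here is a minimal sketch that wraps the quoted definitions in a small test function. The helper name isWikiWord and the "^"/"$" anchors are assumptions added for illustration; they are not part of the TiddlyWiki source quoted above.

```javascript
// Character classes and pattern copied from the snippet quoted above.
var upperLetter = "[A-Z\u00c0-\u00de\u0150\u0170]";
var lowerLetter = "[a-z\u00df-\u00ff_0-9\\-\u0151\u0171]";
var anyLetter =
  "[A-Za-z\u00c0-\u00de\u00df-\u00ff_0-9\\-\u0150\u0170\u0151\u0171]";
var wikiNamePattern = "(~?)((?:" + upperLetter + "+" + lowerLetter +
  "+" + upperLetter + anyLetter + "*)|(?:" + upperLetter + "{2,}" +
  lowerLetter + "+))";

// Hypothetical helper: true only if the whole string is a single WikiWord.
function isWikiWord(text) {
  return new RegExp("^" + wikiNamePattern + "$").test(text);
}

console.log(isWikiWord("FrantisekFuka"));  // true
console.log(isWikiWord("FrantišekFuka"));  // false: U+0161 is in no class
console.log(isWikiWord("ABCdef"));         // true (second alternative)
```

The second call fails because "š" (U+0161) lies outside the Latin-1 ranges listed in lowerLetter, which is why the double square brackets are currently needed.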
And what constitutes upper and lower case is determined by those
definitions of upperLetter and lowerLetter. I'd be delighted to
extend them to be more complete - I just need to know what extra
characters are required.
From earlier discussions, though, I have gathered that there are some
tricky letters that are considered to be of a different case in
different languages. If that's true, it may never be possible to
arrive at a definitive, language independent definition.
Cheers
Jeremy
--
Jeremy Ruston
mailto:jer...@osmosoft.com
http://www.tiddlywiki.com
I looked at the Unicode table here:
http://free.prohosting.com/~vitivas/js/UniCode/CharTab.html
It's rather messy. The uppercase characters C0-DD correspond to the
lowercase characters E0-FD (e.g. C5 is the uppercase version of the
lowercase character E5). Then, from 100 to 233, even code points are
the uppercase equivalents of the odd lowercase code points that follow
them (e.g. 10E is uppercase, 10F is the same character in lowercase).
Using these rules should take care of all languages I know.
Note that I am not saying that every code point between C0 and 233 is
an existing European letter. If you implement the rule in the paragraph
above, some special non-letter characters (e.g. 1c0 to 1c3) would be
incorrectly recognized as uppercase/lowercase letters. But I think
that's a small price to pay if there is no existing
"isUppercase?()" function for Unicode characters.
Or one Tiddler called "ExtraLocalCharacters" which would list all the
characters in 2 lines like this:
ešcržýáíé
EŠCRŽÝÁÍÉ
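A minimal sketch of how such a two-line tiddler could feed the existing definitions, assuming the first line holds lowercase and the second uppercase characters. The functions escapeForClass, extendClass and buildWikiNamePattern are hypothetical names, and reading the tiddler text from the store is left out.

```javascript
// Text of the hypothetical "ExtraLocalCharacters" tiddler.
var extraLocalCharacters = "ešcržýáíé\nEŠCRŽÝÁÍÉ";

// Escape characters that are special inside a regex character class.
function escapeForClass(chars) {
  return chars.replace(/[\\\]\[\^\-]/g, "\\$&");
}

// Append extra characters just before the closing "]" of a class string.
function extendClass(cls, extra) {
  return cls.slice(0, -1) + escapeForClass(extra) + "]";
}

// Same pattern structure as the quoted wikiNamePattern.
function buildWikiNamePattern(upper, lower, any) {
  return "(~?)((?:" + upper + "+" + lower + "+" + upper + any +
    "*)|(?:" + upper + "{2,}" + lower + "+))";
}

var lines = extraLocalCharacters.split("\n");
var upperLetter = extendClass("[A-Z\u00c0-\u00de\u0150\u0170]", lines[1]);
var lowerLetter = extendClass("[a-z\u00df-\u00ff_0-9\\-\u0151\u0171]", lines[0]);
var anyLetter = extendClass(
  "[A-Za-z\u00c0-\u00de\u00df-\u00ff_0-9\\-\u0150\u0170\u0151\u0171]",
  lines[1] + lines[0]);

var extendedPattern = new RegExp(
  "^" + buildWikiNamePattern(upperLetter, lowerLetter, anyLetter) + "$");

console.log(extendedPattern.test("FrantišekFuka")); // true
```

Characters already covered by the built-in ranges (such as "e" or "ý") simply appear twice in the class, which is harmless.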
You can find my related thread / post here:
http://groups.google.co.hu/group/TiddlyWiki/tree/browse_frm/thread/4ab10453ff0ef077/b5faeb8102dac85a?rnum=21&hl=hu&q=hungarian&_done=%2Fgroup%2FTiddlyWiki%2Fbrowse_frm%2Fthread%2F4ab10453ff0ef077%3Fpage%3Dend%26q%3Dhungarian%26hl%3Dhu%26&page=end#doc_b5faeb8102dac85a
Jeremy!
As far as I can see, you can either include all the special characters from
the Latin-1 Supplement code chart at once
( http://www.unicode.org/charts/PDF/U0080.pdf ), add them step by step as
your users start to request them, or provide a modular way of specifying
special upper/lowercase pairs with config tiddlers :).
Feel free to drop me a mail if you need further help.
Cheers:
József