Slices with names in non-latin letters?


Yakov

Feb 5, 2012, 12:56:06 PM
to TiddlyWiki
Some time ago I found that non-Latin letters prevent two-column table
rows from being recognized as slices. For instance, if I write

|Имя|Иван|

(which is |Name|Ivan|), I cannot use it as a slice (for instance, in
transclusion macros). Recently I have needed "object-tiddlers" with
different slices and sections, processed with GridPlugin [1], for
instance to generate summaries of people or of other "objects". I need
this more and more, so here's a question: is it possible that the
slices part of the core, or GridPlugin, will be refactored so that
non-Latin names are also "available"? (Or is it better to forget it and
use names with Latin letters, which of course is not that good,
especially for collaboration?)

In the core this seems somewhat problematic, since slices are stored
as "properties" (a "map") [2] and much depends on that. On the other
hand, if such an engine (handling non-Latin names) is put into
GridPlugin, it seems it would be worth putting into the core as well
(and refactoring the core would probably be easier).

Perhaps this is more of a "TiddlyWikiDev" issue... what do you think anyway?

[1] http://www.TiddlyTools.com/#GridPlugin
[2] https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/TiddlyWiki.js#L14

Eric Shulman

Feb 5, 2012, 1:56:55 PM
to TiddlyWiki
On Feb 5, 9:56 am, Yakov <yakov.litvin.publi...@gmail.com> wrote:
> Some time ago I explored that non-latin letters make two-column rows
> of tables to be not slices. For instance, if I write
> |Имя|Иван|
> (which is |Name|Ivan|) I can not use it as a slice (for instance, in
> transclusion macros).
...
> is it possible that "slices part of the core", or GridPlugin will be
> refactored so that non-latin name will also be "available"?

The TWCore uses a regexp text pattern to parse slice definitions
embedded in tiddler content. Here's the pattern used by the core:
----------------
(?:^([\'\/]{0,2})~?([\.\w]+)\:\1[\t\x20]*([^\n]+)[\t\x20]*$)|(?:^\|([\'\/]{0,2})~?([\.\w]+)\:?\4\|[\t\x20]*([^\n]+)[\t\x20]*\|$)
----------------

This pattern actually matches *two* alternatives for the
slice-definition syntax, using either "name:value" or "|name|value|". Of
course, regexp can be incredibly painful to read and understand... so,
here's a breakdown of the parts of this pattern:
----------------
(?:              start "name:value" syntax
^                start of line
([\'\/]{0,2})    optional start bold or italic formatting
~?               optional non-wikiword prefix
([\.\w]+)        slice name
\:               :
\1               optional end bold or italic (matching above)
[\t\x20]*        optional leading whitespace
([^\n]+)         slice value
[\t\x20]*        optional trailing whitespace
$                end of line
)                end name:value syntax
|
(?:              start |name|value| syntax
^                start of line
\|               table cell boundary
([\'\/]{0,2})    optional start bold or italic formatting
~?               optional non-wikiword prefix
([\.\w]+)        slice name
\:?              optional :
\4               optional end bold or italic formatting (matching above)
\|               table cell boundary
[\t\x20]*        optional leading whitespace
([^\n]+)         slice value
[\t\x20]*        optional trailing whitespace
\|               table cell boundary
$                end of line
)                end |name|value| syntax
----------------
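
As a rough illustration of the breakdown above, here is a minimal sketch that runs the quoted pattern over some text and collects the matches into a plain object. The helper name `parseSlices` and the sample text are made up for this example; it is not the core's actual code, only the same pattern with the "g" and "m" flags added so that it scans line by line.

```javascript
// Slice pattern as quoted above, with "g" and "m" flags so it can
// scan a multi-line text one line at a time.
const slicesRE = /(?:^([\'\/]{0,2})~?([\.\w]+)\:\1[\t\x20]*([^\n]+)[\t\x20]*$)|(?:^\|([\'\/]{0,2})~?([\.\w]+)\:?\4\|[\t\x20]*([^\n]+)[\t\x20]*\|$)/gm;

// Hypothetical helper: collect every slice found in `text` into an object.
function parseSlices(text) {
  const slices = {};
  slicesRE.lastIndex = 0; // reset, because the regexp is global
  let m;
  while ((m = slicesRE.exec(text)) !== null) {
    if (m[2]) {
      slices[m[2]] = m[3]; // "name:value" syntax (groups 2 and 3)
    } else {
      slices[m[5]] = m[6]; // "|name|value|" syntax (groups 5 and 6)
    }
  }
  return slices;
}

const sample = "Author: Eric\n|Name|Ivan|\n|Имя|Иван|";
console.log(parseSlices(sample)); // only Author and Name are found
```

The Cyrillic row is silently skipped, which is the behaviour reported at the top of the thread.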

As you can see above, the slice *name* pattern is:
[\.\w]+
which matches one or more occurrences of "." (a literal dot; inside a
character class the dot loses its "any character" meaning) or "\w" (any
'word' character = upper/lower-case ASCII letters, digits, or
underscore). This slice name pattern works for names written in
unaccented Latin letters. However, as you noted, it doesn't work when
applied to non-Latin character sets.

The problem arises because JavaScript defines "\w" as exactly
[A-Za-z0-9_], so it never matches Cyrillic (or accented Latin)
letters, whatever encoding the document uses. This is how the language
specifies regexp processing, rather than a defect of a particular
browser's I18N support.

Still, you might be able to play around with the regexp pattern to use
Unicode escapes (\uNNNN; the shorter \xNN form only reaches code points
up to FF) to match the symbols of the non-Latin character set... but I
suspect it will be VERY ugly :(
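
A quick check of the point above, as a sketch: it tests "\w" against a Latin and a Cyrillic name, and then tries a character class that lists the Cyrillic Unicode block (U+0400 to U+04FF) explicitly. The choice of that block is an assumption made for this example; other scripts would need their own ranges.

```javascript
// \w covers only [A-Za-z0-9_] in JavaScript regular expressions.
const latinIsWord = /^\w+$/.test("Name");
const cyrillicIsWord = /^\w+$/.test("Имя");

// Each of these Cyrillic letters is a single UTF-16 code unit in a JS string.
const nameLength = "Имя".length;

// Assumed workaround: add the Cyrillic block to the class with \u escapes.
const cyrillicName = /^[.\w\u0400-\u04FF]+$/;
const cyrillicMatches = cyrillicName.test("Имя");

console.log(latinIsWord, cyrillicIsWord, nameLength, cyrillicMatches);
// true false 3 true
```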

Sorry I can't offer a more encouraging response at this time.

-e
Eric Shulman
TiddlyTools / ELS Design Studios

----
WAS THIS ANSWER HELPFUL? IF SO, PLEASE MAKE A DONATION
http://www.TiddlyTools.com/#Donations
note: donations are directly used to pay for food, rent,
gas, net connection, etc., so please give generously and often!

Professional TiddlyWiki Consulting Services...
Analysis, Design, and Custom Solutions:
http://www.TiddlyTools.com/#Contact

Yakov

Feb 6, 2012, 11:27:13 AM
to TiddlyWiki
Is it only the RegExp that causes the problem? If so, I think I can
make a small patch which would substitute the current RegExp with one
containing

[\.\wа-яё]

instead of

[\.\w]

(the а-яё part is the Russian alphabet, lowercase only). The other
question is: is it safe to add \- and \s to this part (as they can be
parts of "data names", like in "long-term plans")? In fact, it seems
that if someone remembers what the limitations are here, [\.\w] can be
replaced with [^something], where "something" is the set of symbols
that shouldn't appear in a slice name. In that case, this could go into
the core (if the RegExp is the *only* problem here).
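
To make the two ideas above concrete, here is a small sketch comparing the proposed class with two variants. The uppercase range and the set of excluded characters ("|", ":" and newline) are assumptions made for this example, not something taken from the core.

```javascript
// Character class from the post: lowercase Russian letters only.
const lowerOnly = /^[\.\wа-яё]+$/;

// Variant with the uppercase letters added (an assumption for this sketch).
const bothCases = /^[\.\wа-яёА-ЯЁ]+$/;

// Exclusion-based variant: anything except "|", ":" and newline.
// Which characters really must be excluded is a guess.
const exclusion = /^[^|:\n]+$/;

console.log(lowerOnly.test("имя"));             // true
console.log(lowerOnly.test("Имя"));             // false, "И" is uppercase
console.log(bothCases.test("Имя"));             // true
console.log(exclusion.test("long-term plans")); // true
```

Note that the lowercase-only class would still reject a capitalized name such as "Имя", so the uppercase letters (or the "i" flag) seem necessary.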

PMario

Feb 6, 2012, 5:23:05 PM
to TiddlyWiki
On Feb 6, 5:27 pm, Yakov <yakov.litvin.publi...@gmail.com> wrote:
> [\.\wа-яё]

If you have a look at the slices handling, you'll see that they end up
as something similar to the following.

var x = {};
x["Имя"] = "Иван";
console.log(x);

So as long as the browser's JavaScript can use Russian strings as object
property names, it should be possible.

var x = {};
x['абвгдеёжзийклмнопрстуфхцчшщъыьэюяабвгдеёжзийклмнопрстуфхцчшщъыьэюя'] = "абвгдеёжзийклмнопрстуфхцчшщъыьэюяабвгдеёжзийклмнопрстуфхцчшщъыьэюя";
console.log(x);

Both tests seem to work. So the browser can digest it. I'm not so sure
about TW :)

> (the а-яё part is russian alphabet). The other question is -- is it
> safe to add \- and \s to this part (as they can be parts of "data
> names", like in "long-term plans")?

If you add "spaces" and "minus" to object elements, I bet you'll have
big trouble.
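
For what it's worth, a plain JavaScript object accepts any string as a property name when bracket notation is used, as the sketch below shows. Whether the rest of TiddlyWiki copes with such names is a separate question that this check does not answer; the keys and values here are made up.

```javascript
const slices = {};
slices["long-term plans"] = "value A"; // hyphen and space in the key
slices["Имя сегмента"] = "value B";    // Cyrillic and space in the key

console.log(slices["long-term plans"]); // value A
console.log(slices["Имя сегмента"]);    // value B

// Dot notation cannot express these names (slices.long-term is parsed
// as a subtraction), so bracket access is the only way to reach them.
```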

> ... In fact, it seems that if someone
> remembers what are the limitations here, [\.\w] can be substituted
> with [^something] where "something" is those symbols that shouldn't be
> in the slicename. In this case, this can go to the core.. (if the
> RegExp is the *only* problem here).

If this goes into the core, I'd expect someone has to do ultra-heavy
testing first ;)

-m

Yakov

Feb 10, 2012, 4:31:50 PM
to TiddlyWiki
I'll definitely do some tests when I get enough time. The main thought
in my head about this is: tiddlers have titles with almost any symbols
(as in [1], a title is just a line!), and they are stored as properties
of the tiddlers object [2]! (I'm not sure what the "hashmap" term
means here.) So at first glance it seems possible to have slices with
any symbols. Let me know if I'm missing something.

[1] https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/TiddlyWiki.js#L19
[2] https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/TiddlyWiki.js#L7

PMario

Feb 11, 2012, 9:26:16 AM
to TiddlyWiki
On Feb 10, 10:31 pm, Yakov <yakov.litvin.publi...@gmail.com> wrote:
> ... (not sure what does the "hashmap" term
> means here) So at first glance it seems that it's possible to have
> slices with any symbols.. Let me know if I miss something.
In this case it's just a "lookup table" [2] to have fast access to a
tiddler, based on its title.
see: https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/TiddlyWiki.js#L34

[2] http://en.wikipedia.org/wiki/Hashmap

-m

Yakov

Feb 12, 2012, 7:20:23 AM
to TiddlyWiki
On Feb 11, 17:26, PMario <pmari...@gmail.com> wrote:
> On Feb 10, 10:31 pm, Yakov <yakov.litvin.publi...@gmail.com> wrote:
> > ... (not sure what does the "hashmap" term
> > means here) So at first glance it seems that it's possible to have
> > slices with any symbols.. Let me know if I miss something.
>
> In this case it's just a "lookup table" [2] to have fast access to a
> tiddler, based on it's title.
> see:https://github.com/TiddlyWiki/tiddlywiki/blob/master/js/TiddlyWiki.js...

So, what's different with slices? I looked through TiddlyWiki.js and it
seems that slices are used the same way, although they may be used
differently in other .js files.

PMario

Feb 12, 2012, 8:31:22 AM
to TiddlyWiki
ahhh, :)
imo nothing. But there may be some chars that are not allowed in
an object property name. To handle this, an escape mechanism would have
to be found. I think you should extend your formatter with a plugin and
run several tests to find out if it works ;)
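
One case worth including in such tests, sketched below: no character is forbidden in a bracket-accessed property name, but a slice name can collide with a property that every plain object inherits, such as "toString". The helper `getOwnSlice` is hypothetical and only illustrates one possible guard; it is not part of TiddlyWiki.

```javascript
// A plain object inherits properties such as "toString" from Object.prototype.
const plainMap = {};
const inheritedHit = plainMap["toString"] !== undefined;

// Hypothetical helper: report a slice only if it was stored on the map itself.
function getOwnSlice(map, name) {
  return Object.prototype.hasOwnProperty.call(map, name) ? map[name] : undefined;
}

plainMap["Имя"] = "Иван";
console.log(inheritedHit);                      // true
console.log(getOwnSlice(plainMap, "toString")); // undefined
console.log(getOwnSlice(plainMap, "Имя"));      // Иван
```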

-m

Yakov

Feb 24, 2012, 1:26:56 PM
to TiddlyWiki
Ok, I've done simple tests. Adding

абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ

(with a space at the end) to each of the ([\.\w]+) parts got this
working:

==== Tiddler: [[Сегменты с русскими именами: тесты]] ====
|Slicename|slice content|
|Slice name|slice content 2|
|Имясегмента|содержимое сегмента 3|
|Имя сегмента|содержимое сегмента 4|
{{{<<tiddler [[Сегменты с русскими именами: тесты::Slicename]]>>}}}
<<tiddler [[Сегменты с русскими именами: тесты::Slicename]]>>
{{{<<tiddler [[Сегменты с русскими именами: тесты::Slice name]]>>}}}
<<tiddler [[Сегменты с русскими именами: тесты::Slice name]]>>
{{{<<tiddler [[Сегменты с русскими именами: тесты::Имясегмента]]>>}}}
<<tiddler [[Сегменты с русскими именами: тесты::Имясегмента]]>>
{{{<<tiddler [[Сегменты с русскими именами: тесты::Имя сегмента]]>>}}}
<<tiddler [[Сегменты с русскими именами: тесты::Имя сегмента]]>>

(each of the four tiddler macros shows the content).

But the thing is, I got this working only when I changed the core
itself. First, I wrote a plugin:

TiddlyWiki.prototype.slicesRE = /(?:^([\'\/]{0,2})~?([\.\wабвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+)\:\1[\t\x20]*([^\n]*)[\t\x20]*$)|(?:^\|([\'\/]{0,2})~?([\.\wабвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ]+)\:?\4\|[\t\x20]*([^\|\n]*)[\t\x20]*\|$)/gm;

which didn't work. I guess it's because slicesRE is redefined after the
slices hashmap has already been built. Is anybody aware of a fast way
to rebuild the slices? Of course, I could copy the store, then purge
the main one, then copy the tiddlers back to the main store, but that
is too bulky a procedure to run at each startup.
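
The guess above can be illustrated with a self-contained sketch. `MiniStore`, `sliceRE`, `sliceCache` and `getSlice` are hypothetical names invented for this example and are not TiddlyWiki's real API; the sketch only shows how a lazily filled cache keeps stale results after the pattern is swapped, and that dropping the cache is enough to pick up the new pattern. Whether the core behaves the same way is not confirmed here.

```javascript
// Hypothetical stand-in for a tiddler store; NOT TiddlyWiki's real API.
function MiniStore(texts) {
  this.texts = texts;   // title -> tiddler text
  this.sliceCache = {}; // title -> parsed slices, filled lazily
}

// Simplified table-row pattern that accepts \w names only.
MiniStore.prototype.sliceRE = /^\|([\w]+)\|([^|\n]*)\|$/gm;

MiniStore.prototype.getSlice = function (title, name) {
  if (!this.sliceCache[title]) {
    const found = {};
    this.sliceRE.lastIndex = 0;
    let m;
    while ((m = this.sliceRE.exec(this.texts[title] || "")) !== null) {
      found[m[1]] = m[2];
    }
    this.sliceCache[title] = found; // an empty result is cached too
  }
  return this.sliceCache[title][name];
};

const store = new MiniStore({ Test: "|Имя|Иван|" });
const before = store.getSlice("Test", "Имя"); // undefined, and the miss is cached

// A plugin swaps in a wider pattern after the cache was filled...
MiniStore.prototype.sliceRE = /^\|([\w\u0400-\u04FF ]+)\|([^|\n]*)\|$/gm;
const stale = store.getSlice("Test", "Имя"); // still undefined

// ...so the cache has to be dropped before the new pattern takes effect.
store.sliceCache = {};
const fresh = store.getSlice("Test", "Имя");
console.log(before, stale, fresh); // undefined undefined Иван
```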

On the other hand, I'm going to analyse the syntax, do some tests, and
then discuss this as a core update, so perhaps the first question is
not that important.
