KanjiVG internationalization (simplified chinese)


huayin...@yahoo.fr

Jan 6, 2011, 6:39:25 AM
to KanjiVG
Hi everybody,

As a Chinese computing guy (see http://aadant.com/applet), I am
interested in KanjiVG internationalization.

Following the discussion on "what about the wiki commons stroke order
project", I am starting a fresh discussion on the subject (Chinese side).

I think everyone agrees on extending KanjiVG to support other
locales.

Here are some experimental results (latest KanjiVG data, 03 January
2010):

1. Of the 1000 most frequent Chinese characters, 303 do not exist in
KanjiVG (by Unicode code point).

2. Of the 5000 most frequent Chinese characters, 2052 do not exist
in KanjiVG (by Unicode code point).

That is mainly the result of the Chinese simplification process.

So if you try to test the "Chineseness" of KanjiVG, you will get at
most about 70% on the top 1000 and about 60% on the top 5000.

That is quite a lot of work to localize!
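The percentages above follow directly from the missing-character counts. A minimal sketch of that arithmetic, assuming the frequency list and the set of characters present in KanjiVG are available as plain Python collections (the function names are hypothetical, not part of any KanjiVG tool):

```python
def coverage(frequent_chars, available_chars):
    """Return (missing_count, coverage_ratio) for a frequency list."""
    available = set(available_chars)
    missing = [c for c in frequent_chars if c not in available]
    return len(missing), 1 - len(missing) / len(frequent_chars)


def coverage_from_counts(total, missing):
    """Coverage ratio when only the counts are known."""
    return (total - missing) / total


# Counts quoted in the message above.
top_1000 = coverage_from_counts(1000, 303)
top_5000 = coverage_from_counts(5000, 2052)
print(f"top 1000: {top_1000:.1%}, top 5000: {top_5000:.1%}")
# prints "top 1000: 69.7%, top 5000: 59.0%"
```

These are the "70%" and "60%" limits, rounded.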

I checked it further by feeding the KanjiVG data (including the Kaisho
variants) to an open source online Chinese handwriting classifier
(HanziDict).

I got a 56% match on the whole result set. As we can neglect the false
positives for this classifier, the difference from the 60% limit above (4%)
is likely to be the error rate of the classifier
AND the stroke order differences.

As mentioned by Dr Apel :

>Alex mentioned the data on stroke order and glyph variants. This should pretty much cover traditional Chinese >Kaisho/block character writing style. So, in fact, the project already should be a multi country project. Most of >these Kaisho variants were generated half automatically -- an approach one should also apply for most >characters of simplified Chinese. Getting working data shouldn't be difficult, but it would need an esthetic >revision later.

I checked again the traditional chinese set and got 80 % unicode match
on top 5000.

>There is only 1300 entries on Commons SOP, while Kanjivg have +6000 (if I'm
>right). Since most stroke order are identical between all countries*,
>variants to input to get a multi-country coverage is likely just 5~10%*. Not
>a big deal.

It should not be too difficult to :

- change the stroke order for some known variants in kanjivg (such as
生). The gain will be maximum 4%.
It can be done manually on the candidate kanjis.

- try to automate the simplification process (for example 訁to 讠) and
test back against the classifier (max 20% gain = traditional chinese
rate - simplified chinese rate = 80 - 60).

Maybe you have ideas on the subject ? Other data sources ?

Maybe my estimations are not correct ?

Do not hesitate to ask if you want the source code and all the data.

By the way, I wish all the group a successful new year.

Ben Bullock

Jan 7, 2011, 11:44:39 PM
to KanjiVG
On Jan 6, 8:39 pm, "huayingdeb...@yahoo.fr" <huayingdeb...@yahoo.fr>
wrote:

> There is quite a huge work to localize !

There is a graphical version of the common simplified Chinese
characters in the Tomoe project files. See

http://sourceforge.net/projects/tomoe/

The exact file is

data/handwriting-zh_CN.xml

in the distribution.

This is a dormant project for Japanese / Chinese handwriting
recognition. I guess one possible first step towards making the missing
simplified characters would be to turn the Tomoe formatted characters
(which are straight line segments) into the SVG format used by
KanjiVG. That would fill in the gaps, and perhaps if someone has
enough skills and time to do so they could also turn the sharp corners
into rounded corners as well. One pitfall is that the Tomoe data is
licensed under the GNU licence, which doesn't match the current
licence for the KanjiVG data.
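A minimal sketch of the conversion suggested here, turning one stroke made of straight line segments into an SVG path string. It assumes the points have already been read from the Tomoe file, that the Tomoe coordinates live in a 1000x1000 square (an assumption, not checked against the data), and that the target is the 109x109 viewBox of KanjiVG SVG files. The function name is hypothetical.

```python
def polyline_to_svg_path(points, src_size=1000.0, dst_size=109.0):
    """Convert one stroke, given as (x, y) points, into an SVG path string.

    src_size is the assumed side of the source square (assumption about Tomoe);
    dst_size is the side of the target viewBox.
    """
    if len(points) < 2:
        raise ValueError("a stroke needs at least two points")
    scale = dst_size / src_size
    scaled = [(x * scale, y * scale) for x, y in points]
    first, rest = scaled[0], scaled[1:]
    # "M" moves to the first point, each "L" draws a straight segment.
    commands = [f"M{first[0]:.2f},{first[1]:.2f}"]
    commands += [f"L{x:.2f},{y:.2f}" for x, y in rest]
    return " ".join(commands)


print(polyline_to_svg_path([(0, 0), (1000, 500)]))
# prints "M0.00,0.00 L109.00,54.50"
```

The output keeps the sharp corners; rounding them would mean replacing the "L" segments with curve commands, which this sketch does not attempt.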

If you want a Perl parser for the Tomoe data I have one, so write to
me by email and I'll send it.

> By the way, I wish all the group a successful new year.

Wishing you and all members a happy new year too.

hugo lopes

Jan 8, 2011, 3:18:56 AM
to kan...@googlegroups.com
For the GNU license trouble, proposing this project and a license migration to the license owner of Tomoe should be acceptable. Wikipedia did so globally, migrating its content from GNU to CC-by-sa, which is shorter and more convenient.

Regards,


--
You received this message because you are subscribed to the "KanjiVG" group.
For options and unsubscribing, visit this group at
http://groups.google.com/group/kanjivg



--
羅禹國 - Hugo LOPEZ,
Tw. tel: 09-8343-9890
Institute of Technology & Innovation Management, Master 2
NCHU, Taizhong, Taiwan.
250 KuoKuang Rd., Taichung 402, Taiwan, R.O.C.
國立中興大學 台中市402國光路250號 台灣

huayin...@yahoo.fr

Jan 8, 2011, 8:04:26 AM
to KanjiVG
I think Ben is right to point out the Tomoe data: it seems very
comprehensive and already localized.
I also found these packages from Tomoe at tegaki: http://www.tegaki.org/

handwriting-zh_TW.xml (11853 characters)
handwriting_zh_CN.xml (6763 characters)

They seem to be quite identical although the diff is complicated by
the xml encoding (utf8="&#x4F7F;"), they have at least the same number
of characters.
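One way to take the XML encoding out of such a comparison is to decode the character references before comparing. A minimal sketch, assuming the character is stored in a `utf8="..."` attribute as in the example above (the surrounding tag name in the sample strings is made up):

```python
import html
import re

# Matches the utf8="..." attribute mentioned in the message above.
UTF8_ATTR = re.compile(r'utf8="([^"]*)"')


def characters_in(xml_text):
    """Return the set of characters named in utf8 attributes, entities decoded."""
    return {html.unescape(value) for value in UTF8_ATTR.findall(xml_text)}


encoded = '<character utf8="&#x4F7F;"/>'   # hypothetical sample line
literal = '<character utf8="使"/>'          # same character, written directly
print(characters_in(encoded) == characters_in(literal))  # prints "True"
```

Comparing the two resulting sets (intersection, difference) then shows whether the files really cover the same characters.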

I tried the same procedure as with HanziDict using zinnia (converting
KanjiVG to zinnia). The figures are pretty much identical (even slightly
lower).
I compared both sets of matches and found that Zinnia is sometimes
insensitive to stroke order, leading to a false positive match (rank one):

生, 里, 馬, 有, etc ...

For example: 6709 (有, Japanese order) matches 6709 (Taiwan order). This
character is correctly drawn in handwriting-zh_TW.xml.



Kanjivg -> Zinnia :

0 51 13
0 51 18
0 41 40
0 18 63
1 14 30
1 26 30
1 87 25
1 97 26
2 42 44
2 43 48
2 43 93
2 43 99
3 44 47
3 70 42
3 74 46
3 74 92
3 71 98
3 66 94
4 44 60
4 73 57
5 45 74
5 73 71

At first sight, the best classifier seems to be HanziDict for the
japanese to chinese stroke order sensitive match:

http://kiang.org/jordan/software/hanzilookup/

It could be possible to train it against the Tomoe data and see what
happens!
I think there would be no licence problem in using the data for
validation.

It would of course be nice to take this data as a starting point for the
missing simplified characters. There are also the Dragon char
files:

http://dragon-char.sourceforge.net

that are already in a format very close to KanjiVG: they are
integrated into Huaying if you want to see them (character stroke order tab).
In both cases, the current GPL licence applies
(maybe an alternative ...).

Even if the license stands, these datasets can be used for validation
without restriction.

In any case, the SVG path is only part of the work (label the stroke
numbers, label stroke types in XML is to be done anyway ...). Of
course, a manual check would still be needed for the new characters.
Automatic validation being only a safety net...

What do you think ?

Arnaud

On 8 jan, 09:18, hugo lopes <hugo....@gmail.com> wrote:
> For the GNU License trouble, propose this project and a License
> migration to the license owner of Tomoe should acceptable. Wikipedia
> globally did so,
> migrating its content from GNU to CC-by-sa which is shorter and more
> convenient.
>
> Regards,

Mathieu Blondel

Jan 8, 2011, 8:35:54 AM
to kan...@googlegroups.com
2011/1/8 huayin...@yahoo.fr <huayin...@yahoo.fr>:

> handwriting-zh_TW.xml (11853 characters)
> handwriting_zh_CN.xml (6763 characters)

handwriting_zh_CN.xml is the same as in Tomoe. handwriting_zh_TW.xml
has been constructed by Christoph Burgmer by aggregating components
from other characters (both Chinese and Japanese).

However, in both cases, these characters were designed to serve as
templates for handwriting recognition. If you try to render them,
you'll find out that they consist of straight lines and are pretty
ugly. Therefore, if your purpose is to use characters as a visual help
for learning, the Tomoe data is probably not the best, although it may
be useful to bootstrap a new project.

My 2 cents,
Mathieu

huayin...@yahoo.fr

Jan 8, 2011, 7:36:37 PM
to KanjiVG
KanjiVG gives not only a stroke path (order) but also structural
character information (decomposition, stroke types).

However, it is rather poor in terms of rendering.

GlyphWiki is a nice rendering tool, although I am a bit disappointed by
some characters such as 儿
http://en.glyphwiki.org/wiki/u513f : 2 strokes and an ugly triangular
stroke end. GlyphWiki cannot produce professional quality fonts. It
cannot model stroke paths. That's a huge limitation.

The most powerful rendering engine seems to be CDL (proprietary). It
works with basic strokes and bounding boxes, transformation, relative
positioning and stroke start / end types.

The result would look like a real font. That's a lot of work however !

Another approach is to find a public domain font and run an efficient
heuristic algorithm to map Kanjivg strokes to the font glyphs. After a
visual control by a human character by character
(or some automatic sanity check, for example submit the font strokes
skeleton to a recognizer) that mapping could be stored somewhere to
improve the rendering.

I would like to insist on the importance of having a separate recognizer
trained on a different training set than KanjiVG! KanjiVG would
benefit from every open source effort with no licence problems.

Arnaud

On 8 jan, 14:35, Mathieu Blondel <math...@mblondel.org> wrote:
> 2011/1/8 huayingdeb...@yahoo.fr <huayingdeb...@yahoo.fr>:

hugo lopes

Jan 9, 2011, 12:04:14 AM
to kan...@googlegroups.com
Sources:
For CDL (Chinese Characters Description Languages) systems, a starting page is there:
On the bottom are links toward the specification of the CDL, SCML, and  HanGlyph systems.
If you know other systems, please add them to the Wikipedia article. I think KanjiVG deserves its own section.

Regards,



hugo lopes

Jan 9, 2011, 12:09:20 AM
to kan...@googlegroups.com
Well, I started the KanjiVG section. Please, if you have a good understanding of KanjiVG, its technology, add 5 lines to this wikipedia article. That will help people to have a basic understanding of KanjiVG.
Sources welcome (type: <ref>http://mysource.org/my_source.html</ref>)

huayin...@yahoo.fr

Jan 10, 2011, 4:15:36 AM
to KanjiVG
>handwriting_zh_CN.xml is the same as in Tomoe. handwriting_zh_TW.xml
>has been constructed by Christoph Burgmer by aggregating components
>from other characters (both Chinese and Japanese).

I converted the data into the Hanzilookup format, re-ran the
recognition against KanjiVG and spotted some inconsistencies.
There are some traces of the Japanese origin for some characters such
as 生 (among others which are not in the correct stroke order). This
leads to false positives.

A good sanity check is the sequence vertical - horizontal -
horizontal (VHH ㇑a㇐ ㇐) that should not appear in chinese for the
same component. That's a means to spot mistakes in kanjivg (chinese
version). The fix is horizontal - horizontal - vertical (HHV).

Arnaud

huayin...@yahoo.fr

Jan 10, 2011, 4:48:37 AM
to KanjiVG
Sorry the fix is VHH -> HVH.

生 Japanese order is left descending, Horizontal, Vertical, Horizontal,
Horizontal (suffix VHH)
生 Chinese order is left descending, Horizontal, Horizontal, Vertical,
Horizontal (suffix HVH)
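The VHH -> HVH fix amounts to swapping the vertical stroke with the horizontal stroke that follows it. A minimal sketch, assuming each stroke has already been reduced to a one-letter class ("H", "V", and "P" for the left-descending stroke; these letters are an illustrative convention, not KanjiVG stroke types):

```python
def to_chinese_order(strokes, kind=lambda s: s):
    """Reorder a trailing V-H-H stroke sequence into H-V-H.

    strokes can be any stroke objects; kind maps a stroke to its class letter.
    Only the last three strokes are inspected; other input is returned unchanged.
    """
    strokes = list(strokes)
    if [kind(s) for s in strokes[-3:]] == ["V", "H", "H"]:
        # Move the vertical stroke after the first of the two horizontals.
        strokes[-3], strokes[-2] = strokes[-2], strokes[-3]
    return strokes


japanese = ["P", "H", "V", "H", "H"]   # 生 in the Japanese order
print(to_chinese_order(japanese))
# prints "['P', 'H', 'H', 'V', 'H']"
```

Because the strokes themselves are swapped rather than relabelled, the same function could be applied to objects carrying the actual path data, with `kind` extracting the class.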

On 10 jan, 10:15, "huayingdeb...@yahoo.fr" <huayingdeb...@yahoo.fr>
wrote:

huayin...@yahoo.fr

Jan 11, 2011, 3:08:13 PM
to KanjiVG


On 10 jan, 10:48, "huayingdeb...@yahoo.fr" <huayingdeb...@yahoo.fr>
wrote:
> Sorry the fix is VHH -> HVH.
>
> 生 japanese order is left descending Horizontal Vertical Horizontal
> Horizontal (suffix VHH)
> 生 chinese orders is left descending

Hi again,

I am sorry Yug, I am new to KanjiVG and I am not sure I can contribute
to Wikipedia yet.

However, I processed handwriting-zh_TW.xml and managed to cover more
chinese characters from the top 1000 (1% gain only).

Two interesting pieces of news:

As assumed, there are very few false positives with this method. I
checked the positive Chinese orders against some Japanese patterns and
got a few results:

example :

KanjiVG curious variants : 5230VtFst.svg for 到
chinese exceptions to the rule : example : 重

Here are the tested japanese patterns (when the kanjivg:element and
kanjivg:type are flattened in one line)

㇑a㇐㇐
㇐冉
二㇐丨㇑a







A huge amount of work remains on the 303 left. Maybe it is time for the
KanjiVG team to tell us what we could do. I could contribute
the source code of these estimations (maybe in a kanjivg subdirectory).

Interestingly, as I said in my first message :

>- change the stroke order for some known variants in kanjivg (such as
>生). The gain will be maximum 4%.
>It can be done manually on the candidate kanjis.

This is still valid

>- try to automate the simplification process (for example 訁to 讠) and
>test back against the classifier (max 20% gain = traditional chinese
>rate - simplified chinese rate = 80 - 60).

284 / 303 are simplified forms of a traditional character

209 / 303 are simplified forms of a traditional character
with correct stroke order

=> what we can try :

replace the traditional radical with the simplified one and
try to see if it works.

Max : around 20 % gain

In practice, less than that because the character must meet several
conditions :

- the radical has been simplified
- nb. strokes simplified - nb. strokes radical simplified = nb.
strokes traditional - nb. strokes radical traditional

Hence around 10 % (interesting on 5000 characters, not on 1000 ...).
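The two conditions above can be sketched as a simple filter. The data class and its fields are hypothetical, and the stroke counts in the example are supplied by hand for illustration; in practice they would come from the character data.

```python
from dataclasses import dataclass


@dataclass
class Form:
    char: str
    strokes: int          # total stroke count of the character
    radical: str
    radical_strokes: int  # stroke count of the radical alone


def radical_swap_candidate(simplified, traditional):
    """True if only the radical differs, judged by stroke counts."""
    if simplified.radical == traditional.radical:
        return False  # the radical has not been simplified
    rest_simplified = simplified.strokes - simplified.radical_strokes
    rest_traditional = traditional.strokes - traditional.radical_strokes
    return rest_simplified == rest_traditional


# 語 -> 语: only the radical changes (7 strokes remain in both forms).
print(radical_swap_candidate(Form("语", 9, "讠", 2), Form("語", 14, "訁", 7)))
# prints "True"
# 讓 -> 让: the right-hand part changes too (3 strokes vs 17).
print(radical_swap_candidate(Form("让", 5, "讠", 2), Form("讓", 24, "訁", 7)))
# prints "False"
```

Equal stroke counts are only a necessary condition: two different components can have the same number of strokes, so candidates would still need the classifier check or a manual review.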

Nevertheless, more than 3000 likely correct stroke orders with stroke
types and components is not nothing.
Maybe a good editor could help industrialize the production. A Java-based
one would be a good idea since the recognizer could be called (I
managed to call both HanziDict and zinnia).
A wiki too.

Arnaud

Alexandre Courbot

Jan 13, 2011, 2:17:26 AM
to kan...@googlegroups.com
Hi,

> Nevetheless more the 3000 likely correct stroke order with stroke
> types and components is not nothing.
> Maybe a good editor could help industrialize the production.  A java-
> based would be a good idea since the recognizer could be called (I
> managed to call both HanziDict and zinnia).
> A wiki too.

If you want access to the wiki I would be glad to give you the password.

As for the editor, yes, this is definitely needed. I am working on one
(using Python/Qt), but unfortunately lack the time to put it to
completion right now. In addition, some rules must be set up for the
file naming of non-Japanese characters, and I am not knowledgeable
enough to make the right decisions for it. Also, in order to provide
font-quality rendering, I need to know how the stroke height should
vary according to the control points. Ulrich sent me some weight
variations a while ago and I played with it a little bit, but the
result is not quite as good as the samples he showed me. I can do the
programming and maintain the project, but I really need *directions*
for that.

So, basically some people with deep knowledge of sinographs are needed
for the project to advance. Ulrich is the man, but unfortunately he
seems busy with other things. So people, please don't be afraid -
anybody who wants to get involved and has the necessary knowledge will
have free hand, and if we can put that editor thing to completion the
project will already feel much better.

Alex.

Dr. Ulrich Apel

Jan 13, 2011, 8:58:18 AM
to kan...@googlegroups.com
Hi Alex, hi everybody,

> Ulrich is the man, but unfortunately he
> seems busy with other things.

I am very sorry for my long silence. I do try to follow what is going on in the mailing list, but yes, I am pretty busy at the moment. It should be better next month, when two deadlines are over and we have semester vacations. By the way, one deadline is connected to getting financing for KanjiVG, too.

It seems that colleagues from Buddhist studies in Japan have developed a system for the description of kanji variants. I will try to look into this in February. I am also planning to discuss KanjiVG with our colleagues from Sinology in Tuebingen.

Adapting the stroke order shouldn't be difficult, and neither should a rough version of simplified Chinese for existing characters. If our IT assistant Roger, who is a member of this mailing list too, has some spare time in February or March, I guess the two of us could get a first working version.

I used a Japanese schoolbook font as the model for KanjiVG. This won't be a solution for Chinese. Probably one will have to try to go directly to the esthetic rules of calligraphy.

For missing non-variant characters, a component analysis is necessary.

I have worked with students on missing Unicode characters, but have no results from them yet.

For better looking fonts with KanjiVG one could apply ideas from METAFONT: laying shapes over the paths, defining stroke ends, defining stroke weights and so on: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/974/CS-TR-83-974.pdf or http://www.tug.org/TUGboat/Articles/tb05-2/tb10hobby.pdf. The paper is from 1983, but the approach stood a prototype. I wouldn't be astonished, if a combination with KanjiVG would lead to very nice results.

> Ulrich sent me some weight
> variations a while ago and I played with it a little bit, but the
> result is not quite as good as the samples he showed me. I can do the
> programming and maintain the project, but I really need *directions*
> for that.


Alex, could you make your preliminary results available somewhere, then I will have a look at them.

Ulrich

hugo lopes

Jan 13, 2011, 9:04:39 AM
to kan...@googlegroups.com
About sources:
The Commons Stroke Order Project found the following key sources:
  1. ROC: 常用國字標準字體筆順手冊 (Stroke order 14 rules), by the Taiwan Ministry of Education. Book available online (authoritative work). ISBN 957-00-7082-X
  2. PRC: 現代漢語通用字筆順規範, 453 pages, 1997, publisher: 语文出版社, ISBN 7801262018 (authoritative)
  3. Japan: 筆順指導の手びき (Hitsujun shidō no tebiki), 1958. (Authoritative from 1958 to 1977)
    Note: nowadays, the Japanese Ministry of Education lets publishers freely set a character's stroke order, which should all « follow commonsensical orders which are widely accepted in the society ».
  4. Hong Kong: 香港標準字形及筆順 - stroke orders following the Hong Kong Department of Education's List of Commonly Used Characters.
So there are authoritative sources for Chinese, and one (Taiwan) is fully available online.
That should help a lot!

Regards,

--
羅禹國 - Hugo LOPEZ
Wikipedia: User:Yug

Tw. tel: 09-8343-9890
Institute of Technology & Innovation Management, Master 2
NCHU, Taizhong, Taiwan.
250 KuoKuang Rd., Taichung 402, Taiwan, R.O.C.
國立中興大學 台中市402國光路250號 台灣

Alexandre Courbot

Jan 13, 2011, 9:52:11 PM
to hugo lopes, KanjiVG
Hi Hugo,

> @Alex:
> What do you mean by : "In addition, some rules must be set up for the


> file naming of non-Japanese characters"

> I create the file naming conventions for the Commons Stroke Order Project. If I understand better your requirement, I may help.
> Basically, the naming conventions on the Commons SOP are:
> ROC:   *-tbw.png (5)
> PRC:   *-bw.png (1,006)
> Japan: *-jbw.png (52)
>
> * : the CJK character (unicode)
> - : a separator.
> t,,j : the country code. With t for Taiwan, [nothing] for China, j for Japan.
> bw : the image kind, there bw for 'Black and White diagrams'
> .png : the extension.
>
> Clarify your meaning, then I will to move further.

Basically, something like this. The most important thing is to be able
to differentiate between the different versions of a character. The
current naming scheme of KanjiVG is:

xxxxx-Variant.svg

where the x's are the unicode in hexadecimal, and -Variant being an
optional code describing when the kanji is a variant, and of what.
This brings two questions that I think we should answer first (and put
down on the wiki for the record):

1) If there are variants, what are they variants of? I.e. what is the
reference version of the stroke? If KanjiVG turns international,
shouldn't it have its own -Variant code for consistency?
2) There are various suffixes as of now: Kaisho, Jinmei, JinmeiKaisho,
VtLst, HzFst, Vt4, ... It would greatly help if we could clearly
explain what they stand for (I have no idea) and again write it down
to the wiki (which I don't mind doing provided I get clear
explanations and e.g. supporting sources).

Which brings me to 3):
3) Since there seems to be many kinds of variants across countries,
wouldn't it make more sense to classify the variants according to
their characteristics (what I think the current variants are about)
instead of nationality? I mean, even in China there seems to be
several schools, so maybe we could cover the range of variants more
accurately this way.
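Whatever the variant codes end up meaning, the xxxxx-Variant.svg scheme described above is easy to split mechanically. A minimal sketch, assuming 4 or 5 hexadecimal digits, an optional "-Variant" part made of letters and digits, and a hypothetical helper name:

```python
import re

# Hex code point, optional "-Variant" suffix, ".svg" extension.
NAME = re.compile(
    r"^(?P<code>[0-9a-fA-F]{4,5})(?:-(?P<variant>[A-Za-z0-9]+))?\.svg$"
)


def parse_name(filename):
    """Split a file name into (character, variant code or None)."""
    match = NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    return chr(int(match.group("code"), 16)), match.group("variant")


print(parse_name("0751f.svg"))         # prints "('生', None)"
print(parse_name("0751f-Kaisho.svg"))  # prints "('生', 'Kaisho')"
```

Grouping the results by variant code would give a quick inventory of the suffixes currently in use, which is a starting point for documenting them on the wiki.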

Speaking without any sense to back up here - but I hope I can finally
make sense of these strange names soon. ;)

Alex.

Alexandre Courbot

Jan 13, 2011, 10:14:44 PM
to kan...@googlegroups.com
Hello Ulrich,

> I am very sorry for my long silence.  I do try to follow what is going in the mailing list, but yes, I am pretty busy at the moment. It should be better next month, when two deadlines are over and we have semester vacations.  By the way, one deadline is connected to get financing for KanjiVG, too.

That would be fantastic - especially if that allows you to get back on
the ship! :)

> For better looking fonts with KanjiVG one could apply ideas from METAFONT:  laying shapes over the paths, defining stroke ends, defining stroke weights and so on: ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/974/CS-TR-83-974.pdf or http://www.tug.org/TUGboat/Articles/tb05-2/tb10hobby.pdf.  The paper is from 1983, but the approach stood a prototype.  I wouldn't be astonished, if a combination with KanjiVG would lead to very nice results.

I will have a look at that.

>> Ulrich sent me some weight
>> variations a while ago and I played with it a little bit, but the
>> result is not quite as good as the samples he showed me. I can do the
>> programming and maintain the project, but I really need *directions*
>> for that.
>
>
> Alex, could you make your preliminary results available somewhere, then I will have a look at them.

I have attached a couple screenshots. I used the stroke weight
variations you sent me some time ago (the ones in AppleScript) and an
algorithm that makes a linear progression along the length of the
path. owari-sample is the one you sent me and is the reference.
owari-rendered is the one rendered using my algorithm. If you look at
the first two strokes, you will notice that they get thinner, then
bigger before the curve. This is probably not desired and suggests my
linear progression algorithm is wrong. There is probably something to
be done with the path's control points, but somehow my brain seems to
be hermetic to AppleScript and I could not make any sense out of it
(hence my naive try). Any clue here would be very welcome and would
also help me establish some rules about the number and nature of
control points for the editor (i.e. when the user selects a stroke
type, a sample is inserted and only some of its parameters can be
changed).
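For reference, a linear progression of the stroke weight along the length of a path can be sketched as below. This is an illustration under simple assumptions (the path is sampled as a list of points and only a start and an end weight are given), not the algorithm actually used for owari-rendered.

```python
import math


def linear_widths(points, start_width, end_width):
    """Stroke width at each point, interpolated linearly along the arc length."""
    # Cumulative distance from the first point to each point.
    lengths = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        lengths.append(lengths[-1] + math.hypot(x1 - x0, y1 - y0))
    total = lengths[-1]
    if total == 0:
        return [start_width for _ in points]
    return [start_width + (end_width - start_width) * d / total for d in lengths]


print(linear_widths([(0, 0), (3, 4), (6, 8)], 4.0, 2.0))
# prints "[4.0, 3.0, 2.0]"
```

A single ramp like this changes monotonically from one end to the other, so it cannot by itself express a stroke that swells or narrows in the middle; that would need several reference weights tied to positions along the path.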

Alex.

owari-sample.png
owari-rendered.png

huayin...@yahoo.fr

Jan 14, 2011, 8:59:26 AM
to KanjiVG
Yes, you should describe the variant inside the XML file so that a
file can be used accurately for several locales.

The variant XML tag should link to locales.

<variants>
<variant locale="ja_JP" style="Kaisho"/>
<variant locale="zh_TW" style="Kaishu" source="bishuen"/>
<variant locale="zh_CN" style="Kaishu" source="kangxi"/>
</variants>

Why locale? Because it is based on ISO codes, so everybody knows them...
Moreover, in XML, you can make up your mind and change the format
whenever you want to extend it.
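A minimal sketch of how a script could consume the proposed variants element and pick the entries for one locale. The element and attribute names follow the proposal above; the helper name is hypothetical, and nothing here reflects an existing KanjiVG format.

```python
import xml.etree.ElementTree as ET

# The proposal above, as a standalone fragment.
SAMPLE = """
<variants>
  <variant locale="ja_JP" style="Kaisho"/>
  <variant locale="zh_TW" style="Kaishu" source="bishuen"/>
  <variant locale="zh_CN" style="Kaishu" source="kangxi"/>
</variants>
"""


def variants_for(xml_text, locale):
    """Return the attributes of every variant entry declared for a locale."""
    root = ET.fromstring(xml_text)
    return [dict(v.attrib) for v in root.iter("variant")
            if v.get("locale") == locale]


print(variants_for(SAMPLE, "zh_TW"))
# prints "[{'locale': 'zh_TW', 'style': 'Kaishu', 'source': 'bishuen'}]"
```

A file usable for several locales simply carries several variant entries, and new attributes can be added later without breaking this kind of lookup.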

The suffix does not matter but we could name it like :

9a6c-zh_CN.xml (simplified for 99ac chinese simplified only)

The debate about the suffix is therefore less important than the
variant information that should in any case reside in the XML...

I hope it brings something to this endless debate on naming
conventions (see a May 2010 post on the very same issue)!

Alexandre Courbot

Jan 14, 2011, 10:38:21 PM
to kan...@googlegroups.com
Hi,

> Yes, you should describe the variant inside the XML file so that a
> file can be used accurately for several locales.
>
> The variant XML tag should link to locales.
>
> <variants>
>  <variant locale="ja_JP" style="Kaisho"/>
>  <variant locale="zh_TW" style="Kaishu" source="bishuen"/>
>  <variant locale="zh_CN" style="Kaishu" source="kangxi"/>
> </variants>
>
> Why locale ? Because it is iso so that everybody knows...
> Moreover, in XML, you can make up your mind and change the format
> whenever you want to extend it.

So do you mean that the variant tag would link to another file? Or
that all the variants would be described (strokes + structure) in the
same file?

For the next version of KanjiVG format, I aim at having XML and SVG
files merged into a single file (this is already possible for
non-variants) that would be SVG-compliant. Therefore I'd rather be in
favour of having a single variant per file + an easy to figure naming
system. The style and source tags could also be added to the SVG file.

> 9a6c-zh_CN.xml (simplified for 99ac chinese simplified only)
>
> The debate about the suffix is therefore less important than the
> variant information that should in any case reside in the XML...

Indeed, but we would still need to know how to rename the large number
of existing files, and for this we need a match between the current
naming scheme and an eventual new one.

Alex.

huayin...@yahoo.fr

Jan 15, 2011, 5:21:23 AM
to KanjiVG
@Alex :

> So do you mean that the variant tag would link to another file? Or
> that all the variants would be described (strokes + structure) in the
> same file?
>

1. no
2. yes indeed, the one that stores the information (at present XML,
tomorrow SVG).
I think Dr Apel talked about glyph variants earlier. This extra XML
information could also describe them, as well as the source, style and
locale.

> For the next version of KanjiVG format, I aim at having XML and SVG
> files merged into a single file (this is already possible for
> non-variants) that would be SVG-compliant.

You will need to modify the python scripts to read the new information
with new arguments (the locale could be a good start).

>Therefore I'd rather be in
> favour of having a single variant per file + an easy to figure naming
> system. The style and source tags could also be added to the SVG file.
>

yes

>
> Indeed, but we would still need to know how to rename the large number
> of existing files, and for this we need a match between the current
> naming scheme and an eventual new one.
>

no, you would not. It is not necessary ... Simply add a variants tag
and a proper variant tag inside it (or a better name can be found,
that's just a proposal) because kanjivg:variant attribute already
exists.
Now, it is very easy :

For the empty suffix ones :

<variants>
<variant locale="ja_JP" style="scholar" source="MEXT"/>
</variants>

For the variant files, (example)

<variants>
<variant locale="ja_JP" style="Kaisho"/>
<variant locale="ja_JP" style="JinMei"/>
</variants>

or

<variants>
<variant locale="ja_JP" style="KaishoJinMeiVt2"/>
</variants>

In the SVG case, the SVG filename should be part of the XML description.

These are just examples to demonstrate that the data modeling is more
important than the naming ! And of course, the experts need to
validate the model.

What KanjiVG needs is a proper (relational / object / conceptual)
model. A visual one would be great.

Arnaud