Solution for the Coexistence of Zawgyi and Unicode

Ravi Chhabra

Feb 7, 2009, 4:33:32 AM
to Myanmar...@googlegroups.com
Dear All,
The solution below has been tested on Firefox. I would like group members to reproduce it on their own machines and report the results back to the group. It extends Saturngod's post:
http://www.saturngod.net/?p=56


In Firefox, go to about:config and set these two preferences:

font.name.sans-serif.x-unicode "Myanmar3, Zawgyi1, Arial"
font.name.serif.x-unicode "Myanmar3, Zawgyi1, Times New Roman"

Now go to Myanmar IT Pros and disable all CSS, then go to Myanmar Wikipedia as well. Both sites should generally work well, with a few exceptions. The exceptions happen when the sample text is too short; in that case Zawgyi text will be displayed in Myanmar3.

Note that Padauk, or any Unicode font that uses the whole code range from U+1000 to U+109F, cannot be used. This works only because Zawgyi uses more code points than Myanmar3. Therefore, might it make sense not to include Shan and other languages inside Myanmar3 for the time being?

Let me know your thoughts.
Regards,
Ravi.
solution.png

Seth

Feb 17, 2009, 10:43:38 AM
to MyanmarUnicode

Hello,
I'm just posting a follow-up to this solution, which was posted on
Myanmar IT Pros, in the thread:
http://myanmaritpros.com/forum/topic/show?id=1445004%3ATopic%3A110567&page=2#comments

Ravi, you wrote:
> Therefore it may for the time being make sense not to include Shan and other
> languages inside Myanmar3 for the time being?

For the time being, maybe. However, I really like using Padauk. My
suggestion in the thread (about halfway down) was to use a
GreaseMonkey script, which scanned text on page-load for substrings
common to Zawgyi's encoding, and set the font if this was the case.
From my own work in Myanmar unicode, I would expect that one could
find a fast, accurate set of heuristics without too much difficulty.
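
A minimal sketch of what such a scan might look like, written as a plain function so it can run outside the browser. The three patterns are assumptions for illustration only (they are not taken from any script in this thread), and `looksLikeZawgyi` is a hypothetical name:

```javascript
// Heuristic hints that a string is Zawgyi-encoded. All three patterns are
// illustrative assumptions, not rules from any existing script.
const ZAWGYI_HINTS = [
  // U+1031 or U+103B stored BEFORE the consonant: at the very start of the
  // text, or right after a character outside the Myanmar block.
  /(^|[^\u1000-\u109F])[\u1031\u103B]/,
  // U+1033, U+1034 and U+1060-U+1097: assumed here to be Zawgyi glyph
  // variants that are rare in Unicode-encoded Burmese text.
  /[\u1033\u1034\u1060-\u1097]/,
  // U+1039 NOT followed by a consonant: assumed invalid in Unicode, where
  // U+1039 only stacks one consonant under another.
  /\u1039(?![\u1000-\u1021])/,
];

// Returns true if any hint matches; false means "no Zawgyi evidence found",
// which is not the same as "definitely Unicode".
function looksLikeZawgyi(text) {
  return ZAWGYI_HINTS.some((re) => re.test(text));
}

// In a userscript one might call this per element, for example:
//   if (looksLikeZawgyi(el.textContent)) el.style.fontFamily = "Zawgyi-One";
```

Because a miss only means that no evidence was found, a real script would still need a fallback for short or ambiguous text.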

This solution is particularly recommended for webmail clients,
since one might be receiving Myanmar email in either of the two
popular encodings, and your client (e.g., Gmail) cannot possibly be
expected to know the difference. Moreover, a font (like Arial Unicode)
will most certainly NOT omit glyphs for official code points, so the
problem must be addressed inevitably.

Please post back with any questions or comments.
-->Seth

Ravi Chhabra

Feb 18, 2009, 6:45:23 AM
to Myanmar...@googlegroups.com
Agreed. The GM script should be the way forward.
> Moreover, a font (like Arial Unicode) will most certainly NOT omit glyphs for official code points, so the problem must be addressed inevitably.
I am not exactly sure what this means. Are you referring to the case where there is a hacked version of Arial-Zawgyi? Or do you mean that if Microsoft starts adding Myanmar to Arial, we will need a solution like this anyway? Whatever the reason, getting the GM script done should come first, but we also need to raise awareness so that users actually install it. I am working on the script now.
Cheers,
Ravi.

Keith Stribley

Feb 19, 2009, 6:02:06 AM
to Myanmar...@googlegroups.com
2009/2/17 Seth <seth...@gmail.com>:

>
> For the time being, maybe. However, I really like using Padauk. My
> suggestion in the thread (about halfway down) was to use a
> GreaseMonkey script, which scanned text on page-load for substrings
> common to Zawgyi's encoding, and set the font if this was the case.
> From my own work in Myanmar unicode, I would expect that one could
> find a fast, accurate set of heuristics without too much difficulty.
>
> This solution is particularly recommended for webmail clients,
> since one might be receiving Myanmar email in either of the two
> popular encodings, and your client (e.g., Gmail) cannot possibly be
> expected to know the difference. Moreover, a font (like Arial Unicode)
> will most certainly NOT omit glyphs for official code points, so the
> problem must be addressed inevitably.
>

The GreaseMonkey solution sounds good. I use the Stylish Firefox
plugin to set site-specific font styles, though it doesn't allow
auto-detection like the above solution. You can turn the rules on and
off quite quickly using the Stylish icon in the bottom right of the
browser.
https://addons.mozilla.org/en-US/firefox/addon/2108
Sample style:

@-moz-document domain(mail.google.com) {
*,input { font-family: Padauk, Myanmar3; }
}

I've also now updated my offline document converter tool to support
ZawGyi-One to Unicode 5.1 conversion. See
http://www.thanlwinsoft.org/ThanLwinSoft/DocCharConvert/ if you are
interested.

Keith

Seth Hetu

Feb 27, 2009, 2:43:13 AM
to Myanmar...@googlegroups.com
> Not exactly sure what this means. Are you referring to the case where there
> is a hacked version of Arial-Zawgyi? Or are you referring to the fact that
> should Microsoft start adding Myanmar into Arial, we may need to come out
> with a solution like this anyway?

I'm referring to the second one. I think it is reasonable to expect
most companies (including Microsoft) to develop glyphs for anything in
the standard.

I've got your script installed on my laptop. So far, it works great!
-->Seth

Ravi Chhabra

Feb 27, 2009, 9:27:05 AM
to Myanmar...@googlegroups.com
Hi guys,
I am working on Zawgyi detection and on applying fonts, based on the detection results, at the element level. Please do provide feedback on it:

http://userscripts.org/scripts/show/42941

Currently it can do the following:

1. Check for the presence of characters in the Myanmar range; if there are none, do nothing.
2. If present, detect whether the text can be uniquely identified as Zawgyi or Unicode and apply the appropriate fonts.
3. If detection fails, apply the default font specified by the user in the Firefox preferences database; if no Myanmar font is specified, default to all known Unicode-compliant fonts.
4. Allow script-specific font settings through GM_setValue and use them if present. These take precedence over the font the user specified via the font.name.* namespace.
5. Undo/Redo the changes made by the script via GreaseMonkey -> User Scripts Command...
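
The decision order in steps 1 to 4 can be sketched roughly as below. This is only an illustration of the control flow, not the code of the posted userscript: `chooseFontFamily`, the `opts` fields, and the two font stacks are hypothetical, and the detectors are passed in because the actual detection rules are not described here. It also assumes the script-specific setting only overrides the default used when detection fails.

```javascript
const MYANMAR_RANGE = /[\u1000-\u109F]/;

// Hypothetical font stacks; adjust to the fonts actually installed.
const ZAWGYI_FONTS = "Zawgyi-One, Zawgyi1";
const UNICODE_FONTS = "Padauk, Myanmar3, Parabaik";

// opts.isZawgyi / opts.isUnicode: caller-supplied detector functions.
// opts.scriptFont: script-specific setting (e.g. read via GM_getValue).
// opts.browserFont: the user's font from the Firefox preferences.
function chooseFontFamily(text, opts) {
  // Step 1: no Myanmar characters, so leave the element alone.
  if (!MYANMAR_RANGE.test(text)) return null;

  // Step 2: act only when exactly one encoding is identified.
  const zawgyi = opts.isZawgyi(text);
  const unicode = opts.isUnicode(text);
  if (zawgyi && !unicode) return ZAWGYI_FONTS;
  if (unicode && !zawgyi) return UNICODE_FONTS;

  // Steps 3 and 4: detection failed. The script-specific setting wins over
  // the browser preference; with neither set, use the Unicode fonts.
  return opts.scriptFont || opts.browserFont || UNICODE_FONTS;
}
```

Returning `null` for non-Myanmar text lets the caller skip the element entirely, which keeps the script from touching pages it has no business restyling.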

For the coming iterations I am working on:
1. Add more Zawgyi-specific detections. This is important for the default font to work the way it is supposed to; right now the code is there but doesn't really work, as I am cheating a bit. :D
2. Detect the presence of Arial Zawgyi. If present, the default font should be Zawgyi, unless a GM-specific font preference is also set.
3. Check for Unicode per word; if more than 90% of the words are Unicode/Zawgyi, apply as such. This is to prevent typos from causing the script to think the text is something other than what it really is. This is still tricky, as there are a lot of words and parts of speech that would be the same in both Zawgyi and Unicode, so I need to detect these separately and ignore them in the count. An example of such sentences is given here:
မမ ဝဝ ထထ က အက ပထမ။ ကပါ ကပါ မမ ရာ၊ ညည လ သာသာ။  ည အခါ ငါ စာရ၊ မမ ဝဝ ထထ က။ (page 16)
That is a poem from the prescribed primary school Myanmar textbook. Sentences and parts of speech like these use only [က-အ] and [ါ-း].

4. Add debug mode and split the words into individual tags and apply fonts, to find out exactly why and where the detection failed and if that is an edge case that can be detected or simply too ambiguous.
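
For the per-word check in item 3, a rough sketch of the counting is given below, under the assumption that a separate word-level detector is available. `voteByWord` and its `isZawgyiWord` parameter are hypothetical names; the ambiguous-word pattern simply encodes the two ranges mentioned above as U+1000-U+1021 and U+102B-U+1038.

```javascript
const MYANMAR = /[\u1000-\u109F]/;

// Words made only of the basic consonants and the signs U+102B-U+1038 are
// treated as identical in both encodings and left out of the count.
const AMBIGUOUS_WORD = /^[\u1000-\u1021\u102B-\u1038]+$/;

// isZawgyiWord: caller-supplied detector for a single word (an assumption;
// the real detection rules are not part of this sketch).
function voteByWord(text, isZawgyiWord, threshold = 0.9) {
  let zawgyi = 0;
  let unicode = 0;
  // Split on whitespace and on the Myanmar punctuation marks U+104A/U+104B.
  for (const word of text.split(/[\s\u104A\u104B]+/)) {
    if (!MYANMAR.test(word) || AMBIGUOUS_WORD.test(word)) continue;
    if (isZawgyiWord(word)) zawgyi++;
    else unicode++;
  }
  const counted = zawgyi + unicode;
  if (counted === 0) return "unknown"; // nothing but ambiguous words
  if (zawgyi / counted > threshold) return "zawgyi";
  if (unicode / counted > threshold) return "unicode";
  return "unknown"; // mixed text: let the default font apply
}
```

Text such as the poem above ends up as "unknown" here, since every word is skipped; that is the case where the default font setting has to take over.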


I would like to know if there is anything more that I should add. Many thanks to Seth and Mark for their feedback.
Cheers,
Ravi.

Seth Hetu

Mar 1, 2009, 9:47:50 PM
to Myanmar...@googlegroups.com
> You can turn the rules on and off
> quite quickly using the Stylish icon in the bottom right of the
> browser.
> https://addons.mozilla.org/en-US/firefox/addon/2108

Thanks for the link. This is a good tool for enforcing the encoding on
a page which we know should contain entirely one encoding. (E.g.,
Zawgyi for the PlanetMM forums, or Padauk/Parabaik/Myanmar3 for the
Ubuntu translation project
(https://translations.launchpad.net/ubuntu/jaunty/+lang/my).)

Like Ravi said, no scanning detection tool can be 100% accurate. So
it's good that we have options.

Cheers,
-->Seth
