Need for Iran System to UTF-8 converter source code


ابراهیم محمدی

Oct 22, 2011, 2:42:41 PM
to Persian Computing
Salaam,

I need source code for a converter from the Iran System code page to UTF-8. Any popular target encoding that is easily convertible to UTF-8 would be enough. I searched the Internet but could not find any working source code.

Do you know of any such source code?

BTW, why is Iran System not implemented in libiconv? Is it just that nobody has tried?

Regards.

--
ابراهیم محمدی
Ebrahim Mohammadi

Behnam Esfahbod

Oct 22, 2011, 3:18:01 PM
to ابراهیم محمدی, Persian Computing
Ebrahim,

I think the main reason it's not implemented in iconv is that Iran System is a shape-based encoding, not a letter-based one: a code identifies a particular positional form of a letter rather than the letter itself. Thus, it's not obvious how to convert it to Unicode's letter-based codes (the Arabic block, U+0600..U+06FF).
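To illustrate the shape-based versus letter-based distinction, here is a minimal Python sketch. It does not use the Iran System table at all. As a stand-in, it uses Unicode's own shape-based codes (Arabic Presentation Forms-B, U+FE70..U+FEFF) and folds them back to letters of the Arabic block with NFKC normalization. The function name `to_letters` is hypothetical. A real Iran System converter would need its own byte-to-letter table for this step.

```python
import unicodedata

def to_letters(shaped: str) -> str:
    """Fold shape-based code points into letter-based ones.

    NFKC normalization replaces each Arabic presentation form
    (initial, medial, final, isolated) with its base letter.
    """
    return unicodedata.normalize("NFKC", shaped)

# The word "salaam" spelled with explicit positional forms:
# SEEN initial, LAM medial, ALEF final, MEEM isolated.
shaped = "\uFEB3\uFEE0\uFE8E\uFEE1"

# After folding, the same word uses plain letters of the Arabic block:
# SEEN (U+0633), LAM (U+0644), ALEF (U+0627), MEEM (U+0645).
letters = to_letters(shaped)
print([f"U+{ord(c):04X}" for c in letters])
```

The point of the sketch is that several shape codes collapse onto one letter, so the mapping is many-to-one in this direction and needs shaping logic in the other.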


By the way, if you are sure that your files are encoded in Iran System, can you share some of them with us? They can be a good resource for anyone who wants to write a converter.

Best,
-Behnam


--
    '     بهنام اسفهبد
    '     Behnam Esfahbod
   '      http://behnam.esfahbod.info
  *  ..   Persian Internet Society
 *  `  *  http://persian-isoc.org
  * o *   3E7F B4B6 6F4C A8AB 9BB9 7520 5701 CA40 259E 0F8B

ابراهیم محمدی

Oct 22, 2011, 8:58:52 PM
to Behnam Esfahbod, Persian Computing
I implemented a (hacky) tool to do the conversion: http://code.ebrahim.ir/iransystem
Comments, testing, bug reports, etc. are all welcome.

The saddest part of the story was the reversed order of RTL characters. For example, to write سلام, many old DOS applications wrote the byte for م first, then ا, then ل, and finally س! Now add some LTR character sequences (like digits) to the mix! (O_O)

The tool is released into the public domain.


Behdad Esfahbod

Oct 23, 2011, 5:33:31 AM
to ابراهیم محمدی, Behnam Esfahbod, Persian Computing
On 10/22/2011 08:58 PM, ابراهیم محمدی wrote:
>
> The saddest part of the story was the reversed order of RTL characters. For
> example, to write سلام, many old DOS applications wrote the byte for م first,
> then ا, then ل, and finally س! Now add some LTR character sequences (like
> digits) to the mix! (O_O)

What you need is a reverse-bidi algorithm. So far, it seems that the best
reverse-bidi algorithm around is the bidi algorithm itself. You can use
FriBidi, for example.
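
For a rough idea of what the reordering involves, here is a simplified Python sketch. It is not the bidi algorithm and it does not call FriBidi. It only handles the case described above: a line stored in visual (screen) order, where RTL text is reversed and embedded LTR runs such as numbers are not. The name `visual_to_logical` and the definition of an LTR run (ASCII letters and digits, optionally joined by a few separators) are assumptions made for the example.

```python
import re

# Assumption: an LTR run is ASCII letters/digits, optionally joined by
# one of a few separators (so "12.5" or "10/22" stays a single run).
LTR_RUN = re.compile(r"[0-9A-Za-z]+(?:[.,:/-][0-9A-Za-z]+)*")

def visual_to_logical(line: str) -> str:
    """Convert one visually ordered RTL line to logical order.

    Step 1: reverse the whole line, which puts the RTL letters in
    reading order but leaves every LTR run backwards.
    Step 2: reverse each LTR run again to restore it.
    """
    reversed_line = line[::-1]
    return LTR_RUN.sub(lambda m: m.group(0)[::-1], reversed_line)

# Stored left to right as on an old DOS screen: "123", a space,
# then MEEM, ALEF, LAM, SEEN.
visual = "123 \u0645\u0627\u0644\u0633"
logical = visual_to_logical(visual)
print([f"U+{ord(c):04X}" for c in logical])
```

Mixed punctuation, brackets and nested directions are exactly where this shortcut breaks down, which is why running a complete bidi implementation over the data is the safer route.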

behdad
