KOI-8, encoding autodetection and the like. Feature request.

Igor

Jun 23, 2008, 7:35:43 PM
to scite-interest
Hello!

I'm looking for a good text editor, and SciTE seems to me the best
choice. But it lacks one important (IMHO) feature. Because I edit
sources of both Windows and Linux programs, I need to edit files in
the utf-8, cp-1251 and koi-8 encodings. The first two are supported
well in SciTE, but koi-8 is not. It is impossible to just open a koi-8
file and edit it under Windows. You need to open it, convert it to
cp-1251, edit it, and convert it back to koi-8 before saving. It is
very inconvenient. It would be good to let the user choose koi-8 from
the encodings menu. Why is it not possible to edit a koi-8 file
directly? Or with conversion on load/save that is transparent to the
user?

Before SciTE I used another editor. But it lacks Unicode support,
so I chose SciTE. That editor had these features:
1) It was possible to change the current encoding by pressing Alt-F8
and choosing from a list of installed encodings.
2) For each file the editor remembers the encoding set by the user, so
the user need not set the encoding again and again on every open.
3) There was a character set autodetection feature based on the
frequency distribution of characters in different encodings. This
feature works fine, practically without wrong guesses (only on very
short files or non-typical texts, e.g. all in capital letters).

So the user can just open a file and edit it without thinking about
what encoding the file was in and what encoding it should be saved in
before closing.
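
For illustration, the frequency-based autodetection in item 3 could
look roughly like the Python sketch below. It is not that editor's
algorithm and not part of SciTE: the function name `guess_encoding`,
the candidate list, and the rough letter-frequency table are all
assumptions made for the example, and KOI8-R is assumed as the KOI-8
variant.

```python
# Approximate relative frequencies (percent) of common Russian letters.
# Rough illustrative values (an assumption), not a tuned language model.
RUSSIAN_FREQ = {
    "о": 10.9, "е": 8.5, "а": 8.0, "и": 7.4, "н": 6.7,
    "т": 6.3, "с": 5.5, "р": 4.7, "в": 4.5, "л": 4.4,
}

# Candidate 8-bit Cyrillic encodings, named as in Python's codec registry.
CANDIDATES = ("koi8_r", "cp1251", "cp866")


def score(text: str) -> float:
    """Sum the frequency weights of the characters found in the text."""
    return sum(RUSSIAN_FREQ.get(ch, 0.0) for ch in text)


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return the candidate whose decoding looks most like Russian text.

    Returns None if no candidate can decode the bytes at all.
    """
    best, best_score = None, -1.0
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        s = score(text)
        if s > best_score:
            best, best_score = enc, s
    return best


if __name__ == "__main__":
    sample = "Привет, мир! Это обычный русский текст в нижнем регистре."
    for enc in CANDIDATES:
        print(enc, "->", guess_encoding(sample.encode(enc)))
```

Because only lowercase letters are scored here, a text written entirely
in capitals, or a very short one, is likely to be misdetected, which is
the same weakness mentioned in item 3.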

Is it possible to implement some of these features in SciTE? (If yes,
SciTE will be an ideal text editor.)

Thank you.

Remi Gillig

Jun 23, 2008, 8:51:30 PM
to scite-interest
Why don't you just use Unicode?
KOI-8 seems to be for Cyrillic characters, which Unicode handles
anyway, and Unicode is very well supported everywhere.

Remi Gillig.

izh...@gmail.com

Jun 24, 2008, 5:06:29 AM
to scite-i...@googlegroups.com
Unfortunately not everywhere and not by everyone. I have many, many
files coming from the Linux world in koi-8. My friends send me letters,
I edit other people's sources... I can't just force the rest of the
world to use Unicode, because some useful projects are not maintained
anymore and I have no time to convert everything to Unicode. In the
Russian Linux and Unix world, IMHO, koi-8 is still the number one
encoding. Why does SciTE support cp-866, the old Cyrillic encoding used
in DOS? For historical reasons, I guess. And that is good, since I have
some old files in the DOS encoding too, e.g. my school labs. Or must I
convert them to Unicode too? ;-) Why does the editor decide what the
user can edit conveniently and what not? I just need to edit my files.
If it is not possible to change SciTE, sigh, I will look for another
editor with koi-8 support. :-(

Remi Gillig

Jun 24, 2008, 5:31:18 AM
to scite-i...@googlegroups.com
I'm not at all sure whether they support KOI-8 already, but as they
are a Russian community and apparently more willing to add this kind
of functionality, you should try SciTE-Ru:
http://code.google.com/p/scite-ru/

Remi Gillig.

Neil Hodgson

Jun 24, 2008, 7:07:44 AM
to scite-i...@googlegroups.com
Igor:

> First two encodings are supported
> well in SciTE but not the koi-8. It is impossible to just open file in
> koi-8 and edit under windows.

Other 8-bit character sets are supported directly on Windows by
just setting the character set parameter when choosing a font.
Supporting other character sets is a lot more work and no one has been
sufficiently interested to implement this.

Neil

Igor Zhbanov

Jun 24, 2008, 8:13:15 AM
to scite-i...@googlegroups.com
2008/6/24 Neil Hodgson <nyama...@gmail.com>:

> Other 8 bit character sets are supported directly on Windows by
> just setting the character set parameter when choosing a font.
> Supporting other character sets is a lot more work and no one has been
> sufficiently interested to implement this.

Changing the font for each file is not a very convenient way. IMHO most
Russian-speaking people (especially programmers) need an editor that
supports the unicode, cp1251, cp866 and koi8 encodings. By the way, as
far as I know, the Linux version of SciTE supports koi8 well but lacks
support for cp1251. All I want is to edit Linux files under Windows and
vice versa. Why not make SciTE more cross-platform? If English is
the most familiar language, should people speak only English? Every
text editor that wants to be popular in countries using Cyrillic
encodings must support at least cp866 (aka dos, aka oem), cp1251 (aka
windows) and koi8. And, of course, unicode. By the way, one third to
one half of e-mails in Russia are still in koi8 encoding.

Neil Hodgson

Jun 24, 2008, 8:39:05 AM
to scite-i...@googlegroups.com
Igor Zhbanov:

> Changing font for each file is not very convenient way.

The call to SetLogFont/CreateFontIndirect takes a font name and a
character set and then displays the bytes of text as that character
set. You can ask it to use Tahoma and GREEK_CHARSET for example. KOI-8
is not one of the character sets supported by Windows.

Neil
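
A small Python sketch of the point above: the same bytes come out as
different letters depending on which character set they are interpreted
in, which is why KOI-8 text shown through a cp1251 character set looks
scrambled. The helper `recode` is a hypothetical name for the manual
"convert to cp-1251" step; the codec names are those of Python's
standard library, and KOI8-R is assumed as the KOI-8 variant.

```python
def recode(data: bytes, src: str, dst: str) -> bytes:
    """Convert bytes from one 8-bit charset to another via Unicode."""
    return data.decode(src).encode(dst)


# The word "Привет" as it would be stored in a KOI8-R file.
koi8_bytes = "Привет".encode("koi8_r")

# Same bytes, wrong charset: what a cp1251 display would show.
shown_as_cp1251 = koi8_bytes.decode("cp1251")

# Recoding first yields bytes that a cp1251 display renders correctly.
cp1251_bytes = recode(koi8_bytes, "koi8_r", "cp1251")

if __name__ == "__main__":
    print(shown_as_cp1251)                # scrambled Cyrillic letters
    print(cp1251_bytes.decode("cp1251"))  # the original word
```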

Igor Zhbanov

Jun 24, 2008, 8:52:37 AM
to scite-i...@googlegroups.com
2008/6/24 Neil Hodgson <nyama...@gmail.com>:

> The call to SetLogFont/CreateFontIndirect takes a font name and a
> character set and then displays the bytes of text as that character
> set. You can ask it to use Tahoma and GREEK_CHARSET for example. KOI-8
> is not one of the character sets supported by Windows.

Yes. On Linux systems it is vice versa: they often lack the Windows
charsets. So cross-platform editors transparently recode the file being
edited and the user's input to an internal encoding (e.g., utf-8). Or
they edit the file as is but convert it, before displaying, to an
encoding supported by the screen.

P.S. Thanks for listening. :-)
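
The transparent recoding described above (decode on load, keep Unicode
in memory, encode back on save) might be sketched in Python as follows.
This is only an illustration under stated assumptions, not how SciTE or
any particular editor is implemented: `Document` and `open_document`
are hypothetical names, and the encoding is passed in by the caller
rather than detected.

```python
from dataclasses import dataclass


@dataclass
class Document:
    path: str
    encoding: str  # encoding of the file on disk, e.g. "koi8_r"
    text: str      # Unicode text held in memory while editing

    def save(self) -> None:
        """Write the text back in the encoding the file was opened with."""
        # newline="" writes line endings exactly as they are in self.text
        with open(self.path, "w", encoding=self.encoding, newline="") as f:
            f.write(self.text)


def open_document(path: str, encoding: str) -> Document:
    """Decode the file into Unicode and remember its on-disk encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return Document(path, encoding, f.read())
```

Since the document object remembers its encoding, the user edits plain
Unicode text and the file is written back as koi8, cp1251 or cp866
without a manual conversion step.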