Re: [scintilla] How to make autocompletion work on IME

Neil Hodgson

Mar 26, 2015, 8:36:29 PM
to scite-i...@googlegroups.com, john...@dreamwiz.com
[Moving to the SciTE mailing list (http://groups.google.com/group/scite-interest) as this is no longer a Scintilla issue]

johnsonj:


   In SciTE, API autocompletion is normally shown on request, when the user presses Ctrl+I, and the user may also choose for API autocompletion to appear when particular characters are typed. The patch shows API autocompletion automatically even when it has not been requested.

   Neil

sonj john

Mar 27, 2015, 2:53:12 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
Thank you for testing.
I need to do more testing, following your instructions.

Neil Hodgson

Mar 29, 2015, 6:12:46 PM
to scite-i...@googlegroups.com
The definition of character sets used for autocompletion and calltips, such as autocomplete.*.start.characters, should allow for large sets of characters, such as one representing all Korean characters. However, this becomes difficult as there are around 110,000 characters defined by Unicode, so char strings holding nearly all of them as UTF-8 may be around 300K. A better data structure (possibly a bit map (24K) or a list of character ranges) would help. There also needs to be a way to specify these more compactly so you can say something like autocomplete.*.start.characters=$(chars.alpha)$(chars.hangul).
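
A minimal sketch of the "list of character ranges" idea, assuming sorted, non-overlapping inclusive ranges searched with a binary search. CharRange, CharacterRangeSet and AlphaAndHangul are hypothetical names for illustration and are not part of SciTE; the Hangul range used is the Unicode Hangul Syllables block.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of SciTE: one inclusive range of code points.
struct CharRange {
    int first;
    int last;   // inclusive
};

// A set of Unicode code points stored as sorted, non-overlapping ranges.
class CharacterRangeSet {
    std::vector<CharRange> ranges;   // kept sorted by 'first'
public:
    // Ranges must be added in ascending order and must not overlap.
    void AddRange(int first, int last) {
        assert(first <= last);
        assert(ranges.empty() || ranges.back().last < first);
        ranges.push_back(CharRange{first, last});
    }

    // Binary search for the last range starting at or before 'ch'.
    bool Contains(int ch) const {
        auto it = std::upper_bound(ranges.begin(), ranges.end(), ch,
            [](int value, const CharRange &r) { return value < r.first; });
        if (it == ranges.begin())
            return false;   // 'ch' is below every range
        --it;
        return ch <= it->last;
    }
};

// Example set: ASCII letters plus the Hangul Syllables block.
inline CharacterRangeSet AlphaAndHangul() {
    CharacterRangeSet set;
    set.AddRange('A', 'Z');
    set.AddRange('a', 'z');
    set.AddRange(0xAC00, 0xD7A3);   // Hangul Syllables, U+AC00..U+D7A3
    return set;
}
```

A set like this holds three ranges instead of more than 11,000 individual Hangul syllables, which is the kind of compactness that something like $(chars.hangul) would need behind it.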

Implementing this well is quite a bit of work.

Neil

sonj john

Mar 30, 2015, 7:47:01 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
The semantics of autocompletion on IME characters are different from SBCS.

Scintilla's current autocompletion and calltips work in SBCS and are for functions.
But IME characters have to be treated specially, as they may be better used for text writing than for programming.
I want to use autocompletion with an abbreviation list box.

All IME characters may be considered autocomplete.*.start.characters.
Do not set word.characters and auto~start.characters,
do not compare with Contains(),
and there is no need for ContinueCallTip().

If we need the existing Scintilla way of autocompletion for programming,
DBCS has to be taken care of too, with conversion from Unicode.
And then it would need SetLocale(). Things get more and more difficult.

I think taking different semantics for IME characters makes everything easy.

sonj john

Mar 30, 2015, 7:54:41 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
"""The patch shows API autocompletion automatically even when it has not been requested."""

I know Ctrl+I, but I do not understand what "requested" means above.
I mean that autocompletion should be triggered on all IME characters.
Do you mean it has no "if Contains(autocompletecharacters, unicode)" statement?

Neil Hodgson

Mar 30, 2015, 6:15:12 PM
to scite-i...@googlegroups.com
sonj john:

> The semantics of autocompletion on IME characters are different from SBCS.
>
> Scintilla's current autocompletion and calltips work in SBCS and are for functions.

SciTE is a programmer’s editor so has features for making it easier to work with APIs. These include autocompletion of APIs and display of calltips. This should not be dependent on the encoding but identifiers may commonly only use a subset of ASCII so that is what is well supported.

Editing textual documents is a less important application for Scintilla / SciTE than editing source code, markup language files, or data files. There exist many word processors which are better oriented and have more features for text input.

> But IME characters have to be treated specially, as they may be better used for text writing than for programming.
> I want to use autocompletion with an abbreviation list box.

Word completion would be a new feature. The existing API features may be used to try to implement this but they do not do so easily or well. Trying to morph API completion into word completion through special-casing non-Latin1 is going to weaken API completion and make word completion difficult to understand and implement.

This type of completion works with Latin1 as well as MBCS. So non-Latin1 should not be treated differently.

If word completion were to be added, how would word completion and API completion cooperate?

I do not know if it is worthwhile adding word completion to SciTE.

Neil

sonj john

Apr 1, 2015, 9:31:09 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
I have made Contains(std::string, int) and UTF32ToUTF8() to test API completion in UTF-8.
I find it is simpler without them,
as then there is no need for functions converting between Unicode and DBCS or UTF-8.
Here it is, just for review.
======================================================
    if (utf32 > 127) { // non Ascii character
        if ((selEnd == selStart) && (selStart > 0)) {
            if (wEditor.Call(SCI_CALLTIPACTIVE)) {
                ContinueCallTip();
            } else {
                // Assume that autoCompleteStartCharacters contains utf32
                StartAutoComplete();
            }
        }
        return;
    }


sonj john

Apr 9, 2015, 4:47:19 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
Just for review.
It seems to work in UTF-8 as instructed.
gcc complains about a lot of STL things related to codecvt.
How can I support DBCS with the STL?

Also, I encountered the following error while building the latest SciTE with mingw32-make.

mingw32-make: *** No rule to make target 'Accessor.o', needed by '../bin/Sc1.exe'.  Stop.
auto0409.patch

Neil Hodgson

Apr 9, 2015, 7:56:29 AM
to scite-i...@googlegroups.com
sonj john:

> gcc complains about a lot of STL things related to codecvt.
> How can I support DBCS with the STL?

codecvt is poorly implemented by C++ compilers. C++11 should mean at least UTF-32 ↔︎ UTF-8 and UTF-16 ↔︎ UTF-8 are available but we haven’t yet started requiring C++11.

The easiest way to support DBCS is with platform calls.

> I encountered the following error while building the latest SciTE with mingw32-make.
>
> mingw32-make: *** No rule to make target 'Accessor.o', needed by '../bin/Sc1.exe'. Stop.

Most likely you didn’t build Scintilla first.

Neil

sonj john

Apr 9, 2015, 11:08:41 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
"""Most likely you didn’t build Scintilla first."""
Yes, you are right.
I had tried codecvt with nmake.
Rebuilding Scintilla with mingw32-make solves the problem. Thank you.


"""The easiest way to support DBCS is with platform calls."""
I gave up on codecvt and setlocale,
so I used a temporary function, UTF32ToUTF8, to test whether IME autocompletion works.
It seems to work well in UTF-8.
How can I bring platform calls down to the src level,
since the src level should not depend on platforms?


sonj john

Apr 9, 2015, 7:26:06 PM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
I wonder how to use these.

SCI_ENCODEDFROMUTF8(const char *utf8, char *encoded)
SCI_SETLENGTHFORENCODE(int bytes)

Neil Hodgson

Apr 12, 2015, 6:45:47 AM
to scite-i...@googlegroups.com
[moving this to SciTE list]

johnsonj:

> I wonder what is your intention about supporting DBCS.
> Autocompletion needs your policy.

As I discussed a few messages back, SciTE doesn’t do general text autocompletion, instead having API completion. Adding text autocompletion would be a new feature. It would be a significant effort to design and implement this and I’m not sure it’s worth it.

Neil

sonj john

Apr 12, 2015, 7:05:30 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
I integrated IME autocompletion with the existing scheme.
I would follow the Scintilla way.
I made it work in DBCS and UTF-8.

1. Scintilla returns DBCS and UTF-8 each: I accept that.
2. Scintilla returns Unicode according to DBCS or UTF-8: I accept that too.

But Scintilla should not return UTF-8 while in DBCS mode.
That is nonsense.

Do you think IME autocompletion should be allowed only in UTF-8 mode?
At any rate, I will follow you.
Point me where to go.

sonj john

Apr 12, 2015, 7:07:21 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
auto0409.patch shows it

Neil Hodgson

Apr 12, 2015, 7:18:43 AM
to scite-i...@googlegroups.com
sonj john:

> I integrated IME autocompletion with the existing scheme.

Text autocompletion appears different to me so should not be integrated. On OS X, for example, a different UI is used for text autocompletion with a single item and either an X or a reversion arrow.

> I would follow the Scintilla way.
> I made it work in DBCS and UTF-8.
>
> 1. Scintilla returns DBCS and UTF-8 each: I accept that.
> 2. Scintilla returns Unicode according to DBCS or UTF-8: I accept that too.
>
> But Scintilla should not return UTF-8 while in DBCS mode.
> That is nonsense.
>
> Do you think IME autocompletion should be allowed only in UTF-8 mode?
> At any rate, I will follow you.
> Point me where to go.

Your example files are not APIs. If you really want to handle APIs that contain non-Latin characters then we could work on that but, in that case, I want to see the files.

Neil

sonj john

Apr 12, 2015, 7:35:41 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
I have tried all sorts of things to support DBCS.
Surprisingly, I find auto0409.patch is complete already.

Python 3 does allow IME characters.
I have already tested it.

I adopted dictionaries as a more complicated example.

auto0409.patch follows the Scintilla way.
I will submit an example Python 3 API file.

sonj john

Apr 12, 2015, 9:52:11 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
You may be familiar with python.properties, which is what I initially started with.

sample api
type "kana"

But I do not know why the display order is reversed.
I think DBCS mode should also work well.
autoHiarakana.jpg
calltipHirakana.jpg
autoHangul.jpg
calltipHangul.jpg
imechar.api
python.properties

sonj john

Apr 12, 2015, 9:59:14 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com

I do not argue this is perfect.
I want to complete it with your instructions, your help and your intentions.

It is almost on target. See, it is just as you expected!

Neil Hodgson

Apr 13, 2015, 5:32:22 AM
to sonj john, scite-i...@googlegroups.com
sonj john:

> I do not argue this is perfect.

I expect it will take a long time to make it usable. Including lists of all Hangul characters in a .properties file is ugly and difficult to work on. It also means defining the encoding of .properties files.

Including bracket variants like ‘)’ as well as ‘)’ doesn’t really make sense since only ‘)’ will be interpreted correctly by Python. It may be easier to enter ‘、’ and ‘)’ from the IME but they are syntax errors to the interpreter.

> I want to complete it with your instructions, your help and your intentions.
>
> It is almost on target. See, it is just as you expected!

It’s better to remove unnecessary code before publishing it. In auto0409.patch, there is an extra function UTF32ToBytes which only calls UTF32ToUTF8Character, <locale> is included for no apparent reason, and CharAdded’s argument is called ch in one place and utf32 in another.

UTF32ToUTF8Character uses division, modulo and plus which is unexpected: most UTF code (like scintilla/src/UniConversion.cxx) uses ands and ors. That makes it harder to check against another implementation.
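
For comparison, a sketch of a UTF-32 to UTF-8 encoder written with shifts, ands and ors. EncodeUTF8 is a hypothetical name, not the function in auto0409.patch or the one in UniConversion.cxx, and it assumes the input is a code point no larger than 0x10FFFF (surrogates are not rejected).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Each branch masks out 6-bit groups and ors in the UTF-8 marker bits.
inline std::string EncodeUTF8(unsigned int uch) {
    assert(uch <= 0x10FFFF);
    std::string out;
    if (uch < 0x80) {
        // 1 byte: 0xxxxxxx
        out.push_back(static_cast<char>(uch));
    } else if (uch < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (uch >> 6)));
        out.push_back(static_cast<char>(0x80 | (uch & 0x3F)));
    } else if (uch < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (uch >> 12)));
        out.push_back(static_cast<char>(0x80 | ((uch >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (uch & 0x3F)));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (uch >> 18)));
        out.push_back(static_cast<char>(0x80 | ((uch >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((uch >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (uch & 0x3F)));
    }
    return out;
}
```

Written this way, each branch can be checked line by line against the bit layout in its comment, which is harder to do with division and modulo.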

Neil

sonj john

Apr 13, 2015, 7:59:23 PM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
Thank you for your instructions.
Changed according to your instructions.
I am taking the plugin strategy as originally planned instead of the integration strategy.

auto0414.patch

Neil Hodgson

Apr 15, 2015, 3:37:49 AM
to scite-i...@googlegroups.com
If you want the character bytes in the current document encoding then they have just been inserted in the document so can be retrieved from there as the whole character before the caret.
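
To illustrate reading back "the whole character before the caret", here is a sketch that works on a plain std::string holding UTF-8 rather than on a Scintilla document. CharacterBefore and IsUTF8Trail are hypothetical helpers; in SciTE the same result would come from asking Scintilla for the position before the caret (SCI_POSITIONBEFORE) and fetching that text range, which also takes DBCS code pages into account.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation (trail) bytes, which look like 10xxxxxx.
inline bool IsUTF8Trail(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Hypothetical stand-in for reading the document: returns the bytes of the
// whole character that ends at 'caret' in a UTF-8 buffer.
inline std::string CharacterBefore(const std::string &doc, std::size_t caret) {
    assert(caret <= doc.size());
    if (caret == 0)
        return std::string();
    std::size_t start = caret - 1;
    // Step back over trail bytes to the lead byte; a character is at most 4 bytes.
    while (start > 0 && (caret - start) < 4 &&
           IsUTF8Trail(static_cast<unsigned char>(doc[start]))) {
        --start;
    }
    return doc.substr(start, caret - start);
}
```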

Neil

sonj john

Apr 15, 2015, 9:06:33 AM
to scite-i...@googlegroups.com, john...@dreamwiz.com, nyama...@me.com
The character bytes before the caret, in the current document encoding, are notified at the instant of typing through SCN_CHARADDED.
The UTF-32 code point from Scintilla contains the character bytes as they are in DBCS or UTF-8,
so there is no need for codecvt.h and no need for encodedFromUTF8.

johnsonj

Apr 17, 2015, 9:37:31 PM
to scite-i...@googlegroups.com, nyama...@me.com, john...@dreamwiz.com
Copied and pasted from SciTEGTK for IME autocompletion,
but no longer needed for it.
I tested that it works in UTF-8, but I am not sure about DBCS.

I hope it may help.
encode0414.patch

Neil Hodgson

Apr 21, 2015, 7:58:13 PM
to scite-i...@googlegroups.com
johnsonj:
With Scintilla calls that have output string parameters, when a NULL is given, it is supposed to return the size needed. That allows the caller to ask how big to make the buffer before allocating it. Neither of these work with NULL.

Both the segments are limited to arrays of 5 wchar_t when they will often be called with larger strings.
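
The size-query convention can be sketched as follows. EncodedFrom is a hypothetical stand-in (its "conversion" is just a byte copy, to keep the example self-contained) and ConvertAll shows the caller side: ask for the size with a null buffer, allocate, then convert, so that no fixed 5-element array is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical converter following the convention described above:
// with a null output buffer it only reports the number of bytes needed.
inline std::size_t EncodedFrom(const char *input, std::size_t inputLength, char *output) {
    assert(input != nullptr);
    if (output != nullptr)
        std::memcpy(output, input, inputLength);   // placeholder for a real conversion
    return inputLength;
}

// Caller side: first ask how big the buffer must be, then allocate and convert.
inline std::string ConvertAll(const std::string &input) {
    const std::size_t needed = EncodedFrom(input.data(), input.size(), nullptr);
    std::vector<char> buffer(needed);
    EncodedFrom(input.data(), input.size(), buffer.data());
    return std::string(buffer.begin(), buffer.end());
}
```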

Neil

johnsonj

Jul 6, 2015, 5:19:12 AM
to scite-i...@googlegroups.com, nyama...@me.com
After a lot of trial and error, I have come back to my initial plugin strategy again.
Here is my conclusion.
It is just simple.
autoIME0706.patch

Neil Hodgson

Jul 6, 2015, 7:21:15 PM
to scite-i...@googlegroups.com
johnsonj:

> After a lot of trial and error, I have come back to my initial plugin strategy again.
> Here is my conclusion.
> It is just simple.

This is similar to assuming all characters > 0xFF should be in the autocompletion starts character set and that the user wants automatically triggered autocompletion.

Automatically triggered autocompletion is a feature that users must currently opt into. Only 3 current .properties files have settings for autocomplete.<lexer>.start.characters and 2 of those set it to just “.” so that autocompletion appears when “.” is typed. Discovering the set of matching words is relatively expensive for something that may occur for each keystroke.

There should be some control over this so that users can decide if they want it.
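
One possible shape for that control, as a sketch only: the automatic trigger for characters above 0xFF is gated by an opt-in flag that is off unless the user sets it. AutoCompleteSettings, nonLatinTrigger and ShouldStartAutoComplete are hypothetical names, not SciTE code or real property names.

```cpp
#include <cassert>
#include <string>

// Hypothetical settings holder; in SciTE the values would come from .properties files.
struct AutoCompleteSettings {
    std::string startCharacters;   // e.g. "." as in autocomplete.<lexer>.start.characters
    bool nonLatinTrigger;          // hypothetical opt-in for characters above 0xFF
};

// Decide whether typing 'ch' should pop up autocompletion automatically.
inline bool ShouldStartAutoComplete(int ch, const AutoCompleteSettings &settings) {
    assert(ch >= 0);
    if (ch > 0xFF)
        return settings.nonLatinTrigger;   // only when the user opted in
    // Single-byte characters keep the existing rule: they must be listed explicitly.
    return settings.startCharacters.find(static_cast<char>(ch)) != std::string::npos;
}
```

With the flag off, behaviour for existing users is unchanged, and the relatively expensive word lookup is not run on every non-Latin keystroke.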

Neil

johnsonj

Jul 6, 2015, 11:21:55 PM
to scite-i...@googlegroups.com, nyama...@me.com
Thank you for your instructions.


"""This is similar to assuming all characters > 0xFF should be in the autocompletion starts character set and that the user wants automatically triggered autocompletion."""

"All characters > 0xFF", left untouched, should not be passed into Scintilla.
"All characters > 0xFF" does not limit the user's options.
Users have options in autocomplete.props.start.characters and calltip.props.word.characters.
Just do not put IME characters in them, and the autocompletion box or calltip box will not pop up.


"""Discovering the set of matching words is relatively expensive for something that may occur for each keystroke."""

I am not sure, but in practice it works well and fast for 2350 Hangul characters.
auto0707.patch

Neil Hodgson

Jul 10, 2015, 2:36:39 AM
to scite-i...@googlegroups.com
johnsonj:

<auto0707.patch>

   Committed as 

   ime.autocomplete isn’t really an accurate name as it appears for Russian or Greek language input and they do not use IMEs.

   Neil

johnsonj

Jul 10, 2015, 7:41:38 AM
to scite-i...@googlegroups.com, nyama...@me.com
Thank you for your hard work.

This patch does not support multibyte characters; it just blocks multibyte characters from being passed into Scintilla.
So 'multibyte.autocomplete' would not be a clear choice.
And 2-byte characters such as Greek or Russian have to be set up for input through the IME settings. (http://altec.colorado.edu/writing/IME_install_windows.shtml#recommendations)
So I think the name "ime.autocomplete" is not bad.

johnsonj

Jul 10, 2015, 7:51:55 AM
to scite-i...@googlegroups.com, nyama...@me.com
I have tested it with std::string.
I hope for a new function supported by Scintilla, regardless of my playing around.
It could also be used in Lua scripts.

   SCI_CHARACTERAT

It should return a document string (one character), not one byte.
I will keep playing around with std::string.

Neil Hodgson

Jul 11, 2015, 8:48:27 PM
to johnsonj, scite-i...@googlegroups.com
johnsonj:

> I hope for a new function supported by Scintilla, regardless of my playing around.
> It could also be used in Lua scripts.
>
> SCI_CHARACTERAT
>
> It should return a document string (one character), not one byte.

So similar to GetTextRange of position..PositionAfter(position)?
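
As a sketch of that equivalence for UTF-8 text held in a std::string (not a Scintilla document): PositionAfter here is a hypothetical analogue of SCI_POSITIONAFTER that only understands UTF-8, and CharacterAt returns the text range position..PositionAfter(position).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes in a UTF-8 character, judged from its lead byte.
inline std::size_t UTF8CharLength(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;   // invalid lead byte: treat as a single byte
}

// Hypothetical UTF-8-only analogue of SCI_POSITIONAFTER for a std::string.
inline std::size_t PositionAfter(const std::string &doc, std::size_t pos) {
    assert(pos < doc.size());
    const std::size_t next = pos + UTF8CharLength(static_cast<unsigned char>(doc[pos]));
    return next < doc.size() ? next : doc.size();   // clamp truncated characters
}

// The text range pos..PositionAfter(pos): one whole character, not one byte.
inline std::string CharacterAt(const std::string &doc, std::size_t pos) {
    return doc.substr(pos, PositionAfter(doc, pos) - pos);
}
```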

Neil

johnsonj

Jul 12, 2015, 2:15:03 AM
to scite-i...@googlegroups.com, nyama...@me.com
Thank you for your kind instructions.
You remind me of...

SCI_POSITIONBEFORE(int position)
SCI_POSITIONAFTER(int position)

That was the way I was thinking of.
I had better read the documentation more carefully again.