Error: This string is not supported in this configuration.

Aleksa Petrovic

Aug 18, 2023, 5:07:35 PM
to niceeditor

Hello,

I have a file that contains UTF-8 Unicode text. When I start typing part of a Cyrillic string and then call the AutoComplete command, I get the error
This string is not supported in this configuration.
Please tell me why this happens and how to fix it.
The version of NE that I use is 3.3.2 (2022-09-27).
The UTF-8 flag is set to true, as is the UTF-8 automatic detection flag, and in the file .default#ap I have set the UTF8Auto flag to 1.

Best regards,

Aleksa Petrović

Sebastiano Vigna

Aug 22, 2023, 7:42:28 AM
to nicee...@googlegroups.com
We'll need a much more detailed bug report to look at this. A file, exact position, etc.

Ciao,

seba

uto...@gmail.com

Aug 22, 2023, 4:40:42 PM
to niceeditor
There's only one place in the entire code base where that error is set, and that's in the INSERTSTRING_A action. It would help to know in much more detail exactly what's happening in your case. It looks as though you've entered a prefix containing some Cyrillic, but we don't know what encoding it uses. Since the error can only come from INSERTSTRING, I presume one of two things happened: either you were shown a list of possible completions and selected one, or all possible completions share a common prefix. In either case, autocomplete is trying to insert your selection or the common prefix into your document.

Here's where it gets tricky (quoting a comment in ne.h):
   A buffer or clip at any given time may be marked as ASCII, UTF-8 or
   8-bit. This means, respectively, that it contains just bytes below 0x80, an
   UTF-8-encoded byte sequence or an arbitrary byte sequence.

   The attribution is lazy, but stable. This means that a buffer starts as ASCII,
   and becomes UTF-8 or 8-bit as soon as some insertion makes it necessary. This
   is useful to delay the encoding choice as much as possible.

The string returned from AUTOCOMPLETE is evaluated for its encoding independently of its source. For example, if it happens to contain only ASCII characters (i.e., bytes below 0x80), it doesn't matter what its source document's encoding was. If the string is UTF-8 or 8-bit, it can only be inserted into a UTF-8 or an 8-bit document, respectively, OR into an ASCII document, with the side effect of changing the target document's encoding from ASCII to match that of the inserted string.

Given that your document is marked UTF-8, my guess is that INSERTSTRING has determined that the string returned by AUTOCOMPLETE is 8-bit, and therefore can't be inserted into a UTF-8 document, thus the error. It could be inserted into an ASCII document, but with the side effect of changing that document's encoding from ASCII to 8-bit. The same 8-bit string could be inserted into an 8-bit encoded document with no side effects.
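The classification and insertion rule described above can be sketched in C (the language ne is written in). This is a minimal sketch, not ne's code: the names `encoding_t`, `detect_encoding` and `can_insert` are hypothetical, and the UTF-8 check is deliberately simplified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration; ne's actual types differ. */
typedef enum { ENC_ASCII, ENC_UTF8, ENC_8BIT } encoding_t;

/* Length of the UTF-8 sequence starting at s[i], or 0 if the bytes there
   are not a plausible sequence. Simplified: it checks lead and continuation
   bytes only, so overlong 3/4-byte forms, surrogates and code points above
   U+10FFFF are not rejected. */
static size_t utf8_seq_len(const unsigned char *s, size_t len, size_t i) {
    unsigned char c = s[i];
    size_t n;
    if (c < 0x80) return 1;                  /* plain ASCII byte */
    else if (c >= 0xC2 && c <= 0xDF) n = 2;
    else if (c >= 0xE0 && c <= 0xEF) n = 3;
    else if (c >= 0xF0 && c <= 0xF4) n = 4;
    else return 0;                           /* stray continuation or invalid lead byte */
    if (n > len - i) return 0;               /* sequence cut off by end of string */
    for (size_t k = 1; k < n; k++)
        if ((s[i + k] & 0xC0) != 0x80) return 0;   /* expected 10xxxxxx */
    return n;
}

/* Only bytes below 0x80 -> ASCII; valid UTF-8 -> UTF-8; anything else -> 8-bit. */
encoding_t detect_encoding(const unsigned char *s, size_t len) {
    bool ascii = true;
    for (size_t i = 0; i < len; ) {
        size_t n = utf8_seq_len(s, len, i);
        if (n == 0) return ENC_8BIT;
        if (n > 1) ascii = false;
        i += n;
    }
    return ascii ? ENC_ASCII : ENC_UTF8;
}

/* ASCII strings fit anywhere; otherwise the string must match the document,
   unless the document is still ASCII, in which case the document's encoding
   is upgraded to the string's ("lazy, but stable"). */
bool can_insert(encoding_t *doc, encoding_t str) {
    if (str == ENC_ASCII || str == *doc) return true;
    if (*doc == ENC_ASCII) { *doc = str; return true; }
    return false;   /* UTF-8 into 8-bit, or 8-bit into UTF-8: rejected */
}
```

With a UTF-8 document, `can_insert` rejects an 8-bit string and leaves the document's encoding unchanged, which corresponds to the error in question; with an ASCII document, the same string is accepted and the document becomes 8-bit.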

The only options I see for making this work are (1) express your Cyrillic in UTF-8 so that it can be inserted into a UTF-8 document, or (2) use 8-bit encoding for both your autocomplete sources and your target document.

I don't think this is a bug; it's the only option given the constraints imposed by the document encoding conventions at play.

pepa65

Aug 23, 2023, 3:30:04 AM
to niceeditor
If this is the case, another legitimate option is to (automatically, as a service to the user) convert the 8-bit string into UTF-8 and then insert it.

I don't know if that would require too much extra code, but that would be very nice.
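To give a sense of how much code a single-codepage conversion needs, here is a minimal C sketch. It assumes the 8-bit text is ISO-8859-5 (one of several Cyrillic codepages); that assumption is the crux, because an 8-bit string carries no information about which codepage it uses. The function names are hypothetical and not part of ne.

```c
#include <stddef.h>
#include <stdint.h>

/* Map one ISO-8859-5 byte to a Unicode code point (assumed source codepage). */
static uint32_t iso8859_5_to_ucs(unsigned char b) {
    if (b < 0xA1) return b;          /* ASCII, C1 controls and 0xA0 (NBSP) map 1:1 */
    switch (b) {
    case 0xAD: return 0x00AD;        /* soft hyphen */
    case 0xF0: return 0x2116;        /* numero sign */
    case 0xFD: return 0x00A7;        /* section sign */
    default:   return b + 0x360u;    /* Cyrillic letters: U+0401..U+045F */
    }
}

/* Encode a code point below U+10000 as UTF-8; returns the bytes written (1-3). */
static size_t put_utf8(uint32_t cp, unsigned char *out) {
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Convert len bytes of ISO-8859-5 text to UTF-8. out must hold at least
   3 * len bytes. Returns the number of UTF-8 bytes written. */
size_t iso8859_5_to_utf8(const unsigned char *in, size_t len, unsigned char *out) {
    size_t o = 0;
    for (size_t i = 0; i < len; i++)
        o += put_utf8(iso8859_5_to_ucs(in[i]), out + o);
    return o;
}
```

A different codepage (CP1251, KOI8-R, ...) would need a different table, which is why the choice cannot be made automatically without extra information from the user.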

uto...@gmail.com

Aug 23, 2023, 6:56:43 AM
to niceeditor
It would be nice, but it isn't really part of the design. Ne has no notion of "codepages"; text is either ASCII, UTF-8 (which is a superset of ASCII), or "everything else", which ne lumps together as "8-bit". It probably wouldn't be too hard to convert from any single codepage to UTF-8, but in my opinion a general solution should live outside of ne. Ne itself is unlikely to be a key component of a workflow that routinely requires editing text in varying codepages, i.e., outside the scope of ASCII/UTF-8.

If anyone would like to study the problem, you might as well start here:

Aleksa Petrovic

Aug 28, 2023, 3:23:56 PM
to niceeditor
Hello.
It is confusing which options affect this. I was experimenting with the UTF-8 options: I wrote two strings using the Greek alphabet and then tried to complete a string to one of them. It completes the common prefix, but it won't let me type one of the characters that would complete one of the strings. The same goes for certain Latin non-ASCII characters.

Aleksa Petrovic

Aug 28, 2023, 3:36:28 PM
to niceeditor
When using the Greek alphabet, as I said, it completes the prefix, but I can't type the next character to complete the string further. With Cyrillic, however, it fails to complete even the prefix, giving the error I mentioned in my first post: "This string is not supported in this configuration."

Aleksa Petrovic

Sep 6, 2023, 6:03:06 AM
to niceeditor
Hello. I was testing, and I have some strings that trigger this error. If you have the strings ыкув and ыкуа, or средњ (средљ) and средс, autocomplete fails to complete the common prefix for any of them. I don't know why љ and њ trigger it. With Greek, you can't select the string to complete to by typing its letters (you can only select it manually with the arrow keys), even though I have UTF8Auto on.

Sebastiano Vigna

Sep 6, 2023, 6:09:06 AM
to nicee...@googlegroups.com


It would be very useful if you could provide a file that reproduces the problem, as a lot of UTF-8 stuff changes when it's pasted and goes through email.

Ciao,

seba

Aleksa Petrovic

Sep 6, 2023, 10:01:38 AM
to niceeditor
I am sending the file that contains letters (all the letters are from the same script).

Sebastiano Vigna

Sep 6, 2023, 10:08:44 AM
to nicee...@googlegroups.com


> On 6 Sep 2023, at 16:01, Aleksa Petrovic <aleksa.petr...@gmail.com> wrote:
>
> I am sending the file that contains letters (all the letters are from the same script).
> https://drive.google.com/file/d/1WqpQxOIU5aKbGj4Ni0O681PxILpK5zSe/view?usp=sharing
>

Ok. It looks like a genuine bug!

Ciao,

seba

uto...@gmail.com

Sep 23, 2023, 12:53:17 PM
to niceeditor
The master branch at https://github.com/vigna/ne has a fix for this.

The issue was in finding the longest common prefix. The 5-character strings in your sample file all take 2 bytes per character to encode, so 10 bytes each. They share the first 9 bytes; the 10th byte distinguishes "с" from "њ". The code was incorrectly splitting the last character, claiming the first 4½ characters as the common prefix. That 4½-character prefix isn't valid UTF-8, so it was classified as 8-bit, which can't be inserted into a UTF-8 document without changing the document's encoding to 8-bit as well.
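The fix described above amounts to trimming a byte-level common prefix back to a character boundary. Here is a minimal C sketch assuming both inputs are valid UTF-8; `common_prefix_utf8` is a hypothetical name, not the function actually changed in ne's master branch.

```c
#include <stddef.h>

/* Byte length of the longest common prefix of a and b, trimmed so that it
   never ends in the middle of a UTF-8 multibyte character. */
size_t common_prefix_utf8(const unsigned char *a, size_t alen,
                          const unsigned char *b, size_t blen) {
    size_t n = 0;

    /* Plain byte-by-byte comparison. On its own this can stop inside a
       character, which is the 4.5-character case described above. */
    while (n < alen && n < blen && a[n] == b[n]) n++;

    /* If the next byte in either string is a continuation byte (10xxxxxx),
       the cut fell inside a character: back up to that character's lead byte. */
    while (n > 0 && ((n < alen && (a[n] & 0xC0) == 0x80) ||
                     (n < blen && (b[n] & 0xC0) == 0x80)))
        n--;
    return n;
}
```

For "средњ" and "средс" the byte comparison stops after 9 bytes (the shared lead byte 0xD1 of "њ"/"с"), and the back-off returns 8 bytes, i.e., the valid 4-character prefix "сред". Likewise, "ыкув" and "ыкуа" yield 6 bytes ("ыку").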

Thanks for bringing this issue to our attention!
--
Todd

Aleksa Petrovic

unread,
Sep 24, 2023, 5:40:33 PM9/24/23
to niceeditor
Thank you very much for solving this problem! I have just tried the master branch of NE, and I can confirm that this works for the different scripts I tried.
And thank you for the excellent work you do on this editor!

Aleksa Petrovic

Sep 24, 2023, 5:43:29 PM
to niceeditor
Just to clarify: is one supposed to be able to select which string to complete to by typing the distinguishing characters when those characters are UTF-8?
I see it can complete UTF-8 strings now, but can we select them by typing UTF-8 characters?

uto...@gmail.com

Sep 24, 2023, 6:22:03 PM
to niceeditor
That's the intent. If you knew where the cursor was supposed to be, you could select by entering UTF-8 characters. However, as you've probably discovered, there are still some issues displaying requesters with UTF-8 strings. In particular, the cursor is displayed farther to the right than it should be because the code counts bytes rather than characters. It will take a bit more work to get those issues worked out.
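Counting characters instead of bytes when placing the cursor can be sketched as follows. This is a minimal C sketch with a hypothetical function name, assuming well-formed UTF-8; it counts code points, which equal screen columns only for narrow, non-combining characters (wide CJK characters would need something like POSIX `wcwidth()`).

```c
#include <stddef.h>

/* Number of code points in the first `bytes` bytes of a well-formed UTF-8
   string: every byte that is not a continuation byte (10xxxxxx) starts a
   new character. */
size_t utf8_char_count(const unsigned char *s, size_t bytes) {
    size_t chars = 0;
    for (size_t i = 0; i < bytes; i++)
        if ((s[i] & 0xC0) != 0x80) chars++;
    return chars;
}
```

For example, after typing "сред" into a requester (8 bytes), the cursor belongs 4 columns past the start of the input, not 8.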