Annotation encoding in C#


Misha Konvisar

Aug 15, 2013, 4:40:21 PM
to scintilla...@googlegroups.com
Hello everyone,

I'm having a minor issue.
I'm writing a plugin for Notepad++ that adds annotations to lines in a document.
The problem is that I can't display Cyrillic text in an annotation when the document uses UTF-8 encoding.
When I switch the document to ANSI, the Cyrillic annotation text is displayed correctly, but the Cyrillic text in the document is not.
I just see strange, unreadable characters in the annotation box.

This is how I add the annotation box:
public static void AddCommentToLine(int position, string text)
{
    Encoding unicode = Encoding.Unicode;

    // I've tried different output encodings here, e.g. GetEncoding(1253), but with no result...
    Encoding encoder = Encoding.UTF8;

    string strEncoded = encoder.GetString(Encoding.Convert(unicode, encoder, unicode.GetBytes(text)));
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strEncoded);
}

I had a similar problem when reading a line from the document, but it was solved by treating Scintilla's output as UTF-8; after that, my C# code was able to work with Scintilla text correctly.

Has anybody faced this problem?
I think my problem lies in transferring the Unicode string to the Scintilla editor.
Should I use some styles for annotations?

Thanks in advance!

Neil Hodgson

Aug 15, 2013, 8:01:42 PM
to scintilla...@googlegroups.com
Misha Konvisar:

> Writing a plugin for Notepad++. This plugin is adding annotations to lines in document.
> The problem is that I'm not able to display a Cyrillic text in annotation when I have my document in UTF encoding.

   The encoding used for annotations is the same as for the document. For a document in UTF-8, the annotations should also be UTF-8.
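One way to follow this advice is to pick the encoding from the document's code page before building the annotation bytes. The following is a minimal sketch, not part of the original thread: `AnnotationEncoding`, `ForCodePage` and `Encode` are hypothetical names, and it assumes the plugin has already obtained the code page (for example with `SCI_GETCODEPAGE`). Treating code page 0 as the system default ANSI code page is also an assumption that holds best on .NET Framework.

```csharp
using System;
using System.Text;

static class AnnotationEncoding
{
    // Value of SC_CP_UTF8 in Scintilla.h; the document's code page
    // has this value when the document is UTF-8.
    public const int SC_CP_UTF8 = 65001;

    // Hypothetical helper: map a Scintilla code page to a .NET Encoding.
    public static Encoding ForCodePage(int codePage)
    {
        if (codePage == SC_CP_UTF8)
            return Encoding.UTF8;

        if (codePage == 0)
            // Single-byte ("ANSI") document. Assumption: the system default
            // code page is the right one (true on .NET Framework; on .NET Core
            // Encoding.Default is UTF-8).
            return Encoding.Default;

        // Other code pages (for example DBCS ones). On .NET Core this
        // requires the code pages encoding provider to be registered.
        return Encoding.GetEncoding(codePage);
    }

    // Bytes to send as annotation text for a document with this code page.
    public static byte[] Encode(string text, int codePage)
    {
        return ForCodePage(codePage).GetBytes(text);
    }
}
```

With this, an annotation for a UTF-8 document would be built with `AnnotationEncoding.Encode(text, 65001)`, and the same call with the ANSI code page would produce bytes matching the ANSI document.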

   Here's an image of a UTF-8 file in SciTE showing Unicode annotations by using its error.inline feature:

   [image]

> Should I use some styles for annotations?

   Yes, you should set the styles for annotations. First try the same style settings as the text being annotated.

   Neil

Misha Konvisar

Aug 16, 2013, 4:33:50 AM
to scintilla...@googlegroups.com
Hi Neil,

Thank you for your help.

I'm trying to set the annotation style this way:

// Before adding the annotation, read the style at the position (I guess this should be a character position, not a line)
int style = (int)Win32.SendMessage(curScintilla, SciMsg.SCI_GETSTYLEAT, position, 0);

//add annotation to line
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, text);

//apply saved style to annotation
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETSTYLE, position, style);

But my annotation is still unreadable.

When I switch the Notepad++ encoding to ANSI, annotations are displayed correctly, but Cyrillic text in the document is unreadable.
[Inline image 1]

When the Notepad++ encoding is switched to UTF-8, the situation is reversed.
[Inline image 2]


In the Scintilla documentation I couldn't find any style message that sets the annotation encoding.
Could you please clarify what you mean by "First try the same style settings as the text being annotated"? Is the SCI_GETSTYLEAT message a good way to do that?
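One detail worth checking in the snippet above is which index each message expects: in the Scintilla API, SCI_GETSTYLEAT takes a character position, while the annotation messages take a line number. The sketch below is an illustration, not code from the thread. `AnnotationStyling`, `SendFn` and `CopyStyleToAnnotation` are hypothetical names, `SendFn` stands in for the plugin's own `Win32.SendMessage` bound to one Scintilla window, and the message numbers are the values I believe Scintilla.h defines, so verify them against your header. It only addresses the position/line distinction and does not by itself change how the annotation bytes are decoded.

```csharp
using System;

static class AnnotationStyling
{
    // Message numbers as defined in Scintilla.h (verify against your copy).
    public const int SCI_GETSTYLEAT = 2010;
    public const int SCI_LINEFROMPOSITION = 2166;
    public const int SCI_ANNOTATIONSETSTYLE = 2542;

    // Hypothetical stand-in for a SendMessage call bound to one Scintilla window.
    public delegate IntPtr SendFn(int msg, IntPtr wParam, IntPtr lParam);

    // Reads the style of the character at 'position' and applies it to the
    // annotation of the line that contains that position.
    public static void CopyStyleToAnnotation(SendFn send, int position)
    {
        // SCI_GETSTYLEAT expects a character position.
        int style = (int)send(SCI_GETSTYLEAT, (IntPtr)position, IntPtr.Zero);

        // The annotation messages expect a line number, so convert first.
        int line = (int)send(SCI_LINEFROMPOSITION, (IntPtr)position, IntPtr.Zero);

        send(SCI_ANNOTATIONSETSTYLE, (IntPtr)line, (IntPtr)style);
    }
}
```

Passing the send function as a delegate keeps the sketch independent of any particular P/Invoke signature and lets it be exercised with a fake window.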


Thank you.




2.png
1.png
PastedGraphic-1.tiff

zebrox

Aug 16, 2013, 9:00:16 AM
to scintilla...@googlegroups.com
Still no success.
I've defined AnnotationStyleId = 3 and set its character set to Cyrillic:

Win32.SendMessage(curScintilla, SciMsg.SCI_STYLESETCHARACTERSET, AnnotationStyleId, (int)SciMsg.SC_CHARSET_CYRILLIC);

Then, in the code that adds the annotation, I convert the text to UTF-8, add the annotation, and apply the AnnotationStyleId style to it.
But I still get those strange characters...


public static void AddCommentToLine(int position, string text)
{
    Encoding encSrc = Encoding.Unicode;
    Encoding encDest = Encoding.UTF8;
    string strEncoded = encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));

    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strEncoded);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETSTYLE, position, AnnotationStyleId);
}

Now I'm a bit stuck...

Neil Hodgson

Aug 16, 2013, 9:21:27 AM
to scintilla...@googlegroups.com
zebrox:

> Encoding encSrc = Encoding.Unicode;
> Encoding encDest = Encoding.UTF8;
> string strEncoded = encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));

That looks confusing. Dump the bytes before and after conversion along with what you think the text should be.
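A small helper along these lines can make such a dump easy to read. This is a sketch under assumptions: `ByteDump`, `Hex` and `RoundTrip` are hypothetical names, and `RoundTrip` simply repeats the quoted conversion so its effect can be inspected.

```csharp
using System;
using System.Text;

static class ByteDump
{
    // Hex dump of the bytes a string produces in the given encoding,
    // for example "D1-8B" for one Cyrillic letter in UTF-8.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // The conversion from the quoted code, unchanged: UTF-16 bytes are
    // converted to UTF-8 bytes and then decoded back into a string.
    public static string RoundTrip(string text)
    {
        Encoding encSrc = Encoding.Unicode;
        Encoding encDest = Encoding.UTF8;
        return encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));
    }
}
```

For the four-letter string "ыыыы", `Hex(Encoding.Unicode, text)` gives `4B-04-4B-04-4B-04-4B-04` and `Hex(Encoding.UTF8, text)` gives `D1-8B-D1-8B-D1-8B-D1-8B`, while `RoundTrip(text)` returns a string equal to the input. In other words, the quoted conversion ends with the same UTF-16 string it started with, so whatever the marshaller does with a string is unaffected by it.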

Neil

Misha Konvisar

Aug 16, 2013, 9:36:13 AM
to scintilla...@googlegroups.com
Hi Neil,

Thanks for the answer. I've already done that, but I don't know how to interpret the results.

The code is as follows:

Encoding encSrc = Encoding.Unicode;
Encoding encDest = Encoding.UTF8;

text = "ыыыы";
ShowBytes("before", encSrc, text);

string strEncoded = encDest.GetString(Encoding.Convert(encSrc, encDest, encSrc.GetBytes(text)));
ShowBytes("after", encDest, strEncoded);

and the ShowBytes outputs are:
[Inline image 1] [Inline image 2]



1.png
2.png

zebrox

Aug 16, 2013, 10:11:20 AM
to scintilla...@googlegroups.com
public static void AddCommentToLine(int position, string text)
{
    // Error 1: the direction of conversion was wrong
    Encoding encSrc = Encoding.UTF8;
    Encoding encDest = Encoding.Unicode;

    // Error 2: the conversion procedure was wrong
    string strEncoded = encDest.GetString(encSrc.GetBytes(text));

    // Error 3: passing a managed string to unmanaged code needs some tricks
    //http://stackoverflow.com/questions/11090427/make-intptr-in-c-net-point-to-string-value
    IntPtr strPtr = Marshal.StringToHGlobalUni(strEncoded);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, strPtr);
    Marshal.FreeHGlobal(strPtr);
}
So finally I got my annotations in Russian!

Thanks for help, Neil.

Dave Brotherstone

Aug 16, 2013, 11:20:47 AM
to scintilla...@googlegroups.com
You shouldn't actually need to marshal the string when passing UTF-8 text to Scintilla (only when you want Scintilla to fill a buffer for you, and even then a StringBuilder with a reserved capacity is normally marshalled correctly automatically). The problem was your encoding and subsequent decoding of the string.

Strings in C# are always UTF-16 (that is, Encoding.Unicode). Whenever you have a string object, it is encoded internally as UTF-16; there's no such thing as a C# "string" object that has a different encoding. So when you call encSrc.GetBytes(text), you get the UTF-8 bytes of the string. When you then pass those to encDest.GetString(...), you're passing in a UTF-8 byte sequence and asking for it to be treated as UTF-16, which is then converted to a string object. This obviously comes out as garbage.

What you want to do is *just* convert the string to UTF-8, then pass *those* bytes to Scintilla.
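The effect described here can be reproduced without Scintilla. The sketch below is illustrative only (`Reinterpretation` and its methods are hypothetical names): it decodes UTF-8 bytes as UTF-16 and then checks whether writing that string back out as UTF-16 happens to reproduce the original UTF-8 bytes.

```csharp
using System;
using System.Text;

static class Reinterpretation
{
    // What the earlier attempt did: take the UTF-8 bytes of the text
    // and decode them as if they were UTF-16.
    public static string Utf8BytesReadAsUtf16(string text)
    {
        return Encoding.Unicode.GetString(Encoding.UTF8.GetBytes(text));
    }

    // True when writing the reinterpreted string back out as UTF-16
    // yields exactly the original UTF-8 bytes.
    public static bool SurvivesReinterpretation(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] back = Encoding.Unicode.GetBytes(Utf8BytesReadAsUtf16(text));
        return Convert.ToBase64String(utf8) == Convert.ToBase64String(back);
    }
}
```

For "ыыыы" the reinterpreted string is four unrelated characters (U+8BD1 repeated), yet `SurvivesReinterpretation` is true because the UTF-8 form has an even number of bytes. That is most likely why the version using `Marshal.StringToHGlobalUni` appeared to work: copying the UTF-16 code units to unmanaged memory writes the original UTF-8 bytes back out. For a text whose UTF-8 form has an odd number of bytes, such as "ыa", the check fails, so that approach is fragile compared with sending the UTF-8 bytes directly.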

I don't know the signature of the Win32.SendMessage method, but I expect the following would do what you're after.

byte[] utf8Text = Encoding.UTF8.GetBytes(text);
Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, utf8Text);

Depending on the signature, you might need to cast it to something.

Hope that helps,
Dave.

PS If you've not seen it before, http://www.joelonsoftware.com/articles/Unicode.html is a great article on how all this unicode/utf8/utf16 stuff fits together.

Misha Konvisar

Aug 16, 2013, 6:44:35 PM
to scintilla...@googlegroups.com

Hi Dave,

Thanks for the comment and the interesting link.
Your suggestion works. There's no need for strange conversions.

But I still had to do two things:
1. Add a terminating zero to the original string, because Scintilla was displaying random characters at the end of the annotation.
2. Obtain an IntPtr to the UTF-8 byte[] array, because I don't have a Win32.SendMessage overload that accepts byte[] as the fourth parameter.

So, the final code looks like this:

public static void AddCommentToLine(int position, string text)
{
    // Add a terminating zero to the string
    text += char.MinValue;
    byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

    // Copy the bytes to unmanaged memory to get an IntPtr
    //http://stackoverflow.com/questions/537573/how-to-get-intptr-from-byte-in-c-sharp
    IntPtr unmanagedPointer = Marshal.AllocHGlobal(utf8Bytes.Length);
    Marshal.Copy(utf8Bytes, 0, unmanagedPointer, utf8Bytes.Length);
    Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, unmanagedPointer);
    Marshal.FreeHGlobal(unmanagedPointer);
}
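The two steps above can also be factored into small helpers. This is a sketch under assumptions rather than the code used in the thread: `AnnotationText`, `ToNullTerminatedUtf8` and `WithUnmanagedCopy` are hypothetical names, and the callback is where the plugin's own `Win32.SendMessage` call would go. It appends the terminator as an explicit zero byte and frees the unmanaged buffer in a `finally` block.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class AnnotationText
{
    // UTF-8 bytes of the text followed by a single 0 byte (a C string).
    public static byte[] ToNullTerminatedUtf8(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        byte[] buffer = new byte[utf8.Length + 1]; // new arrays are zero-filled
        Buffer.BlockCopy(utf8, 0, buffer, 0, utf8.Length);
        return buffer;
    }

    // Copies the bytes to unmanaged memory, hands the pointer to 'send',
    // and frees the memory even if 'send' throws.
    public static void WithUnmanagedCopy(byte[] bytes, Action<IntPtr> send)
    {
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length);
        try
        {
            Marshal.Copy(bytes, 0, ptr, bytes.Length);
            send(ptr);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

A call site might look like `AnnotationText.WithUnmanagedCopy(AnnotationText.ToNullTerminatedUtf8(text), p => Win32.SendMessage(curScintilla, SciMsg.SCI_ANNOTATIONSETTEXT, position, p));`, reusing the names from the code above. The terminator matters because the unmanaged buffer holds exactly the copied bytes; without a zero byte inside it, whatever follows the buffer in memory decides where the text appears to end.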

W.M.

Sep 2, 2016, 5:32:34 PM
to scintilla-interest, mkon...@gmail.com
Misha, Dave and others: thank you very much. The final code as presented by Misha works perfectly. In my case, I found no need for the `char.MinValue` piece. Thanks :-)