partial patch to fix the different encodings bug


Дмитрий Франк

Dec 20, 2011, 1:25:03 AM
to ecli...@googlegroups.com
Hello.

I have run into a problem. At work I use Windows and my language is Russian (Cyrillic), so the Windows encoding is cp1251. I have to use the same &encoding in Vim (cp1251), because if I set a different &encoding, all system messages turn into garbage inside Vim.

I edit files in utf-8, where Cyrillic characters are multi-byte. When I try to autocomplete in a file containing multi-byte characters, autocompletion is broken, because Vim's line2byte() function works with &encoding, not with &fileencoding. I wrote to vim...@googlegroups.com, but they said that this behavior will not change and suggested a workaround. Here is the thread on vim_dev: http://goo.gl/ef89i
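
To make the mismatch concrete, here is a minimal Java sketch (an illustration only, not part of the patch; the class name EncodingByteLength and the sample word are made up). It shows that the same Cyrillic text occupies a different number of bytes in cp1251 and in utf-8, which is why a byte offset computed in one encoding points to the wrong place in the other:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical class, not part of eclim or the patch.
class EncodingByteLength {
    // Number of bytes the text occupies when encoded with the given charset.
    static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // The Russian word for "hello" (six Cyrillic letters), written with
        // Unicode escapes so the encoding of this source file does not matter.
        String word = "\u041f\u0440\u0438\u0432\u0435\u0442";
        Charset cp1251 = Charset.forName("windows-1251");

        System.out.println("cp1251 bytes: " + byteLength(word, cp1251));
        System.out.println("utf-8 bytes:  " + byteLength(word, StandardCharsets.UTF_8));
    }
}
```

The six letters take 6 bytes in cp1251 and 12 bytes in utf-8, so every Cyrillic character before the cursor shifts the byte offset by one.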

I implemented this workaround; here's the patch: http://goo.gl/b5ztR (only one file is patched, eclim/autoload/eclim/util.vim).
Now autocomplete works for me on files with any encoding. Nice!

But this is only one part of the differing-encodings problem. The second part is that signs are misplaced if &encoding != &fileencoding. I just figured out that this is caused not by Vim but by eclimd: the result returned by eclim#ExecuteEclim() is already wrong. So the eclimd server needs to be fixed, and I'm not familiar with it.

How to reproduce the bug: start eclimd, open any Java file from a valid project, and do the following:

:set encoding=cp1251
:set fileencoding=utf-8

Insert at the beginning of the file a comment with any multi-byte text (for example, you can copy-paste this test text: http://goo.gl/rNkRF), and introduce some errors or warnings in the file. You will see that the signs are placed higher than they should be.

This is not very critical for me (less so than autocomplete), but it is still annoying. Could you please try to solve this?

Eric Van Dewoestine

Dec 21, 2011, 2:14:21 PM
to ecli...@googlegroups.com
On 2011-12-20 10:25:03, Дмитрий Франк wrote:
> Hello.
>
> I have run into a problem. At work I use Windows and my language is
> Russian (Cyrillic), so the Windows encoding is cp1251. I have to use the
> same &encoding in Vim (cp1251), because if I set a different &encoding,
> all system messages turn into garbage inside Vim.
>
> I edit files in utf-8, where Cyrillic characters are multi-byte. When I
> try to autocomplete in a file containing multi-byte characters,
> autocompletion is broken, because Vim's line2byte() function works with
> &encoding, not with &fileencoding. I wrote to vim...@googlegroups.com,
> but they said that this behavior will not change and suggested a
> workaround. Here is the thread on vim_dev: http://goo.gl/ef89i

I'm on the vim_dev mailing list as well so I've been following the
discussion.

> I implemented this workaround; here's the patch: http://goo.gl/b5ztR (only
> one file is patched, eclim/autoload/eclim/util.vim).
> Now autocomplete works for me on files with any encoding. Nice!

I haven't checked it in yet, but I've applied the necessary changes to
fix this.

> But this is only one part of the differing-encodings problem. The second
> part is that signs are misplaced if &encoding != &fileencoding. I just
> figured out that this is caused not by Vim but by eclimd: the result
> returned by eclim#ExecuteEclim() is already wrong. So the eclimd server
> needs to be fixed, and I'm not familiar with it.
>
> How to reproduce the bug: start eclimd, open any Java file from a valid
> project, and do the following:
>
> :set encoding=cp1251
> :set fileencoding=utf-8
>
> Insert *at the beginning of the file* a comment with any multi-byte text
> (for example, you can copy-paste this test text: http://goo.gl/rNkRF), and
> introduce some errors or warnings in the file. You will see that the signs
> are placed higher than they should be.
>
> This is not very critical for me (less so than autocomplete), but it is
> still annoying. Could you please try to solve this?

Although I could easily reproduce the code completion issue I've been
unable to reproduce the misplacement of the validation signs.

Can you check your default jvm file encoding by running the following
through a test class:

System.out.println(java.nio.charset.Charset.defaultCharset().name());
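
In case a complete file is handy, a minimal test class wrapping that call might look like the sketch below. The class and method names are made up; only Charset.defaultCharset() comes from the line above.

```java
import java.nio.charset.Charset;

// Hypothetical throwaway class for checking the JVM's default charset.
class ShowDefaultCharset {
    // Name of the charset the JVM uses when none is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(defaultCharsetName());
    }
}
```

The printed name depends on the platform and on how the JVM was started, so the result will differ between machines.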

--
eric

Дмитрий Франк

Dec 21, 2011, 2:38:02 PM
to ecli...@googlegroups.com


2011/12/21 Eric Van Dewoestine <erva...@gmail.com>
Eric, the result is "windows-1251"
(this is equivalent to cp1251) 


--
You received this message because you are subscribed to the Google Groups "eclim-dev" group.
To post to this group, send email to ecli...@googlegroups.com.
To unsubscribe from this group, send email to eclim-dev+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/eclim-dev?hl=en.


Eric Van Dewoestine

Dec 21, 2011, 2:44:30 PM
to ecli...@googlegroups.com

Hmm, that's what I've set mine to and I still can't reproduce the
issue. Can you give me a full example of a file that you can trigger
this issue with?

--
eric

Дмитрий Франк

Dec 21, 2011, 3:47:44 PM
to ecli...@googlegroups.com


On December 21, 2011 at 23:44, Eric Van Dewoestine <erva...@gmail.com> wrote:
Sure.
Here's the file Test.java: http://goo.gl/R7x0V
Here's a screenshot: http://goo.gl/QtmUE
 

Eric Van Dewoestine

Dec 21, 2011, 4:39:09 PM
to ecli...@googlegroups.com

I'm still having issues reproducing this. I've gone so far as to
attempt to set up Russian as my default language on my Windows VM, but
when I open that file in gvim using cp1251 as the encoding, those
comments display as a long series of garbage until I set encoding to
utf-8 (but then vim's messages become a different type of garbage like
your screenshot sent to vim_dev). Regardless of how the text looks
though, the sign always displays in the correct place.

One thing you can try that in theory would fix the problem, would be
to start eclimd with: -Dfile.encoding=utf-8

Since the files you are editing are using utf-8, but your eclipse is
defaulting to cp1251, forcing utf-8 to be the default should fix the
translation of eclipse offsets to vim line/column numbers.
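
A rough Java sketch of why the default charset matters here (an illustration with made-up names, not actual eclim or Eclipse code): if utf-8 bytes are decoded as cp1251, every two-byte Cyrillic character turns into two characters, so a character offset counted past that text no longer agrees with the real position in the file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical class, not code from eclim or Eclipse.
class DecodedOffsetDrift {
    // Character offset of `marker` in file content decoded with `charset`.
    static int offsetOf(byte[] fileBytes, Charset charset, String marker) {
        return new String(fileBytes, charset).indexOf(marker);
    }

    public static void main(String[] args) {
        // A comment holding six Cyrillic letters, then a class declaration.
        // Unicode escapes keep this source file itself pure ASCII.
        String source = "// \u041f\u0440\u0438\u0432\u0435\u0442\nclass Test {}\n";
        byte[] onDisk = source.getBytes(StandardCharsets.UTF_8);

        int asUtf8 = offsetOf(onDisk, StandardCharsets.UTF_8, "class");
        int asCp1251 = offsetOf(onDisk, Charset.forName("windows-1251"), "class");

        System.out.println("offset when decoded as utf-8:  " + asUtf8);
        System.out.println("offset when decoded as cp1251: " + asCp1251);
    }
}
```

With this sample the two offsets are 10 and 16: the six Cyrillic letters are counted twice when the bytes are read as cp1251.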

--
eric

Дмитрий Франк

Dec 21, 2011, 5:02:58 PM
to ecli...@googlegroups.com


On December 22, 2011 at 1:39, Eric Van Dewoestine <erva...@gmail.com> wrote:
Very strange. Dark encoding forces =)
 
> One thing you can try that in theory would fix the problem, would be
> to start eclimd with: -Dfile.encoding=utf-8


Wow! This is solved now! Thank you very much =))
Could you please post this solution to eclim's FAQ or Troubleshooting? I think I'm not the only one who uses eclim on Windows with a single-byte encoding.

**
By the way, unrelated to today's question but regarding eclim's documentation:
When I use embedded Vim, I run into the issue of the embedded gvim's command line being cut off.
The suggested options ( set guioptions-=m | set guioptions-=T ) didn't help me, but these did:

set guioptions-=L " remove left scrollbar when window is split vertically
set guioptions-=l " remove left scrollbar

Now the issue is gone. I would add these options to the docs on embedded eclim ( http://eclim.org/eclimd.html?highlight=embedded )

**
Thanks again!
 

Дмитрий Франк

Dec 21, 2011, 11:47:16 PM
to ecli...@googlegroups.com


On December 22, 2011 at 1:39, Eric Van Dewoestine <erva...@gmail.com> wrote:
Where can I define options for eclimd when it's started from Eclipse?
I want to use the embedded gvim, so eclimd is started from Eclipse automatically, and unfortunately I can't find where to set options for it.
 



Eric Van Dewoestine

Dec 22, 2011, 10:18:31 AM
to ecli...@googlegroups.com
On 2011-12-22 08:47:16, Дмитрий Франк wrote:
> > I'm still having issues reproducing this. I've gone so far as to
> > attempt to set up Russian as my default language on my Windows VM, but
> > when I open that file in gvim using cp1251 as the encoding, those
> > comments display as a long series of garbage until I set encoding to
> > utf-8 (but then vim's messages become a different type of garbage like
> > your screenshot sent to vim_dev). Regardless of how the text looks
> > though, the sign always displays in the correct place.
> >
> > One thing you can try that in theory would fix the problem, would be
> > to start eclimd with: -Dfile.encoding=utf-8
>
>
> Where can I define options for eclimd when it's started from Eclipse?
> I want to use the embedded gvim, so eclimd is started from Eclipse
> automatically, and unfortunately I can't find where to set options
> for it.

In this case file.encoding must be set at jvm startup time, so your
best bet would be to edit the eclipse.ini file in your eclipse home
directory and add -Dfile.encoding=utf-8 on a new line below the
-vmargs line.
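
As a sketch, the end of eclipse.ini would then look something like this. The -Xmx line is only a stand-in (an assumption, not taken from this thread) for whatever JVM options the file already contains:

```
-vmargs
-Dfile.encoding=utf-8
-Xmx512m
```

Everything after -vmargs is passed to the JVM, one argument per line, so the new option needs a line of its own.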

Eric Van Dewoestine

Dec 22, 2011, 4:58:05 PM
to ecli...@googlegroups.com
On 2011-12-22 02:02:58, Дмитрий Франк wrote:
> > I'm still having issues reproducing this. I've gone so far as to
> > attempt to set up Russian as my default language on my Windows VM, but
> > when I open that file in gvim using cp1251 as the encoding, those
> > comments display as a long series of garbage until I set encoding to
> > utf-8 (but then vim's messages become a different type of garbage like
> > your screenshot sent to vim_dev). Regardless of how the text looks
> > though, the sign always displays in the correct place.
> >
> >
> Very strange. Dark encoding forces =)
>
>
> > One thing you can try that in theory would fix the problem, would be
> > to start eclimd with: -Dfile.encoding=utf-8
> >
> >
> Wow! This is solved now! Thank you very much =))

Great to hear!

> Could you please post this solution to eclim's FAQ or Troubleshooting? I
> think I'm not the only one who uses eclim on Windows with a single-byte
> encoding.

I just checked in a change which adds an entry to the troubleshooting
section which links to the FAQ on how to set the default file
encoding.

> **
> By the way, unrelated to today's question but regarding eclim's
> documentation:
> When I use embedded Vim, I run into the issue of the embedded gvim's
> command line being cut off.
> The suggested options ( set guioptions-=m | set guioptions-=T ) didn't
> help me, but these did:
>
> set guioptions-=L " remove left scrollbar when window is split vertically
> set guioptions-=l " remove left scrollbar
>
> Now the issue is gone. I would add these options to the docs on
> embedded eclim ( http://eclim.org/eclimd.html?highlight=embedded )
>
> **

I also checked in an update to those docs suggesting these settings as
well. Thanks for tracking that down :)

> Thanks again!
>
>
> > Since the files you are editing are using utf-8, but your eclipse is
> > defaulting to cp1251, forcing utf-8 to be the default should fix the
> > translation of eclipse offsets to vim line/column numbers.

FYI, I also checked in the change[1] to GetOffset to handle the differing
fileencoding vs encoding:

[1] https://github.com/ervandew/eclim/commit/007a9be55073016962714dbce45827b2837d235a

--
eric

Дмитрий Франк

Dec 22, 2011, 11:56:26 PM
to ecli...@googlegroups.com


On December 23, 2011 at 1:58, Eric Van Dewoestine <erva...@gmail.com> wrote:
Great. Thanks for your responsiveness!

Дмитрий Франк

Dec 22, 2011, 11:58:23 PM
to ecli...@googlegroups.com


2011/12/22 Eric Van Dewoestine <erva...@gmail.com>

> On 2011-12-22 08:47:16, Дмитрий Франк wrote:
> > > I'm still having issues reproducing this. I've gone so far as to
> > > attempt to set up Russian as my default language on my Windows VM, but
> > > when I open that file in gvim using cp1251 as the encoding, those
> > > comments display as a long series of garbage until I set encoding to
> > > utf-8 (but then vim's messages become a different type of garbage like
> > > your screenshot sent to vim_dev). Regardless of how the text looks
> > > though, the sign always displays in the correct place.
> > >
> > > One thing you can try that in theory would fix the problem, would be
> > > to start eclimd with: -Dfile.encoding=utf-8
> >
> >
> > Where can I define options for eclimd when it's started from Eclipse?
> > I want to use the embedded gvim, so eclimd is started from Eclipse
> > automatically, and unfortunately I can't find where to set options
> > for it.
>
> In this case file.encoding must be set at jvm startup time, so your
> best bet would be to edit the eclipse.ini file in your eclipse home
> directory and add -Dfile.encoding=utf-8 on a new line below the
> -vmargs line.


Yeah, thanks, that helped.
 
