Problem printing out central european characters

Luka Djigas

Dec 26, 2011, 11:05:41 AM

This is especially annoying, since my old _vimrc had something which
enabled printing them out correctly, but I've lost it so...

I'm trying to get Vim to print correctly Central European characters
(šđčćž ŠĐČĆŽ). Printing them out to PDF currently gives out garbage.

Anyone have any idea where the problem might lie?

--------
My vimrc contains:

set encoding=utf-8
set fileencodings=ucs-bom,utf8,cp1250,latin1
set guifont=Consolas:h9:cEASTEUROPE
set printfont=Consolas:h9:cEASTEUROPE

Using the latest Vim copy (OLE ...) 7.3. from vim.org
--------

-- Luka

Andreas Prilop

Dec 27, 2011, 11:29:45 AM

On Mon, 26 Dec 2011, Luka Djigas wrote:

> I'm trying to get Vim to print correctly Central European characters

> Printing them out to PDF currently gives out garbage.

Explain exactly what "garbage" means!
For example:
In each place where a letter "C with acute accent" is expected,
a small picture of a yellow rose is shown instead.

> set encoding=utf-8

> set guifont=Consolas:h9:cEASTEUROPE
> set printfont=Consolas:h9:cEASTEUROPE

This is a contradiction. cEASTEUROPE means code page 1250 or
Windows-1250, which is *not* UTF-8.

--
From the New World:
http://www.google.com/search?ie=ISO-8859-2&q=Dvofi%E1k

Luka Djigas

Dec 28, 2011, 1:12:16 PM

On Tue, 27 Dec 2011 17:29:45 +0100, Andreas Prilop
<prilo...@trashmail.net> wrote:

>On Mon, 26 Dec 2011, Luka Djigas wrote:
>
>> I'm trying to get Vim to print correctly Central European characters
>> Printing them out to PDF currently gives out garbage.
>
>Explain exactly what "garbage" means!
>For example:
> In each place where a letter "C with acute accent" is expected,
> a small picture of a yellow rose is shown instead.

Andreas, I don't know whether you are serious or trying to get into a
trolling match.
Garbage ... I don't know how to describe it better. Some machine code
symbols, something from a character set table far away from anything
used in normal text ... garbage.

Here, this is what it looks like after a copy/paste attempt
A!Ä‘Ä?Ä‡A3A Ä?ÄOÄ†A1

I couldn't say whether that reminds anyone of a field of yellow roses.

>
>> set encoding=utf-8
>> set guifont=Consolas:h9:cEASTEUROPE
>> set printfont=Consolas:h9:cEASTEUROPE
>
>This is a contradiction. cEASTEUROPE means code page 1250 or
>Windows-1250, which is *not* UTF-8.

I've tried changing both the guifont, and the printfont to just
Consolas:h9. The hardcopy "printout" still remains bad.

-- Luka

Andreas Prilop

Dec 28, 2011, 1:37:50 PM

On Wed, 28 Dec 2011, Luka Djigas wrote:

> Andreas, I don't know whether you are serious

Yes.

> or trying to get into a trolling match.

No.

> Garbage ... I don't know how to describe it better.

Can you provide an image/a screen shot?

>>> set guifont=Consolas:h9:cEASTEUROPE
>>> set printfont=Consolas:h9:cEASTEUROPE
>

> I've tried changing both the guifont, and the printfont to just
> Consolas:h9. The hardcopy "printout" still remains bad.

What is your value for "fileencoding"?
NB: "fileencoding" and "fileencodings" are something different.
Try
set fileencoding=cp1250

Luka Djigas

Dec 29, 2011, 9:35:07 AM

On Wed, 28 Dec 2011 19:37:50 +0100, Andreas Prilop
<prilo...@trashmail.net> wrote:

>> or trying to get into a trolling match.
>
>No.

My apologies...

I swear, I thought you were a troll :/

>
>> Garbage ... I don't know how to describe it better.
>
>Can you provide an image/a screen shot?
>

Yes, http://i44.tinypic.com/120tr93.jpg

>>>> set guifont=Consolas:h9:cEASTEUROPE
>>>> set printfont=Consolas:h9:cEASTEUROPE
>>
>> I've tried changing both the guifont, and the printfont to just
>> Consolas:h9. The hardcopy "printout" still remains bad.
>
>What is your value for "fileencoding"?
>NB: "fileencoding" and "fileencodings" are something different.
>Try
> set fileencoding=cp1250

All my options related to encoding and fonts:

set encoding=utf-8
set fileencodings=ucs-bom,utf8,cp1250,latin1

set guifont=Consolas:h9:cEASTEUROPE
set printfont=Consolas:h9:cEASTEUROPE

Btw, I've found out that if I change

set encoding=cp1250

and then try to print the text out it comes out okay. But it is weird,
since UTF8 is supposed to be a much larger character set, and it is
supposed to include those characters as well. I would like to keep using
it as a value for encoding, but don't know what is the problem.

I'd appreciate any advice you can offer on the matter.

Luka

Andreas Prilop

Dec 29, 2011, 12:08:27 PM

On Thu, 29 Dec 2011, Luka Djigas wrote:

> http://i44.tinypic.com/120tr93.jpg

This means that the letters are actually encoded in UTF-8,
but incorrectly interpreted as Windows-1250.
You can see the same effect on
http://www.user.uni-hannover.de/nhtcapri/multilingual1.html#latin
when you manually select the encoding Central European Windows-1250
in your browser.
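
The effect described above can be reproduced outside Vim. The following is a minimal Python sketch, not something from the thread: it encodes the lowercase letters from the original post as UTF-8 and then deliberately decodes those bytes as Windows-1250. The variable names are illustrative only.

```python
# Reproduce the garbling: UTF-8 bytes read as if they were Windows-1250.
original = "šđčćž"

utf8_bytes = original.encode("utf-8")    # two bytes per letter
garbled = utf8_bytes.decode("cp1250")    # wrong interpretation: one character per byte

print(" ".join(f"{b:02x}" for b in utf8_bytes))
print(garbled)

# Undoing the wrong interpretation recovers the text, which shows that
# the bytes themselves were never damaged.
recovered = garbled.encode("cp1250").decode("utf-8")
print(recovered == original)
```

Each letter turns into two unrelated characters, which matches the kind of output shown in the screenshot: the data is intact, only the label attached to it is wrong.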

> Btw, I've found out that if I change
> set encoding=cp1250
> and then try to print the text out it comes out okay.

Try
set fileencoding=cp1250
instead and keep
set encoding=utf-8

> But it is weird, since UTF8 is supposed to be a much larger
> character set, and it is supposed to include those characters
> as well. I would like to keep using it as a value for encoding,
> but don't know what is the problem.

Did you read
http://vimdoc.sourceforge.net/htmldoc/usr_45.html#45.3
http://vimdoc.sourceforge.net/htmldoc/os_win32.html
?

I have no practical experience with the Windows version of Vim,
only with Linux Vim.

Luka Djigas

Dec 29, 2011, 6:24:26 PM

On Thu, 29 Dec 2011 18:08:27 +0100, Andreas Prilop
<prilo...@trashmail.net> wrote:

>On Thu, 29 Dec 2011, Luka Djigas wrote:
>
>> http://i44.tinypic.com/120tr93.jpg
>
>This means that the letters are actually encoded in UTF-8,
>but incorrectly interpreted as Windows-1250.
>You can see the same effect on
> http://www.user.uni-hannover.de/nhtcapri/multilingual1.html#latin
>when you manually select the encoding Central European Windows-1250
>in your browser.

:))

>Try
> set fileencoding=cp1250
>instead and keep
> set encoding=utf-8
>

Same thing (or something visually very similar).

>> But it is weird, since UTF8 is supposed to be a much larger
>> character set, and it is supposed to include those characters
>> as well. I would like to keep using it as a value for encoding,
>> but don't know what is the problem.
>
>Did you read
> http://vimdoc.sourceforge.net/htmldoc/usr_45.html#45.3
> http://vimdoc.sourceforge.net/htmldoc/os_win32.html
>?

Uhh, to tell you frankly, I'm not sure. The other day when I was
"researching" this I've read so many things from Vim's help that it's
still all a bit blurry ....
I just wanted a system that lets me use CE characters (since they form a
part of my alphabet) and unicode (for some special symbols not normally
available in Latin1 & Co. character sets) ...

>I have no practical experience with the Windows version of Vim,
>only with Linux Vim.

Do you think there is a difference when it comes to this?

-- Luka

Tony Mechelynck

Dec 30, 2011, 2:11:05 PM

On 30/12/11 00:24, Luka Djigas wrote:
> On Thu, 29 Dec 2011 18:08:27 +0100, Andreas Prilop
> <prilo...@trashmail.net> wrote:
>
>> On Thu, 29 Dec 2011, Luka Djigas wrote:
>>
>>> http://i44.tinypic.com/120tr93.jpg
>>
>> This means that the letters are actually encoded in UTF-8,
>> but incorrectly interpreted as Windows-1250.
>> You can see the same effect on
>> http://www.user.uni-hannover.de/nhtcapri/multilingual1.html#latin
>> when you manually select the encoding Central European Windows-1250
>> in your browser.
>
> :))
>
>> Try
>> set fileencoding=cp1250
>> instead and keep
>> set encoding=utf-8
>>
> Same thing (or something visually very similar).
>
>>> But it is weird, since UTF8 is supposed to be a much larger
>>> character set, and it is supposed to include those characters
>>> as well. I would like to keep using it as a value for encoding,

Yes, UTF-8 can represent anything in cp1250, and many things that cp1250
cannot represent, but it doesn't represent them the same way:

- the 128 7-bit characters in US-ASCII are mapped at the same Unicode
codepoint numbers (U+0000 to U+007F) _and_ UTF-8 represents them the
same way, by a single byte each.
- the 256 8-bit characters in Latin1 aka ISO-8859-1 are mapped at the
same Unicode codepoints (U+0000 to U+00FF) but only the first half
(common with US-ASCII) are represented the same way. Codepoints U+0080
to U+07FF require two bytes in UTF-8, higher codepoints need even more.
- See http://www.unicode.org/charts/ from where you can browse or
download PDF code charts for the various sections of the Unicode
codepoint range.
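
As a small illustration of the first two points (a Python sketch added for clarity, with sample characters chosen here rather than taken from the thread), compare the bytes each encoding produces:

```python
# Compare how one character is stored in Latin1 and in UTF-8.
samples = {
    "A": "US-ASCII",
    "\u00e9": "Latin1 only (e with acute)",
}

for ch, label in samples.items():
    latin1 = ch.encode("latin-1")
    utf8 = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {label}: Latin1={latin1.hex()} UTF-8={utf8.hex()}")

# U+0161 (s with caron) has no Latin1 byte at all, but UTF-8 still handles it.
s_caron = "\u0161"
try:
    s_caron.encode("latin-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False
print(f"U+0161 in Latin1: {in_latin1}, UTF-8={s_caron.encode('utf-8').hex()}")
```

The ASCII letter is the same single byte in both encodings, while the Latin1 letter keeps its codepoint number but needs two bytes in UTF-8.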

UTF-8 represents Unicode codepoints as follows:

U+0000 to U+007F are one byte whose top bit is a zero bit.
The rest are two or more bytes, of which:
- the first byte (leading byte) has as many one-bits at top as there are
bytes in the whole multibyte sequence, then one zero bit, the rest are
data bits
- the other bytes (trailer bytes) have their two top bits set to 10, the
rest is six data bits
- the data bits are ordered "big end first" among the various bytes.

For instance, the Chinese "number one" character, an ideogram consisting
of just one horizontal stroke, is U+4E00, or 100.1110.0000.0000 in
binary. This is more than the eleven data bits a two-byte sequence can
hold, so three bytes will be necessary. In UTF-8 it is represented as
1110.0100 10.111000 10.000000 (where I separate the bytes by a space and
status bits from data bits by a dot), or E4 B8 80.
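
The rules above can be followed by hand in a few lines of Python. This is only a sketch for illustration: the function name `utf8_encode` is made up here, and it skips the check for surrogate codepoints that a real encoder performs.

```python
def utf8_encode(codepoint: int) -> bytes:
    """Encode one Unicode codepoint using the leading/trailer byte rules."""
    if codepoint < 0x80:
        return bytes([codepoint])        # one byte, top bit is zero

    # Data-bit capacity: 2 bytes hold 11 bits, 3 hold 16, 4 hold 21.
    if codepoint < 0x800:
        n = 2
    elif codepoint < 0x10000:
        n = 3
    elif codepoint < 0x110000:
        n = 4
    else:
        raise ValueError("codepoint out of Unicode range")

    # Trailer bytes: top bits 10, then six data bits, taken from the low end.
    trailers = []
    for _ in range(n - 1):
        trailers.append(0b10000000 | (codepoint & 0b00111111))
        codepoint >>= 6

    # Leading byte: n one-bits, one zero bit, then the remaining data bits.
    lead_marker = (0xFF << (8 - n)) & 0xFF
    lead = lead_marker | codepoint

    # "Big end first": leading byte, then trailers from high to low.
    return bytes([lead] + trailers[::-1])


print(utf8_encode(0x4E00).hex(), chr(0x4E00).encode("utf-8").hex())
```

The last line prints the hand-built bytes for U+4E00 next to those from Python's built-in encoder, so the two can be compared directly.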

Since the main East-European accented Latin characters are among the
"Latin Extended-A" characters at U+0100 to U+017F, they are represented
in UTF-8 by two bytes each, between C4 80 and C5 BF. If there is a
misunderstanding between Vim and Windows about how the clipboard is
coded, you could get two characters for each codepoint, the first of
which would be C4 or C5, i.e. (if misinterpreted as Latin1) Ä
(A-umlaut) or Å (A-ring).
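
A quick way to check this claim is the Python sketch below (added for illustration, using the letters from the original post). It lists the UTF-8 bytes of each letter and what the first byte becomes when read as Latin1.

```python
letters = "šđčćžŠĐČĆŽ"

for ch in letters:
    raw = ch.encode("utf-8")
    misread = raw.decode("latin-1")      # Latin1 maps every byte to some character
    print(f"U+{ord(ch):04X}  bytes={raw.hex()}  first misread char=U+{ord(misread[0]):04X}")

# Every leading byte is C4 or C5, as stated above.
lead_bytes = {ch.encode("utf-8")[0] for ch in letters}
print(sorted(hex(b) for b in lead_bytes))

# The two ends of the Latin Extended-A block.
block_start = chr(0x0100).encode("utf-8")
block_end = chr(0x017F).encode("utf-8")
print(block_start.hex(), block_end.hex())
```

U+00C4 and U+00C5 are Ä and Å, so a printout full of those two letters is a strong hint that UTF-8 text is being read with an 8-bit code page.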

>>> but don't know what is the problem.
>>
>> Did you read
>> http://vimdoc.sourceforge.net/htmldoc/usr_45.html#45.3
>> http://vimdoc.sourceforge.net/htmldoc/os_win32.html
>> ?
>
> Uhh, to tell you frankly, I'm not sure. The other day when I was
> "researching" this I've read so many things from Vim's help that it's
> still all a bit blurry ....
> I just wanted a system that lets me use CE characters (since they form a
> part of my alphabet) and unicode (for some special symbols not normally
> available in Latin1 & Co. character sets) ...
>
>> I have no practical experience with the Windows version of Vim,
>> only with Linux Vim.
>
> Do you think there is a difference when it comes to this?
>
> -- Luka

OK, another try:

In the vimrc:

if has('multi_byte')
  if &enc !~? '^u'    " caret-u, not control-u
    if &tenc == ""
      " avoid clobbering keyboard locale
      let &tenc = &enc
    endif
    set enc=utf-8
  endif
  set fencs=ucs-bom,utf-8,cp1250
  " anything after cp1250 (which is 8-bit)
  " would be ignored anyway
endif

If (and only if) it still doesn't work, try adding

language ctype Polish_Poland.65001

(according to http://en.wikipedia.org/wiki/Windows_code_pages#List ,
65001 is the "Windows code page number" for UTF-8).

If it _still_ doesn't work, you may have to set your "Country settings"
(or whatever they are called) on Windows to use code page 65001 "Unicode
(UTF-8)" (or some such). But this could give problems in other programs,
so it is only a last resort.

Best regards,
Tony.
--
You're not drunk if you can lie on the floor without holding on.
-- Dean Martin

Andreas Prilop

Jan 2, 2012, 12:45:08 PM

On Fri, 30 Dec 2011, Luka Djigas wrote:

>> I have no practical experience with the Windows version of Vim,
>> only with Linux Vim.
>
> Do you think there is a difference when it comes to this?

In Linux, you must set the encoding of your terminal to UTF-8.
I don’t know what you must do in MS Windows (if anything).