U+1ED9 ộ LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
See:
:help ga
In utf-8, this character is encoded by the following sequence of three
bytes:
0xe1, 0xbb, 0x99
See:
:help g8
This is what a utf-8 encoded file with the three characters 'bột'
actually contains:
00000000 62 e1 bb 99 74 0a |b...t.|
00000006
0x62 b LATIN SMALL LETTER B
0xe1,0xbb,0x99 ộ LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
0x74 t LATIN SMALL LETTER T
The final 0x0a is a line feed control character.
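If you want to double-check this from a shell prompt (assuming the
usual hexdump utility is available), you can reproduce the dump above
with:

printf 'b\341\273\231t\n' | hexdump -C

where the octal escapes \341 \273 \231 are just printf notation for the
bytes 0xe1 0xbb 0x99.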
In Microsoft Windows' cp1252:
0xe1 á
0xbb »
0x99 ™
http://en.wikipedia.org/wiki/Windows-1252
You do not give much detail as to where you see what, but I am probably
not far off the mark assuming that 'bột' is what you see when editing a
utf-8 encoded file in vim, and that 'bá»™t' is what you see on your
printout.
Being unfamiliar with Microsoft Windows, I'm speculating a bit, but it
does look like your printing software is processing the file as if it
were cp1252 rather than utf-8.
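Just to illustrate - assuming a utf-8 terminal and the iconv utility -
you can reproduce the garbling by forcing a cp1252 interpretation of
those same bytes:

printf 'b\341\273\231t\n' | iconv -f CP1252 -t UTF-8

which should output 'bá»™t', i.e. presumably the very garbage on your
printout.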
> My printing options are:
>
> set printfont=LMMono10:h10 " This is the LMMono from LaTeX Latin Modern
> set printoptions=number:y
> set printencoding=ucs-2le bomb
If your file is utf-8 encoded, why do you tell vim that it is ucs2..?
:h penc-option
In particular, this help file states that:
Code page 1252 print character encoding is used by default on Windows
and OS/2 platforms.
> Please help. Thank you!
I am not familiar with Microsoft Windows, so I don't really have an
answer to your question but you could try:
:set penc=
or..
:set penc=utf-8
and see if the 'bột' string prints correctly.
My understanding is that, compiled with the appropriate +options, Vim
should be able to process utf-8 encoded files transparently on any
platform, but you may also want to ask Vim to convert the file.
Take a look at:
:h ++enc
:h ++ff
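For example - just an illustration - to force Vim to re-read the
current file as utf-8, optionally also forcing unix line endings:

:e ++enc=utf-8
:e ++enc=utf-8 ++ff=unix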
If that doesn't help, please attach a small sample file, see if someone
on the list can come up with something more conclusive.
CJ
> About the ucs-2: because utf-8 has failed so many times. :(
> On Sat, Jan 2, 2010 at 6:16 PM, Minh Duc Thai <thm...@gmail.com> wrote:
>
> > Hello Chris,
> > I've tried to print in Linux (I use Linux Mint version 8, the
> > printer is the Print To PDF) and the result is the same as in
> > Windows.
> > I think this is a bug.
Possibly, but this doesn't tell us much as to where the bug might be,
vim, printing software, etc.
Anyway, the fact that you are able to recreate on a linux system is good
news, since I don't have access to any version of Microsoft Windows.
I believe I suggested attaching a _short_ sample file, so I can take a
look at the file, possibly give printing a shot, and see if I can
recreate the problem.
Also, please trim your posts to something manageable. There is no sense
in repeating my initial reply - about 100 lines - to add just a couple
of lines of your own. Just keep whatever is relevant of the post you are
replying to.
And lastly, try to avoid posting an html copy of your message, unless
you have good cause to do so.
| http://www.vim.org/maillist.php
So please post back with a sample file attached to your message (not
something copied and pasted into your message) and see if I or someone
else can come up with some idea as to what's going on.
CJ
> Bột bột
It looks like you sent this to me directly instead of the list.
CJ
I tested using your example, and can confirm that Print does not work
well on Windows when enc=utf-8. I tried enc=latin1 and enc=cp936, and
in both cases Print is successful (as long as the characters can be
displayed in that encoding).
I tested using the Adobe PDF driver.
--
Wu Yongwei
URL: http://wyw.dcweb.cn/
> > Hello Chris,
Hmm, well.. your .vimrc apparently did not make it to the list.
In any case, it looks like I am able to recreate your problem here, or
at least I am getting similar results.
I tried changing the printfont to GNU/unifont, which I know has a glyph
for U+1ED9 - :set printfont=unifont - and I was still getting the same
results. I tried other fonts and it looked like my 'printfont' settings
were silently ignored.
Then I saw this, under :help postscript-printing:
| There are currently a number of limitations with PostScript printing:
|
| - 'printfont' - The font name is ignored (the Courier family is always
| used - it should be available on all PostScript printers) but the
| font size is used.
I'm not sure how I could determine what font might correspond to 'the
Courier family' but it looks like it's defaulting to a font that has
no support for anything beyond U+0100.
Maybe someone could shed some light on this?
Anyway, I was able to print your sample by invoking the paps converter:
| :%w ! paps --font="arial 8" --paper letter | lpr " proportional
| :%w ! paps --font="unifont 10" --paper letter | lpr " monospace
You may need to install paps on your linux system since it's not part of
Vim, and then you could give this a try, possibly dropping the '--paper
letter' switch if you want 'paps' to default to A4.
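(If paps is missing, on Debian and derivatives the package should
simply be called 'paps':

apt-get install paps

run as root or through sudo.)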
Let me know if this helps,
CJ
> > Hello Chris,
On debian stable, I also tested printing utf-8 encoded files containing
samples of CJK, Devanagari, and a couple other Eastern scripts and I was
unable to get :hardcopy to print their contents.
Since utf-8 is the default encoding on debian Lenny, I find it hard to
believe that the Vim to Postscript implementation would not function out
of the box with utf-8 encoded files, and even less plausible that I was
unable to find anyone reporting this issue while searching online, apart
from a few reports where Vim 7.0 or older was involved, dating back
7-8 years.
Leads me to think that there's more to it than the speculations in my
earlier post today.
Note that I tried to implement the following in my .vimrc, also without
success:
| set printexpr=PrintFile(v:fname_in)
| function PrintFile(fname)
|   call system('paps --font="unifont 8" --paper letter ' . a:fname . ' | lpr')
|   call delete(a:fname)
|   return v:shell_error
| endfunction
The characters from the 'exotic' scripts were replaced by inverted
question marks or blanks, and _as far as I can tell_ it looked as if the
same ASCII or latin1 font was used no matter what font I passed to the
paps converter.
Can anyone shed some light on this matter?
Thanks,
CJ
> On debian stable, I also tested printing utf-8 encoded files containing
> samples of CJK, Devanagari, and a couple other Eastern scripts and I was
> unable to get :hardcopy to print their contents.
>
> Since utf-8 is the default encoding on debian Lenny, I find it hard to
> believe that the Vim to Postscript implementation would not function out
> of the box with utf-8 encoded files, and even less plausible that I was
> unable to find anyone reporting this issue while searching online, apart
> from a few reports where Vim 7.0 or older was involved, dating back
> 7-8 years.
Printing UTF-8 text is hard, since PostScript doesn't support it
natively. I was pretty surprised that 'enscript' never made it into the
Unicode age. 'paps' is the only thing I found that seems to do a
reasonable job. Though, just now (while trying to find the page I found
yesterday) I found a few entries in a UTF-8 and Unicode FAQ under
'Printing'[1].
[1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
CUPS supposedly handles UTF-8 via the texttops filter, but I was unable
to get anything reasonable (even fiddling with 'CHARSET=' and '-o
document-format=text/plain;charset=' options). I eventually gave up and
replaced /usr/libexec/cups/filter/texttops with the following script:
#!/bin/sh
# CUPS invokes filters as: job-id user title copies options [file];
# $6 is the input file, $3 the job title, patched over paps's 'stdin'.
paps < "$6" | title="$3" perl -lpwe 's/stdin/$ENV{title}/ if 2==$.'
> Leads me to think that there's more to it than the speculations in my
> earlier post today.
>
> Note that I tried to implement the following in my .vimrc, also without
> success:
>
> | set printexpr=PrintFile(v:fname_in)
> | function PrintFile(fname)
> |   call system('paps --font="unifont 8" --paper letter ' . a:fname . ' | lpr')
> |   call delete(a:fname)
> |   return v:shell_error
> | endfunction
>
> The characters from the 'exotic' scripts were replaced by inverted
> question marks or blanks, and _as far as I can tell_ it looked as if the
> same ASCII or latin1 font was used no matter what font I passed to the
> paps converter.
>
> Can anyone shed some light on this matter?
From the docs, printexpr only affects how the generated PS temp file
gets printed. So, if Vim's already subbing out the chars in the PS,
it's not going to matter what happens next.
Testing with :ha > test.ps shows that no matter what encoding or
fileencoding or printencoding or printmbencoding I tried, it still shows
up as latin1 in the resulting PostScript. Which is weird considering
the various charset handling that appears to be done in src/hardcopy.c.
The only way I was able to get decent printouts was by just shelling out
to paps:
:!paps < % > test.ps
Best,
Ben
[..]
> > Since utf-8 is the default encoding on debian Lenny, I find it hard
> > to believe that the Vim to Postscript implementation would not
> > function out of the box with utf-8 encoded files,
[..]
> Printing UTF-8 text is hard, since PostScript doesn't support it
> natively.
Actually, since this is rather messy and I'm probably not going to take
another look at it for some time, I decided to write my own personal
mini-howto on the subject, and since I was unable to quickly think of a
short elegant preamble, I wrote: "Printing UTF8-encoded files is tricky
at best.." ;-)
> I was pretty surprised that 'enscript' never made it into the
> Unicode age. 'paps' is the only thing I found that seems to do a
> reasonable job. Though, just now (while trying to find the page I found
> yesterday) I found a few entries in a UTF-8 and Unicode FAQ under
> 'Printing'[1].
>
> [1] http://www.cl.cam.ac.uk/~mgk25/unicode.html
Saw that too.. Nothing helpful.
> CUPS supposedly handles UTF-8 via the texttops filter, but I was unable
> to get anything reasonable (even fiddling with 'CHARSET=' and '-o
> document-format=text/plain;charset=' options). I eventually gave up and
> replaced /usr/libexec/cups/filter/texttops with the following script:
Went down that road, only to reach the same dead end.
> #!/bin/sh
> paps < "$6" | title="$3" perl -lpwe 's/stdin/$ENV{title}/ if 2==$.'
[..]
> > Can anyone shed some light on this matter?
> From the docs, printexpr only affects how the generated PS temp file
> gets printed. So, if Vim's already subbing out the chars in the PS,
> it's not going to matter what happens next.
Pretty much what I speculated.
> Testing with :ha > test.ps shows that no matter what encoding or
> fileencoding or printencoding or printmbencoding I tried, it still
> shows up as latin1 in the resulting PostScript. Which is weird
> considering the various charset handling that appears to be done in
> src/hardcopy.c.
I was expecting to find a bug report somewhere - or would that be a Vim
enhancement request, i.e. lifting this limitation? - and saw nothing.
> The only way I was able to get decent printouts was by just shelling
> out to paps:
> :!paps < % > test.ps
Looks like I was on the right track re: the OP's problem then, and one
variation or other involving paps should fix it for him.
Thank you for your comments,
CJ
Except that the message subject is "Printing with utf-8 characters on
WINDOWS"....
The real universal solution for non-ASCII characters is NOT to print
from Vim. Convert the document to HTML with ":TOhtml", and then print
from your browser.
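For the record, the sequence would be something like the following; the
buffer that :TOhtml opens is normally already named after the original
file with .html appended, so a plain :w suffices:

:TOhtml
:w

Then open the resulting .html file in a browser and print from there.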
I wrote the original PS driver for VIM, several years ago now. This is
somewhat OT from the OP as it is not Windows related. If you are not
interested, stop reading now.
The PS driver relies on fonts being present in the printer. The only
ones guaranteed to be there are the base 35 western fonts (Courier,
Times, etc). However, far east printers will have a few multi-byte
fonts to support CJK printing, for which the printmbcharset et al
options and handling of multi-byte encodings was added. It is possible
to install additional multi-byte fonts on the printer which could also
be used.
Technically PostScript is text encoding agnostic - it just deals with
sequences of byte values. The selected font defines how to interpret
the byte sequence, as single bytes or a multi-byte encoding of some kind.
A lot depends on the characters being used. If you are using UTF-8
encoding for text that exists in a single ISO-8859 character set then
you can just set printencoding and VIM should translate the UTF-8
encoded text to single bytes for printing. If you are using characters
from multiple ISO-8859 character sets then things start to get complicated.
If you are just using ISO-8859 characters then it would be possible (but
not currently implemented) to support many such character sets when
printing with a single font.
If you are using true multiple-byte characters (i.e. ones not present in
any of the ISO-8859 or cp character sets) then you will need to use a
multi-byte font and the big issue is with handling them - their
discovery on the host system, metrics calculation for text layout,
selection of a sub-set of the contents (multi-byte fonts tend to be
large - do you want to generate a 12MB PS file to print <1K of text?),
and embedding in the generated PS.
Not a trivial problem to solve at the time. When discussed with Bram it
was decided this was not wanted. Dunno if time has changed the argument
at all.
TTFN
Mike
--
yip yip yip yip yap yap yip *BANG* - NO TERRIER
I think he wrote somewhere that he prints from Windows because his
printer is better supported.
> If you are not interested, stop reading now.
I'm not sure who wouldn't be. As far as I'm concerned, you are salvaging
the thread from guesswork and speculations, thank goodness for that.
> The PS driver relies on fonts being present in the printer. The only
> ones guaranteed to be there are the base 35 western fonts (Courier,
> Times, etc). However, far east printers will have a few multi-byte
> fonts to support CJK printing, for which the printmbcharset et al
> options and handling of multi-byte encodings was added. It is
> possible to install additional multi-byte fonts on the printer which
> could also be used
I have an old HP LaserJet 2100 that's still running on the original
cartridge. Do you mean that if I wanted to be able to use :hardcopy to
successfully print any character from the Unicode BMP, I would be able
to do so after installing a universal font such as GNU/Unifont on the
printer?
> Technically PostScript is text encoding agnostic - it just deals with
> sequences of byte values. The selected font defines how to interpret
> the byte sequence, as single bytes or a multi-byte encoding of some
> kind.
So, in a UTF-8 context and with multi-byte characters, I'm still unclear
as to why I can use paps to create a .ps file that will print correctly
on my printer, and unable to use Vim's :hardcopy command to do the same
thing.
Why can't the :hardcopy command perform the same magic?
> A lot depends on the characters being used. If you are using UTF-8
> encoding for text that exists in a single ISO-8859 character set then
> you can just set printencoding and VIM should translate the UTF-8
> encoded text to single bytes for printing. If you are using
> characters from multiple ISO-8859 character sets then things start to
> get complicated.
> If you are just using ISO-8859 characters then it would be possible
> (but not currently implemented) to support many such character sets
> when printing with a single font.
> If you are using true multiple-byte characters (i.e. ones not present
> in any of the ISO-8859 or cp character sets) then you will need to
> use a multi-byte font and the big issue is with handling them - their
> discovery on the host system, metrics calculation for text layout,
> selection of a sub-set of the contents (multi-byte fonts tend to be
> large - do you want to generate a 12MB PS file to print <1K of text?),
> and embedding in the generated PS.
Yes, GNU/unifont - at least the file on my HDD - is 16MB and it would
hardly make sense to download it to the printer with each and every print
job. But that would not be necessary if the font resided on the printer.
In any event, the size of the .ps file created by paps from a one-line
Vim buffer containing 'Bột bột' and nothing else is only 7.2K. I looked
at a 16K UTF8-encoded text file containing multi-byte characters and
the resulting .ps file that paps created was 329K.
So, I'm definitely missing something [some things] :-)
> Not a trivial problem to solve at the time. When discussed with Bram
> it was decided this was not wanted. Dunno if time has changed the
> argument at all.
Maybe these aspects should be clarified under :h postscript-printing
under limitations:multi-byte support.
Sorry if I'm asking the wrong questions, I don't know Postscript and I
have no experience with printers.
> TTFN
>
> Mike
> --
> yip yip yip yip yap yap yip *BANG* - NO TERRIER
That can't have been a *BULL*Terrier, then.. ;-)
CJ
> Not a trivial problem to solve at the time. When discussed with Bram it
> was decided this was not wanted. Dunno if time has changed the argument
> at all.
I've been complaining about this issue for the last ten years. This is
just unbelievable: such a mighty text editor as gVim just does not allow
Windows international users to print their texts when gVim is set to use
UTF-8 as its internal encoding... :(
Note, please: you are _forced_ to use UTF-8 as gVim's internal
encoding if you want to be able to perform encoding conversions...
I just don't remember any other text editor with such a restriction
(not counting the crippleware ones)...
For many of us printing is as important as saving your edits.
Can you imagine a full-featured text editor in the year 2010 which does
not allow users to save or print their text files? :(
--
Best regards,
Valery Kondakoff
PGP key:
http://pool.sks-keyservers.net:11371/pks/lookup?op=get&search=0xEEDF8590
np: The Big Pink'2009 (A Brief History Of Love) - Crystal Visions
While I don't know how to print via PS or gtk, from my experience
using the GDI API to print Unicode CJK on Windows, I don't think it is
all that difficult to print CJK characters.
--
regards,
====================================================
GPG key 1024D/4434BAB3 2008-08-24
gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
Indeed. Windows supports encoding conversion so it should be possible
to do it as part of gvim without having to find a copy of iconv. It
just hasn't been an issue for any of the Windows VIM developers. There
is not a lot I can do about that.
> Note, please: you are _forced_ to use UTF-8 as gVim's internal
> encoding if you want to be able to perform encoding conversions...
>
> I just don't remember any other text editor with such a restriction
> (not counting the crippleware ones)...
>
> For many of us printing is as important as saving your edits.
> Can you imagine a full-featured text editor in the year 2010 which does
> not allow users to save or print their text files? :(
>
I hear you. New features seem to be all the rage; I doubt this would be
a candidate for GSoC, which would be a nice way to sort this all out.
TTFN
Mike
--
Education is what you get from reading the small print; experience is
what you get from not reading it.
Sorry, that will have to be a "that depends". The font has to be in a
format that that era of PS understands. AFAICR Level 2 PS did not
support Unicode encoding with PS fonts. They can support multi-byte
encoded text, which means that the text to be printed and your Unicode
font would need translating to a form that the printer can use. As I
said, it is complicated.
>> Technically PostScript is text encoding agnostic - it just deals with
>> sequences of byte values. The selected font defines how to interpret
>> the byte sequence, as single bytes or a multi-byte encoding of some
>> kind.
>
> So, in a UTF-8 context and with multi-byte characters, I'm still unclear
> as to why I can use paps to create a .ps file that will print correctly
> on my printer, and unable to use Vim's :hardcopy command to do the same
> thing.
I have had a quick look at paps. It is based on top of Pango, which is
a large piece of software to handle layout and rendering of Unicode
text. AFAICS paps interprets the Pango output to draw each character as
a filled path. Not the quickest or most efficient method, and the
output will be poor at smaller font sizes - but it does work. This
removes the need for PS fonts altogether.
> Why can't the :hardcopy command perform the same magic?
Writing a Unicode layout print engine is not trivial. paps leverages a
lot of the work done by the Pango developers that would need to be
written from scratch. Plus the normal aim of VIM has been to be
platform independent - using Pango for Unicode printing would prevent
multi-byte printing in environments that don't support Pango.
In general this level of complexity is usually supported by some level
of host OS service. This means that multi-byte printing becomes
platform and OS dependent - for example, on a box without X11/gtk2
multi-byte printing would not be supported. With sufficient work
implementing what is needed in VIM it could be, but I don't know if that
is what Bram wants.
>> A lot depends on the characters being used. If you are using UTF-8
>> encoding for text that exists in a single ISO-8859 character set then
>> you can just set printencoding and VIM should translate the UTF-8
>> encoded text to single bytes for printing. If you are using
>> characters from multiple ISO-8859 character sets then things start to
>> get complicated.
>
>> If you are just using ISO-8859 characters then it would be possible
>> (but not currently implemented) to support many such character sets
>> when printing with a single font.
>
>> If you are using true multiple-byte characters (i.e. ones not present
>> in any of the ISO-8859 or cp character sets) then you will need to
>> use a multi-byte font and the big issue is with handling them - their
>> discovery on the host system, metrics calculation for text layout,
>> selection of a sub-set of the contents (multi-byte fonts tend to be
>> large - do you want to generate a 12MB PS file to print <1K of text?),
>> and embedding in the generated PS.
>
> Yes, GNU/unifont, at least the file on my HDD is 16MB and it would
> hardly make sense to download it to the printer with each an every print
> job. But that would not be necessary if the font resided on the printer.
Assuming there was space and it could be used as a PS font, then yes.
Things can get tricky if anyone wants to use commercial fonts since you
cannot copy them around all over the place. This also makes sharing PS
files hard - embedding fonts (or a subset containing just the characters
used) in the generated file is usually the best way to do things.
> In any event, the size of the .ps file created by paps from a one-line
> Vim buffer containing 'Bột bột' and nothing else is only 7.2K. I looked
> at a 16K UTF8-encoded text file containing multi-byte characters and
> the resulting .ps file that paps created was 329K.
>
> So, I'm definitely missing something [some things] :-)
As noted above I believe paps generates a lot of PS commands to draw the
outline of each character which is then filled. This can result in very
large PS files for large amounts of text, slower printing since it
doesn't take advantage of the PS font cache, and the output can be poor
at smaller font sizes. It may even have memory issues on larger paper
sizes.
>> Not a trivial problem to solve at the time. When discussed with Bram
>> it was decided this was not wanted. Dunno if time has changed the
>> argument at all.
>
> Maybe these aspects should be clarified under :h postscript-printing
> under limitations:multi-byte support.
>
> Sorry if I'm asking the wrong questions, I don't know Postscript and I
> have no experience with printers.
No problem, it is a bit of a specialist issue, and it is my day job. ;-)
>> TTFN
>>
>> Mike
>> --
>> yip yip yip yip yap yap yip *BANG* - NO TERRIER
>
> That can't have been a *BULL*Terrier, then.. ;-)
>
> CJ
>
Mike
Assuming you are talking GDI as in Windows, then no, it isn't. I believe
it just needs an appropriate call to re-encode the character for the
encoding being used for printing. It just hasn't been a big enough itch
for any VIM developer.
[..]
Thank you very much for taking the time to explain what the problem is
and why this is not a simple issue.
Since afaict this has nothing to do with Windows, and to make the
thread searchable, I changed the title to something more relevant.
Thanks,
CJ
After reading about half of this thread, I have the following remarks:
- I haven't succeeded in printing "full-Unicode" text with :hardcopy.
When I have a file with some exotic characters in it (Hebrew, maybe, or
Chinese, embedded in French text), I write it to disk as a *.txt file
(in UTF-8 with BOM), then print it in my browser (a command sketch
follows this list).
- IIUC, valid 'printencoding' values are those for which there is a
PostScript conversion file in $VIMRUNTIME/print/ -- anything else is
treated as Latin1, including UTF-8, UTF-16, UTF-16le, UTF-32 and
UTF-32le (see the listing command after this list).
- Most gvim versions for Windows are built with +printer but
-postscript. In that case, according to its help, the 'printencoding'
option is not supported.
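To put some commands behind the first two remarks (a sketch, the file
name is arbitrary): writing the UTF-8-with-BOM copy for the browser
detour is

:w ++enc=utf-8 ++bom exotic.txt

and you can list the encodings for which your Vim ships a PostScript
conversion file with

:echo glob($VIMRUNTIME . '/print/*.ps')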
Best regards,
Tony.
--
GALAHAD: Camelot ...
LAUNCELOT: Camelot ...
GAWAIN: It's only a model.
"Monty Python and the Holy Grail" PYTHON (MONTY)
PICTURES LTD