Automatically change encoding when opening file
BPJ  
From: BPJ <b...@melroch.se>
Date: Thu, 15 Nov 2012 09:18:52 +0100
Local: Thurs, Nov 15 2012 3:18 am
Subject: Automatically change encoding when opening file
I regularly get files encoded in UTF16LE sent to me and want them
to be automatically converted to UTF8 when opening them.
I'm not sure what command to use/put in .vimrc for that though, so
I wonder if anybody is already doing this, and how?

/bpj


 
Tony Mechelynck  
From: Tony Mechelynck <antoine.mechely...@gmail.com>
Date: Thu, 15 Nov 2012 14:31:14 +0100
Local: Thurs, Nov 15 2012 8:31 am
Subject: Re: Automatically change encoding when opening file
On 15/11/12 09:18, BPJ wrote:

> I regularly get files encoded in UTF16LE sent to me and want them to be
> automatically converted to UTF8 when opening them.
> I'm not sure what command to use/put in .vimrc for that though, so
> I wonder if anybody is already doing this, and how?

> /bpj

If a file in UTF-16le has a BOM (the codepoint U+FEFF at the very
beginning of the file, which for UTF-16le means the bytes 0xFF 0xFE),
then if you have set Vim to use UTF-8 'encoding' in your vimrc that file
will usually be opened correctly (because the default 'fileencodings'
-plural- starts with "ucs-bom"). See
http://vim.wikia.com/wiki/Working_with_Unicode about how to set Vim up
like that.
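
A minimal vimrc sketch of the setup described above. The exact 'fileencodings' list is an assumption, not something given in this thread; adjust the trailing fallback to whatever 8-bit encoding you normally receive.

```vim
" Sketch only: assumes a Vim built with +multi_byte.
if has('multi_byte')
    " Set 'encoding' early in the vimrc, before anything that depends on it
    set encoding=utf-8
    " ucs-bom first, so a BOM (if present) decides the 'fileencoding';
    " latin1 last as a fallback that always succeeds (assumed choice)
    set fileencodings=ucs-bom,utf-8,latin1
endif
```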

If the file has no BOM it is a little harder to detect the correct
'fileencoding'. I think there is a Chinese regular of this list who has
a plugin to do more detailed encoding matching than what Vim does out of
the box but I don't know the details.
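
When detection fails for a BOM-less file, the encoding can also be forced by hand. A sketch, assuming you already know the file is UTF-16le and the buffer has no unsaved changes; `++enc` is Vim's built-in way to override detection for a single read:

```vim
" Reload the current file, forcing it to be read as UTF-16le
edit ++enc=utf-16le
" Mark the buffer to be written as UTF-8 from now on
setlocal fileencoding=utf-8
" Write the converted file back to disk
write
```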

Now, once the 'fileencoding' -singular- has been detected as UTF-16le it
is possible to convert it, even automatically, like this:

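if has('multi_byte') && &encoding == 'utf-8'
     augroup vimrc_utf8
         " Clear the group so re-sourcing the vimrc does not duplicate it
         autocmd!
         " Defer the definition to VimEnter so that this BufReadPost
         " autocommand is registered after those of any plugin
         autocmd VimEnter * autocmd vimrc_utf8 BufReadPost *
             \ if &fenc ==? 'utf-16le' || &fenc ==? 'ucs-2le' |
                 \ setlocal fenc=utf-8 |
                 \ w! |
             \ endif
     augroup END
endif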

The exclamation mark is necessary if the file is marked readonly, but is
not on a readonly filesystem. Alternatively, if you don't want to
convert read-only files (including files opened by :view), replace the
"if" line inside the autocommand by

        \ if !&ro && (&fenc ==? 'utf-16le' || &fenc ==? 'ucs-2le') |

and then the exclamation mark may be left out on the "w[rite]" line.

It may look weird to define an autocommand inside an autocommand, but I
do it like this to ensure that this BufReadPost autocommand comes last,
and in particular, after anything defined by any additional
encoding-detection plugin you might install.

Best regards,
Tony.
--
The mome rath isn't born that could outgrabe me.
                -- Nicol Williamson


 
Ven Tadipatri  
From: Ven Tadipatri <vtadipa...@gmail.com>
Date: Thu, 29 Nov 2012 15:41:16 -0500
Local: Thurs, Nov 29 2012 3:41 pm
Subject: Re: Automatically change encoding when opening file
On Thu, Nov 15, 2012 at 8:31 AM, Tony Mechelynck <antoine.mechely...@gmail.com> wrote:

> If a file in UTF-16le has a BOM (the codepoint U+FEFF at the very beginning
> of the file, which for UTF-16le means the bytes 0xFF 0xFE), then if you have
> set Vim to use UTF-8 'encoding' in your vimrc that file will usually be
> opened correctly (because the default 'fileencodings' -plural- starts with
> "ucs-bom"). See http://vim.wikia.com/wiki/Working_with_Unicode about how to
> set Vim up like that.

Hi Antoine,

I'm not really that familiar with the different encoding types (UTF-8,
UTF-16, etc), but I came across a strange <feff> character which
I think is related to what you're describing.
  I open up two files in gedit and they seem to contain the exact same
line. But in vim, there's a strange character at the beginning:
"<feff>". It's not a string, because if I go to the beginning of the
line and hit 'x', it deletes the entire <feff>, indicating it's some
sort of special hidden character.
  What is this strange character?  In Vim's hex mode (%!xxd), I can see
there is a sequence of bytes "efbbbf", and the rest of the file seems
to somehow be offset.

Thanks,
Ven


 
Ben Fritz  
From: Ben Fritz <fritzophre...@gmail.com>
Date: Thu, 29 Nov 2012 13:15:51 -0800 (PST)
Local: Thurs, Nov 29 2012 4:15 pm
Subject: Re: Automatically change encoding when opening file

This strange character is the byte-order-mark ( http://en.wikipedia.org/wiki/Byte_order_mark ). The exact byte sequence you see indicates the file is in utf-8. Vim probably did not detect the file as utf-8.

Check that:
1. your Vim is compiled with multibyte support
2. your 'encoding' option is set AT THE VERY BEGINNING OF YOUR .VIMRC to utf-8
3. your 'fileencodings' option contains ucs-bom or utf-8 or both, before any 8-bit encodings.

If these are all the case your Vim should automatically detect the utf-8 fileencoding and the presence of a BOM, and set 'fenc' and 'bomb' appropriately.
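
The three checks above can be run from a Vim session with the affected file loaded. A sketch using only built-in commands; what it prints depends on your build and vimrc:

```vim
" 1. Multibyte support: prints 1 if it is compiled in
echo has('multi_byte')
" 2. and 3. Global settings; :verbose also reports where each was last set
verbose set encoding? fileencodings?
" What Vim decided for the current buffer
setlocal fileencoding? bomb?
```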

See the http://vim.wikia.com/wiki/Working_with_Unicode linked by Tony.


 
Marcin Szamotulski  
From: Marcin Szamotulski <msza...@gmail.com>
Date: Thu, 29 Nov 2012 21:16:33 +0000
Local: Thurs, Nov 29 2012 4:16 pm
Subject: Re: Automatically change encoding when opening file
On 15:41 Thu 29 Nov, Ven Tadipatri wrote:

Hi,

This is the bom (byte order mark) character:
http://en.wikipedia.org/wiki/Byte_order_mark

<feff> is the BOM character for UTF-16 encoding.  UTF-16 uses 2 bytes to
encode a character, but the order of them might differ. This BOM
character tells which byte comes first.

Best,
Marcin



 
Ben Fritz  
From: Ben Fritz <fritzophre...@gmail.com>
Date: Thu, 29 Nov 2012 14:21:28 -0800 (PST)
Local: Thurs, Nov 29 2012 5:21 pm
Subject: Re: Automatically change encoding when opening file

On Thursday, November 29, 2012 3:16:33 PM UTC-6, coot_. wrote:

> <feff> is the BOM character for UTF-16 encoding.  UTF-16 uses 2 bytes to
> encode a character, but the order of them might differ. This BOM
> character tells which byte comes first.

feff is the BOM character for UTF-8 as well, where it does not have any meaning in terms of byte ordering, but can be used to identify a file as UTF-8.

In UTF-8, the feff character is represented as efbbbf (three bytes) due to the way UTF-8 encodes multi-byte values in varying length.
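
The three-byte form can be observed from inside Vim by comparing byte and character counts. A sketch, assuming 'encoding' is utf-8; the variable name `bom` is only illustrative:

```vim
" Assumes 'encoding' is utf-8, so strings are stored as UTF-8 bytes
let bom = nr2char(0xFEFF)
" strlen() counts bytes; strchars() counts characters
echo strlen(bom) strchars(bom)
```

Once Vim has detected the BOM and set 'bomb', `:setlocal nobomb` followed by `:write` saves the file without it.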

The interesting thing about UTF-8 is that often even if an editor misidentifies a UTF-8 file as Latin1, or as windows-1252, for example, most of the file will remain readable, because UTF-8 encodes the ASCII characters (U+0000 to U+007F) with the same single bytes that Latin1 uses. Only the non-ASCII characters show up as garbage.


 