How do I search for badly encoded characters

James Barnett

Oct 27, 2014, 12:53:30 PM
to vim_mu...@googlegroups.com
Dear Forum,

I maintain an application that reads and writes text in UTF-8. Due to bad joss incurred during my (and others') UTF-8 learning curve, I now have some garbled characters in my input. These show up in vim -b as '<nn>', where nn is a lower-case hex string. Here's an example:
3020 tuomas jorma juhani r<e4>s<e4>nen

My question is, how do I search for these characters in vim so I can fix or delete them? Treating them as literal strings doesn't work.

Thanks!

John Beckett

Oct 27, 2014, 7:20:25 PM
to vim_mu...@googlegroups.com
In principle, vim_multibyte is the right mailing list, but in
practice it is hardly ever used, and I suggest using the main
vim_use mailing list in the future unless a very esoteric
multibyte issue needs to be discussed at length.

There are three very useful commands entered in normal mode:
ga
g8
8g8

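ga and g8 display information about the character at the cursor
(ga shows its value in decimal, hex and octal; g8 shows the hex
values of the bytes that encode it in UTF-8).
8g8 finds the next illegal UTF-8 sequence (it does nothing if
none is found).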

Use ':help 8g8' for info.
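Outside Vim, a rough analogue of what 8g8 reports can be scripted. The Python sketch below walks a byte string and records the offset of every sequence that Python's UTF-8 decoder rejects. The function name find_invalid_utf8 is hypothetical, and this is an illustration of the idea, not how Vim implements 8g8.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, bad_bytes) for each run the UTF-8 decoder rejects."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift by pos
            start, end = pos + err.start, pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # resume scanning just after the bad bytes


    return problems


line = b"3020 tuomas jorma juhani r\xe4s\xe4nen"  # sample line from the question
print(find_invalid_utf8(line))  # [(26, b'\xe4'), (28, b'\xe4')]
```

Each reported offset corresponds to one place where vim -b shows <e4>; properly encoded text such as "räsänen".encode("utf-8") yields an empty list.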

John

Kenneth Reid Beesley

Oct 28, 2014, 12:08:41 PM
to vim_mu...@googlegroups.com, johnb....@gmail.com, jlawr...@gmail.com

On 27Oct2014, at 17:20, John Beckett <johnb....@gmail.com> wrote:

> James Barnett wrote:
>> I maintain an application that reads and writes text in utf-8.
>> Due to bad joss incurred during my (and others) utf-8 learning
>> curve I now have some garbled characters in my input. These
>> show up in vim -b as '<nn>', where nn is a lower-case hex
>> string.

I assume that your ‘encoding’ (vim buffer internal encoding) is UTF-8.

Once you know the hex value that you want to find, e.g. 00E4,
I think that you should be able to search for it by typing / (slash),
then Ctrl-V, u, 00E4.
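To see which bytes such a search pattern consists of, it helps to compare them with the byte Vim shows as <e4>. The Python snippet below is only a byte-level illustration, not Vim itself; it assumes a UTF-8 buffer, where Ctrl-V u 00E4 inserts U+00E4, and the variable names are made up.

```python
line = b"3020 tuomas jorma juhani r\xe4s\xe4nen"  # sample line from the question

typed = "\u00e4".encode("utf-8")  # U+00E4 as stored in a UTF-8 buffer
stray = b"\xe4"                    # the single byte vim -b displays as <e4>

print(typed.hex())        # c3a4 (two bytes)
print(typed in line)      # False
print(stray in line)      # True
print(line.count(stray))  # 2
```

Because the encoded character (C3 A4) is not the same byte sequence as the lone E4, a search for the typed character does not land on the <e4> bytes.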

********************************
Kenneth R. Beesley, D.Phil.
P.O. Box 540475
North Salt Lake, UT
84054 USA





Павлов Николай Александрович

Oct 28, 2014, 12:27:14 PM
to vim_mu...@googlegroups.com, Kenneth Reid Beesley, johnb....@gmail.com, jlawr...@gmail.com

On October 28, 2014 7:08:27 PM EAT, Kenneth Reid Beesley <krbe...@gmail.com> wrote:
>I assume that your ‘encoding’ (vim buffer internal encoding) is UTF-8.
>
>Once you know the hex value that you want to find, e.g 00E4,
>I think that you should be able to search for it by entering / (the slash),
>Ctrl-v, u, 00E4.

This only allows you to search for Unicode characters, and those never show up as <xx> AFAIK. To enter an invalid byte into the search pattern, one needs to use <C-r>="\xXX"<CR>; for example, typing /<C-r>="\xe4"<CR><CR> searches for the <e4> bytes.
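If the stray bytes are Latin-1 (ISO-8859-1) leftovers, as r<e4>s<e4>nen ("räsänen") suggests, they can also be fixed or deleted in bulk outside Vim. A minimal Python sketch under that assumption; the error-handler name "latin1-fallback" and the functions latin1_fallback and repair are hypothetical, while codecs.register_error and the errors= argument are standard Python.

```python
import codecs


def latin1_fallback(err):
    """Decode-error handler: reinterpret each undecodable byte run as Latin-1.

    Assumes the stray bytes were written as ISO-8859-1 (e.g. 0xE4 -> 'ä').
    """
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end  # (replacement text, resume position)


# Hypothetical handler name; once registered it can be passed as errors=...
codecs.register_error("latin1-fallback", latin1_fallback)


def repair(raw: bytes, delete: bool = False) -> str:
    """Decode raw as UTF-8, either dropping stray bytes or reading them as Latin-1."""
    return raw.decode("utf-8", errors="ignore" if delete else "latin1-fallback")


line = b"3020 tuomas jorma juhani r\xe4s\xe4nen"
print(repair(line))               # 3020 tuomas jorma juhani räsänen
print(repair(line, delete=True))  # 3020 tuomas jorma juhani rsnen
```

Bytes that already form valid UTF-8 are left alone, so a file mixing good and bad text is only changed where decoding actually fails.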
