search and replace using a list of characters to replace
Chris Lott  
From: Chris Lott <ch...@chrislott.org>
Date: Sat, 3 Nov 2012 11:11:54 -0800
Local: Sat, Nov 3 2012 3:11 pm
Subject: search and replace using a list of characters to replace
I have a large text file in which I need to remove all punctuation,
all special characters ("smart quotes") and the like, and a bunch of
selected words.

Can this be done within Vim?

c
--
Chris Lott <ch...@chrislott.org>


 
Tim Chase  
From: Tim Chase <v...@tim.thechases.com>
Date: Sat, 03 Nov 2012 14:24:03 -0500
Local: Sat, Nov 3 2012 3:24 pm
Subject: Re: search and replace using a list of characters to replace
On 11/03/12 14:11, Chris Lott wrote:

> I have a large text file in which I need to remove all punctuation,
> all special characters ("smart quotes") and the like, and a bunch of
> selected words.

> Can this be done within Vim?

Yes.

Oh, you want to know *how*? :-P

The smart-quotes are the hardest ones to do, but if you can enter
them in vim (or select+yank them, and then paste them into an Ex
command using control+R followed by a double-quote), they should be
usable:

 :%s/\([[:punct:]]\+\|”\|“\|selected\|words\)//g

Alternatively, you might want to specify what *is* allowed and
invert it:

  :%s/\W\+//g   " that's "everything that isn't a Word character"
or
  :%s/[^[:alnum:][:space:]]\+//g  "all but alnum & spaces"

which you can read about at

  :help :alnum:
  :help /\W
  :help /\|
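
To see how the two inverted patterns differ, here is a small sketch that
runs them through Vim's substitute() function. The sample string is made
up for illustration. Note that \W also matches whitespace, so the first
form glues the words together, and it keeps underscores because \w
includes them:

```vim
" Hypothetical sample text; substitute() takes the same patterns as :s.
let sample = 'Hello, "world"... foo_bar!'

" \W is [^0-9A-Za-z_]: spaces are removed, underscores stay.
echo substitute(sample, '\W\+', '', 'g')
" Helloworldfoo_bar

" Keep ASCII letters, digits and whitespace; everything else is removed.
echo substitute(sample, '[^[:alnum:][:space:]]\+', '', 'g')
" Hello world foobar
```

If the words need to stay separated, the second form is probably the one
to start from.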

-tim


 
stosss  
From: stosss <sto...@gmail.com>
Date: Sat, 3 Nov 2012 15:42:03 -0400
Local: Sat, Nov 3 2012 3:42 pm
Subject: Re: search and replace using a list of characters to replace
Greetings,
Just trying to learn

Asking because I don't know and I don't use smart quotes. What makes
them so difficult to remove in a s/search/replace/g ?

Aren't they just quotation marks?


 
Tony Mechelynck  
From: Tony Mechelynck <antoine.mechely...@gmail.com>
Date: Sat, 03 Nov 2012 22:35:33 +0100
Local: Sat, Nov 3 2012 5:35 pm
Subject: Re: search and replace using a list of characters to replace
On 03/11/12 20:42, stosss wrote:

Well, yes, but most keyboards haven't got them. The "usual" quotation
marks (which I just used) are the same opening and closing, U+0022, and
all keyboards that I know of (even US-ASCII keyboards with no accents)
can easily produce them. Smart quotes can be “smart” or „smart“ or even
”smart”: there are three characters for smart double quotes, U+201C
upper-6, U+201D upper-9, U+201E lower-9, and they are not the same
opening and closing, though which one is opening and which one is
closing varies by language and sometimes by country. These characters
are not in Latin1 (I think they are in none of the ISO-8859 charsets), so
you need some Unicode charset (such as UTF-8) to be able to represent
them, and most keyboards either don't have them, or require some unusual
fingering to type them: on this Linux system with Belgian keyboard
layout, „ isn't available (I paste it from Vim where it has the digraph
:9), and “ and ” are AltGr-v and AltGr-b respectively (hold AltGr while
hitting the letter). For single smart quotes it's even harder: ‘ is
AltGr-Shift-v, ’ is AltGr-Shift-b, and ‚ (not the comma but the low-9
single quote) is not available. (AltGr is the key right of the space bar on
international keyboards, and if you've got a second plain Alt key there
you might try Alt together with Ctrl).

While I'm here, if you want to select your selected words only as full
words (e.g. "unusual" as a word but not as part of "unusually"; "word"
or "words" but not "worded") you should use \< (zero-length start of
word) and \> (zero-length end of word) as part of your pattern:

        :%s/[[:punct:]“”„]\|\<unusual\>\|\<words\=\>//g
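
The effect of \< and \> can be checked on a made-up sample string with
substitute(). This is only a sketch; the word list is the one from the
example above, and what counts as a word character depends on the
'iskeyword' option:

```vim
" Hypothetical sample; string() makes leading/trailing spaces visible.
let sample = 'unusual words, unusually worded word'
echo string(substitute(sample, '\<unusual\>\|\<words\=\>', '', 'g'))
" ' , unusually worded '
```

"unusually" and "worded" survive, while "unusual", "words" and "word"
are removed; the spaces that surrounded them are left behind.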

If you want to remove other kinds of quotes, e.g. « » ‘ ’ ‚ i.e. U+00AB
U+00BB U+2018 U+2019 U+201A, the pattern can easily be extended.

To type smart quotes in Vim, if you haven't got them on your keyboard, I
recommend using digraphs; they're easy to remember:
            Ctrl-K " 6    “    double 6 above
            Ctrl-K " 9    ”    double 9 above
            Ctrl-K : 9    „    double 9 below
            Ctrl-K ' 6    ‘    single 6 above
            Ctrl-K ' 9    ’    single 9 above
            Ctrl-K . 9    ‚    single 9 below
            Ctrl-K < <    «    opening French
            Ctrl-K > >    »    closing French
see :help digraph.txt; or you can input them by their Unicode codepoint
in hex, see :help i_CTRL-V_digit. (French quotes are sometimes used in
German with the opposite meaning, BTW.)
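
If typing the characters is awkward, the pattern can also name them by
code point. A minimal sketch, assuming 'encoding' is utf-8 and the
default 'cpoptions'; inside a [] collection Vim accepts \uXXXX (see
:help /[] and :help /\%u):

```vim
" Remove double and single smart quotes plus the French quotes,
" addressed by code point instead of typed literally.
" The 'e' flag only suppresses the error when nothing matches.
%s/[\u201c\u201d\u201e\u2018\u2019\u201a\u00ab\u00bb]//ge
```

This sidesteps the keyboard question entirely, at the cost of a pattern
that is harder to read than one with the literal characters.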

The substitute above will not remove spaces around the words. You may
(if you want) *follow* this substitute with

         :%s/ \{2,}/ /g
to replace two or more spaces by one space, or with
        :%s/ *\ze\%( \|$\)//g
if you also want to remove any number of spaces at end of line. To
remove all spaces at begin or end of line but replace them by one space
elsewhere is harder to do in one operation. Hm...
        :%s/\%(\%(^\| \)\zs *\)\|\%( *\ze\%( \|$\)\)//g
should work I think, but it isn't very elegant.
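
If a single command is not a requirement, the same cleanup can be
sketched as two simpler passes. This is an alternative to the one-liner,
not a rewrite of it, and it handles plain spaces only (not tabs):

```vim
" Pass 1: squeeze every run of two or more spaces down to one.
%s/ \{2,}/ /ge
" Pass 2: drop whatever spaces remain at the start or end of a line.
%s/^ \+\| \+$//ge
```

The 'e' flag just keeps Vim from reporting an error when a pass finds
nothing to change.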

Best regards,
Tony.
--
"But officer, I was only trying to gain enough speed so I could coast
to the nearest gas station."


 
Tim Chase  
From: Tim Chase <v...@tim.thechases.com>
Date: Sat, 03 Nov 2012 21:03:07 -0500
Local: Sat, Nov 3 2012 10:03 pm
Subject: Re: search and replace using a list of characters to replace
On 11/03/12 14:42, stosss wrote:

> Just trying to learn

Welcome aboard.  We're a pretty friendly bunch here and are glad to
have you.

> Asking because I don't know and I don't use smart quotes. What makes
> them so difficult to remove in a s/search/replace/g ?

> Aren't they just quotation marks?

Depending on your encoding, they take more than one byte, which makes
handling them slightly dependent on your 'encoding' and 'fileencoding'
settings, as well as on how you enter them into Vim (copy&paste
vs. using digraphs vs. typing some special character on your
keyboard like I do in X).
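
A quick way to check this from inside Vim, as a sketch that assumes
'encoding' is set to utf-8 (the byte count differs in other encodings):

```vim
" Show the settings that decide how the file's bytes are interpreted.
set encoding? fileencoding?

" With 'encoding' set to utf-8, U+201C is one character but three bytes.
echo strchars("\u201c")
" 1
echo strlen("\u201c")
" 3
```

In Normal mode, ga on a character shows its code point and g8 shows its
bytes, which helps when a pasted quote refuses to match.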

-tim


 