Remove duplicate words or patterns inside lines

tjg

Oct 11, 2012, 7:32:03 AM
to v...@vim.org
I have this type of file (plain text) :

sometext *sometext* @me &project1 *@me*
&project2 sometext *&project2* @john @me
something @john &project2
sometext #1 @me something else *#1*

and I would like to remove all inside-a-line duplicates so as to obtain :

sometext @me &project1
sometext &project2 @john @me
something @john &project2
sometext #1 @me something else

btw, the order of items (sometext, #, @ or &) does not matter, as long as they
are unique per line

Is it possible to do that? This is out of reach for me...



--
View this message in context: http://vim.1045645.n5.nabble.com/Remove-duplicate-words-or-patterns-inside-lines-tp5711215.html
Sent from the Vim - General mailing list archive at Nabble.com.

Tim Chase

Oct 11, 2012, 10:04:04 AM
to vim...@googlegroups.com
On 10/11/12 06:32, tjg wrote:
> I have this type of file (plain text) :
>
> sometext *sometext* @me &project1 *@me*
> &project2 sometext *&project2* @john @me
> something @john &project2
> sometext #1 @me something else *#1*

I presume the "*" were added by your MUA as an attempt to highlight
the duplicates.

> and I would like to remove all inside-a-line duplicates so as to obtain :
>
> sometext @me &project1
> sometext &project2 @john @me
> something @john &project2
> sometext #1 @me something else
>
> btw, the order of items (sometext, #, @ or &) does not matter, as long as they
> are unique per line

To remove the later instance of each pair, you can use this ugly brute:

:%s/\([#@&]\=\<\w\+\>\).\{-}\zs \+[#@&]\@<!\1\>//g

There are some cases where if there are two duplicates that overlap
such as

sometext @me sometext @me

where you'll have to run it a second time, but it otherwise seems to
catch all the edge-cases I threw at it:

- when the text such as "sometext" also appears as "&sometext" or
"@sometext" or "#sometext"

- when substrings match such as "mete" and "sometext"
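
For anyone scripting this outside Vim, the "run it a second time" caveat can be sketched as a loop that repeats the substitution until nothing changes. This is only a rough Python port under assumptions: Vim's \< and \> are approximated by \b, Python's re module has no \zs so the part to keep is captured and written back, and the names DUPE and dedupe_line are made up for illustration.

```python
import re

# Rough Python port of the Vim pattern (an assumption, not the original command).
# Python's re has no \zs, so the text to keep is captured in groups 1 and 2.
DUPE = re.compile(r'([#@&]?\b\w+\b)(.*?) +(?<![#@&])\1\b')

def dedupe_line(line):
    """Repeat the substitution until no duplicate is left on the line."""
    while True:
        line, count = DUPE.subn(r'\1\2', line)
        if count == 0:
            return line

# One pass leaves "sometext @me @me"; the loop finishes the job.
print(dedupe_line('sometext @me sometext @me'))  # sometext @me
```

Each pass removes at least one word, so the loop always terminates.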

Hope this helps,

-tim


tjg

Oct 11, 2012, 11:38:42 AM
to v...@vim.org
@Tim-Chase-9

It does help ! Thank you very much.

Just for my information, and if you have the time, could you detail for me
the forensics of this not-so-"ugly brute" ?

Thanks again




Tim Chase

Oct 11, 2012, 12:04:43 PM
to v...@vim.org, tjg
On 10/11/12 10:38, tjg wrote:
> Just for my information, and if you have the time, could you
> detail for me the forensics of this not-so-"ugly brute" ?

:%s/\([#@&]\=\<\w\+\>\).\{-}\zs \+[#@&]\@<!\1\>//g

\(...\)     capture something of interest which is
  [#@&]     one of these characters
  \=        optionally
  \<        a word-starts-here boundary
  \w\+      one or more Word characters
  \>        the word must end here
.\{-}       as little as possible to meet the following
\zs         consider the pattern-match starting here
            (affects what gets replaced)
<space>\+   one-or-more mandatory spaces before the dupe
[#@&]\@<!   assert that none of these characters sits just before here[1]
\1          the thing we captured must match here
\>          the match must end here too[2]

So it's roughly looking for something of interest followed by stuff,
followed by some spaces, followed by the thing-of-interest again
(and no more). It then replaces the spaces-plus-duplicate-match
with nothing.
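
The same breakdown can be tried outside Vim as a commented regular expression. What follows is an approximate Python rendering, not the original command: \b stands in for Vim's \< and \>, groups 1 and 2 are written back because Python's re module has no \zs, and PATTERN and remove_later_dupes are illustrative names only.

```python
import re

# Approximate Python rendering of the Vim pattern, one piece per line.
PATTERN = re.compile(r'''
    (              # capture something of interest:
      [#@&]?       #   an optional prefix character
      \b\w+\b      #   a whole word
    )
    (.*?)          # as little as possible; kept via group 2
    [ ]+           # one or more spaces before the dupe
    (?<![#@&])     # no prefix character right before the dupe
    \1\b           # the captured text again, ending at a word boundary
''', re.VERBOSE)

def remove_later_dupes(line):
    # Groups 1 and 2 are put back; only the spaces and the later copy go.
    return PATTERN.sub(r'\1\2', line)

for line in ['sometext sometext @me &project1 @me',
             'sometext #1 @me something else #1']:
    print(remove_later_dupes(line))
# sometext @me &project1
# sometext #1 @me something else
```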

Hope that helps make sense of it.

-tim



[1]
this is what keeps the bare "text" in a line like

my text has a &text or @text or #text

from being treated as a duplicate of "&text", "@text" or "#text"


[2]
this prevents partial matches like

this text is textual

where "text" and "textual" might be considered a match
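
A quick way to see what the trailing \> buys is to compare two variants on that line. This is a hedged sketch using a rough Python port of the pattern (\b in place of \< and \>, captured groups in place of \zs); WITH_END and WITHOUT_END are hypothetical names.

```python
import re

# Same rough port twice: with and without the final word-end check.
WITH_END = re.compile(r'([#@&]?\b\w+\b)(.*?) +(?<![#@&])\1\b')
WITHOUT_END = re.compile(r'([#@&]?\b\w+\b)(.*?) +(?<![#@&])\1')

line = 'this text is textual'
print(WITH_END.sub(r'\1\2', line))     # this text is textual
print(WITHOUT_END.sub(r'\1\2', line))  # this text isual
```

Without the word-end check, "text" matches the start of "textual" and the line is mangled.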





tjg

Oct 11, 2012, 12:19:28 PM
to v...@vim.org
@Tim Chase-9

Thank you.

I am not sure I grasp it totally yet, though.
I shall have to do some Regex homework, and this will most certainly be
helpful.

Thank you very much again.


