
[VIM] Finding and removing duplicate lines


gv...@iitk.ac.in

Feb 8, 2005, 12:11:49 PM
Hi !

I have a file which contains some duplicate lines. I want to remove all
copies of a line if it occurs more than once in the file. So my
original file might look like:
l1
l2
l1
l3
l4
l3

After processing, I want it to look like:
l2
l4

I believe this should be possible in vim using some commands. Please
tell me how I can do this.

Thanks,
Gaurav

Jehannes

Feb 8, 2005, 12:34:26 PM
gv...@iitk.ac.in wrote in
news:1107882709.6...@g14g2000cwa.googlegroups.com:

I think this is from Luc Hermitte:
"
There are two versions, the first leaves only the last line,
the second leaves only the first line.
1] g/^\(.*\)$\n\1$/d
2] g/\%(^\1$\n\)\@<=\(.*\)$/d
Breakdown of the second version:
g//d      <-- Delete the lines matching the regexp
\@<=      <-- If the bit following matches, make sure the bit preceding
              this symbol directly precedes the match
\(.*\)$   <-- Match the line into subst register 1
\%( \)    <-- Group without placing in a subst register
^\1$\n    <-- Match subst register 1 followed by end of line and the
              newline between the 2 lines
"
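Outside Vim, the effect of version 1 — dropping a line when the line that follows it is identical, so only the last line of each run survives — can be sketched in awk (using a made-up run-of-duplicates input):

```shell
# Drop each line that is immediately followed by an identical line,
# so only the last line of every run of consecutive duplicates survives.
printf 'l1\nl1\nl2\nl3\nl3\n' \
  | awk 'NR > 1 && prev != $0 { print prev } { prev = $0 } END { print prev }'
```

Like the :g version, this only collapses *consecutive* duplicates.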

HTH
John
--
jo...@beeverWITHOUTTHIS.nl

= =
Just as light reveals both itself and darkness, so truth is the
standard both of itself and of the false.
-Spinoza-
= =
Posting powered by Vim 6.3 and ten fingers
vim.org

gv...@iitk.ac.in

Feb 8, 2005, 12:58:53 PM
Hi !

The solution that you suggested works only when the duplicate lines
are consecutive. Also, it keeps one of the lines (in the case of
consecutive lines, this could be fixed by d|d instead of d at the end,
but what if the 2 lines are not consecutive?). Any other suggestions?

Thanks in advance,
Gaurav

Bernard Schmitz

Feb 8, 2005, 2:20:43 PM

If you have sort and uniq available you could try this:

:%!sort|uniq -c

and then

:g!/^\s*1\s/d|%s/^\s*\d\+\s//

Which will work but will leave the lines in sorted order.

What it does is this: uniq -c counts the number of lines that are the
same. uniq expects the lines to be sorted. We delete all lines that do
not start with a count of 1 and then we strip the counts from each line.
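Roughly the same pipeline can be run in a plain shell, using the OP's sample lines (the awk step assumes the lines themselves contain no whitespace):

```shell
# Count duplicates, then keep only the lines whose count is exactly 1;
# the output ends up in sorted order.
printf 'l1\nl2\nl1\nl3\nl4\nl3\n' \
  | sort | uniq -c \
  | awk '$1 == 1 { print $2 }'
```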


Bernard.

Preben 'Peppe' Guldberg

Feb 8, 2005, 3:21:38 PM
gv...@iitk.ac.in wrote:
> Hi !

Just a quick idea for a script:

" Looping over all lines in the buffer, copy the line and escape it
" and build a pattern to match identical lines.
" If there are identical lines, remove them all. If not, increment
" the line counter to process the following line.
" Setting the current line in each iteration ensures that the
" search() does not find the first matching line in the buffer.
let l = 1
while l < line('$')
    exe l
    let pat = '\V\^' . escape(getline('.'), '/\') . '\$'
    if search(pat, 'W')
        exe 'g/' . pat . '/d'
    else
        let l = l + 1
    endif
endwhile

Remove the comments and join the lines, separated by |'s, and you
should have a neat one-liner.
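The script's net effect — removing every copy of any duplicated line while keeping the survivors in their original order — can be sketched as a two-pass awk filter over the OP's sample input:

```shell
# First pass counts occurrences while remembering the lines in order;
# the END block then prints only the lines that occurred exactly once.
printf 'l1\nl2\nl1\nl3\nl4\nl3\n' \
  | awk '{ count[$0]++; line[NR] = $0 }
         END { for (i = 1; i <= NR; i++) if (count[line[i]] == 1) print line[i] }'
```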

Peppe
--
se nocp cpo=BceFsx!$ hid bs=2 ls=2 hls ic " P. Guldberg /bin/v...@wielders.org
se scs ai isf-== fdo-=block cino=t0,:0 hi=100 ru so=4 noea lz|if has('unix')
se sh=/bin/sh|en|syn on|filetype plugin indent on|ono S V/\n^-- $\\|\%$/<CR>
cno <C-A> <C-B>|au FileType vim,mail se sw=4 sts=4 et|let&tw=72+6*(&ft=~'v')

Sven Guckes

Feb 8, 2005, 7:12:17 PM
* gv...@iitk.ac.in <gv...@iitk.ac.in> [2005-02-08]:

:%!sort -u
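For comparison, the same filter run outside Vim on the OP's sample lines; note that sort -u keeps one copy of each distinct line, in sorted order, rather than removing every copy of a duplicated line:

```shell
# sort -u: one copy of every distinct line, sorted.
printf 'l1\nl2\nl1\nl3\nl4\nl3\n' | sort -u
```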

Sven

Antony Scriven

Feb 10, 2005, 5:26:15 AM
Hello

Preben 'Peppe' Guldberg wrote:

Don't you need to limit where the g operates with a .,$ ? I
guess it depends on how you interpret `copies' in the
original message. I'm going to assume that OP wants to
remove all duplicate lines except for the first occurrence
in the buffer.

> else
> let l = l + 1
> endif
> endwhile
>
> Remove comments and join the lines, seperated by |'s, and
> you should have a neat oneliner.

Indeed. I haven't properly tested these. None require
sorting and they should all leave the lines in the buffer in
the same order as before the command.

This first one searches backwards for a match.

:g/^/ kl | if search('\V\^' . escape(getline('.'), '/\') . '\$',
'bW') | 'ld | endif

(I think maybe Peppe wrote that one anyway, my memory is
hazy.)

Maybe something like this will also work.

:g/^/exe ',$s/\V\n' . escape(getline('.'), '/\') . '\$'

And this one is slow in comparison.

:g/^/m0
:g/^\(.*\)\n\_.*\%(^\1$\)/d
:g/^/m0

Antony

da...@tvis.co.uk

Feb 10, 2005, 5:55:46 AM
As a VIM lover, might I timidly suggest that this may be more easily
done using an external scripting language (Perl, PHP, etc.)?

In Perl I'd load each line into a hash and then ignore any line that
was already loaded into the hash; otherwise I'd write the line to the
output.
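The same hash idea can be sketched in awk instead of Perl: seen[] plays the role of the hash, and a line is printed only the first time it turns up (so, as described, this keeps the first copy of each line rather than removing every copy):

```shell
# Print a line only when it has not been seen before.
printf 'l1\nl2\nl1\nl3\nl4\nl3\n' | awk '!seen[$0]++'
```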

zzapper

Antony Scriven

Feb 10, 2005, 12:34:22 PM
Antony Scriven wrote:

> Hello
>
> Preben 'Peppe' Guldberg wrote:
>
> > gv...@iitk.ac.in wrote:
> > > Hi !
> >
> > > I have a file which contains some duplicate lines. I
> > > want to remove all copies of a line if it occurs more
> > > than once in the file. So my original file might look

> > > [...]


> >
> > > I believe this should be possible in vim using some
> > > commands. Please tell me how can I do this.

Hmm, I'm sensing deja vu here. In fact a quick google
reveals ...

> [...]


>
> :g/^/exe ',$s/\V\n' . escape(getline('.'), '/\') . '\$'

... that \C might be a good idea to force case sensitivity
(thanks Peppe!).

Antony

William James

Feb 14, 2005, 4:48:05 AM

gv...@iitk.ac.in wrote:
> Hi !
>
> I have a file which contains some duplicate lines. I want to remove
> all copies of a line if it occurs more than once in the file.

Move to first line of file and in command mode type:

!G awk "$0 in a{next} {print;a[$0]++}"

This works in Elvis under Windows. Under Linux or Unix, you
should probably replace each " with '.

William James

Feb 14, 2005, 5:17:26 PM

Too prolix. Try

!G awk "0==a[$0]++"

Preben 'Peppe' Guldberg

Feb 15, 2005, 5:30:07 PM

In the example above, the OP removed all lines except those that
appeared only once.

> > else
> > let l = l + 1
> > endif
> > endwhile

> > Remove comments and join the lines, seperated by |'s, and
> > you should have a neat oneliner.

> Indeed. I haven't properly tested these. None require
> sorting and they should all leave the lines in the buffer in
> the same order as before the command.

> This first one searches backwards for a match.

> :g/^/ kl | if search('\V\^' . escape(getline('.'), '/\') . '\$',
> 'bW') | 'ld | endif

Nice. For the OP, it should probably be adjusted to

:g/^/ kl | if search('\V\C\^' . escape(getline('.'), '/\') . '\$',
'bW') | 'l,$d | endif

> (I think maybe Peppe wrote that one anyway, my memory is
> hazy.)

Don't know... You did seem to dig some old solution up, though, details
of which found its way into the command above.

Preben 'Peppe' Guldberg

Feb 16, 2005, 4:29:38 AM
Preben 'Peppe' Guldberg wrote:

> Nice. For the OP, it should probably be adjusted to

> :g/^/ kl | if search('\V\C\^' . escape(getline('.'), '/\') . '\$',
> 'bW') | 'l,$d | endif

Ahem... I got carried away there.

The 'l,$d part will blow away all lines in the range, not just the ones
that would be matched by the search(). To delete only the lines that
match, we need :global - but then we are back to my initial point of
not being able to nest :global commands and having to script it (or use
some other magic).

Peppe [realising just before getting the sleep I should have had then]
