How to implement line sorting, uniquifying and counting function in emacs?

gnuist006

Sep 29, 2002, 10:13:06 PM

In shell you can do this:

cat file | sort | uniq -d | wc

to count the repeated lines. You can also do

cat file | sort | uniq -u | wc

to count the unique lines.

Sometimes I have to do this on a Windows machine where I do have Emacs
but cannot escape to a shell, so that route is not available.

Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
know the equivalent to wc.

This is where some help is requested. I think that this is not only a
problem of lisp programming, but also algorithms. Which group has this
kind of expertise?

Cheers!
gnuist

Evgeny Roubinchtein

Sep 30, 2002, 1:15:16 AM

,----
| In shell you can do this:
|
| cat file | sort | uniq -d | wc
|
| to count the repeated lines. You can also do
|
| cat file | sort | uniq -u | wc
|
| to count the unique lines.
|
| Sometimes I have to do this on a Windows machine where I do have Emacs
| but cannot escape to a shell, so that route is not available.
|
| Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
| know the equivalent to wc.
`----

Assuming the text you are interested in is in a buffer, one approach is
to use the `sort-lines' function. Once the lines are sorted, it's
pretty easy to count unique and non-unique lines.

(defun count-repeated-lines (&optional beg end)
  (let ((buf (current-buffer))
        (repeated-count 0)
        (unique-count 0)
        (cur-line nil)
        (prev-line nil))
    (with-temp-buffer
      (insert-buffer-substring buf
                               (and beg
                                    (with-current-buffer buf
                                      (save-excursion
                                        (goto-char beg)
                                        (line-beginning-position))))
                               end)
      (sort-lines nil (point-min) (point-max))
      ;; put a dummy line before the text to make the loop simpler
      (goto-char (point-min))
      (insert "\n")
      (goto-char (point-min))
      (while (and (zerop (forward-line 1)) (/= (point) (point-max)))
        (setq cur-line (buffer-substring-no-properties
                        (point)
                        (save-excursion (end-of-line)
                                        (point))))
        (if (and prev-line (string= prev-line cur-line))
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count)))
        (setq prev-line cur-line))
      (cons unique-count repeated-count))))

Instead of sorting lines, you could use Emacs's built-in hash tables
(built in as of GNU Emacs v21; not sure which version of XEmacs first
introduced hash tables) to keep track of the lines you've encountered so
far. (You also don't need a temporary buffer in that case.)

(defun count-repeated-lines (&optional beg end)
  (let ((buf (current-buffer))
        (beg (or (and beg (save-excursion (goto-char beg) (line-beginning-position)))
                 (point-min)))
        (end (or end (point-max)))
        (lines-hash (make-hash-table :test #'equal))
        (unique-count 0)
        (repeated-count 0)
        (cur-line nil))
    (save-excursion
      (goto-char beg)
      (beginning-of-line)
      (while (< (point) end)
        (setq cur-line (buffer-substring-no-properties
                        (point)
                        (save-excursion (end-of-line)
                                        (point))))
        (if (gethash cur-line lines-hash)
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count))
          (puthash cur-line t lines-hash))
        (forward-line))
      (cons unique-count repeated-count))))

Evgeny Roubinchtein

Sep 30, 2002, 1:23:38 AM

Oops... Just noticed I didn't really need to insert a dummy newline
when using a temp buffer.

(defun count-repeated-lines (&optional beg end)
  (let ((buf (current-buffer))
        (repeated-count 0)
        (unique-count 0)
        (cur-line nil)
        (prev-line nil))
    (with-temp-buffer
      (insert-buffer-substring buf
                               (and beg
                                    (with-current-buffer buf
                                      (save-excursion
                                        (goto-char beg)
                                        (line-beginning-position))))
                               end)
      (sort-lines nil (point-min) (point-max))
      (goto-char (point-min))
      (while (/= (point) (point-max))
        (setq cur-line (buffer-substring-no-properties
                        (point)
                        (save-excursion (end-of-line)
                                        (point))))
        (if (and prev-line (string= prev-line cur-line))
            (setq repeated-count (1+ repeated-count))
          (setq unique-count (1+ unique-count)))
        (setq prev-line cur-line)
        (forward-line))
      (cons unique-count repeated-count))))

Marc Spitzer

Sep 30, 2002, 3:04:51 AM

gnui...@hotmail.com (gnuist006) wrote in
news:b00bb831.02092...@posting.google.com:

If elisp has hashes, do the following:
1: open the file
2: for each line, use it as a key of the hash
   and add 1 to the previous value; the first time,
   set it to 1
3a: for uniq -u, count the number of keys
3b: for uniq -d, for each value > 1, add it to
    a total, then print the total
3c: for the truly unique lines (value == 1), count
    the number of keys that have a value == 1 and
    print it
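
The steps above can be sketched in Emacs Lisp roughly as follows. This is
only an illustration, not the poster's code: it assumes the text is in the
current buffer rather than read from a file, and the names `my-line-tally'
and `my-uniq-counts' are made up. With GNU uniq, -u prints only the lines
that occur once and -d prints one copy of each repeated line, so the three
numbers returned correspond to the line counts of `sort | uniq',
`sort | uniq -u' and `sort | uniq -d'.

```elisp
;; Sketch only: `my-line-tally' and `my-uniq-counts' are hypothetical names.
(defun my-line-tally (&optional beg end)
  "Return a hash table mapping each line between BEG and END to its count."
  (let ((tally (make-hash-table :test #'equal))
        (end (or end (point-max))))
    (save-excursion
      (goto-char (or beg (point-min)))
      (beginning-of-line)
      ;; Stop at END, or at end of buffer if END lies beyond it.
      (while (and (< (point) end) (not (eobp)))
        (let ((line (buffer-substring-no-properties
                     (point) (line-end-position))))
          ;; First sighting starts from the default 0, so it becomes 1.
          (puthash line (1+ (gethash line tally 0)) tally))
        (forward-line 1)))
    tally))

(defun my-uniq-counts (&optional beg end)
  "Return a list (DISTINCT SINGLETONS DUPLICATED) for lines in BEG..END.
DISTINCT is the number of different lines, SINGLETONS the number of
lines occurring exactly once, DUPLICATED the number of different lines
occurring more than once."
  (let ((distinct 0) (singletons 0) (duplicated 0))
    (maphash (lambda (_line n)
               (setq distinct (1+ distinct))
               (if (= n 1)
                   (setq singletons (1+ singletons))
                 (setq duplicated (1+ duplicated))))
             (my-line-tally beg end))
    (list distinct singletons duplicated)))
```

Evaluating (my-uniq-counts) with M-: reports the counts for the whole
buffer. No sorting is needed, because the hash table groups equal lines
wherever they appear.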

marc

Jens Schmidt

Sep 30, 2002, 4:12:56 AM

Sorting:

M-x apropos RET sort-.* RET

Counting:

M-x apropos RET count.*lines RET

The only non-trivial part is uniquifying of buffer lines:

M-x query-replace-regexp RET ^\(.*^Q^J\)\1+ RET \1 RET

where you need to type ^Q^J as C-q C-j, of course. A non-interactive
variant should be as easy as the interactive one.
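
A non-interactive variant might look like the sketch below. It reuses the
same regexp, assumes the region has already been sorted (for example with
`sort-lines') so that equal lines are adjacent, and uses the standard
`count-lines' function for the counting. The name `my-uniquify-lines' is
hypothetical.

```elisp
;; Sketch only: `my-uniquify-lines' is a hypothetical name, not a standard command.
(defun my-uniquify-lines (beg end)
  "Collapse runs of identical adjacent lines between BEG and END.
Return the number of lines left in the region."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Compare lines case-sensitively, whatever the user's setting is.
      (let ((case-fold-search nil))
        ;; Same regexp as the interactive command: a line followed by
        ;; one or more exact copies of itself.
        (while (re-search-forward "^\\(.*\n\\)\\1+" nil t)
          ;; FIXEDCASE is t so the replacement text is inserted as-is.
          (replace-match "\\1" t)))
      (count-lines (point-min) (point-max)))))
```

For a whole buffer, (sort-lines nil (point-min) (point-max)) followed by
(my-uniquify-lines (point-min) (point-max)) gives the effect of
`sort | uniq'. Like the interactive regexp, this only merges lines that end
in a newline, so a duplicate on a final line without a trailing newline is
left alone.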

Kaz Kylheku

Sep 30, 2002, 12:14:36 PM

gnui...@hotmail.com (gnuist006) wrote in message news:<b00bb831.02092...@posting.google.com>...
> Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
> know the equivalent to wc.

Lisp does not have sort-lines. *Emacs* Lisp has sort-lines. Please do
not include the comp.lang.lisp newsgroup in Emacs Lisp discussions.

Think before you crosspost; your question ought to have been directed
to the Emacs newsgroup only.

Steven M. Haflich

Oct 1, 2002, 4:00:01 AM

gnuist006 wrote:

> Lisp has sort-lines, but no uniq -u or uniq -d available. Also I do not
> know the equivalent to wc.

A pure Common Lisp equivalent is the following, reading standard-input:

(loop with last-line
      for line in (sort (loop as x = (read-line *standard-input* nil nil)
                              while x collect x)
                        #'string<)
      unless (equal last-line line)
        count 1
      do (setf last-line line))

Probably not what you wanted. Probably meaningless to you.
