text processing 5 thousand files: find-file vs with-temp-buffer

Xah Lee

Dec 19, 2011, 3:32:16 PM
#emacs #lisp processing 5 thousand files. find-file vs
with-temp-buffer/with-temp-file.

Using find-file to open 5565 files, with font-lock-mode off, backup
off etc, takes 10 min plus 8 min garbage collection. Total wall clock
time is 18 minutes.

Using with-temp-buffer, 22 seconds.

Moral: when doing batch text processing of thousands of files, don't
use find-file, use with-temp-buffer or with-temp-file instead.

https://plus.google.com/b/113859563190964307534/113859563190964307534/posts/AdmLCjPaGbT

Xah

Stefan Monnier

Dec 19, 2011, 8:26:52 PM
> #emacs #lisp processing 5 thousand files. find-file vs
> with-temp-buffer/with-temp-file.

> Using find-file to open 5565 files, with font-lock-mode off, backup
> off etc, takes 10 min plus 8 min garbage collection. Total wall clock
> time is 18 minutes.

> Using with-temp-buffer, 22 seconds.

I hadn't seen such a measure until now, and it's quite impressive.
My gut feeling is that setting vc-handled-backends to nil should recover
a large part of the speed, tho looking for .dir-locals.el could still
eat up a non-negligible fraction.

> Moral: when doing batch text processing of thousands of files, don't
> use find-file, use with-temp-buffer or with-temp-file instead.

Agreed (unless of course you need the major-mode to be set properly).
Also because find-file on some files can cause unexpected side-effects
(think of opening .tar or .pdf files).


Stefan
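
A minimal sketch of the vc-handled-backends suggestion above, assuming the script keeps using a find-file style visit. The helper name my-visit-file-without-vc is hypothetical and does not appear in the thread. It only removes the version-control lookups; it does nothing about the .dir-locals.el search mentioned in the same reply, and no timing is claimed for it here.

```elisp
(require 'vc-hooks) ; defines `vc-handled-backends' (normally preloaded)

(defun my-visit-file-without-vc (fpath)
  "Visit FPATH with version-control detection disabled; return its buffer.
Hypothetical helper for illustration, not part of the original scripts."
  ;; With `vc-handled-backends' bound to nil, the VC code that runs
  ;; when a file is visited has no backends to probe.  The binding is
  ;; dynamic, so the user's setting is restored when `let' exits.
  (let ((vc-handled-backends nil))
    (find-file-noselect fpath)))
```

The caller would wrap its work in (with-current-buffer (my-visit-file-without-vc fpath) ...) and kill the buffer afterwards, as the find-file version does.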

Xah Lee

Dec 19, 2011, 10:17:06 PM
On Dec 19, 5:26 pm, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
> > #emacs #lisp processing 5 thousand files. find-file vs
> > with-temp-buffer/with-temp-file.
> > Using find-file to open 5565 files, with font-lock-mode off, backup
> > off etc, takes 10 min plus 8 min garbage collection. Total wall clock
> > time is 18 minutes.
> > Using with-temp-buffer, 22 seconds.
>
> I hadn't seen such a measure until now, and it's quite impressive.
> My gut feeling is that setting vc-handled-backends to nil should recover
> a large part of the speed, tho looking for .dir-locals.el could still
> eat up a non-negligible fraction.

Missed your comment about setting vc-handled-backends. Thanks. Will
try that and see how it goes.

Xah

Xah Lee

Dec 19, 2011, 10:13:00 PM

On Dec 19, 5:26 pm, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
I haven't tracked down what exactly is causing the slowdown. I thought
font-lock must be it, but apparently not. Anyone got any idea
what exactly it might be?

btw, these are all html files, defaulting to html-mode. Also, I'm running
the script while emacs is open (e.g. loading the byte compiled script in
emacs, as opposed to running it with emacs --script in a terminal).

When using find-file, after the script finished in 10 min, emacs froze
for another 8 minutes. Is this really garbage collection, or is emacs
running some queued cleanup due to find-file?
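
One way to approach the garbage-collection question is to read Emacs's built-in GC counters, gc-elapsed (total seconds spent in GC) and gcs-done (number of GC runs), around the work. The sketch below is only an illustration; the helper name my-time-with-gc is hypothetical.

```elisp
(defun my-time-with-gc (thunk)
  "Call THUNK; return a list (WALL-SECONDS GC-SECONDS GC-RUNS).
Hypothetical helper for illustration.  It only accounts for garbage
collections that happen while THUNK is running."
  (let ((start (float-time))
        (gc-seconds-before gc-elapsed) ; built-in: total time spent in GC
        (gc-runs-before gcs-done))     ; built-in: number of GCs so far
    (funcall thunk)
    (list (- (float-time) start)
          (- gc-elapsed gc-seconds-before)
          (- gcs-done gc-runs-before))))
```

Because the freeze described above happens after the script returns, the helper alone would not capture it. Evaluating gc-elapsed once right after the script finishes and again when emacs becomes responsive should show whether the counter grew by roughly the length of the freeze.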

if anyone wants to test, here's the 2 versions i used that produced
the reported timings:

;; find-file version
;; with backup turned off, font-lock-mode off, recentf off (and no
;; tabbar mode or anything else I can think of, but I might have missed some).
;; NOTE: `webroot' is assumed to be defined elsewhere in the script.
(defun my-process-file (fpath destBuff)
  "Process the file at full path FPATH.
Write result to buffer DESTBUFF."
  (let (fBuf)
    (when (not (string-match "/xx" fpath)) ; skip dir/file starting with xx
      (setq fBuf (find-file fpath)) ; open file
      (goto-char 1)
      (when (not (search-forward "<meta http-equiv=\"refresh\"" nil "noerror"))
        (with-current-buffer destBuff ; insert url to sitemap buffer
          (insert "<url><loc>")
          (insert (concat "http://xahlee.org/" (substring fpath (length webroot))))
          (insert "</loc></url>\n")))
      (kill-buffer fBuf)))) ; close file

;; with-temp-buffer version
;; NOTE: `domainName' and `webroot' are assumed to be defined elsewhere
;; in the script.
(defun my-process-file (fPath destBuff)
  "Process the file at fullpath FPATH.
Write result to buffer DESTBUFF."
  (when (not (string-match "/xx" fPath)) ; dir/file starting with xx are not public
    (with-temp-buffer
      (insert-file-contents fPath nil nil nil t)
      (goto-char 1)
      (when (not (search-forward "<meta http-equiv=\"refresh\"" nil "noerror"))
        (with-current-buffer destBuff
          (insert "<url><loc>")
          (insert (concat "http://" domainName "/" (substring fPath (length webroot))))
          (insert "</loc></url>\n"))))))

The code basically just opens a file and checks whether it contains a meta
refresh string. If not, it writes the file's URL into another buffer.
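
The same logic can also be sketched as two small functions that return values instead of inserting into a buffer, which makes it easier to test on a handful of files. The names my-file-redirects-p and my-sitemap-entries, and the ROOT and BASE-URL parameters, are hypothetical. The sketch assumes both ROOT and BASE-URL end with a slash, mirroring how the versions above strip the webroot prefix from each path.

```elisp
(defun my-file-redirects-p (fpath)
  "Return non-nil if the file at FPATH contains a meta refresh tag.
Hypothetical helper for illustration."
  (with-temp-buffer
    (insert-file-contents fpath)
    (goto-char (point-min))
    (search-forward "<meta http-equiv=\"refresh\"" nil t)))

(defun my-sitemap-entries (files root base-url)
  "Return a list of <url> lines for the public, non-redirecting FILES.
ROOT is the directory prefix stripped from each path and BASE-URL is
prepended in its place.  Both are assumed to end with a slash."
  (let (entries)
    (dolist (f files (nreverse entries))
      ;; Paths containing \"/xx\" are treated as non-public, as above.
      (unless (or (string-match-p "/xx" f)
                  (my-file-redirects-p f))
        (push (concat "<url><loc>" base-url
                      (substring f (length root))
                      "</loc></url>")
              entries)))))
```

The resulting list could then be joined with newlines and inserted into the sitemap buffer in one step.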

GNU Emacs 23.2.1 (i386-mingw-nt6.1.7601) of 2010-05-08 on G41R2F1

Xah

Xah Lee

Dec 21, 2011, 2:15:49 PM
On Dec 19, 5:26 pm, Stefan Monnier <monn...@iro.umontreal.ca> wrote:
The full report is now at:

〈Emacs Lisp Text Processing: find-file vs with-temp-buffer〉
http://xahlee.org/emacs/elisp_find-file_vs_with-temp-buffer.html

Still haven't tried your idea of vc-handled-backends. Got other things
going on today. Will do so tomorrow.

Xah