Gary Johnson <gary...@spocom.com> wrote:
Yes, I was wondering about that too. But I checked and
tac only uses very little memory even on huge files.
Actually, tac behaves differently with an input file and with
an input stream:
* with an input file, it outputs results immediately. I suppose
tac reads the file block by block from the end of the file and
reverses each block.
* with an input stream it can't do that, so it has to read the
full stream before it can output. Yet it uses very little memory.
It uses temporary files (confirmed by looking at /proc/<pid>/fd).
So the tac solution works. But given that tac is more efficient on
an input file than on an input stream, changing the order
should be better. In other words, this...
$ tac input.txt | sed 1,10d | tac | sed 1,10d > output.txt
... should be faster than this:
$ sed 1,10d input.txt | tac | sed 1,10d | tac > output.txt
I measured it on a big file to confirm it:
the first solution took 9.2 sec, the second solution took 12.2 sec.
In any case, the Perl solution that I gave, which uses a
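
To reproduce this kind of comparison, here is a minimal sketch. It assumes GNU coreutils (seq, tac) are installed; the function names trim_tac_first and trim_sed_first are hypothetical and only wrap the two pipelines above. The small generated sample is just a sanity check that both orders produce the same output.

```shell
# Hypothetical helper names; both strip the first 10 and last 10 lines.
# tac reads the file directly, so it can work backwards from the end.
trim_tac_first() { tac "$1" | sed 1,10d | tac | sed 1,10d; }
# sed reads the file; both tac invocations get a pipe as input.
trim_sed_first() { sed 1,10d "$1" | tac | sed 1,10d | tac; }

# Small generated input (30 numbered lines) to check that the outputs agree.
sample=$(mktemp)
seq 1 30 > "$sample"
if [ "$(trim_tac_first "$sample")" = "$(trim_sed_first "$sample")" ]; then
    echo "outputs identical"
fi
rm -f "$sample"
```

In bash, prefixing each call with the `time` keyword (for example `time trim_tac_first input.txt > /dev/null`) on a large file should show the difference. The absolute numbers will depend on the machine and the file.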
rotating buffer does one pass only, does not use much memory
and does not use temporary files either.
$ perl -ne 'print $l[$.%10] if ($. > 10*2); $l[$.%10] = $_' input.txt > output.txt
Yet this Perl solution is slower than tac. It takes 14.8 sec on
the same input file.
The strange-looking solution...
$ sed -e :a -e '$d;N;2,10ba' -e 'P;D' input.txt > output.txt
... takes 6.0 sec.
Tim Chase wrote:
> I think you're reading it backwards, as head/tail (at least GNU
> versions; for other flavors, YMMV) allow for a "+" in front of the
> number so
>
> tail -n +20
>
> chops off the first 19 lines in the file; similarly, "-" in front of
> the number with head does all but the N last lines of the file. The
> example above should likely read something like
>
> tail -n +11 file.sql | head -n -10 > trimmed.sql
Right. My apologies. That works indeed, and it's much faster.
I suppose this solution does not have to parse the input line by line.
With the same large input as above, it only took 2.6 sec.
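
For reference, a parameterised sketch of the tail/head approach. It assumes GNU head, since the negative count in `head -n -N` is a GNU extension; the function name trim_head_tail is hypothetical.

```shell
# trim_head_tail N FILE: hypothetical helper; drops the first N and the
# last N lines of FILE. Requires GNU head for the negative line count.
trim_head_tail() {
    # tail -n +K starts output at line K, so K = N + 1 skips N lines.
    tail -n +"$(( $1 + 1 ))" "$2" | head -n -"$1"
}

sample=$(mktemp)
seq 1 30 > "$sample"
trim_head_tail 10 "$sample"   # prints lines 11 through 20
rm -f "$sample"
```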
-- Dominique