
Deleting first N lines from a text file


pozz
Nov 14, 2011, 7:02:29 PM
I want to delete the first N lines from a text file. I imagine two
approaches:
- use a temporary file to copy the last lines only
- use the same file to move characters starting from N+1 line to the
beginning

The temporary file could be more complex to write (at least I have to
delete the original file and rename the temporary file), but at any
moment I have a coherent text file. So this approach is safe if the
application crashes during the deleting process. If the application
crashes just after deleting the original text file but before renaming
the temporary file, during initialization I can detect this situation
and proceed with the renaming.

The second approach is simpler, but leaves a malformed text file on
the filesystem if the application crashes during the deleting process.

What do you think about those thoughts? Do you agree with me?

My "deleting first N lines" function is:

int text_delete(unsigned int N) {
        FILE *f;
        FILE *ftmp;
        int c;

        f = fopen(filename, "rt");
        ftmp = fopen(tmpfilename, "wt");
        if ((f == NULL) || (ftmp == NULL)) {
                if (f != NULL) fclose(f);
                if (ftmp != NULL) fclose(ftmp);
                return -1;
        }

        /* skip the first N lines of the original file */
        while ((c = fgetc(f)) != EOF) {
                if ((char)c == '\n') {
                        if (--N == 0) break;
                }
        }
        /* copy everything that is left to the temporary file */
        while ((c = fgetc(f)) != EOF) {
                fputc(c, ftmp);
        }

        fclose(f);
        fclose(ftmp);
        if (remove(filename) < 0) return -1;
        if (rename(tmpfilename, filename) < 0) return -1;
        return 0;
}

At initialization I try to open the text file or the temporary file:

int text_init(void) {
        FILE *f;

        f = fopen(filename, "rt");
        if (f == NULL) {
                /* Does the temporary file exist? */
                f = fopen(tmpfilename, "rt");
                if (f != NULL) {
                        /* Yes! Recover the temporary file */
                        fclose(f);
                        if (rename(tmpfilename, filename) < 0) return -1;
                } else {
                        /* Create an empty log file... */
                        f = fopen(filename, "wt");
                        if (f == NULL) return -1;
                        fclose(f);
                }
        } else {
                fclose(f);
        }
        return 0;
}
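
(For completeness: filename and tmpfilename are file-scope variables the
post does not show. A minimal sketch of the assumed surroundings and of how
the two functions would be called; the names and paths are illustrative
guesses, not part of the original code.)

/* Assumed surroundings: in a single source file these two file-scope
   variables would be defined above text_delete() and text_init(). */
static const char *filename    = "log.txt";
static const char *tmpfilename = "log.tmp";

int main(void)
{
    if (text_init() != 0)                 /* recover or create the log file */
        return 1;
    return text_delete(3) == 0 ? 0 : 1;   /* e.g. drop the first three lines */
}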

Roberto Waltman
Nov 14, 2011, 7:21:38 PM
pozz wrote:
>I want to delete the first N lines from a text file.
>...
>The second approach is simpler,...
>...
>What do you think about those thoughts?

Only that the second approach is not simpler.
Also, depending on the underlying OS, it may not be possible to read
from and write to the same file as you propose.

--
Roberto Waltman

[ Please reply to the group,
return address is invalid ]
Message has been deleted

Ben Pfaff
Nov 14, 2011, 7:36:35 PM
Acid Washed China Blue Jeans <chine...@yahoo.com> writes:

> In article <mub3c71nfmmvbvmcb...@4ax.com>,
> Roberto Waltman <use...@rwaltman.com> wrote:
>
>> pozz wrote:
>> >I want to delete the first N lines from a text file.
>> >...
>> >The second approach is simpler,...
>> >...
>> >What do you think about those thoughts?
>>
>> Only that the second approach is not simpler.
>> Also, depending on the underlying OS, it may not be possible to read
>> from and write to the same file as you propose.
>
> Fopen with "r+". If fopen succeeds, the library has promised
> you you are allowed to read and write an existing file.

However, writing in a text file may truncate it, see 7.19.3
"Files":

Whether a write on a text stream causes the associated file
to be truncated beyond that point is implementation-defined.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Gordon Burditt
Nov 14, 2011, 7:53:03 PM
> I want to delete the first N lines from a text file. I imagine two
> approaches:
> - use a temporary file to copy the last lines only
> - use the same file to move characters starting from N+1 line to the
> beginning

You forgot something here: for the second method, you need to
shrink the file by the length of the first N lines, or you end up
with extra stuff at the end of the file (you might make the program
reading the file recognize it and ignore it). There is no portable
way to shrink a file to a size other than zero.

> The temporary file could be more complex to write (at least I have to
> delete the original file and rename the temporary file), but at any
> moment I have a coherent text file. So this approach is safe if the
> application crashes during the deleting process. If the application
> crashes just after deleting the original text file but before renaming
> the temporary file, during initialization I can detect this situation
> and proceed with the renaming.

> The second approach is simpler, but leaves a malformed text file on
> the filesystem if the application crashes during the deleting process.

IMHO, the second method *always* leaves extra garbage at the end
of the text file (assuming N > 0), unless you figure out some way
to remove it (unportable) or flag it as meaningless (e.g. set it
to a line consisting of all spaces or something).

There is no documented mode for fopen "rt" or "wt".
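
To make that concrete, here is a rough sketch of the in-place variant,
untested in the best usenet tradition. Everything up to the truncation is
standard C; the last step has to lean on something outside the standard
(POSIX ftruncate() is used here), which is exactly the part you cannot do
portably:

#include <stdio.h>
#include <unistd.h>   /* ftruncate() and fileno() are POSIX, not ISO C */

/* Sketch only: shift everything after the first n lines to the start of
   the file, then cut off the leftover tail.  Deliberately naive (one byte
   per step, a flush after every write) and only minimally error-checked. */
int text_delete_inplace(const char *path, unsigned int n)
{
    FILE *f = fopen(path, "r+b");   /* binary, to sidestep text-mode issues */
    long rd, wr = 0;
    int c;

    if (f == NULL)
        return -1;

    /* find the offset just past the n-th newline */
    while (n > 0 && (c = getc(f)) != EOF)
        if (c == '\n')
            n--;
    rd = ftell(f);

    /* copy the byte at offset rd to offset wr, one byte at a time; the
       fseek calls also satisfy the rule that reads and writes on an update
       stream must be separated by a file-positioning call */
    for (;;) {
        if (fseek(f, rd++, SEEK_SET) != 0 || (c = getc(f)) == EOF)
            break;
        if (fseek(f, wr++, SEEK_SET) != 0 || putc(c, f) == EOF)
            break;
        fflush(f);   /* push the byte out before it might be truncated away */
    }

    if (ftruncate(fileno(f), wr) != 0) {   /* the non-portable step */
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}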

Roberto Waltman
Nov 14, 2011, 8:10:52 PM
Acid Washed China Blue Jeans wrote:

>Fopen with "r+". If fopen succeeds, the library has promised you you are allowed
>to read and write an existing file.

In the general case, a write may truncate the file at the end of the
written data, so it may be OK to read from a location before the last
location written, but not after it.

And there may be environments in which fopen(..., "r+") always fails.

Eric Sosman
Nov 14, 2011, 10:06:41 PM
On 11/14/2011 7:02 PM, pozz wrote:
> I want to delete the first N lines from a text file. I imagine two
> approaches:
> - use a temporary file to copy the last lines only

Do this.

> - use the same file to move characters starting from N+1 line to the
> beginning

Don't do this.

> The temporary file could be more complex to write (at least I have to
> delete the original file and rename the temporary file), but at any
> moment I have a coherent text file. So this approach is safe if the
> application crashes during the deleting process. If the application
> crashes just after deleting the original text file but before renaming
> the temporary file, during initialization I can detect this situation
> and proceed with the renaming.
>
> The second approach is simpler, but leaves a malformed text file on
> the filesystem if the application crashes during the deleting process.
>
> What do you think about those thoughts? Do you agree with me?

No, not at all. One problem with your supposedly simpler
solution: How do you tell subsequent readers of the file that they
should stop before reaching the end? Observe that <stdio.h> offers
no way to shorten an existing file to any length other than zero.

--
Eric Sosman
eso...@ieee-dot-org.invalid

jacob navia
Nov 15, 2011, 6:48:56 AM
Using the containers library (and if your file fits in memory)

#include <containers.h>
int main(int argc,char *argv[])
{
    if (argc != 3) {
        printf("Usage: deletelines <file> <N>\n");
        return -1;
    }
    strCollection *data = istrCollection.CreateFromFile(argv[1]);
    if (data == NULL) return -1;
    istrCollection.RemoveRange(data,0,atoi(argv[2]));
    istrCollection.WriteToFile(data,argv[1]);
    istrCollection.Finalize(data);
}

Giuseppe
Nov 15, 2011, 7:57:03 PM
On 15 Nov, 04:06, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> > What do you think about those thoughts? Do you agree with me?
>
>      No, not at all.  One problem with your supposedly simpler
> solution: How do you tell subsequent readers of the file that they
> should stop before reaching the end?  Observe that <stdio.h> offers
> no way to shorten an existing file to any length other than zero.

Ok, I implemented the "temporary file" solution and it works well. The
only disadvantage is time: when the file is big (1000 lines of about
50 bytes each), the time to delete the first line could be very high.

Do you think the process could be reduced launching an external script
(for example, 'head' based) with system()? If I redirect the output to the
original filename I could avoid the time consuming process of copying the
original to the temporary file.

Keith Thompson
Nov 15, 2011, 8:50:49 PM
Giuseppe <giuseppe...@gmail.com> writes:
> On 15 Nov, 04:06, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> > What do you think about those thoughts? Do you agree with me?
>>
>>      No, not at all.  One problem with your supposedly simpler
>> solution: How do you tell subsequent readers of the file that they
>> should stop before reaching the end?  Observe that <stdio.h> offers
>> no way to shorten an existing file to any length other than zero.
>
> Ok, I implemented the "temporary file" solution and it works well.
> The only disadvantage is time: when the file is big (1000 lines of
> about 50 bytes each), the time to delete the first line could be very
> high.

A text file of 1000 lines of 50 bytes each really isn't all that big.
The time to copy and rename it probably won't even be noticeable.

> Do you think the process could be reduced launching an external script
> (for example, 'head' based) with system()? If I redirect the output
> to the original filename I could avoid the time consuming process of
> copying the original to the temporary file.

The behavior of an external program is outside the scope of the C language.

(But I'll mention that on Unix-like systems, running a command with its
input and output directed to the same file can cause serious problems;
it can easily end up reading a partially modified version of the file
instead of the original. And even if it works, it's likely going to be
doing the same thing you would have done in your program.)

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman
Nov 15, 2011, 9:48:10 PM
On 11/15/2011 7:57 PM, Giuseppe wrote:
> On 15 Nov, 04:06, Eric Sosman<esos...@ieee-dot-org.invalid> wrote:
>>> What do you think about those thoughts? Do you agree with me?
>>
>> No, not at all. One problem with your supposedly simpler
>> solution: How do you tell subsequent readers of the file that they
>> should stop before reaching the end? Observe that<stdio.h> offers
>> no way to shorten an existing file to any length other than zero.
>
> Ok, I implemented the "temporary file" solution and it works well. The
> only disadvantage is time: when the file is big (1000 lines of about
> 50 bytes each), the time to delete the first line could be very high.

Fifty K shouldn't take long. Even on a system from forty years
ago it didn't take long. Even on paper tape, for goodness' sake, it
took less than a minute!

For "really big" files (terabytes) copying most of the file from
one place to another could take an unacceptably long time. Also, the
need to find space for a second nearly complete copy could be
troublesome. In such cases you'd be justified in seeking fancier
solutions -- but I sincerely doubt that "slide all those terabytes
a couple hundred positions leftward" would produce a savings. More
likely it would produce a slowdown, plus the risks you've already
mentioned about data loss in the event of an error. No, the fancier
solution would probably involve some kind of an index external to the
file, describing which parts of the file were "live" and which "dead,"
and fancier routines to read just the live parts.
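
For the small-log case the OP describes, a minimal sketch of that last idea:
keep a one-number sidecar file holding the offset of the first live byte, so
"deleting" lines never rewrites the log at all. The file names and the
sidecar format are made-up assumptions for illustration; readers of the log
would fseek() to the stored offset first, and the log would only need real
compaction once the dead prefix grows too large.

#include <stdio.h>

/* "External index" sketch: log.txt is never rewritten; log.idx holds the
   offset of the first line that is still considered live. */

static long read_skip_offset(void)
{
    long off = 0;
    FILE *idx = fopen("log.idx", "r");
    if (idx != NULL) {
        if (fscanf(idx, "%ld", &off) != 1)
            off = 0;
        fclose(idx);
    }
    return off;
}

static int write_skip_offset(long off)
{
    FILE *idx = fopen("log.idx", "w");
    if (idx == NULL)
        return -1;
    fprintf(idx, "%ld\n", off);
    return fclose(idx) == 0 ? 0 : -1;
}

/* "Delete" the first n live lines by advancing the stored offset. */
int text_delete_indexed(unsigned int n)
{
    long off = read_skip_offset();
    FILE *f = fopen("log.txt", "r");
    int c;

    if (f == NULL)
        return -1;
    if (fseek(f, off, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    while (n > 0 && (c = getc(f)) != EOF)
        if (c == '\n')
            n--;
    off = ftell(f);
    fclose(f);
    return write_skip_offset(off);
}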

> Do you think the process could be reduced launching an external script
> (for example, 'head' based) with system()? If I redirect the output to the
> original filename I could avoid the time consuming process of copying the
> original to the temporary file.

First, just what do you imagine the "head" program does, hmmm?

However, on the systems I've encountered that provide a "head"
utility and support "redirection," your solution is likely to run
very quickly indeed. And save a lot of disk space, too! (Hint:
Try it yourself: `head <foo.txt >foo.txt', then `ls -l foo.txt',
and then you get to test your backups ...)

But all this is mostly beside the point. You are worried about
the time to copy 50K bytes: Have you *measured* the time? Have you
actually found it to be a problem for your application? Or are you
just imagining monsters under your bed? The fundamental theorem of
all optimization is There Are No Monsters Until You've Measured Them.

--
Eric Sosman
eso...@ieee-dot-org.invalid

pozz
Nov 16, 2011, 2:11:43 AM
On 16 Nov, 02:50, Keith Thompson <ks...@mib.org> wrote:
> Giuseppe <giuseppe.modu...@gmail.com> writes:
> > Ok, I implemented the "temporary file" solution and it works well.
> > The only disadvantage is time: when the file is big (1000 lines of
> > about 50 bytes each), the time to delete the first line could be very
> > high.
>
> A text file of 1000 lines of 50 bytes each really isn't all that big.
> The time to copy and rename it probably won't even be noticeable.

It takes about 100ms to finish the shrink procedure. It's not a long
time on a desktop PC, but I'm working on embedded Linux based on an ARM9
processor.

This is the slowest part of my application. Anyway, I'm wondering whether
there are some simple improvements to reduce the time taken by this task.


> > Do you think the process could be reduced launching an external script
> > (for example, 'head' based) with system()?  If I redirect the output
> > to the original filename I could avoid the time consuming process of
> > copying the original to the temporary file.
>
> The behavior of an external program is outside the scope of the C language.

Oh, I know, I was asking for an "off-topic" opinion :-)


> (But I'll mention that on Unix-like systems, running a command with its
> input and output directed to the same file can cause serious problems;
> it can easily end up reading a partially modified version of the file
> instead of the original.  And even if it works, it's likely going to be
> doing the same thing you would have done in your program.)

Ok, I won't try it.

pozz
Nov 16, 2011, 2:14:23 AM
On 16 Nov, 03:48, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> On 11/15/2011 7:57 PM, Giuseppe wrote:
> > Ok, I implemented the "temporary file" solution and it works well. The
> > only disadvantage is time: when the file is big (1000 lines of about
> > 50 bytes each), the time to delete the first line could be very high.
>
>      Fifty K shouldn't take long.  Even on a system from forty years
> ago it didn't take long.  Even on paper tape, for goodness' sake, it
> took less than a minute!

100ms (see my answer to Keith above). It's not too much, but I was
thinking about improvements.


>      For "really big" files (terabytes) copying most of the file from
> one place to another could take an unacceptably long time.  Also, the
> need to find space for a second nearly complete copy could be
> troublesome.  In such cases you'd be justified in seeking fancier
> solutions -- but I sincerely doubt that "slide all those terabytes
> a couple hundred positions leftward" would produce a savings.  More
> likely it would produce a slowdown, plus the risks you've already
> mentioned about data loss in the event of an error.  No, the fancier
> solution would probably involve some kind of an index external to the
> file, describing which parts of the file were "live" and which "dead,"
> and fancier routines to read just the live parts.

Ok.


> > Do you think the process could be reduced launching an external script
> > (for example, 'head' based) with system()?  If I redirect the output to the
> > original filename I could avoid the time consuming process of copying the
> > original to the temporary file.
>
>      First, just what do you imagine the "head" program does, hmmm?
>
>      However, on the systems I've encountered that provide a "head"
> utility and support "redirection," your solution is likely to run
> very quickly indeed.  And save a lot of disk space, too!  (Hint:
> Try it yourself: `head <foo.txt >foo.txt', then `ls -l foo.txt',
> and then you get to test your backups ...)

:-)

Phil Carmody
Nov 16, 2011, 4:04:30 AM
Acid Washed China Blue Jeans <chine...@yahoo.com> writes:
> In article <mub3c71nfmmvbvmcb...@4ax.com>,
> Roberto Waltman <use...@rwaltman.com> wrote:
> > pozz wrote:
> > >I want to delete the first N lines from a text file.
> > >...
> > >The second approach is simpler,...
> > >...
> > >What do you think about those thoughts?
> >
> > Only that the second approach is not simpler.
> > Also, depending on the underlying OS, it may not be possible to read
> > from and write to the same file as you propose.
>
> Fopen with "r+". If fopen succeeds, the library has promised you you are allowed
> to read and write an existing file.

Being allowed to write to it at the point that you open the file
doesn't mean that it's possible to write to the file at any point
later in time.

Think wire-cutters.

Phil
--
Unix is simple. It just takes a genius to understand its simplicity
-- Dennis Ritchie (1941-2011), Unix Co-Creator

jgharston
Nov 16, 2011, 7:11:44 AM
pozz wrote:
> It takes about 100ms to finish the shrink procedure.  It's not a long
> time on a desktop PC, but I'm working on embedded Linux based on an ARM9
> processor.

Are you doing it byte by byte? Try buffering it, even chunks of
16 bytes at a time will speed it up significantly. What's the
biggest chunk of memory you can claim, use, release without
memory fragmentation impacting your program more than acceptably?

JGH

-.-
Nov 16, 2011, 8:01:18 AM
jacob navia was trying to save the world with his stuff:

> Using the containers library (and if your file fits in memory)
>
> #include <containers.h>

You self-celebrating fucko. Nothing exists for you but your own things:
that silly lcc-win and your funny containers.
Stop making this newsgroup your personal advertisements page.


jacob navia
Nov 16, 2011, 8:44:47 AM
On 16/11/11 14:01, -.- wrote:
> jacob navia was trying to save the world with his stuff:
>
>> Using the containers library (and if your file fits in memory)
>>
>> #include <containers.h>
>
> You self-celebrating fucko.

That is why you hide behind a pseudonym, because you have the courage of
your opinions...

BartC
Nov 16, 2011, 9:26:28 AM


"pozz" <pozz...@gmail.com> wrote in message
news:f06508a1-a424-4bc7...@w7g2000yqc.googlegroups.com...
> On 16 Nov, 03:48, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>> On 11/15/2011 7:57 PM, Giuseppe wrote:
>> > Ok, I implemented the "temporary file" solution and it works well. The
>> > only disadvantage is time: when the file is big (1000 lines of about
>> > 50 bytes each), the time to delete the first line could be very high.
>>
>> Fifty K shouldn't take long. Even on a system from forty years
>> ago it didn't take long. Even on paper tape, for goodness' sake, it
>> took less than a minute!

(That's a fast paper tape reader. The last one I used would have taken
nearly 3 hours.)

> 100ms (see my answer to Keith above). It's not too much, but I was
> thinking about improvements.

How long for a file containing ten lines instead of 1000? How long for
double the number of lines?

That will tell you the overheads involved and the fastest speed achievable.

While you're about, how long does it take to create a file, write 50,000
bytes to it (of anything) and close it? And how long to read such a file?

Take care when taking measurements, to eliminate the effects of
disk-caching.
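
If it helps, the crudest way to take those measurements from inside the
program is standard C's clock(). Note that it reports processor time rather
than wall-clock time, so for I/O-bound work an OS-specific timer on the
target would give more honest numbers; treat this as a sketch that assumes
pozz's text_delete() from earlier in the thread is linked in:

#include <stdio.h>
#include <time.h>

int text_delete(unsigned int N);   /* pozz's function, assumed available */

int main(void)
{
    clock_t start = clock();
    int rc = text_delete(1);       /* time the removal of one line */
    clock_t stop = clock();

    printf("text_delete returned %d after %.1f ms of CPU time\n",
           rc, 1000.0 * (double)(stop - start) / CLOCKS_PER_SEC);
    return rc == 0 ? 0 : 1;
}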

--
Bartc

jgharston
Nov 16, 2011, 9:42:40 AM
Try replacing:
>         while((c = fgetc(f)) != EOF) {
>                 fputc(c, ftmp);
>         }

with:
        bsize=m_free(0);
        buff=m_alloc(bsize);
        numread=-1;

        while(numread) {
                numread=fread(buff,1,bsize,f);
                fwrite(buff,1,numread,ftmp);
        }
        m_free(buff);

As is usenet tradition, completely untested.

JGH

jgharston
Nov 16, 2011, 10:43:01 AM
jgharston wrote:
>         bsize=m_free(0);
>         buff=m_alloc(bsize);

Following up my own post, that call to m_free(0) is supposed to
return the size of a free block that can subsequently be claimed
with m_alloc(). A bit of a skim through the web shows that
functionality isn't in any of the malloc libraries documented
there. All I can say is it worked 25 years ago! and it inspired
me to include that functionality in my own malloc library.

Just replace bsize=m_free(0) with a suitable bsize=(some
method of deciding an amount of memory to claim).
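
The same idea in plain standard C, with a fixed-size buffer instead of the
non-standard m_alloc()/m_free() pair, and equally untested: the 4 KB size is
an arbitrary assumption, and the function is meant to drop into pozz's
text_delete() in place of the second fgetc/fputc loop.

#include <stdio.h>

/* Copy whatever remains of f to ftmp in chunks rather than byte by byte. */
static int copy_rest(FILE *f, FILE *ftmp)
{
    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        if (fwrite(buf, 1, n, ftmp) != n)
            return -1;          /* short write: give up and report failure */
    }
    return ferror(f) ? -1 : 0;  /* distinguish read error from clean EOF */
}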

JGH

Keith Thompson
Nov 16, 2011, 2:13:09 PM
Leaving aside the m_free and m_alloc calls, why do you assume that this
will be significantly faster than the fgetc/fputc loop? stdio does its
own buffering.
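
For what it's worth, stdio's own buffers can also be enlarged portably with
setvbuf(), called right after fopen() and before any other operation on the
stream. Whether that beats a hand-rolled fread/fwrite loop on the OP's ARM9
target is something only a measurement can settle; the 64 KB figure below is
an arbitrary assumption, not a recommendation.

#include <stdio.h>

static char inbuf[64 * 1024], outbuf[64 * 1024];

/* Byte-by-byte copy, but with much larger stdio buffers behind it. */
int copy_with_big_buffers(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    int c, rc = -1;

    if (in && out &&
        setvbuf(in, inbuf, _IOFBF, sizeof inbuf) == 0 &&
        setvbuf(out, outbuf, _IOFBF, sizeof outbuf) == 0) {
        while ((c = getc(in)) != EOF)
            if (putc(c, out) == EOF)
                break;
        rc = (ferror(in) || ferror(out)) ? -1 : 0;
    }
    if (in) fclose(in);
    if (out) fclose(out);
    return rc;
}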

jgharston
Nov 16, 2011, 4:50:30 PM
Keith Thompson wrote:
> Leaving aside the m_free and m_alloc calls, why do you assume that this
> will be significantly faster than the fgetc/fputc loop?  stdio does its
> own buffering.

As I recall, this was a standard exam question back when I worra litt'un.
If you're doing bulk data copying, a program buffer is likely to be bigger
than stdio's buffer, and bulk read/write/read/write is more efficient
for simply chucking large lumps of data from one place to another,
one bit being the skipping of fgetc's unget functionality.

JGH

Gordon Burditt
Nov 16, 2011, 5:28:39 PM
> Do you think the process could be reduced launching an external script
> (for example, 'head' based) with system()? If I redirect the output to the
> original filename I could avoid the time consuming process of copying the
> original to the temporary file.

A command like:

head -n 99 x.txt > x.txt

(This particular command does not delete the first N lines; it keeps
only the first N lines).

has a tendency to produce a zero-length file because the shell opens
the output file (truncating it, as in fopen(..., "w") ) before the
"head" program starts running. Then "head" reads an empty x.txt,
and outputs the first 99 lines (except only 0 exist) of it.

You need to sequence things so the file isn't truncated until *after*
you have retrieved any data that might be needed from it. I also
consider the assumption that the contents of a file can fit in
memory (e.g. a 4.7GB video file fitting in memory on a 32-bit-address
machine) to be questionable unless you have prior knowledge of what
kinds of files will be fed to the program.

Kaz Kylheku
Nov 16, 2011, 5:53:25 PM
On 2011-11-16, Gordon Burditt <gordon...@burditt.org> wrote:
>> Do you think the process could be reduced launching an external script
>> (for example, 'head' based) with system()? If I redirect the output to the
>> original filename I could avoid the time consuming process of copying the
>> original to the temporary file.
>
> A command like:
>
> head -n 99 x.txt > x.txt
>
> (This particular command does not delete the first N lines; it keeps
> only the first N lines).
>
> has a tendency to produce a zero-length file because the shell opens
> the output file (truncating it, as in fopen(..., "w") ) before the
> "head" program starts running. Then "head" reads an empty x.txt,
> and outputs the first 99 lines (except only 0 exist) of it.

This could be fixed with a fictitious utility:

head -n 99 x.txt | wne x.txt # 'write if not empty'

wne reads standard input and copies to the named file, but does
not create the file if the input is empty.

> You need to sequence things so the file isn't truncated until *after*
> you have retrieved any data that might be needed from it.

"wne" could also have the behavior of delayed truncation
in addition to delayed creation. ("dtc"?)

This could be used in any situation where a file is being
filtered in place such that characters written
to an earlier position are all derived from the same or
later positions.
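
Since wne is explicitly fictitious, here is one way such a tool might be
sketched in standard C: slurp stdin into memory, and only create (or
overwrite) the named file if at least one byte actually arrived. The
memory-use and error-recovery caveats raised in the follow-up below apply
in full.

#include <stdio.h>
#include <stdlib.h>

/* "wne" sketch: write stdin to argv[1], but only if stdin was non-empty.
   The whole input is held in memory, so this only suits modest files. */
int main(int argc, char **argv)
{
    char *data = NULL;
    size_t len = 0, cap = 0;
    int c;
    FILE *out;

    if (argc != 2) {
        fprintf(stderr, "usage: wne FILE\n");
        return 1;
    }
    while ((c = getchar()) != EOF) {
        if (len == cap) {
            char *bigger;
            cap = cap ? cap * 2 : 4096;
            bigger = realloc(data, cap);
            if (bigger == NULL) {
                free(data);
                return 1;
            }
            data = bigger;
        }
        data[len++] = (char)c;
    }
    if (len == 0)
        return 0;                   /* empty input: leave the file untouched */

    out = fopen(argv[1], "wb");
    if (out == NULL || fwrite(data, 1, len, out) != len || fclose(out) == EOF)
        return 1;
    free(data);
    return 0;
}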

Gordon Burditt
Nov 16, 2011, 7:34:03 PM
>> A command like:
>>
>> head -n 99 x.txt > x.txt
>>
>> (This particular command does not delete the first N lines; it keeps
>> only the first N lines).
>>
>> has a tendency to produce a zero-length file because the shell opens
>> the output file (truncating it, as in fopen(..., "w") ) before the
>> "head" program starts running. Then "head" reads an empty x.txt,
>> and outputs the first 99 lines (except only 0 exist) of it.
>
> This could be fixed with a fictitious utility:
>
> head -n 99 x.txt | wne x.txt # 'write if not empty'

Or, more generally:
arbitrary_filter x.txt | wne x.txt
as a replacement for:
arbitrary_filter < x.txt > x.txt
which won't work because x.txt gets clobbered before it is read.

> wne reads standard input and copies to the named file, but does
> not create the file if the input is empty.

"wne" must not write to the file until it knows that whatever is sending
it data has finished reading it. How can it know that? When whatever
is sending it data closes the pipe, and it sees end-of-file on its stdin.

This means that "wne" (when the file is not empty) must keep reading and
storing (somewhere) data until it sees end-of-file, THEN it can write
the data. If it won't fit in memory, then it has to use a temporary file.

Ok, count up the I/O.
Original "copy and copy-back": 2 reads, 2 writes of whole file.
Alternative "copy and rename back": 1 read, 1 write of whole file.
Original "modify in place": 1 read, 1 write of whole file.
Method with external process: 2 reads, 2 writes of whole file if it
fits in memory, otherwise 3 reads, 3 writes.

I think we just managed to replace something horribly slow with
something even worse.

"wne" also has a vulnerability to data loss: if "arbitrary_filter"
crashes, then partial data written down the pipe gets put into the
output file, with no chance for "wne" to detect a problem unless it
knows what the data is supposed to look like.

I'll also challenge the function description "write if not empty".
If the objective is to run a file through an arbitrary filter, then
put the output back into the file, nobody said that putting an empty
file *in* gets you an empty file *out*. (Example: a program like
the UNIX "pr" which paginates files, with header lines, page numbers
and perhaps time stamps, will probably output something even for
empty files. Or perhaps a cross-reference program that generates an
index of where every identifier is used, which still generates the
index page with an empty file.)

> This could be used in any situation where a file is being
> filtered in place such that characters written
> to an earlier position are all derived from the same or
> later positions.

The UNIX "sort" utility has a solution for this (and sorting something
with the output back in the original file is fairly common):

sort x.txt > x.txt
will clobber x.txt, but
sort -o x.txt x.txt
will not. "sort" is likely to use temporary files anyway (perhaps
hundreds of them depending on the size of the input), so it can
ensure it has read all of the input before opening the output.

Eric Sosman
Nov 16, 2011, 9:32:27 PM
On 11/16/2011 9:26 AM, BartC wrote:
> "pozz" <pozz...@gmail.com> wrote in message
> news:f06508a1-a424-4bc7...@w7g2000yqc.googlegroups.com...
>> On 16 Nov, 03:48, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>>>[...]
>>> Fifty K shouldn't take long. Even on a system from forty years
>>> ago it didn't take long. Even on paper tape, for goodness' sake, it
>>> took less than a minute!
>
> (That's a fast paper tape reader. The last one I used would have taken
> nearly 3 hours.)

http://www.springerlink.com/content/x42n45gk4811lpq1/ from 1963
describes a "high speed paper tape reader with a maximum speed of
1000 characters per second." 50KB / (1000 B/s) = 50s < one minute.

However, I realize I've erred, and in two ways. First, I've
neglected the time to punch the paper tape, an operation a good
deal slower than reading it. Second, I've overlooked an O(1)
solution: Take the original file, on paper tape, and apply a pair
of scissors to remove the first N lines -- no copying involved,
although there might be some difficulty finding enough "leader"
to feed into the read mechanism next time the data is wanted ...

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ian Collins
Nov 16, 2011, 9:43:49 PM
On 11/17/11 03:32 PM, Eric Sosman wrote:
> On 11/16/2011 9:26 AM, BartC wrote:
>> "pozz"<pozz...@gmail.com> wrote in message
>> news:f06508a1-a424-4bc7...@w7g2000yqc.googlegroups.com...
>>> On 16 Nov, 03:48, Eric Sosman<esos...@ieee-dot-org.invalid> wrote:
>>>> [...]
>>>> Fifty K shouldn't take long. Even on a system from forty years
>>>> ago it didn't take long. Even on paper tape, for goodness' sake, it
>>>> took less than a minute!
>>
>> (That's a fast paper tape reader. The last one I used would have taken
>> nearly 3 hours.)
>
> http://www.springerlink.com/content/x42n45gk4811lpq1/ from 1963
> describes a "high speed paper tape reader with a maximum speed of
> 1000 characters per second." 50KB / (1000 B/s) = 50s< one minute.

That modern? The machine that preceded Colossus at Bletchley Park in
the early 40s could read paper tape at 1000 characters per second!

--
Ian Collins

Robert Wessel
Nov 16, 2011, 11:46:05 PM
That's a non-issue. We repaired paper tape semi-regularly - there was
special tape (with all the holes punched), and an alignment tool for
aligning two pieces of paper tape, cutting them, and applying the
splicing tape. In a repair, you'd repunch a section that fully
overlapped the damaged area, and then splice that on to the sections
to either side.

If you've ever seen a (motion picture) film splicing rig, it's
basically the same idea.

The high speed reader/punches were, however, a lot less common than
slow ones. Perhaps the single most popular reader/punch was the
optional one on ASR-33s. 10cps either reading or writing (although it
could do both at the same time). The fast readers were capable of
eating a few yards of tape before being halted by a jam. More
commonly, the sprocket/drive holes would get torn out (which you could
repair with the special tape, but without actually splicing in new
tape).

BartC
Nov 17, 2011, 5:35:24 AM


"Ian Collins" <ian-...@hotmail.com> wrote in message
news:9ijan5...@mid.individual.net...
I've seen Colossus itself in action (the replica, not the original!) and
apparently it could read paper tape at 5000 characters per second.

I think that might have been the program itself, on an endless loop.

But the one I was thinking of belonged to a teletype, which, even if the
read data was not printed but read into the host, was limited to 10cps.
However if it could read and write at the same time, then it would only have
taken 80 minutes or so for 50KB.

--
Bartc

tom st denis
Nov 17, 2011, 1:43:47 PM
While rude, the guy has a point. You can accomplish this goal easily
with something like:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int skip;
    char buf[2048];

    skip = 0;
    if (argc == 2) {
        skip = atoi(argv[1]);
    }
    while (skip-- && fgets(buf, sizeof buf, stdin) != NULL);
    while (fgets(buf, sizeof buf, stdin) != NULL) {
        fputs(buf, stdout);
    }
    return 0;
}

This doesn't require buffering the entire file, it's way more
portable, it's about the same length as your original "solution," etc.

IOW you're just posting to advertise your wares... in a C group where
your offerings are not C....

Stop spamming USENET and maybe people will treat you better.

Tom

Jens Thoms Toerring
Nov 18, 2011, 6:12:58 PM
I don't understand your complaints. As far as I can see Mr. Navia's
container library is written in standard compliant C - I just tried
to compile his program after minor modifications (added an include
for <stdio.h> and replaced the angle brackets around "containers.h"
by double quotes since the header file isn't in the system include
directory on my system) with

gcc -std=c99 -pedantic -Wextra -Wall -Wwrite-strings

and there were no complaints at all. For C89 the main complaints
were the missing support for the 'long long' type used in the
library, a few minor niggles about two '//' comments in the
'containers.h' header file and about mixing of declarations and
code, and finally the missing return statement at the end - nothing
of any seriousness. (And if you try to compile the library itself
in strict C89 mode, all the compiler's complaints seem to be on the
same level of seriousness, i.e. very low and not difficult to
address.)

Further, the library is under a very permissive license, BSD,
so there are no strings attached.

So all I can see is that Mr. Navia proposed to use a library,
written by him and made available to the public, as a possible
solution for the problem the OP has. He didn't even mention
were his container library can be found - if the OP or others
are interested it's at

http://code.google.com/p/ccl/

If that is spamming then a lot of other posters in this group
are guilty of the same when they dare to mention that they have
written some functions or library others might use freely for a
problem they want to solve. Just because Mr. Navia is also the
author of a compiler you can pay him money for IMHO shouldn't
bar him from mentioning code he wrote and made available for
free and taking part in trying to help others, or should it?

The only valid point I can see in your post is the question if
it's an optimal solution having to read in all of the file into
memory. But then this was already explicitly pointed out by Mr.
Navia himself as a requirement (and thus a possible shortcoming)
of using his solution - and for a lot of cases I don't think it's
a complete showstopper.

On the other hand your solution doesn't address the second
requirement of the OP, i.e. that the input file itself is to be
changed on exit of the program - your program reads from stdin
and writes to stdout, so it's a simple filter. In contrast, Mr.
Navia's program does also handle this other requirement of the
OP in changing the input file itself, with a very similar number
of lines of code the user has to type...

Finally, I've got to say that I find Mr. Navia's solution rather
easy to comprehend without even having read the documentation for
his library (which I will do now;-). It seems to be a rather nice
example of how using it could make writing as well as understanding
certain types of programs in C quite a bit easier.

Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Dr Nick
Dec 16, 2011, 2:05:12 AM
"BartC" <b...@freeuk.com> writes:

> "Ian Collins" <ian-...@hotmail.com> wrote in message
> news:9ijan5...@mid.individual.net...
>> On 11/17/11 03:32 PM, Eric Sosman wrote:
>>> On 11/16/2011 9:26 AM, BartC wrote:
>>>> "pozz"<pozz...@gmail.com> wrote in message
>>>> news:f06508a1-a424-4bc7...@w7g2000yqc.googlegroups.com...
>>>>> On 16 Nov, 03:48, Eric Sosman<esos...@ieee-dot-org.invalid> wrote:
>>>>>> [...]
>>>>>> Fifty K shouldn't take long. Even on a system from forty years
>>>>>> ago it didn't take long. Even on paper tape, for goodness' sake, it
>>>>>> took less than a minute!
>>>>
>>>> (That's a fast paper tape reader. The last one I used would have taken
>>>> nearly 3 hours.)
>>>
>>> http://www.springerlink.com/content/x42n45gk4811lpq1/ from 1963
>>> describes a "high speed paper tape reader with a maximum speed of
>>> 1000 characters per second." 50KB / (1000 B/s) = 50s< one minute.
>>
>> That modern? The machine that preceded Colossus at Bletchley Park
>> in the early 40s could read paper tape at 1000 characters per
>> second!
>
> I've seen Colossus itself in action (the replica, not the original!)
> and apparently it could read paper tape at 5000 characters per second.
>
> I think that might have been the program itself, on an endless loop.

That loop is the message that is being attacked. The machine runs the
message through, does some operations on it and carries out a set of
counts on the results. If the answer is "interesting" it outputs the
fact, otherwise it advances its internal state and repeats. The
internal state is on uniselctors - electromechanically - and so you get
a "clunk" once per tape loop as it advances. The counters are valve for
speed.
--
Online waterways route planner | http://canalplan.eu
Plan trips, see photos, check facilities | http://canalplan.org.uk