Re: change one line in a large file

sup...@openmbox.net

Nov 20, 2022, 10:00:06 AM

to Kang-min Liu, begi...@perl.org

May I ask a question about reading a file?

while(0){ print 'hi' }

This will never print hi.

cat 1.txt:
0

open FH, '1.txt' or die;
while(<FH>) { print 'hi' }

This will print hi.

Since $_ == 0 here, why does the while condition become true?

I am confused about this.

Regards

November 20, 2022, 8:57 PM, "Kang-min Liu" <gu...@gugod.org> wrote:

>
> linu...@gmx.net writes:
>
> >
> > I have a large file which has millions of lines.
> > They are text only lines.
> > If I have to change one line in the file, what's the efficient way?
> > I don't want to slurp the whole file into memory, change that line and
> > write the full content back to disk again.
> >
>
> Since the editing seems to be line-based, I'd recommend checking out
> 'perl -i' for in-place editing, if you are OK with regexp-based
> search & replace -- it's basically the same as doing it in vim:
>
> perl -p -i.orig -e "s/hello/wow/" input.txt
>
> On every line of input.txt, this replaces the first occurrence of
> "hello" with "wow", writes the result back to input.txt, and keeps a
> copy of the original file at input.txt.orig. Lines that become longer or
> shorter are no problem, because the whole file is rewritten.
>
> However, that may match on multiple lines. If you know the line number in
> advance, you can check the line number variable $. instead:
>
> perl -p -i -e "s/^.+$/wow/ if $. == 2" input.txt
>
> Note that this still scans the entire input.txt line by line. Adding
> `last` or `exit` to the `-e` code would make the program finish early,
> but it would also drop every remaining line from input.txt -- which is
> probably not what we want. (`next` does not help either: under -p the
> current line is still printed and the loop simply moves on.)
>
> ----
>
> Alternatively, if you'd rather do this in a script than with a perl
> one-liner, read on...
>
> How efficient it could be depends a little bit on how the target line is
> identified and the kind of editing that's required.
>
> You can certainly avoid slurping by making line-based changes,
> like:
>
> while (defined(my $line = <$fh>)) {
> ...
> }
>
> If the edit replaces $line with something of exactly the same length
> (in bytes), it can be pretty efficient: position the handle at the start
> of the line, print the new text over it, and the file is modified in place.
>
> If, say, we want to edit just the 42nd line in the file, here's how I
> would do it:
>
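> # Open in read-write mode.
> open my $fh, "+<", "input.txt" or die "Cannot open input.txt: $!";
>
> # Read the first 41 lines, so the position ends up at the beginning
> # of the 42nd line.
> my $lineno = 1;
> while (defined(my $line = <$fh>)) {
>     $lineno += 1;
>     last if $lineno == 42;
> }
>
> # Some systems require a seek when switching from reading to writing
> # (see perldoc -f seek), so re-seek to the current position.
> seek($fh, tell($fh), 0) or die "seek failed: $!";
>
> # Print the new content at the beginning of the 42nd line.
> # ($newcontent is assumed to hold the replacement text.)
> print $fh $newcontent;
> close $fh or die "close failed: $!";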
>
> This is the most efficient scenario because the program can end here
> without reading the remainder of input.txt.
>
> However, if $newcontent is longer than the 42nd line, the program would
> still finish, but when we inspect the file we'll see that the text in
> $newcontent bleeds over into the 43rd line and maybe further lines.
>
> Similarly, if $newcontent is shorter, the original content of the
> 42nd line will only be partially replaced.
>
> Most likely that's not the kind of editing we want to be doing.
>
> In other words, if $newcontent is longer or shorter, the remainder of the
> file has to be shifted a few bytes forward or backward: every following
> line must be re-printed to $fh (and the file truncated if it shrank),
> which takes a lot of bookkeeping code to get every corner case right.
>
> If the editing we want is rather generic, I'd say we probably want to
> write the output to a different file instead of editing in place.
>
> We will still end up reading the entire file, but only one line at a
> time is kept in memory.
>
> --
> Cheers,
> Kang-min Liu
>
>
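The in-place approach discussed above can be sketched roughly as follows. This is a minimal sketch, not a tested library routine: the function name replace_line_in_place is hypothetical, and it assumes "\n" line endings and byte strings. When the new line has the same byte length as the old one, only that line is overwritten; otherwise everything after the target line is read into memory, written back after the new line, and the file is truncated if it shrank.

```perl
use strict;
use warnings;

# Hypothetical helper: replace line number $target (1-based) of $path with
# $new_line, editing the file in place. Assumes "\n" line endings.
#  - Same byte length as the old line: overwrite it, touch nothing else.
#  - Different length: read everything after the line into memory,
#    write the new line plus that tail, then truncate leftover bytes.
sub replace_line_in_place {
    my ($path, $target, $new_line) = @_;
    open my $fh, '+<:raw', $path or die "Cannot open $path: $!";

    my ($start, $old);
    while (1) {
        my $pos  = tell($fh);            # byte offset where this line begins
        my $line = <$fh>;
        last unless defined $line;
        if ($. == $target) { ($start, $old) = ($pos, $line); last }
    }
    die "$path has fewer than $target lines\n" unless defined $start;

    chomp $old;
    if (length($old) == length($new_line)) {
        # Fast path: nothing after this line has to move.
        seek($fh, $start, 0) or die "seek failed: $!";
        print {$fh} $new_line or die "write failed: $!";
        close $fh or die "close failed: $!";
        return;
    }

    my $rest = do { local $/; <$fh> };   # everything after the target line
    $rest = '' unless defined $rest;
    seek($fh, $start, 0) or die "seek failed: $!";
    print {$fh} $new_line, "\n", $rest or die "write failed: $!";
    close $fh or die "close failed: $!";

    # Cut off leftover bytes when the file got shorter (no-op otherwise).
    my $end = $start + length($new_line) + 1 + length($rest);
    truncate($path, $end) or die "truncate failed: $!";
}
```

The tail-rewriting branch holds everything after the target line in memory, so editing a line near the top of a huge file costs roughly as much as slurping it; the fast path is the only truly cheap case.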

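For the "write the output to a different file" approach, a rough sketch might look like this. The function name replace_line_via_copy is hypothetical; it streams the file line by line into a File::Temp file in the same directory and then renames that file over the original, so only one line is in memory at a time.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: copy $path line by line into a temporary file in the
# same directory, replacing line number $target (1-based) with $new_line,
# then rename the temporary file over the original.
sub replace_line_via_copy {
    my ($path, $target, $new_line) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    my ($out, $tmp) = tempfile(DIR => dirname($path));

    # while (my $line = <$in>) also gets the implicit defined() test,
    # so a line containing just "0" does not end the loop early.
    while (my $line = <$in>) {
        $line = "$new_line\n" if $. == $target;
        print {$out} $line or die "write failed: $!";
    }
    close $in;
    close $out or die "close failed: $!";

    # A rename within one directory replaces the original atomically on POSIX.
    rename($tmp, $path) or die "rename failed: $!";
}
```

One caveat of this sketch: File::Temp creates the file with restrictive permissions (typically 0600), so the replaced file will not keep the original's mode unless you restore it with chmod.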
Ruprecht Helms (privat)

Nov 20, 2022, 10:15:05 AM

to begi...@perl.org

Hi,

readline will be your friend.

Just have a look at the example at this link:

https://www.tutorialspoint.com/perl/perl_readline.htm

Regards,
Ruprecht

On 20.11.22 at 15:41, sup...@openmbox.net wrote:

Kang-min Liu

Nov 20, 2022, 11:00:05 AM

to sup...@openmbox.net, begi...@perl.org

sup...@openmbox.net writes:
> May I ask a question about reading a file?
>
> while(0){ print 'hi' }
>
> This will never print hi.
>
> cat 1.txt:
> 0
>
> open FH, '1.txt' or die;
> while(<FH>) { print 'hi' }
>
> This will print hi.
>
> Since $_ == 0 here, why does the while condition become true?

This seems to be more about which values are true-y and which are false-y.

If the content of 1.txt is one line containing the character "0" followed
by a newline character, then <FH> makes $_ contain "0\n", which is two
characters, not one.

"0\n" is a true-y value (the only false strings are "" and "0"), even
though it is numerically the same as 0 when tested with the `==`
operator. The `==` operator always converts both of its operands to
numbers before doing the comparison. Similarly, the `eq` operator always
converts both operands to strings before doing the comparison.

There are a lot of ways for a value to be numerically 0 while still
being true-y; most of them are strings with leading zeros. You could
try editing the following program to play around:

use strict;
use warnings;

$_ = "0\n";

print "true-y\n" if $_;

print "== 0\n" if $_ == 0;
print "eq 0\n" if $_ eq 0;

print "eq \"0\"\n" if $_ eq "0";
print "== \"0\"\n" if $_ == "0";

There are also a lot of values with a leading "0" (for example "01")
which are true-y but fail all four of the other `if`s. :-)

BTW, really, don't actually write `eq 0` or `== "0"`, they just look weird.
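To see several strings that are numerically 0 yet true-y side by side, here is a small sketch; the list of sample values is my own choice and far from exhaustive. In Perl only undef, the empty string, the string "0" and numeric zero are false, so each of these strings is true even though `== 0` holds for all of them.

```perl
use strict;
use warnings;

# Sample strings (chosen for illustration) that compare equal to 0 with ==
# but are still true in boolean context.
my @zero_but_true = ("0\n", "00", "0.0", "0E0", "0 but true");

for my $v (@zero_but_true) {
    my $is_true = $v ? 1 : 0;
    my $is_zero = ($v == 0) ? 1 : 0;
    (my $shown = $v) =~ s/\n/\\n/;       # make the newline visible
    printf "%-14s true=%d  ==0=%d\n", qq{"$shown"}, $is_true, $is_zero;
}
```

"0 but true" is special-cased by Perl so that it numifies to 0 without a warning, and "0E0" is a common idiom for the same purpose.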

--
Cheers,
Kang-min Liu

sup...@openmbox.net

Nov 20, 2022, 7:00:05 PM

to Kang-min Liu, begi...@perl.org

Please see these steps:

$ echo -n 0 > 1.txt

1.txt has only one line, with no trailing newline.

But in the script below, the while condition is still true for the line "0".

$ cat test.pl
use strict;

open HD,"1.txt" or die $!;
while(<HD>){
print "hi";
}

This will print hi.

Can you help further?

Kang-min Liu

Nov 20, 2022, 8:00:06 PM

to sup...@openmbox.net, begi...@perl.org

sup...@openmbox.net writes:

> Please see these steps:
>
> $ echo -n 0 > 1.txt
>
> 1.txt has only one line, with no trailing newline.
>
> But in the script below, the while condition is still true for the line "0".
>
> $ cat test.pl
> use strict;
>
> open HD,"1.txt" or die $!;
> while(<HD>){
> print "hi";
> }
>
>
> This will print hi.
>
> Can you help further?
>

I see -- so it's really about how the readline operator (<FH>) works, or
rather how it works within a while loop. I'm not sure I could explain it
better than by quoting some documentation.

The readline operator just returns the next line; by itself it does not
set $_. But when <FH> is the whole condition of a `while` loop, the perl
compiler does something extra: it assigns the line to $_ and wraps the
test in a `defined` operator.

If you follow the link in the documentation of readline [1] to "perlop: I/O
Operators" [2], you'll find this statement:

Thus the following lines are equivalent:

while (defined($_ = <STDIN>)) { print; }
while ($_ = <STDIN>) { print; }
while (<STDIN>) { print; }
for (;<STDIN>;) { print; }
print while defined($_ = <STDIN>);
print while ($_ = <STDIN>);
print while <STDIN>;

[1]: https://perldoc.perl.org/functions/readline
[2]: https://perldoc.perl.org/perlop#I%2FO-Operators

And the paragraph right before them contains some text describing the
same thing.

This effect can be verified by compiling test.pl with -MO=Deparse:

# perl -MO=Deparse test.pl
use strict;
die $! unless open HD, '1.txt';
while (defined($_ = readline HD)) {
print 'hi';
}
test.pl syntax OK

You could see that <HD> is compiled to be "readline", and an extra
"defined" operator appears.

(See `perldoc B::Deparse` for more about -MO=Deparse)

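You could play with it further and see that no extra "defined" operator
is added to `if (<FH>)`, `unless (<FH>)`, `until (<FH>)`, or when the
while-condition is a bit more complicated (for example: `while(! <FH>)`).
The special case only applies when <FH>, or an assignment from it such as
`while (my $line = <FH>)`, is the entire `while` condition; that includes
the `for (;<FH>;)` and `print while <FH>` forms quoted above.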

I guess this special case exists to make it convenient to iterate through
the entire file line by line. Otherwise, whenever a line happens to be
false-y (like a final "0" with no trailing newline), the loop would end
early, and that would most likely come as a surprise.
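The difference can be demonstrated without creating 1.txt, assuming an in-memory filehandle (open on a scalar reference) is an acceptable stand-in for the file. This sketch compares `while (<$fh>)` with a hand-written loop that tests the line's truth value instead of its definedness.

```perl
use strict;
use warnings;

# Stand-in for 1.txt: a single "0" with no trailing newline.
my $data = "0";

# 1) while (<$fh>): perl adds defined($_ = readline $fh), so the body
#    runs once even though the line "0" is false.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $with_defined = 0;
while (<$fh>) { $with_defined++ }
close $fh;

# 2) A hand-written loop that tests truth instead of definedness stops
#    before processing the "0" line.
open $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $truth_only = 0;
while (1) {
    my $line = readline($fh);
    last unless $line;                   # "0" is false, so the loop ends here
    $truth_only++;
}
close $fh;

print "while (<\$fh>) iterations:   $with_defined\n";
print "truth-test loop iterations: $truth_only\n";
```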

--
Cheers,
Kang-min Liu

sup...@openmbox.net

Nov 21, 2022, 8:00:06 AM

to Kang-min Liu, begi...@perl.org

Thanks Kang-min, that makes things clear.
BTW, I have uploaded to MetaCPAN the module I used to build my site openmbox.net. In that module I needed to read from and write to a large file efficiently.

https://metacpan.org/pod/App::OpenMbox

If you find any bugs in the module, please let me know.

Regards
Henry