I was wondering of some kind soul could tell me what I am doing wrong in the code below (one module, one test file, minimum case)
I can see why calling print $fh in "sub PRINT" is recursing - but I cannot find out how to stop it!! Even copying the filehandle (a suggestion via Google) seems to not work. I've played with "tied" and do not seem to be able to find the magic to obtain the underlying filehandle so that calling print() does not immediately redirect to sub PRINT() and thus recurse.
#### SafeFile.pm #####
package SafeFile;
use warnings;
sub TIEHANDLE
{
my ($self, $fh, @options) = @_;
my $data = {
fh => $fh,
@options,
};
return bless $data, $self;
}
sub FETCH {
my ($self) = @_;
return $self->{fh};
}
sub PRINT {
my $self = shift;
my $fh = *{$self->{fh}};
print $fh @_;
}
sub CLOSE
{
my ($self) = @_;
close $self->{fh};
}
sub safewrite {
my $path = shift;
open my $fh, '>', ;
tie *$fh, __PACKAGE__, *$fh,
(
path => $path,
);
return $fh;
}
1;
#### END SafeFile.pm #####
#### test #####
#!/usr/bin/perl
use warnings;
use SafeFile;
my $fh = SafeFile->safewrite('/tmp/wibble');
print $fh "Hello\n";
print $fh "World\n";
close $fh;
#### END test #####
% perl test
#### Result #####
Deep recursion on subroutine "SafeFile::PRINT" at SafeFile.pm line 23.
/bin/bash: line 1: 21037 Segmentation fault perl test
shell returned 139
Press ENTER or type command to continue
#### END Result #####
The purpose of this is to redirect a request to open ...., ">somefile"; to open ..., ">somefile.tmp";
The filehandle $fh should be usuable normally with print etc.
On close($fh), somefile.tmp is closed, and rename()'d (an atomic operation on Linux) to "somefile" - thus the target file is never in a half written state at any point.
Tim Watts <tw+use...@dionic.net> writes:
> I was wondering of some kind soul could tell me what I am doing wrong in the > code below (one module, one test file, minimum case)
> I can see why calling print $fh in "sub PRINT" is recursing - but I cannot > find out how to stop it!! Even copying the filehandle (a suggestion via > Google) seems to not work.
It works fine:
sub TIEHANDLE
{
my ($self, $fh, @options) = @_;
my $outfh;
you just really need to make a 'deep' copy of the file handle.
I think you also shouldn't store a reference to the tied thing in the
'tie object' itself. That will likely result in each of them referring
to the other. I didn't test this, though.
> I can see why calling print $fh in "sub PRINT" is recursing - but I cannot > find out how to stop it!! Even copying the filehandle (a suggestion via > Google) seems to not work.
> sub safewrite {
> my $path = shift;
> open my $fh, '>', ;
> tie *$fh, __PACKAGE__, *$fh,
> (
> path => $path,
> );
Don't do that. Use a new filehandle:
use Symbol qw/gensym/;
my $tied = gensym;
tie *$tied, __PACKAGE__, *$fh, ...;
return $tied;
Also, you don't need all those * derefs all over the place, and the code
would be safer without them. Stuffing a bare glob into a scalar variable
is a bit of a hack, and you're better off sticking to globrefs
throughout. I think the only place you need to deref is the first
argument of 'tie'.
>> I can see why calling print $fh in "sub PRINT" is recursing - but I
>> cannot find out how to stop it!! Even copying the filehandle (a
>> suggestion via Google) seems to not work.
>> sub safewrite {
>> my $path = shift;
>> open my $fh, '>', ;
>> tie *$fh, __PACKAGE__, *$fh,
>> (
>> path => $path,
>> );
> Don't do that. Use a new filehandle:
> use Symbol qw/gensym/;
> my $tied = gensym;
> tie *$tied, __PACKAGE__, *$fh, ...;
> return $tied;
> Also, you don't need all those * derefs all over the place, and the code
> would be safer without them. Stuffing a bare glob into a scalar variable
> is a bit of a hack, and you're better off sticking to globrefs
> throughout. I think the only place you need to deref is the first
> argument of 'tie'.
> Ben
Awesome! Thanks Ben. I did come across one google result with gensym - but for some reason it failed when I adapted it (the example was long and structurally different) - guess I cocked it up.
I have implemented your method and it does the job just right. And I can see the logic.
Initially, it seemed "wrong" to not tie to the actual filehandle, and I assumed there must be a way to request the filehandle without triggering the magic - but I guess not.
Many many thanks - I've been on this for a day and a half...
I also tried yours and, again, big thanks - that worked too, for me...
Saved me much hair pulling :)
I spent some time trying to decouple the tie with "tied" but as you say, pulling the internal reference of the tie'd filehandle just brings the tie magic back into play.
> Ben Morrow wrote:
> > Quoth Tim Watts <tw+use...@dionic.net>:
> >> I can see why calling print $fh in "sub PRINT" is recursing - but I
> >> cannot find out how to stop it!! Even copying the filehandle (a
> >> suggestion via Google) seems to not work.
> >> sub safewrite {
> >> my $path = shift;
> >> open my $fh, '>', ;
> >> tie *$fh, __PACKAGE__, *$fh,
> >> (
> >> path => $path,
> >> );
> > Don't do that. Use a new filehandle:
> > use Symbol qw/gensym/;
> > my $tied = gensym;
> > tie *$tied, __PACKAGE__, *$fh, ...;
> > return $tied;
> Awesome! Thanks Ben. I did come across one google result with gensym - but > for some reason it failed when I adapted it (the example was long and > structurally different) - guess I cocked it up.
All gensym does is give you a new filehandle. The only reason you need
it is because, unlike open, tie won't create one for you if you pass it
an empty scalar.
> Initially, it seemed "wrong" to not tie to the actual filehandle, and I > assumed there must be a way to request the filehandle without triggering the > magic - but I guess not.
Conceptually the tied filehandle points to a different 'file' from the
non-tied one: a tied filehandle doesn't need a real file behind it at
all, and as far as Perl is concerned the object it's tied to is 'the
file'. So you have one filehandle pointing to a real file, and another
pointing to an object, and the fact that object copies data into the
file is just a coincidence.
In principle you *could* use just one filehandle, by untying it and
then retying afterwards, but it would be pretty awkward. For one thing,
you'd need to set up your TIEHANDLE method so that if you passed it an
already-constructed object it used that rather than constructing a new
one. For another, as Rainer pointed out, you'd've just created a
reference loop, and you'd need to explicitly break it before the object
would be destroyed.
> The purpose of this is to redirect a request to open ...., ">somefile"; to > open ..., ">somefile.tmp";
> The filehandle $fh should be usuable normally with print etc.
> On close($fh), somefile.tmp is closed, and rename()'d (an atomic operation > on Linux) to "somefile" - thus the target file is never in a half written > state at any point.
An entirely different approach to accomplish this:
---------------
package SafeFile;
sub new
{
my ($class, $name) = @_;
my $fh;
open($fh, '>', $name.'.tmp')
or die("open: $name.tmp: $!");
This doesn't work with explicit close calls, a user is expected to let
the filehandle go out-of-scope once he is done creating the contents.
It also relies on the feature that the SCALAR slot of a glob
autovivifies like any other 'real reference', something which is
documented as
> ${*$fh{SCALAR}} = $name;
> It also relies on the feature that the SCALAR slot of a glob
> autovivifies like any other 'real reference', something which is
> documented as
> This might change in a future release.
> (perlref)
That's extremely unlikely. That sentence was (I believe) intended to
prepare people for the possibility of perl's glob creation mechanism
changing so that a glob did not necessarily have a SCALAR component.
In 5.8 and earlier, when a glob was created the SCALAR slot was filled
at the same time:
~% perl5.8.9 -MDevel::Peek -e'Dump *foo'
SV = PVGV(0x284b0068) at 0x28403d08
[...]
NAME = "foo"
NAMELEN = 3
GvSTASH = 0x2840309c "main"
GP = 0x284b7760
SV = 0x28403cb4
[...]
AV = 0x0
HV = 0x0
CV = 0x0
[...]
Notice that although AV, HV and CV are null, SV has already been filled.
This is reflected in the corresponding *foo{THING} operations:
~% perl5.8.9 -le'print for *foo{ARRAY}, *foo{SCALAR}'
SCALAR(0x28403cc0)
~%
For 5.10 this was changed, since most globs in modern Perl programs only
use the CODE slot (since most variables are now lexicals), so creating a
scalar for the SCALAR slot was a waste of memory:
~% perl5.10.0 -MDevel::Peek -e'Dump *foo'
SV = PVGV(0x800ecc630) at 0x800e8fed0
[...]
NAME = "foo"
NAMELEN = 3
GvSTASH = 0x800e0a108 "main"
GP = 0x800ebb4c0
SV = 0x0
[...]
AV = 0x0
HV = 0x0
CV = 0x0
[...]
However, in order not to break code like the example above, it was made
impossible to see this from Perl. As soon as you try to look at the
SCALAR slot, it autovivs and pretends there was a scalar there all
along, so the Perl test gives the same result as before:
~% perl5.10.0 -le'print for *foo{SCALAR}, *foo{ARRAY}'
SCALAR(0x800e0a2a0)
~%
Since the change has been made already, and in a way which preserves
backcompat, I think it extremely unlikely that this behaviour will ever
go away.
If you're worried, though, all you need to do is create a fresh scalar
and stuff it into the glob, just as you would if you wanted to use the
ARRAY or HASH slot. There isn't an explicit anonymous scalar constructor
(though I sometimes wish there was) but a lexical that's about to go out
of scope will do just fine:
*$fh = do { \my $tmp };
${*$fh{SCALAR}} = $name;
In a short method, of course, you don't need to bother with the do
block.
Ben Morrow wrote:
> Conceptually the tied filehandle points to a different 'file' from the
> non-tied one: a tied filehandle doesn't need a real file behind it at
> all, and as far as Perl is concerned the object it's tied to is 'the
> file'. So you have one filehandle pointing to a real file, and another
> pointing to an object, and the fact that object copies data into the
> file is just a coincidence.
> In principle you *could* use just one filehandle, by untying it and
> then retying afterwards, but it would be pretty awkward. For one thing,
> you'd need to set up your TIEHANDLE method so that if you passed it an
> already-constructed object it used that rather than constructing a new
> one. For another, as Rainer pointed out, you'd've just created a
> reference loop, and you'd need to explicitly break it before the object
> would be destroyed.
Yes - it's all clear now.
I will re-read the perltie man page, but I have a strong feeling that manpage is lacking in pointing out this - it was natural to assume one would tie the actual handle and that's where it all went wrong.
Perhaps as a "newbie" to this feature I should contribute a documentation patch :)
I did consider something like this - but more like a fully OO style where the print/write methods would be implemented and the filehandle would not be obtainable.
In the end, given tie works, I like the tie approach as it is less accident prone.
Background:
When I worked at Imperial College, we built a perl framework for driving sysadmin scripts - and one of the modules implemented "SafeFile" - so that a crash could never leave a half written system file (eg /etc/passwd) - also the module added some functionality such as lockouts (eg if /etc/passwd-
special existed, then /etc/passwd would not be overwritten, it could force file uid/gig/modes and various other tricks. It was declared to be the *only* way to write a file from the system scripts that ran from cron.
There were loads of other modules for logging, machine-class handling, merging config files and so on.
Sadly, we never opensourced it (though the nod was given by the Head of Dept).
Now I work at Kings College London, I have need of the same sort of setup - so I am reimplementing the functionality, but changing the design a little to add some new ideas.
The original "SafeFile" returned a ($fh, $obj) pair - use $fh and close via $obj. I never liked that in hindsight but we were lazy and it worked - so I set out to "fix" it this time around.
My code is open source from the get-go - though it's going to be a while before any of it is ready. Imperial had 4-5 people working 3 months over summer to do theirs, in work time. I have 1 person (me), a few % work time and my own time. The code name (for when I eventually upload to GoogleCode or BitBucket) is Sys::SysUpdate
Tim Watts <tw+use...@dionic.net> writes:
> Ben Morrow wrote:
>> Conceptually the tied filehandle points to a different 'file' from the
>> non-tied one: a tied filehandle doesn't need a real file behind it at
>> all, and as far as Perl is concerned the object it's tied to is 'the
>> file'.
[...]
> I will re-read the perltie man page, but I have a strong feeling that > manpage is lacking in pointing out this - it was natural to assume one would > tie the actual handle and that's where it all went wrong.
This actually seems rather 'unnatural' to me: The idea behind the
tieing mechanism is that some kind of 'familiar' Perl construct (like a
hash or a filehandle) can be used to interface with some conceptually
similar 'other thing', the classic example being a hashed flat-file
database, based on a set of abstract operations defining 'a hash' (or
'a filehandle') not in terms of what it is but in terms of how it
behaves in reply to certain messages. This means that tieing a
filehandle which actually refers to some file implies loss of the
ability to use this filehandle to manipulate the file in the 'usual'
way using the built in filehandle-based file manipulation operations.
This may be ok if this filehandle is henceforth supposed to act as
mock filehandle interface object to $something_completely_different,
although I would rather avoid that.
> I did consider something like this - but more like a fully OO style where > the print/write methods would be implemented
Read: You didn't "consider something like this". But you can't help
the temptation to badmouth it a little using whatever empty phrases
happen to come to you head such as 'not fully OO style' (aka doesn't
reimplement half of the universe uselessly just to control
finalization)
[...]
> I like the tie approach as it is less accident prone.
or "it is accident prone.
> Background:
> When I worked at Imperial College,
[...]
> Now I work at Kings College London,
"I'm a varsity guy!"
Not that the stench wasn't pungent enough on its own ...
> I did consider something like this - but more like a fully OO style where > the print/write methods would be implemented
Read: You didn't "consider something like this". But you can't help
the temptation to badmouth it a little using whatever empty phrases
happen to come to you head such as 'not fully OO style' (aka doesn't
reimplement half of the universe uselessly just to control
finalization)
[...]
> I like the tie approach as it is less accident prone.
>> I did consider something like this - but more like a fully OO style where
>> the print/write methods would be implemented
> Read: You didn't "consider something like this".
What's got your goat?
I did consider something like this (I think I'd know) - specifically:
my $fobj = SafeFile->new(filename, options...);
$obj->print(Blah);
$obj->abort(); # Backs out of the final rename and deletes the tmp file.
$obj->close();
> But you can't help
> the temptation to badmouth it a little using whatever empty phrases
> happen to come to you head such as 'not fully OO style' (aka doesn't
> reimplement half of the universe uselessly just to control
> finalization)
I am not "badmouthing" it - it is a valid solution. However, for the situation I have, I feel that the tie() solution fits better.
You said: "This doesn't work with explicit close calls"
This is exactly the type of error I am guarding against - this framework is designed to be fairly strict in use and it is not improbable that *someone else* programming against it might make the error of calling "close $fh" because it is normal to them.
>> I like the tie approach as it is less accident prone.
> or "it is accident prone" ...
>> Background:
>> When I worked at Imperial College,
> [...]
>> Now I work at Kings College London,
> "I'm a varsity guy!"
> How come I'm not surprised ...
What's that supposed to mean? I appreciate the good ideas you have suggested - and your explanation did help me to understand the internal mechanics of tie. But there is really no need to throw a strop. My job requires me to work with and configure and understand dozens of major software packages as well as the OS, VMWare, bits of python and god knows what else. Most of those I am aiming to be 70% competant in - sorry I cannot manage 100% on all of them, including perl!!
Tim Watts <tw+use...@dionic.net> writes:
> Rainer Weikusat wrote:
>> Tim Watts <tw+use...@dionic.net> writes:
>>> Rainer Weikusat wrote:
>>>> An entirely different approach to accomplish this:
>>>> ---------------
>>>> package SafeFile;
>>>> sub new
>>>> {
>>>> my ($class, $name) = @_;
>>>> my $fh;
>>>> open($fh, '>', $name.'.tmp')
>>>> or die("open: $name.tmp: $!");