There is a difference of assumptions here that can lead to people talking past
each other. Let me try to explain.

One way of seeing things is to weigh up the advantages and disadvantages of a
behaviour in typical situations. If there are more cases (likely to be
encountered in practice) where it helps than where it hurts, it is worth doing.

The other viewpoint is to define a notion of correctness and ask if there are
any situations where the code is not correct by that definition. Either the
code is correct for 100% of inputs or it has a bug. By that rule, even
if the incorrect behaviour only triggers when the program is run seventeen
subdirectories deep on Tuesday by a user whose name contains no vowels, it is
still, in principle, buggy.

As programmers we mix these two philosophies in our work. Usually, the more
general-purpose and low-level the component, the more it must follow the '100%'
rule rather than the heuristic rule. I have written plenty of code which only
works on filenames that do not contain spaces. Out of caution I'll make it
die if it finds a bad filename, but it still doesn't work 100%. That is fine
for my use case, and such code runs reliably in production, but it would not
be suitable for distribution as a general-purpose library. I think it was
Stroustrup who wrote that library code needs to be held to a higher standard
(of performance, robustness etc) because you just don't know what requirements
the user of the library will have. So where reuse is likely, one takes care
to write code which handles even pathological cases and gives No Surprises(tm).
It gives a nice warm glow to finish writing a function which is (believed to be)
correct for all possible inputs, and does not require any caveats or gotchas
to be mentioned in the documentation.
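
As a minimal sketch of the 'die on a bad filename' approach mentioned above (the helper name checked_filename and the example filenames are made up for illustration, not taken from any real code):

```perl
use strict;
use warnings;

# Hypothetical guard: refuse filenames this code was never written to handle,
# rather than silently misbehaving on them.
sub checked_filename {
    my ($name) = @_;
    die "unsupported filename (contains a space): '$name'\n"
        if $name =~ / /;
    return $name;
}

# Usage: the good name passes through; the bad one dies before any work is done.
my $ok  = checked_filename('report.txt');
my $err = eval { checked_filename('my report.txt'); 1 } ? '' : $@;
```

This is safe rather than correct: the code still does not handle every input, it merely fails loudly on the ones it cannot handle.
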
An example in Perl is 2-arg versus 3-arg open(). It is true that 2-arg open()
will trip up if a filename begins with '>', but how likely is that? Surely in practice
the extra convenience of being able to give pipes on the command line is worth
it? And if you really need to handle those obscure cases, you are surely
expert enough to know how to turn off the magic. Yet the recommendation for
beginners and for new code in general is to use the 3-arg form, which opens the
filename given with 'no ifs and no buts'. This is because as a fundamental
part of the language, to be used in all sorts of places we don't even know of,
it must work 100% rather than having helpful heuristic behaviours. The magic
is still there for those who want it, but it cannot be the recommended default.
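
To make the difference concrete, here is a small sketch. The filename is made up, and it assumes a Unix-like filesystem where '>' is a legal character in a filename:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Work inside a scratch directory so nothing else is touched.
my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir $dir or die "chdir: $!";

my $name = '>notes.txt';    # made-up awkward filename

# 3-arg open: mode and name are separate, so the name is taken literally.
open(my $out, '>', $name) or die "3-arg write: $!";
print {$out} "hello\n";
close $out or die "close: $!";

open(my $in, '<', $name) or die "3-arg read: $!";
my $line = <$in>;
close $in;

# 2-arg open: the leading '>' is read as a mode, so instead of reading
# '>notes.txt' this creates (or truncates) a different file, 'notes.txt'.
open(my $oops, $name) or die "2-arg open: $!";
close $oops;
my $magic_file_created = -e 'notes.txt' ? 1 : 0;

chdir $start or die "chdir back: $!";
```

The 3-arg calls round-trip the data through the file actually named, while the 2-arg call quietly does something else without reporting any error.
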
The strictures.pm library itself embodies the '100% or buggy' philosophy in its
disabling of indirect object syntax. The indirect syntax 'new Foo()' works fine
most of the time and can lead to more readable and expressive code. It goes
wrong only if you define a subroutine which has the same name as a method, and
if that does happen, the programmer can surely figure it out. Yet many think
that it is more important to have a method invocation that always works, without
any caveats or guessing.

That is why suggestions to improve the regular expression that checks directory
names will not satisfy those who don't think the code should ever behave
differently based on directory names (unless the programmer has explicitly
asked for it). As far as these people ('we') are concerned, as long as any
magic filename remains at all, a latent bug remains with it.

There is another point which throws off the discussion, and that is the idea
that an expert user knows how to disable feature X, so it's okay to have X on
by default. If X is some magic behaviour which usually does nothing but
sometimes turns on, then the issue is not that an individual user can fix it
if that happens, but that your innocently written program (whether written by
beginner or expert) has a latent bug. Yes, perhaps anyone foolish enough to
put their executables in 'lib' or create filenames with '>' can also be expected
to deal with the consequences; but the program itself is faulty in some sense.
The default should be that code written without any special precautions is
safe and predictable in all cases.

You asked me to take strictures.pm and Make It Break. From my point of view,
I have done so. I have found an example directory structure where code written
loading
strictures.pm will start behaving differently - which implies that
in general, programs loading the library will have this magic behaviour. From
my point of view, that means a bug, even if the only case that triggered it
were invoking the program as /$#!$!$@!#$/54385904353/myprog. If I found a
similar case in any other program I'd report it as a bug, and indeed lots of
bugs do depend on 'extremely unlikely' filenames and so on.

(There is nothing wrong with magic behaviour *if the programmer is fully aware*.
But not if the programmer just loaded some other library which happens to use
Moo which happens to load strictures.pm which now globally changes the semantics
of the program depending on the filesystem.)

Now, is there a way to keep both sides happy? Although it's generally accepted
that running from a different place in the filesystem should not change the
semantics of a program, that is not true for environment variables. We all
agree that if you set PERL5LIB differently then your Perl code may change, and
this is not considered a bug in the program or in perl itself. So I suggest
making any magic depend on environment variables which are set during testing
but not otherwise. Karen E. suggested also checking $^C to see if running
under 'perl -c'. With the tests she proposed, I think the dependency on the
filesystem can be removed altogether, and everyone will be happy.
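
Roughly what I have in mind, as a sketch only: the variable name MYLIB_EXTRA_CHECKS and the helper want_extra_checks are invented for illustration, and the choice to skip the extras under 'perl -c' is my assumption about how $^C would be used, not a description of the tests actually proposed.

```perl
use strict;
use warnings;

# Hypothetical decision function: enable the extra, test-only behaviour
# based purely on the environment and on $^C, never on $0 or the filesystem.
sub want_extra_checks {
    my ($env, $compile_only) = @_;

    # Assumption: a syntax check ('perl -c') should not turn on the extras.
    return 0 if $compile_only;

    # MYLIB_EXTRA_CHECKS is an invented name; a test harness would set it.
    return $env->{MYLIB_EXTRA_CHECKS} ? 1 : 0;
}

# In real use the inputs would be the process environment and $^C.
my $extra = want_extra_checks(\%ENV, $^C);
```

Because the inputs are passed in explicitly, the same program run from any directory gives the same answer unless the environment differs.
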
--
Ed Avis <e...@waniasset.com>