I'm completely new to p5p and the Perl source. Hope I'm not in
over my head. :-)
I would like to propose an enhancement to how Perl handles a
certain aspect of I/O. Currently, there are many variables that are
tracked on a filehandle-by-filehandle basis, but there are five,
$, $OUTPUT_FIELD_SEPARATOR
$/ $INPUT_RECORD_SEPARATOR
$\ $OUTPUT_RECORD_SEPARATOR
$: $FORMAT_LINE_BREAK_CHARACTERS
$^L $FORMAT_FORMFEED
which are global across all filehandles. My proposal is to allow
programmers to set them on a per-filehandle basis. The way this
would be accomplished is as follows:
A. Filehandles are globs; the hash value of the filehandle glob is
(generally) unused. As recommended in the docs for IO::Handle, this
hash can be used to store per-object data. The following hash
elements would be (optionally) defined for each filehandle object:
Key               Overrides
io_handle_ofs     $,    $OUTPUT_FIELD_SEPARATOR
io_handle_irs     $/    $INPUT_RECORD_SEPARATOR
io_handle_ors     $\    $OUTPUT_RECORD_SEPARATOR
io_handle_flbc    $:    $FORMAT_LINE_BREAK_CHARACTERS
io_handle_ff      $^L   $FORMAT_FORMFEED
If any of these entries does not exist in the filehandle glob's hash,
then that indicates that the corresponding global variable is to be
used. If an entry exists, it is to be used instead of the global
variable. This way, if these values have not been explicitly set by
the programmer, no change in behavior will occur. Backwards
compatibility, yay.
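A minimal sketch of the lookup rule in A, in plain Perl. The helper effective_sep() is hypothetical (nothing like it exists in IO::Handle). It uses the key names proposed above and checks exists rather than defined, so that a private undef $/ (slurp mode) still overrides the global:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of IO::Handle: return the value a handle
# should use for one of the five variables.  A private entry in the
# glob's hash wins; otherwise the current global value is used.
sub effective_sep {
    my ($fh, $key, $global) = @_;
    my $slot = \%{*$fh};    # the (normally unused) hash slot of the glob
    # exists, not defined: a private undef (e.g. slurp-mode $/) must
    # still override the global.
    return exists $slot->{$key} ? $slot->{$key} : $global;
}

# In-memory handle, only to keep the example self-contained.
open(my $fh, '<', \"some data") or die "open: $!";

my $before = effective_sep($fh, 'io_handle_irs', $/);   # global "\n"
${*$fh}{io_handle_irs} = undef;                         # private slurp mode
my $after  = effective_sep($fh, 'io_handle_irs', $/);   # undef
```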
B. IO::Handle currently has class methods to set (and fetch) each of
the above variables, but only globally. These methods will be
modified to work on a per-object basis, as well as retaining their
existing functionality. Thus,
IO::Handle->output_field_separator($ofs);
would set $, globally, while
$io->output_field_separator($ofs);
would set $io's private copy of $, (that is, ${*$io}{io_handle_ofs}).
A special value, IO::Handle::use_global, may be passed to each of
these five methods to tell the object to "forget" its own local
variable and to go back to using the associated global variable.
That is,
$io->output_field_separator($io->use_global);
would do
delete ${*$io}{io_handle_ofs};
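A sketch of the class-versus-object behaviour in B, written as a hypothetical subclass Local::PerFH rather than as a change to IO::Handle itself. The sentinel returned by use_global() is an assumption: any value no caller could pass by accident would do, and a private reference is one simple choice. The in-memory handle is only there to make the example self-contained:

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# Sentinel meaning "go back to the global": a private reference that no
# ordinary separator string can compare equal to.
my $USE_GLOBAL = \my $use_global_marker;
sub use_global { $USE_GLOBAL }

# Called on the class: get/set the global $, (returns the old value).
# Called on an object: get/set the handle's private copy instead.
sub output_field_separator {
    my $self = shift;
    unless (ref $self) {
        my $prev = $,;
        $, = shift if @_;
        return $prev;
    }
    my $slot = \%{*$self};
    my $prev = exists $slot->{io_handle_ofs} ? $slot->{io_handle_ofs} : $,;
    if (@_) {
        my $new = shift;
        if (ref $new && $new == $USE_GLOBAL) {
            delete $slot->{io_handle_ofs};      # forget the private copy
        }
        else {
            $slot->{io_handle_ofs} = $new;
        }
    }
    return $prev;
}

package main;

open(my $io, '>', \my $buf) or die "open: $!";
bless $io, 'Local::PerFH';

$io->output_field_separator(':');               # private copy only
my $private = ${*$io}{io_handle_ofs};
$io->output_field_separator($io->use_global);   # delete it again
my $forgotten = !exists ${*$io}{io_handle_ofs};
```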
C. The subroutine IO::Handle::print will be modified to localize
$, and $\ with the filehandle's private variables if they exist.
Likewise, IO::Handle::getline and getlines will localize $/, and
IO::Handle::format_write will localize $: and $^L.
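For point C, a print() along these lines would do it. Local::PerFH is a hypothetical subclass, and the private values are looked up with the proposed keys before being localized:

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# print() that honours the handle's private $, and $\, falling back to
# the globals when no private entry exists.
sub print {
    my $self = shift;
    my $slot = \%{*$self};
    my $ofs  = exists $slot->{io_handle_ofs} ? $slot->{io_handle_ofs} : $,;
    my $ors  = exists $slot->{io_handle_ors} ? $slot->{io_handle_ors} : $\;
    local $, = $ofs;         # both restored automatically on return
    local $\ = $ors;
    return print {$self} @_;
}

package main;

# In-memory handle, only to keep the example self-contained.
open(my $io, '>', \my $buf) or die "open: $!";
bless $io, 'Local::PerFH';
${*$io}{io_handle_ofs} = ',';
${*$io}{io_handle_ors} = "\n";

$io->print('a', 'b', 'c');
close $io;
```

Because the separators are localized rather than assigned, the globals are restored when print() returns, so code outside the call never sees the handle's private values.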
D. The chomp() operator has no way of knowing which filehandle a
scalar came from, so it has no way of knowing whether a custom $/ was
used or not. So I suggest that it keep its current behavior of using
the global $/. To use a filehandle's custom $/ (if any), a new IO::Handle
method will be created:
$io->chomp($some_scalar);
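The proposed $io->chomp() could be as simple as localizing $/ and calling the builtin. This is a sketch only, on a hypothetical subclass Local::PerFH. It relies on $_[0] aliasing the caller's variable, so the argument is modified in place just as with the builtin:

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# $io->chomp($scalar): chomp using the handle's private $/ if it has
# one, else the global.  Returns the number of characters removed.
sub chomp {
    my $self = shift;
    my $slot = \%{*$self};
    my $irs  = exists $slot->{io_handle_irs} ? $slot->{io_handle_irs} : $/;
    local $/ = $irs;
    return CORE::chomp($_[0]);   # $_[0] aliases the caller's scalar
}

package main;

open(my $io, '<', \"unused") or die "open: $!";
bless $io, 'Local::PerFH';
${*$io}{io_handle_irs} = "\r\n";

my $crlf    = "line\r\n";
my $removed = $io->chomp($crlf);   # strips "\r\n"
my $lf      = "line\n";
$io->chomp($lf);                   # a bare "\n" does not match; unchanged
```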
The above four points can be done in pure Perl, just by modifying
IO::Handle.pm. See below for a link to a first crack at it.
The problem with that is that it only works if you use the object-
oriented filehandle syntax for *everything*:
STDOUT->print("stuff"); # would work
print STDOUT "stuff"; # would not work
print "stuff"; # would not work
$x = $io->getline; # would work
$x = <$io>; # would not work
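A small demonstration of the split, assuming a hypothetical subclass Local::PerFH that overrides only print(); an in-memory handle stands in for a real file:

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# Method form: consults the handle's private $\ (proposed key).
sub print {
    my $self = shift;
    my $slot = \%{*$self};
    my $ors  = exists $slot->{io_handle_ors} ? $slot->{io_handle_ors} : $\;
    local $\ = $ors;
    return print {$self} @_;
}

package main;

open(my $io, '>', \my $buf) or die "open: $!";
bless $io, 'Local::PerFH';
${*$io}{io_handle_ors} = "\n";

$io->print('method');    # goes through Local::PerFH::print: "\n" appended
print $io 'builtin';     # builtin print: the glob's hash is never consulted
close $io;
# $buf is now "method\nbuiltin"
```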
This is inconsistent and confusing, imho. *Possibly* some might
find it acceptable if it is the only way to set per-filehandle
variables in older perls. For a more complete solution, we
must modify perl itself.
E. In pp_hot.c, pp_print will be modified to check whether the
io object has local copies of $, or $\. If so, it will use them
instead of the globals.
F. Also in pp_hot.c, Perl_do_readline will be modified to check
whether the io object has a local copy of $/. If so, there are
two possibilities I see:
F1: It can pass a new parameter to sv_gets which would be a
SV * to indicate what value to use instead of PL_rs. This
entails modifying all of the calls to sv_gets throughout
the system.
F2: Take the easy way out and localize a change to PL_rs before
the call to sv_gets, and restore its value afterward.
I'm going with the second choice for now, unless wise people think
that it'd be better to change sv_gets and its callers system-wide.
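At the Perl level, F2 (swap the separator in, read, swap it back) is what local does for $/. A sketch of a getline() built that way on a hypothetical subclass Local::PerFH follows. local puts the old $/ back even if the read dies; a bare C assignment to PL_rs before and after sv_gets() gives no such guarantee if sv_gets() croaks.

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# getline() that reads with the handle's private $/ when one exists.
# local restores the previous $/ when the sub exits, even via die.
sub getline {
    my $self = shift;
    my $slot = \%{*$self};
    my $irs  = exists $slot->{io_handle_irs} ? $slot->{io_handle_irs} : $/;
    local $/ = $irs;
    return scalar readline($self);
}

package main;

open(my $io, '<', \"a;b;c\n") or die "open: $!";
bless $io, 'Local::PerFH';
${*$io}{io_handle_irs} = ';';

my $first  = $io->getline;
my $second = $io->getline;
```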
G. I haven't looked into what it'd take to localize $: and $^L on a
per-filehandle basis yet, but I can't imagine that it'd be any harder
than the above. I'll get to it soon.
________________
So there's my proposal. I have coded all of the above so far (except
for the bit about the format variables). You can see what I've done so
far at http://employeeweb.myxa.com/eric/perl/IO/. So far, this is
really just a proof-of-concept, not an actual patch yet.
Here are some things that I'd like to ask those here who are wiser and
more experienced in perl's guts than I am.
1. Is this idea worth pursuing? I think it's eminently useful and
wonderful; but heck, everyone thinks their own idea is great.
2. (related q:) Has this idea been discussed before? Shot down? Is
someone else working on it already?
3. I have made the source changes to v5.6.1 of perl, since that's what
I had handy. I plan to start work on 5.8.0 Real Soon Now. Question:
will there be a v5.6.2? Just curious.
4. Whom do I contact for getting these changes into Perl? Larry Wall?
The Pumpking? Someone else? What's the timeframe for the next
release?
5. I am having an occasional segfault problem due to my changes to
pp_hot.c. I believe it's because of the way I'm saving $/ in
Perl_do_readline, but I can't tell what the problem is because of my
inexperience. See the above url for (some) details on how to
reproduce the problem.
Thank you very much,
--
Eric J. Roode er...@myxa.com
Senior Software Engineer, Myxa Corporation +1(610)234-2623
tr j, j ,j for @japh = (qw b lre h, uJ p, ekca tona, ts reh b, $/.r);
print scalar reverse sort @japh;
> 1. Is this idea worth pursuing? I think it's eminently useful and
> wonderful; but heck, everyone thinks their own idea is great.
I think that we should look at what will be done on Perl 6 about this,
and stay close to it. I like the idea but I'll let the others comment
on it.
> 2. (related q:) Has this idea been discussed before? Shot down? Is
> someone else working on it already?
Search the archives ;-) I don't know.
> 3. I have made the source changes to v5.6.1 of perl, since that's what
> I had handy. I plan to start work on 5.8.0 Real Soon Now. Question:
> will there be a v5.6.2? Just curious.
There may be a 5.6.2. However new features won't be introduced in
maintenance releases. See the perlhack manpage to learn about getting
the current development version of Perl (a.k.a. bleadperl).
> 4. Whom do I contact for getting these changes into Perl? Larry Wall?
> The Pumpking? Someone else? What's the timeframe for the next
> release?
This is indeed the right place to ask.
> 5. I am having an occasional segfault problem due to my changes to
> pp_hot.c. I believe it's because of the way I'm saving $/ in
> Perl_do_readline, but I can't tell what the problem is because of my
> inexperience. See the above url for (some) details on how to
> reproduce the problem.
You should really submit a patch; the preferred format is the output of
"diff -u pp_hot.c.orig pp_hot.c", since patches are more readable. It
should also be against 5.8.0 or bleadperl, because I/O has changed a lot
between 5.6.x and 5.8.0 with the introduction of PerlIO. You'll probably
have to read the PerlIO docs.
(Idea: could you write a subclass of IO::Handle and put it on CPAN, so
people can play with it? OK, that won't work with regular filehandles,
but it would be installable on older perls without patching the sources.)
Good reading. Nice to see your suggestions are all `doable' in plain perl, but
don't forget that
$^, $~, $=, $-, and $.
are also bound to file handles, but still don't work as expected (sometimes :)
If you change the others to the extended behaviour, people might expect the
format related globals to be better accessible too. (I would). I bet you it is
much harder to get these to do what they should bound to a file handle without
dealing with the perl CORE (the C files).
--
H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/)
using perl-5.6.1, 5.8.0 & 633 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11. Smoking perl CORE: smo...@perl.org
http://archives.develooper.com/daily...@perl.org/ per...@perl.org
send smoke reports to: smokers...@perl.org, QA: http://qa.perl.org
Thanks. :-)
> don't forget that
>
> $^, $~, $=, $-, and $.
>
> are also bound to file handles, but still don't work as expected (sometimes :)
I'm afraid I don't know what you mean. How do they work (or not work)
that's contrary to your expectations? How would you like them to work
better?
> If you change the others to the extended behaviour, people might
> expect the format related globals to be better accessible too. (I
> would). I bet you it is much harder to get these to do what they
> should bound to a file handle without dealing with the perl CORE
> (the C files).
Well, fools rush in... :-)
I'm certainly happy in principle with the idea of making these work
on a per-filehandle basis, but I think both the semantics and the
proposed implementation would need some more discussion.
:A. Filehandles are globs; the hash value of the filehandle glob is
:(generally) unused.
Generally, but we don't require this, so it is perfectly legitimate
for existing code to be making use of the hash. I suspect this
should be implemented instead by extending the struct xpvio.
:B. IO::Handle currently has class methods to set (and fetch) each of
:the above variables, but only globally. These methods will be
:modified to work on a per-object basis, as well as retaining their
:existing functionality.
These methods currently complain if called on a reference, so I think
this change should be acceptable.
:F. Also in pp_hot.c, Perl_do_readline will be modified to check
:whether the io object has a local copy of $/. If so, there are
:two possibilities I see:
:
: F1: It can pass a new parameter to sv_gets which would be a
: SV * to indicate what value to use instead of PL_rs. This
: entails modifying all of the calls to sv_gets throughout
: the system.
You can't do that: you'll break anyone currently calling sv_gets()
in XS code. You can, however, make sv_gets a thin wrapper around
a more general function that just passes PL_rs as the separator.
Doing it this way would be preferable to messing with PL_rs itself.
:1. Is this idea worth pursuing? I think it's eminently useful and
:wonderful; but heck, everyone thinks their own idea is great.
Probably. Certainly the need to relate these attributes to individual
filehandles is clear enough that we have already established that all
of those global variables will completely disappear in perl6.
The main problem I anticipate is what happens when you mix use of
the global and per-filehandle mechanisms: if I've got a filehandle
from somewhere, and my code does something like:
local $/ = undef;
my $para = <FH>;
... I expect that to read to the end of the file regardless of where
I got the filehandle from, and I can see substantial potential for
confusion under such circumstances. I don't know if there is a way
to resolve that without going all the way and removing the global
variables, and I'm pretty certain that won't be happening before
perl6.
Now it may be that the implementation that's going to do the right
thing the most often would involve acting as if C< local $/ = undef >
locally sets every individual file handle's IRS, and if so we'd need
some clever magic to avoid having to do so for real.
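To make the conflict concrete: under a "private copy always wins" rule, a caller's local $/ = undef is silently ignored for a handle that carries its own separator. A sketch with a hypothetical subclass Local::PerFH:

```perl
package Local::PerFH;    # hypothetical subclass, not a real module
use strict;
use warnings;
use IO::Handle;
our @ISA = ('IO::Handle');

# Naive "private copy wins" getline.
sub getline {
    my $self = shift;
    my $slot = \%{*$self};
    my $irs  = exists $slot->{io_handle_irs} ? $slot->{io_handle_irs} : $/;
    local $/ = $irs;
    return scalar readline($self);
}

package main;

# Somebody else created this handle and gave it a private separator.
open(my $fh, '<', \"para one\n\npara two\n") or die "open: $!";
bless $fh, 'Local::PerFH';
${*$fh}{io_handle_irs} = "\n";

# Our code expects slurp mode...
my $content = do {
    local $/ = undef;
    $fh->getline;
};
# ...but gets only the first line, "para one\n".
```

Which behaviour is right is exactly the open question; the sketch only shows that the naive precedence rule does not give the slurp the caller asked for.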
:2. (related q:) Has this idea been discussed before? Shot down? Is
:someone else working on it already?
Probably; maybe; not as far as I know. You should take a trawl
through the archives of the mailing list for previous conversations:
http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/
:3. I have made the source changes to v5.6.1 of perl, since that's what
:I had handy. I plan to start work on 5.8.0 Real Soon Now. Question:
:will there be a v5.6.2? Just curious.
I believe there will be, but I'm not aware of any promised timeframe.
As mentioned by another responder, changes like these are unlikely
to make it into any maintenance release, and certainly not before
they've been implemented first and received some testing in a
development version of perl. See the perlhack manpage for details
of how to grab the latest sources, and for a lot of other useful
information.
:4. Whom do I contact for getting these changes into Perl? Larry Wall?
:The Pumpking? Someone else? What's the timeframe for the next
:release?
I'm the main person you need to convince, but this list is the place
we discuss such things. I haven't committed to any particular release
date so far, but I'm guessing it'll be somewhere between one and two
years from the point 5.8.0 was released.
:5. I am having an occasional segfault problem due to my changes to
:pp_hot.c. I believe it's because of the way I'm saving $/ in
:Perl_do_readline, but I can't tell what the problem is because of my
:inexperience. See the above url for (some) details on how to
:reproduce the problem.
This is where the fun starts. :)
You should start by reading the perlhack manpage and at least
skimming the perlapi manpage. You can often also get good clues by
looking at other code doing similar things.
Then you're going to need to become familiar with a C-level debugger
such as gdb, which will allow you to investigate core dumps and step
through the code etc. A quick look suggests your first problem is
that perfh_val() is passing an uninitialised SV** val to fill in;
you need instead to define it as SV* val, and pass its address.
I would strongly suggest, however, thrashing out the semantics on
this list first, then discussing the best way to implement it.
Best of luck,
Hugo
These are format (write) related globals. To understand what they
should do (and where we currently know they fail), start by reading
write.t :)
$^ $FORMAT_TOP_NAME current page header format
$~ $FORMAT_NAME current format
$= $FORMAT_LINES_PER_PAGE current page size (in lines)
$- $FORMAT_LINES_LEFT lines left on current page
$% $FORMAT_PAGE_NUMBER current page number
'write' takes an *optional* filehandle, which makes
select STDOUT;
write;
be the same as
$^ = "STDOUT_TOP";
$~ = "STDOUT";
write STDOUT; # Will use $=, $-, and $% to see if it will
# issue a $^L (also a nice candidate to make handle-bound)
but this will do something quite different
select OUTPUT;
$^ = "HEADER";
$~ = "PARAGRAPH";
write; # To OUTPUT
$~ = "FOOTMARK";
write; # To OUTPUT
$~ = "LOG";
write STDOUT; # To STDOUT. But what are my $=, $-, etc?
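What makes the last write confusing is that $^ and $~ (and likewise $=, $- and $%) always refer to the currently selected handle. So after select OUTPUT, $~ = "LOG" changes OUTPUT's format name, and write STDOUT still uses STDOUT's own settings. A quick check using select() directly, with an in-memory handle standing in for OUTPUT:

```perl
use strict;
use warnings;

open(my $output, '>', \my $buf) or die "open: $!";

my $old = select($output);   # like "select OUTPUT;"
$^ = 'HEADER';               # these set $output's own format names,
$~ = 'LOG';                  # not STDOUT's
select($old);                # back to STDOUT

# STDOUT's attributes are untouched: a "write STDOUT" would still use
# the format named STDOUT (and STDOUT_TOP for the page header).
my ($stdout_name, $stdout_top) = ($~, $^);

select($output);
my ($out_name, $out_top) = ($~, $^);
select($old);
```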
Working OK (not related to format):
$. $INPUT_LINE_NUMBER the line number on the current input file handle
$| $OUTPUT_AUTOFLUSH unbuffered output on current handle
> > If you change the others to the extended behaviour, people might
> > expect the format related globals to be better accessible too. (I
> > would). I bet you it is much harder to get these to do what they
> > should bound to a file handle without dealing with the perl CORE
> > (the C files).
>
> Well, fools rush in... :-)
Call in the fools!