I would like to fix the problem within the perl script
rather than change environment variables for all the users.
I tried $ENV{LANG}="en_US"; (and various system() and `` commands)
in the script, but those didn't work.
Can I make this work from within the script? How?
Is this a perl bug, a RedHat Linux bug, or ...?
Thanks,
Darwin
Darwin O.V. Alonso
Department of Medicinal Chemistry H165(HSB)
University of Washington, Seattle 98195-7610
email: dal...@u.washington.edu
Telephone: (206) 616-2780 FAX 685-3252
--
> I wrote and used several perl scripts that use the perl functions
> "seek" and "read" to get information in and about a binary file.
It sounds, though, as if you aren't applying binary discipline to
your file operations.
> The scripts WORKED under RedHat 7.3/perl 5.6.1 and earlier.
Some would agree that they worked, others might argue that they gave
an impression of working in practice, despite being incorrect in
principle. At least, that is how I interpret what you have written:
feel free to be more specific if I seem to be misunderstanding you.
> Upon upgrading to RH8.0 (and 9), perl was also upgraded to 5.8.0,
> and the perl scripts broke
...or as one might put it, "their existing defects came to light".
SCNR.
> I tracked the problem down to
> perl's incorrect positioning in the file after a read,
> and after some web searching found that I could fix the problem if
> I "setenv LANG en_US", before invoking the perl script.
You are evidently hitting the widely-discussed phenomenon that this
version of Perl is sensitive to the locale settings, and RH8, and
later, define a locale with utf-8 in it. The result is that by
default, Perl 5.8.* applies its Unicode semantics to your _text_
files. You need to tell it explicitly if you want to do I/O in
binary. (It's good practice to do that even when one doesn't have to;
e.g. it aids portability of code to other OSes.)
> I would like to fix the problem within the perl script
> rather than change environment variables for all the users.
Since you say this is a binary file, I would suggest that setting
binmode() on the respective files is not only the correct thing to do
but also will be practically efficacious. There's no need to tangle
with the details of utf-8.
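
To make that concrete, here is a minimal sketch of seek/read on a
binmode()'d handle. The file layout (a 4-byte header followed by 8-byte
records), the temporary test file, and the helper name read_record are
all made up for illustration; they are not taken from the original
poster's scripts.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small binary test file. Hypothetical layout: a 4-byte header
# followed by fixed-length 8-byte records.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "HDR\x00", "\xC3\xA9\xFF\x00ABCD", "\x01\x02\x03\x04WXYZ";
close($out) or die "close failed: $!";

# Return record number $n (0-based) as a string of raw bytes.
sub read_record {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);    # byte semantics, whatever the locale says
    seek($fh, 4 + 8 * $n, 0) or die "seek failed: $!";
    my $got = read($fh, my $buf, 8);
    die "read failed: $!" unless defined $got;
    die "short read: got $got bytes" unless $got == 8;
    close($fh);
    return $buf;
}

my $rec = read_record($path, 1);
print length($rec), " bytes: ", unpack('H*', $rec), "\n";
```

With binmode() in place, the offsets passed to seek and the lengths
passed to read are both counted in bytes, so the arithmetic stays
consistent regardless of LANG.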
For background reading, if you nevertheless want to know more, might I
suggest the Unicode tutorial which comes with 5.8.*, e.g.
http://www.perldoc.com/perl5.8.0/pod/perluniintro.html
> Is this a perl bug, a RedHat Linux bug, or ...?
Neither.
Hope this helps. If you want more-detailed advice, then might I
recommend posting (a modest amount of) actual code relevant to your
problem?
: > I wrote and used several perl scripts that use the perl functions
: > "seek" and "read" to get information in and about a binary file.
: It sounds, though, as if you aren't applying binary discipline to
: your file operations.
Which is exactly what the perldocs suggested would be fine in all earlier
versions of perl.
perldoc -f binmode
Systems like Unix ... do not need `binmode()'.
In fact the entire need for binmode was treated in a derogatory, smug
"unix is better" manner, which makes your current comments rather ironic:
Binmode has no effect under many systems, but in MS-DOS and
similarly =>> archaic <<= systems, it may be imperative --
otherwise your .. =>> damaged C library <<= may mangle
your file.
And the earlier docs certainly did not indicate that this would have
anything to do with character sets; the only criterion was quite explicitly
the number of characters in a line ending:
Systems ... that delimit lines with a single character ... do not
need `binmode()'.
It is irksome to see so many threads and people now basically saying "well
if you'd done it right in the first place" whenever this issue comes up.
> On Wed, Jun 11, Darwin O.V. Alonso inscribed on the eternal scroll:
>
>> I wrote and used several perl scripts that use the perl functions "seek"
>> and "read" to get information in and about a binary file.
>
> It sounds, though, as if you aren't applying binary discipline to your
> file operations.
I have to agree with Mr. Dew-Jones on this one. The crux of the problem
is that either the documentation in the older releases was incorrect
or the semantics were changed. Quoting from the v5.6.1 perlfunc man page:
Attempts to read LENGTH bytes of data into variable SCALAR from the
specified FILEHANDLE. Returns the number of bytes actually read,
"0" at end of file, or undef if there was an error.
Contrast that with the v5.8.0 man perlfunc page:
Attempts to read LENGTH characters of data into variable SCALAR
from the specified FILEHANDLE. Returns the number of characters
actually read, 0 at end of file, or undef if there was an error.
Notice the substitution of "characters" for "bytes". For any North
American (i.e., USA) or European user who even noticed the rewording,
the distinction between the two is almost certainly going to be missed.
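
A small sketch of what the bytes/characters distinction means for file
positioning. The :utf8 layer is applied explicitly here to imitate what
5.8.0 does implicitly under a UTF-8 locale; the test data and the helper
name pos_after_read are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Four bytes on disk; the first two are the UTF-8 encoding of U+00E9.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);
print {$out} "\xC3\xA9AB";
close($out) or die "close failed: $!";

# Open with the given layer, read one unit, and report the byte
# offset that tell() gives afterwards.
sub pos_after_read {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $buf, 1);
    die "read failed: $!" unless defined $got;
    my $pos = tell($fh);
    close($fh);
    return $pos;
}

my $raw_pos  = pos_after_read(':raw');   # LENGTH counted in bytes
my $utf8_pos = pos_after_read(':utf8');  # LENGTH counted in characters
print "raw: $raw_pos, utf8: $utf8_pos\n";
```

Asking for a LENGTH of 1 consumes one byte from the raw handle but two
bytes from the :utf8 handle, which is how a script that computes offsets
in bytes ends up at the "wrong" position after a read.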
It's bad enough that the definition of this function was changed.
Compounding that mistake is the fact the documentation does not state
clearly that the function is now locale sensitive. I say "mistake"
because a function named "read" should not be locale sensitive IMHO.
Neither should it be sensitive to line endings. It isn't "readline"
or "readstring", it's "read"; as in "read a string of bytes". The
only difference between "read" and "sysread" should be the buffering
performed.
Now, having said that, if the original poster was actually using a
line oriented function (e.g., "<>" or "readline") rather than "read"
then they (arguably) deserve to be surprised by the change in behavior.
> Binmode has no effect under many systems, but in MS-DOS and
> similarly =>> archaic <<= systems, it may be imperative --
> otherwise your .. =>> damaged C library <<= may mangle
> your file.
Do not forget that for a long time Perl docs were maintained by people
with enough clue to consider Unix a modern OS. :-( No wonder that now
comes the time to eat your hat...
Hope this helps,
Ilya
> Which is exactly what the perldocs suggested would be fine in all earlier
> versions of perl.
Sadly, there was too much of that, indeed.
> perldoc -f binmode
>
> Systems like Unix ... do not need `binmode()'.
>
> In fact the entire need for binmode was considered in a derogatory, smug
> "unix is better", manner,
I think you'll find that I'm on record as pointing out to the unix
bigots that properly-used binmode() did no harm in Unix, and promoted
Perl's justifiable claim to platform portability; and I was saying
that quite a while before I became aware of the implications in terms
of Unicode semantics.
> which makes your current comments rather ironic
I suppose you could say that.
> It is irksome to see so many threads and people now basically saying "well
> if you'd done it right in the first place" when ever this issue comes up.
I'm sorry if you interpreted it that way. I was trying to explain the
observations in terms of what Perl (now) does. I can only agree with
you that the documentation was lagging behind what I would have rated
as best practice, until quite recently.
best regards
--
ISO-8859-1 is one of two charsets appropriate for use in
Western Europe (the other is ISO-8859-15). The US has not
been politically part of Europe for nearly 227 years. - Mark Crispin
The problem with utf-8 locale handling in v5.8.0 will be corrected
in v5.8.1. See the following for related discussions:
<http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-01/msg00011.html>
<http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-01/msg00000.html>
<http://www.xray.mpe.mpg.de/mailing-lists/perl-unicode/2003-03/msg00032.html>
--ewh
--
Earl Hood | University of California
eh...@hydra.acs.uci.edu | Irvine
http://www.nacs.uci.edu/indiv/ehood/ | Electronic Loiterer