Thanks,
Brian
I have no idea what -B does in Perl, but a text file is generally
understood to be a file that lacks any other control characters than the
horizontal (CR, HT, SP) and vertical (LF, VT) format effectors. If you
have a decently encoded character set, that means very few characters in
the ranges #x00-#x1f and #x7f-#x9f.
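For what it's worth, here is a minimal Python sketch of that check. It assumes a single-byte encoding such as ISO 8859-1; in UTF-8 the bytes #x80-#x9f also appear as continuation bytes, so this would miscount there. The names `FORMAT_EFFECTORS`, `is_stray_control` and `count_stray_controls` are made up for illustration, and the allowed set holds only the effectors listed above (SP is #x20, so it never falls in the control ranges anyway).

```python
# Format effectors named in the post: HT, LF, VT, CR.
# Form feed (0x0C) is deliberately absent because the post does not list it.
FORMAT_EFFECTORS = frozenset({0x09, 0x0A, 0x0B, 0x0D})


def is_stray_control(byte: int) -> bool:
    """True for a byte in #x00-#x1f or #x7f-#x9f that is not a format effector."""
    in_control_range = byte <= 0x1F or 0x7F <= byte <= 0x9F
    return in_control_range and byte not in FORMAT_EFFECTORS


def count_stray_controls(data: bytes) -> int:
    """Count the control characters a text file is not expected to contain."""
    return sum(1 for byte in data if is_stray_control(byte))
```

A count of zero, or close to it, is what you would expect from a text file; how many strays to tolerate is a judgment call the sketch leaves to the caller.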
If you have some IBM-based crud page or any one of the usual Microsoft
disasters, there is no way to tell for real, except you would probably
find periodic line breaks with CRLF in text files.
A common and very simple negative test for a text file is to check whether
the last character in the file is not a line feed.
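A rough Python sketch of that negative test, in case it helps. The function names are hypothetical, and treating an empty file as "no evidence either way" is my own choice, not something the test itself settles.

```python
import os


def fails_line_feed_test(data: bytes) -> bool:
    """True if the data is non-empty and its last byte is not a line feed."""
    return len(data) > 0 and data[-1] != 0x0A


def file_fails_line_feed_test(path: str) -> bool:
    """Apply the same test to a file, reading only its final byte."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        if handle.tell() == 0:
            return False  # empty file: nothing to test
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) != b"\n"
```

Since this is only a negative test, a `False` result does not show that the file is text, only that this particular check found nothing against it.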
///
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.
I dunno about a *standard* way, but you could always rewrite Perl's -B
operator. From perlfunc(1):
The "-T" and "-B" switches work as follows. The first block or so of the
file is examined for odd characters such as strange control codes or
characters with the high bit set. If too many strange characters (>30%)
are found, it's a "-B" file, otherwise it's a "-T" file. Also, any file
containing null in the first block is considered a binary file.
(-T, for you non-Perl-ers, tests for text files.)
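If you do want to rewrite it, a sketch of the heuristic as that paragraph describes it might look like this in Python. This is not Perl's actual implementation: the block size of 512 bytes, the exact definition of an "odd" character, and the handling of an empty file are all assumptions on my part, since the documentation quoted above only says "the first block or so" and "strange control codes".

```python
BLOCK_SIZE = 512   # assumption: the quoted text does not give a size
THRESHOLD = 0.30   # "too many strange characters (>30%)"

# Assumption: these control codes count as ordinary in text (HT, LF, VT, FF, CR).
ORDINARY_CONTROLS = frozenset({0x09, 0x0A, 0x0B, 0x0C, 0x0D})


def is_odd(byte: int) -> bool:
    """High bit set, DEL, or a control code outside the ordinary set."""
    if byte >= 0x80 or byte == 0x7F:
        return True
    return byte < 0x20 and byte not in ORDINARY_CONTROLS


def looks_binary(data: bytes) -> bool:
    """Guess whether data is binary by examining only its first block."""
    block = data[:BLOCK_SIZE]
    if not block:
        return False  # assumption: an empty file is not called binary here
    if 0 in block:
        return True   # any null in the first block means binary
    odd = sum(1 for byte in block if is_odd(byte))
    return odd / len(block) > THRESHOLD
```

To use it on a file, read the first `BLOCK_SIZE` bytes in binary mode and pass them in. Note that counting every high-bit byte as odd will misjudge text that is mostly non-ASCII.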
-- Larry
Sorry. Binary files are defined by having 42% of "strange" characters
in the first 4242 sextets (4+2 bits).
Perl got this wrong. :)
Cheers
--
Marco Antoniotti ========================================================
NYU Courant Bioinformatics Group tel. +1 - 212 - 998 3488
719 Broadway 12th Floor fax +1 - 212 - 995 4122
New York, NY 10003, USA http://bioinformatics.cat.nyu.edu
"Hello New York! We'll do what we can!"
Bill Murray in `Ghostbusters'.
> Sorry. Binary files are defined by having 42% of "strange" characters
> in the first 4242 sextets (4+2 bits).
> Perl got this wrong. :)
Rubbish. Files are binary if more than 17 of the first 23 5-bit bytes
are not legal BAUDOT, with the exception that, if they spell
'EWIGE BLUMENKRAFT FNORD', the file is considered binary
anyway.
--tim