Ignoring for a moment what it would mean for a Unicode file to be
"binary", you could just do this:
#!/usr/bin/ruby
# From the Perl documentation:
#
# The "-T" and "-B" switches work as follows. The
# first block or so of the file is examined for odd
# characters such as strange control codes or char-
# acters with the high bit set. If too many strange
# characters (>30%) are found, it's a "-B" file,
# otherwise it's a "-T" file. Also, any file con-
# taining null in the first block is considered a
# binary file. If "-T" or "-B" is used on a file-
# handle, the current stdio buffer is examined
# rather than the first block. Both "-T" and "-B"
# return true on a null file, or a file at EOF when
# testing a filehandle. Because you have to read a
# file to do the "-T" test, on most occasions you
# want to use a "-f" against the file first, as in
# "next unless -f $file && -T $file".
# I don't know how to get to the stdio buffer...
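class File
  # Heuristic modeled on Perl's -B test: true if the first block of the
  # file looks binary. Returns false for anything that is not a regular file.
  def self.isBinary(name)
    myStat = stat(name)
    return false unless myStat.file?
    open(name, "rb") { |file|
      # blksize can be nil on some platforms; 512 is an arbitrary fallback.
      blk = file.read(myStat.blksize || 512)
      # read(n) returns nil at EOF, so an empty file counts as binary (as in Perl).
      return true if blk.nil? || blk.empty?
      # Float division: integer division would truncate the ratio to 0 or 1.
      return blk.count("^ -~", "^\r\n").to_f / blk.size > 0.3 ||
             blk.count("\x00") > 0
    }
  end
end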
Dir.new('.').each { |entry|
  if File.stat(entry).file?
    puts "#{entry} #{ File.isBinary(entry) ? 'binary' : 'text' }"
  else
    puts "#{entry} directory"
  end
}
--
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE
It is impossible to determine with certainty whether a file is binary,
because "binary file" is not a well-defined concept. The Unix `file'
command does it with heuristics. It works well enough in practice, but
it is not infallible.
The following script tests whether a file contains any byte that is
not printable ASCII (or whitespace). You can extend it to detect
non-Latin-1 bytes and so on. However, some coding systems are
stateful, for example ISO-2022, and this approach does not work for
such character coding systems.
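#! ruby
# Matches any byte that is neither printable ASCII nor whitespace.
NON_ASCII_PRINTABLE = /[^\x20-\x7e\s]/

# Reads io in chunks of size bytes; returns false as soon as a chunk
# contains a byte matching forbidden, true otherwise.
def nonbinary?(io, forbidden, size = 1024)
  while buf = io.read(size)
    return false if forbidden =~ buf
  end
  true
end

# usage: ruby this_script.rb filename ...
ARGV.each do |fn|
  begin
    # File.open in binary mode avoids newline and encoding translation.
    File.open(fn, "rb") do |f|
      if nonbinary?(f, NON_ASCII_PRINTABLE)
        puts "#{fn}: ascii printable"
      else
        puts "#{fn}: binary"
      end
    end
  rescue StandardError
    puts "#$0: #$!"
  end
end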
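
To illustrate the caveat about stateful coding systems, here is a
minimal sketch. It assumes a Ruby whose String#encode can transcode to
ISO-2022-JP. The sample text and the ALLOW_ESC pattern are my own
illustration, not part of the script above. ISO-2022-JP text consists
only of 7-bit bytes plus ESC sequences, so a byte-range test either
rejects it because of ESC or, if ESC is allowed, cannot tell it apart
from plain ASCII.

```ruby
# Same pattern as in the script above.
NON_ASCII_PRINTABLE = /[^\x20-\x7e\s]/

# Hypothetical relaxation of the pattern that also tolerates ESC (0x1b).
ALLOW_ESC = /[^\x20-\x7e\s\x1b]/

# Three Japanese characters, written with \u escapes so this source
# file itself stays pure ASCII.
text = "\u65e5\u672c\u8a9e"

# String#b returns a copy tagged as binary, so the regexps see raw bytes.
jis = text.encode("ISO-2022-JP").b

all_seven_bit            = jis.bytes.all? { |b| b < 0x80 }
has_escape               = jis.include?("\x1b")
flagged                  = !(NON_ASCII_PRINTABLE =~ jis).nil?
flagged_with_esc_allowed = !(ALLOW_ESC =~ jis).nil?

puts "7-bit bytes only:         #{all_seven_bit}"
puts "contains ESC:             #{has_escape}"
puts "flagged as binary:        #{flagged}"
puts "flagged when ESC allowed: #{flagged_with_esc_allowed}"
```

The text is flagged as binary only because of the ESC bytes. Once ESC
is tolerated, the remaining bytes all fall in the printable ASCII
range, so the check cannot see that they encode Japanese characters.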