[QUIZ] hexdump (#171)

Matthew Moss

Jul 25, 2008, 1:22:10 PM

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

The three rules of Ruby Quiz 2:

1. Please do not post any solutions or spoiler discussion for this
quiz until 48 hours have passed from the time on this message.

2. Support Ruby Quiz 2 by submitting ideas as often as you can! (A
permanent, new website is in the works for Ruby Quiz 2. Until then,
please visit the temporary website at

<http://splatbang.com/rubyquiz/>.)

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem
helps everyone on Ruby Talk follow the discussion. Please reply to
the original quiz message, if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

## hexdump (#171)

_Quiz idea provided by Robert Dober._

This week's quiz should be quick and easy for experienced Rubyists,
and a good lesson for beginners. Your task this week is to write a
utility that outputs a hex dump of the input.

There are a number of hex dump utilities in existence, that go by the
names `hd`, `od`, `hexdump`... I'm sure there are more. Pick one you'd
like to reproduce: If you're on any variety of Unix or BSD (including
Mac OS X), you can get man pages from the command-line to see how they
work. On Windows, if you don't have one installed, you can check out
this [man page for hexdump][1] and use that as a model.

You are not required to implement all the various command-line
switches, but I should be able to run your script on a file and, as a
minimum, see output resembling this (view with fixed-width font for
best results):

0000000 6573 2074 6c68 0a73 7973 746e 7861 6f20
0000010 0a6e 6f63 6f6c 7372 6863 6d65 2065 6564
0000020 6573 7472 0a0a 6573 2074 7865 6170 646e
0000030 6174 0a62 6573 2074 6174 7362 6f74 3d70
0000040 0a32 6573 2074 6873 6669 7774 6469 6874
0000050 323d 220a 6573 2074 6574 7478 6977 7464
0000060 3d68 3836 0a0a 2022 2051 6f63 6d6d 6e61
0000070 2064 6f74 7220 6665 726f 616d 2074 6170
0000080 6172 7267 7061 7368 6120 646e 6c20 7369
0000090 2e74 6e0a 6f6e 6572 616d 2070 2051 7167
00000a0 0a7d 0a0a
00000a4

Your submission should accept input either from a named file (part of
the command-line arguments) or from standard input if no filename is
provided.

Finally, when submitting, make sure to describe what existing hex dump
program you are emulating/reproducing (if any), and what arguments to
your script are needed, if any, to produce the basic output above.

[1]: http://unixhelp.ed.ac.uk/CGI/man-cgi?hexdump+1

--
Matthew Moss <matthe...@gmail.com>

James Gray

Jul 25, 2008, 4:01:33 PM

On Jul 25, 2008, at 12:22 PM, Matthew Moss wrote:

> There are a number of hex dump utilities in existence, that go by the
> names `hd`, `od`, `hexdump`... I'm sure there are more. Pick one you'd
> like to reproduce…

xxd is my favorite.

James Edward Gray II

Mikael Høilund

Jul 25, 2008, 4:56:05 PM

Will you be accepting golfed solutions? Of course you will. :)

-- a,b=%Q=Z,O^NPO\r4_PV\\PI\x15^-\x0\v=,email=%\%%%c\%115%%# Mikael
Hoilund, CTO
okay=%#;hmm=(0...a.size).map{|i|((a[i]-email[i]+2)%128).# of Meta.io
ApS from
chr}.join;!email.gsub!'o',"%c%c"%[3+?0.<<(2),?G.~@];aha=#############
Denmark
hmm.scan(/#{'(.)'*5}/);!puts(email[1..-12]+aha.shift.zip(*aha).join)#
Ruby <3

Chris Shea

Jul 25, 2008, 5:37:08 PM

On Jul 25, 11:22 am, Matthew Moss <matthew.m...@gmail.com> wrote:
> 0000000 6573 2074 6c68 0a73 7973 746e 7861 6f20
> 0000010 0a6e 6f63 6f6c 7372 6863 6d65 2065 6564
> 0000020 6573 7472 0a0a 6573 2074 7865 6170 646e
> 0000030 6174 0a62 6573 2074 6174 7362 6f74 3d70
> 0000040 0a32 6573 2074 6873 6669 7774 6469 6874
> 0000050 323d 220a 6573 2074 6574 7478 6977 7464
> 0000060 3d68 3836 0a0a 2022 2051 6f63 6d6d 6e61
> 0000070 2064 6f74 7220 6665 726f 616d 2074 6170
> 0000080 6172 7267 7061 7368 6120 646e 6c20 7369
> 0000090 2e74 6e0a 6f6e 6572 616d 2070 2051 7167
> 00000a0 0a7d 0a0a
> 00000a4

Is this really what you dumped, Matthew? I was hoping for something a
little more... comprehensible.

Chris

---

es tlh
systnxao
nocolsrhcme eedestr

es txeapdnat
bes tatsbot=p
2es thsfiwtdiht2="
es tettxiwtd=h86

" Qocmmna dotr feroam taparrgpasha dnl si.tn
oneram p Qqg
}

Matthew Moss

Jul 25, 2008, 11:25:00 PM

I believe your endianness is off, sir.

Chris Shea

Jul 26, 2008, 2:14:03 AM

Whoops. I knew it looked almost like a slightly shuffled
somethingorother.

Chris

Matthew Moss

Jul 26, 2008, 9:14:02 AM

On Jul 25, 3:56 pm, Mikael Høilund <mik...@hoilund.org> wrote:
> Will you be accepting golfed solutions? Of course you will. :)

Well, sure... Though in this case, I'd somewhat prefer to see nicely
written solutions that offered up more command-line options, such as
those provided by the various utilities. Things like grouping by 1, 2
or 4 bytes; ASCII display; binary/octal; etc.

But golfed solutions are okay, as usual...

Robert Dober

Jul 27, 2008, 4:55:20 PM

Well here goes my reference implementation, in good ol' RQ tradition.
Nothing fancy here just 16 bytes per line
with hexaddresses and ASCII output at the right, like the System V hd command.

http://pastie.org/242020

Robert

--
http://ruby-smalltalk.blogspot.com/

There's no one thing that's true. It's all true.
--
Ernest Hemingway

Mikael Høilund

Jul 27, 2008, 5:51:04 PM

Oh hi, I just thought I'd golf a solution. I'm sure other people can
do a much better job than I making a full hexdumping suite, so I just
had some fun. Can't seem to get it lower than 78 characters,
unfortunately.

i=0;$<.read.scan(/.{0,16}/m){puts"%08x "%i+$&.unpack('H4'*8).join(' ');i+=16}

Expanded and parenthesified, clarified:

i = 0
ARGF.read.scan(/.{0,16}/m) {
  puts(("%08x " % i) + $&.unpack('H4'*8).join(' '))
  i += 16
}

ARGF (aliased as $<) is the file handle of all file names given in the
arguments concatenated, STDIN if none — exactly what we need. The
regex to scan matches between 0 and 16 characters (including newline)
greedily. Change it to 1,16 if you don't want the empty line at the end.

Instead of letting the block to scan take an argument, I used a trick
I picked up from the last Ruby Quiz I participated in (Obfuscated
Email), and use $& inside the block, which is the last regex match.
Saves two characters \o/

The unpack returns an array of eight strings, each of four characters,
with the hexadecimal representation of the ASCII value of two
consecutive characters. Fun, fun, fun.

Martin Boese

Jul 28, 2008, 10:27:08 AM

I added an ascii column to your solution... now it's about twice the size ;-)

i = 0
$<.read.scan(/.{0,16}/m) {
  puts(("%08x " % i) + $&.unpack('H4'*8).join(' ') + ' ['+
    $&.split(//).collect { |c| c.inspect[1] == 92 ? '.' : c }.join + ']' )
  i += 16
}

Adam Shelly

Jul 30, 2008, 5:36:49 PM

On 7/28/08, Martin Boese <boes...@gmx.de> wrote:
> On Sunday 27 July 2008 22:51:04 Mikael Høilund wrote:
> > Oh hi, I just thought I'd golf a solution. I'm sure other people can
> > do a much better job than I making a full hexdumping suite, so I just
> > had some fun. Can't seem to get it lower than 78 characters,
> > unfortunately.
> >
>
> I added an ascii column to your solution... now it's about twice the size ;-)
>
> i = 0
> $<.read.scan(/.{0,16}/m) {
> puts(("%08x " % i) + $&.unpack('H4'*8).join(' ') + ' ['+
> $&.split(//).collect { |c| c.inspect[1] == 92 ? '.' :c }.join + ']' )
> i += 16
> }
>

I can't resist golf: I got Martin's solution down to 95 bytes (If you
take out the ascii column it's down to 71).

i=0;$<.read.scan(/.{0,16}/m){puts"%08x0 "%i+$&.unpack('H4'*8)*' '+' | '+$&.tr('^ -~','.');i+=1}

Tricks: *' ' is a shorter version of .join(' ') for arrays,
and $&.tr('^ -~','.') says translate any character not between ' ' and
'~' (32 to 126) to a '.' That saved a ton over the
split/collect/inspect method. (By the way, map and dump save a few
bytes over collect and inspect)
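
A small sketch of the two tricks mentioned above, runnable on its own. The sample strings are made up for illustration and are not taken from any of the submissions.

```ruby
# Array#* with a string argument behaves like Array#join.
words  = %w[6162 0a63]
joined = words * ' '
puts joined                      # 6162 0a63

# A leading ^ negates the set, so every character outside ' '..'~'
# (ASCII 32 to 126) is replaced by a dot.
printable = "ab\ncd\t~".tr('^ -~', '.')
puts printable                   # ab.cd.~
```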

I also did a more full-featured version that supports some command line options

-Adam
----------------------------------------------------
#hexdump utility for RubyQuiz#171
USAGE=<<USAGE

Usage:
#{$0.split(/[\/\\]/)[-1]} [-n length] [-s skip] [-g group] [-w width] [-a] file

Dumps <length> bytes of <file> in hex format, starting at offset <skip>.
Prints <width> bytes per line in groups of size <group>.
Prints the ascii on the right unless <-a> specified

Default is all bytes of $stdin in 16/2 format.

USAGE
begin
width=16
group=2
skip=0
length=Float::MAX
do_ascii = true
file = $stdin

while (opt=ARGV.shift)
if opt[0]==?-
case opt[1]
when ?n
length=ARGV.shift.to_i
when ?s
skip=ARGV.shift.to_i
when ?g
group = ARGV.shift.to_i
when ?w
width = ARGV.shift.to_i
when ?a
do_ascii = false
else
raise ArgumentError,"invalid Option #{opt}"
end
else
file = File.new(opt)
end
end

n=0
ascii=''
file.read(skip)
file.each_byte{|b|
if n%width == 0
print "%s\n%08x "%[ascii,n+skip]
ascii='| ' if do_ascii
end
print "%02x"%b
print ' ' if (n+=1)%group==0
ascii << "%s"%b.chr.tr('^ -~','.') if do_ascii
break if n>=length
}
puts ' '*(((2+width-ascii.size)*(2*group+1))/group.to_f).ceil+ascii
#this is probably the most complicated line
#it pads out the line to get the remaining ascii to align:
# (2+width-ascii.size) is the number of bytes missing (the 2 is for the ' | ')
# *(2*group+1) is the width of a group of bytes with the space
# /group.to_f divides by the number of groups
# .ceil rounds up, otherwise we misalign on partial groups

rescue =>x
puts USAGE, "ERROR: #{x}"
end
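
The padding arithmetic explained in the comments can also be sidestepped. The following is an alternative sketch, not Adam's method: it builds the hex column as a string and pads it with `String#ljust`. The helper name `hex_column` and its keyword arguments are hypothetical, and it assumes Ruby 2.0 or later for keyword arguments.

```ruby
# Formats up to `width` bytes as hex, `group` bytes per cluster, and pads the
# result so an ASCII column printed afterwards always starts at the same place.
def hex_column(bytes, width: 16, group: 2)
  groups_per_line = (width.to_f / group).ceil
  # Two hex digits per byte plus one space after each group.
  full_width = width * 2 + groups_per_line
  text = bytes.each_slice(group)
              .map { |cluster| cluster.map { |b| '%02x' % b }.join }
              .join(' ')
  text.ljust(full_width)
end

short = hex_column([0x0a, 0x7d, 0x0a, 0x0a])
puts "#{short}| .}.."
```

Because the column is padded to a fixed width, a short final line needs no special case.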

Mikael Høilund

Jul 30, 2008, 7:12:33 PM

On Jul 30, 2008, at 23:36, Adam Shelly wrote:

> On 7/28/08, Martin Boese <boes...@gmx.de> wrote:
>
> I can't resist golf: I got Martin's solution down to 95 bytes (If you
> take out the ascii column it's down to 71).
>
> i=0;$<.read.scan(/.{0,16}/m){puts"%08x0 "%i+$&.unpack('H4'*8)*' '+' |
> '+$&.tr('^ -~','.');i+=1}

That's pretty neat! I'd totally forgotten about that trick. The way
you handle the counter is ;)-ish ;)

--
# Mikael Høilund
def method_missing(m, a=0) a +
m.to_s[/[a-z]+/].size * 2; end
p What is the meaning of life?

Matthew Moss

Jul 31, 2008, 2:36:57 PM

When learning a new programming language, the first thing many coders
do is write the traditional "Hello, world!" program. This generally
provides the bare minimum needed for coding: base program structure,
compilation if needed... In Ruby, this is very bare, as `puts "Hello,
world!"` is sufficient. (See quiz #158 for some non-traditional
versions.)

What also seems a tradition is the question, "What should I program
now?" after "Hello, world!" is output to the console. New coders are
looking for something to try, to expand their skills, without becoming
overwhelmed. Often, I find, the easiest way to do this is to reproduce
an existing program. You can focus on learning the new language and
implementing an existing design, rather than coming up with something
novel.

This week's quiz was chosen with this in mind; it is a good project
for new Rubyists, to dive into the language a bit without drowning.
Hex dump utilities have been around for ages, and there are plenty of
them, so we don't have to think about implementing anything new;
rather, we can focus on learning Ruby. And writing a hex dump
program lets you deal with files, strings, arrays and output: some of
the basics of any code.

I'm going to look at parts from each of the few solutions, to
highlight some of the things you should know as a Rubyist. If you're
new to Ruby, you might consider trying the quiz first before reading
this summary and the submissions. Then, after reading this summary,
revise and refactor your solution to be leaner and cleaner.

First, let's look at the non-golfed (and slightly modified)
submission from _Mikael Høilund_. It's short, but dense with good
Ruby-isms.

i = 0
ARGF.read.scan(/.{0,16}/m) { |match|
  puts(("%08x " % i) + match.unpack('H4'*8).join(' '))
  i += 16
}

`ARGF` is a special constant. It isn't a file, but can be treated as
such (as seen above, via the call to the `IO#read` method). It will
sequentially read through all files provided on the command-line or,
if none are provided, will read from standard input. It works together
with `ARGV`, the array of arguments provided to your program,
expecting that all values in `ARGV` are filenames. If you happen to
have a script that also expects command-line options (such as
`--help`), just make sure to process and/or remove them from `ARGV`
before using `ARGF`.
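
A minimal sketch of that last point, under stated assumptions: the switch names `--help` and `--ascii` and the helper `take_switches!` are hypothetical, and a plain array stands in for `ARGV` so the example can run without any files.

```ruby
# Removes the given switches from argv in place and returns the ones found,
# so that only filenames are left for ARGF to open.
def take_switches!(argv, known = ['--help', '--ascii'])
  found = argv.select { |arg| known.include?(arg) }
  argv.reject! { |arg| known.include?(arg) }
  found
end

args = ['--ascii', 'first.bin', 'second.bin']   # stand-in for ARGV
switches = take_switches!(args)
p switches   # ["--ascii"]
p args       # ["first.bin", "second.bin"]

# In a real script: take_switches!(ARGV), then ARGF.read or ARGF.each_line.
```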

Next is `String#scan`, which finds instances of the pattern provided in
the source string. In this case, Mikael is using a regular expression
that grabs up to 16 characters (i.e. bytes) at a time, including
newlines. (The `m` in the regular expression indicates a multi-line
match, in which newline characters are treated like any other
character, rather than terminators.)
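
To see what the `m` flag and the `{0,16}` quantifier do, here is a quick sketch with a chunk size of 4 rather than 16 so the results stay short. The sample string is made up.

```ruby
text = "ab\ncdef"

without_m  = text.scan(/.{1,4}/)    # "." stops at the newline
with_m     = text.scan(/.{1,4}/m)   # "." also matches the newline
zero_width = text.scan(/.{0,4}/m)   # allowing 0 adds an empty match at the end

p without_m    # ["ab", "cdef"]
p with_m       # ["ab\nc", "def"]
p zero_width   # ["ab\nc", "def", ""]
```

Note that on Ruby 1.9 and later `.` matches a character, which may be more than one byte in a UTF-8 string, so a byte-oriented dump would probably want to treat its input as binary.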

`String#scan` can return an array of matches, but it can also be used
in block form, as shown above, with the block called once per match and
the matched text passed in as the `match` argument.

Another trick here is replication. These aren't really "tricks", as
they are standard functions defined on the class, but they can
certainly save typing and keep the code clearer. Try these in `irb`:

> 'H4' * 8
=> "H4H4H4H4H4H4H4H4"

> [1, 2, 3] * 2
=> [1, 2, 3, 1, 2, 3]

`String#unpack` is a powerful function for handling raw data. It uses
a format string (e.g. "H4H4H4H4H4H4H4H4") to decode the raw data. In
this case, `H4` indicates that four nybbles (i.e. two bytes) should be
decoded from the string. Doing that eight times decodes 16 bytes,
which is how much we are reading at a time in Mikael's code above.

`String#unpack` (and the reverse `Array#pack`) can do a lot of work in
short order. It just takes a bit of practice to understand, and
easy access to the formats table. (On the command-line, type: `ri
String#unpack`.)
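
A short sketch of two `unpack` formats side by side. The string `"set hls\n"` is what the first eight bytes of the sample dump in the quiz decode to. `H4` yields the bytes in file order, whereas the quiz's sample output shows each byte pair swapped, which is consistent with reading 16-bit little-endian words (the `v` format). Treat the comparison as an illustration, not as a statement about what any particular utility does internally.

```ruby
data = "set hls\n"

# 'H4' reads two bytes in file order, high nybble first.
in_file_order = data.unpack('H4' * 4)

# 'v' reads a 16-bit little-endian word, so the two bytes appear swapped
# when the word is printed as a number.
as_le_words = data.unpack('v*').map { |word| '%04x' % word }

puts in_file_order.join(' ')   # 7365 7420 686c 730a
puts as_le_words.join(' ')     # 6573 2074 6c68 0a73
```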

Finally, take a quick look at Mikael's golfed solution. Aside from
squeezing everything together, it makes use of some special globals:
`$<` (equivalent to `ARGF`) and `$&` (evaluates to the current match
from `scan`, eliminating the need for the `match` parameter to the
block). Globals like this can certainly make it more fun to "golf"
(i.e. the deliberate shrinking and obfuscation of a program), but
aren't recommended for clarity.

_Robert Dober_ provides a clean, straightforward solution that needs
little explanation. Make sure to look at the whole of it, while I
examine briefly his `output` method.

require 'enumerator'

BYTES_PER_LINE = 0x10

def output address, line
  e = line.enum_for :each_byte
  puts "%04x %-#{BYTES_PER_LINE*3+1}s %s" % [ address,
    e.map{ |b| "%02x" % b }.join(" "),
    e.map{ |b|
      0x20 > b || 0x7f < b ? "." : b.chr
    }.join ]
end

The most useful bit here is the `enumerator` module, and the
`enum_for` method that returns an `Enumerable::Enumerator` object.
This object provides a number of ways to access the data. Here, Robert
accesses it one byte at a time, having passed the argument
`:each_byte`. Enumerators, of course, are not required to process each
byte of the source string: a couple calls to `each_byte` could have
done that as well. But the enumerator is a convenient package, which
can be used multiple times, can be used as an `Enumerable`, and removes
redundancy, all shown above.

Enumerators also have access to other ways to enumerate... What if you
want to get three objects at a time from a collection? Disjointed or
overlapping? You can use `:each_cons` or `:each_slice` to that effect.

> x = [1, 2, 3, 4, 5]
=> [1, 2, 3, 4, 5]

> x.enum_for(:each_cons, 3).to_a
=> [ [1, 2, 3], [2, 3, 4], [3, 4, 5] ]

> x.enum_for(:each_slice, 3).to_a
=> [ [1, 2, 3], [4, 5] ]

(Note that there are some changes going on with enumerators between
Ruby 1.8.6 and 1.9; here is some good information on the [changes in
Ruby 1.9][1]).
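
As a hedged illustration of those changes: on Ruby 1.9 and later the `require 'enumerator'` line is unnecessary, and `each_byte` called without a block returns an enumerator directly. The sketch below is not Robert's code; the sample strings and the printable range `0x20..0x7e` are choices made for this example.

```ruby
bytes = "Hi, Ruby".each_byte          # an Enumerator, since no block is given

# The same enumerator can be walked more than once.
hex   = bytes.map { |b| '%02x' % b }.join(' ')
ascii = bytes.map { |b| (0x20..0x7e).cover?(b) ? b.chr : '.' }.join

# Enumerators chain, so bytes can be grouped without building a string first.
rows = "abcde".each_byte.each_slice(2).to_a

puts hex     # 48 69 2c 20 52 75 62 79
puts ascii   # Hi, Ruby
p rows       # [[97, 98], [99, 100], [101]]
```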

Now we look briefly at _Adam Shelly_'s solution, in particular his
command-line option handling.

width = 16
group = 2
skip = 0
length = Float::MAX

do_ascii = true
file = $stdin

while (opt = ARGV.shift)
  if opt[0] == ?-
    case opt[1]
    when ?n
      length = ARGV.shift.to_i
    when ?s
      skip = ARGV.shift.to_i
    when ?g
      group = ARGV.shift.to_i
    when ?w
      width = ARGV.shift.to_i
    when ?a
      do_ascii = false
    else
      raise ArgumentError, "invalid Option #{opt}"
    end
  else
    file = File.new(opt)
  end
end

`ARGV.shift` is a common pattern. It removes the first item from
`ARGV` and returns it. Doing the assignment and while-loop test in one
motion with `ARGV.shift` is a simple way to look at all the
command-line arguments.

Adam's arguments to his hexdump program are expected to be a single
character preceded by a single dash. The question-mark notation (e.g.
`?n`) returns the integer ASCII value of the character immediately
following. Likewise, single-character array access (e.g. `opt[1]`)
_also_ returns an integer ASCII value. (Note: This also differs in
1.9.) So by checking the first two characters of an argument pulled
from `ARGV` against the dash character and various other options
implemented, Adam can replace the default values provided at the top.
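
A quick sketch of the 1.9 difference noted above. On Ruby 1.9 and later both forms return one-character strings rather than integers, so the comparison still works; `ord` and `getbyte` recover the integer when it is needed.

```ruby
opt = '-n'

literal   = ?n              # "n" on Ruby 1.9 and later (110 on 1.8)
indexed   = opt[1]          # "n" on Ruby 1.9 and later (110 on 1.8)
same      = (indexed == literal)
code      = ?n.ord          # 110
byte_code = opt.getbyte(1)  # 110

p literal, indexed, same, code, byte_code
```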

For a quick-and-dirty script, handling options in such a way is simple
and convenient. For more complex option-handling, you would do well to
make use of the [standard `optparse`][2] module, or [third-party
`main`][3].
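
For comparison, here is a sketch of how the same switches might look with `optparse`. The helper name `parse_options`, the option descriptions, and the banner text are assumptions made for this example; only the switch letters come from Adam's script.

```ruby
require 'optparse'

# Returns [options hash, remaining non-option arguments].
def parse_options(argv)
  opts = { width: 16, group: 2, skip: 0, length: nil, ascii: true }
  parser = OptionParser.new do |o|
    o.banner = 'Usage: hexdump.rb [options] [file]'
    o.on('-n LENGTH', Integer, 'Number of bytes to dump') { |v| opts[:length] = v }
    o.on('-s SKIP',   Integer, 'Bytes to skip first')     { |v| opts[:skip] = v }
    o.on('-g GROUP',  Integer, 'Bytes per group')         { |v| opts[:group] = v }
    o.on('-w WIDTH',  Integer, 'Bytes per line')          { |v| opts[:width] = v }
    o.on('-a', 'Suppress the ASCII column')               { opts[:ascii] = false }
  end
  files = parser.parse(argv)   # non-destructive; returns what is left over
  [opts, files]
end

opts, files = parse_options(%w[-n 32 -g 4 -a dump.bin])
p opts
p files   # ["dump.bin"]
```

Besides converting arguments to integers, `OptionParser` raises a descriptive error for unknown switches and generates a help listing from the descriptions.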

That's it for this week! Thanks for the submissions; I certainly
learned a few things myself. (I can't believe I didn't know about
`ARGF`...)

[1]: http://eigenclass.org/hiki.rb?Changes+in+Ruby+1.9
[2]: http://www.ruby-doc.org/stdlib/libdoc/optparse/rdoc/classes/OptionParser.html
[3]: http://groups.google.com/group/ruby-talk-google/browse_thread/thread/88bf54ad98a769ca

--
Matthew Moss <matthe...@gmail.com>

Michael Morin

Aug 8, 2008, 5:35:07 PM

Matthew Moss wrote:
> ## hexdump (#171)
>
> _Quiz idea provided by Robert Dober._
>
> This week's quiz should be quick and easy for experienced Rubyists,
> and a good lesson for beginners. Your task this week is to write a
> utility that outputs a hex dump of the input.

I did something a little different: I made a module that can be used to
extend IO objects. This means you can extend any File or socket objects
to become hex writers. Since I don't think you can "un-extend" an
object, it would probably be best if you dup the IO object if you need
to switch between hex and normal output.

#!/usr/bin/env ruby
# UziMonkey <uzim...@gmail.com>

module HexWriter
def self.extend_object(o)
class << o
alias_method :old_write, :write
end

super
end

def write(s)
s.each_byte do|b|
if @bytes % 16 == 0 and @address != 0
end_line
new_line
end

write_byte b
end
end

def new_file
@address = 0
new_line
end

def end_line
old_write "   " * (16 - @bytes)
old_write " #{@ascii}\n"
end

def new_line
@bytes = 0
@address ||= 0
@ascii = ""

old_write "%08x" % @address
end

def write_byte(b)
old_write " %02x" % b

@ascii << ((b.chr =~ /[[:print:]]/).nil? ? '.' : b.chr)

@bytes += 1
@address += 1
end
end

hex = STDOUT.dup.extend HexWriter
ARGV.each do|f|
puts "File: #{f}"
hex.new_file
hex.write File.read(f)
hex.end_line
end
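
The idea of extending a single IO object can also be sketched without `alias_method`, by calling `super` from the module. This is a different technique from the one in the script above, shown only for comparison. The module name `HexBytes` is hypothetical, and `StringIO` stands in for a real file or socket so the example can run anywhere.

```ruby
require 'stringio'

# Hypothetical module: writes each byte as two hex digits.
module HexBytes
  def write(s)
    # super reaches the object's original write, because extend places the
    # module ahead of the object's class in method lookup.
    super(s.each_byte.map { |b| '%02x' % b }.join(' '))
  end
end

hex = StringIO.new.extend(HexBytes)
hex.write("Hi\n")
puts hex.string       # 48 69 0a

plain = StringIO.new
plain.write("Hi\n")   # objects that were not extended are unaffected
```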

--
Michael Morin
Guide to Ruby
http://ruby.about.com/
Become an About.com Guide: beaguide.about.com
About.com is part of the New York Times Company