[QUIZ] Quoted Printable (#23)

Ruby Quiz

Mar 11, 2005, 8:57:10 AM

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The quoted printable encoding is used primarily in email, though it has
recently seen some use in XML areas as well. The encoding is simple to
translate to and from.

This week's quiz is to build a filter that handles quoted printable translation.

Your script should be a standard Unix filter, reading from files listed on the
command-line or STDIN and writing to STDOUT. In normal operation, the script
should encode all text read in the quoted printable format. However, your
script should also support a -d command-line option and when present, text
should be decoded from quoted printable instead. Finally, your script should
understand a -x command-line option and when given, it should encode <, > and &
for use with XML.

Here are the rules we will use, from the quoted printable format:

1. Bytes with ASCII values from 33 (exclamation point) through 60 (less
than) and values from 62 (greater than) through 126 (tilde) should be
passed through the encoding process unchanged. Note that the -x switch
modifies this rule slightly, as stated above.

2. Other bytes are to be encoded as an equals sign (=) followed by two
hexadecimal digits. For example, when -x is active less than (<) will
become =3C. Use only capital letters for hex digits.

3. The exceptions are spaces and tabs. They should remain unencoded as
long as any non-whitespace character follows them on the line. Spaces
and tabs at the end of a line must be encoded per rule 2 above.

4. Native line endings should be translated to carriage return-line feed
pairs.

5. Quoted printable lines are limited to 76 characters of length (not
counting the line ending pair). Longer lines must be divided up. Any
line endings added by the encoding process should be preceded by an
equals sign, so the decoder will know to remove them. The equals sign
must be the last character on the line, followed immediately by the line
end pair. Such an equals sign does count as a non-whitespace character
for rule 3, allowing preceding spaces and tabs to remain unencoded.
The equals sign must fit inside the 76 character limit.

To decode, just reverse the process.
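
As a rough illustration of rules 1 through 3 (not part of the original quiz), here is a minimal Ruby sketch that encodes a single line whose line ending has already been removed. The method name qp_encode_line and its xml: keyword are made up for this example, and rules 4 and 5 (line endings and the 76 character limit) are deliberately left out:

```ruby
# Hypothetical helper, not from the quiz: encodes one line per rules 1-3.
# The line ending must already be stripped; wrapping (rule 5) is not handled.
def qp_encode_line(line, xml: false)
  extra = xml ? "<>&" : ""
  # Rules 1 and 2: escape "=", the XML characters if requested, and every
  # byte outside tab, space, and printable ASCII (33..126).
  encoded = line.b.gsub(/[^\t -~]|[=#{extra}]/) { |ch| format("=%02X", ch.ord) }
  # Rule 3: spaces and tabs at the very end of the line must be escaped too.
  encoded.sub(/[\t ]+\z/) { |ws| ws.bytes.map { |b| format("=%02X", b) }.join }
end

puts qp_encode_line("1 + 1 = 2")        # 1 + 1 =3D 2
puts qp_encode_line("trailing tab\t")   # trailing tab=09
puts qp_encode_line("<b>", xml: true)   # =3Cb=3E
```

Working on bytes (String#b) rather than characters keeps multi-byte input from being treated as a single unit, which is what rule 2 asks for.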

Glenn Parker

Mar 13, 2005, 1:57:53 PM

Note: I assumed it would be cheating to use the builtin quoted printable
facilities.

I found it somewhat frustrating that String#each_byte does not return
any useful value (see encode_str).

I found it a bit more frustrating that String#chomp! is greedier than
you might expect, discarding all sorts of potential line endings,
instead of limiting itself to $/.

I would also suggest adding support for GetoptLong#[] to query
options directly, instead of requiring a full iteration.

#!/usr/bin/env ruby -w

require 'getoptlong'

MaxLength = 76

def main
  opts = GetoptLong.new(
    [ "-d", GetoptLong::NO_ARGUMENT ],
    [ "-x", GetoptLong::NO_ARGUMENT ]
  )
  $opt_decode = false
  $opt_xml = false
  opts.each do |opt, arg|
    case opt
    when "-d" then $opt_decode = true
    when "-x" then $opt_xml = true
    end
  end

  if $opt_decode
    decode_input
  else
    encode_input
  end
end

def encode_input
  STDOUT.binmode # We need to control the line-endings.
  while (line = gets) do
    # Note: String#chomp! swallows more than just $/.
    line.sub!(/#{$/}$/o, "")
    # Encode the entire line.
    line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
    line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
    line.sub!(/\s*$/) { |str| encode_str(str) }
    # Split the line up as needed.
    while line.length > MaxLength
      split = line.index("=", MaxLength - 4) - 1
      split = (MaxLength - 2) if split.nil? or (split > MaxLength - 2)
      print line[0..split], "=\r\n"
      line = line[(split + 1)..-1]
    end
    print line, "\r\n"
  end
end

def encode_str(str)
  encoded = ""
  str.each_byte { |c| encoded << "=%02X" % c }
  encoded
end

def decode_input
  while (line = gets) do
    line.chomp!
    line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
    if line[-1] == ?=
      print line[0..-2]
    else
      print line, $/
    end
  end
end

main

--
Glenn Parker | glenn.parker-AT-comcast.net | <http://www.tetrafoil.com/>

James Edward Gray II

Mar 13, 2005, 2:24:38 PM

On Mar 13, 2005, at 12:57 PM, Glenn Parker wrote:

> Note: I assumed it would be cheating to use the builtin quoted
> printable facilities.

I must sheepishly admit that I was unaware of Ruby's converter when
I made the quiz. It was pointed out to me in a private email after I
posted it. The converter isn't a complete solution to the quiz, but it
gets you very close.

Is it cheating to use Ruby features? Never. Feel free, then poke a
little fun at the quiz editor because you're smarter than he is. All
part of the fun.

Sorry for the oversight.

James Edward Gray II

Dave Burt

Mar 14, 2005, 10:41:06 AM

Hi,

Testing. I found building a test suite before doing the code really helpful on
this one, to get my head around the intricacies of the encoding. Actually
thinking through the edge cases and working out expected results was necessary
for me to develop this solution.

Now, of course, this would have been a lot easier if I'd just been able to find
the "builtin quoted printable facilities." What builtin quoted printable
facilities?

Anyway, here is my result:
http://www.dave.burt.id.au/ruby/quoted-printable.rb

And the tester:
http://www.dave.burt.id.au/ruby/test-quoted-printable.rb

The testing program generates test methods and test data dynamically.

The public interface to my solution looks like this:

module QuotedPrintable

WHITESPACE = [?\t, ?\ ]
WHITESPACE_REGEXP = /[\t ]/
WHITESPACE_ESCAPED_REGEXP = /=09|=20/

# bytes that do not need to be escaped
PRINTABLES = ((?!..?~).to_a + WHITESPACE) - [?=]

MAX_LINE_WIDTH = 76

NEWLINE = "\r\n"

# additional bytes to escape for safety in an EBCDIC document
EBCDIC_EXCEPTIONS = %w' ! " # $ @ [ \ ] ^ ` { | } ~ '
EBCDIC_PRINTABLES = PRINTABLES - EBCDIC_EXCEPTIONS
# additional bytes to escape for safety in an XML document
XML_EXCEPTIONS = %w' < > & '
XML_PRINTABLES = PRINTABLES - XML_EXCEPTIONS

# Encode self to the quoted-printable transfer encoding
def to_quoted_printable(printables = QuotedPrintable::PRINTABLES)

# Decode self from the quoted-printable transfer encoding
def from_quoted_printable

# Functions that do quoted-printable encoding and decoding
class << self

# Return the quoted-printable escaped representation of the given byte
# (byte must be a Fixnum between 0 and 255)
def encode_byte(byte)

# Return the byte corresponding to the given quoted-printable escape
# sequence as a String. If it's not valid, return nil.
def decode_sequence(escape_sequence)

# Return the given string encoded as quoted-printable, including the
# canonical \r\n line terminators.
def encode_string(string, printables = PRINTABLES)

# Consider the given string quoted-printable encoded, and decode it,
# including translating line terminators to the native default.
def decode_string(string)

# Add quoted-printable conversions to String
class String
include QuotedPrintable # to_quoted_printable, from_quoted_printable
end

Cheers,
Dave

James Edward Gray II

Mar 14, 2005, 2:06:10 PM

On Mar 14, 2005, at 9:41 AM, Dave Burt wrote:

> Now, of course, this would have been a lot easier if I'd just been
> able to find the "builtin quoted printable facilities." What builtin
> quoted printable facilities?

Look up the "M" format for Array.pack.

James Edward Gray II

Dave Burt

Mar 14, 2005, 11:44:36 PM

>> What builtin quoted printable facilities?
>
> Look up the "M" format for Array.pack.

So here's the cheat solution:

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end
  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

(Just add my original if __FILE__ block to make it almost quiz-compatible)

And here's how it fares against my test suite:

Loaded suite TC_QuotedPrintable
Started
............FF.FFFFFFF..
Finished in 0.39 seconds.

So it's 10 times the speed of my original one (against random binary data), but
chops lines too early, ends up with 73- instead of 76-character lines. Of
course, this one won't do XML.

Interestingly, if I use a gsub! instead of a loop with sub!s in my soft_break!
method, I get a 5x speedup... and fail the same tests.

Cheers,
Dave

Patrick Hurley

Mar 15, 2005, 1:28:55 AM

I am a ruby newbie, so be kind. I wrote the code myself, but blatantly
stole Dave Burt's test cases - thank you. I also found one test case
that breaks my code (and Dave's) that I am not sure what the correct
answer is, but I know mine is wrong:

Consider:
"===
\n"
which will cause a new space to be found at the end of a string - is
it the case that all space at the end of the line is encoded
(increasing size rather needlessly), but simplifying this case? Either
way, I am too tired and have other important stuff to do so I will let
it go.

Please feel free to let me know where I did not do things the "Ruby
way" as I am primarily a C++ and Perl guy, but very interested in
getting better at Ruby.

Thanks
pth

#
# == Synopsis
#
# Ruby Quiz #23
#
# The quoted printable encoding is used primarily in email, though it has
# recently seen some use in XML areas as well. The encoding is simple to
# translate to and from.
#
# This week's quiz is to build a filter that handles quoted printable
# translation.
#
# Your script should be a standard Unix filter, reading from files listed on
# the command-line or STDIN and writing to STDOUT. In normal operation, the
# script should encode all text read in the quoted printable format. However,
# your script should also support a -d command-line option and when present,
# text should be decoded from quoted printable instead. Finally, your script
# should understand a -x command-line option and when given, it should encode
# <, > and & for use with XML.
#
# == Usage
#
# ruby quiz23.rb [-d | --decode ] [ -x | --xml ]
#
# == Author
# Patrick Hurley, Cornell-Mayo Assoc
#
# == Copyright
# Copyright (c) 2005 Cornell-Mayo Assoc
# Licensed under the same terms as Ruby.
#

require 'optparse'
require 'rdoc/usage'

module QuotedPrintable
MAX_LINE_PRINTABLE_ENCODE_LENGTH = 76

def from_qp
result = self.gsub(/=\r\n/, "")
result.gsub!(/\r\n/m, $/)
result.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
result
end

def to_qp(handle_xml = false)
char_mask = if (handle_xml)
/[^!-%,-;=?-~\s]/
else
/[^!-<>-~\s]/
end

# encode the non-space characters
result = self.gsub(char_mask) { |ch| "=%02X" % ch[0] }
# encode the last space character at end of line
result.gsub!(/(\s)(?=#{$/})/o) { |ch| "=%02X" % ch[0] }

lines = result.scan(/(?:(?:[^\n]{74}(?==[\dA-F]{2}))|(?:[^\n]{0,76}(?=\n))|(?:[^\n]{1,75}(?!\n{2})))(?:#{$/}*)/);
lines.join("=\n").gsub(/#{$/}/m, "\r\n")
end

def QuotedPrintable.encode
STDOUT.binmode

while (line = gets) do

print line.to_qp
end
end

def QuotedPrintable.decode
STDIN.binmode

while (line = gets) do

# I am a ruby newbie, and I could
# not get gets to get the \r\n pairs
# no matter how I set $/ - any pointers?
line = line.chomp + "\r\n"
print line.from_qp
end
end

end

class String
include QuotedPrintable
end

if __FILE__ == $0

opts = OptionParser.new
opts.on("-h", "--help") { RDoc::usage; }
opts.on("-d", "--decode") { $decode = true }
opts.on("-x", "--xml") { $handle_xml = true }

opts.parse!(ARGV) rescue RDoc::usage('usage')

if ($decode)
QuotedPrintable.decode()
else
QuotedPrintable.encode()
end
end

Dave Burt

Mar 15, 2005, 2:50:01 AM

"Patrick Hurley" <phu...@gmail.com> submitted:

>I am a ruby newbie, so be kind. I wrote the code myself, but blatantly
> stole Dave Burt's test cases - thank you. I also found one test case

Quiz tests are for sharing - I think that's established. In any case, you're
welcome to them.

> that breaks my code (and Dave's) that I am not sure what the correct
> answer is, but I know mine is wrong:
>
> Consider:
> "===
> \n"
> which will cause a new space to be found at the end of a string - is
> it the case that all space at the end of the line is encoded
> (increasing size rather needlessly), but simplifying this case? Either
> way, I am too tired and have other important stuff to do so I will let
> it go.

I see no problem. I've added that test case, and both our solutions
pass.

http://www.dave.burt.id.au/ruby/test-quoted-printable.rb

> Please feel free to let me know where I did not do things the "Ruby
> way" as I am primarily a C++ and Perl guy, but very interested in
> getting better at Ruby.

> ...

> /[^!-<>-~\s]/

Bug: "\f" doesn't get escaped (it's part of /\s/). Probably "\r" as well;
that's harder to test on windows.

I see no other problems. Your optparse is better (i.e. shorter) than mine
:). Your

(/(?:(?:[^\n]{74}(?==[\dA-F]{2}))|(?:[^\n]{0,76}(?=\n))|(?:[^\n]{1,75}(?!\n{2})))(?:#{$/}*)/)

makes you look like a Perl 5 junkie, though. Also, you use global
variables - we rubyists shun these: use locals.

Cheers,
Dave

James Edward Gray II

Mar 15, 2005, 10:15:10 AM

(from Dave's solution)

if __FILE__ == $0
require 'optparse'

# Look, James, I'm opt-parsing! :)
...

I'm so proud! :D

James Edward Gray II

Mar 15, 2005, 10:16:22 AM

(from Patrick's solution--for those who missed it)

while (line = gets) do
# I am a ruby newbie, and I could
# not get gets to get the \r\n pairs
# no matter how I set $/ - any pointers?

...

James Edward Gray II

Patrick Hurley

Mar 15, 2005, 11:42:47 AM

Thanks for the kind response.

When I said the test case failed, I meant that the actual output from
encoding the line has trailing space at the end of a line. We both
escape trailing spaces before we break lines - if the line breaking then
leaves a space at the end of a line, is that not an issue? (the
continuation = might mean that it is not).

Yup there was an issue with masks I fixed that and removed the globals
(my perl just throwing in a $ when in doubt :-) There was also a bug
in the command line driver, which I have fixed. The patched code
follows

> (/(?:(?:[^\n]{74}(?==[\dA-F]{2}))|(?:[^\n]{0,76}(?=\n))|(?:[^\n]{1,75}(?!\n{2})))(?:#{$/}*)/)
> makes you look like a Perl 5 junkie,

I did this to allow the use of a gsub, which is much faster than the
looping solution. The lookaheads and general ugliness handle the
special cases. I probably should use /x and space it out and comment,
but when I am in the regexp zone, I know what I am typing <grin>.

require 'optparse'
require 'rdoc/usage'

module QuotedPrintable
# ...

def to_qp(handle_xml = false)
char_mask = if (handle_xml)
/[\x00-\x08\x0b-\x1f\x7f-\xff=<>&]/
else
/[\x00-\x08\x0b-\x1f\x7f-\xff=]/
end

# encode the non-space characters
result = self.gsub(char_mask) { |ch| "=%02X" % ch[0] }
# encode the last space character at end of line
result.gsub!(/(\s)(?=#{$/})/o) { |ch| "=%02X" % ch[0] }

lines = result.scan(/(?:(?:[^\n]{74}(?==[\dA-F]{2}))|(?:[^\n]{0,76}(?=\n))|(?:[^\n]{1,75}(?!\n{2})))(?:#{$/}*)/);
lines.join("=\n").gsub(/#{$/}/m, "\r\n")
end

def QuotedPrintable.encode(handle_xml=false)

STDOUT.binmode
while (line = gets) do

print line.to_qp(handle_xml)
end
end

def QuotedPrintable.decode
STDIN.binmode
while (line = gets) do
# I am a ruby newbie, and I could
# not get gets to get the \r\n pairs
# no matter how I set $/ - any pointers?
line = line.chomp + "\r\n"
print line.from_qp
end
end

end

class String
include QuotedPrintable
end

if __FILE__ == $0

decode = false
handle_xml = false

opts = OptionParser.new
opts.on("-h", "--help") { RDoc::usage; }

opts.on("-d", "--decode") { decode = true }
opts.on("-x", "--xml") { handle_xml = true }

opts.parse!(ARGV) rescue RDoc::usage('usage')

if (decode)
QuotedPrintable.decode()
else
QuotedPrintable.encode(handle_xml)
end
end

Dave Burt

Mar 15, 2005, 3:40:15 PM

"Patrick Hurley" <phu...@gmail.com> continued:

> Thanks for the kind response.
>
> When I said the test case failed, I meant that the actual output from
> encoding the line has trailing space at the end of a line. We both
> escape trailing spaces before we break lines - if the line breaking then
> leaves a space at the end of a line, is that not an issue? (the
> continuation = might mean that it is not).

From the RFC (2045, section 6.7):
Any TAB (HT) or SPACE characters
on an encoded line MUST thus be followed on that line
by a printable character. In particular, an "=" at the
end of an encoded line, indicating a soft line break
(see rule #5) may follow one or more TAB (HT) or SPACE
characters.

So it's all good - unescaped tabs and spaces are fine as long as it's got a
printable non-whitespace character after it, and "=" is fine for that.

... Therefore, when decoding a Quoted-Printable
body, any trailing white space on a line must be
deleted, as it will necessarily have been added by
intermediate transport agents.

There's something I think we've all forgotten to do -- strip trailing unescaped
whitespace. I've added the following test:

def test_decode_strip_trailing_space
assert_equal(
"The following whitespace must be ignored: \r\n".from_quoted_printable,
"The following whitespace must be ignored:\n")
end

And the following line to decode_string:
result.gsub!(/[\t ]+(?=\r\n|$)/, '')

>
> Yup there was an issue with masks I fixed that and removed the globals
> (my perl just throwing in a $ when in doubt :-) There was also a bug
> in the command line driver, which I have fixed. The patched code
> follows
>
>> (/(?:(?:[^\n]{74}(?==[\dA-F]{2}))|(?:[^\n]{0,76}(?=\n))|(?:[^\n]{1,75}(?!\n{2})))(?:#{$/}*)/)
>> makes you look like a Perl 5 junkie,
>
> I did this to allow the use of a gsub, which is much faster than the
> looping solution. The look aheads and general uglyness handle the
> special cases. I probably should use /x and space it out and comment,
> but when I am in the regexp zone, I know what I am typing <grin>.

Write-only? No, I'm not in a fantastic position to comment, mine is not that
much shorter.

> ...

> def QuotedPrintable.decode
> STDIN.binmode
> while (line = gets) do
> # I am a ruby newbie, and I could
> # not get gets to get the \r\n pairs
> # no matter how I set $/ - any pointers?

| C:\WINDOWS>ruby
| STDIN.binmode
| gets.each_byte do |b| puts b end
| ^Z
|
| 13
| 10
|
Seems to work for me - that output says I wouldn't need the following line

> line = line.chomp + "\r\n"

Cheers,
Dave

Patrick Hurley

Mar 15, 2005, 5:14:54 PM

Thanks for the update on the RFC, guess I should have just read that myself.

Well I don't want to "litter" the news group, but I hate to have
incorrect code out there with my name on it. If you want, follow the
link (http://hurleyhome.com/~patrick/quiz23.rb) to see the fixed code.
Also of note is the now commented (just for Dave) regexp for parsing
long lines, for the curious:

lines = result.scan(/
  # Match one of the three following cases
  (?:
    # This will match the special case of an escape that would generally have
    # split across line boundaries
    (?: [^\n]{74}(?==[\dA-F]{2}) ) |

    # This will match the case of a line of text that does not need to split
    (?: [^\n]{0,76}(?=\n) ) |

    # This will match the case of a line of text that needs to be
    # split without special adjustment
    (?: [^\n]{1,75}(?!\n{2}) )
  )

  # Match zero or more newlines
  (?-x:#{$/}*)/x);

pth

Dave Burt

Mar 16, 2005, 6:34:39 AM

"Florian Gross" <fl...@ccan.de> wrote:
> Matthew Moss wrote:
>
>> Here is my partial solution for the Quoted Printable quiz. I'm still
>> pretty new to Ruby, so it took me a while to get what you see here. I
>> think the only thing I didn't get to adding was line length checks.
>
> And here's mine as well. Sorry for being late -- I coded this up on
> Friday and forgot about it until today.
>
> It ought to handle everything correctly (including proper wrapping of
> lines that end in encoded characters) and it does most of the work with
> a few simple regular expressions.
>

Hi Florian,

As always, I'm amazed by your concise code. But your solution seems to be
failing a bunch of my tests (and not just by chopping lines early, which is
allowed):

encoding:
- escapes mid-line whitespace
- escapes '~'
- allows too-long lines (my tests saw up to 104 characters on a line)
- allows unescaped whitespace on the end of a line (as long as it's preceded
by escaped whitespace)
decoding:
- doesn't ignore trailing literal whitespace

Cheers,
Dave

Florian Gross

Mar 16, 2005, 12:32:32 PM

Dave Burt wrote:

> Hi Florian,

Moin Dave.

> As always, I'm amazed by your concise code. But your solution seems to be
> failing a bunch of my tests (and not just by chopping lines early, which is
> allowed):

Thanks, I'll have a look.

> encoding:
> - escapes mid-line whitespace

I'm not sure I get this. Am I incorrectly escaping mid-line whitespace
or am I incorrectly not escaping it? And what is mid-line whitespace?

> - escapes '~'

Heh, classic off-by-one. Easily fixed by changing the Regexp. See source
below.

> - allows too-long lines (my tests saw up to 104 characters on a line)

Any hints on when this is happening? I can't see why and when this would
happen.

> - allows unescaped whitespace on the end of a line (as long as it's preceded
> by escaped whitespace)

Fixed. See code below.

> decoding:
> - doesn't ignore trailing literal whitespace

Well, I don't think that's much of an issue as I'm not sure when
trailing whitespace would be appended to lines, but I've fixed it anyway.

Here's the new code:

> def encode(text, also_encode = "")
> text.gsub(/[\t ](?:[\v\t ]|$)|[=\x00-\x08\x0B-\x1F\x7F-\xFF#{also_encode}]/) do |char|
> char[0 ... -1] + "=%02X" % char[-1]
> end.gsub(/^(.{75})(.{2,})$/) do |match|
> base, continuation = $1, $2
> continuation = base.slice!(/=(.{0,2})\Z/).to_s + continuation
> base + "=\n" + continuation
> end.gsub("\n", "\r\n")
> end
>
> def decode(text, allow_lowercase = false)
> encoded_re = Regexp.new("=([0-9A-F]{2})", allow_lowercase ? "i" : "")
> text.gsub("\r\n", "\n").gsub("=\n", "").gsub(encoded_re) do
> $1.to_i(16).chr
> end
> end

I'll repost the full source when I've sorted out that other problem as well.

Dave Burt

Mar 16, 2005, 7:14:59 PM

"Florian Gross" <fl...@ccan.de> responded

> Dave Burt wrote:
>
>> Hi Florian,
>
> Moin Dave.
>
>> As always, I'm amazed by your concise code. But your solution seems to be
>> failing a bunch of my tests (and not just by chopping lines early, which
>> is allowed):
>
> Thanks, I'll have a look.
>
>> encoding:
>> - escapes mid-line whitespace
>
> I'm not sure I get this. Am I incorrectly escaping mid-line whitespace or
> am I incorrectly not escaping it? And what is mid-line whitespace?

Tabs and spaces that are followed by something printable on the same line
should not be escaped; see the following:

5) Failure:
test_encode_12(TC_QuotedPrintable) [(eval):2]:
<"=3D=3D=3D
=\r\n =20\r\n"> expected but was
<"=3D=3D=3D=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
=\r\n=20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20 =20
=20 =20 =20 =20 =20 =20 =20 =20\r\n">.

>> - escapes '~'
>
> Heh, classic off-by-one. Easily fixed by changing the Regexp. See source
> below.

Too easy :)

>> - allows too-long lines (my tests saw up to 104 characters on a line)
>
> Any hints on when this is happening? I can't see why and when this would
> happen.

test_encode_12 also demonstrates this. I fixed it by changing
/[\t ](?:[\v\t ]|$)../ to /[\t ]$../.
This (obviously) fixes the mid-line whitespace as well.

>> - allows unescaped whitespace on the end of a line (as long as it's
>> preceded by escaped whitespace)
>
> Fixed. See code below.
>
>> decoding:
>> - doesn't ignore trailing literal whitespace
>
> Well, I don't think that's much of an issue as I'm not sure when trailing
> whitespace would be appended to lines, but I've fixed it anyway.

It's not mentioned in the quiz question, although you can infer that it is
illegal from the quiz question. The idea is that if there is trailing
whitespace, it has been added in transit and should be removed (it's not
actually part of the data that was encoded).

Also, this, on line 10: "char[0 ... -1] + ...", seems redundant - with char
as a one-character match, it's an empty string.

> Here's the new code:
>

>> <snip>

>
> I'll repost the full source when I've sorted out that other problem as
> well.

Cheers,
Dave

Ruby Quiz

Mar 17, 2005, 9:03:21 AM

It was pointed out, first in private email and later on Ruby Talk, that your
quiz editor isn't quite up on all of Ruby's features. Support for the Quoted
Printable encoding is already in the language. You can access this with the "M"
format specification of Array.pack() and String.unpack(). Dave Burt posted a
modification to his solution using these features. Here's that class:

class String
  def to_quoted_printable(*args)
    [self].pack("M").gsub(/\n/, "\r\n")
  end

  def from_quoted_printable
    self.gsub(/\r\n/, "\n").unpack("M").first
  end
end

Ruby's Quoted Printable encoder uses standard Unix line endings, which is why
you see the gsub() translations to the specified carriage-return line-feed pairs
above. That doesn't handle the XML aspect of the quiz, but you can add that
with a few more calls to gsub() at both ends.
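
One way that XML step could look, as a hedged sketch rather than anything posted in the thread: the method names to_qp_xml and from_qp_xml are invented here. Escaping after pack("M") lengthens a line by two characters per replaced character, so lines containing several of <, > or & may end up longer than 76 characters. Decoding needs nothing extra in this sketch, since unpack("M") already turns =3C, =3E and =26 back into the original characters.

```ruby
# Hypothetical helpers built on Ruby's "M" (quoted-printable) pack format.
def to_qp_xml(text)
  [text].pack("M")
        .gsub("<", "=3C").gsub(">", "=3E").gsub("&", "=26")
        .gsub("\n", "\r\n")
end

def from_qp_xml(text)
  text.gsub("\r\n", "\n").unpack("M").first
end

encoded = to_qp_xml("if a < b && b > c\n")
puts encoded.inspect
puts from_qp_xml(encoded).inspect
```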

Ignoring my knowledge gap, we still have some interesting solutions to discuss.

Let's start with a solution. Here's Glenn Parker's code:

#!/usr/bin/env ruby -w

require 'getoptlong'

MaxLength = 76

def main
  opts = GetoptLong.new(
    [ "-d", GetoptLong::NO_ARGUMENT ],
    [ "-x", GetoptLong::NO_ARGUMENT ]
  )
  $opt_decode = false
  $opt_xml = false
  opts.each do |opt, arg|
    case opt
    when "-d" then $opt_decode = true
    when "-x" then $opt_xml = true
    end
  end

  if $opt_decode
    decode_input
  else
    encode_input
  end
end

def encode_input
  STDOUT.binmode # We need to control the line-endings.
  while (line = gets) do
    # Note: String#chomp! swallows more than just $/.
    line.sub!(/#{$/}$/o, "")
    # Encode the entire line.
    line.gsub!(/[^\t -<>-~]+/) { |str| encode_str(str) }
    line.gsub!(/[&<>]+/) { |str| encode_str(str) } if $opt_xml
    line.sub!(/\s*$/) { |str| encode_str(str) }
    # Split the line up as needed.
    while line.length > MaxLength
      ### original code ###
      # split = line.index("=", MaxLength - 4) - 1
      # split = (MaxLength - 2) if split.nil? or (split > MaxLength - 2)
      ### BUGFIX: index() can return nil, so don't subtract -JEG2 ###
      split = line.index("=", MaxLength - 4)
      if split.nil? or ( split - 1 > MaxLength - 2 )
        split = MaxLength - 2
      else
        split -= 1 # stop just before the "=" so the escape stays whole
      end
      ### END BUGFIX ###
      print line[0..split], "=\r\n"
      line = line[(split + 1)..-1]
    end
    print line, "\r\n"
  end
end

def encode_str(str)
  encoded = ""
  str.each_byte { |c| encoded << "=%02X" % c }
  encoded
end

def decode_input
  while (line = gets) do
    line.chomp!
    line.gsub!(/=([\dA-F]{2})/) { $1.hex.chr }
    if line[-1] == ?=
      print line[0..-2]
    else
      print line, $/
    end
  end
end

main

Let me talk a little about that shebang line. It doesn't work on my system:

$ chmod +x quoted_printable.rb
$ ./quoted_printable.rb
env: ruby -w: No such file or directory

That's one of the minuses of using the "env ruby" trick. If you don't want to
hardcode the path and still want to enable warnings inside the script, the
following works:

#!/usr/bin/env ruby
$VERBOSE = true # enable warnings

That doesn't have anything to do with the quiz, of course, and you could still
run Glenn's code with "ruby quoted_printable.rb", but having been bitten by that
same problem myself, I wanted to mention it.

Getting back to the code, Glenn pulls in getoptlong, defines a constant to hold
the line length, and then defines a method called main(). main() just parses
command line options (setting the globals $opt_decode and $opt_xml as needed),
then hands off work to either decode_input() or encode_input().

For encoding, encode_input() handles most of the work. It starts by shutting
off line ending translation with a call to binmode(). I believe that's only
needed when your code is running on Windows, but it's still a great habit to
form anytime you're going to muck with raw line endings.

From there, encode_input() loops over STDIN with a line-by-line read. Note that
it performs its own chomp() with a call to sub!(). The author explains why in
his submission email:

I found it a bit more frustrating that String#chomp! is a greedier than
you might expect, discarding all sorts of potential line endings,
instead of limiting itself to $/.
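
A small illustration of the difference, assuming the default record separator ("\n"). The sub() calls here anchor with \z rather than the $ used in Glenn's code, purely to keep the example unambiguous:

```ruby
# With the default separator, chomp also removes "\r\n" and a lone "\r".
p "text\r\n".chomp            # "text"
p "text\r".chomp              # "text"

# Removing only an exact trailing "\n" leaves carriage returns alone.
p "text\r\n".sub(/\n\z/, "")  # "text\r"
p "text\r".sub(/\n\z/, "")    # "text\r"
```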

The next three substitutions encode the needed characters on the line. They're
just a combination of simple Regexps and calls to encode_str(). If you glance
down at encode_str(), you can see that it's a very simple byte to hex
translator.

The final while loop in encode_input() breaks up long lines. It looks more
complex above, because I added a bug fix to it. When running tests on the
code, Glenn's script crashed on me. The issue was that String.index() can
return nil and you can't subtract 1 from nil. I just moved the "- 1" down a
line to work around this.

The reason index() is called looking for an "=" is to prevent breaking up an
already encoded character. If there aren't any encoded characters, the line is
split at MaxLength.
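
The same idea can be written without index(), as a rough sketch. This is not Glenn's code: soft_wrap and MAX_LEN are names invented for the example, and it assumes the line has already been encoded, so every "=" in it starts an "=XX" escape.

```ruby
MAX_LEN = 76 # quiz limit, not counting the line ending

# Hypothetical helper: splits one already-encoded line into soft-wrapped
# pieces, each at most MAX_LEN characters including its trailing "=".
def soft_wrap(encoded, max = MAX_LEN)
  pieces = []
  while encoded.length > max
    cut = max - 1 # leave room for the trailing "="
    # Back up if the cut would separate an "=" from either of its hex digits.
    if encoded[cut - 1] == "="
      cut -= 1
    elsif encoded[cut - 2] == "="
      cut -= 2
    end
    pieces << encoded[0, cut] + "="
    encoded = encoded[cut..-1]
  end
  pieces << encoded
end

puts soft_wrap("x" * 74 + "=3D" + "y" * 10)
```

Each returned piece still needs its "\r\n" line ending added by the caller.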

This method of breaking up the lines can break lines mid-word. You might want
to consider trying to break them at word boundaries though. A big advantage of
Quoted Printable is that it's really a Base64-like encoding that keeps plain
text pretty readable. That's why I suggested its use to embed data in XML. To
that end, breaking lines on word boundaries just enhances that characteristic.
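
A sketch of how a word-boundary break point might be chosen. None of the submissions did this; word_boundary_cut and its limit parameter are hypothetical, and the input is assumed to be an already-encoded line longer than the limit.

```ruby
# Hypothetical helper: returns how many characters of an already-encoded
# line to put before the soft break (the caller appends the "=").
def word_boundary_cut(encoded, limit = 75)
  window = encoded[0, limit]
  space = window.rindex(/[ \t]/)
  return space + 1 if space # cut just after the last space or tab
  # No whitespace in the window: hard cut, but keep any "=XX" escape whole.
  if window[-1] == "="
    window.length - 1
  elsif window[-2] == "="
    window.length - 2
  else
    window.length
  end
end

line = "The quick brown fox jumps over the lazy dog, again and again, until the line is long."
cut = word_boundary_cut(line)
puts line[0, cut] + "="
puts line[cut..-1]
```

Leaving the space at the end of the first piece is allowed by rule 5, because the "=" that follows counts as a non-whitespace character.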

Getting back to the code one last time, decode_input() is even easier to follow.
It too is a line-by-line read, with a gsub!() used to unencode and a basic if
statement used to unwrap lines (by dropping the = and not printing a line
ending).
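
For comparison, the same decoding steps can be applied to a whole string at once. This is a sketch under assumptions, not Glenn's method: qp_decode is an invented name, it translates line endings to "\n" rather than $/, and it also drops trailing literal whitespace, as Dave Burt pointed out the RFC requires.

```ruby
# Hypothetical helper: decodes a whole quoted-printable string at once.
def qp_decode(text)
  text.b
      .gsub("\r\n", "\n")
      .gsub(/[\t ]+$/, "")                      # whitespace added in transit
      .gsub("=\n", "")                          # soft line breaks
      .gsub(/=([0-9A-F]{2})/) { $1.hex.chr }    # "=XX" back to one byte
end

p qp_decode("1 + 1 =3D 2 and a long line that was =\r\nwrapped\r\n")
```

The result is a binary (ASCII-8BIT) string, since decoded bytes are not guaranteed to form valid text in any particular encoding.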

The other solutions are all quite interesting and I do encourage everyone to
check them out. Most submissions modified String to add the conversions.
Matthew Moss also added foreach() style readers to IO. Dave Burt included a
nice set of test cases, used by himself and at least one other person. Good
stuff all around.

My thanks to all who endure my mental lapses, and to those who gently correct
me. I need all the help I can get.

Great news: We have a record four quizzes queued up right now, all of them
including some contribution from others! I'm so pleased. We'll start our run
tomorrow with a quiz for people who know when to Hold'em and when to fold 'em...