Coding challenge: diff two io.Readers

1,526 views
Skip to first unread message

Caleb Spare

unread,
Feb 3, 2015, 10:12:24 PM2/3/15
to golang-nuts
The challenge is to write an elegant implementation of this function:

// Diff compares the contents of two io.Readers.
// The return value of identical is true if and only if there are no errors
// in reading r1 and r2 (io.EOF excluded) and r1 and r2 are
// byte-for-byte identical.
func Diff(r1, r2 io.Reader) (identical bool, err error)

Here's a little harness to test it out:


Background: a few weeks ago I wrote this function as part of a very simple diff package (https://godoc.org/github.com/cespare/diff). My implementation is here:


I believe my version is correct, but I'm not too happy with it. In particular, I dislike the usage of io.ReadFull; if one Reader doesn't have a full buffer's worth of data available, it would be better to read from the other Reader in the meantime rather than waiting on the first one.

-Caleb

Dan Kortschak

unread,
Feb 3, 2015, 11:27:19 PM2/3/15
to Caleb Spare, golang-nuts
Cheapskate version:

http://play.golang.org/p/S2s3e0nj4s

Brian Fallik

unread,
Feb 4, 2015, 12:51:01 AM2/4/15
to Dan Kortschak, Caleb Spare, golang-nuts
Here's a slightly enhanced cheapskate version that operates on longer chunks: http://play.golang.org/p/d__M1HwJjv

brian


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dan Kortschak

unread,
Feb 4, 2015, 1:47:47 AM2/4/15
to Brian Fallik, Caleb Spare, golang-nuts
On Wed, 2015-02-04 at 00:50 -0500, Brian Fallik wrote:
> Here's a slightly enhanced cheapskate version that operates on longer
> chunks: http://play.golang.org/p/d__M1HwJjv
>
Using the default bufio.Reader size, but explicitly, and ensuring each
buffer is completely used on each Read:

http://play.golang.org/p/d7Mtkn0ByS

Still not formally correct since it is allowable for Read to return
without error not having completely filled the slice.

If that's a worry:

http://play.golang.org/p/oTBzjD0cdu

Dan Kortschak

unread,
Feb 4, 2015, 1:51:04 AM2/4/15
to Brian Fallik, Caleb Spare, golang-nuts

Dan Kortschak

unread,
Feb 4, 2015, 2:23:23 AM2/4/15
to Dan Kortschak, Brian Fallik, Caleb Spare, golang-nuts

Egon

unread,
Feb 4, 2015, 2:37:44 AM2/4/15
to golan...@googlegroups.com
http://play.golang.org/p/iuRyQchiK2

(Although, I'm sure there's a nicer way to write the handling of error cases...)

+ Egon

Péter Szilágyi

unread,
Feb 4, 2015, 4:48:46 AM2/4/15
to Egon, golang-nuts

--

roger peppe

unread,
Feb 4, 2015, 4:52:50 AM2/4/15
to Dan Kortschak, Brian Fallik, Caleb Spare, golang-nuts
On 4 February 2015 at 06:47, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
> On Wed, 2015-02-04 at 00:50 -0500, Brian Fallik wrote:
>> Here's a slightly enhanced cheapskate version that operates on longer
>> chunks: http://play.golang.org/p/d__M1HwJjv
>>
> Using the default bufio.Reader size, but explicitly, and ensuring each
> buffer is completely used on each Read:
>
> http://play.golang.org/p/d7Mtkn0ByS
>
> Still not formally correct since it is allowable for Read to return
> without error not having completely filled the slice.

"Not formally correct" is an understatement there - that code
will fail for many readers in practice. That's why we have things
like iotest.OneByteReader.

> If that's a worry:
>
> http://play.golang.org/p/oTBzjD0cdu

That seems almost identical in spirit to Caleb's original version.

It also has some problems (even in your later "fixed conditional version").
In particular, your readComplete function will go into an infinite loop
if it gets a short read (try setting the "size" constant to 3, for example).
Also it doesn't behave correctly when one reader returns EOF
along with the data but the other doesn't (Caleb's original version
also has that issue, I think)

Caleb Spare

unread,
Feb 4, 2015, 5:17:46 AM2/4/15
to roger peppe, Dan Kortschak, Brian Fallik, golang-nuts
Thanks for the suggestions so far, everyone! Thanks especially to Josh
and Roger for reminding me of testing/iotest -- it's great for testing
various reader/writer corner cases.

Roger:

> Also it doesn't behave correctly when one reader returns EOF
> along with the data but the other doesn't (Caleb's original version
> also has that issue, I think)

As far as I can tell, using io.ReadFull shields me from this problem.
See for example this expanded test where I used an
iotest.DataErrReader for one of the inputs. (strings.Reader seems to
return EOF after the data has been fully read, so I think that this is
the case you are talking about)

http://play.golang.org/p/1D8jMXyxkX

-Caleb

Dan Kortschak

unread,
Feb 4, 2015, 5:35:41 AM2/4/15
to roger peppe, Brian Fallik, Caleb Spare, golang-nuts
On Wed, 2015-02-04 at 09:52 +0000, roger peppe wrote:
> It also has some problems (even in your later "fixed conditional
> version").
> In particular, your readComplete function will go into an infinite
> loop if it gets a short read (try setting the "size" constant to 3,
> for example).
> Also it doesn't behave correctly when one reader returns EOF along
> with the data but the other doesn't (Caleb's original version also has
> that issue, I think)

I think this addresses those issues (there is no test in the harness for
the mismatched EOF case, so I have added a trivial example of that):

http://play.golang.org/p/24D3zm5JhM

Dan Kortschak

unread,
Feb 4, 2015, 5:39:11 AM2/4/15
to roger peppe, Brian Fallik, Caleb Spare, golang-nuts
With Caleb's extended tests:

http://play.golang.org/p/2jHdHj9Qvj

roger peppe

unread,
Feb 4, 2015, 5:48:28 AM2/4/15
to Caleb Spare, Dan Kortschak, Brian Fallik, golang-nuts
You're right - I hadn't realised that ReadFull doesn't return
an io.EOF error if it's returned with the data.

However, this means that your code can read twice at EOF,
which isn't ideal, because sometimes EOF is not sticky.

http://play.golang.org/p/5y9PbciHVx

Dan Kortschak

unread,
Feb 4, 2015, 6:03:31 AM2/4/15
to roger peppe, Caleb Spare, Brian Fallik, golang-nuts
On Wed, 2015-02-04 at 10:48 +0000, roger peppe wrote:
> However, this means that your code can read twice at EOF,
> which isn't ideal, because sometimes EOF is not sticky.
>
> http://play.golang.org/p/5y9PbciHVx


http://play.golang.org/p/g-uXGjNr-6

Egon

unread,
Feb 4, 2015, 6:11:17 AM2/4/15
to golan...@googlegroups.com
Also, I was thinking that it might make more sense to write it as Equal rather than Diff. It's clearer and avoids the negative form. I would Diff to return the difference of the two rather than they are identical... i.e. ignoring the error value

if Diff(r1, r2) { print("same contents") }

doesn't really make sense, whereas

if SameContent(r1, r2) { print("same content") }

does...

roger peppe

unread,
Feb 4, 2015, 6:35:59 AM2/4/15
to Egon, golang-nuts
On 4 February 2015 at 11:11, Egon <egon...@gmail.com> wrote:
> Also, I was thinking that it might make more sense to write it as Equal
> rather than Diff. It's clearer and avoids the negative form. I would Diff to
> return the difference of the two rather than they are identical... i.e.
> ignoring the error value
>
> if Diff(r1, r2) { print("same contents") }
>
> doesn't really make sense, whereas
>
> if SameContent(r1, r2) { print("same content") }
>
> does...

+1

FWIW I'd probably do something like this. It's a few lines longer
than Dan's version, but easier to reason about, I think.

http://play.golang.org/p/ZcyJCkOu54
>
> On Wednesday, 4 February 2015 09:37:44 UTC+2, Egon wrote:
>>
>> http://play.golang.org/p/iuRyQchiK2
>>
>> (Although, I'm sure there's a nicer way to write the handling of error
>> cases...)
>>
>> + Egon
>>
>> On Wednesday, 4 February 2015 05:12:24 UTC+2, Caleb Spare wrote:
>>>
>>> The challenge is to write an elegant implementation of this function:
>>>
>>> // Diff compares the contents of two io.Readers.
>>> // The return value of identical is true if and only if there are no
>>> errors
>>> // in reading r1 and r2 (io.EOF excluded) and r1 and r2 are
>>> // byte-for-byte identical.
>>> func Diff(r1, r2 io.Reader) (identical bool, err error)
>>>
>>> Here's a little harness to test it out:
>>>
>>> http://play.golang.org/p/d6XdlVQ0FT
>>>
>>> Background: a few weeks ago I wrote this function as part of a very
>>> simple diff package (https://godoc.org/github.com/cespare/diff). My
>>> implementation is here:
>>>
>>> https://github.com/cespare/diff/blob/master/diff.go#L16-L48
>>>
>>> I believe my version is correct, but I'm not too happy with it. In
>>> particular, I dislike the usage of io.ReadFull; if one Reader doesn't have a
>>> full buffer's worth of data available, it would be better to read from the
>>> other Reader in the meantime rather than waiting on the first one.
>>>
>>> -Caleb
>
Reply all
Reply to author
Forward
0 new messages