Help! Same code, different results


Tong Sun

Aug 25, 2017, 11:39:29 AM
to golang-nuts
Hi, 

I'm experiencing a very, very strange problem now -- the same Go code is producing different results for me.
I'm not kidding. I couldn't believe it myself, so I've spent the past few days going back and forth to verify everything.
Now, after all these days, the only conclusion I can reach, bizarre as it is, is: same code, different results.

Can someone verify for me what you get please?

go get github.com/go-dedup/fsimilar


then 

cd go-dedup/fsimilar
go build
find test/sim -type f | ./fsimilar -i -d 12 -vv 

and tell me what's the last output that you got please. 

The problem is that two of my machines produce:

[fsimilar] ## Similar items
 map[Similars:[map[Hash:6184610222622303958 Dist:0 SizeRef:1 Name:GNU - 2001 - Python Standard Library Ext:.pdf Size:1 Dir:test/sim/] map[Name:GNU - Python Standard Library (2001) Ext:.rar Size:1 Dir:test/sim/ Hash:6184610222622303958 Dist:0 SizeRef:1]]].
test/sim/GNU - 2001 - Python Standard Library.pdf
test/sim/GNU - Python Standard Library (2001).rar

But another one, the only one, produces:

[fsimilar] ## Similar items
 map[Similars:[{(eBook) GNU - Python Standard Library 2001 .pdf 1 test/sim/ 15408562819203262167 8 1} {GNU - 2001 - Python Standard Library .pdf 1 test/sim/ 6184610222622303958 0 1} {GNU - Python Standard Library (2001) .rar 1 test/sim/ 6184610222622303958 0 1} {Python Standard Library .zip 1 test/sim/ 6175699711939618002 11 1}]].
test/sim/(eBook) GNU - Python Standard Library 2001.pdf
test/sim/GNU - 2001 - Python Standard Library.pdf
test/sim/GNU - Python Standard Library (2001).rar
test/sim/Python Standard Library.zip

which is what I actually want. 

The following output is exactly the same across all three machines:

$ go version 
go version go1.8.1 linux/amd64

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 17.04
Release:        17.04
Codename:       zesty

$ git status 
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

I am afraid that you will get the first result. Please let me know. 

Oh, I do get one difference to illustrate how strange things are -- for the same code of:

                verbose(2, "## Similar items\n %v.", m)

The working machine produces (last two lines):

[fsimilar] ## Similar items
 map[Similars:[{Improve Soccer Shooting Technique .mp4 10043873 ./Try These Soccer Drills/ 17777808297800170271 0 10043873} {Improve Soccer Shooting Technique .mp4 10043873 ./Top Soccer Training Videos/ 17777808297800170271 0 10043873}]].
[fsimilar] ## Similar items
 map[Similars:[{Soccer Drills For Youth .mp4 11650500 ./Youth Soccer Training Drills/ 18062776733066936110 0 11650500} {Soccer Drills For Youth .mp4 11650500 ./Top Soccer Training Videos/ 18062776733066936110 0 11650500}]].

while the machine with the incorrect result produces (last two lines):

[fsimilar] ## Similar items
 map[Similars:[map[Ext:.mp4 Size:10043873 Dir:./Try These Soccer Drills/ Hash:17777808297800170271 Dist:0 SizeRef:10043873 Name:Improve Soccer Shooting Technique] map[Size:10043873 Dir:./Top Soccer Training Videos/ Hash:17777808297800170271 Dist:0 SizeRef:10043873 Name:Improve Soccer Shooting Technique Ext:.mp4]]].
[fsimilar] ## Similar items
 map[Similars:[map[Dir:./Youth Soccer Training Drills/ Hash:18062776733066936110 Dist:0 SizeRef:11650500 Name:Soccer Drills For Youth Ext:.mp4 Size:11650500] map[SizeRef:11650500 Name:Soccer Drills For Youth Ext:.mp4 Size:11650500 Dir:./Top Soccer Training Videos/ Hash:18062776733066936110 Dist:0]]].

even though the code is the same and `go version` reports the same on both.

The command to produce the above is:

./fsimilar -i test/test1.lst -S -d 6 -vv 2> /tmp/log

Then compare the two logs. 

I've spent the past few days verifying and double-checking everything, and now my mind is blocked and I'm out of ideas.
Somebody help, please.

THANKS!!

Konstantin Khomoutov

Aug 25, 2017, 12:10:44 PM
to Tong Sun, golang-nuts
On Fri, Aug 25, 2017 at 08:39:29AM -0700, Tong Sun wrote:

> I'm experiencing a *very very* strange problem now -- the same Go code is
> producing different results *for me*.
[...]
> Can someone verify for me what you get please?
>
> go get github.com/go-dedup/fsimilar
>
> then
>
> cd go-dedup/fsimilar
> go build
> find test/sim -type f | ./fsimilar -i -d 12 -vv
>
> and tell me what's the last output that you got please.
>
> The problem is that *two *of my machines produce:
>
> [fsimilar] ## Similar items
> map[Similars:[map[Hash:6184610222622303958 Dist:0 SizeRef:1 Name:GNU -
> 2001 - Python Standard Library Ext:.pdf Size:1 Dir:test/sim/] map[Name:GNU
> - Python Standard Library (2001) Ext:.rar Size:1 Dir:test/sim/
> Hash:6184610222622303958 Dist:0 SizeRef:1]]].
> *test/sim/GNU - 2001 - Python Standard Library.pdf*
> *test/sim/GNU - Python Standard Library (2001).rar*
>
> But another one, the *only one*, produce:
[...]

Two quick points:

* What happens if you copy the "working" binary to the "wrong" machine
and run it there?

Does it work?

I mean, you should rule out possible inconsistencies with build
systems. This means both Go and all the graph of the libraries
your project uses. If you're building on each machine, try doing
clean-room building. This means freshly cloning the Go source code
of, say, the 1.8 release branch and building it. Then copying over
the whole hierarchy of the whole branch of the dependencies from the
"working" machine into a new directory which must be a Go workspace.
Then building there.

* This bit [1] smells bad IMO.

From a quick glance I failed to see where it's used, but
why is it there? What happens if you seed the PRNG with a constant?

1. https://github.com/go-dedup/fsimilar/blob/master/fsimilarCLICmd.go#L70
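
To illustrate the question about seeding with a constant: a generator seeded with a fixed value yields the same sequence on every run, so any PRNG-driven behaviour becomes reproducible across machines. A minimal sketch; `draw` is a hypothetical helper and is not code from fsimilar.

```go
package main

import (
	"fmt"
	"math/rand"
)

// draw returns n pseudo-random ints in [0, 100) from a generator
// seeded with the given constant. (Hypothetical helper for illustration.)
func draw(seed int64, n int) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Intn(100)
	}
	return out
}

func main() {
	// Both calls use the same seed, so both lines print the same sequence.
	fmt.Println(draw(1, 5))
	fmt.Println(draw(1, 5))
}
```

If the results still differ between machines with a constant seed, the PRNG can be ruled out as the cause.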

Tong Sun

Aug 25, 2017, 1:51:24 PM
to Konstantin Khomoutov, golang-nuts
Ah, good idea. Previously I copied the bad code onto the good machine, and the bad code worked there.

Now I have copied the good code onto the bad machine, and the good code no longer works there.

Given that the original source is pulled from the same git repository, with the same status, adding these three things up I'm sure the code is the same now -- the only difference is the machine.

 
  I mean, you should rule out possible inconsistencies with build
  systems.  This means both Go and all the graph of the libraries
  your project uses.  If you're building on each machine, try doing
  clean-room building.  This means freshly cloning the Go source code
  of, say, the 1.8 release branch and building it.  Then copying over
  the whole hierarchy of the whole branch of the dependencies from the
  "working" machine into a new directory which must be a Go workspace.
  Then building there.

All three machines are using the same golang ubuntu packages:

$ apt-cache policy golang-1.8-go
golang-1.8-go:
  Installed: 1.8.1-1ubuntu1
  Candidate: 1.8.1-1ubuntu1
 
I'll try installing the Go binary from the upstream Go distribution next.


* This bit [1] smells bad IMO.

  From a quick glance I failed to see where it's used, but
  why is it there?  What happens if you seed the PRNG with a constant?

1. https://github.com/go-dedup/fsimilar/blob/master/fsimilarCLICmd.go#L70

Yeah, you are right, it is not used at all. It is copied there for future features that I haven't put in yet. 
So we can rule that out. 



Can someone try it out for me and see what you get, please?


Tong Sun

Aug 25, 2017, 2:56:29 PM
to golang-nuts

Update, 


- All the original source on the three machines is pulled from the same git repository, with the same status.
- I copied the bad code onto the good machine and compared the two folders; the files are the same.
- I ran the bad code on the good machine, and it works there.
- I copied the good code onto the bad machine, and it no longer works there.

Adding all these things up, I'm sure the code is the same -- the only difference is the machine/build environment.

I installed https://storage.googleapis.com/golang/go1.8.3.linux-amd64.tar.gz on both machines, and now both have go1.8.3:

$ type go 
go is hashed (/opt/bin/go)

$ go version 
go version go1.8.3 linux/amd64


And after upgrading `github.com/fatih/structs` on the good machine, the following is no longer an issue:

Oh, I do get one difference to illustrate how strange things are -- for the same code of:

                verbose(2, "## Similar items\n %v.", m)

The working machine produces (last two lines):

[fsimilar] ## Similar items
 map[Similars:[{Improve Soccer Shooting Technique .mp4 10043873 ./Try These Soccer Drills/ 17777808297800170271 0 10043873} {Improve Soccer Shooting Technique .mp4 10043873 ./Top Soccer Training Videos/ 17777808297800170271 0 10043873}]].
[fsimilar] ## Similar items
 map[Similars:[{Soccer Drills For Youth .mp4 11650500 ./Youth Soccer Training Drills/ 18062776733066936110 0 11650500} {Soccer Drills For Youth .mp4 11650500 ./Top Soccer Training Videos/ 18062776733066936110 0 11650500}]].

while the machine with the incorrect result produces (last two lines):

[fsimilar] ## Similar items
 map[Similars:[map[Ext:.mp4 Size:10043873 Dir:./Try These Soccer Drills/ Hash:17777808297800170271 Dist:0 SizeRef:10043873 Name:Improve Soccer Shooting Technique] map[Size:10043873 Dir:./Top Soccer Training Videos/ Hash:17777808297800170271 Dist:0 SizeRef:10043873 Name:Improve Soccer Shooting Technique Ext:.mp4]]].
[fsimilar] ## Similar items
 map[Similars:[map[Dir:./Youth Soccer Training Drills/ Hash:18062776733066936110 Dist:0 SizeRef:11650500 Name:Soccer Drills For Youth Ext:.mp4 Size:11650500] map[SizeRef:11650500 Name:Soccer Drills For Youth Ext:.mp4 Size:11650500 Dir:./Top Soccer Training Videos/ Hash:18062776733066936110 Dist:0]]].

even though the code is the same and their `go version` reported the same as well. 

The command to produce above is, 

./fsimilar -i test/test1.lst -S -d 6 -vv 2> /tmp/log

Then compare the two logs. 



 However, the good is still good, and the bad is still bad. 


 
I've spent the past few days verifying and double-checking everything, and now my mind is blocked and I'm out of ideas.
Somebody help, please.


Can someone try the above `find test/sim -type f | ./fsimilar -i -d 12 -vv` out for me and see what you got please? 


I am afraid that you will get the first result. Please let me know. 

THANKS!!

mhh...@gmail.com

Aug 25, 2017, 3:02:00 PM
to golang-nuts
[mh-cbon@pc2 rendez-vous] $ cd ../../go-dedup/fsimilar/
[mh-cbon@pc2 fsimilar] $ find test/sim -type f | ./fsimilar -i -d 12 -vv
[fsimilar]  n='GNU - 2001 - Python Standard Library', e='.pdf', s='1', d='test/sim/'
[fsimilar] +: Simhash of 55d4263ae1a6e6d6 added.
[fsimilar]  n='(eBook) GNU - Python Standard Library 2001', e='.pdf', s='1', d='test/sim/'
[fsimilar] =: Simhash of d5d6363ef9e6e6d7 ignored for 55d4263ae1a6e6d6 (8).
[fsimilar]  n='GNU - Python Standard Library (2001)', e='.rar', s='1', d='test/sim/'
[fsimilar] =: Simhash of 55d4263ae1a6e6d6 ignored for 55d4263ae1a6e6d6 (0).
[fsimilar]  n='Python Standard Library', e='.zip', s='1', d='test/sim/'
[fsimilar] =: Simhash of 55b47e2af1a4a4d2 ignored for 55d4263ae1a6e6d6 (11).
[fsimilar]  n='Audio Book - The Grey Coloured Bunnie', e='.mp3', s='1', d='test/sim/'
[fsimilar] +: Simhash of f8fde9fe7f7dbd5e added.
[fsimilar]  n='PopupTest', e='.java', s='1', d='test/sim/'
[fsimilar] +: Simhash of a0d9070f13c20979 added.
[fsimilar]  n='LayoutTest', e='.java', s='1', d='test/sim/'
[fsimilar] +: Simhash of 37299e9d4e277b87 added.
[fsimilar]  n='ColoredGrayBunny', e='.ogg', s='1', d='test/sim/'
[fsimilar] +: Simhash of 25eade3cd3db679c added.
[fsimilar] ## Similar items
 map[Similars:[map[Ext:.pdf Size:1 Dir:test/sim/ Hash:6184610222622303958 Dist:0 SizeRef:1 Name:GNU - 2001 - Python Standard Library] map[Size:1 Dir:test/sim/ Hash:6184610222622303958 Dist:0 SizeRef:1 Name:GNU - Python Standard Library (2001) Ext:.rar]]].
test/sim/GNU - 2001 - Python Standard Library.pdf
test/sim/GNU - Python Standard Library (2001).rar


[mh-cbon@pc2 fsimilar] $ go version
go version go1.8 linux/amd64
[mh-cbon@pc2 fsimilar] $ go env
GOARCH="amd64"
GOBIN="/home/mh-cbon/gow/bin"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/mh-cbon/gow"
GORACE=""
GOROOT="/home/mh-cbon/.gvm/gos/go1.8"
GOTOOLDIR="/home/mh-cbon/.gvm/gos/go1.8/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build580933895=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
[mh-cbon@pc2 fsimilar] $ git log -n 1
commit 10c9f3f50da4f00f7fd874c24aa7c7787dcf275e (HEAD -> master, origin/master, origin/HEAD)
Author: Tong Sun <suntong@cpan.org>
Date:   Thu Aug 24 00:36:37 2017 -0400

    - [+] add FileT.Similarity()
[mh-cbon@pc2 fsimilar] $

hth

Egon

Aug 25, 2017, 3:04:35 PM
to golang-nuts
Try to find the first place where the processes diverge:

1. maybe find lists files in different order
2. maybe something that reads the input does things in a different order
3. maybe some processing uses maps --> hence random order
4. etc...

(of course run with -race, if you haven't already)
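
On point 3: Go deliberately randomizes map iteration order, so any result that depends on the order of a `range` over a map can change from run to run. A minimal sketch of the usual remedy, which is to collect the keys, sort them, and iterate over the sorted slice. The helper `sortedKeys` and the file names and sizes are made up for illustration and are not taken from fsimilar.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order, so callers can
// iterate deterministically. (Hypothetical helper for illustration.)
func sortedKeys(m map[string]int) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Illustrative data only.
	sizes := map[string]int{"b.pdf": 2, "c.zip": 3, "a.rar": 1}

	// Ranging over sizes directly may visit keys in any order;
	// ranging over the sorted keys always visits a.rar, b.pdf, c.zip.
	for _, k := range sortedKeys(sizes) {
		fmt.Println(k, sizes[k])
	}
}
```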

Tong Sun

Aug 25, 2017, 3:08:05 PM
to mhh...@gmail.com, golang-nuts
Thanks a lot my friend. 

That's what I was afraid of.



Tong Sun

Aug 25, 2017, 3:20:32 PM
to golang-nuts, Egon, mhh...@gmail.com
[resent to group, sorry Egon]

On Fri, Aug 25, 2017 at 3:04 PM, Egon <egon...@gmail.com> wrote:
Try to find the first place where the processes diverge:

1. maybe find lists files in different order

Oh, YES! BINGO! THANKS A LOT. 

find test/sim -type f | sort | ./fsimilar -i -d 12 -vv 

gave me the correct output that I want.

Ah~, my algorithm is so fragile. :(
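
For anyone finding this later, here is a sketch of how grouping by hash distance can depend on input order. This is not fsimilar's actual code. It assumes a greedy scheme in which each hash joins the first earlier representative within `-d` bits and otherwise becomes a new representative, which is what the "+: ... added" and "=: ... ignored for ..." log lines above suggest. The three hashes are the Simhash values from that log; `hamming` and `cluster` are hypothetical names.

```go
package main

import "fmt"

// hamming counts the bits that differ between x and y.
func hamming(x, y uint64) int {
	n := 0
	for v := x ^ y; v != 0; v &= v - 1 {
		n++
	}
	return n
}

// cluster puts each hash into the first group whose representative
// (the group's first member) is within maxDist bits; otherwise the
// hash starts a new group. The result depends on the input order.
func cluster(hashes []uint64, maxDist int) [][]uint64 {
	var groups [][]uint64
	for _, h := range hashes {
		placed := false
		for i := range groups {
			if hamming(h, groups[i][0]) <= maxDist {
				groups[i] = append(groups[i], h)
				placed = true
				break
			}
		}
		if !placed {
			groups = append(groups, []uint64{h})
		}
	}
	return groups
}

func main() {
	const (
		a uint64 = 0x55d4263ae1a6e6d6 // GNU - 2001 - Python Standard Library
		b uint64 = 0xd5d6363ef9e6e6d7 // (eBook) GNU - Python Standard Library 2001
		c uint64 = 0x55b47e2af1a4a4d2 // Python Standard Library
	)
	// a first: b (8 bits away) and c (11 bits away) both join a's group.
	fmt.Println(len(cluster([]uint64{a, b, c}, 12))) // prints 1
	// b first: a joins b, but c is 15 bits from b, so it starts a new group.
	fmt.Println(len(cluster([]uint64{b, a, c}, 12))) // prints 2
}
```

Sorting the input, as with `| sort` above, makes the result repeatable, but the grouping still depends on which item happens to sort first.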
 