Disco and Python3 issues and resolutions


Eric Berkowitz

Nov 8, 2014, 3:21:14 PM
to disc...@googlegroups.com
I have just installed Disco (0.5.4) for the first time, on systems running Python 3.

As far as I can tell, Disco is coded to exhibit some interesting behaviors on Python 3 systems.

The marshalling of everything to bytes caused some issues, and I am wondering whether anyone else has addressed or resolved them.

1. Yielded k,v pairs would crash the job whenever isinstance(k, int) was True.
This took a while to track down. I solved it by editing disco.compat and prepending the following to persistent_hash(input):
    if isinstance(input, int):
        return int(md5(byte_of_int(input)).hexdigest(), 16)

2. The remaining issue is that values stay marshalled as byte arrays when they are returned from the jobs, so the final aggregation or iteration over the results requires explicit calls to the bytes_to.... functions in disco.compat to render them as their original types.
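
The second issue can be worked around with a small wrapper around the result iterator. This is only a sketch: decode_pairs is a hypothetical helper, not part of Disco, and it assumes keys and values come back as UTF-8 text in bytes form. It deliberately avoids the disco.compat helpers, since their exact names are not spelled out above.

```python
def decode_pairs(pairs, key_type=str, value_type=int, encoding="utf-8"):
    """Yield (key, value) pairs with bytes converted back to the given types.

    Hypothetical helper: assumes each bytes item holds text that the
    target type can parse, for example b"3" for int.
    """
    def convert(raw, target):
        # Items that are not bytes are passed straight to the target type.
        text = raw.decode(encoding) if isinstance(raw, bytes) else raw
        return target(text)

    for key, value in pairs:
        yield convert(key, key_type), convert(value, value_type)


if __name__ == "__main__":
    # Stand-in for the (key, value) pairs read back from a finished job.
    raw_results = [(b"disco", b"3"), (b"python", b"5")]
    for word, count in decode_pairs(raw_results):
        print(word, count)
```

Wrapping the iterator once keeps the type conversions out of the aggregation code itself.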

Has anyone else had this experience?

Tim Spurway

Nov 10, 2014, 8:31:24 PM
to disc...@googlegroups.com
Hey Folks,

I am giving a Disco talk at PyData NYC on November 22 (http://pydata.org/nyc2014/schedule/) and would like to include a slide with some people and organizations using Disco.

Share privately or on-list.  Let me know as much info as you want about the project (size of cluster, amount of data processed, type of processing, etc). 

I'd also like to know who else is coming out to PyData NYC!  If I get > 0 responses, it's grounds for a mini-meetup!

cheers,
tim

Giles Brown

Mar 20, 2015, 10:18:40 PM
to disc...@googlegroups.com
I have just started trying Disco with Python 3 and also found that the persistent_hash definition seems wrong.

It broke for examples/util/estimate_pi because it was trying to use str_to_bytes on an int, and an int has no "encode" attribute.
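
The failure mode is easy to reproduce outside Disco. In the sketch below, naive_hash and stable_hash are hypothetical names, not Disco's code: the first assumes every key is a str and fails on an int, the second picks an encoding by type before hashing.

```python
from hashlib import md5


def naive_hash(key):
    # Assumes key is a str; an int has no .encode method, so this
    # raises AttributeError for integer keys.
    return int(md5(key.encode("utf-8")).hexdigest(), 16)


def stable_hash(key):
    """Hypothetical type-aware variant, not Disco's persistent_hash."""
    if isinstance(key, bytes):
        data = key
    elif isinstance(key, str):
        data = key.encode("utf-8")
    elif isinstance(key, int):
        # Hash the decimal text, which works for any size or sign.
        # Note that 1 and "1" therefore hash to the same value.
        data = str(key).encode("ascii")
    else:
        raise TypeError("unsupported key type: %r" % type(key))
    return int(md5(data).hexdigest(), 16)


if __name__ == "__main__":
    try:
        naive_hash(3)
    except AttributeError as err:
        print("naive_hash failed:", err)
    # Use the hash to pick one of 4 partitions for an integer key.
    print(stable_hash(3) % 4)
```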

It seems to me the correct fix is simply to use the built-in 'hash' function, in the same way that the Python 2 version does.

I'm not clear why it was switched to using an md5 hash?
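
One possible reason, which is a guess and not something confirmed in this thread: since Python 3.3, the built-in hash of str and bytes is randomized per interpreter process by default, so separate worker processes may not agree on the hash of the same string key. The sketch below checks this by evaluating hash() in fresh interpreters with different PYTHONHASHSEED values. hash_in_fresh_interpreter is a hypothetical helper name.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(expression, seed):
    """Return hash(<expression>) as computed by a new Python process
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash({}))".format(expression)],
        env=env,
    )
    return int(out.decode("ascii").strip())


if __name__ == "__main__":
    # Integer hashes do not depend on the seed.
    print(hash_in_fresh_interpreter("12345", 1)
          == hash_in_fresh_interpreter("12345", 2))
    # String hashes are expected to differ between seeds.
    print(hash_in_fresh_interpreter("'disco'", 1)
          == hash_in_fresh_interpreter("'disco'", 2))
```

If that is the reason, the built-in hash would be fine for int keys but would not give a consistent result for str keys across processes unless every process is started with the same PYTHONHASHSEED.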