OverflowError - long int too large to convert to int

jamii

Jun 3, 2011, 6:59:59 AM
to Disco-development
I'm trying to get started with Disco and can't figure out how to read
the results of my job. I seem to be in the wrong timezone to get help
on IRC, so I'll try here instead.

The code in examples/wordcount.py works fine, but whenever I run this
job (https://gist.github.com/1006140), or pretty much any variation
on it, and follow the example, I get overflow errors:


root@li317-243:~# disco run poc.ParseDownloads dump:downloads
ParseDownloads@51b:17a09:994dd
root@li317-243:~# disco wait @ | xargs ddfs xcat
long int too large to convert to int

So I poked around a little bit.

root@li317-243:~# disco results @
dir://li317-243/disco/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/map-index.txt.gz
root@li317-243:~# cp /usr/local/var/disco/data/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/map-index.txt.gz ./
root@li317-243:~/results# gunzip ./map-index.txt.gz
root@li317-243:~# cat map-index.txt
0 disco://li317-243/disco/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/partitions-51b-17e8e-da1b2/part-0
root@li317-243:~# ddfs xcat disco://li317-243/disco/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/partitions-51b-17e8e-da1b2/part-0
long int too large to convert to int
root@li317-243:~# ddfs cat disco://li317-243/disco/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/partitions-51b-17e8e-da1b2/part-0
Killed
root@li317-243:~# cp /usr/local/var/disco/data/li317-243/a5/ParseDownloads@51b:17a09:994dd/.disco/partitions-51b-17e8e-da1b2/part-0 ./
root@li317-243:~# head part-0 -n 1
[binary output, interrupted with ^C]
root@li317-243:~# ls -lh part-0
-rw-r--r-- 1 root root 2.8G 2011-06-03 06:53 part-0
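
The map index contents shown above suggest one `<partition> <url>` pair per line. Below is a minimal Python sketch that lists the partition URLs directly from the gzipped index. It assumes exactly that whitespace-separated layout, which is inferred from the single line in this session and not from Disco's documentation. `read_map_index` is a hypothetical helper, not a Disco API.

```python
import gzip

def read_map_index(path):
    """Parse a gzipped map index, assuming one "<partition> <url>" pair per line.

    The layout is inferred from the single example line in this thread;
    read_map_index is a hypothetical helper, not part of Disco.
    """
    entries = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            partition, url = line.split(None, 1)
            entries.append((int(partition), url))
    return entries
```

Each URL it returns is the kind of address passed to `ddfs xcat` by hand in the session above. Reading the index never touches the 2.8G partition file itself.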

I'm guessing that something is trying to index into that file using a
signed machine int, which is why this job fails while the example jobs
work fine. I can move to a 64-bit machine for now, but I think this
should definitely be considered a bug.
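
The hypothesis above can be illustrated without Disco. A signed 32-bit integer tops out at 2**31 - 1, just under 2 GiB, so a byte offset or length for a 2.8G file cannot be represented in it. The sketch below uses Python's `struct` module to pack values as a 4-byte signed int. It only illustrates the arithmetic and is not the code path Disco or DDFS actually uses.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

def fits_in_c_int(n):
    """Return True if n can be packed as a signed 32-bit int."""
    try:
        struct.pack("<i", n)  # "<i": little-endian, standard 4-byte signed int
        return True
    except struct.error:
        return False

# Approximate size of part-0 as reported by `ls -lh` (2.8 GiB).
part0_size = int(2.8 * 2**30)

print(INT32_MAX)                  # 2147483647
print(fits_in_c_int(INT32_MAX))   # True
print(fits_in_c_int(part0_size))  # False
```

The subject line's `OverflowError: long int too large to convert to int` is consistent with Python 2 attempting this kind of conversion on a 32-bit build. The sketch shows why 2.8 GiB cannot fit. It does not show where Disco performs the conversion.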

Cheers

Jamie

Jamie Brandon

Jun 4, 2011, 2:34:29 AM
to Disco-development
I tried moving this to a 64-bit server, but now I can't even run the tutorial:

root@li317-243:~# wget http://discoproject.org/media/text/bigfile.txt
...
root@li317-243:~# ddfs chunk data:bigtxt ./bigfile.txt
'ascii' codec can't decode byte 0x81 in position 0: ordinal not in range(128)

root@li317-243:~# echo 'foo' > foo
root@li317-243:~# ddfs chunk data:bigtxt ./foo
'ascii' codec can't decode byte 0x81 in position 0: ordinal not in range(128)

This is on Ubuntu 11.04 on a 64-bit Linode VPS. Disco is installed
from source. Setup notes are here: https://gist.github.com/1007630
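
The error text is exactly what Python produces when a byte outside the 7-bit ASCII range is decoded with the `ascii` codec. Here is a minimal reproduction that is independent of DDFS. The thread does not show where inside `ddfs chunk` such a decode happens.

```python
# Byte 0x81 is outside the ASCII range (0-127), so decoding it as ASCII fails.
data = b"\x81"

try:
    data.decode("ascii")
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
```

The one-line file `foo` triggers the same error. That suggests the offending byte does not come from the input data itself. Jamie's later message in this thread traces the problem to the JSON library in use.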

jamii

Jun 8, 2011, 7:58:55 AM
to Disco-development
OK, the encoding problem was resolved by switching from json to simplejson.
With that out of the way, everything works on the 64-bit server, which
means that the original problem on the 32-bit machine is almost certainly
due to the blob size exceeding 2 GB. Assuming this is the case, it is
probably worth mentioning this issue in the documentation or issuing a
warning within ddfs.
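
The sketch below illustrates the kind of warning suggested here. It checks a file's size and warns when the file exceeds 2**31 - 1 bytes on an interpreter whose `sys.maxsize` indicates a 32-bit build. The function names, and the idea of running such a check before handing data to DDFS, are hypothetical. The thread does not mention any such check in DDFS.

```python
import os
import sys
import warnings

INT32_MAX = 2**31 - 1

def is_32bit_python():
    """sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds."""
    return sys.maxsize <= INT32_MAX

def too_large_for_32bit(size, is_32bit=None):
    """Hypothetical check: True if `size` bytes cannot fit in a signed 32-bit int on a 32-bit host."""
    if is_32bit is None:
        is_32bit = is_32bit_python()
    return is_32bit and size > INT32_MAX

def warn_if_too_large_for_32bit(path, is_32bit=None):
    """Hypothetical pre-upload check; returns True if a warning was issued."""
    size = os.path.getsize(path)
    if too_large_for_32bit(size, is_32bit):
        warnings.warn(
            "%s is %d bytes, which exceeds 2**31 - 1; "
            "32-bit hosts may fail to read it" % (path, size)
        )
        return True
    return False
```

The size test is kept separate from the file access so that it can be checked against sizes like the 2.8G partition above without creating a file that large.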

Prashanth Mundkur

Jun 9, 2011, 11:55:34 AM
to disc...@googlegroups.com
On Wed, Jun 8, 2011 at 4:58 AM, jamii <jami...@googlemail.com> wrote:
> Ok, the encoding problem was resolved by switching json to simplejson.
> With that out of the way everything works on the 64 bit server, which
> means that the original problem on the 32 bit is almost certainly due
> to the blob size exceeding 2gb. Assuming this is the case it is
> probably worth mentioning this issue in the documentation or issuing a
> warning within ddfs.

Jamii,

Was your 32-bit platform Ubuntu 11.04 as well?

Jamie Brandon

Jun 9, 2011, 12:01:56 PM
to disc...@googlegroups.com

Yes, and both were Linode VPS images running the same setup script. There should be no differences between them except the word size. The 32-bit machine worked fine until the output file exceeded 2 GB.

Sudhindra

Jun 11, 2011, 10:44:31 PM
to disc...@googlegroups.com
Hi all,
I have tried investigating the issues I might have with my install, and
every time I try, this is the result:
------------------------
Traceback (most recent call last):
  File "count_words.py", line 1, in <module>
    from disco.core import Job, result_iterator
ImportError: No module named disco.core
disco@trillian:~/disco/examples/util$ ls
chunk.py count_words.py grep.py wordcount_ddb.py
#count_words.py# dgrep query_ddb.py wordcount.py

disco@trillian:~/disco/examples/util$ python wordcount.py
Traceback (most recent call last):
  File "wordcount.py", line 10, in <module>
    from disco.core import Job
ImportError: No module named disco.core
------------------------
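
`ImportError: No module named disco.core` means Python could not import `disco.core` from any directory on its search path (`sys.path`). The sketch below checks this from the same interpreter without triggering the traceback. `module_available` is a hypothetical helper written for modern Python 3. On the Python 2.6 mentioned here, a plain `import disco.core` wrapped in `try`/`except ImportError` plays the same role. Whether the fix is reinstalling Disco's Python library or adjusting `PYTHONPATH` depends on the install, which the thread does not show.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if `name` can be located on sys.path.

    For a dotted name such as "disco.core", find_spec imports the parent
    package first and raises ModuleNotFoundError if the parent is missing.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        return False

if __name__ == "__main__":
    print("disco.core importable:", module_available("disco.core"))
    # Directories Python searches, in order; the one containing disco/ should be listed.
    for entry in sys.path:
        print(" ", entry)
```

Running it with the same `python` that produced the traceback matters. Different interpreters on one machine can have different search paths.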
Most recently, I updated Python on both my Linux and OS X machines. I
seemed to have an older R13 version of Erlang, so I updated it to the
latest R14. Both machines have Python 2.6 or later.

Also, my SSH setup seems to be working correctly:

------------------------
disco@trillian:~$ cd disco
disco@trillian:~/disco$ ssh localhost erl
Eshell V5.7.4 (abort with ^G)
1>

------------------------

So I still don't understand what my installation is missing.
Any help appreciated.

Thanks

- Sudhindra
PS: Thanks for all the recommendations earlier, but I only seem to get
weekends to try out the things you suggest, so my feedback cycle may be
slower.
