Can't run the first example (ipcount)


月张

May 12, 2012, 1:42:47 AM
to dumbo...@googlegroups.com
Python version: 2.7.2
Hadoop version: CDH3u2
========================================

  • Python install: OK
  • Dumbo install: OK (tested with "import dumbo")
  • Run in local mode: OK
(dumbo start ipcount.py -input access.log -output ipcounts)
  • Run the cat command in local mode: error, log below
 dumbo cat ipcounts | sort -k2,2nr | head -n 5
Traceback (most recent call last):
  File "/usr/local/bin/dumbo", line 8, in <module>
    load_entry_point('dumbo==0.21.32', 'console_scripts', 'dumbo')()
  File "build/bdist.linux-x86_64/egg/dumbo/__init__.py", line 32, in execute_and_exit
  File "build/bdist.linux-x86_64/egg/dumbo/cmd.py", line 42, in dumbo
    functions respectively.
  File "build/bdist.linux-x86_64/egg/dumbo/cmd.py", line 101, in cat
    
  File "build/bdist.linux-x86_64/egg/dumbo/backends/unix.py", line 114, in cat
TypeError: unsupported operand type(s) for +: 'Options' and 'list'

  • Run the job on the cluster: error. The command is:

sudo -u hdfs dumbo start /home/hdfs/ipcount.py -hadoop /opt/hadoop/ -input /user/hdfs/access_log -output /user/hdfs/result

The MapReduce job log is:

 java.io.IOException: log:null
R/W/S=341/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Sat May 12 13:27:48 CST 2012
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.io.WritableUtils.writeString(WritableUtils.java:100)
at org.apache.hadoop.typedbytes.TypedBytesOutput.writeString(TypedBytesOutput.java:223)
at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeText(TypedBytesWritableOutput.java:182)
at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.write(TypedByte
 
 
Can somebody help me?

Thanks.

 

Klaas Bosteels

May 13, 2012, 4:54:19 AM
to dumbo...@googlegroups.com
Answers are inline.

On Sat, May 12, 2012 at 7:42 AM, 月张 <hei...@gmail.com> wrote:
Python version: 2.7.2
Hadoop version: CDH3u2
========================================

  • Python install: OK
  • Dumbo install: OK (tested with "import dumbo")
  • Run in local mode: OK
(dumbo start ipcount.py -input access.log -output ipcounts)
  • Run the cat command in local mode: error, log below
 dumbo cat ipcounts | sort -k2,2nr | head -n 5
Traceback (most recent call last):
  File "/usr/local/bin/dumbo", line 8, in <module>
    load_entry_point('dumbo==0.21.32', 'console_scripts', 'dumbo')()
  File "build/bdist.linux-x86_64/egg/dumbo/__init__.py", line 32, in execute_and_exit
  File "build/bdist.linux-x86_64/egg/dumbo/cmd.py", line 42, in dumbo
    functions respectively.
  File "build/bdist.linux-x86_64/egg/dumbo/cmd.py", line 101, in cat
    
  File "build/bdist.linux-x86_64/egg/dumbo/backends/unix.py", line 114, in cat
TypeError: unsupported operand type(s) for +: 'Options' and 'list'

This sounds like a bug in the unix backend (i.e. the local running mode) caused by a recent refactor. Please file a GitHub issue for this.
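For context, this kind of TypeError is what Python raises when an object whose class defines no addition is placed on the left of + with a list. Here is a minimal sketch of that, using a hypothetical stand-in class (this Options class and its to_list method are assumptions made up for illustration, not dumbo's actual code):

```python
class Options(object):
    """Hypothetical stand-in for illustration only; NOT dumbo's real class."""

    def __init__(self, pairs=None):
        # Store (name, value) option pairs in a plain list.
        self.pairs = list(pairs or [])

    def to_list(self):
        # Return a copy of the pairs as a list, which does support "+".
        return list(self.pairs)


opts = Options([("input", "ipcounts")])
extra = [("file", "example.txt")]  # made-up extra option

try:
    # Options defines no __add__ and list has no matching __radd__,
    # so Python raises TypeError here.
    combined = opts + extra
except TypeError as exc:
    error_message = str(exc)

# Converting to a list first makes the concatenation well defined.
combined = opts.to_list() + extra
```

Whether the real fix converts the object or gives it list-like behaviour is not something this sketch claims to know; it only shows where the error message comes from.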
  • Run the job on the cluster: error. The command is:

sudo -u hdfs dumbo start /home/hdfs/ipcount.py -hadoop /opt/hadoop/ -input /user/hdfs/access_log -output /user/hdfs/result

The MapReduce job log is:

 java.io.IOException: log:null
R/W/S=341/0/0 in:NA [rec/s] out:NA [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Sat May 12 13:27:48 CST 2012
java.io.IOException: Broken pipe
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:282)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.io.WritableUtils.writeString(WritableUtils.java:100)
at org.apache.hadoop.typedbytes.TypedBytesOutput.writeString(TypedBytesOutput.java:223)
at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.writeText(TypedBytesWritableOutput.java:182)
at org.apache.hadoop.typedbytes.TypedBytesWritableOutput.write(TypedByte
Could you paste the stderr logs for the failing tasks instead? The Java exception typically isn't very informative for Dumbo programs.

Can somebody help me?

Thanks.

 

--
You received this message because you are subscribed to the Google Groups "dumbo-user" group.
To view this discussion on the web visit https://groups.google.com/d/msg/dumbo-user/-/sjjMmtQ7bkcJ.
To post to this group, send email to dumbo...@googlegroups.com.
To unsubscribe from this group, send email to dumbo-user+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/dumbo-user?hl=en.

月张

May 15, 2012, 4:23:25 AM
to dumbo...@googlegroups.com
Hello Klaas,

Can you give me some clues?

-heipark

On Sunday, May 13, 2012 at 4:54:19 PM UTC+8, Klaas Bosteels wrote:
Answers are inline.


Klaas Bosteels

May 20, 2012, 2:28:57 AM
to dumbo...@googlegroups.com
So the first issue has been fixed now: https://github.com/klbostee/dumbo/issues/54

For the second one, try clicking on the failed tasks number in the Hadoop web interface and then clicking on "last 8kb" in the logs column. This should lead you to the stdout and stderr logs for the tasks, which are usually more informative than the Java error.

-K


月张

May 22, 2012, 1:32:39 AM
to dumbo...@googlegroups.com
Hi Klaas,

Thanks for your reply.

The first problem was resolved by updating dumbo from 0.21.32 to 0.21.33. I just downloaded the 0.21.33 zip package and installed it by running "python setup.py install".

The second problem: indeed, as you said, "last 8kb" showed me this in the stderr logs: "/usr/bin/python: module ipcount not found".
So I added the -python option to my job submission command, and then everything was OK.

 /usr/local/Python2.7/bin/dumbo start ipcount.py -hadoop /usr/lib/hadoop -input /user/hdfs/access_log -output result -python '/usr/local/bin/python'  
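
To double-check which interpreter the tasks actually run under, a few lines like the following could go at the top of ipcount.py. This is only a sketch: the helper name describe_interpreter is made up for illustration, and it assumes that whatever a task writes to stderr ends up in that task's stderr log.

```python
import sys


def describe_interpreter():
    """Return a one-line description of the running Python interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "python executable=%s version=%s" % (sys.executable, version)


# Write the description to stderr so it can be looked up in the task logs.
sys.stderr.write(describe_interpreter() + "\n")
```

If the line in the log names /usr/bin/python, the -python option was not picked up; if it names /usr/local/bin/python, the tasks are using the intended interpreter.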


These two problems leave me with two more questions:
1. The default Python on my datanode is 2.7.2, but the stderr log tells me that Python 2.4.3 (/usr/bin/python) is running the job. Why?
2. Must I use "dumbo cat" to look at the file content? "hadoop fs -text" does not work.

Thanks.


On Sunday, May 20, 2012 at 2:28:57 PM UTC+8, Klaas Bosteels wrote:
So the first issue has been fixed now: https://github.com/klbostee/dumbo/issues/54

For the second one, try clicking on the failed tasks number in the Hadoop web interface and then clicking on "last 8kb" in the logs column. This should lead you to the stdout and stderr logs for the tasks, which are usually more informative than the Java error.

Klaas Bosteels

Jun 20, 2012, 2:27:14 AM
to dumbo...@googlegroups.com
1. If Python 2.4 is your /usr/bin/python, isn't that your default Python then?

2. Dumbo writes sequence files by default (for efficiency reasons). If you want text, you need to use the option -outputformat text (and then you will be able to use "hadoop fs -text").
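
Once the output is plain text, the "sort -k2,2nr | head -n 5" step from the first message can also be done in Python. This is a rough sketch that assumes each output line is a key, a tab, and an integer count; the exact text layout is not shown in this thread, so treat that as an assumption. The function name top_counts and the sample lines are made up for illustration.

```python
def top_counts(lines, n=5):
    """Return the n (key, count) pairs with the highest counts."""
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so keys containing tabs stay intact.
        key, _, count = line.rpartition("\t")
        pairs.append((key, int(count)))
    # Sort by count, highest first, like "sort -k2,2nr".
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


# Made-up sample lines in the assumed "key<TAB>count" layout.
sample = ["10.0.0.1\t3\n", "10.0.0.2\t7\n", "10.0.0.3\t1\n"]
top = top_counts(sample, n=2)
```

In practice the lines could come from sys.stdin, with the text output piped into the script.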

-K
