--
You received this message because you are subscribed to the Google Groups "dumbo-user" group.
To post to this group, send email to dumbo...@googlegroups.com.
To unsubscribe from this group, send email to dumbo-user+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/dumbo-user?hl=en.
Hey Seongsu,
Dumbo writes its output to sequence files of typed bytes writables by
default. You can convert this output to text with the "dumbo cat"
command, or you can use hadoop streaming's "dumptb" command to convert
the sequence files to typed bytes first and then read those with the
"typedbytes" module (which is what the "dumbo cat" command does under
the hood).
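
As a rough sketch of the second route, assuming you already ran dumptb and saved its output to a local file: the snippet below reads the pairs with typedbytes.PairedInput(...).reads() and writes them out as tab-separated repr strings. The function names pairs_to_text and dump_to_text are made up for illustration and are not part of dumbo or typedbytes.

```python
def pairs_to_text(pairs):
    """Yield one 'repr(key)<TAB>repr(value)' line per (key, value) pair."""
    for key, value in pairs:
        yield "%r\t%r" % (key, value)


def dump_to_text(tb_path, out):
    """Write the pairs stored in a dumptb dump to `out` as text lines.

    Assumes `tb_path` was produced by hadoop streaming's dumptb command.
    """
    import typedbytes  # third-party module, imported lazily on purpose

    with open(tb_path, "rb") as tb_file:
        for line in pairs_to_text(typedbytes.PairedInput(tb_file).reads()):
            out.write(line + "\n")
```

Something like dump_to_text('data.tb', sys.stdout) would then print the pairs as text, provided the typedbytes module is installed.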
Hope this helps,
-Klaas
The problem is that dumbo cat behaves very differently depending on which backend it uses:
1) When you provide a -hadoop /path/to/hadoop option, it uses the hadoop streaming backend, which runs the dumptb command under the hood and converts the resulting typed bytes to text on the fly.
2) When you don't provide a -hadoop option, it uses the local unix backend, which expects the output file to be a text file containing python repr strings. A sequence file or a typed bytes dump is binary, so it cannot be read that way.
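
To make the second case concrete, here is a rough sketch of how such a text line gets parsed. The traceback quoted further down shows dumbo's loadcode calling eval on the two tab-separated fields of each line; the sketch swaps in ast.literal_eval as a safer stand-in, and the function names are hypothetical, not dumbo's.

```python
import ast


def parse_repr_line(line):
    """Parse one 'key<TAB>value' line whose fields are Python repr strings."""
    key_text, value_text = line.rstrip("\n").split("\t", 1)
    return ast.literal_eval(key_text), ast.literal_eval(value_text)


def is_repr_text(line):
    """Return True if the line parses as a repr-encoded key/value pair."""
    try:
        parse_repr_line(line)
    except (SyntaxError, ValueError, TypeError):
        # Binary sequence-file data lands here: either there is no tab to
        # split on, or the fields are not valid Python literals.
        return False
    return True
```

A line such as "'A1920G0009'<TAB>3" parses fine, whereas the "SEQ/org.apache.hadoop.typedbytes..." header of a sequence file does not, which matches the "bad input" warnings in the quoted output.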
-Klaas
On 24 Oct 2011, at 16:06, Bharath Krishnan wrote:
> Hi Klaas,
>
> I ran the streaming job on Amazon & copied one of the output files to
> my local disk from s3.
>
> Then I just called:
>
> dumbo cat filename
>
> I re-read the thread after I emailed you and figured out a way to make
> it work. (It still is confusing to me why dumbo cat wouldn't work).
>
> If I do the following:
>
> hadoop jar hadoop-streaming.jar dumptb part-00000 > data.tb
>
> and then read the file in python using the typedbytes module, it works.
>
> import typedbytes
> for x in typedbytes.PairedInput(open('data.tb', 'rb')).reads():
>     print x
>
> dumbo cat data.tb does not work though.
>
> Thanks!
>
> -bharath
>
>
> On Mon, Oct 24, 2011 at 10:00 AM, Klaas Bosteels
> <klaas.b...@gmail.com> wrote:
>> Hey bharath,
>>
>> It's normal for your output to be saved as sequence files containing typed bytes writables, but you should be able to print it using the dumbo cat command.
>>
>> The output below seems to suggest that you are somehow running dumbo cat as a map reduce job on hadoop. What command are you executing exactly?
>>
>> -K
>>
>> On 24 Oct 2011, at 15:56, bharath wrote:
>>
>>> Hi Klaas,
>>>
>>> I too have similar problems reading dumbo output. The data file starts
>>> like this:
>>>
>>> SEQ/org.apache.hadoop.typedbytes.TypedBytesWritable/
>>> org.apache.hadoop.typedbytes.TypedBytesWritable?l|?7gB?\[]'?_?
>>> A1920G0009_10_G? ??{0?#c?L2613L0024_LB_??&??u[A1812G0???
>>> eA1812G0019_J6_?͛?㭇12_01_??GBv?,?H7705H0061_QD_??GBv?,?
>>> H7704G0033_82_??
>>>
>>> When I try to convert it to text using dumbo cat, I get a bunch of
>>> warnings followed by an error:
>>>
>>> WARNING: skipping bad input (SEQ/
>>> org.apache.hadoop.typedbytes.TypedBytesWritable/
>>> org.apache.hadoop.typedbytes.TypedBytesWritable?l|?7gB?\[]'?_?
>>> A1920G0009_10_G? ??{0?#c?L???eA1812G0019_J6_?͛?㭇812G0019_01_????
>>> wcmA3726M0012_01_??GBv?,?H7705H0061_QD_??GBv?,?H7704G0033_82_??
>>> ?=?B2324H0034_TB_????]M??L2720L0013_2W_????5A??B8267H0005_01_???P?S?
>>> B2720L0023_01_?ɮ??k?B8263F0004_10_??uB?WA1813G0017_J6_??l(ݣ ?
>>> H3827M0002_9I_??L^?d[M1812G0025_NN??????-M1812G0026_NN_??Wn?
>>> M3612M0073_WI_?ǓU+?pB8267C0003_01_??j????M1810G0056_CW_????
>>> 3S2611L0109_7E_??)
>>> reporter:counter:Dumbo,Bad inputs,1
>>>
>>> Traceback (most recent call last):
>>>   File "/usr/local/Cellar/python/2.7.1/bin/dumbo", line 9, in <module>
>>>     load_entry_point('dumbo==0.21.30', 'console_scripts', 'dumbo')()
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/__init__.py", line 32, in execute_and_exit
>>>     sys.exit(dumbo())
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/cmd.py", line 41, in dumbo
>>>     retval = cat(sys.argv[2], parseargs(sys.argv[2:]))
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/cmd.py", line 94, in cat
>>>     return create_filesystem(opts).cat(path, opts)
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/backends/unix.py", line 116, in cat
>>>     return decodepipe(opts + [('file', path)])
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/cmd.py", line 155, in decodepipe
>>>     for output in dumptext(outputs):
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/util.py", line 65, in dumptext
>>>     for output in outputs:
>>>   File "/usr/local/Cellar/python/2.7.1/lib/python2.7/site-packages/dumbo/util.py", line 55, in loadcode
>>>     yield map(eval, input.split('\t', 1))
>>>   File "<string>", line 1
>>>     ?
>>>     ^
>>> SyntaxError: unexpected EOF while parsing
>>>
>>> Any help is much appreciated!
>>>
>>> Thanks,
>>>
>>> -bharath