ByteInputFormat


Alagarsamy Rajamannar

Feb 20, 2014, 10:50:47 AM
to twiste...@googlegroups.com
Hi,

Thanks for such a wonderful application. I enjoyed reading your code and learnt a lot. In fact, I upgraded to the latest SDK, deployed to Azure, and tested all the sample apps. It works very well.

I work for a bioinformatics company, and we have a unique requirement. Every day we produce a raw gene sequence that doesn't fit any of the standard formats like FASTA, GFF, etc.
It is pretty much one byte per character. The size is around 4 GB to 10 GB, and we produce only one file every day.

The file is not delimited; it is essentially a byte array.

I have to scan the entire file to search for the gene patterns listed by our scientists.

How can I achieve this using Twister4Azure?

regards,
Raja

Thilina Gunarathne

Feb 25, 2014, 3:07:09 PM
to twiste...@googlegroups.com
Hi Raja,
I'm glad that you liked the project and thanks for trying it out.

Twister4Azure supports the InputFormat concept, similar to Hadoop. You can write your own custom InputFormat to parse the file and deliver key-value pairs to your map function.
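
To illustrate the idea for an undelimited byte file, here is a minimal Python sketch of how such a file could be cut into key-value records. It is not the Twister4Azure InputFormat API (which is .NET); `split_records`, `record_size`, and `overlap` are hypothetical names chosen for this example, and the data is held in memory only to keep the sketch short. The key is the byte offset, the value is the chunk, and consecutive chunks share a few bytes so that a pattern straddling a boundary is still seen whole by one map call.

```python
from typing import Iterator, Tuple


def split_records(data: bytes, record_size: int, overlap: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, chunk) pairs; consecutive chunks share `overlap` bytes.

    Hypothetical helper for illustration only, not part of Twister4Azure.
    """
    if record_size <= overlap:
        raise ValueError("record_size must be larger than overlap")
    step = record_size - overlap  # how far each new record advances
    offset = 0
    while offset < len(data):
        yield offset, data[offset:offset + record_size]
        if offset + record_size >= len(data):
            break  # this record already reached the end of the data
        offset += step
```

For pattern search, the overlap would need to be at least the length of the longest pattern minus one; a match found in a chunk at position `i` sits at `offset + i` in the original file.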

However, based on your description, it seems you need to read the whole file to identify the patterns. In this case, you can use the existing FileInputFormat, which gives your map function a file path to the input file. You can then use your own tools to process this file from within the map function. The Cap3 sequence assembly sample [1] uses this pattern.
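
As a rough sketch of what the processing inside the map function might look like once it has the file path, the Python below streams the file in fixed-size chunks and records the byte offset of every pattern occurrence. It is a language-neutral illustration, not Twister4Azure code: `find_patterns` and `chunk_size` are hypothetical names, and a real map function would be written in .NET against the project's own interfaces. The only subtlety is carrying the last few bytes of each chunk forward so that matches spanning a chunk boundary are not missed, while matches lying wholly inside the carried-over bytes are skipped to avoid reporting them twice.

```python
from typing import Dict, Iterable, List


def find_patterns(path: str, patterns: Iterable[bytes],
                  chunk_size: int = 64 * 1024 * 1024) -> Dict[bytes, List[int]]:
    """Return {pattern: [file offsets]} without loading the whole file.

    Hypothetical helper for illustration only, not part of Twister4Azure.
    """
    patterns = [p for p in patterns if p]  # ignore empty patterns
    hits: Dict[bytes, List[int]] = {p: [] for p in patterns}
    if not patterns:
        return hits
    keep = max(len(p) for p in patterns) - 1  # bytes carried between chunks
    tail = b""   # end of the previous buffer
    base = 0     # file offset of the first byte of the current buffer
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            for p in patterns:
                i = buf.find(p)
                while i != -1:
                    # A match lying entirely inside `tail` was already
                    # reported while scanning the previous buffer.
                    if i + len(p) > len(tail):
                        hits[p].append(base + i)
                    i = buf.find(p, i + 1)  # allow overlapping matches
            tail = buf[-keep:] if keep else b""
            base += len(buf) - len(tail)
    return hits
```

Memory use stays near `chunk_size` regardless of file size, so a 4 GB to 10 GB input can be scanned on a worker with modest RAM. For a large pattern list, a multi-pattern algorithm such as Aho-Corasick would likely be faster than the repeated `find` calls used here.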

thanks,
Thilina

[1] http://twister4azure.codeplex.com/SourceControl/latest#Samples/Cap3/Cap3Worker/Cap3Sample.csproj


--
http://salsahpc.indiana.edu/twister4azure/
You received this message because you are subscribed to the Google Groups "Twister4Azure" group.
To unsubscribe from this group, send email to twister4azur...@googlegroups.com
 
--
https://www.cs.indiana.edu/~tgunarat/
http://www.linkedin.com/in/thilina