hadoop blast on Mac


Jianwu Wang

Jul 28, 2010, 5:42:15 PM
to VSCSE Big Data for Science 2010
Hi there,

I've followed the Hadoop-Blast tutorial
(http://salsahpc.indiana.edu/tutorial/hadoop.html). It works well on a
local cluster, which is great.

Now I'm trying to run it on my Mac, since it looks like it also works
on Mac (http://salsahpc.indiana.edu/tutorial/hadoopblastex1.html).
The first thing I noticed is that one extra step is needed for 'Install
BLAST DB': I have to decompress the tar.gz files downloaded by the command
'$BLAST_HOME/bin/update_blastdb.pl nr'. Otherwise, the blastx command
won't work.
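
For readers repeating this step, a minimal sketch of the decompression is below. It assumes the downloaded archives all sit in one directory and end in .tar.gz; the function name extract_blast_db is made up for illustration and is not part of BLAST or the tutorial.

```shell
# Extract every .tar.gz archive found in a directory into that same directory.
# Usage: extract_blast_db <directory>
extract_blast_db() {
    dir="$1"
    for archive in "$dir"/*.tar.gz; do
        # Skip the unexpanded pattern when the directory holds no archives.
        [ -e "$archive" ] || continue
        tar -zxf "$archive" -C "$dir" || return 1
    done
}
```

It could be called as `extract_blast_db .` from the directory where update_blastdb.pl was run (that location is an assumption here).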

My question is about running Hadoop-Blast on my Mac. The
steps on http://salsahpc.indiana.edu/tutorial/hadoopblastex3.html are
for Linux machines; for example, BlastProgramAndDB.tar.gz won't work on Mac.
Does anyone know what I should do for '2. Prepare for Hadoop-Blast' and
'3. Execute Hadoop-Blast' on Mac? I'm running Hadoop in
pseudo-distributed mode.

--

Best wishes

Sincerely yours

Jianwu Wang
wangj...@gmail.com
http://users.sdsc.edu/~jianwu/

Assistant Project Scientist
Scientific Workflow Automation Technologies (SWAT) Laboratory
San Diego Supercomputer Center
University of California, San Diego
San Diego, CA, U.S.A.

Stephen TAK-LON WU

Jul 28, 2010, 6:16:30 PM
to vscse-big-data-...@googlegroups.com
Hi Jianwu,

Within BlastProgramAndDB.tar.gz there is a folder "db" that holds an optimized database used for the tutorial.

So, to answer your question, you need to build your own "BlastProgramAndDB.tar.gz" in the Mac environment.

Assuming that:

  • $BLAST_MAC_HOME - the directory where you stored your Blast program in the Mac environment.
    • inside this directory you should have a structure similar to that of the Linux version of Blast

Create a "BlastProgramAndDB_MAC.tar.gz" on Mac OS:

cd ~
mkdir Blast_Linux
cd Blast_Linux
wget http://salsahpc.indiana.edu/tutorial/apps/BlastProgramAndDB.tar.gz
tar -zxvf BlastProgramAndDB.tar.gz .
mv ~/Blast_Linux/db $BLAST_MAC_HOME
cd ~
tar -zcvf BlastProgramAndDB_MAC.tar.gz $BLAST_MAC_HOME

The commands above generate a new Blast archive for the Mac version.

--------------------------

Then line 4 of Section 2 should be changed to

./hadoop fs -copyFromLocal ~/BlastProgramAndDB_MAC.tar.gz BlastProgramAndDB.tar.gz

The line above copies the new Blast archive to HDFS.

After that, all the steps are the same.

Best,
Stephen
--
Stephen Wu
Pervasive Technology Institute
Indiana University, Bloomington

Stephen TAK-LON WU

Jul 28, 2010, 6:41:59 PM
to vscse-big-data-...@googlegroups.com
Jianwu,

Correction as below:

On Wed, Jul 28, 2010 at 6:16 PM, Stephen TAK-LON WU <tak...@indiana.edu> wrote:
Hi Jianwu,

Within BlastProgramAndDB.tar.gz there is a folder "db" that holds an optimized database used for the tutorial.

So, to answer your question, you need to build your own "BlastProgramAndDB.tar.gz" in the Mac environment.

Assuming that:

  • $BLAST_MAC_HOME - the directory where you stored your Blast program in the Mac environment.
    • inside this directory you should have a structure similar to that of the Linux version of Blast

Create a "BlastProgramAndDB_MAC.tar.gz" on Mac OS:

cd ~
mkdir Blast_Linux
cd Blast_Linux
wget http://salsahpc.indiana.edu/tutorial/apps/BlastProgramAndDB.tar.gz
tar -zxvf BlastProgramAndDB.tar.gz .
mv ~/Blast_Linux/db $BLAST_MAC_HOME

Change these two lines:

cd ~
tar -zcvf BlastProgramAndDB_MAC.tar.gz $BLAST_MAC_HOME

to:

cd $BLAST_MAC_HOME
tar -zcvf BlastProgramAndDB_MAC.tar.gz *

because the Hadoop-Blast program needs this archive structure.
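
The difference between the two tar commands is where the entries land inside the archive: archiving $BLAST_MAC_HOME by path nests everything under that directory's path, while archiving * from inside the directory puts "db" and its siblings at the archive root. A rough sketch for checking a finished archive follows. It assumes that Hadoop-Blast expects "db" at the root, which is one reading of the correction rather than something the tutorial states, and the function names are hypothetical.

```shell
# Print the distinct top-level entries of a gzip-compressed tar archive.
top_level_entries() {
    tar -tzf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep -v '^$' | sort -u
}

# Succeed only if "db" is one of the archive's top-level entries.
has_top_level_db() {
    top_level_entries "$1" | grep -qx 'db'
}
```

For example, `has_top_level_db BlastProgramAndDB_MAC.tar.gz && echo "layout looks right"` would print the message only for an archive built from inside the directory.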

The commands above generate a new Blast archive for the Mac version.

--------------------------

Then line 4 of Section 2 should be changed. Instead of

./hadoop fs -copyFromLocal ~/BlastProgramAndDB_MAC.tar.gz BlastProgramAndDB.tar.gz

use

./hadoop fs -copyFromLocal $BLAST_MAC_HOME/BlastProgramAndDB_MAC.tar.gz BlastProgramAndDB.tar.gz

Jianwu Wang

Jul 28, 2010, 7:06:29 PM
to vscse-big-data-...@googlegroups.com, Stephen TAK-LON WU
Hi Stephen,

    Thanks for your reply and correction. I've been able to run it on Mac correctly. :)

Best wishes

Sincerely yours

Jianwu Wang
wangj...@gmail.com
http://users.sdsc.edu/~jianwu/

Assistant Project Scientist
Scientific Workflow Automation Technologies (SWAT) Laboratory
San Diego Supercomputer Center 
University of California, San Diego
San Diego, CA, U.S.A. 

Stephen TAK-LON WU

Jul 28, 2010, 7:08:26 PM
to Jianwu Wang, vscse-big-data-...@googlegroups.com
Cool, good to hear that.
 
Stephen
