Error reading the diamond reference database when running humann2 in parallel

yun li

May 12, 2017, 3:45:48 PM

to HUMAnN Users

Hi all,
I have a problem running diamond in parallel. Here is the code I am using to run humann2 in parallel:

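N=5   # maximum number of humann2 jobs running at once
i=0

(
for filePrefix in ${s[*]:0:10}
    do
        # Before starting each new batch of N jobs, wait for the previous batch to finish
        ((i=i%N)); ((i++==0)) && wait

        (
        # Merge the paired-end reads, run humann2, then delete the merged file on success
        cat "${inputPath}${filePrefix}_pe_1.fastq" "${inputPath}${filePrefix}_pe_2.fastq" > "${mergedPath}${filePrefix}_merged.fastq"
        humann2 --input "${mergedPath}${filePrefix}_merged.fastq" --output "${outputPath}" --taxonomic-profile "path/to/metaphlan/result/${filePrefix}.txt" --remove-temp-output && rm "${mergedPath}${filePrefix}_merged.fastq"
        ) &

    done
)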

And here is the error I got
....
Error message returned from diamond :
diamond v0.8.38.100 | by Benjamin Buchfink <buch...@gmail.com>
Check http://github.com/bbuchfink/diamond for updates.

#CPU threads: 1
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
#Target sequences to report alignments for: 20
Temporary directory: /path/to/temporary/file/tmp9fKTQ2
Opening the database... [161.025s]
Opening the input file... [0.080548s]
Opening the output file... [0.014562s]
Loading query sequences... [118.413s]
Running complexity filter... [701.401s]
Building query histograms... [19.2526s]
Allocating buffers... [0.000668s]
Loading reference sequences... [800.433s]
Error: Error reading file /path/to/refdb/uniref90_ec_1.1/uniref90.ec_filtered.1.1.dmnd
...

I used the same code to run the program half a year ago, and everything worked well. However, after I updated the programs and reran the script, I got this error. If I run the program for one sample only, there is no problem at all.

Thanks,
Yun

Eric Franzosa

May 13, 2017, 6:44:39 PM

to humann...@googlegroups.com

Looks like you're trying to read the protein database from:

/path/to/refdb/uniref90_ec_1.1/uniref90.ec_filtered.1.1.dmnd

Which doesn't look like a real path? You can manually specify the actual path to the protein db in your humann2 call or set it permanently using humann2_config.

Thanks,

Eric
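
The two options mentioned above (passing the database location on the command line, or storing it with humann2_config) could look roughly like the sketch below. The helper name check_db and the sample file names are hypothetical, and the path is the masked placeholder from this thread; the helper only checks that the .dmnd file exists, is readable, and is non-empty before humann2 is started.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of humann2): succeed only if the
# database file exists, is readable, and is not empty.
check_db() {
    local db_file="$1"
    if [ -r "$db_file" ] && [ -s "$db_file" ]; then
        echo "ok: $db_file"
    else
        echo "cannot read: $db_file" >&2
        return 1
    fi
}

# Usage sketch with placeholder paths (replace with the real locations):
# db_dir=/path/to/refdb/uniref90_ec_1.1
# check_db "$db_dir/uniref90.ec_filtered.1.1.dmnd" &&
#     humann2 --input sample_merged.fastq --output out_dir --protein-database "$db_dir"
#
# Or store the folder permanently:
# humann2_config --update database_folders protein "$db_dir"
```

Running the check inside each parallel job, just before humann2, would show whether the file is already unreadable at that moment or only fails later while diamond is loading it.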

yun li

May 15, 2017, 10:57:59 AM

to HUMAnN Users

It is the real path. I just masked the real path name before posting the email. The problem is that when I run humann2 on the samples one by one, there is no problem at all. It only happens when I run it in parallel.

Thanks,
Yun

Eric Franzosa

May 16, 2017, 4:43:00 PM

to humann...@googlegroups.com

Just to confirm, if you run on a single sample _today_ you don't get the error, but you do when multiple samples are run in a loop? What about running the loop with only one sample (just to confirm that the process is entirely the same)?

We routinely run multiple samples against the same database in parallel, so I'd be really surprised if this was some sort of "collision" issue.

Thanks,

Eric
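
One way to make the comparison suggested above (the same loop with one job at a time versus several) easier to interpret is to record which samples fail. The following is only a sketch, assuming bash; run_batch and run_sample are hypothetical names, and the batching mirrors the "wait after every N jobs" idea of the original script rather than reproducing it exactly.

```shell
#!/usr/bin/env bash
# run_batch MAX_JOBS COMMAND SAMPLE...
# Runs "COMMAND SAMPLE" for every sample, at most MAX_JOBS at a time,
# and prints one "FAILED: <sample>" line per job that exits non-zero.
run_batch() {
    local max_jobs="$1" cmd="$2"
    shift 2
    local -a pids=() names=()
    local sample i failed=0
    for sample in "$@"; do
        "$cmd" "$sample" &
        pids+=("$!")
        names+=("$sample")
        # Once the batch is full, wait for every job in it.
        if (( ${#pids[@]} == max_jobs )); then
            for i in "${!pids[@]}"; do
                wait "${pids[$i]}" || { echo "FAILED: ${names[$i]}"; failed=1; }
            done
            pids=()
            names=()
        fi
    done
    # Wait for the final, possibly partial, batch.
    for i in "${!pids[@]}"; do
        wait "${pids[$i]}" || { echo "FAILED: ${names[$i]}"; failed=1; }
    done
    return "$failed"
}

# Usage sketch: run_sample is a hypothetical function wrapping the
# cat + humann2 steps for one file prefix.
# run_batch 1 run_sample "${s[@]:0:10}"   # serial baseline
# run_batch 5 run_sample "${s[@]:0:10}"   # five at a time
```

If the serial run succeeds and the five-at-a-time run reports failures for the same samples, the difference lies in the concurrency rather than in the inputs.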

yun li

May 19, 2017, 1:56:21 PM

to HUMAnN Users

So below is my test code. If I assign N=1, there is no problem at all. However, if I assign N=5, which runs 5 humann2 instances at a time, I get the error "Error reading file /path/to/refdb/uniref90_ec_1.1/uniref90.ec_filtered.1.1.dmnd". Very weird!

Eric Franzosa

May 19, 2017, 4:34:01 PM

to humann...@googlegroups.com, Lauren McIver

Hi Yun,

Lauren (cc'ed) and I just chatted about your situation, and it seems to be an issue with your workflow/environment (e.g. an inadvertent race condition) and not HUMAnN2 per se. If you'd like to reply to me and Lauren individually, we can try to help sort out the issue? Any additional details you can provide in that reply (log files, how you execute your script, etc.) would be helpful.
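
For gathering the kind of details requested above (log files, how each job was executed), one possible approach is to give every job its own output directory and log so that concurrent runs share no paths. This is a sketch under that assumption; run_isolated and humann2_one are hypothetical names, and nothing here is confirmed as the cause of the error in this thread.

```shell
#!/usr/bin/env bash
# run_isolated ROOT SAMPLE COMMAND [ARGS...]
# Runs "COMMAND ARGS SAMPLE OUT_DIR" with a private output directory,
# capturing stdout/stderr in OUT_DIR/run.log and the exit code in
# OUT_DIR/exit_status.
run_isolated() {
    local root="$1" sample="$2"
    shift 2
    local out_dir="$root/$sample"
    local status=0
    mkdir -p "$out_dir" || return 1
    "$@" "$sample" "$out_dir" > "$out_dir/run.log" 2>&1 || status=$?
    echo "$status" > "$out_dir/exit_status"
    return "$status"
}

# Usage sketch: humann2_one is a hypothetical wrapper such as
#   humann2_one() { humann2 --input "${mergedPath}$1_merged.fastq" --output "$2"; }
# run_isolated /path/to/results sampleA humann2_one &
# run_isolated /path/to/results sampleB humann2_one &
# wait
```

Each sample then leaves behind its own run.log and exit_status, which can be attached to a follow-up message.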