Can I use assembled contigs as input to humann2?

coral9...@gmail.com

Sep 24, 2018, 2:06:30 PM

to HUMAnN Users

My question is just as stated in the subject. Here is the background. I have metagenomic shotgun sequences, i.e. whole-genome shotgun sequencing data. The samples come from two groups, say type A and type B. I did a pooled assembly using all the samples and ran some binning algorithms to group the contigs into several larger bins. I also obtained the relative abundance of each bin in each sample. I tested these relative abundances and identified bins whose relative abundance differs significantly between the samples in groups A and B. Now my collaborators want me to do functional annotation of the bins that showed differential abundance. So I wonder whether I could directly use contigs or bins as input for humann2. An obvious reason for not doing this is that the contigs have lost the copy number information, which is actually contained in the relative abundances I computed. I would appreciate any comments on my problem.

Eric Franzosa

Sep 24, 2018, 2:25:07 PM

to humann...@googlegroups.com

The short answer is "no". :-)

If you were to annotate your assembled samples against HUMAnN2's UniRef90/50 databases (or to KOs or ECs by other means), you could in principle use HUMAnN2 to reconstruct corresponding pathways. We do this occasionally to compare mapped vs. assembled samples. However, there is no mode in HUMAnN2 that will take assembled contigs as input.

Thanks,

Eric

coral9...@gmail.com

Sep 24, 2018, 2:32:40 PM

to HUMAnN Users

Hi Eric,

Thank you very much for your fast response! I sort of expected this already, but your answer was still very interesting to me. Could you please expand on it a bit? What do you mean by "mapped samples" (since you said "compare mapped vs. assembled samples")? Also, in particular, how do I "use HUMAnN2 to reconstruct corresponding pathways"? I am sorry if this is too basic a question; I am new to humann2. So far I have used gene prediction and annotation tools like Prokka to annotate the genes. I have not done so yet, but I know that I could use MinPath to find the pathways from these predicted genes. If I were to "use HUMAnN2 to reconstruct corresponding pathways", would that be any different from the Prokka-MinPath analysis I just described?

coral9...@gmail.com

Sep 24, 2018, 2:38:26 PM

to HUMAnN Users

On Monday, September 24, 2018 at 2:25:07 PM UTC-4, Eric Franzosa wrote:

I also have a side question. I set "--threads 20" when calling humann2, as follows,
humann2 --nucleotide-database ~/raw_data/humann2_db_ChocoPhlAn/chocophlan/ \
--protein-database ~/raw_data/humann2_db_ChocoPhlAn/uniref/ \
--input ../60-trimcat.fq --output 60-full \
--metaphlan ~/softwares/metaphlan2 \
--threads 20 >> 60-full-error.log 2>&1

but it seems that bowtie2 is still running on a single thread. I wonder if I am using this option incorrectly.

Eric Franzosa

Sep 24, 2018, 2:45:27 PM

to humann...@googlegroups.com

To your first questions:

* By "mapped" I mean "directly mapping reads to reference sequences" (i.e. what HUMAnN2 is designed for). Assembling reads into contigs is a complementary approach to mapping and is particularly advantageous for identifying new sequences.

* HUMAnN2 can be used as a generic engine for pathway reconstruction. If you give it a file of reaction abundances (e.g. from mapping), it will quantify the pathways in which those reactions participate. If you give it a file of "dummy" abundances (e.g. "RXN123<tab>1") to indicate reaction presence/absence, it will quantify pathway presence/absence.
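
As a rough illustration of the "dummy" abundance file described above, here is a minimal Python sketch that writes one "reaction<tab>1" row per distinct reaction ID. It is not part of HUMAnN2: the function name write_presence_table and the IDs RXN123/RXN456 are hypothetical placeholders, and since the thread does not say whether a header line is expected, none is written.

```python
import io


def write_presence_table(reaction_ids, handle):
    """Write one 'reaction<TAB>1' row per distinct reaction ID.

    The value 1 is a dummy abundance that only marks presence.
    Duplicate IDs are written once, in first-seen order.
    """
    seen = set()
    for rxn in reaction_ids:
        if rxn in seen:
            continue
        seen.add(rxn)
        handle.write(f"{rxn}\t1\n")


# Hypothetical reaction IDs, e.g. collected from an annotated assembly.
buffer = io.StringIO()
write_presence_table(["RXN123", "RXN456", "RXN123"], buffer)
presence_table = buffer.getvalue()
print(presence_table, end="")
```

To produce a file on disk, pass an open file handle instead of the in-memory buffer. Real abundances (e.g. from mapping) would replace the constant 1 in the second column.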

To your second question:

* I believe certain versions of the Bowtie2 indexing program will only ever use one thread. If you're only seeing one thread during mapping (bowtie2-align) then there may be something in your setup preventing multithreading.

Thanks,

Eric

coral9...@gmail.com

Sep 24, 2018, 3:04:55 PM

to HUMAnN Users

Hi Eric,

Thank you for the detailed answers!
Just to clarify, by "reaction abundances", do you mean a file of P rows and N columns, with the (p,n)-th element giving the abundance (say, cpm) of the p-th annotated gene in the n-th sample?
If that is indeed the idea, could you please point me to a page or tutorial on how to construct this input and extract the pathway abundances? Thanks!

Xinlian

heathe...@gmail.com

Oct 8, 2018, 1:49:23 AM

to HUMAnN Users

I am trying to do something similar, namely to use HUMAnN2, or more precisely the databases HUMAnN2 uses, for pathway reconstruction. The problem I run into is that most of my (predicted) CDS are linked to a UniRef ID, but only 20% of those UniRef IDs are linked to a reaction (through metacyc_reactions_level4ec_only.uniref). Is this normal?

Thanks in advance,
Heather

Eric Franzosa

Oct 10, 2018, 1:11:14 PM

to humann...@googlegroups.com

That's consistent with my experience. You might also have success trying MetaCyc's Pathway Tools for annotation, since annotation of assemblies more closely resembles genome annotation than profiling shotgun sequencing reads. That would also avoid the need to go through UniRefs to get to reaction annotations (in case there is information loss in that step).
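
For anyone wanting to repeat the coverage check discussed above (what fraction of the UniRef IDs assigned to predicted CDS are linked to any reaction), a minimal Python sketch follows. It rests on an assumption about the layout of metacyc_reactions_level4ec_only.uniref, namely that each line is tab-delimited, starts with a reaction ID, and lists UniRef IDs among the remaining fields; please verify this against your copy of the file. The function names and all IDs in the example are made up for illustration.

```python
def load_uniref_to_reactions(lines):
    """Build a UniRef ID -> set of reaction IDs mapping.

    ASSUMED layout (not confirmed in this thread): tab-delimited lines,
    reaction ID in the first field, UniRef IDs somewhere after it.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        reaction = fields[0]
        for token in fields[1:]:
            if token.startswith("UniRef"):
                mapping.setdefault(token, set()).add(reaction)
    return mapping


def linked_fraction(cds_to_uniref, uniref_to_reactions):
    """Fraction of distinct UniRef IDs that map to at least one reaction."""
    unirefs = set(cds_to_uniref.values())
    if not unirefs:
        return 0.0
    linked = [u for u in unirefs if u in uniref_to_reactions]
    return len(linked) / len(unirefs)


# Made-up example data standing in for the real mapping file and annotations.
example_lines = [
    "RXN-A\t1.1.1.1\tUniRef90_X1\tUniRef90_X2\n",
    "RXN-B\t2.2.2.2\tUniRef90_X2\n",
]
cds_to_uniref = {
    "cds1": "UniRef90_X1",
    "cds2": "UniRef90_X3",
    "cds3": "UniRef90_X4",
    "cds4": "UniRef90_X5",
    "cds5": "UniRef90_X2",
}
uniref_to_reactions = load_uniref_to_reactions(example_lines)
fraction = linked_fraction(cds_to_uniref, uniref_to_reactions)
print(f"{fraction:.0%} of UniRef IDs are linked to a reaction")
```

With a real file, open it (decompressing first if needed) and pass the file handle as lines. The fraction is computed over distinct UniRef IDs, not over CDS, to match the figure quoted in the question.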