I'm trying to move our lab's human gut shotgun metagenomics pipeline from MOCAT to NGLess. One thing we currently do is concatenate the .faa files for all samples produced by the last of our 3 MOCAT steps (-rtf, -a, -gp) and mine them for enzymes of interest with blastp. We then run featureCounts on a .gff from Prodigal together with the .sam from post-MOCAT bowtie2 alignment of the paired reads, which yields a table of all genes with sample-specific barcodes. This featureCounts.csv is then subset by the headers in the resulting enzymes.faa. The end result is, for each sample, the counts of all genes that code for proteins of interest. How should I approach this with NGLess?
What I have tried/would like to do for each sample:
- Use NGLess to write an orf.fna and out.sam
- Run the .fna through Prodigal to generate .gff and .faa
- Run the .gff and .sam through featureCounts
- Mine the concatenated .faa files
- Subset each sample's .csv to those headers contained in .faa
- Concatenate featureCounts.csv
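To make the subsetting step concrete, here is a minimal Python sketch of what I have in mind. It assumes the featureCounts table is tab-delimited with a leading `#` comment line, and that the first column of the table holds the same identifier as the first whitespace-delimited token of each header in enzymes.faa. Both are assumptions about my files, and the function names are just illustrative.

```python
import csv


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of each FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def subset_counts(count_lines, keep_ids, delimiter="\t"):
    """Return (header, rows) keeping only rows whose first column is in keep_ids.

    Lines starting with '#' are skipped (assumed to be a comment line
    at the top of the count table).
    """
    data_lines = (line for line in count_lines if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter=delimiter)
    header = next(rows, None)
    kept = [row for row in rows if row and row[0] in keep_ids]
    return header, kept


if __name__ == "__main__":
    # Tiny in-memory stand-ins for enzymes.faa and one sample's count table.
    faa = [">gene_1 some description\n", "MKV\n", ">gene_3\n", "MAA\n"]
    counts = [
        "# comment line\n",
        "Geneid\tLength\tsample\n",
        "gene_1\t180\t12\n",
        "gene_2\t300\t0\n",
        "gene_3\t90\t5\n",
    ]
    header, kept = subset_counts(counts, fasta_ids(faa))
    print(header)
    for row in kept:
        print(row)
```

With real files, the same functions can be fed open file handles instead of lists, since both only iterate over lines.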
I appear to have successfully created an .fna and a .sam with an .ngl script (code attached). NGLess ran fine, but featureCounts threw an error about the .sam's headers. Could this be a featureCounts formatting requirement? I will compare the .sam files from our MOCAT pipeline with the NGLess .sam to see if I can spot any differences. Any thoughts on what might cause such a problem? The only mention of featureCounts and NGLess I've been able to find is this post by Luis:
I mainly want to run with featureCounts in order to compare results to the current pipeline. From there, if you know of a better way for me to get this result within NGLess or with a different tool, please let me know!
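For the SAM header comparison I mentioned, this is roughly how I plan to diff the two files: a small Python sketch that only reads the `@`-prefixed header lines and reports which record types (`@HD`, `@SQ`, `@RG`, `@PG`, `@CO`) and which `@SQ` sequence names appear in one file but not the other. The function names are my own, and this only checks the header; it does not validate the alignment records.

```python
from collections import defaultdict


def sam_header(lines):
    """Group SAM header lines by record type, e.g. '@HD', '@SQ', '@PG'."""
    header = defaultdict(list)
    for line in lines:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        fields = line.rstrip("\n").split("\t")
        header[fields[0]].append(fields[1:])
    return dict(header)


def sequence_names(header):
    """Extract the SN: values from the @SQ lines of a parsed header."""
    names = set()
    for fields in header.get("@SQ", []):
        for field in fields:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_headers(a, b):
    """Report record types and @SQ names present in only one header."""
    return {
        "record_types_only_in_a": sorted(set(a) - set(b)),
        "record_types_only_in_b": sorted(set(b) - set(a)),
        "sq_only_in_a": sorted(sequence_names(a) - sequence_names(b)),
        "sq_only_in_b": sorted(sequence_names(b) - sequence_names(a)),
    }


if __name__ == "__main__":
    # Made-up header lines standing in for the two .sam files.
    mocat = ["@HD\tVN:1.0\tSO:unsorted\n", "@SQ\tSN:contig_1\tLN:1000\n"]
    ngless = ["@SQ\tSN:contig_1\tLN:1000\n", "@SQ\tSN:contig_2\tLN:500\n"]
    print(compare_headers(sam_header(mocat), sam_header(ngless)))
```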
Thanks in advance for any help.