NGLess -> featureCounts?

49 views
Skip to first unread message

Josh Sekela

unread,
Jul 15, 2022, 8:31:00 PM7/15/22
to NGLess
Hello,

I'm attempting to transition our lab's human gut shotgun metagenomics pipeline from MOCAT to NGLess. Currently, one thing we do is concatenate the .faa files for all samples resulting from the last of our 3 steps (-rtf, -a, -gp) and mine for enzymes of interest with blastp. We then use featureCounts to process a gff from Prodigal along with the sam resulting from post-MOCAT bowtie2 processing of paired reads, yielding a table of all genes with sample-specific barcodes. This featureCounts.csv is then subsetted by the headers in our resultant enzymes.faa - all to yield the counts of all genes which code for proteins of interest, for each sample. How should I approach this?

What I have tried/would like to do for each sample:
- use NGLess to write an orf.fna and out.sam
- Run the .fna through Prodigal to generate .gff and .faa
- Run the .gff and .sam through featureCounts
- mine the concatenated .faa files
- subset each sample's .csv to those headers contained in .faa
- concatenate featureCounts.csv

I appeared to successfully create an .fna and .sam with an .ngl script. Code is attached. NGLess ran fine but featureCounts threw an error with respect to the .sam's headers. Possibly a featureCounts formatting requirement? I will compare the format of the .sam files from our MOCAT pipeline to the ngless .sam to see if I can spot any differences. Any thoughts on what may be causing such a problem? The only mention of featureCounts and NGLess I've been able to find out there is this post by Luis:

I mainly want to run with featureCounts in order to compare results to the current pipeline. From there, if you know of a better way for me to get this result within NGLess or with a different tool, please let me know!

Thanks in advance for any help.
Josh

gut-demo_mod.ngl

Luis Pedro Coelho

unread,
Jul 18, 2022, 12:44:43 PM7/18/22
to Josh Sekela, NGLess List
Dear Josh,

I have no idea why this would be a problem, but you can replace this step
 
- Run the .gff and .sam through featureCounts

with an NGLess version

ngless "1.4"
input = samfile("my-mapped.sam") # BAM also works
counts = count(input,
                      gff_file="my-gff.gff",
                      features=["gene"]) # adapt as needed
write(counts, ofile="featuresCount.csv")


Does this make sense?

HTH
Luis

Josh Sekela

unread,
Jul 19, 2022, 3:11:00 PM7/19/22
to NGLess
Luis, thanks so much for getting back to me! This is very helpful. I implemented your suggestion, although after generating my .gff through prodigal, an error results when running the script you provided:

[Mon 18-07-2022 23:59]: Script OK. Starting interpretation...
Exiting after fatal error while loading and running script
Data Error (the input data did not conform to NGLess' expectations)
Could not parse GFF line: "k141_708\tProdigal_v2.6.3\tCDS\t1451\t1576\t-0.7\t-\t0\tID=702_4;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.444;conf=55.61;score=0.98;cscore=-3.53;sscore=4.51;rscore=-0.37;uscore=1.76;tscore=1.96;"

Notably, the script proceeds to completion if I pass a truncated version of this .gff (only including the first 90 lines). The output does not exactly seem "correct", but the script does run. Any idea what may be causing this formatting issue?

I couldn't upload the full .gff to this post due to the filesize, so here is a publicly accessible link to the 2 files in my Google Drive.


Let me know if there is anything else I can do to help you help me. Thank you again!

Josh
test_short.gff

Luis Pedro Coelho

unread,
Jul 19, 2022, 7:27:40 PM7/19/22
to Josh Sekela, NGLess List
Hi again Josh,

I think we can make NGLess compute the same as featureCounts, but the defaults are not the same (the count function is very flexible http://ngless.embl.de/Functions.html#count)


Data Error (the input data did not conform to NGLess' expectations)
Could not parse GFF line: "k141_708\tProdigal_v2.6.3\tCDS\t1451\t1576\t-0.7\t-\t0\tID=702_4;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.444;conf=55.61;score=0.98;cscore=-3.53;sscore=4.51;rscore=-0.37;uscore=1.76;tscore=1.96;"

It seems that NGLess was (1) not very helpful with the error messages and (2) assumed that the score was never negative.

I fixed both these issues on git and will release a new version (1.4.2) with this fix by tomorrow

Relevant commit is:


Thanks for the report
Luis

Luis Pedro Coelho

unread,
Jul 19, 2022, 8:27:20 PM7/19/22
to NGLess List
If you use docker, you can run version 1.4.2 already:

docker run nglesstoolkit/ngless:1.4.2 ngless --version

Conda &c will be up soon.

HTH,
Luis

Luis Pedro Coelho | Fudan University | https://luispedro.org

--
You received this message because you are subscribed to the Google Groups "NGLess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.

Josh Sekela

unread,
Jul 21, 2022, 4:00:37 PM7/21/22
to NGLess
Much appreciated, I just updated our Conda install.
Reply all
Reply to author
Forward
0 new messages