Writing a mapped readset to fastq - only 1 file written?

Josh Sekela

Nov 13, 2022, 2:33:12 PM

to NGLess

Hi,

I have been using NGLess in my metagenomics pipeline for several projects. As a part of my NGLess script, I write the post-processing readset to fq.gz with the following code:

input = as_reads(mapped)

write(input, ofile=sample+'/'+sample+'_reads.fq.gz')

Previously, this has resulted in 2 fq.gz files for each sample: _reads.pair.1.fq.gz, and _reads.pair.2.fq.gz.

The most recent time I ran NGLess (v1.4.2), I got only 1 processed file for each sample: _reads.fq.gz.

I don't believe I changed my NGLess version or anything else between these runs. What do you think may be causing this discrepancy? How can I ensure that I produce 2 processed read files per sample?

My only thought is that one sample had only 1 of the 2 raw read files. Perhaps NGLess saw this and reverted to generating only 1 read file per sample, even for the samples that contained both? This seems unlikely to me, as I thought all runs were conducted independently.

Lastly, am I fine to simply proceed with this single read file for generating SAM files and read counts? I will have to retool some of the commands in my pipeline to accommodate this, but I don't think it will pose a problem. I would prefer to re-run NGLess and generate both read files if possible, even if it means throwing out the sample with the missing 2nd raw read file (I'm not sure how that happened).

Thanks in advance for all thoughts and help! Let me know if I can provide any additional information.

Josh

Luis Pedro Coelho

Nov 13, 2022, 11:18:58 PM

to NGLess List

Hi Josh,

In retrospect, this behaviour is maybe "too clever", but that difference is what you would normally expect if some samples are single-end only and others include paired-end data.

Could that be what is happening in your case?
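One quick way to see which samples came out paired and which came out as a single file is to group the written files by sample. The Python sketch below is not part of NGLess; it only relies on the three file-name suffixes mentioned earlier in this thread, and the function name `classify_outputs` and the labels it returns are made up for illustration.

```python
from collections import defaultdict

# Suffixes taken from the output file names quoted in this thread.
PAIRED_SUFFIXES = ("_reads.pair.1.fq.gz", "_reads.pair.2.fq.gz")
SINGLE_SUFFIX = "_reads.fq.gz"


def classify_outputs(filenames):
    """Label each sample as 'paired', 'single', or 'incomplete'.

    Hypothetical helper: the sample name is assumed to be whatever
    precedes one of the known suffixes in the file's base name.
    """
    found = defaultdict(set)
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # drop any directory part
        for suffix in PAIRED_SUFFIXES + (SINGLE_SUFFIX,):
            if base.endswith(suffix):
                found[base[: -len(suffix)]].add(suffix)
                break  # unrelated files simply fall through

    layout = {}
    for sample, suffixes in found.items():
        if set(PAIRED_SUFFIXES) <= suffixes:
            layout[sample] = "paired"      # both pair files are present
        elif suffixes == {SINGLE_SUFFIX}:
            layout[sample] = "single"      # only the single-file output
        else:
            layout[sample] = "incomplete"  # e.g. pair.1 without pair.2
    return layout
```

The list of file names could be collected with something like `glob.glob('*/*_reads*.fq.gz')`, assuming the `sample/sample_reads.fq.gz` layout used in the `write` call above.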

Best

Luis

Luis Pedro Coelho | Fudan University | https://luispedro.org

https://orcid.org/0000-0002-9280-7885

--
You received this message because you are subscribed to the Google Groups "NGLess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ngless+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/ngless/4608b613-8507-4777-94e6-adcae547983fn%40googlegroups.com.

Josh Sekela

Nov 14, 2022, 10:51:01 AM

to NGLess

Luis,

Yes, I think that is likely the case here. However, I tried to troubleshoot this and ran into a hangup. I re-ran one of the samples with paired-end data after deleting the ngless-locks and ngless-stats folders, using a new sample .txt file listing only that single sample. However, this run still produced only 1 processed read file. I don't mind excluding the one sample that was single-end only, but how can I prevent NGLess from continuing to use that "mixed data" setting?

Thanks,

Josh

Luis Pedro Coelho

Nov 14, 2022, 9:15:46 PM

to Josh Sekela, NGLess List

Unfortunately, this is not possible at the moment, but I agree that this would be a good setting to have and I will implement it.

Best

Luis