Reading fasta files (rather than fastq): "Number of lines in FastQ file is not multiple of 4! EOF found"

Vadim Puller

Mar 10, 2021, 3:53:08 AM
to NGLess
Dear NGLess team,

We have been using NGLess to analyze simulated data. The data come from different simulators: some in fastq and others in fasta format (more precisely, as .fq.gz and .fa.gz files), and we assumed by default that the NGLess functions `fastq` and `paired` are capable of recognizing the file format. It is only with the most recent batch of fasta files that we ran into the error message "Number of lines in FastQ file is not multiple of 4! EOF found".

I have seen a related discussion on your GitHub page https://github.com/ngless-toolkit/ngless/issues/115?_pjax=%23js-repo-pjax-container, and the issue is easily remedied by adding extra lines to a file or by converting it to fastq format (while adding fake quality scores). We would, however, like to be on the safe side and ask for a few clarifications:

1. Are the functions `fastq` and `paired` capable of recognizing fasta format? (The error message seems to give a clear answer, but the previous batches of data in fasta format were processed without any error messages, despite lacking some elements of fastq format, such as the + lines and the quality scores.)
2. If these functions treat fasta as if it were fastq, do they still process every record, or only every second one (since fastq has 4 lines per record, while fasta has only 2)? The results obtained with our previous fasta files seem sensible, but we would appreciate a definitive statement from you.
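
To make question 2 concrete, here is a minimal Python sketch of what grouping lines in fours does to a two-line-per-record fasta file. The function `group_in_fours` and the read names are hypothetical and written only for illustration; this is an assumption about how a strict fastq reader might behave, not a description of NGLess internals.

```python
from itertools import islice


def group_in_fours(lines):
    """Group lines into 4-line chunks, as a naive fastq reader might (assumption)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            # Mirrors the spirit of the reported error, not NGLess's actual code.
            raise ValueError("Number of lines is not a multiple of 4")
        yield chunk


# Four fasta records, two lines each (made-up data).
fasta_lines = [">r1", "ACGT", ">r2", "TTGA", ">r3", "GGCC", ">r4", "CATG"]
records = list(group_in_fours(fasta_lines))

for header, seq, plus_slot, qual_slot in records:
    # The second fasta record of each pair lands in the "+" and quality slots.
    print(header, seq, "| + slot:", plus_slot, "| quality slot:", qual_slot)
```

Under this assumption, every second fasta record would be consumed as the separator and quality lines of the preceding one. A file with an even number of two-line records would pass the line-count check, while one with an odd number would fail it.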

Sincerely,
Vadim.

Luis Pedro Coelho

Mar 11, 2021, 11:10:35 PM
to Vadim Puller, NGLess List
Dear Vadim,

In principle, yes, the functions should only read fastq files. I would expect that most of the uses would quickly trigger an error downstream. Frankly, I am more surprised that it worked than anything else. What exact downstream steps were you taking?
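
Since the functions are meant to read fastq only, one way to stay on the safe side is to convert fasta input before handing it to NGLess, as mentioned in the original question. The following is a minimal Python sketch, not an official NGLess tool: the function name `fasta_to_fastq` and the choice of `I` (Phred 40 in the Phred+33 encoding) as a constant placeholder quality are assumptions made for illustration.

```python
import gzip


def fasta_to_fastq(fasta_path, fastq_path, qual_char="I"):
    """Rewrite a gzipped fasta file as gzipped fastq with placeholder qualities."""

    def write_record(out, name, seq_parts):
        seq = "".join(seq_parts)
        # One fake quality character per base; these carry no real information.
        out.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")

    with gzip.open(fasta_path, "rt") as src, gzip.open(fastq_path, "wt") as out:
        name, seq_parts = None, []
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if name is not None:
                    write_record(out, name, seq_parts)
                name, seq_parts = line[1:], []
            else:
                # Sequences wrapped over several lines are joined into one.
                seq_parts.append(line)
        if name is not None:
            write_record(out, name, seq_parts)
```

A call such as `fasta_to_fastq("reads.fa.gz", "reads.fq.gz")` (hypothetical file names) would produce a file with exactly four lines per record. Any downstream step that filters or trims on quality would then be acting on the placeholder values, so such steps should be interpreted with care.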

Best
Luis

Luis Pedro Coelho | Fudan University | http://luispedro.org