modifying the FASTQ>FASTA script


Jeswin

Mar 26, 2015, 12:46:19 PM
to diy...@googlegroups.com
Hi all,
Once again, thanks for helping me with my Python issue. Let me just
point out that my programming skills are very limited (script kiddie at
best) and I don't have much time nowadays to sit down and really learn
Python. I know the basics.

Anyway, my boss wanted to convert FASTQ to FASTA. The only way is
through a script, and I settled on Python. I found a script online that
works:
================================================
#!/usr/bin/env python

#Takes a single FASTQ file and splits to .fasta + .qual files
import sys
from Bio import SeqIO

if len(sys.argv) == 1:
    print "Please specify a single .fastq file to convert."
    sys.exit()

filetoload = sys.argv[1]
basename = filetoload

#Chop the extension to get names for output files
if basename.find(".") != -1:
    basename = '.'.join(basename.split(".")[:-1])

SeqIO.convert(filetoload, "fastq", basename + ".fasta", "fasta")
SeqIO.convert(filetoload, "fastq", basename + ".qual", "qual")
================================================
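
The extension-chopping step above splits on every "." in the path, so a dot in a directory name can cut the output name short. The standard library's os.path.splitext only treats a dot in the last path component as an extension. A small comparison, where the two helper names and the example paths are hypothetical:

```python
import os.path


def chop_extension_split(path):
    """The script's approach: drop whatever follows the last '.' anywhere in the path."""
    if path.find(".") != -1:
        path = ".".join(path.split(".")[:-1])
    return path


def chop_extension_splitext(path):
    """os.path.splitext only treats a dot in the final path component as the extension."""
    root, _ext = os.path.splitext(path)
    return root


for p in ["reads.fastq", "sample.R1.fastq", "data/run.1/reads"]:
    print("%s -> %s vs %s" % (p, chop_extension_split(p), chop_extension_splitext(p)))
```

Both agree for "reads.fastq" and "sample.R1.fastq", but for "data/run.1/reads" the split-based version returns "data/run" while splitext leaves the path unchanged.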

I'm thinking about adding 2 features to it for the convenience of my
colleagues. I know you all don't like leading people step by step, so
I'm fine if you can just point me in the right direction (simplest and
fastest solutions).

[1] I would like to add something that shows the progress (bar, text,
etc.) of the conversion so that people at the computer know it's
working and not frozen. Maybe read the size of the output file every
30 seconds (first the FASTA file, then the QUAL file)?

[2] I am not sure whether the script can process more than one file
(sequentially) given as arguments. So far I have only run
"fastq_to_fasta.py file1.fastq". Can I run "fastq_to_fasta.py
file1.fastq file2.fastq"? Basically, I am not sure whether Python can
do that.
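
Both features can be sketched with the standard library alone. This is not the Biopython script above: it swaps SeqIO.convert for a hand-rolled converter that assumes plain four-line FASTQ records (no wrapped sequence or quality lines, which Bio.SeqIO does handle), and it writes each sequence on a single line, whereas Biopython's FASTA writer wraps long sequences, so the output will likely not be byte-identical. fastq_to_fasta() prints a percentage every report_every reads, based on bytes read versus the input's size, and main() loops over every file named on the command line, so "fastq_to_fasta.py file1.fastq file2.fastq" converts them one after another. The function names and report_every are illustrative; the code should run on Python 2.7 and 3.

```python
import os


def fastq_to_fasta(in_path, out_path, report_every=100000):
    """Convert one FASTQ file to FASTA and return the number of reads.

    Assumes plain four-line FASTQ records (no wrapped lines). Prints progress
    every `report_every` reads as the percentage of the input read so far.
    """
    total = os.path.getsize(in_path) or 1  # avoid dividing by zero on an empty file
    done = 0
    count = 0
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        while True:
            header = fin.readline()
            if not header:
                break  # end of file
            if not header.strip():
                done += len(header)
                continue  # tolerate blank lines, e.g. a trailing one
            seq = fin.readline()
            plus = fin.readline()
            qual = fin.readline()
            if not header.startswith(b"@") or not plus.startswith(b"+"):
                raise ValueError("%s: malformed FASTQ record near byte %d" % (in_path, done))
            fout.write(b">" + header[1:])  # FASTA header: '@' becomes '>'
            fout.write(seq)                # sequence line copied as-is
            done += len(header) + len(seq) + len(plus) + len(qual)
            count += 1
            if count % report_every == 0:
                print("  %s: %d reads, %.1f%% done" % (in_path, count, 100.0 * done / total))
    return count


def main(args):
    """Convert every FASTQ file named in args, one after another."""
    if not args:
        print("Usage: fastq_to_fasta.py file1.fastq [file2.fastq ...]")
        return 1
    for in_path in args:
        base, _ext = os.path.splitext(in_path)  # keeps the directory, like the original script
        n = fastq_to_fasta(in_path, base + ".fasta")
        print("Done: %s (%d reads)" % (base + ".fasta", n))
    return 0
```

To run it as a script, end the file with an `if __name__ == "__main__":` guard that calls `sys.exit(main(sys.argv[1:]))` (after `import sys`). Reporting only every 100,000 reads should keep the cost of the progress text small, unlike printing on every record.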

BTW, I got the script from:
http://nebc.nerc.ac.uk/nebc_website_frozen/nebc.nerc.ac.uk//tools/code-corner/scripts/sequence-formatting-and-other-text-manipulation

Thanks

--
In necessariis unitas, in dubiis libertas, in omnibus caritas.
-Marco Antonio Dominis

Gavin Scott

Mar 26, 2015, 1:37:31 PM
to diy...@googlegroups.com
Not to distract from your Python explorations, but at some point you
might find it interesting to explore what you can do with the free
Galaxy bioinformatics workflow service:

http://galaxyproject.org/

Going through their tutorial is worthwhile and enlightening.

G.
> --
> -- You received this message because you are subscribed to the Google Groups DIYbio group. To post to this group, send email to diy...@googlegroups.com. To unsubscribe from this group, send email to diybio+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/diybio?hl=en
> Learn more at www.diybio.org
> ---
> You received this message because you are subscribed to the Google Groups "DIYbio" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to diybio+un...@googlegroups.com.
> To post to this group, send email to diy...@googlegroups.com.
> Visit this group at http://groups.google.com/group/diybio.
> To view this discussion on the web visit https://groups.google.com/d/msgid/diybio/CAAhF0RKiRhWGYQ%2B%2Bu06xZ3mqTf9eF4UD0Oh2w2aOd0cOXQAWDw%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Jeswin

Mar 28, 2015, 2:07:07 PM
to diy...@googlegroups.com
I tried to add some kind of progress indicator, but every attempt made
the conversion script 10x slower. So I just added simple print
messages, because that's the best I could do. I was also told to use
os.path, because it has real basename and splitext functions. I also
added a prompt asking the user to check that there is enough room on
the HDD, because it seems users don't delete old files or move them to
an archive. I compared the outputs of the original and my modified
script and didn't find any differences.
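
One way to get a progress indicator without slowing the conversion is the first message's own idea of checking the output file's size every 30 seconds: a background thread polls the file while SeqIO.convert runs unchanged in the main thread, so no extra work is done per record. A minimal sketch, assuming hypothetical helpers watch_file_size() and run_with_progress() that take the conversion as a zero-argument callable; it should run on Python 2.7 and 3:

```python
import os
import threading


def watch_file_size(path, stop_event, interval=30.0):
    """Print the size of `path` every `interval` seconds until stop_event is set."""
    # Event.wait() returns True as soon as the event is set, which ends the loop;
    # otherwise it times out after `interval` seconds and we print a report.
    while not stop_event.wait(interval):
        if os.path.exists(path):
            mb = os.path.getsize(path) / (1024.0 * 1024.0)
            print("  %s: %.1f MB written so far" % (path, mb))


def run_with_progress(convert, out_path, interval=30.0):
    """Run convert() in this thread while a background thread reports out_path's size."""
    stop = threading.Event()
    watcher = threading.Thread(target=watch_file_size,
                               args=(out_path, stop, interval))
    watcher.daemon = True  # never keeps the program alive by itself
    watcher.start()
    try:
        return convert()
    finally:
        stop.set()  # wakes the watcher immediately, even if convert() raised
        watcher.join()
```

For example: `run_with_progress(lambda: SeqIO.convert(filetoload, "fastq", basename + ".fasta", "fasta"), basename + ".fasta")`. The reported size only includes data already flushed to disk, so it may lag slightly, and because the final FASTA size isn't known in advance this shows growth rather than a percentage.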

Can someone check my modified script to make sure I didn't miss
anything? Is there anything I can do better? This is the best I can do
with my abilities. It's for use with Python 2.7 (through Anaconda on
Windows). So far I have only tested it on my Linux machine; hopefully it
works on Windows when I get back to work next week.

=======================CODE============================
#!/usr/bin/env python

#Takes a single FASTQ file and converts it to a .fasta file
#(.qual output is currently disabled, see below)
import sys
import os.path
from Bio import SeqIO

#Disk space Checking
print "Warning! Please check that there is at least 50GB of Free Disk Space per file conversion."
DiskCheck = raw_input('Is there enough space? (yes or no) ').strip().lower()
if DiskCheck == 'yes':

    if len(sys.argv) == 1:
        print "Please specify a single .fastq file to convert."
        sys.exit()

    filetoload = sys.argv[1]

    #BETTER WAY: Chop the extension to get names for output files
    #(os.path.basename also drops the directory, so output is written
    #to the current working directory)
    basename, extension = os.path.splitext(os.path.basename(filetoload))

    print "\nWorking on", basename
    print "Don't close this window."

    SeqIO.convert(filetoload, "fastq", basename + ".fasta", "fasta")

    #QUAL file creation disabled
    #SeqIO.convert(filetoload, "fastq", basename + ".qual", "qual")

    print "\nDone converting", basename, "to FASTA format."

else:
    #Any answer other than 'yes' ('no', a typo, ...) stops here
    #instead of silently doing nothing
    print "\nMake room on disk, then run script again"
    sys.exit()
=======================CODE============================
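
The yes/no disk prompt in the script above could be replaced by an automatic check. This sketch uses a hypothetical enough_space_for() helper and assumes Python 3.3 or later, since shutil.disk_usage does not exist in 2.7. It also assumes each FASTA output is no larger than its FASTQ input (the "+" and quality lines are dropped); writing QUAL files as well would need more room, which is what the safety_factor parameter is for.

```python
import os
import shutil


def enough_space_for(in_paths, out_dir=".", safety_factor=1.0):
    """Return (ok, needed_bytes, free_bytes) for converting in_paths into out_dir.

    Assumes each FASTA output is no larger than its FASTQ input, so the summed
    input size times safety_factor is taken as the space needed.
    Needs Python 3.3+ (shutil.disk_usage).
    """
    needed = int(sum(os.path.getsize(p) for p in in_paths) * safety_factor)
    free = shutil.disk_usage(out_dir).free
    return needed <= free, needed, free
```

In the modified script this might look like `ok, needed, free = enough_space_for([filetoload], ".")`, exiting with a message when `ok` is False. "." is the directory to check there, because os.path.basename makes the script write its output to the current working directory.
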


