Running GNUMAP with FASTQ


Casey

Nov 11, 2013, 4:26:44 PM
to gnumap...@googlegroups.com
I'm running GNUMAP with a FASTQ file of 900k 100 bp reads as the input, i.e.:
mpiexec -n 5 -f machinesfile ./gnumap --MPI_largemem --fast -g hg19.fa -o output -a .9 -p -v 1 --illumina reads.fq

Strangely, it reports that no matches were found:
#Finished.
# Total Time: 1483.28 seconds.
# Found 900027 sequences.
# Sequences matched: 0
# Sequences not matched: 900027
# Output written to output_0.sam

Did I do something wrong with the command?

Please advise. Thanks a lot.
Casey

Nathan Clement

Nov 11, 2013, 6:10:03 PM
to gnumap...@googlegroups.com
I can't really tell you anything without knowing a lot more about your data. One thing you could try as a sanity check is to decrease the alignment score flag to its lowest value: -a 0. If that still doesn't turn anything up, there's something else wrong.



Casey

Nov 11, 2013, 7:49:10 PM
to gnumap...@googlegroups.com
Maybe this is the reason: the FASTQ file that I'm using is actually a simulated file, generated by wgsim. wgsim cuts the sequences out of the FASTA file and adds artificial quality values (all values are I). I now realize that GNUMAP uses only the quality values (am I right?) and ignores the reads/sequences, so all of the reads must be meaningless to the program.

Please correct me if I'm wrong.

Casey

Nathan Clement

Nov 11, 2013, 8:07:05 PM
to gnumap...@googlegroups.com
You are correct. If the quality scores are very poor, GNUMAP cannot align anything. If you send the first 10 or so lines of your file, I can look at it and tell you if there is anything obvious.


As an aside, I don't think you want to include the --illumina flag. Illumina used to have a different kind of FASTQ output (quality characters offset by 64 instead of 33), but they don't do that anymore.
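To see why the offset matters for a file whose quality string is all "I", here is a minimal sketch of the arithmetic. It assumes the standard Phred encoding (quality = ASCII code minus offset) and that --illumina makes the reads be interpreted with the older 64 offset, as described above. The helper names phred_score and error_probability are illustrative only and are not part of GNUMAP.

```python
def phred_score(qual_char: str, offset: int) -> int:
    """Decode one FASTQ quality character using the given ASCII offset."""
    return ord(qual_char) - offset


def error_probability(score: int) -> float:
    """Convert a Phred score Q to the base-call error probability 10^(-Q/10)."""
    return 10 ** (-score / 10)


# The simulated reads use 'I' (ASCII 73) for every base.
for label, offset in (("offset 33", 33), ("offset 64", 64)):
    q = phred_score("I", offset)
    print(f"{label}: Q = {q}, error probability = {error_probability(q):.4f}")
```

With an offset of 33 the character "I" decodes to Q40 (a very confident base call), while with an offset of 64 the same character decodes to Q9, which is a much poorer score.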

Nathan

Casey

Nov 11, 2013, 8:18:15 PM
to gnumap...@googlegroups.com
Below is a sample from the input. As you can see, the quality values are constant (since they are artificial):

@0_2336334_2336433_0:1:0_0:1:0_0/1
GAAACATCAAATGGAGGCGGGAAATAGGCTGGGGCCGAGCTGAGGGGCTGAACACAGCAGTGACCGTGGGTCAGCAGGGCGCCTGCCCAGCAGGCCCCCC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@0_7158910_7159009_0:1:0_0:1:0_1/1
GCACCAGTCCAGGGCTCATGTCCCTGGCACAAGAGCTGAGGGTTGGCCTCCATCCCACCCCTCCTCACTTCTTGGGGCCAGGAGTGCAAGTTCCCGTTGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@0_57967035_57967134_0:0:0_0:0:0_2/1
TCTATTATTTCTAGTGCCTCTATTGGGTGTTTAGGTGGGTTCTCCTGACTTAACCTGGGCTCACTCAGGCAGGTGCTTTCAGCTGGAGGCTCAGCTGAGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@0_114699484_114699583_0:0:0_0:0:0_3/1
AGCACTCTAGGATTTTACATGAGGGAGAAAAGAGGAGGCTGGGGCAAAGGGAGGAGGGAGGCAGCCTTTCCTGGCTCTGAAGCTGAGGGTGGTTTTACAG
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@0_43326360_43326459_0:0:0_0:0:0_4/1
GTGCTCAGTGGCCCCCAGGCATTCAAATTATGTTGGCTCCCCATTAGCACCCCAAATTAGGTGACAAATACCAGTCCCTTAGGCAGTTCTCCCAAAAGGC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@0_10552185_10552284_0:3:0_0:3:0_5/1
CTTATAATTTAAAAGGTACTAAAGGGTTTATGATGAAAGAGACTTATTTGGCACTGTTTACTGTCTGAAGTTTTTCCCTCTGGTAAATACTAGCGATTCA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

I'm running with some real data right now, but it's much bigger so I guess I will let you know when I wake up tomorrow :P

Casey

Casey

Nov 11, 2013, 8:22:27 PM
to gnumap...@googlegroups.com
By the way, I also ran the simulated input with "-a 0", and GNUMAP still didn't align anything. Just FYI.



Nathan Clement

Nov 12, 2013, 6:14:14 PM
to gnumap...@googlegroups.com
I ran your data against the human chr21, and with -a 0 it returned matches for all 6 sequences. You might need to check the format of your input genome.
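One rough way to check the genome file's format is sketched below. It assumes a plain-text FASTA file containing only A, C, G, T and N (upper or lower case); files that use other IUPAC codes would be flagged by this check even though they are valid FASTA. The function name check_fasta and the problems it reports are illustrative, not anything GNUMAP provides.

```python
from typing import Iterable, List, Tuple

# Assumption: the reference uses only these symbols (other IUPAC codes get flagged).
VALID_BASES = set("ACGTNacgtn")


def check_fasta(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (sequence names, problems found) for FASTA text given line by line."""
    names: List[str] = []
    problems: List[str] = []
    for number, raw in enumerate(lines, start=1):
        if raw.endswith("\r\n"):
            problems.append(f"line {number}: Windows (CRLF) line ending")
        line = raw.rstrip("\r\n")
        if not line:
            continue  # blank lines are skipped
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "")
        elif not names:
            problems.append(f"line {number}: sequence data before any '>' header")
        elif not set(line) <= VALID_BASES:
            problems.append(f"line {number}: unexpected characters in sequence")
    return names, problems


# Small in-memory example; the second header line carries a CRLF ending.
names, problems = check_fasta([">chr21 example\n", "ACGTNNacgt\n", ">chrM\r\n", "AC-GT\n"])
print(names)
print(problems)
```

To run it on a real file, pass the open file object, for example `check_fasta(open("hg19.fa", newline=""))`. The `newline=""` argument keeps Python from translating line endings, so CRLF endings can still be detected.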

Casey

Nov 13, 2013, 10:32:10 AM
to gnumap...@googlegroups.com
I removed the --fast flag, and then it returned matches.