input sequence error

13 views
Skip to first unread message

changjin

unread,
Mar 4, 2011, 8:58:24 PM3/4/11
to gnumap-users
hi,

I have a sample set of bisulfite-treated sequence reads. Gnumap
reported "there is an error in the input sequence read file". I am
wondering if you can check the sequence reader call in gnumap. Or,
there is something wrong in input reads.


input read file
---------------------
@HWI-ST141_0318:7:4:18131:146671#0/1
AAAGTAGGGGAAGAAATTTTTGCGGTAGAGGGGATTTAGAAAAATGTTGTAGAAAAAAGCGAGGTATTTATAATATTTTTTTCGTTATAAGTTTTGTGTTG
+HWI-ST141_0318:7:4:18131:146671#0/1
fffedggeeggggggggggggggggeggdgfffcegggggggegggeggegggegfdcfgedfe
\eeeeegggggggggfggggggeggdggdggeeeefa
@HWI-ST141_0318:7:2:19077:163792#0/1
TATATTTGTATAGTAGAAGAATGTATAAATTAGGGTTGTGGGGGAGGCGTAAAGAAGGAAGATTTTGGTTGCGGGATGTAGGGATGTTGGGACGGAATTTG
+HWI-ST141_0318:7:2:19077:163792#0/1
gggggggggegegefecfgeeggdegcebggdgggaggaeegecbcecgcd\`ec
\ddadgddegdad_abdddc^cba_de_[aaab_dcaZdSV\bb_a
@HWI-ST141_0318:7:1:4464:167283#0/1
TGATTAGTAGCGTTTATGTTGTAGAAAGGTTTAGAAGATATTAATGGAAAGGGTTAATTGAATTAGTTAATTAGGAAGTTTATTAGTGAATTTGGAGAGTA
+HWI-ST141_0318:7:1:4464:167283#0/1
gggggggggggggggggfggfgdgefggfcgggggggggggggeggebegeeecedddedd`ggbe_ecdged`ecce`egagdcecdcceee[eaV_^[Q
@HWI-ST141_0318:7:3:2081:105369#0/1
GTTGTTAGTGTGATAGGATATATTTGTATAGTAGAAGAATGTATAAATTAGGGTTGTGGGGGAGGCGTAAAGAAGGAAGATTTTGGTTGCGGGATGTAGGG
+HWI-ST141_0318:7:3:2081:105369#0/1
gggggfdfbfdfdedffdadccfffdbcdbfcdeadecdegccea`adYadedZa``c_c\ZVUZVa
\aa]bZT^]OV]X`Y``BBBBBBBBBBBBBBBBB
@HWI-ST141_0318:7:2:16003:26480#0/1
TTTAAGATAATGAGATATTAGTGTAAAGGTTTTGTGGTAGGAAGGTTTGGTATGTATAAGGAAGAGAAAAGAGTTAAATATAGTGGGTTAGGGGATAGGAG
+HWI-ST141_0318:7:2:16003:26480#0/1
gggdfgfgcgggagdggggafdfece_ecegfggggggafd
\bbd_cccd]Y`c]]bb[^aSYa_a[``Z^U^_bU[\X\`BBBBBBBBBBBBBBBBBBBB
@HWI-ST141_0318:7:4:8019:53385#0/1
ATAGAGATATTTTTTAAAAAAAAAAAAAGTGGTTTTTTTGGTTTGAGATAGGGGAGGAAGGGATGAATAGGGAGAAGGATTTTTAGGGTAGTGAAAGAATT
+HWI-ST141_0318:7:4:8019:53385#0/1
eede\d_dddgggggfgfcggggcedddZJ`_X^^eeccb_^`ccTaJ[X`c`cMd_Xcc``I
\MXXZHNVKVOFOLJRSddacadcS^BBBBBBBBBBBB
@HWI-ST141_0318:7:1:6544:28867#0/1
TATGGGAAGAGAAGTGGATGAAGGAAGAAAATTAGAGGAGAATTTAGTGTAGATAGAGAGAATCGATGAGGGGTTTTTTGAGTTATGAAAATGTGAAAGAT
+HWI-ST141_0318:7:1:6544:28867#0/1
ffffffcffdccac`eeZeeVeeeddbdd_c^eaedecaeaacceadbcYeaddZ]V]VaX`bZba
\`_]^]DJZVVXULVdad[``\`X`b`]b[bZcbW
@HWI-ST141_0318:7:5:14257:28527#0/1
TTAGAGGTTTTTATTAGTGTGGTATGTATAGGAAACATAAGTAGTTGGGTATGGTTGGAGTATAGTATATTGGGGGAAGTGGTAGGAGATGGGATTTGAGT
+HWI-ST141_0318:7:5:14257:28527#0/1
ggggdggcgggggggegegeffceegeeeggggggg[cgdeaaeadccb[addd^ddd
\dS_^U`[_a[bdadbddNbbY\aM^BBBBBBBBBBBBBBBBB
-------------------




input script
----------
Command Line Arguments: /fslgroup/fslg_genome/software/gnumap_MPI/bin/
gnumap -g /fslhome/hongcj92/fsl_groups/fslg_maker/compute/bisulfite/
genome/chr20.fa -o /fslhome/h
ongcj92/fsl_groups/fslg_maker/compute/bisulfite/seqfiles/7795X1/gnumap/
test/debug_seq -p -a .9 -c 1 /fslhome/hongcj92/fsl_groups/fslg_maker/
compute/bisulfite/seqfiles/7
259X1/debug_seq.txt -m 14 -j 14 -S /fslhome/hongcj92/fsl_groups/
fslg_maker/compute/bisulfite/scripts/pwm.CtoT -b -0 --illumina
------------------


log file
----------------------
This is GNUMAP, Version 2.2.4, for public and private use.
[0/1] gen_size=0, my_start=0, my_end=0
Size of genome: 63025520
[0/-] Converting to Vector...
[0/-] Trying to create hash of size 268435456
Trying to create entire array of size 475975560 (59496945 total
positions of size 8)
[0/-] Finished create hash.
[0/-] Stats: Total hashes is 59496945, Longest hash is 220279,
shortest is 1, and average is 0.221643
[0/-] Trying to create a new genome with a size of
63025520...Success!
[0/-] Trying to malloc 7878190 elements for positions
array...Success!
[0/-] Finished Vector Conversion
Matching 1 file(s):
Bad file is /fslhome/hongcj92/fsl_groups/fslg_maker/compute/bisulfite/
seqfiles/7259X1/debug_seq.txt
ERROR(inc/SeqManager.h:313):
!!src/SeqReader.cpp:182 Invalid File
In file /fslhome/hongcj92/fsl_groups/fslg_maker/compute/bisulfite/
seqfiles/7259X1/debug_seq.txt
Reads per processor: 1024

#Finished.
# Total Time: 28.9787 seconds.
# Found 0 sequences.
# Sequences matched: 0
# Sequences not matched: 0
# Output written to /fslhome/hongcj92/fsl_groups/fslg_maker/compute/
bisulfite/seqfiles/7795X1/gnumap/test/debug_seq.sam
---------------------

Nathan Clement

unread,
Mar 5, 2011, 1:02:57 AM3/5/11
to gnumap...@googlegroups.com
That error comes when the program couldn't open your sequence file. I'm guessing it's an issue on your end. Does the file have read permissions? Another alternative might be that the file doesn't exist. Is it typed correctly? Unfortunately, the input script you've given has line breaks that I'm guessing aren't in the actual program, so I can't look at it any further.

Let me know if these suggestions don't help.

Nathan

W. Evan Johnson

unread,
Mar 5, 2011, 12:21:23 PM3/5/11
to gnumap...@googlegroups.com, Spencer Clement
Nathan,

The script was written by Spencer--Spencer will you let Nathan know where the .sh file is?

Evan

changjin hong

unread,
Mar 5, 2011, 2:07:24 PM3/5/11
to gnumap...@googlegroups.com
Nathan,

Yes, I found the input file was not in the directory in the command. I
made a mistake. It works now. Also, it will be great if gnumap reports
"input sequence file does not exist in the directory or unknown
directory or file" for this case, Thanks, Nathan

Actually, the sample reads were captured from a big seq read file,
because a log file from the original experiment reports "Found error
...atteming to resolve...". Refer to [a1]
For a run command, refer to [a2]
I am wondering if this is related to some buffering option or do you
have any clue?

[a2]===============
Command Line Arguments:
/fslgroup/fslg_genome/software/gnumap_MPI/bin/gnumap -g
/fslhome/hongcj92/fsl_groups/fslg_maker/compute/bisulfite/genome/chr20.fa -o
/fslhome/hongcj92/fsl_groups/fslg_maker/compute/bisulfite/seqfiles/7795X1/gnumap/1_7_1

-p -a .9 -c 1 /fslhome/hongcj92

/fsl_groups/fslg_maker/compute/bisulfite/seqfiles/7795X1/7795X1_101213_SN141_0318_B80EE9ABXX_7_1.txt
-m 14 -j 25 -S /fslhome/hongcj92
/fsl_groups/fslg_maker/compute/bisulfite/scripts/pwm.CtoT -b -0 --illumina

[a1]===============
--Found error...atteming to resolve...
--ERROR at sequence
GGATGAAGGAAGAAAATTAGAGGAGAATTTAGTGTAGATAGAGAGAATCGATGAGGGGTTTTTTGAGTTATGAA
AATGTGAAAGAT (name[s] formatted incorrectly)
Trying to recover...
Reads per processor: 1024
--Found error...atteming to resolve...
--ERROR at sequence TGGGGGAAGTGGTAGGAGATGGGATTTGAGT (name[s] formatted
incorrectly)
Trying to recover...
Reads per processor: 1024
--Found error...atteming to resolve...
--ERROR at sequence 8:7:2:16003:26480#0/1 (name[s] formatted incorrectly)
Trying to recover...
Reads per processor: 1024
--Found error...atteming to resolve...
--ERROR at sequence
ggdcce_dbdda^cddddadbZebccebbcc`_]cadaX_aaS^ZaTa\X\aaS_]`bVZ_[ZUTTZV`]_`\a
daac^Z_cBBBBBBBBBBB (name[s] formatted incorrectly)
Trying to recover...
Reads per processor: 1024
--Found error...atteming to resolve...
--ERROR at sequence 141_0318:7:4:18131:146671#0/1 (name[s] formatted
incorrectly)
Trying to recover...
Reads per processor: 1024
===============

Nathan Clement

unread,
Mar 5, 2011, 3:49:53 PM3/5/11
to gnumap...@googlegroups.com
The error you're seeing with the "Found error...attempting to resolve..." occurs when your sequence file is not formatted correctly. My guess is that you either have missing lines or something of the wrong format. Can you tell me what format it is in? Perhaps you could attach a file containing a sample of sequences.

For your information, the specifications for fastq format can be found here: http://maq.sourceforge.net/fastq.shtml

Nathan

W. Evan Johnson

unread,
Mar 5, 2011, 4:15:00 PM3/5/11
to gnumap...@googlegroups.com
Format should be illumina fastq (.qseq).

Nathan Clement

unread,
Mar 5, 2011, 4:55:48 PM3/5/11
to gnumap...@googlegroups.com
Unfortunately, GNUMAP doesn't support qseq (different from Illumina fastq), which is the cause of the error.

Is qseq the standard for Illumina data? If so, I'll implement it in GNUMAP.

In the meantime, here's a link to a Python script that can convert between the two. It's just a function and looks like it could be done much simpler, but I'll leave that to someone better with Python.


Nathan

W. Evan Johnson

unread,
Mar 5, 2011, 5:02:37 PM3/5/11
to gnumap...@googlegroups.com
Okay, maybe it's not the same. I thought illumine qseq was the same as illumina fastq.
Reply all
Reply to author
Forward
0 new messages