I have a subset of an indexed MiSeq run, obtained by extracting reads from Tablet after they were aligned against a reference sequence using BWA/samtools. The resulting file is in FASTA format and, as near as I can tell, unpaired. I'm attempting to perform a de novo assembly of this small (3300 kb) FASTA dataset with SOAPdenovo, but SOAPdenovo cannot read the FASTA file and gets stuck in an endless loop while importing the data in the pregraph step. This is a small-scale run to work out the bugs before scaling up to a larger data set.
1) My data appears as:
>M00542_7_000000000-A3C86_1_2103_28825_15020_pos=33_len=89
GTGGATTCACAATCCACTGCCTTGATCCACTTGGCTACATCCGCCCCTTATCCAGCTAAAGGATTTTTTTCTTTTTTCC
ATTGATCATT
>M00542_7_000000000-A3C86_1_1102_26169_15631_pos=125_len=90
CTATTTATTCTGACCTCCGTACTTCGATCGAGATATTGGACATAGAATGCCACTCTTTAAAAAGGAAAAAAGGAGTAAT
CAGCTGTGACA
...
up to ~14,000 reads
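To rule out a formatting problem in the input, here is a minimal Python sketch that joins the wrapped sequence lines of each record into one line and compares the resulting length with the `len=` field in the header (89 and 90 in the two reads shown above). Whether the line wrapping has anything to do with the import loop is only an assumption to test. The function names and the output file passed on the command line are hypothetical.

```python
import re
import sys

# Matches the "len=NN" field seen in the read headers above.
LEN_FIELD = re.compile(r"len=(\d+)")


def read_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unwrap_and_check(lines):
    """Return (records, mismatches).

    records    : list of (header, sequence) with the sequence on one line
    mismatches : list of (header, declared_len, actual_len) where the
                 header's len= field disagrees with the sequence length
    """
    records, mismatches = [], []
    for header, seq in read_fasta(lines):
        match = LEN_FIELD.search(header)
        if match and int(match.group(1)) != len(seq):
            mismatches.append((header, int(match.group(1)), len(seq)))
        records.append((header, seq))
    return records, mismatches


# Hypothetical usage: python unwrap_fasta.py input.fasta output.fasta
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh:
        records, mismatches = unwrap_and_check(fh)
    with open(sys.argv[2], "w") as out:
        for header, seq in records:
            out.write(f">{header}\n{seq}\n")
    longest = max((len(seq) for _, seq in records), default=0)
    print(
        f"{len(records)} reads, longest {longest} bp, "
        f"{len(mismatches)} length mismatches",
        file=sys.stderr,
    )
```

If the read count comes out near 14,000 and the longest read is below `max_rd_len`, the FASTA itself is probably well formed, and the unwrapped copy can be tried as an alternative input.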
2) My config file is:
#maximal read length
max_rd_len=260
[LIB]
#average insert size
avg_ins=420
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#in which order the reads are used while scaffolding
rank=1
#fasta file (unpaired ends)
f=/home/fasta/1700-sorted_bam.fasta
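Another thing worth ruling out is a path problem. The sketch below pulls the read-file entries out of a config like the one above and reports any that are missing or empty. It assumes read files are given as `key=value` lines with the keys `f`, `q`, `f1`, `f2`, `q1`, `q2` or `p`. The function names are hypothetical, and this is not part of SOAPdenovo itself.

```python
import os

# Config keys assumed to point at read files.
READ_KEYS = {"f", "q", "f1", "f2", "q1", "q2", "p"}


def read_file_entries(config_lines):
    """Return (key, path) pairs for every read-file line in the config."""
    entries = []
    for raw in config_lines:
        line = raw.strip()
        # Skip blanks, comments, section markers such as [LIB], and
        # anything that is not a key=value line.
        if not line or line.startswith("#") or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in READ_KEYS:
            entries.append((key.strip(), value.strip()))
    return entries


def missing_or_empty(entries):
    """Return the entries whose path is not a regular file or has size 0."""
    return [
        (key, path)
        for key, path in entries
        if not os.path.isfile(path) or os.path.getsize(path) == 0
    ]
```

Calling `missing_or_empty(read_file_entries(open("soap-fasta.config")))` should return an empty list if every listed file exists and has content.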
3) And the output before the run hangs looks like:
Version 2.04: released on July 13th, 2012
Compile Apr 25 2013 16:59:53
********************
Pregraph
********************
Parameters: pregraph -s soap-fasta.config -K 63 -R -o testfasta
In soap-fasta.config, 1 lib(s), maximum read length 260, maximum name length 256.
8 thread(s) initialized.
Import reads from file:
/home/fasta/1700-sorted_bam.fasta
--- 100000000th reads.
--- 200000000th reads.
--- 300000000th reads.
...
--- 108400000000th reads.
...and so on until I "kill -9" the run.