I'm working with a HiSeq run and the latest version of ABySS (1.2.6), and
it is crashing with this problem:
Reading `nippo_52-1.fa'...
Finding overlaps of exactly k-1 bp...
V=3970578 E=4887678 E/V=1.23097
Degree: ▂█▅
01234
0: 16% 1: 49% 2-4: 35% 5+: 0.00013% max: 5
Finding overlaps of fewer than k-1 bp...
V=3970578 E=4927497 E/V=1.2
Degree: ▂█▅
01234
0: 16% 1: 49% 2-4: 35% 5+: 0.007% max: 61
Bubbles: 52691 Popped: 47245 Too long: 0 Too many: 1 Dissimilar: 5449
The minimum coverage of single-end contigs is 1.34746.
The minimum coverage of merged contigs is 3.74419.
Consider increasing the coverage threshold parameter, c, to 3.74419.
Reading from standard input...
Reading target `nippo_52-3.fa'...
Read 369284666 bases, 1828614 contigs, 1828614 scaffolds from
`nippo_52-3.fa'. Expecting 276025352 k-mer.
Reading target `nippo_52-3.fa'...
Read 100000 contigs. Hash load: 13122012 / 1073741824 = 0.0122208 using
627 MB.
Read 200000 contigs. Hash load: 26243276 / 1073741824 = 0.024441 using
1.24 GB.
Read 300000 contigs. Hash load: 39132132 / 1073741824 = 0.0364446 using
1.87 GB.
Read 400000 contigs. Hash load: 52019007 / 1073741824 = 0.0484465 using
2.46 GB.
Read 500000 contigs. Hash load: 65062681 / 1073741824 = 0.0605943 using
3.05 GB.
Read 600000 contigs. Hash load: 77949832 / 1073741824 = 0.0725964 using
3.63 GB.
Read 700000 contigs. Hash load: 91125950 / 1073741824 = 0.0848677 using
4.19 GB.
Read 800000 contigs. Hash load: 104137968 / 1073741824 = 0.096986 using
4.75 GB.
Read 900000 contigs. Hash load: 117216551 / 1073741824 = 0.109166 using
5.3 GB.
Read 1000000 contigs. Hash load: 130183408 / 1073741824 = 0.121243 using
5.86 GB.
Read 1100000 contigs. Hash load: 143063707 / 1073741824 = 0.133238 using
6.39 GB.
Read 1200000 contigs. Hash load: 156129839 / 1073741824 = 0.145407 using
6.93 GB.
Read 1300000 contigs. Hash load: 169002481 / 1073741824 = 0.157396 using
7.47 GB.
Read 1400000 contigs. Hash load: 182058947 / 1073741824 = 0.169556 using
8.03 GB.
Read 1500000 contigs. Hash load: 195152207 / 1073741824 = 0.18175 using
8.58 GB.
Read 1600000 contigs. Hash load: 208166192 / 1073741824 = 0.19387 using
9.12 GB.
Read 1700000 contigs. Hash load: 221105979 / 1073741824 = 0.205921 using
9.67 GB.
Read 1800000 contigs. Hash load: 250061182 / 1073741824 = 0.232888 using
10.9 GB.
Read 1828614 contigs. Hash load: 276024987 / 1073741824 = 0.257068 using
12 GB.
Found 365 (0.000132234%) duplicate k-mer.
Reading
`/lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/5142_1_1.fastq'...
Reading
`/lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/5142_1_2.fastq'...
Reading
`/lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/5982_1_1.fastq'...
Reading
`/lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/5982_1_2.fastq'...
Read 3 alignments. Hash load: 3 / 5 = 0.6 using 0 B.
Read 6 alignments. Hash load: 6 / 11 = 0.545455 using 0 B.
Read 12 alignments. Hash load: 12 / 23 = 0.521739 using 0 B.
Read 24 alignments. Hash load: 24 / 47 = 0.510638 using 0 B.
Read 48 alignments. Hash load: 48 / 97 = 0.494845 using 0 B.
Read 98 alignments. Hash load: 98 / 199 = 0.492462 using 0 B.
Read 200 alignments. Hash load: 200 / 409 = 0.488998 using 0 B.
Read 410 alignments. Hash load: 410 / 823 = 0.498177 using 0 B.
Read 890 alignments. Hash load: 824 / 1741 = 0.473291 using 135 kB.
Read 2474 alignments. Hash load: 1742 / 3739 = 0.4659 using 401 kB.
error: duplicate read ID `HS18_5982:1:1101:1106:1969/1'
warning: the seed-length should be at least twice k: k=52, s=100
nippo_52-3.hist: No such file or directory
make: *** [nippo_52-3.dist] Error 1
make: *** Deleting file `nippo_52-3.dist'
farm2-head2[as9]71: more nippo_52-3.hist
nippo_52-3.hist: No such file or directory
The commands issued were:
AdjList -v -k52 -m30 nippo_52-1.fa >nippo_52-1.adj
PopBubbles -v -k52 -p0.9 -g nippo_52-3.adj nippo_52-1.fa nippo_52-1.adj \
    >nippo_52-1.path
MergeContigs -k52 -o nippo_52-3.fa nippo_52-1.fa nippo_52-1.adj \
    nippo_52-1.path
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
    nippo_52-1.path nippo_52-1.fa >nippo_52-indel.fa
KAligner -v -i -j8 -k52 \
    /lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/*.fast* \
    nippo_52-3.fa \
    |ParseAligns -v -k52 -h nippo_52-3.hist \
    |sort -snk3 -k4 \
    |gzip >nippo_52-3.sam.gz
gunzip -c nippo_52-3.sam.gz \
    |DistanceEst -v -j8 -k52 -s100 -n10 -o nippo_52-3.dist nippo_52-3.hist
I searched the ABySS list and only found reports of duplicated IDs when
converting to ACE files. Any ideas are welcome.
Cheers.
--
Alejandro Sanchez-Flores
Team133 Parasite Genomics
Wellcome Trust Sanger Institute
Cambridge, UK.
Please report the output of...
grep HS18_5982:1:1101:1106:1969 /lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/{5142,5982}_1_{1,2}.fastq
head /lustre/scratch103/sanger/as9/ABYSS_results/NIPPO/genomic/reads/mouse_deriv/{5142,5982}_1_{1,2}.fastq
Cheers,
Shaun
Mystery solved: the read ID was indeed duplicated in the /2 reads file.
The HiSeq runs here at Sanger are now stored as BAM files, and there was a
problem with the pipeline that generates the FASTQ reads.
The problem is fixed now. It is the kind of thing where you sometimes
think, "That can't be wrong..."
Cheers.
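
For anyone who hits the same "duplicate read ID" error, here is a minimal sketch of a check that can be run on the FASTQ files before starting the alignment stage. It is not part of ABySS; the function name dup_read_ids is made up for this example. It assumes plain four-line FASTQ records (no wrapped sequence lines) and takes the read ID to be the first whitespace-delimited token of each header line. Header lines are picked by position rather than with grep '^@', because a quality line can also start with '@'.

```shell
# Hypothetical helper, not an ABySS tool: print every read ID that occurs
# more than once across the given FASTQ files.
# Assumes four-line FASTQ records, so the header is line 1, 5, 9, ... of
# each file (FNR restarts at 1 for every input file).
dup_read_ids() {
    awk 'FNR % 4 == 1 { print substr($1, 2) }' "$@" | sort | uniq -d
}

# Example use (paths are placeholders):
#   dup_read_ids reads_1.fastq reads_2.fastq
```

An empty result means no ID is repeated. The sort | uniq -d form is used instead of an in-memory awk array so that sort can spill to disk on a full HiSeq lane; it will still take a while on files of that size.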