pe1-3.hist is empty...

Shuiquan

May 15, 2012, 4:41:42 PM
to ABySS
I keep getting the same error with ABySS 1.3.3:
error: the histogram `pe1-3.hist' is empty
make: *** [pe1-3.dist] Error 1
make: *** Deleting file `pe1-3.dist'

ABySS works fine on other sequencing data, so I think the problem is in the raw data. Maybe ABySS does not recognize the read identifiers? They look OK to me, though: they end with :1 and :2. Moreover, read-mapping tests confirm that the raw data are definitely paired.

The following test was run on a small subset of the original data.

The command: abyss-pe k=41 n=10 v=-v name=CF50_minor lib='pe1' pe1='Pair.fasta' &>abyss.log

The reads look like this:
>110110_ccCPB772:3:78:3213:15827:0:1
TATAATCATTAAAAGTTCCTGTAAATAAATACTAGGAACTTTTAGTATGTTATATTGACCTTAATCTAAATATAGT
>110110_ccCPB772:3:78:3213:15827:0:2
AAATTTCTAAAGCTTGTTCTCCGGTATCTGGTTGAGATACTATTAGATTATCTATATCTACACCTAATGCTTTAGC
>110110_ccCPB772:3:4:19035:7244:0:1
TCTTATATTGAATAAACTTCATATATGCATATAAAAAAGCAACCATTTGAGATATAACTGTAGCCACAGCTGCTCC
>110110_ccCPB772:3:4:19035:7244:0:2
AAATGGAAACGGGCAATTTTTAACAGAAAACATATGGAAGCTATTATTAAGATTTTCTATACCAGCCATACTTTCG
>110110_ccCPB772:3:98:17808:9960:0:1
TGGCTGTATACCTATACCTCTTAATTCTTTAACTGAATGTTGAGTAGGCTTGGTTTTCAATTCTCCTGATTTTTTT
>110110_ccCPB772:3:98:17808:9960:0:2
TTTATGAATACAAAATATATATTTGTAACAGGGGGAGTAGTATCTTCATTAGGAAAGGGAATAACAGCTGCTTCAT
.....


The log:
which: no mpirun in (/usr/local/amos-3.0.0/bin:/usr/local/MUMmer3.22:/usr/local/qt/bin:/usr/lib64/qt-3.3/bin:/usr/NX/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/bin:/home/shuiquan/bin)
ABYSS -k41 -q3 -v --coverage-hist=coverage.hist -s CF50_minor-bubbles.fa -o CF50_minor-1.fa Pairs.fasta
ABySS 1.3.3
ABYSS -k41 -q3 -v --coverage-hist=coverage.hist -s CF50_minor-bubbles.fa -o CF50_minor-1.fa Pairs.fasta
Reading `Pairs.fasta'...
Read 100000 reads. Hash load: 3155366 / 4355707 = 0.724 using 191 MB
Read 200000 reads. Hash load: 6016173 / 8844859 = 0.68 using 367 MB
Read 300000 reads. Hash load: 8643483 / 8844859 = 0.977 using 496 MB
Read 388401 reads. Hash load: 10801889 / 17961079 = 0.601 using 674 MB
`Pairs.fasta': discarded 11599 reads shorter than 41 bases
Loaded 10801889 k-mer
Hash load: 10801889 / 11200489 = 0.964 using 620 MB
Minimum k-mer coverage is 23
Coverage: 23 Reconstruction: 194
Coverage: 9.59 Reconstruction: 5189
Coverage: 3.32 Reconstruction: 392229
Coverage: 1.73 Reconstruction: 1896724
Coverage: 1.41 Reconstruction: 10801889
Coverage: 1 Reconstruction: 10801889
Using a coverage threshold of 1...
The median k-mer coverage is 1
The reconstruction is 10801889
The k-mer coverage threshold is 1
Setting parameter e (erode) to 2
Setting parameter E (erodeStrand) to 0
Setting parameter c (coverage) to 2
Generating adjacency
Finding adjacent k-mer: 1000000
Finding adjacent k-mer: 2000000
Finding adjacent k-mer: 3000000
Finding adjacent k-mer: 4000000
Finding adjacent k-mer: 5000000
Finding adjacent k-mer: 6000000
Finding adjacent k-mer: 7000000
Finding adjacent k-mer: 8000000
Finding adjacent k-mer: 9000000
Finding adjacent k-mer: 10000000
Added 21098990 edges.
Eroding tips
Eroded 8686217 tips.
Eroded 0 tips.
Hash load: 2115672 / 2144977 = 0.986 using 548 MB
Pruning tips shorter than 1 bp...
Removed 1211 marked k-mer.
Pruned 1211 k-mer in 1211 tips.
Pruning tips shorter than 2 bp...
Removed 2634 marked k-mer.
Pruned 2634 k-mer in 2500 tips.
Pruning tips shorter than 4 bp...
Removed 8497 marked k-mer.
Pruned 8497 k-mer in 4701 tips.
Pruning tips shorter than 8 bp...
Removed 32375 marked k-mer.
Pruned 32375 k-mer in 9796 tips.
Pruning tips shorter than 16 bp...
Removed 137242 marked k-mer.
Pruned 137242 k-mer in 21569 tips.
Pruning tips shorter than 32 bp...
Removed 543936 marked k-mer.
Pruned 543936 k-mer in 44304 tips.
Pruning tips shorter than 41 bp...
Removed 584952 marked k-mer.
Pruned 584952 k-mer in 32717 tips.
Pruning tips shorter than 41 bp...
Removed 129 marked k-mer.
Pruned 129 k-mer in 15 tips.
Pruning tips shorter than 41 bp...
Pruned 116813 tips in 8 rounds.
Hash load: 804696 / 834181 = 0.965 using 555 MB
Marked 1057 edges of 499 ambiguous vertices.
Removing low-coverage contigs (mean k-mer coverage < 2)
Found 804440 k-mer in 10801 contigs before removing low-coverage contigs.
Removed 314308 k-mer in 4104 low-coverage contigs.
Split 577 ambigiuous branches.
Hash load: 490388 / 520241 = 0.943 using 555 MB
Eroding tips
Eroded 60 tips.
Eroded 0 tips.
Hash load: 490328 / 520241 = 0.943 using 555 MB
Pruning tips shorter than 1 bp...
Removed 47 marked k-mer.
Pruned 47 k-mer in 47 tips.
Pruning tips shorter than 2 bp...
Removed 67 marked k-mer.
Pruned 67 k-mer in 55 tips.
Pruning tips shorter than 4 bp...
Removed 128 marked k-mer.
Pruned 128 k-mer in 60 tips.
Pruning tips shorter than 8 bp...
Removed 199 marked k-mer.
Pruned 199 k-mer in 63 tips.
Pruning tips shorter than 16 bp...
Removed 255 marked k-mer.
Pruned 255 k-mer in 44 tips.
Pruning tips shorter than 32 bp...
Removed 215 marked k-mer.
Pruned 215 k-mer in 17 tips.
Pruning tips shorter than 41 bp...
Removed 72 marked k-mer.
Pruned 72 k-mer in 3 tips.
Pruning tips shorter than 41 bp...
Pruned 289 tips in 7 rounds.
Hash load: 489345 / 520241 = 0.941 using 555 MB
Popping bubbles
Removed 10 bubbles.
Removed 10 bubbles
Marked 229 edges of 108 ambiguous vertices.
Left 256 unassembled k-mer in circular contigs.
Assembled 488620 k-mer in 6366 contigs.
Removed 10312544 k-mer.
The signal-to-noise ratio (SNR) is -13.2 dB.
AdjList -v -k41 -m30 CF50_minor-1.fa >CF50_minor-1.adj
Reading `CF50_minor-1.fa'...
Finding overlaps of exactly k-1 bp...
V=12732 E=335 E/V=0.0263
Degree: ?
01234
0: 98% 1: 0.98% 2-4: 0.7% 5+: 0% max: 4
Finding overlaps of fewer than k-1 bp...
V=12732 E=379 E/V=0.0298
Degree: ?
01234
0: 98% 1: 1.3% 2-4: 0.7% 5+: 0% max: 4
abyss-filtergraph -v -k41 -g CF50_minor-2.adj CF50_minor-1.adj >CF50_minor-1.path
Loading graph from file: CF50_minor-1.adj
Graph stats before:
V=12732 E=379 E/V=0.0298
Degree: ?
01234
0: 98% 1: 1.3% 2-4: 0.7% 5+: 0% max: 4
Removing shim contigs from the graph...
Pass 1: Checking 50 contigs.
Pass 2: Checking 8 contigs.
Shim removal stats:
Removed: 25 Too Complex: 31 Tails: 6308 Too Long: 2 Self Adjacent: 0 Parallel Edges: 0
Graph stats after:
V=12682 E=329 E/V=0.0259
Degree: ?
01234
0: 98% 1: 1.1% 2-4: 0.48% 5+: 0.032% max: 17
PopBubbles -v -j2 -k41 -p0.9 -g CF50_minor-3.adj CF50_minor-1.fa CF50_minor-2.adj >CF50_minor-2.path
Reading `CF50_minor-2.adj'...
V=12682 E=329 E/V=0.0259
Degree: ?
01234
0: 98% 1: 1.1% 2-4: 0.48% 5+: 0.032% max: 17
Reading `CF50_minor-1.fa'...
Bubbles: 3 Popped: 2 Scaffolds: 0 Complex: 0 Too long: 0 Too many: 0 Dissimilar: 1
V=12628 E=271 E/V=0.0215
Degree: ?
01234
0: 99% 1: 0.74% 2-4: 0.45% 5+: 0.032% max: 17
MergeContigs -v -k41 -o CF50_minor-3.fa CF50_minor-1.fa CF50_minor-2.adj CF50_minor-2.path
Reading `CF50_minor-2.adj'...
Read 12682 vertices. Using 1.77 MB of memory.
Reading `CF50_minor-1.fa'...
Read 6341 sequences. Using 3.31 MB of memory.
Reading `CF50_minor-2.path'...
Read 25 paths. Using 3.31 MB of memory.
The minimum coverage of single-end contigs is 2.
The minimum coverage of merged contigs is 2.05455.
Consider increasing the coverage threshold parameter, c, to 2.05455.
n     n:200  n:N50  min  N80  N50  N20   max    sum
6314  237    49     200  223  331  1573  10022  89422  CF50_minor-3.fa
" $1]=1;">
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
CF50_minor-2.path CF50_minor-1.fa >CF50_minor-indel.fa
ln -sf CF50_minor-3.fa CF50_minor-unitigs.fa
abyss-map -v -j2 -l41 Pairs.fasta CF50_minor-3.fa \
|abyss-fixmate -v -h pe1-3.hist \
|sort -snk3 -k4 \
|DistanceEst -v -j2 -k41 -l41 -s200 -n10 -o pe1-3.dist pe1-3.hist
Reading from standard input...
Reading `CF50_minor-3.fa'...
Reading `CF50_minor-3.fa'...
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Read 832 kB in 6314 contigs.
Using 8.22 MB of memory and 9.88 B/bp.
Read 3 alignments. Hash load: 3 / 5 = 0.6 using 135 kB.
Read 6 alignments. Hash load: 6 / 11 = 0.545455 using 135 kB.
Read 12 alignments. Hash load: 12 / 23 = 0.521739 using 135 kB.
Read 24 alignments. Hash load: 24 / 47 = 0.510638 using 135 kB.
Read 48 alignments. Hash load: 48 / 97 = 0.494845 using 135 kB.
Read 98 alignments. Hash load: 98 / 199 = 0.492462 using 135 kB.
Read 200 alignments. Hash load: 200 / 409 = 0.488998 using 135 kB.
Read 410 alignments. Hash load: 410 / 823 = 0.498177 using 135 kB.
Read 824 alignments. Hash load: 824 / 1741 = 0.473291 using 270 kB.
Read 1742 alignments. Hash load: 1742 / 3739 = 0.4659 using 569 kB.
Read 3740 alignments. Hash load: 3740 / 7517 = 0.497539 using 975 kB.
Read 7518 alignments. Hash load: 7518 / 15173 = 0.495485 using 2 MB.
Read 15174 alignments. Hash load: 15174 / 30727 = 0.493833 using 3.62 MB.
Read 30728 alignments. Hash load: 30728 / 62233 = 0.493757 using 7 MB.
Read 62234 alignments. Hash load: 62234 / 126271 = 0.492861 using 14 MB.
Read 126272 alignments. Hash load: 126272 / 256279 = 0.492713 using 28.4 MB.
Read 256280 alignments. Hash load: 256280 / 520241 = 0.492618 using 57.4 MB.
Mapped 46076 of 400000 reads (11.5%)
Mapped 46076 of 400000 reads uniquely (11.5%)
Read 400000 alignments
Mateless 400000 100%
Unaligned 0
Singleton 0
FR 0
RF 0
FF 0
Different 0
Total 400000
error: the histogram `pe1-3.hist' is empty
make: *** [pe1-3.dist] Error 1
make: *** Deleting file `pe1-3.dist'


Shuiquan

May 15, 2012, 5:57:40 PM
to ABySS
I have figured out why: it is the identifier. After I changed the suffixes from ":1" and ":2" to "/1" and "/2", it worked fine.
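The suffix rename described above can be done with a one-line `sed` filter. This is a minimal sketch, assuming every FASTA header ends in `:1` or `:2` with nothing after it (no trailing description or carriage return), as in the example reads in the first post; the function name `fix_mate_suffix` and the output file name `Pair.fixed.fasta` are hypothetical.

```shell
#!/usr/bin/env bash
# Rewrite the mate suffix ":1"/":2" at the end of FASTA header lines
# to "/1"/"/2". Sequence lines are passed through unchanged.
# Assumes the suffix is the last thing on the header line.
fix_mate_suffix() {
  sed -E '/^>/s|:([12])$|/\1|'
}

# Hypothetical usage:
#   fix_mate_suffix < Pair.fasta > Pair.fixed.fasta
```

The converted file can then be passed to abyss-pe in place of the original, e.g. pe1='Pair.fixed.fasta'. Restricting the substitution to lines starting with `>` ensures that sequence lines are never touched.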

However, I remember that previous versions of ABySS also worked with the ":1" suffix, right?

Shaun Jackman

May 16, 2012, 1:21:50 PM
to Shuiquan, ABySS
Hi Shuiquan,

Yes, you’re right: the identifier must end in /1 and /2. Previous versions of ABySS handled a variety of suffixes, but the community seems to have settled on the /1 and /2 convention, and ABySS (abyss-fixmate) now supports only that suffix.
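Since abyss-fixmate expects the /1 and /2 convention, a quick check before rerunning abyss-pe can catch headers that do not follow it. The awk sketch below is not part of ABySS; it assumes an interleaved FASTA file like the one in the first post, where each /1 record is immediately followed by its /2 mate. The function name `check_pairs` is hypothetical.

```shell
#!/usr/bin/env bash
# Report how many FASTA headers lack a /1 or /2 suffix and how many
# records have no adjacent mate.
# Assumes interleaved input: each NAME/1 header is followed, after its
# sequence line(s), by the NAME/2 header of its mate.
# Prints a one-line summary; exits 1 if any problem is found.
check_pairs() {
  awk '
    /^>/ {
      n++
      # Only the read ID (first field) is checked.
      if ($1 !~ /\/[12]$/) { bad++; next }
      name = substr($1, 2, length($1) - 3)  # drop ">" and the "/N" suffix
      mate = substr($1, length($1), 1)      # "1" or "2"
      if (mate == "1") {
        if (have) unpaired++                # previous /1 never got its /2
        pending = name; have = 1
      } else if (have && pending == name) {
        have = 0                            # /2 matches the preceding /1
      } else {
        unpaired++                          # /2 without a matching /1
      }
    }
    END {
      if (have) unpaired++                  # trailing /1 with no /2
      printf "headers=%d bad_suffix=%d unpaired=%d\n", n, bad, unpaired
      exit (bad > 0 || unpaired > 0)
    }'
}
```

For example, `check_pairs < Pair.fasta` on the original `:1`/`:2` headers should report every header under `bad_suffix` and exit with status 1, while the same reads after the suffix rename should report zero for both counts.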

Cheers,
Shaun