Re: Error Executing Abyss


Tony Raymond

Aug 24, 2012, 1:32:49 PM
to ricardom, abyss...@googlegroups.com
Hi Ricardo,

Looks like the problem happened earlier in the ABySS run. "error: the histogram `contig.fa-3.hist' is empty" suggests that the alignments from the previous stage couldn't be mated, or that no read pairs aligned to the same contig. Please make sure that the read names in each file are labeled <read_id>/1 and <read_id>/2 for forward/reverse pairs.
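If it helps, here is a minimal Python sketch of that pairing check (the function name and example headers are just placeholders, not part of ABySS):

```python
# Minimal sketch: verify that mate-pair headers share a read ID and
# end in /1 (forward) and /2 (reverse). Not ABySS code.

def check_pairing(headers1, headers2):
    """Return True if every header pair shares an ID and ends in /1, /2."""
    for h1, h2 in zip(headers1, headers2):
        id1, _, suf1 = h1.rpartition('/')
        id2, _, suf2 = h2.rpartition('/')
        if suf1 != '1' or suf2 != '2' or id1 != id2:
            return False
    return True

# Properly labeled mates pass; a mismatched ID or suffix fails.
fwd = ['@HWI-ST1054:100:7:1201:1247:2107/1']
rev = ['@HWI-ST1054:100:7:1201:1247:2107/2']
print(check_pairing(fwd, rev))  # True
```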

If that doesn't fix your problem, please rerun abyss-pe with the verbose option (all the 'v's in v=-v should be lowercase) and send me the complete log file from your run.

Thanks,
Tony

On 2012-08-24, at 10:07 AM, ricardom wrote:

> Hi group,
>
>
> I'm starting to use abyss for genome assembly and I'm having trouble running a first test on my Ubuntu desktop. I installed abyss correctly - along with the additional packages (Boost and Google sparsehash) - and checked the input files, but every run of abyss returns the following error:
>
>
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> gunzip -c contig.fa-3.sam.gz \
> |DistanceEst -j2 -k64 -s200 -n10 -o contig.fa-3.dist contig.fa-3.hist
> error: the histogram `contig.fa-3.hist' is empty
> make: *** [contig.fa-3.dist] Error 1
> make: *** Deleting file `contig.fa-3.dist'
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> I'm using the following command line: abyss-pe k=64 name=contig.fa V=-v in='Carol8_CGATGT_L007_R2_001.fastq Carol8_CGATGT_L007_R2_002.fastq'
>
>
> Could you help me to solve this problem?
>
>
>
> Kind Regards,
>
> Ricardo.

Tony Raymond

Aug 24, 2012, 8:35:03 PM
to Ricardo Milanez Fonseca, abyss-users
Hi Ricardo, (cc'd to abyss-users)

Your original reads actually look fine to me. ABySS will parse the read ID and the trailing portion (1:N:0…) properly. The issue is different from what I originally guessed.

The first stage of the assembly is removing most of the k-mers in your data as though they were errors, and then only a handful of reads are able to map back to the assembly. Do you know how much coverage you have? i.e., what are the read length and expected genome size?

Thanks,
Tony

On 2012-08-24, at 5:10 PM, Ricardo Milanez Fonseca wrote:

Hi Tony,

I have reviewed the files one more time and fixed the headers.
Now the files contain a final "/1" or "/2" at the end of each read header.
Here is a small example of both:

out-Carol8_CGATGT_L007_R1_001.fa
>HWI-ST1054:100:d0abmacxx:7:1201:1247:2107 1:N:0:CGATGT/1
CCATGATTTAACTACCTCCCCCTGGGTACCTCCCACAATATATNNTNNNTCCAGGAGATACATTTCAACTTGAGA
>HWI-ST1054:100:d0abmacxx:7:1201:1207:2139 1:N:0:CGATGT/1
GGCAATTACTGATATAGGATTTTCCACTTTATGGCTTTGCCTTNNANNNAACATCATATGAATGAAATCATGCAA
>HWI-ST1054:100:d0abmacxx:7:1201:1224:2213 1:N:0:CGATGT/1
GCTCTGCTGTGTGTTCCTGTCAGCTAGGAGAGTTGAAGGTGTGNNGNNNTCAGCGAGTTAGAAGCAGGTATCTGC

out-Carol8_CGATGT_L007_R2_001.fa
>HWI-ST1054:100:d0abmacxx:7:1201:1247:2107 2:N:0:CGATGT/2
AAGTAACTACCTTGCTTTTGATTTTACAGGCTCATAGACAGAAGGATTTCTTTGTTTCAGATGAGACTTTGGACT
>HWI-ST1054:100:d0abmacxx:7:1201:1207:2139 2:N:0:CGATGT/2
ACTCTTCCANAATTGTTTAGGTAATNGGCGAGAATTAGCGAAGATAACAGCTACTAAGTGGTAGTAATTTGTCCT
>HWI-ST1054:100:d0abmacxx:7:1201:1224:2213 2:N:0:CGATGT/2
CTTAGCTGACACGAACACACAGCAGAGCAGGTTGAAAGATGCGTTTCTCCTGTGTTTTAGACGAAGTCTTACTGC


However, the error persists.
The "text" below is the output of the program, printed to the terminal.
The error is in the last lines, very similar to what I reported in the previous message.


P.s.1: Attached is the file config.log
P.s.2: Command line used: abyss-pe k=64 name=contig.fa v=-v in='out-Carol8_CGATGT_L007_R1_001.fa out-Carol8_CGATGT_L007_R2_001.fa'
P.s.3: The following files were created by this analysis:

-rw-r--r-- 1 root root    344742 Aug 23 22:02 contig.fa-1.adj
-rw-r--r-- 1 root root    946980 Aug 23 22:02 contig.fa-1.fa
-rw-r--r-- 1 root root     11809 Aug 23 22:02 contig.fa-1.path
-rw-r--r-- 1 root root    362764 Aug 23 22:02 contig.fa-2.adj
-rw-r--r-- 1 root root      1323 Aug 23 22:02 contig.fa-2.path
-rw-r--r-- 1 root root    358630 Aug 23 22:02 contig.fa-3.adj
-rw-r--r-- 1 root root    726335 Aug 23 22:02 contig.fa-3.fa
-rw-r--r-- 1 root root         0 Aug 23 22:04 contig.fa-3.hist
-rw-r--r-- 1 root root     24535 Aug 23 22:04 contig.fa-3.sam.gz
-rw-r--r-- 1 root root     15424 Aug 23 22:02 contig.fa-bubbles.fa
-rw-r--r-- 1 root root       379 Aug 23 22:02 contig.fa-indel.fa



Could you help me?

Thanks,
Ricardo.






---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


root@ricardom:/usr/local/abyss-1.3.4# abyss-pe k=64 name=contig.fa v=-v in='out-Carol8_CGATGT_L007_R1_001.fa out-Carol8_CGATGT_L007_R2_001.fa'
ABYSS -k64 -q3 -v --coverage-hist=coverage.hist -s contig.fa-bubbles.fa  -o contig.fa-1.fa out-Carol8_CGATGT_L007_R1_001.fa out-Carol8_CGATGT_L007_R2_001.fa 
ABySS 1.3.4
ABYSS -k64 -q3 -v --coverage-hist=coverage.hist -s contig.fa-bubbles.fa -o contig.fa-1.fa out-Carol8_CGATGT_L007_R1_001.fa out-Carol8_CGATGT_L007_R2_001.fa
Reading `out-Carol8_CGATGT_L007_R1_001.fa'...
Read 100000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 200000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 300000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 400000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 500000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 600000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 700000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 800000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 900000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1100000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1200000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1300000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1400000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1500000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1600000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1700000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1800000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 1900000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2100000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2200000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2300000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2400000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2500000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2600000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2700000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2800000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 2900000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3100000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3200000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3300000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3400000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3500000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3600000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3700000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3800000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 3900000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 4000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 4000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
`out-Carol8_CGATGT_L007_R1_001.fa': discarded 4000000 reads containing non-ACGT characters
warning: `out-Carol8_CGATGT_L007_R1_001.fa': contains no usable sequence
Reading `out-Carol8_CGATGT_L007_R2_001.fa'...
Read 100000 reads. Hash load: 1131580 / 1073741824 = 0.00105 using 306 MB
Read 200000 reads. Hash load: 2253559 / 1073741824 = 0.0021 using 343 MB
Read 300000 reads. Hash load: 3370074 / 1073741824 = 0.00314 using 378 MB
Read 400000 reads. Hash load: 4481505 / 1073741824 = 0.00417 using 413 MB
Read 500000 reads. Hash load: 5586539 / 1073741824 = 0.0052 using 448 MB
Read 600000 reads. Hash load: 6686917 / 1073741824 = 0.00623 using 482 MB
Read 700000 reads. Hash load: 7789678 / 1073741824 = 0.00725 using 516 MB
Read 800000 reads. Hash load: 8887924 / 1073741824 = 0.00828 using 549 MB
Read 900000 reads. Hash load: 9980480 / 1073741824 = 0.0093 using 582 MB
Read 1000000 reads. Hash load: 11070892 / 1073741824 = 0.0103 using 615 MB
Read 1100000 reads. Hash load: 12159144 / 1073741824 = 0.0113 using 647 MB
Read 1200000 reads. Hash load: 13249088 / 1073741824 = 0.0123 using 679 MB
Read 1300000 reads. Hash load: 14336331 / 1073741824 = 0.0134 using 711 MB
Read 1400000 reads. Hash load: 15420355 / 1073741824 = 0.0144 using 743 MB
Read 1500000 reads. Hash load: 16504724 / 1073741824 = 0.0154 using 774 MB
Read 1600000 reads. Hash load: 17587329 / 1073741824 = 0.0164 using 805 MB
Read 1700000 reads. Hash load: 18666123 / 1073741824 = 0.0174 using 836 MB
Read 1800000 reads. Hash load: 19747749 / 1073741824 = 0.0184 using 867 MB
Read 1900000 reads. Hash load: 20826324 / 1073741824 = 0.0194 using 897 MB
Read 2000000 reads. Hash load: 21903748 / 1073741824 = 0.0204 using 928 MB
Read 2100000 reads. Hash load: 22980661 / 1073741824 = 0.0214 using 959 MB
Read 2200000 reads. Hash load: 24054471 / 1073741824 = 0.0224 using 990 MB
Read 2300000 reads. Hash load: 25128047 / 1073741824 = 0.0234 using 1.02 GB
Read 2400000 reads. Hash load: 26200894 / 1073741824 = 0.0244 using 1.05 GB
Read 2500000 reads. Hash load: 27274943 / 1073741824 = 0.0254 using 1.09 GB
Read 2600000 reads. Hash load: 28348567 / 1073741824 = 0.0264 using 1.12 GB
Read 2700000 reads. Hash load: 29418424 / 1073741824 = 0.0274 using 1.15 GB
Read 2800000 reads. Hash load: 30488873 / 1073741824 = 0.0284 using 1.18 GB
Read 2900000 reads. Hash load: 31557624 / 1073741824 = 0.0294 using 1.22 GB
Read 3000000 reads. Hash load: 32625096 / 1073741824 = 0.0304 using 1.25 GB
Read 3100000 reads. Hash load: 33691224 / 1073741824 = 0.0314 using 1.28 GB
Read 3200000 reads. Hash load: 34755795 / 1073741824 = 0.0324 using 1.31 GB
Read 3300000 reads. Hash load: 35820790 / 1073741824 = 0.0334 using 1.34 GB
Read 3400000 reads. Hash load: 36883621 / 1073741824 = 0.0344 using 1.37 GB
Read 3500000 reads. Hash load: 37937848 / 1073741824 = 0.0353 using 1.4 GB
Read 3600000 reads. Hash load: 38992687 / 1073741824 = 0.0363 using 1.43 GB
Read 3700000 reads. Hash load: 40050320 / 1073741824 = 0.0373 using 1.46 GB
Read 3800000 reads. Hash load: 41111197 / 1073741824 = 0.0383 using 1.49 GB
Read 3900000 reads. Hash load: 42169971 / 1073741824 = 0.0393 using 1.52 GB
Read 4000000 reads. Hash load: 43225443 / 1073741824 = 0.0403 using 1.55 GB
Read 4000000 reads. Hash load: 43225443 / 1073741824 = 0.0403 using 1.55 GB
`out-Carol8_CGATGT_L007_R2_001.fa': discarded 2211 reads containing non-ACGT characters
Loaded 43225443 k-mer
Hash load: 43225443 / 134217728 = 0.322 using 1.53 GB
Minimum k-mer coverage is 59
Coverage: 59 Reconstruction: 4676
Coverage: 10.7 Reconstruction: 39905
Coverage: 4 Reconstruction: 191436
Coverage: 2.45 Reconstruction: 1042286
Coverage: 1.41 Reconstruction: 43225443
Coverage: 1 Reconstruction: 43225443
Using a coverage threshold of 1...
The median k-mer coverage is 1
The reconstruction is 43225443
The k-mer coverage threshold is 1
Setting parameter e (erode) to 2
Setting parameter E (erodeStrand) to 0
Setting parameter c (coverage) to 2
Generating adjacency
Finding adjacent k-mer: 1000000
Finding adjacent k-mer: 2000000
Finding adjacent k-mer: 3000000
Finding adjacent k-mer: 4000000
Finding adjacent k-mer: 5000000
Finding adjacent k-mer: 6000000
Finding adjacent k-mer: 7000000
Finding adjacent k-mer: 8000000
Finding adjacent k-mer: 9000000
Finding adjacent k-mer: 10000000
Finding adjacent k-mer: 11000000
Finding adjacent k-mer: 12000000
Finding adjacent k-mer: 13000000
Finding adjacent k-mer: 14000000
Finding adjacent k-mer: 15000000
Finding adjacent k-mer: 16000000
Finding adjacent k-mer: 17000000
Finding adjacent k-mer: 18000000
Finding adjacent k-mer: 19000000
Finding adjacent k-mer: 20000000
Finding adjacent k-mer: 21000000
Finding adjacent k-mer: 22000000
Finding adjacent k-mer: 23000000
Finding adjacent k-mer: 24000000
Finding adjacent k-mer: 25000000
Finding adjacent k-mer: 26000000
Finding adjacent k-mer: 27000000
Finding adjacent k-mer: 28000000
Finding adjacent k-mer: 29000000
Finding adjacent k-mer: 30000000
Finding adjacent k-mer: 31000000
Finding adjacent k-mer: 32000000
Finding adjacent k-mer: 33000000
Finding adjacent k-mer: 34000000
Finding adjacent k-mer: 35000000
Finding adjacent k-mer: 36000000
Finding adjacent k-mer: 37000000
Finding adjacent k-mer: 38000000
Finding adjacent k-mer: 39000000
Finding adjacent k-mer: 40000000
Finding adjacent k-mer: 41000000
Finding adjacent k-mer: 42000000
Finding adjacent k-mer: 43000000
Added 79486199 edges.
Eroding tips
Eroded 42113885 tips.
Eroded 0 tips.
Hash load: 1111558 / 4194304 = 0.265 using 1.49 GB
Pruning tips shorter than 1 bp...
Removed 6343 marked k-mer.
Pruned 6343 k-mer in 6343 tips.
Pruning tips shorter than 2 bp...
Removed 12545 marked k-mer.
Pruned 12545 k-mer in 11821 tips.
Pruning tips shorter than 4 bp...
Removed 41594 marked k-mer.
Pruned 41594 k-mer in 22456 tips.
Pruning tips shorter than 8 bp...
Removed 148222 marked k-mer.
Pruned 148222 k-mer in 43274 tips.
Pruning tips shorter than 16 bp...
Removed 494780 marked k-mer.
Pruned 494780 k-mer in 84803 tips.
Pruning tips shorter than 32 bp...
Removed 74987 marked k-mer.
Pruned 74987 k-mer in 5214 tips.
Pruning tips shorter than 64 bp...
Removed 50074 marked k-mer.
Pruned 50074 k-mer in 1819 tips.
Pruning tips shorter than 64 bp...
Removed 405 marked k-mer.
Pruned 405 k-mer in 24 tips.
Pruning tips shorter than 64 bp...
Pruned 175754 tips in 8 rounds.
Hash load: 282608 / 1048576 = 0.27 using 1.49 GB
Marked 40609 edges of 15319 ambiguous vertices.
Removing low-coverage contigs (mean k-mer coverage < 2)
Found 282350 k-mer in 15109 contigs before removing low-coverage contigs.
Removed 24285 k-mer in 2608 low-coverage contigs.
Split 4523 ambigiuous branches.
Hash load: 258323 / 1048576 = 0.246 using 1.49 GB
Eroding tips
Eroded 50 tips.
Eroded 0 tips.
Hash load: 258273 / 1048576 = 0.246 using 1.49 GB
Pruning tips shorter than 1 bp...
Removed 103 marked k-mer.
Pruned 103 k-mer in 103 tips.
Pruning tips shorter than 2 bp...
Removed 147 marked k-mer.
Pruned 147 k-mer in 97 tips.
Pruning tips shorter than 4 bp...
Removed 340 marked k-mer.
Pruned 340 k-mer in 118 tips.
Pruning tips shorter than 8 bp...
Removed 854 marked k-mer.
Pruned 854 k-mer in 179 tips.
Pruning tips shorter than 16 bp...
Removed 1769 marked k-mer.
Pruned 1769 k-mer in 182 tips.
Pruning tips shorter than 32 bp...
Removed 2896 marked k-mer.
Pruned 2896 k-mer in 163 tips.
Pruning tips shorter than 64 bp...
Removed 3978 marked k-mer.
Pruned 3978 k-mer in 107 tips.
Pruning tips shorter than 64 bp...
Removed 31 marked k-mer.
Pruned 31 k-mer in 4 tips.
Pruning tips shorter than 64 bp...
Pruned 953 tips in 8 rounds.
Hash load: 248155 / 1048576 = 0.237 using 1.49 GB
Popping bubbles
Removed 42 bubbles.
Removed 42 bubbles
Marked 26643 edges of 9686 ambiguous vertices.
Left 258 unassembled k-mer in circular contigs.
Assembled 244867 k-mer in 8820 contigs.
Removed 42977288 k-mer.
The signal-to-noise ratio (SNR) is -22.4 dB.
AdjList -v -k64 -m50 contig.fa-1.fa >contig.fa-1.adj
Reading `contig.fa-1.fa'...
Finding overlaps of exactly k-1 bp...
V=17640 E=31732 E/V=1.8
Degree: ▄██▃▄
        01234
0: 16% 1: 29% 2-4: 55% 5+: 0% max: 4
Finding overlaps of fewer than k-1 bp...
V=17640 E=31984 E/V=1.81
Degree: ▃██▂▃
        01234
0: 15% 1: 30% 2-4: 55% 5+: 0% max: 4
abyss-filtergraph -v -k64 -g contig.fa-2.adj contig.fa-1.adj >contig.fa-1.path
Loading graph from file: contig.fa-1.adj
Graph stats before:
V=17640 E=31984 E/V=1.81
Degree: ▃██▂▃
        01234
0: 15% 1: 30% 2-4: 55% 5+: 0% max: 4
Removing shim contigs from the graph...
Pass 1: Checking 4920 contigs.
Pass 2: Checking 211 contigs.
Shim removal stats:
Removed: 2231 Too Complex: 3677 Tails: 2402 Too Long: 508 Self Adjacent: 2 Parallel Edges: 0
Graph stats after:
V=13178 E=27522 E/V=2.09
Degree: ▇█▇▅▆
        01234
0: 20% 1: 23% 2-4: 53% 5+: 4.5% max: 15
PopBubbles -v -j2 -k64 -p0.9  -g contig.fa-3.adj contig.fa-1.fa contig.fa-2.adj >contig.fa-2.path
Reading `contig.fa-2.adj'...
V=13178 E=27522 E/V=2.09
Degree: ▇█▇▅▆
        01234
0: 20% 1: 23% 2-4: 53% 5+: 4.5% max: 15
Reading `contig.fa-1.fa'...
Bubbles: 6 Popped: 0 Scaffolds: 0 Complex: 3 Too long: 0 Too many: 2 Dissimilar: 1
V=13064 E=27408 E/V=2.1
Degree: ▇█▇▅▇
        01234
0: 20% 1: 22% 2-4: 53% 5+: 4.5% max: 15
MergeContigs -v -k64 -o contig.fa-3.fa contig.fa-1.fa contig.fa-2.adj contig.fa-2.path
Reading `contig.fa-2.adj'...
Read 13178 vertices. Using 1.25 MB of memory.
Reading `contig.fa-1.fa'...
Read 6589 sequences. Using 2.24 MB of memory.
Reading `contig.fa-2.path'...
Read 46 paths. Using 2.24 MB of memory.
The minimum coverage of single-end contigs is 2.
The minimum coverage of merged contigs is 2.
n n:200 n:N50 min N80 N50 N20 max sum
6532 223 68 200 243 383 642 1279 81080 contig.fa-3.fa
awk '!/^>/ {x[">" $1]=1; next} {getline s} $1 in x {print $0 "\n" s}' \
contig.fa-2.path contig.fa-1.fa >contig.fa-indel.fa
ln -sf contig.fa-3.fa contig.fa-unitigs.fa
abyss-map -v -j2 -l64   out-Carol8_CGATGT_L007_R1_001.fa out-Carol8_CGATGT_L007_R2_001.fa contig.fa-3.fa \
|abyss-fixmate -v  -h contig.fa-3.hist \
|sort -snk3 -k4 \
|DistanceEst -v -j2 -k64 -l64 -s200 -n10   -o contig.fa-3.dist contig.fa-3.hist
Reading `contig.fa-3.fa'...
Using 623 kB of memory and 95.3 B/sequence.
Reading `contig.fa-3.fa'...
Building the suffix array...
Reading from standard input...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Read 718 kB in 6532 contigs.
Using 6.9 MB of memory and 9.61 B/bp.
Read 1000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 2000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 3000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 4000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 5000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 6000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 7000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Read 8000000 alignments. Hash load: 0 / 2 = 0 using 348 kB.
Mapped 337429 of 8000000 reads (4.22%)
Mapped 337429 of 8000000 reads uniquely (4.22%)
Read 8000000 alignments
Mateless         0
Unaligned  3662571  91.6%
Singleton   337429  8.44%
FR               0
RF               0
FF               0
Different        0
Total      4000000
error: the histogram `contig.fa-3.hist' is empty
make: *** [contig.fa-3.dist] Error 1
make: *** Deleting file `contig.fa-3.dist'

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Tony Raymond

Aug 28, 2012, 2:46:06 PM
to Ricardo Milanez Fonseca, abyss-users
Hi Ricardo, (cc'd to abyss-users)

I calculate coverage like so:
coverage = #reads * read_length / genome_size
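Plugging in your numbers as a back-of-the-envelope sketch (4M read pairs = 8M reads of 75 bp against a ~3 Gbp genome; this is just arithmetic, not ABySS code):

```python
# Back-of-the-envelope sequencing coverage estimate:
#   coverage = #reads * read_length / genome_size
def coverage(n_reads, read_length, genome_size):
    return n_reads * read_length / genome_size

# 8,000,000 reads of 75 bp against a 3 Gbp genome:
print(coverage(8_000_000, 75, 3_000_000_000))  # 0.2
```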

In this case it looks like you are assembling at about 0.2x coverage, which is much too low for an assembly (we usually shoot for 30x) - though it depends on whether you are attempting to assemble the entire genome. If you are, you'll probably want at least 100-fold more data to get a reasonable assembly out. Also, I missed this the first time I looked at your log: 
...
Read 4000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
Read 4000000 reads. Hash load: 0 / 1073741824 = 0 using 269 MB
`out-Carol8_CGATGT_L007_R1_001.fa': discarded 4000000 reads containing non-ACGT characters
warning: `out-Carol8_CGATGT_L007_R1_001.fa': contains no usable sequence
...

This means that ABySS was unable to use any sequence from the first file. This happens when no read has at least k bases of usable sequence in it - for example, because of ambiguity codes (such as 'N') in the middle of every read, or because the reads are shorter than the chosen k-mer value.
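As a rough sketch of that condition (not how ABySS itself counts), a read is only usable if it contains a run of at least k consecutive ACGT bases; with k=64 and 75 bp reads, a single 'N' near the middle disqualifies the read:

```python
import re

def has_usable_kmer(seq, k):
    """True if seq contains at least k consecutive ACGT bases.
    Rough sketch of the 'usable sequence' condition; not ABySS code."""
    return any(len(run) >= k for run in re.findall(r'[ACGT]+', seq))

# A 75 bp read with an N near the middle has no 64-mer:
print(has_usable_kmer('A' * 40 + 'N' + 'A' * 34, 64))  # False
# A clean 75 bp read does:
print(has_usable_kmer('A' * 75, 64))                   # True
```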

Hope that helps,
Tony

On 2012-08-28, at 11:04 AM, Ricardo Milanez Fonseca wrote:

Hi Tony, I'm sorry for the delay.
I do not know the exact coverage we have for this genome, but it's less than 30x.
The read length is 75 bp and the expected genome size is around 3 Gbp.

Kind regards,

Ricardo.




On 24 August 2012 21:56, Ricardo Milanez Fonseca <ricard...@gmail.com> wrote:
Hey Tony, 

I'm sorry, but I don't have that information right now.
If these details are crucial for you, we could continue talking on Monday, when I'll have all the information about these sequences.
I know that this genome belongs to a non-human primate and was generated by our group.
So this is a large genome.

Take care Tony,
Ricardo.