Interpretation of result


sc...@ncsu.edu

Jan 23, 2014, 6:05:47 PM
to abyss...@googlegroups.com
Hello, I've obtained results from ABySS.
However, I am having trouble interpreting them.

1. This table is the stats output.

           n       n:500  n:N50  min  N80    N50     N20     E-size  max     sum
contigs    128239  5402   498    500  32250  82060   183593  114516  758070  149.7e6
scaffolds  126745  4356   408    500  39439  101140  220518  142070  939060  149.6e6


  - What is the big difference between contigs and scaffolds?
    (Seeing the contig file and scaffold file, it does not seem that there exists a big difference between them.)
  - What are n:500, n:N50, E-size, and sum?

2. This table is the output of Tablet (a genome viewer); some of its numbers differ from the ABySS ones.

Description                 Value
Total number of contigs     128,239
Average contig length       1,317
Total number of reads       37,088,213 ( = 21,005,534*2 - 936,337 – 3050181 )
Average reads per contig    289
N50                         70,023
N90                         244
Assembly file size          555.01 MB

  
  - Why are they different?
   Abyss vs Tablet:
   (N50) 82,060 vs 70,023
   (If E-size stands for average, average contig length) 114516 vs 1317

Thank you in advance.



Tony Raymond

Jan 24, 2014, 8:01:32 PM
to sc...@ncsu.edu, abyss...@googlegroups.com
Hi,

I’ve added answers in your quoted text below.

Cheers,
Tony

Contigs are unitigs merged using paired-end information (typically, we require a unique path between the unitigs before merging them). Scaffolds are contigs joined using either paired-end or mate-pair information; at that stage we are only concerned with order and orientation, and the gaps between the contigs are filled with N's.

    (Seeing the contig file and scaffold file, it does not seem that there exists a big difference between them.)
An N50 increase from 82 kb to 101 kb isn't nothing. I would expect a greater increase if you used mate-pair data, but this is in line with using just paired-end data for scaffolding.
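
One way to see what scaffolding changed is to split each scaffold at its runs of N and count the pieces. This is only a minimal sketch, assuming gaps are written as one or more consecutive N characters; the function name and the toy sequence are hypothetical and not part of ABySS.

```python
import re


def split_scaffold(sequence):
    """Split a scaffold into its gap-free pieces at runs of N (either case)."""
    pieces = re.split(r"[Nn]+", sequence)
    # Leading or trailing N runs produce empty strings; drop them.
    return [piece for piece in pieces if piece]


def count_gaps(sequence):
    """Count the internal N runs, i.e. the joins between adjacent pieces."""
    return max(len(split_scaffold(sequence)) - 1, 0)


# Hypothetical scaffold made of three pieces joined by two gaps.
toy_scaffold = "ACGTACGTNNNNNNGGCCGGCCNNNTTAA"
toy_pieces = split_scaffold(toy_scaffold)
toy_gaps = count_gaps(toy_scaffold)
```

A scaffold with no N's yields a single piece and zero gaps, which is the case where the contig and scaffold records look identical.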

  - What are n:500, n:N50, E-size, and sum?
n:500 - number of contigs at least 500 bp long
n:N50 - number of contigs at least as long as the N50 length
E-size - expected size of the contig containing a randomly chosen base of the assembly
sum - number of ACGT characters found in contigs at least 500 bp long
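
The definitions above can be sketched in Python from a plain list of contig lengths. This is an illustration, not the abyss-fac source: the function names are hypothetical, tie handling may differ from the real tool, and "sum" here adds up lengths, which assumes the contigs contain no N's. Note that E-size works out to sum(L^2) / sum(L), a length-weighted mean, so it is not the plain average contig length.

```python
def nxx(lengths, fraction):
    """Length L such that contigs of length >= L hold at least `fraction` of all bases."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    return 0  # empty input


def assembly_stats(lengths, min_length=500):
    """Summary statistics over contigs at least `min_length` bp long."""
    kept = [x for x in lengths if x >= min_length]
    if not kept:
        return {"n": len(lengths), "n:500": 0}
    total = sum(kept)
    n50 = nxx(kept, 0.5)
    return {
        "n": len(lengths),                          # all contigs, unfiltered
        "n:500": len(kept),                         # contigs passing the filter
        "n:N50": sum(1 for x in kept if x >= n50),  # contigs at least N50 long
        "min": min(kept),
        "N80": nxx(kept, 0.8),
        "N50": n50,
        "N20": nxx(kept, 0.2),
        # Expected contig size when picking a random base: sum(L^2) / sum(L).
        "E-size": sum(x * x for x in kept) / total,
        "max": max(kept),
        "sum": total,                               # assumes no N's in the contigs
    }


# Hypothetical toy lengths, not taken from the assembly above.
toy_stats = assembly_stats([100, 200, 500, 600, 900, 1000, 2000])
```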


  - Why are they different?
It looks like Tablet considers all contigs when calculating the statistics, rather than only the contigs of 500 bp or longer, which is what abyss-fac does by default. Using 'abyss-fac -s1 …' disables the length filter and should give you the same results.
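
The effect of the length filter can be sketched on a toy set of contig lengths. The numbers and function names below are hypothetical and only illustrate the direction of the change: keeping the many short contigs lowers both the N50 and the average length.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths, min_length=1):
    """Count, mean and N50 over contigs at least `min_length` bp long."""
    kept = [x for x in lengths if x >= min_length]
    return {
        "count": len(kept),
        "mean": sum(kept) / len(kept) if kept else 0.0,
        "N50": n50(kept),
    }


# Hypothetical assembly: many tiny contigs plus a few long ones.
toy = [50] * 90 + [100] * 6 + [5000, 10000, 20000, 40000]

unfiltered = summarize(toy, min_length=1)    # every contig counted
filtered = summarize(toy, min_length=500)    # only contigs of 500 bp or longer

print("all contigs:", unfiltered)
print(">= 500 bp:  ", filtered)
```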
