To add to that -- you can probably use the sum of the sequence lengths, or something like 0.8 * sum_lengths, as the genome size (G) without it making a big difference in peak calling. If I recall correctly, G is used to calculate the global lambda and also the maximum number of redundant reads allowed to pile up at one site.

The global lambda is proportional to number_reads / G, so a slightly smaller effective genome size gives a slightly bigger global lambda, and therefore a slightly more conservative set of peaks (or slightly narrower peak coordinates) -- but only where the global lambda would be chosen over the local lambdas.

As for the maximum number of duplicated tags at a site, any genome size between 0.75 * G and G will almost certainly give the same result, and if it doesn't, the difference will be only 1 read per site. And if you are using "--keep-dup 1" (I believe this is the default), genome size does not affect how many duplicates are kept at all, so it would only affect the global lambda.
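To make the proportionality concrete, here is a small sketch (not MACS2's actual internals -- just the number_reads / G relationship described above, with a hypothetical `global_lambda` helper and made-up read/fragment numbers) showing how shrinking the effective genome size by 20% inflates the global lambda by the same factor:

```python
# Hypothetical illustration of the scaling argument above, NOT MACS2 source code.
# Global lambda ~ expected background reads in a fragment-sized window,
# which is proportional to number_reads / genome_size.

def global_lambda(n_reads, fragment_size, genome_size):
    # Expected read count per window of `fragment_size` bp under a
    # uniform background model.
    return n_reads * fragment_size / genome_size

n_reads = 20_000_000          # assumed library size (for illustration)
frag = 200                    # assumed fragment size in bp
G = 3_000_000_000             # sum of sequence lengths
G_eff = 0.8 * G               # e.g. ~80% of the genome is mappable

lam_full = global_lambda(n_reads, frag, G)
lam_eff = global_lambda(n_reads, frag, G_eff)

# Using 0.8*G raises global lambda by exactly 1/0.8 = 1.25x,
# i.e. a slightly more conservative background wherever the global
# lambda wins over the local lambdas.
print(lam_full, lam_eff)
assert lam_eff > lam_full
```

Since local lambdas (estimated from windows around each candidate peak) don't involve G at all, this is the only place the choice of genome size nudges the significance calculation.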