Fragment size vs insert size

Iolan

Mar 29, 2017, 9:27:02 AM
to HiC-Pro
Hi,

I am wondering if anyone can provide a clear definition of what is considered to be the fragment size and what is considered to be the insert size. I am asking because the fragment size distribution plot in hic_results/pic/experiment/plotHiCFragmentSize seems to be produced by the following bit of code, which carries the comment "Histogram of insert size":

## Histogram of insert size
allvalidpairs <- list.files(path=hicDir, pattern=paste0("^[[:print:]]*\\.validPairs$"), full.names=TRUE)
stats_per_validpairs <- lapply(allvalidpairs, read.csv, sep="\t", as.is=TRUE, header=FALSE, row.names=1, nrow=100000)
lv <- sapply(stats_per_validpairs, "[", 7)
lv <- lapply(lv, function(x){as.numeric(x[which(x!="None" & !is.na(x))])})
allhist <- lapply(lv, hist, breaks=c(seq.int(from=0, to=1500, by=10), Inf), plot=FALSE)
allcounts <- Reduce("+", lapply(allhist, "[[", "counts"))
if (max(allcounts)>0){
  mids <- allhist[[1]]$mids
  mat <- data.frame(allcounts=allcounts, mids=mids)
  mat[dim(mat)[1],2] <- 1505
  print(allcounts)
  p2 <- plotDistanceHist(mat, sampleName, n=100000*length(allvalidpairs))
  ggsave(filename=file.path(picDir, paste0("plotHiCFragmentSize_",sampleName,".pdf")), p2, width=7, height=5)
}
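For anyone who wants to check what the plot is binning, here is a minimal Python sketch of the same binning idea as the R snippet above. Because `row.names=1` consumes the first column, the `"[", 7` selection corresponds to the 8th tab-separated field of each .validPairs line. The function name, the example lines, and the fragment names in them are made up for illustration, and the handling of malformed or negative values is an assumption of this sketch, not HiC-Pro behaviour.

```python
def insert_size_histogram(lines, bin_width=10, max_size=1500):
    """Count sizes in bins of `bin_width` up to `max_size`, plus one overflow bin."""
    n_bins = max_size // bin_width
    counts = [0] * (n_bins + 1)  # last slot collects sizes above max_size
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # assumption: silently skip short lines
        value = fields[7]  # 8th field, i.e. column 7 once row names are removed
        if value in ("", "None", "NA"):
            continue  # mirrors the R filter on "None" and NA
        size = int(value)
        if size < 0:
            continue  # assumption: ignore negative sizes
        if size > max_size:
            counts[-1] += 1
        else:
            # Right-closed bins as in R's hist(): 1..10 -> bin 0, 11..20 -> bin 1;
            # a size of 0 is placed in the first bin.
            counts[max(size - 1, 0) // bin_width] += 1
    return counts


# Hypothetical .validPairs lines (read, chr1, pos1, strand1, chr2, pos2, strand2, size, ...)
example_lines = [
    "read1\tchr1\t1000\t+\tchr1\t90000\t-\t250\tfragA\tfragB\t42\t42",
    "read2\tchr1\t2000\t+\tchr2\t5000\t+\t10\tfragC\tfragD\t42\t42",
    "read3\tchr3\t3000\t-\tchr3\t70000\t+\t1600\tfragE\tfragF\t42\t42",
    "read4\tchr4\t4000\t-\tchr5\t8000\t-\tNone\tfragG\tfragH\t42\t42",
]
counts = insert_size_histogram(example_lines)
```

Whatever the plot file is called, the values being binned are the per-pair sizes stored in that field of the .validPairs file.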

Thank you very much!

Best,
Ioana

nservant

Mar 29, 2017, 10:17:03 AM
to HiC-Pro
Hi Ioana,

Indeed, both terms are commonly used in NGS and usually refer to the same thing.
In HiC-Pro, this is the size of the DNA fragments before sequencing. Basically, it is the distance between the first bases of R1 and R2.
But I agree that the term "fragment" might be confusing, as it could be related to the "restriction fragments" after digestion ...
Hope it helps
nicolas

Iolan

Mar 29, 2017, 3:33:45 PM
to HiC-Pro
Thank you very much for your clarification. I thought it was worth mentioning because of the Fragment and Insert size options in the config file, and I wanted to make sure we are referring to the same thing.

Best,
Ioana

nservant

Mar 30, 2017, 4:01:38 AM
to HiC-Pro
Yes. In the configuration file, "fragment" means "restriction fragment" (after digestion), and "insert" means the distance between R1 and R2.
Best
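
To make the distinction concrete, here is a small Python sketch that keeps the two quantities separate. The option names MIN_FRAG_SIZE / MAX_FRAG_SIZE and MIN_INSERT_SIZE / MAX_INSERT_SIZE follow the HiC-Pro config file; the function, the threshold values, and the way the inputs are passed are hypothetical and only illustrate which number each cutoff is compared against, not how HiC-Pro implements the filter.

```python
def passes_size_filters(frag1_len, frag2_len, insert_size, cfg):
    """Return True if both restriction fragments and the insert are within range.

    frag1_len / frag2_len: lengths of the two restriction fragments (after digestion).
    insert_size: size of the sequenced molecule (distance between R1 and R2).
    """
    frag_ok = all(
        cfg["MIN_FRAG_SIZE"] <= n <= cfg["MAX_FRAG_SIZE"]
        for n in (frag1_len, frag2_len)
    )
    insert_ok = cfg["MIN_INSERT_SIZE"] <= insert_size <= cfg["MAX_INSERT_SIZE"]
    return frag_ok and insert_ok


# Illustrative values only; choose cutoffs that suit your enzyme and protocol.
example_cfg = {
    "MIN_FRAG_SIZE": 100,
    "MAX_FRAG_SIZE": 100000,
    "MIN_INSERT_SIZE": 100,
    "MAX_INSERT_SIZE": 600,
}
ok = passes_size_filters(4000, 2500, 300, example_cfg)
too_long_insert = passes_size_filters(4000, 2500, 900, example_cfg)
too_short_fragment = passes_size_filters(50, 2500, 300, example_cfg)
```

In this sketch a pair can fail either check independently: the fragment cutoffs depend on the enzyme, the insert cutoffs on the library preparation.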

Angelo Chan

Apr 5, 2017, 1:44:36 AM
to HiC-Pro
Hi Nicolas,

So I noticed that with a 4-cutter restriction enzyme, the majority of the "fragment sizes" (from the results) follow a roughly normal distribution within the 100 to 600 range (the cutoffs I used for the "insert" min and max, which are also the default cutoffs), but with data generated by a 6-cutter, the "fragment sizes" are considerably larger.

I am unsure how the "fragment sizes" graphed by the histogram are calculated. Is it the distance from R1 to the digestion site of one fragment, added to the distance from R2 to the digestion site of the second fragment?



Regards,

Angelo

nservant

Apr 5, 2017, 3:53:02 AM
to HiC-Pro
Yes, the insert size is calculated as the distance from R1 to the digestion site of one fragment, added to the distance from R2 to the digestion site of the second fragment.
I'm not sure I understand why the cutter would have an impact on the insert size distribution.
For sure, it does have an impact on the size of the restriction fragments. But I thought that the ligation products were sheared before the biotin pull-down, in order to have insert sizes in the range of what we expect for PE sequencing (i.e. around 300 bp on average)?
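
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fragment` class and both function names are hypothetical, and the strand rule (a forward read is measured to the end of its restriction fragment, a reverse read to the start) is an assumption of this sketch rather than a copy of the HiC-Pro code.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A restriction fragment given by its start and end coordinates."""
    start: int
    end: int


def distance_to_cut_site(read_pos, strand, frag):
    """Distance from the read's first base to the digestion site it points toward."""
    if strand == "+":
        return frag.end - read_pos   # assumption: forward read points to the fragment end
    return read_pos - frag.start     # assumption: reverse read points to the fragment start


def insert_size(pos1, strand1, frag1, pos2, strand2, frag2):
    """Sum of the two read-to-cut-site distances, as described in the reply above."""
    return (distance_to_cut_site(pos1, strand1, frag1)
            + distance_to_cut_site(pos2, strand2, frag2))


# Made-up example: R1 is 200 bp from its cut site, R2 is 300 bp from its cut site.
frag_a = Fragment(start=800, end=1200)
frag_b = Fragment(start=5000, end=5600)
size = insert_size(1000, "+", frag_a, 5300, "-", frag_b)
```

Here the two reads sit on different restriction fragments, so the insert size is not a genomic distance between the reads but the length of the ligated molecule that was sequenced.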

Angelo Chan

Apr 5, 2017, 4:02:49 AM
to HiC-Pro
The data I'm looking at was generated with an older protocol, so it might not have included that sonication step. It was pretty confusing for me as well. I'll rerun it with more relaxed cutoffs and share my results with everyone.

lixin.b...@gmail.com

Dec 12, 2018, 6:34:32 AM
to HiC-Pro
Dear Angelo,
I am also using in situ Hi-C data generated with a 4-cutter restriction enzyme. Is your Hi-C data also in situ? I am not sure how to set the MIN_FRAG_SIZE and MAX_FRAG_SIZE parameters. Can you give me some suggestions?

Best wishes

On Wednesday, April 5, 2017 at 1:44:36 PM UTC+8, Angelo Chan wrote: