HiC contact map is too sparse

Robin van der Weide

Mar 14, 2016, 11:32:32 AM
to 3D Genomics
Hi there,

I made a .hic file from a Hi-C experiment using juicebox_tools pre.
When I try to use this file in HiCCUPS, it reports that my data is too sparse.

What is the minimum number of valid reads needed to use HiCCUPS? I currently have ~100M valid pairs.

Thanks,

Robin

Neva Durand

Mar 14, 2016, 11:35:36 AM
to Robin van der Weide, 3D Genomics
The minimum is 300M HiC contacts.

Best
Neva

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/9c9922ca-b155-4667-8ffd-eeadf385c2a0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Robin van der Weide

Mar 14, 2016, 11:45:12 AM
to 3D Genomics, robin...@gmail.com
Thanks, is there any way of circumventing/lowering this threshold? 

Neva Durand

Mar 14, 2016, 12:36:04 PM
to Robin van der Weide, 3D Genomics
No; results are not meaningful for maps that aren't dense enough. 

Robin van der Weide

Mar 15, 2016, 5:39:37 AM
to 3D Genomics, robin...@gmail.com
OK, is this threshold also in place if I only generate intra-chromosomal .hic files with pre -d?
I ask because there would be far fewer reads in the file.

Neva Durand

Mar 15, 2016, 7:39:22 AM
to Robin van der Weide, 3D Genomics
It's proportional to the chromosome, so it should work fine with -d.



Vera

Mar 15, 2016, 9:26:09 AM
to 3D Genomics, robin...@gmail.com
A related question:
I am trying to run Arrowhead on .hic files that have a combined total of 382 million contacts, but I still get the error message "HiC contact map is too sparse to run Arrowhead, exiting."

This map is for human hg19. I am wondering whether the 300M minimum contacts holds for any genome, or whether you can see any other reason why I get the error message (or how to work around it)?
Thanks,
Vera

Neva Durand

Mar 15, 2016, 9:55:43 AM
to Vera, 3D Genomics, Robin van der Weide
Right now the threshold actually applies to the MAPQ >= 30 maps, since Arrowhead and HiCCUPS are only run on those. This might explain why your map is failing the threshold.

The threshold is measured internally as the density of the matrices; we don't directly count the number of Hi-C contacts. Empirically it corresponds to 300M contacts in an hg19 map, but it should scale appropriately for single chromosomes or different genomes.

We are discussing whether or not to allow the threshold to be overridden.

Best
Neva




Muhammad Shamim

Jun 30, 2016, 1:52:26 AM
to 3D Genomics, vera.b...@gmail.com, robin...@gmail.com
Btw there is now a --ignore_sparsity flag to force Arrowhead/HiCCUPS to run regardless of map density (but the user should keep in mind that not many loops/domains may be called).



Shawn Bai

Jan 13, 2023, 1:04:01 PM
to 3D Genomics
Hi, Neva. I have now downloaded a .hic file from 4DN. Is there a way to find out how many valid pairs are in this file? Thanks!

Truong Nguyen

Jan 22, 2024, 6:31:08 PM
to 3D Genomics
You can simply run wc -l on a file to see how many lines it contains. Note, though, that a .hic file is a binary format, so the line count reported for input_file.hic is not its number of valid pairs, which would explain why it comes out much lower than for the pairs file. Running the same command on your pairs file (file.pairs) gives a meaningful count, since that file is text with one pair per line (plus any header lines). Hope it helps.
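
For a text-format pairs file, a line count is only a pair count once the header lines are excluded. A minimal sketch, assuming a 4DN-style pairs file in which header lines start with # and every other non-empty line is one pair; count_pairs and the file name in the usage note are illustrative names, not part of any tool:

```shell
# Count pairs in a text pairs file, skipping '#' header lines and blank lines.
# Assumptions: one pair per remaining line; names ending in .gz are gzip-compressed.
count_pairs() {
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk '!/^#/ && NF { n++ } END { print n + 0 }'
}
```

For example, count_pairs sample.pairs.gz prints the number of pair records in that file. This counts every pair listed, so it may differ from the contact counts that end up in a .hic map after filtering.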

Ragini Mahajan

Jan 22, 2024, 6:46:46 PM
to 3d-ge...@googlegroups.com
Hi Shawn, 

You can use the following command to print out stats for the .hic file; those should tell you the read depth, Hi-C contacts, etc.

head -n 30 <.hic file>
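
Because a .hic file is binary, the output of head can include non-printable bytes mixed in with the text. A minimal sketch that filters those bytes out, assuming the stats are stored as plain text near the top of the file, as the head command above implies; hic_head_text is an illustrative helper name, not part of Juicer:

```shell
# Print the first N "lines" of a file with non-printable bytes removed.
# Assumption: the statistics are readable text near the start of the .hic file.
hic_head_text() {
    # $1 = path to the .hic file, $2 = number of lines (defaults to 30)
    head -n "${2:-30}" -- "$1" | LC_ALL=C tr -cd '[:print:]\t\n'
}
```

For example, hic_head_text my_map.hic 30 prints the same lines as the command above, minus the binary bytes; my_map.hic is a placeholder file name.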



--
Ragini Mahajan
PhD student | Onuchic Group / Aiden Lab
Biochemistry & Cell Biology | Dept of BioSciences
Rice University, Houston, TX