Interchromosomal KR normalization of IMR90 Hi-C maps missing

Louis Cammarata

Jan 30, 2022, 10:53:32 PM
to 3D Genomics
Dear Aiden Lab,

I am currently using IMR90 interchromosomal data at 250kb resolution from Rao et al. 2014, which I downloaded from the corresponding GEO repository (GSE63525). First of all, I would like to thank you for this amazing piece of work and all of the great resources it contains.

Following section II.b.4 of the Extended Experimental Procedures of the paper, I would like to normalize the interchromosomal matrices using KR normalization computed on the entire genome-wide contact matrix with the intrachromosomal blocks excluded (i.e., using only the interchromosomal contact matrices).
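The idea behind an INTER normalization can be sketched on a toy genome-wide matrix: zero out the intrachromosomal blocks, then find one scaling factor per bin so that every bin's remaining contacts sum to the same total. This is a minimal sketch, not Juicer's implementation: it swaps in a symmetric Sinkhorn-Knopp iteration for the Knight-Ruiz (KR) algorithm (both solve the same matrix-balancing problem; KR converges faster), the name `inter_balance` is hypothetical, and it assumes the Juicer convention that a normalized entry is raw[i][j] / (norm[i] * norm[j]), up to an overall scale factor. Passing `inter_only=False` balances the full matrix instead, which corresponds to the genome-wide (GW) variant.

```python
import math


def inter_balance(matrix, chrom_of_bin, inter_only=True, tol=1e-10, max_iter=10000):
    """Return a norm vector so that raw[i][j] / (norm[i] * norm[j]) is balanced.

    matrix: symmetric genome-wide contact matrix (list of lists).
    chrom_of_bin: chromosome name for each bin (row) of the matrix.
    inter_only: if True, intrachromosomal entries are ignored (INTER-style);
    if False, the full matrix is balanced (GW-style).
    Bins with no usable contacts get NaN.
    """
    n = len(matrix)
    # Work on a float copy, zeroing same-chromosome entries when inter_only is set.
    a = [[0.0 if inter_only and chrom_of_bin[i] == chrom_of_bin[j]
          else float(matrix[i][j]) for j in range(n)] for i in range(n)]
    keep = [sum(row) > 0 for row in a]
    if not any(keep):
        return [math.nan] * n
    d = [1.0 if k else 0.0 for k in keep]  # one scaling factor per bin
    for _ in range(max_iter):
        # Row sums of the scaled matrix D A D.
        sums = [d[i] * sum(a[i][j] * d[j] for j in range(n)) for i in range(n)]
        if max(abs(sums[i] - 1.0) for i in range(n) if keep[i]) < tol:
            break
        # Symmetric Sinkhorn-Knopp step (geometric-mean update keeps D A D symmetric).
        d = [d[i] / math.sqrt(sums[i]) if keep[i] else 0.0 for i in range(n)]
    # Juicer-style vector: normalized = raw / (norm_i * norm_j) = d_i * raw * d_j.
    return [1.0 / d[i] if keep[i] else math.nan for i in range(n)]
```

For example, `inter_balance(matrix, ["chr1", "chr1", "chr2", "chr2"])` balances a 4-bin, two-chromosome toy matrix using only its chr1-chr2 contacts. Plain lists keep the sketch dependency-free; a real 250kb genome-wide matrix (roughly 12,000 bins for hg19) would call for sparse arrays.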

The normalization vector should be called [CHROM]_[RESOLUTION].INTERKRnorm. However, when I download the IMR90 data, this normalization vector is not present, while it is present for other cell types (GM12878, for example).
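Once per-chromosome vectors exist, applying them to an interchromosomal block is a lookup and a division. The sketch below assumes the convention described in the GSE63525 README: the entry for a bin starting at position x sits at 0-based index x // resolution of the vector file, NaN marks unusable bins, and the normalized value is raw / (normA[i] * normB[j]), where normA and normB come from the two chromosomes' .INTERKRnorm files. `normalize_inter_block` is a hypothetical helper that takes lines rather than paths, so it works with open file handles or plain lists.

```python
import math


def normalize_inter_block(raw_lines, norm_lines_1, norm_lines_2, resolution):
    """Normalize an interchromosomal RAWobserved block with two per-chromosome vectors.

    raw_lines: "pos1 pos2 count" lines for chromA_chromB, where pos1 is a bin
    start on chromA and pos2 a bin start on chromB (in bp).
    norm_lines_1 / norm_lines_2: one value per line (possibly NaN), taken from
    the chromA and chromB .INTERKRnorm files.
    Returns {(pos1, pos2): normalized_count}, skipping bins with NaN or zero norms.
    """
    norm1 = [float(x) for x in norm_lines_1 if x.strip()]
    norm2 = [float(x) for x in norm_lines_2 if x.strip()]
    out = {}
    for line in raw_lines:
        if not line.strip():
            continue
        p1, p2, count = line.split()
        p1, p2 = int(p1), int(p2)
        i, j = p1 // resolution, p2 // resolution  # 0-based bin indices
        factor = norm1[i] * norm2[j]
        if math.isnan(factor) or factor == 0:
            continue  # masked bin: no meaningful normalized value
        out[(p1, p2)] = float(count) / factor
    return out
```

For example, with `chr1_chr2_250kb.RAWobserved`, `chr1_250kb.INTERKRnorm` and `chr2_250kb.INTERKRnorm` opened as `raw`, `n1` and `n2`, you would call `normalize_inter_block(raw, n1, n2, 250000)`; those file names are illustrative.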

Could you please tell me whether it is possible to access the .INTERKRnorm normalization files for IMR90 if you have them available? If you do not have them available, could you share the code that you used to obtain the .INTERKRnorm normalization for GM12878 so that I can run it on the IMR90 data? I have tried using Juicer to do this, but to my understanding it only allows me to normalize interchromosomal matrices one at a time rather than all at once.

I thank you very much for your consideration and your help on this!
Best,
Louis


Muhammad Shamim

Jan 30, 2022, 11:00:48 PM
to 3D Genomics
If you're ok with GRCh38/hg38 files, ENCODE has these datasets reprocessed and IMR-90 should include INTER_SCALE (replaces INTER_KR) for lower resolutions: https://www.encodeproject.org/experiments/ENCSR852KQC/
If it needs to be hg19, you could rebuild the files with EMT, which should add INTER_SCALE normalization: https://github.com/sa501428/hic-emt

Best,
- Muhammad S Shamim

Moshe Olshansky

Jan 31, 2022, 8:00:58 PM
to 3D Genomics
Hi Muhammad,

I think that their GW matrices include INTRA as well.

Louis Cammarata

Jan 31, 2022, 10:29:08 PM
to 3D Genomics
Thank you Muhammad and Moshe for your responses! I am basically interested in the proximity of a set of genes located on different chromosomes, which is why I would like to use the INTERKRnorm vectors at 250kb resolution. I also asked Muhammad this question offline, but I think it is better to mention it here in case someone wonders about this in the future.

It turns out that I need hg19 files, so I will have to rebuild the files using EMT. However, I am not sure that I fully grasp the steps needed to get the INTERKRnorm files for each chromosome. My understanding is that I should:

1. Obtain the .hic files for each chromosome pair
For this part, my understanding is that I can download the processed hg19 .hic data for replicates 1 and 2 from https://www.encodeproject.org/experiments/ENCSR852KQC/ (file with reference ENCFF184QHM). 

2. Stitch the files for each chromosome pair (excluding intrachromosomal pairs) using HiC-EMT.
For this part, there is something I do not understand properly. For now, I only have one big combined .hic file, ENCFF184QHM.hic, which, if I understand correctly, contains both intrachromosomal and interchromosomal interactions. Which hic_emt.jar command should I use to remove the intrachromosomal interactions from this file? At first I thought I could stitch together all interchromosomal maps using the stitch command, but since I only have one big .hic file here, I am not sure how to proceed.

3. Normalize the files with KR normalization using the following Juicer command (from https://github.com/aidenlab/juicer/wiki/Data-Extraction):
java -jar juicer_tools.jar dump norm KR [combined_interX_file.hic] [chromosome] BP 250000 [KRnorm_file.txt]
Something I am not sure about here is whether I can run this command on the entire combined interchromosomal matrix at once in order to get the INTERKRnorm vector. The command requires a [chromosome] argument, which I would rather not specify so that the full matrix is processed at once. Could you please tell me how I should proceed here?
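On the [chromosome] argument: the .hic format stores normalization vectors per chromosome, including genome-wide types such as INTER_KR, GW_KR, INTER_SCALE and GW_SCALE, so a genome-wide vector is computed once over the whole genome and then dumped one chromosome at a time. The sketch below builds one `dump norm` command per chromosome. The function name and output file naming are hypothetical, and whether your juicer_tools version accepts a given norm name (INTER_SCALE here, INTER_KR for older files) is an assumption to check against the Data Extraction wiki page.

```python
def dump_norm_commands(hic_file, chromosomes, norm="INTER_SCALE",
                       resolution=250000, jar="juicer_tools.jar"):
    """Build one `juicer_tools dump norm` command (as an argv list) per chromosome."""
    commands = []
    for chrom in chromosomes:
        # Output name mimics the [CHROM]_[RESOLUTION].<norm> pattern (illustrative only).
        out_file = f"{chrom}_{resolution // 1000}kb.{norm}"
        commands.append(["java", "-jar", jar, "dump", "norm", norm, hic_file,
                         chrom, "BP", str(resolution), out_file])
    return commands
```

Each argv list can then be run with `subprocess.run(argv, check=True)` from Python's standard library; building lists instead of shell strings avoids quoting problems with file paths.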

Thank you very much for your help!
Best,
Louis

Muhammad Shamim

Jan 31, 2022, 10:31:04 PM
to 3D Genomics
You can just download the hg19 .hic file and use the excise tool in EMT. Specify the highest resolution you are interested in. There is no need to call addnorm separately: INTER_SCALE and GW_SCALE normalizations (which replace INTER_KR and GW_KR) will automatically be built up to 25kb.

Louis Cammarata

Feb 2, 2022, 5:04:21 PM
to 3D Genomics
Thank you for clarifying that, Muhammad! I would like to ask you one more question, as I am having trouble running the excise function on the data. I cloned the HiC-EMT repo from https://github.com/sa501428/hic-emt and followed the instructions in the README, but the excise command does not run. I am also not sure what the hic_emt.jar archive refers to, since I haven't found it anywhere in the directory (when I try to run the command, it tells me that the main class hic_emt.jar cannot be found). Do you have any advice on how to run the function after cloning the repo onto my server? Thank you very much!
Best,
Louis

Muhammad Shamim

Feb 2, 2022, 5:06:02 PM
to 3D Genomics
You don't need to clone the repo.
Just download the jar from the releases: https://github.com/sa501428/hic-emt/releases

Louis Cammarata

Feb 3, 2022, 2:37:42 AM
to 3D Genomics
Hi Muhammad,
Thanks, this makes sense! I was able to download the .jar and run it; however, it does not output the normalization vectors. In the output folder, I obtain two files:
- custom.hic
- custom.mnd.txt

Looking into the second file, it contains the Hi-C data in sparse matrix format, where the columns are [chromosome1, locus1, chromosome2, locus2, contact_value]. I compared these values with the .RAWobserved values from GEO, and they are the same, which shows that no normalization has been applied. Digging a bit more into the code, I see that in hic_emt.jar, the emt/main/Excision.class file has a field /NormalizationHandler;^H^@-^A^@^DNONE. I tried to replace it with /NormalizationHandler;^H^@-^A^@^KstrINTER_KR, but then the program does not run.
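The comparison described above can be scripted. This sketch assumes the five whitespace-separated columns listed in this message (chromosome1, locus1, chromosome2, locus2, contact_value) and the three-column RAWobserved layout (position1, position2, count); both helper names are hypothetical. Identical values for a chromosome pair are consistent with the .mnd file holding unnormalized counts.

```python
from collections import defaultdict


def load_mnd_counts(lines):
    """Group five-column records (chr1 pos1 chr2 pos2 count) by chromosome pair."""
    blocks = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or unexpected lines
        c1, p1, c2, p2, count = fields
        blocks[(c1, c2)][(int(p1), int(p2))] = float(count)
    return blocks


def matches_rawobserved(block, raw_lines, tol=1e-9):
    """True if a RAWobserved block (pos1 pos2 count) has exactly the same entries."""
    raw = {}
    for line in raw_lines:
        fields = line.split()
        if len(fields) == 3:
            raw[(int(fields[0]), int(fields[1]))] = float(fields[2])
    # Same set of bin pairs, and every value equal within tolerance.
    return raw.keys() == block.keys() and all(abs(raw[k] - block[k]) <= tol for k in raw)
```

Chromosome naming ("1" versus "chr1") and position conventions may differ between the two files, so it is worth checking a few lines by eye before trusting a mismatch.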

Could you please let me know how I should proceed here? I am not very familiar with Java, and I am not sure what I should edit in the code to obtain the INTER_KR normalization. I was hoping to obtain a file containing the INTER_KR normalization constants for each locus in each chromosome, like the ones you provided for GM12878 on GEO. Thank you very much for your help on this!
Best,
Louis

Muhammad Shamim

Feb 3, 2022, 2:41:38 AM
to 3D Genomics
The custom.hic file should have all the data, including normalized contacts.
Have you tried viewing it in Juicebox or printing out the data with straw?
You may want to go through this tutorial here: https://aidenlab.gitbook.io/juicebox/

- Muhammad S Shamim

Louis Cammarata

Feb 4, 2022, 6:23:05 PM
to 3D Genomics
Hi Muhammad,
Thank you for your response! I have tried loading the data into Juicebox, but unfortunately importing the custom.hic file results in an error. Here is the pipeline I have been using:

- Download the file GSE63525_IMR90_combined.hic from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525
- Download the v1.1 release of hic_emt.jar from https://github.com/sa501428/hic-emt/releases
- Run the following command with resolution 2,500,000 (instead of 250,000, to save time): java -Xmx5g -jar hic_emt.jar excise -r 2500000 GSE63525_IMR90_combined_30.hic output_folder/
- I obtain two files in the output folder: custom.hic (2.1M) and custom.mnd.txt (20M)
- Looking at custom.mnd.txt, there does not seem to be any normalization array included, and trying to load custom.hic into Juicebox yields an error. I tried to load GSE63525_IMR90_combined_30.hic directly into Juicebox and it worked, which makes me think that the custom.hic file may have a problem.
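Because a .hic file that opens in one Juicebox build can fail in another when their supported format versions differ, reading the version from the file header is a quick first diagnostic. This sketch assumes the standard .hic header layout: the null-terminated magic string "HIC" followed by a little-endian 32-bit integer version; `parse_hic_version` and `read_hic_version` are hypothetical helper names.

```python
import struct


def parse_hic_version(header: bytes) -> int:
    """Return the format version from the first 8 bytes of a .hic file."""
    if header[:4] != b"HIC\x00":
        raise ValueError("not a .hic file: missing 'HIC' magic string")
    # Bytes 4-7 hold the version as a little-endian signed 32-bit integer.
    (version,) = struct.unpack("<i", header[4:8])
    return version


def read_hic_version(path: str) -> int:
    """Open a .hic file and read its version number from the header."""
    with open(path, "rb") as f:
        return parse_hic_version(f.read(8))
```

Calling `read_hic_version("custom.hic")` and `read_hic_version("GSE63525_IMR90_combined_30.hic")` would show whether the two files use different format versions.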

However, when I downloaded GSE63525_IMR90_combined_30.hic locally and opened it in Juicebox, it seems that this file does contain a so-called "inter balanced" normalization. Is this normalization the same as INTER_KR? If so, I could probably get the norm vectors for each chromosome directly from this .hic file. Thank you very much!

Best,
Louis

Muhammad Saad Shamim

Feb 4, 2022, 6:34:42 PM
to 3D Genomics
A few things:
- custom.mnd.txt contains ~raw data binned at the specified resolution, so in your case it's a dump of the contacts at 250kb (or 2.5MB depending on the flag used) without any normalization. It's designed as an input for pre and would not contain any normalization info at all. So that is 100% expected behavior.
- EMT will build the custom.hic file using the latest version 9 hic format, which is not compatible with older versions of Juicebox. You can use aidenlab.org/juicebox or download a newer version of Juicebox Desktop from Github Releases e.g. https://github.com/aidenlab/Juicebox/releases/tag/v.2.13.07. If you still get a bug opening the file, please create a bug report on Github Issues under the EMT repo.
- Yes you could get the norm vectors directly from the file. But why do you want the normalization vector? If I understood your use case correctly, you'll want the normalized contacts directly. That you can get directly in Juicebox (if just a few loci) or via straw (github.com/aidenlab/straw).
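For the use case described earlier in the thread (proximity of genes on different chromosomes), the normalized contacts can be looked up bin by bin. This sketch leaves data access to a caller-supplied `fetch_records(chrA, chrB)` that yields (posA, posB, value) tuples with bin starts in bp. With the hic-straw Python package that could be something like `[(r.binX, r.binY, r.counts) for r in hicstraw.straw("observed", "INTER_SCALE", path, chrA, chrB, "BP", 250000)]`, assuming the file contains that normalization. `gene_pair_contacts` and the gene dictionary format are hypothetical.

```python
def gene_pair_contacts(genes, fetch_records, resolution=250000):
    """Look up the normalized contact between every pair of genes.

    genes: {name: (chromosome, position_bp)}.
    fetch_records(chrA, chrB): iterable of (posA, posB, value) with bin starts in bp.
    Returns {(geneA, geneB): value}, with 0.0 where no record exists.
    """
    cache = {}

    def block(c1, c2):
        # Fetch each chromosome pair at most once and index it by bin starts.
        if (c1, c2) not in cache:
            cache[(c1, c2)] = {(x, y): v for x, y, v in fetch_records(c1, c2)}
        return cache[(c1, c2)]

    names = sorted(genes)
    result = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (ca, pa), (cb, pb) = genes[a], genes[b]
            ba = pa // resolution * resolution  # bin start of gene a
            bb = pb // resolution * resolution  # bin start of gene b
            # Records may be stored in either orientation, so try both.
            value = block(ca, cb).get((ba, bb))
            if value is None:
                value = block(cb, ca).get((bb, ba), 0.0)
            result[(a, b)] = value
    return result
```

Passing the data source in as a function keeps the lookup logic testable without a .hic file and lets the same code work with Juicebox exports or straw.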



Louis Cammarata

Feb 8, 2022, 11:40:37 PM
to 3D Genomics
Hi Muhammad,
Thanks again for your reply, and sorry for the slow response. This makes a lot of sense; I did as you said and it now works well. I have one last question about which normalization you would recommend, as I am not sure which one to use based on what I read in Rao et al. 2014.
- Would you recommend INTER_BALANCED, GENOMEWIDE_BALANCED, or are these normalizations roughly equivalent?
- If I use GENOMEWIDE_BALANCED, I can compare Hi-C values between intra- and interchromosomal maps, whereas I cannot do this with INTER_BALANCED, right? So should I just use GENOMEWIDE_BALANCED if I want to compare values within and between intra- and interchromosomal maps?

Thank you very much for all your help, I really appreciate it!
Best,
Louis