Converting .hic format to format required for HiTC


Helen

Apr 8, 2019, 11:57:54 AM
to 3D Genomics
I am interested in using the directionality index method to call TADs on my .hic matrices created with Juicer. I want to use HiTC, but it requires a specific input format. Is there a tool I can use to convert the format? Alternatively, does anyone have advice on how best to achieve this?

Thank you for your help.

Best wishes,

Helen

Neva Durand

Apr 8, 2019, 12:14:28 PM
to Helen, 3D Genomics
Hello,

It looks to me like HiTC will directly accept the format we output with Juicer Tools dump or Straw.


In particular if you are using R, you might have a look at Straw in R.

Best
Neva

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/f37813bf-b060-4796-9cd5-9e0ba82b42a6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Helen

Apr 8, 2019, 12:49:14 PM
to 3D Genomics
Hi Neva,

Thank you for getting back to me. I think the format it is expecting is "triplet sparse format". I think that should look like this.

What is the relationship between the sparse matrix output by Juicer dump/Straw and the triplet sparse format here?

Best wishes,

Helen

Neva Durand

Apr 8, 2019, 1:27:44 PM
to Helen, 3D Genomics
We output via Dump and Straw in a triplet sparse format. It looks the same to me, but it might be best to ask the authors of HiTC.


Helen

Apr 8, 2019, 6:30:41 PM
to 3D Genomics
Hi Neva, 

Thank you. I just want to check that I understand the output of dump correctly. 

When I dump a matrix in dense format, I get the whole matrix, in which every row and column can be labeled with its genomic bin coordinates, is that right? For example, for a bin size of 50 kb the row and column labels would be:

chr1 0 50000
chr1 50000 100000
chr1 100000 150000
chr1 150000 200000
chr1 200000 250000
chr1 250000 300000
etc.

Whereas if I output in sparse format, I only get the information for the non-zero cells in the matrix, e.g.

3000000 3000000 5961.3994
3000000 3050000 2094.4873
3050000 3050000 7540.7075
3000000 3100000 1124.7317
3050000 3100000 1941.9103
3000000 3150000 844.2783

However, I am confused about what the numbers in columns 1 and 2 correspond to. I assume that column 1 denotes the matrix column and column 2 denotes the matrix row? Do the numbers correspond to the start or the end coordinate of the column/row label?

In the HiTC format I think they seem to have replaced the genomic bin coordinates with numbers so that e.g.

chr1 0 50000  1
chr1 50000 100000  2
chr1 100000 150000  3
chr1 150000 200000  4
chr1 200000 250000  5
chr1 250000 300000  6
etc. 

and the matrix is:
2	2	18.823747
2	1653	4.566572
2	2523	3.511228
3	84	26.194900
4	4	8.937747
4	5	8.464398
etc. 
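
The numbered bin table described above could be generated along these lines. This is only a minimal sketch: the function name `make_bins` and the 300 kb chromosome length are hypothetical and chosen for illustration, and whether HiTC expects exactly this column layout should be checked against its documentation.

```python
def make_bins(chrom, chrom_length, resolution, first_index=1):
    """Return (chrom, start, end, index) tuples covering one chromosome."""
    bins = []
    index = first_index
    for start in range(0, chrom_length, resolution):
        # The last bin may be shorter than the resolution.
        end = min(start + resolution, chrom_length)
        bins.append((chrom, start, end, index))
        index += 1
    return bins


# Hypothetical 300 kb chromosome at 50 kb resolution, printed tab-separated.
for row in make_bins("chr1", 300_000, 50_000):
    print("\t".join(str(field) for field in row))
```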

Thank you for your help. 

Best wishes, 

Helen


Neva Durand

Apr 8, 2019, 6:54:24 PM
to Helen, 3D Genomics

Do they not divide by chromosome?

In any case you can divide by the resolution, e.g.


juicer_tools dump observed NONE ${hicfile} $chr $chr BP $res | awk -v res=$res '{$1=($1/res)+1; $2=($2/res)+1; print}' > chr${chr}_${res}.txt

The bin location listed is the beginning of the bin.
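
For anyone who prefers to do the same conversion outside the shell, here is a rough Python sketch of what the awk step does: it divides each bin start by the resolution and adds 1 to get a 1-based bin index. The function name `to_bin_indices` is hypothetical, the sample lines are taken from the sparse output quoted earlier in the thread, and whether HiTC wants per-chromosome 1-based indices is an assumption that is not confirmed here.

```python
def to_bin_indices(lines, resolution):
    """Convert 'start1 start2 count' lines into 1-based (i, j, count) triplets."""
    triplets = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start1, start2, count = int(fields[0]), int(fields[1]), float(fields[2])
        # Coordinates are bin starts, so integer division is exact here;
        # adding 1 makes the index 1-based.
        triplets.append((start1 // resolution + 1, start2 // resolution + 1, count))
    return triplets


sample = [
    "3000000 3000000 5961.3994",
    "3000000 3050000 2094.4873",
    "3050000 3050000 7540.7075",
]
for i, j, count in to_bin_indices(sample, 50_000):
    print(f"{i}\t{j}\t{count}")
```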



Helen

Apr 9, 2019, 6:07:16 AM
to 3D Genomics
Hi Neva,

Thank you. Yes, I think dividing by the bin size would work.

I am not sure what you mean by "divide by chromosome". Do you mean, do they operate on every intra- and inter-chromosomal contact matrix separately?

Best wishes,

Helen


Neva Durand

Apr 9, 2019, 9:44:08 AM
to Helen, 3D Genomics
I meant that one could in theory specify genomic coordinates in an absolute sense so that the beginning of chromosome 2 was the length of chromosome 1 + 1 and so on. Hopefully they don't do this because it makes life complicated.
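
To make the "absolute" numbering concrete, here is a small sketch of a bin-level analogue, in which the bins of each chromosome are numbered after all bins of the chromosomes before it. The helper names and chromosome lengths are hypothetical, and nothing in this thread establishes that HiTC actually numbers bins this way.

```python
def chromosome_bin_offsets(chrom_lengths, resolution):
    """Map each chromosome to the number of bins that precede it genome-wide."""
    offsets = {}
    total = 0
    for chrom, length in chrom_lengths:
        offsets[chrom] = total
        total += -(-length // resolution)  # ceiling division: bins in this chromosome
    return offsets


def global_bin_index(chrom, start, offsets, resolution):
    """1-based genome-wide index of the bin starting at `start` on `chrom`."""
    return offsets[chrom] + start // resolution + 1


# Hypothetical lengths chosen only for illustration.
lengths = [("chr1", 300_000), ("chr2", 120_000)]
offsets = chromosome_bin_offsets(lengths, 50_000)
print(global_bin_index("chr2", 0, offsets, 50_000))
```

With per-chromosome numbering, the first bin of every chromosome would instead have index 1, which is what the awk conversion above produces.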


Helen

Apr 9, 2019, 1:03:44 PM
to 3D Genomics
Ahh, I see what you mean. I will contact the HiTC authors and find out. Thank you for your help!

Best wishes, 

Helen


