Converting from HIC file to BEDPE


Joaquin Reyna

Apr 29, 2021, 4:00:12 PM
to 3D Genomics
Hi,

I am currently using straw to dump my HIC data but I don't know how to convert the indexes to chr, start, end. So far my code looks like this:
Code:
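import straw
import pandas as pd  # not used yet; intended for building the BEDPE table

fn = 'my.hic'
strawObj = straw.straw(fn)
# chr1 vs chr2, no normalization, fragment (FRAG) units, bin size of 1
matrixObj = strawObj.getNormalizedMatrix('1', '2', 'NONE', 'FRAG', 1)

# Fetch records for bin indexes 0 to 10000 on both axes
result = matrixObj.getDataFromBinRegion(0, 10000, 0, 10000)
# result[0], result[1], result[2]: first index, second index, count
for i in range(len(result[0])):
    print("{0}\t{1}\t{2}".format(result[0][i], result[1][i], result[2][i]))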

Output:
HiC version:  8
3635 961 1
3938 2848 1
4807 2558 1
5790 2448 1
4748 4152 1
5268 4731 1
7726 4228 1
9686 4166 1
6148 8607 1
7292 8360 1
4502 1397 1
623 3852 1
4011 3698 1
7317 3042 1
8695 5287 1
9202 7337 1
8047 9051 1

My intuition tells me that the restriction enzyme file is the key. Each line of that file has a chromosome name followed by its restriction enzyme cut sites. Are the indexes for the chr1 loci and chr2 loci assigned sequentially along each line? For example:

chr1     1000    2346    3031    ...    249000000
index    1       2       3       ...    n

chr2     1022    2776    4443    ...    247000000
index    1       2       3       ...    m

My other intuition is that the indexes for chr1 start at 1 and run up to n, and similarly for chr2 from 1 to m. Since they are in different columns, it would make sense for the numbering to restart, especially since some of the values in the chr1 column are larger than some of the values in the chr2 column.

If I apply this conversion, I get part of the way to constructing a BEDPE file. The last part is to get the start and end, which should be start = position of the previous cut and end = position of the current cut.
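The conversion described above can be sketched in Python as follows. This is a minimal sketch, not a tested pipeline: the function names (`parse_site_lines`, `fragment_interval`, `to_bedpe`), the toy cut sites, and the toy records are all made up for illustration. It assumes the restriction site file has one whitespace-separated line per chromosome (name, then cut positions), that indexes restart for each chromosome, and that index 1 points at the first cut site. The `index_base` parameter is a hypothetical knob in case the dumped indexes turn out to be 0-based, so it is worth checking a few known fragments before trusting either setting.

```python
from typing import Dict, Iterable, List, Tuple


def parse_site_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map chromosome name -> list of cut-site positions (file order)."""
    sites: Dict[str, List[int]] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sites[fields[0]] = [int(x) for x in fields[1:]]
    return sites


def fragment_interval(positions: List[int], index: int,
                      index_base: int = 1) -> Tuple[int, int]:
    """Return (start, end) for one fragment index.

    Assumption: index `index_base` refers to the first cut site, and a
    fragment runs from the previous cut (or 0) to the current cut.
    """
    i = index - index_base
    if i < 0 or i >= len(positions):
        raise IndexError("fragment index {} out of range".format(index))
    start = positions[i - 1] if i > 0 else 0
    end = positions[i]
    return start, end


def to_bedpe(records, sites, chrom1, chrom2, index_base=1):
    """Turn (index1, index2, count) records into BEDPE-style rows."""
    rows = []
    for idx1, idx2, count in records:
        s1, e1 = fragment_interval(sites[chrom1], idx1, index_base)
        s2, e2 = fragment_interval(sites[chrom2], idx2, index_base)
        rows.append((chrom1, s1, e1, chrom2, s2, e2, count))
    return rows


# Toy data (made up): two short chromosomes and two dumped records.
toy_sites = parse_site_lines([
    "chr1 1000 2346 3031 5000",
    "chr2 1022 2776 4443 6000",
])
toy_records = [(2, 1, 1), (3, 3, 4)]

bedpe_rows = to_bedpe(toy_records, toy_sites, "chr1", "chr2")
for row in bedpe_rows:
    print("\t".join(str(v) for v in row))
```

Each row holds the six BEDPE coordinate columns followed by the contact count. With real data, the records would come from zipping the three lists returned by the dump, and the site lines from reading the restriction site file.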

Does this sound correct? Thanks for the help!

Joaquin

Neva Durand

Apr 29, 2021, 4:06:03 PM
to Joaquin Reyna, 3D Genomics
Is there a reason you are extracting as fragment instead of at BP resolution?

--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Joaquin Reyna

Apr 29, 2021, 4:07:56 PM
to 3D Genomics
Yes, my colleague made a tool that requires restriction level information, otherwise I would use BP resolution. 

Neva Durand

Apr 29, 2021, 4:09:11 PM
to Joaquin Reyna, 3D Genomics
Did you also bin at BP resolution initially or does the hic file not contain BP resolutions?

Joaquin Reyna

Apr 29, 2021, 4:11:36 PM
to 3D Genomics
The HIC file contains both FRAG and BP resolutions.

Neva Durand

Apr 29, 2021, 4:17:05 PM
to Joaquin Reyna, 3D Genomics
I see, but the BP resolution is not precise enough?

Indeed, the bin you're seeing is the index from the restriction site file. Take this line:

3635 961 1

It means that chr1 index 3635 and chr2 index 961 have a count of 1.

So yes, everything you said is correct.

Joaquin Reyna

Apr 29, 2021, 4:26:46 PM
to 3D Genomics
The BP resolution is definitely precise; however, my colleague's tool uses the sizes of the bins between restriction enzyme cut sites in his algorithm. Without variance in the bin sizes the algorithm cannot run, and fixed-width BP bins have no variance: https://github.com/ay-lab/HiCnv
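To illustrate the point about bin-size variance, here is a small sketch comparing fragment-defined bins with fixed-width BP bins. The cut sites are made-up toy values and `fragment_sizes` is a hypothetical helper; this does not reproduce anything HiCnv does internally.

```python
from statistics import pvariance


def fragment_sizes(cut_sites):
    """Lengths of the bins between consecutive cut sites, starting at 0."""
    bounds = [0] + list(cut_sites)
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy_cut_sites = [1000, 2346, 3031, 5000]    # made-up positions
frag_sizes = fragment_sizes(toy_cut_sites)  # bin sizes differ from bin to bin
bp_sizes = [5000] * len(frag_sizes)         # fixed-width bins, all identical

print("fragment bin sizes:", frag_sizes)
print("fragment size variance:", pvariance(frag_sizes))
print("BP bin size variance:", pvariance(bp_sizes))
```

The fragment bins have nonzero size variance, while the fixed-width bins have a variance of zero.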
If everything I said is correct then I'll go ahead and convert from HIC to BEDPE with the logic from above. 

Thank you!