Converting from HIC file to BEDPE

Joaquin Reyna

Apr 29, 2021, 4:00:12 PM

to 3D Genomics

Hi,

I am currently using straw to dump my HIC data but I don't know how to convert the indexes to chr, start, end. So far my code looks like this:

Code:

import straw
import pandas as pd

fn = 'my.hic'
strawObj = straw.straw(fn)
# chr1 vs chr2, no normalization, fragment resolution with bin size 1
matrixObj = strawObj.getNormalizedMatrix('1', '2', 'NONE', 'FRAG', 1)
# result is indexed as three parallel lists: chr1 index, chr2 index, count
result = matrixObj.getDataFromBinRegion(0, 10000, 0, 10000)
for i in range(len(result[0])):
    print("{0}\t{1}\t{2}".format(result[0][i], result[1][i], result[2][i]))

Output:

HiC version: 8
3635	961	1
3938	2848	1
4807	2558	1
5790	2448	1
4748	4152	1
5268	4731	1
7726	4228	1
9686	4166	1
6148	8607	1
7292	8360	1
4502	1397	1
623	3852	1
4011	3698	1
7317	3042	1
8695	5287	1
9202	7337	1
8047	9051	1

My intuition is that the restriction enzyme file is the key. Each line of that file has a chromosome name followed by its restriction enzyme cut sites. Are the indexes for the chr1 loci and chr2 loci assigned sequentially along each line? For example:

chr1   1000  2346  3031  ...  249000000
index  1     2     3     ...  n

chr2   1022  2776  4443  ...  247000000
index  1     2     3     ...  m

My other intuition is that the indexes for chr1 start at 1 and run up to n, and similarly for chr2 from 1 to m. Since they are in different columns, it would make sense for the numbering to restart, especially since some of the values in the chr1 column are larger than some of the values in the chr2 column.

If I apply this conversion, I would get part of the way to constructing a BEDPE file. The last part would be to get the start and end of each fragment, which should be start = position of the previous cut and end = position of the current cut.
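A minimal sketch of the index-to-coordinates step described above. It assumes each line of the restriction site file is whitespace-separated (chromosome name followed by sorted cut positions), that indexes are 1-based as in the example, and that the first fragment starts at position 0. The function names and the `index_base` parameter are hypothetical and only for illustration; the numbering should be checked against the actual file before relying on it.

```python
def parse_site_line(line):
    """Split one restriction site line into (chrom, list of cut positions)."""
    fields = line.split()
    return fields[0], [int(x) for x in fields[1:]]


def fragment_interval(sites, index, index_base=1):
    """Return (start, end) of the fragment that ends at the index-th cut site.

    index_base=1 follows the numbering used in this thread (ASSUMPTION);
    pass index_base=0 if the indexes turn out to be zero-based.
    """
    k = index - index_base  # zero-based position of the "current" cut
    if not 0 <= k < len(sites):
        raise IndexError(f"fragment index {index} out of range")
    start = sites[k - 1] if k > 0 else 0  # previous cut, or chromosome start
    return start, sites[k]


chrom, sites = parse_site_line("chr1 1000 2346 3031 249000000")
print(chrom, fragment_interval(sites, 2))  # prints: chr1 (1000, 2346)
```

Keeping the index base as a parameter means an off-by-one in the assumed numbering can be corrected in one place.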

Does this sound correct? Thanks for the help!

Joaquin

Neva Durand

Apr 29, 2021, 4:06:03 PM

to Joaquin Reyna, 3D Genomics

Is there a reason you are extracting as fragment instead of at BP resolution?

--

Neva Cherniavsky Durand, Ph.D. | she, her, hers

Assistant Professor | Molecular and Human Genetics

Aiden Lab | Baylor College of Medicine

www.aidenlab.org

Joaquin Reyna

Apr 29, 2021, 4:07:56 PM

to 3D Genomics

Yes, my colleague made a tool that requires restriction level information, otherwise I would use BP resolution.

Neva Durand

Apr 29, 2021, 4:09:11 PM

to Joaquin Reyna, 3D Genomics

Did you also bin at BP resolution initially or does the hic file not contain BP resolutions?

Joaquin Reyna

Apr 29, 2021, 4:11:36 PM

to 3D Genomics

The HIC file contains both FRAG and BP resolutions.

Neva Durand

Apr 29, 2021, 4:17:05 PM

to Joaquin Reyna, 3D Genomics

I see, but the BP resolution is not precise enough?

Indeed the bin you're seeing is the index from the restriction site file. So

3635 961 1

This is chr1 index 3635 and chr2 index 961, with a count of 1.

So yes, everything you said is correct.
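To illustrate how (chr1 index, chr2 index, count) records like the one above could become BEDPE-style lines, here is a rough sketch. The cut positions are made up, `to_bedpe` is a hypothetical helper, the 1-based indexing is an assumption carried over from the question, and writing the count as a seventh column is a layout choice that a downstream tool may not share.

```python
def to_bedpe(rows, sites1, sites2, chrom1="chr1", chrom2="chr2", index_base=1):
    """Yield tab-separated lines: chrom1 start1 end1 chrom2 start2 end2 count."""
    def interval(sites, index):
        k = index - index_base  # ASSUMPTION: indexes are 1-based by default
        if not 0 <= k < len(sites):
            raise IndexError(f"fragment index {index} out of range")
        return (sites[k - 1] if k > 0 else 0), sites[k]

    for idx1, idx2, count in rows:
        s1, e1 = interval(sites1, idx1)
        s2, e2 = interval(sites2, idx2)
        yield "\t".join(map(str, (chrom1, s1, e1, chrom2, s2, e2, count)))


# Toy cut sites (made up), not real restriction enzyme positions.
toy_sites1 = [1000, 2346, 3031, 5000]
toy_sites2 = [1022, 2776, 4443, 6000]
# Same layout as zip(result[0], result[1], result[2]) from the straw call.
toy_rows = [(3, 1, 1), (2, 4, 1)]

bedpe_lines = list(to_bedpe(toy_rows, toy_sites1, toy_sites2))
for line in bedpe_lines:
    print(line)
```

With real data, the two site lists would come from the chr1 and chr2 lines of the restriction site file, and the rows from the straw output.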

Joaquin Reyna

Apr 29, 2021, 4:26:46 PM

to 3D Genomics

The BP resolution is definitely precise; however, my colleague's tool uses the sizes of the fragments between restriction enzyme cut sites in its algorithm. Without variance in the bin sizes the algorithm cannot run, and BP bins have no variance: https://github.com/ay-lab/HiCnv
If everything I said is correct, then I'll go ahead and convert from HIC to BEDPE with the logic from above.
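The point about bin-size variance can be seen in a small sketch, using made-up cut positions and a hypothetical helper `fragment_sizes`. It only compares the size distributions; it says nothing about how HiCnv itself uses them.

```python
from statistics import pvariance


def fragment_sizes(sites):
    """Sizes of the fragments delimited by sorted positions, starting from 0."""
    bounds = [0] + list(sites)
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy_sites = [1000, 2346, 3031, 5000]      # made-up restriction cut positions
fixed_bins = [5000, 10000, 15000, 20000]  # boundaries of fixed 5 kb BP bins

frag_var = pvariance(fragment_sizes(toy_sites))  # nonzero: sizes differ
bp_var = pvariance(fragment_sizes(fixed_bins))   # zero: all bins equal
print(frag_var > 0, bp_var == 0)  # prints: True True
```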

Thank you!
