Hi,
I am currently using straw to dump my Hi-C data at fragment resolution, but I don't know how to convert the fragment indexes it returns to chr, start, end. So far my code looks like this:
Code:
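import straw
import pandas as pd

fn = 'my.hic'
strawObj = straw.straw(fn)
# chr 1 vs chr 2, no normalization, fragment units, resolution of 1 fragment
matrixObj = strawObj.getNormalizedMatrix('1', '2', 'NONE', 'FRAG', 1)
result = matrixObj.getDataFromBinRegion(0, 10000, 0, 10000)
# result[0] and result[1] hold the two index columns, result[2] the counts
for i in range(len(result[0])):
    print("{0}\t{1}\t{2}".format(result[0][i], result[1][i], result[2][i]))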
Output:
HiC version: 8
3635 961 1
3938 2848 1
4807 2558 1
5790 2448 1
4748 4152 1
5268 4731 1
7726 4228 1
9686 4166 1
6148 8607 1
7292 8360 1
4502 1397 1
623 3852 1
4011 3698 1
7317 3042 1
8695 5287 1
9202 7337 1
8047 9051 1
My intuition is that the restriction enzyme site file is the key. Each line of that file holds a chromosome name followed by that chromosome's restriction enzyme cut sites. Are the indexes for the chr1 and chr2 loci simply the sequential positions of those cut sites? For example:
chr1 1000 2346 3031 ... 249000000
index 1 2 3 ... n
chr2 1022 2776 4443 ... 247000000
index 1 2 3 ... m
My other intuition is that the indexes restart for each chromosome: 1 up to n for chr1 and, similarly, 1 up to m for chr2. Since they are in different columns it would make sense for them to restart, especially since some of the values in the chr1 column are larger than some of the values in the chr2 column.
If this conversion is right, it gets me part of the way to constructing a BEDPE file. The last part would be to get the start and end of each fragment, which should just be start = position of the previous cut and end = position of the current cut.
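To make the question concrete, here is a minimal sketch of the conversion described above. It rests entirely on the unconfirmed assumptions in this post: that indexes restart for each chromosome, that index 1 refers to the first cut site, and that the first fragment starts at position 0. The function names (load_cut_sites, fragment_to_interval, to_bedpe_rows), the first_index parameter, and the toy cut site positions are all made up for illustration and are not part of straw.

```python
import io

# Toy stand-in for the restriction site file (made-up positions, not real data).
SITE_FILE_TEXT = """\
chr1 1000 2346 3031
chr2 1022 2776 4443
"""

def load_cut_sites(handle):
    """Parse lines of the form 'chrName site1 site2 ...' into {chrName: [sites]}."""
    sites = {}
    for line in handle:
        fields = line.split()
        if fields:
            sites[fields[0]] = [int(x) for x in fields[1:]]
    return sites

def fragment_to_interval(sites, chrom, index, first_index=1):
    """Map a per-chromosome fragment index to (chrom, start, end).

    ASSUMPTION: index `first_index` ends at the first cut site, and each
    fragment runs from the previous cut site (or 0) to the current one.
    """
    cuts = sites[chrom]
    pos = index - first_index
    if not 0 <= pos < len(cuts):
        raise IndexError(f"fragment index {index} out of range for {chrom}")
    start = cuts[pos - 1] if pos > 0 else 0
    return chrom, start, cuts[pos]

def to_bedpe_rows(sites, chrom1, chrom2, xs, ys, counts, first_index=1):
    """Build BEDPE-like tuples: chrom1, start1, end1, chrom2, start2, end2, count."""
    rows = []
    for x, y, count in zip(xs, ys, counts):
        a = fragment_to_interval(sites, chrom1, x, first_index)
        b = fragment_to_interval(sites, chrom2, y, first_index)
        rows.append(a + b + (count,))
    return rows

sites = load_cut_sites(io.StringIO(SITE_FILE_TEXT))
for row in to_bedpe_rows(sites, "chr1", "chr2", [2, 3], [1, 3], [1, 1]):
    print("\t".join(str(v) for v in row))
```

With the real data, xs, ys and counts would be result[0], result[1] and result[2] from the straw call above, and the chromosome names would have to match the site file (for example 'chr1' rather than '1'). If the indexes turn out to be 0-based, passing first_index=0 shifts the mapping accordingly.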
Does this sound correct? Thanks for the help!
Joaquin