Coordinate format in the text file for Juicer Tools pre

Robert Schöpflin

Apr 12, 2019, 2:53:22 PM
to 3D Genomics
Dear group,

I was wondering whether the genomic coordinates (positions) in the text file for the Juicer Tools pre command are 0-based. From the behavior of Juicer Tools pre, they seem to be 0-based. However, in the file merged_nodups.txt, which is produced by the Juicer pipeline and can also be used with the pre command, the coordinates appear to be 1-based. I did not find this information in the documentation; sorry if I have overlooked it in the wiki.

Best wishes

Robert

Neva Durand

Apr 13, 2019, 6:34:23 AM
to Robert Schöpflin, 3D Genomics
Binning is done by dividing the position by the resolution (integer division). So reads at positions 1-4999 will be in the first 5 kb bin, 5000-9999 in the second, and so on. The position is given by BWA, which is 1-based.
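
A minimal Python sketch of the binning rule described above, assuming plain integer (floor) division of the 1-based BWA position by the resolution. The helper name bin_index is hypothetical and not part of Juicer Tools; it only reproduces the arithmetic stated in the reply.

```python
RES = 5000  # 5 kb resolution


def bin_index(pos: int, resolution: int = RES) -> int:
    """Hypothetical helper: bin of a 1-based read position, by integer division."""
    return pos // resolution


for pos in (1, 4999, 5000, 9999, 10000):
    print(pos, bin_index(pos))
# 1 0
# 4999 0
# 5000 1
# 9999 1
# 10000 2
```

Because positions start at 1, nothing ever lands on position 0, which is why the first bin covers only 1-4999.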

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/87548bad-b517-4b52-8158-40fcc55a1891%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Robert Schöpflin

Apr 13, 2019, 10:00:22 AM
to 3D Genomics
Dear Neva,

Thanks for your response. Does that mean that at 5 kb resolution the first bin has a length of 4999 bp (1-4999), while all following bins have a length of 5000 bp (5000-9999, ...)?
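
One way to check this reading is to count how many 1-based positions the integer-division rule puts into each bin. This is a sketch under the assumption that binning is exactly pos // resolution; the range of positions (1 to 19,999, i.e. the first four 5 kb bins) is chosen only for illustration.

```python
from collections import Counter

RES = 5000
# Hypothetical range: 1-based positions 1..19999 (the first four 5 kb bins)
positions = range(1, 20000)

# Number of positions that fall into each bin under pos // RES
sizes = Counter(pos // RES for pos in positions)
print(sorted(sizes.items()))
# [(0, 4999), (1, 5000), (2, 5000), (3, 5000)]
```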

The intervals in the output of Juicer Tools dump start at 0 (suggesting 0-based intervals), but if I understand correctly, they actually refer to 1-4999, 5000-9999, ..., because read coordinates start at 1.

So it appears that the coordinates are 1-based but the intervals are 0-based. Would an alternative be to make all bins the same length, either by using 1-based intervals 1-5000, 5001-10000, ... (as displayed in Juicebox), or by using 0-based coordinates with 0-based intervals (0-4999, 5000-9999, ...)?
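
The two alternatives can be compared in a short Python sketch. It assumes closed intervals and equal-size bins; the helper names bin_one_based and bin_zero_based are hypothetical and do not correspond to any Juicer Tools function. The point it shows is that both schemes put every base into the same bin and differ only in how the bin's interval is labeled.

```python
RES = 5000


def bin_one_based(pos1: int, res: int = RES) -> tuple[int, int, int]:
    """Option 1: 1-based positions, closed 1-based intervals 1-5000, 5001-10000, ..."""
    b = (pos1 - 1) // res
    return b, b * res + 1, (b + 1) * res


def bin_zero_based(pos0: int, res: int = RES) -> tuple[int, int, int]:
    """Option 2: 0-based positions, closed 0-based intervals 0-4999, 5000-9999, ..."""
    b = pos0 // res
    return b, b * res, (b + 1) * res - 1


# The same base written in both systems (pos0 = pos1 - 1) gets the same bin index
for pos1 in (1, 5000, 5001, 10000):
    print(pos1, bin_one_based(pos1), bin_zero_based(pos1 - 1))
# 1 (0, 1, 5000) (0, 0, 4999)
# 5000 (0, 1, 5000) (0, 0, 4999)
# 5001 (1, 5001, 10000) (1, 5000, 9999)
# 10000 (1, 5001, 10000) (1, 5000, 9999)
```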


Best wishes

Robert

Neva Durand

Apr 13, 2019, 10:29:51 AM
to Robert Schöpflin, 3D Genomics
I suppose there’s a slight discrepancy at the first bin, but it doesn’t matter in practice, at least for human, because of the telomeres: the first possible read is at 10000.


Robert Schöpflin

Apr 13, 2019, 3:01:23 PM
to 3D Genomics
Dear Neva,

Thanks for your response. I think that in contigs of genome assemblies, reads can also map to the first base. Additionally, Juicebox displays the bin intervals as 1-5000, 5001-10000, ..., i.e. with an offset of 1 compared to how the binning actually works.

I think using 0-based coordinates in the text files for Juicer Tools pre would solve both issues:
(1) All bins would have the same size: 0-4999, 5000-9999, ... (0-based in Juicer Tools and in the .hic file).
(2) The intervals would match the notation in Juicebox (where bins are displayed 1-based), so the offset would disappear.
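
A minimal sketch of what this proposal would mean for the input file: shift each 1-based position down by 1 before running pre, so that integer division lands each base in the bin Juicebox labels it with. This is a hypothetical preprocessing step, not something Juicer Tools does. The column layout is an assumption: it supposes whitespace-separated records in the short format "str1 chr1 pos1 frag1 str2 chr2 pos2 frag2" (positions in 0-indexed columns 2 and 6), which should be checked against the Juicer Tools pre documentation and your own files. The function name to_zero_based and the sample record are made up for illustration.

```python
# Hypothetical preprocessing step: NOT part of Juicer Tools.
# Assumes whitespace-separated records in the short format
# "str1 chr1 pos1 frag1 str2 chr2 pos2 frag2", so pos1 and pos2 sit in
# 0-indexed columns 2 and 6; verify this against your own input files.
POS_COLUMNS = (2, 6)
RES = 5000


def to_zero_based(line: str, pos_columns: tuple[int, ...] = POS_COLUMNS) -> str:
    """Return the record with each position column shifted from 1-based to 0-based."""
    fields = line.split()
    for col in pos_columns:
        fields[col] = str(int(fields[col]) - 1)
    return " ".join(fields)


record = "0 chr1 5000 0 16 chr1 10001 1"  # hypothetical record
shifted = to_zero_based(record)
fields = shifted.split()
print(shifted)  # 0 chr1 4999 0 16 chr1 10000 1
# Integer division now yields the bins Juicebox labels 1-5000 and 10001-15000
print(int(fields[2]) // RES, int(fields[6]) // RES)  # 0 2
```

Without the shift, the 1-based position 5000 would go to bin 1 (5000 // 5000 = 1) even though Juicebox labels that base as part of 1-5000, which is the offset of 1 described above.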


Best wishes,

Robert