hdf5 format to hic format


Berkley Gryder

Oct 14, 2016, 11:44:36 AM
to 3D Genomics
Is there a way to transform hdf5 format to hic format using Juicer, or do I have to start from fastq?

Neva Durand

Oct 14, 2016, 12:09:57 PM
to Berkley Gryder, 3D Genomics
Hi Berkley,

You can use the Juicebox command line tools to convert Hi-C contacts into a .hic file. Is your data already binned? The "pre" command takes a list of contacts, and it can also take already-binned data as input. If your data is already binned, you need to pass the resolution with -r, and the bin numbers need to correspond to that resolution.
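To make the "already binned" case concrete, here is a minimal sketch of preparing binned counts as a text file for "pre". It assumes the "short with score" line layout from the Juicer documentation (`<str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>`), with dummy strands of 0 and dummy fragment numbers 0 and 1. The function name `to_short_with_score` and the chromosome-pair ordering are illustrative assumptions, so check them against the current "pre" documentation before relying on them.

```python
# Hypothetical helper: format already-binned contacts as "short with score"
# lines for `pre`. Assumed layout per line:
#   <str1> <chr1> <pos1> <frag1> <str2> <chr2> <pos2> <frag2> <score>

def to_short_with_score(records, resolution):
    """records: iterable of (chr1, start1, chr2, start2, count) tuples,
    where start1/start2 are bin start coordinates in base pairs."""
    rows = []
    for chr1, start1, chr2, start2, count in records:
        # Bin positions must line up with the resolution passed via -r.
        if start1 % resolution or start2 % resolution:
            raise ValueError(
                f"position not aligned to {resolution} bp bins: "
                f"{chr1}:{start1}, {chr2}:{start2}"
            )
        # Assumption: put each pair in a consistent (chromosome, position)
        # order so all entries for one chromosome pair sort together.
        if (chr1, start1) > (chr2, start2):
            chr1, start1, chr2, start2 = chr2, start2, chr1, start1
        rows.append((chr1, chr2, start1, start2, count))
    rows.sort()
    # Strands are dummy 0 (forward); fragments are dummies 0 and 1.
    return [f"0 {c1} {s1} 0 0 {c2} {s2} 1 {n}" for c1, c2, s1, s2, n in rows]
```

The resulting lines could be joined with newlines, written to a file, and passed to something like `java -jar juicer_tools.jar pre -r 10000 contacts.txt out.hic hg19`, where the jar name, file names, and genome ID are placeholders.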

Best
Neva




--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Berkley Gryder

Oct 14, 2016, 12:19:09 PM
to 3D Genomics, berkley...@gmail.com
Neva,

Thanks so much! I'm new to this, and was trying to make good use of the new ENCODE 3 data sets here: https://www.encodeproject.org/experiments/ENCSR549MGQ/
which have processed HDF5 files listed as "chromatin interactions". Does that mean they are "already binned", as you put it? How do these files compare to the .hic file type you're using?


Thank you,
Berkley

Neva Durand

Oct 17, 2016, 9:06:11 AM
to Berkley Gryder, 3D Genomics
Hi Berkley,

HDF5 is a general-purpose file container. Like XML, it's self-describing. I'm not sure exactly how these Hi-C contacts were stored; you'd have to explore the files using one of the HDF5 libraries out there (here's a link: https://support.hdfgroup.org/HDF5/whatishdf5.html ).
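Because HDF5 is self-describing, a first step is simply to list what a file contains. A minimal sketch using the third-party h5py package (`pip install h5py`) follows; the function name `describe_hdf5` is hypothetical, and nothing here assumes any particular layout for the ENCODE files.

```python
import h5py


def describe_hdf5(path):
    """Return (name, kind, shape, dtype) for every group and dataset
    below the root of an HDF5 file."""
    entries = []

    def visitor(name, obj):
        # visititems calls this once per object; returning None continues.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape, str(obj.dtype)))
        else:
            entries.append((name, "group", None, None))

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return entries
```

Printing the result for a downloaded file should show whether it holds a dense matrix, a sparse list of bin pairs, or something else, which answers the "already binned?" question.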

The .hic file format is just the binned contact maps at multiple resolutions, compressed, and stored in such a way that it's easy to jump to a particular location and zoom level.  Jim Robinson (of IGV) created it; it has a lot in common with the TDF format used in IGV.  There's a lot more detail in the supplementary material of our paper: http://www.cell.com/cell-systems/fulltext/S2405-4712(16)30219-8
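The idea of "binned contact maps at multiple resolutions" can be illustrated with a toy aggregation. This sketch only shows the binning concept for intra-chromosomal contacts; it does not reproduce the actual .hic on-disk layout (compression, block indexing, normalization vectors), which is described in the paper's supplement. The function name is hypothetical.

```python
from collections import Counter


def bin_contacts(contacts, resolutions):
    """Aggregate intra-chromosomal contacts (pos1, pos2) in bp into
    upper-triangle bin counts at each resolution.
    Returns {resolution: Counter({(bin1_start, bin2_start): count})}."""
    maps = {}
    for res in resolutions:
        counts = Counter()
        for pos1, pos2 in contacts:
            # Map each position to its bin, keeping the smaller bin first.
            b1, b2 = sorted((pos1 // res, pos2 // res))
            counts[(b1 * res, b2 * res)] += 1
        maps[res] = counts
    return maps
```

Coarser resolutions merge neighbouring bins, so the same contacts collapse into fewer, larger counts as the bin size grows.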

You can read from .hic files using the Juicebox command line tools; they also work via URL, so you don't have to download the file.  Use the "dump" command to read and the "pre" command to create.
(More documentation coming soon!)
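A rough sketch of driving "dump" and "pre" from a script follows. The argument order mirrors the juicer_tools usage message as best I recall it (`dump <observed/oe> <norm> <hicFile> <chr1> <chr2> <BP/FRAG> <binsize> [outfile]` and `pre [options] <infile> <outfile> <genomeID>`); the jar name and helper names are assumptions, so confirm the syntax against the usage text printed by your version. The three-column parser assumes "dump" writes whitespace-separated `bin1 bin2 value` lines.

```python
def dump_command(hic_path, chrom1, chrom2, binsize, norm="NONE",
                 jar="juicer_tools.jar", outfile=None):
    """Build a `dump observed` command; hic_path may be a local file or URL."""
    cmd = ["java", "-jar", jar, "dump", "observed", norm, hic_path,
           chrom1, chrom2, "BP", str(binsize)]
    if outfile:
        cmd.append(outfile)
    return cmd


def pre_command(infile, outfile, genome_id, resolutions,
                jar="juicer_tools.jar"):
    """Build a `pre` command restricted to the given resolutions (-r)."""
    return ["java", "-jar", jar, "pre", "-r",
            ",".join(str(r) for r in resolutions),
            infile, outfile, genome_id]


def parse_dump_output(text):
    """Parse assumed `bin1_start bin2_start value` lines into tuples."""
    records = []
    for line in text.splitlines():
        if line.strip():
            x, y, value = line.split()
            records.append((int(x), int(y), float(value)))
    return records
```

Each command list can be run with `subprocess.run(cmd, check=True)`; building the list separately makes it easy to print and check before launching Java.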

You can also use straw to extract binned reads from local files quickly, and to programmatically get access to the data: https://github.com/theaidenlab/straw/wiki
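As an example of programmatic access, the straw wiki's Python example calls something like `straw.straw('NONE', 'file.hic', 'X', 'X', 'BP', 1000000)` and gets back three parallel lists (bin x start, bin y start, count). Newer releases of the bindings may return a different type, so treat that interface as an assumption. Given such lists, this hypothetical helper builds a dense, symmetric matrix for an intra-chromosomal query.

```python
def straw_to_dense(xs, ys, counts, binsize, start=0, end=None):
    """Build a symmetric dense matrix (list of lists) from parallel lists
    of bin starts in bp and counts for one chromosome."""
    if not xs:
        return []
    if end is None:
        end = max(max(xs), max(ys)) + binsize
    n = (end - start) // binsize
    matrix = [[0.0] * n for _ in range(n)]
    for x, y, c in zip(xs, ys, counts):
        i, j = (x - start) // binsize, (y - start) // binsize
        if 0 <= i < n and 0 <= j < n:
            # Assumed: each bin pair is reported once (one triangle),
            # so mirror it to fill the other half.
            matrix[i][j] = c
            matrix[j][i] = c
    return matrix
```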

If you end up writing a script to extract this data, let us know!  We've been adding to our public repository of Hi-C experiments and this would be great to have.
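One possible core for such an extraction script: if a file turns out to store a square, already-binned matrix per chromosome (for instance, a dataset read with h5py as `f[name][...]`, where `name` is whatever the file actually uses), its nonzero upper triangle can be turned into sparse records. This sketch assumes bin `i` starts at `i * resolution` with no offset; the function name is hypothetical.

```python
def dense_to_records(matrix, chrom, resolution):
    """Yield (chrom, start1, chrom, start2, count) for the nonzero upper
    triangle of a square, intra-chromosomal, already-binned matrix."""
    n = len(matrix)
    for i in range(n):
        row = matrix[i]
        for j in range(i, n):
            count = row[j]
            if count:
                yield (chrom, i * resolution, chrom, j * resolution, count)
```

Records in this shape can then be written out in one of the text formats that "pre" accepts, with -r set to the same resolution.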

Best
Neva


