I am a PhD student at the University of Ottawa researching the functional implications of chromatin spatial arrangement, and I was hoping to ask you a few questions about the Hi-C data your group generated for the publication “Cohesin Loss Eliminates All Loop Domains” (Rao et al. 2017). I would like to use the data you have made available via ENCODE in my research, but I have run into some difficulty matching the various file names, ENCODE accessions, replicates, and treatment conditions. Specifically, I am looking for the raw sequencing files (.fastq) for the untreated, treated, and treated + 180 min withdrawal samples used in Figure 2. Based on the supplemental spreadsheet included with the paper (attached), I believe the cases (libraries?) I need are as follows:
Rao-2017-HIC001
Rao-2017-HIC002
Rao-2017-HIC003
Rao-2017-HIC004
Rao-2017-HIC005
Rao-2017-HIC006
Rao-2017-HIC007
Rao-2017-HIC008
Rao-2017-HIC009
Rao-2017-HIC010
Rao-2017-HIC011
Rao-2017-HIC012
Rao-2017-HIC013
Rao-2017-HIC014
Rao-2017-HIC044
Rao-2017-HIC045
Rao-2017-HIC046
Rao-2017-HIC047
Looking through the fastq files available on ENCODE, it is not clear to me which files correspond to which of the cases listed above. I am looking at the files at https://www.encodeproject.org/experiments/ENCSR152HRS/ , and although their names appear to follow the conventions of the supplementary file, all 80 of the provided files are labelled either HIC045 or HIC046. In short, I would be grateful for any guidance on how to relate the files available for download to the Library IDs given in the supplemental table.
I was also wondering how exactly your group combined these raw sequencing files. For example, Rao-2017-HIC001 through Rao-2017-HIC007 are spread across two replicates of the same treatment condition, yet all of their reads are summed in the “TOTAL” row. Were all seven of these libraries combined for the analysis and the creation of Figure 2? Were the replicates kept separate at any stage, or were all of the reads pooled to maximize the number of reads/contacts in the final analysis?
Thank you for taking the time to read this and for helping me with my questions. If you have any clarifying questions or comments, please do not hesitate to reach out.