HIC file & Assembly file

kevin mckernan

Sep 6, 2018, 1:43:27 PM
to 3D Genomics
We have run Juicer and now have a .hic file.
We can't seem to find the assembly file. If we load the assembly FASTA we used as input for Juicer, we get an error in Juicebox.
Is there another assembly file we need to use?

Brian-Tyler St.Hilaire

Sep 6, 2018, 2:29:33 PM
to kevin mckernan, 3D Genomics
Hey Kevin,

Just to be clear, you have run your Hi-C data through Juicer using your PacBio Cannabis assembly?
You should now have two ".hic" maps and a file called "merged_nodups.txt"?

Brian 


kevin mckernan

Sep 6, 2018, 2:59:55 PM
to Brian-Tyler St.Hilaire, 3D Genomics, Biao Liu
Yes.

This is the output in the aligned directory. When we load the .hic file into Juicebox, we get the contigs in numerical order, which makes for an unappealing picture, as we suspect they are not ordered and oriented that way biologically. I figured the assembly map would straighten this out?

Is the merged_nodups.txt below the file you mean?


-rw-rw-r-- 1 ubuntu ubuntu   307240248 Sep  6 12:28 abnormal.sam
-rw-rw-r-- 1 ubuntu ubuntu           1 Sep  6 12:28 collisions.txt
-rw-rw-r-- 1 ubuntu ubuntu  1248910676 Sep  6 01:42 dups.txt
-rw-rw-r-- 1 ubuntu ubuntu         455 Sep  6 17:05 header
-rw-rw-r-- 1 ubuntu ubuntu   423683504 Sep  6 15:13 inter.hic
-rw-rw-r-- 1 ubuntu ubuntu        1870 Sep  6 12:28 inter.txt
-rw-rw-r-- 1 ubuntu ubuntu   233633825 Sep  6 16:21 inter_30.hic
-rw-rw-r-- 1 ubuntu ubuntu        1868 Sep  6 15:17 inter_30.txt
drwxrwxr-x 2 ubuntu ubuntu        4096 Sep  6 16:22 inter_30_contact_domains.txt
-rw-rw-r-- 1 ubuntu ubuntu       10512 Sep  6 15:17 inter_30_hists.m
-rw-rw-r-- 1 ubuntu ubuntu       11813 Sep  6 12:28 inter_hists.m
-rw-rw-r-- 1 ubuntu ubuntu 20492288575 Sep  6 01:42 merged_nodups.txt
-rw-rw-r-- 1 ubuntu ubuntu 21757534498 Sep  6 01:36 merged_sort.txt
-rw-rw-r-- 1 ubuntu ubuntu    16335247 Sep  6 01:42 opt_dups.txt
-rw-rw-r-- 1 ubuntu ubuntu        1110 Sep  6 12:24 stats_dups.txt
-rw-rw-r-- 1 ubuntu ubuntu        9485 Sep  6 12:24 stats_dups_hists.m
-rw-rw-r-- 1 ubuntu ubuntu   657054698 Sep  6 12:28 unmapped.sam

Brian-Tyler St.Hilaire

Sep 6, 2018, 3:20:21 PM
to kevin mckernan, 3D Genomics, Biao Liu
Hey Kevin,

So you have run Juicer successfully on your data! Congrats.
The next step is to run the 3D-DNA pipeline, which requires the merged_nodups.txt made by Juicer and the FASTA you used to run Juicer.
3D-DNA will use the contact data in the merged_nodups.txt file to order and orient the contigs in the FASTA.
3D-DNA will produce a new map and a complementary ".assembly" file that will allow you to visualize the final assembly and, if necessary, manually correct any misassemblies that you see.
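
A minimal sketch of how that step might be prepared from Python, assuming the 3D-DNA entry point is the `run-asm-pipeline.sh` script called with the draft FASTA followed by the merged_nodups.txt file. The file names `draft.fasta` and `aligned/merged_nodups.txt` and both helper functions are hypothetical placeholders; check the script's own help text for the options your version supports.

```python
from pathlib import Path


def missing_inputs(fasta, merged_nodups):
    """Return the input paths that are not regular files on disk."""
    return [str(p) for p in (fasta, merged_nodups) if not Path(p).is_file()]


def build_3d_dna_command(fasta, merged_nodups, script="run-asm-pipeline.sh"):
    """Build the argument list: script, draft FASTA, then merged_nodups.txt.

    The script name and argument order are assumptions about your 3D-DNA
    install; adjust `script` to the full path of your checkout.
    """
    return [script, str(fasta), str(merged_nodups)]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own.
    fasta, mnd = "draft.fasta", "aligned/merged_nodups.txt"
    problems = missing_inputs(fasta, mnd)
    if problems:
        print("Missing inputs:", ", ".join(problems))
    else:
        # Print the command instead of running it, so it can be reviewed first.
        print(" ".join(build_3d_dna_command(fasta, mnd)))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; pass the list to `subprocess.run` once the paths look right.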

Brian 

Olga Dudchenko

Sep 6, 2018, 3:54:04 PM
to 3D Genomics
Kevin,

To add to Brian's response, here are the things you can do:

1) Look at the .hic files Juicer has built for you. These will presumably contain a lot of pieces: if I remember correctly, you mentioned a few thousand. This is a non-interactive version of the .hic file (if the number of pieces is too big, it will fail to load).

2) Build an interactive version of the file. Run the commands on page 5 of the cookbook to build an .assembly file and an interactive version of the map.

3) If there is little work to be done, you can interactively put things together using Juicebox Assembly Tools. See
and

4) If you want some automation, use 3D-DNA to do some of the additional scaffolding for you.
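
Which of these options is practical depends largely on how many pieces the draft contains. A minimal sketch for counting contigs and their lengths in the input FASTA, assuming a plain-text (uncompressed) file; the function name `contig_lengths` and the file name `draft.fasta` are hypothetical.

```python
def contig_lengths(lines):
    """Map each FASTA record name to its sequence length.

    `lines` is any iterable of text lines, for example an open file handle.
    The record name is the first whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Hypothetical file name; substitute the FASTA given to Juicer.
    with open("draft.fasta") as handle:
        lengths = contig_lengths(handle)
    print(len(lengths), "contigs,", sum(lengths.values()), "bases in total")
```

A draft with a few thousand pieces, as mentioned above, is where manual curation alone gets tedious and the automated scaffolding in option 4 becomes more attractive.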

Olga