Juicebox Assembly Tools With External Assembly

Michael Alonge

Jun 27, 2018, 1:54:04 PM
to 3D Genomics
Hi There,

I have a chromosome-scale assembly that I would like to touch up with Juicebox assembly tools. Here's what I have so far:

1. A fasta file containing 13 sequences corresponding to 13 chromosomes (genome size ~ 800 Mbp).
2. A .hic file of Hi-C links for this assembly. It was created from "medium" format input using BWA and the juicer pre tool. Also, I filtered it so that it contains only links within the same chromosome (the diagonal).

With this .hic file, I use Juicebox to visualize the Hi-C heatmaps, but I would also like to make edits to the assembly with JBAT. As I understand it, I will also need an additional ".assembly" file. What exactly is the format for this file?

I do have information about how underlying scaffolds were ordered and oriented to create the chromosomes if that is needed.

I would just like to know how I can put this info into the necessary files in order to use JBAT.

Thank you,
Mike Alonge

Michael Alonge

Jun 27, 2018, 2:22:31 PM
to 3D Genomics
If I had to guess, I would say that the first part of the file is space-delimited with 3 columns:

1. contig header
2. unique id
3. contig length

with one row for each contig. After this comes one line per chromosome, giving the space-delimited order of its contigs (using their unique ids).

Is this a correct interpretation? Also, does this account for padding between contigs?

Thanks

Olga Dudchenko

Jun 27, 2018, 10:33:38 PM
to 3D Genomics
Hello Michael,

It seems that the most immediate thing you might want to do is to treat your current chromosome-length assembly as a draft, visualize it in JBAT-compatible form, and generate an associated .assembly file. Both can be done using our 3D-DNA pipeline. See the Genome Assembly Cookbook, page 5: "Example command for running 3D-DNA on a draft genome assembly" and "Visualize candidate assembly". I did not quite get this from your description, but you will need a full-format merged_nodups.txt file for this, which is a typical output from Juicer. It seems you have some experience with these files since you are using 'pre', but let me know if you need more help here.

(Note that I would not recommend filtering out inter-chromosomal contacts: if anything, this will prevent you from seeing any inter-chromosomal misassemblies that might be in your assembly!)

In this approach your chromosomes are going to act as your input scaffolds. If you want to go one layer below, to the original scaffolds that were used to create the chromosome-length scaffolds, you'll have to do a bit more work and also realign your Hi-C data to the input scaffolds: the alignment to the final chromosomes is not going to be useful here. At that point you might as well rerun the whole of our pipeline: juicer -> 3d-dna -> JBAT.

Your general description of the contigs is more or less correct, but it does not account for any splitting of the original contigs that might have happened (are you sure no editing has been performed on the original contigs?). You will have to create separate contigs for gaps if you have any added, I am afraid. Again, it might be better to run the pipeline rather than to work this back.
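
For readers following along, the layout discussed above can be sketched in Python. This is only an illustration of the structure described in this thread: one three-column row per piece (name, unique id, length), then one space-delimited line of ids per chromosome, with gaps entered as their own pieces. The leading ">" on the header rows and the minus sign marking a reversed piece are assumptions based on .assembly files as 3D-DNA writes them, not something stated here, and all names in the example layout are hypothetical.

```python
from typing import List, Tuple

# One chromosome is an ordered list of (name, length, orientation) pieces.
# Orientation is "+" or "-". Gaps are ordinary pieces with their own name.
Piece = Tuple[str, int, str]


def build_assembly_text(chromosomes: List[List[Piece]]) -> str:
    """Return .assembly-style text for the given chromosome layouts."""
    header_lines = []
    scaffold_lines = []
    next_id = 1  # ids are 1-based and unique across the whole file
    for pieces in chromosomes:
        signed_ids = []
        for name, length, orientation in pieces:
            # ASSUMPTION: header rows start with ">" (not stated in the thread).
            header_lines.append(f">{name} {next_id} {length}")
            # ASSUMPTION: a leading "-" marks a reverse-complemented piece.
            signed_ids.append(str(next_id) if orientation == "+" else f"-{next_id}")
            next_id += 1
        scaffold_lines.append(" ".join(signed_ids))
    return "\n".join(header_lines + scaffold_lines) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: two scaffolds joined by a 500 bp gap,
    # followed by a chromosome made of a single scaffold.
    layout = [
        [("scaf_a", 1200, "+"), ("gap_1", 500, "+"), ("scaf_b", 800, "-")],
        [("scaf_c", 3000, "+")],
    ]
    print(build_assembly_text(layout), end="")
```

Note that this covers only ordering, orientation and gaps. If any original contig was split during assembly, each fragment would need its own row, which is the part that is hard to work back by hand.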

In addition to the cookbook, you can find some potentially useful JBAT-related resources on aidenlab.org/assembly.

Best,
Olga

Michael Alonge

Jun 28, 2018, 10:31:36 AM
to 3D Genomics
Hi Olga,

Thanks for the help. So if I understand correctly, I should follow these steps, replacing "draft.fa" with my chromosomes:

1. Steps to run juicer on Hi-C data (to produce the nodups.txt file)
2. Example command for running 3D-DNA on a draft genome assembly
3. Visualize Candidate assembly.

And I could also do this with the original scaffolds if I wanted to go back one step in the assembly process.

Thanks for your help.
Mike

Olga Dudchenko

Jun 28, 2018, 5:04:10 PM
to 3D Genomics
Hi Mike,

If you follow route 1, I would suggest:

1) Find the merged_nodups.txt file which you used to run juicer pre. Preferably, if you have it, roll back to the one not filtered to be intrachromosomal. If it is not in the 16-column format, convert it into the 16-column format.
(If I misunderstood and you do not have that merged_nodups.txt file, then yes, you want to rerun Juicer against your current genome to generate an appropriate merged_nodups.txt file.)
2) Generate a .assembly file from your current chromosomes with the awk generate-assembly-file script from p.5
3) Run the visualization script from p.5
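
As a rough illustration of steps 1 and 2, here is a Python sketch, not the awk script from the cookbook. It assumes the merged_nodups.txt file is whitespace-delimited, so that counting fields per line shows whether it is already in the 16-column form, and it assumes the trivial .assembly layout for this route: one ">name id length" row per chromosome and one single-id line per chromosome. The function names are made up for this example.

```python
import io
from typing import Dict, Iterable, Set


def column_counts(lines: Iterable[str], max_lines: int = 1000) -> Set[int]:
    """Collect the distinct field counts seen in the first max_lines lines."""
    counts: Set[int] = set()
    for i, line in enumerate(lines):
        if i >= max_lines:
            break
        if line.strip():
            counts.add(len(line.split()))
    return counts


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of its header) to its length."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def trivial_assembly(lengths: Dict[str, int]) -> str:
    """One header row and one scaffold line per sequence, in FASTA order."""
    names = list(lengths)  # dicts keep insertion order (Python 3.7+)
    header = [f">{n} {i} {lengths[n]}" for i, n in enumerate(names, start=1)]
    order = [str(i) for i in range(1, len(names) + 1)]
    return "\n".join(header + order) + "\n"


if __name__ == "__main__":
    # Tiny in-memory stand-ins for the real files.
    fasta = io.StringIO(">chr1 some description\nACGT\nAC\n>chr2\nGGG\n")
    print(trivial_assembly(fasta_lengths(fasta)), end="")
    mnd = io.StringIO("a b c\n1 2 3\n")
    print(column_counts(mnd))  # anything other than {16} means conversion is needed
```

With real data you would pass open file handles instead of the StringIO objects, for example `column_counts(open("merged_nodups.txt"))`.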

If you would like to follow route 2:

1) Get your original scaffolds fasta
2) Run Juicer using your original scaffolds fasta as reference (see chapter 2 in manual)
3) Run 3D-DNA on scaffolds.fasta and the Juicer mnd (merged_nodups.txt) file (see chapter 3 in manual)
4) Examine output (.hic and .assembly files which will be output automatically) in JBAT (chapter 4 in manual)

Best,
Olga

Michael Alonge

Jun 28, 2018, 6:22:46 PM
to 3D Genomics
Awesome, thanks so much for the help.

Mike 