Hi-C Juicer pipeline help


Jibran Tahir

Jun 23, 2017, 2:00:40 AM
to 3D Genomics

Hi Neva

I am trying to run the Juicer pipeline. I have some questions about it and need some help (I don't have the skill set that a bioinformatician like you has). Could you please help me with the following?

 

Here is a brief description of my work:

 

We have done a Hi-C experiment on one genotype (male) of a fruit plant species. Now we want to use Juicer to test whether Hi-C data from this genotype can help with two things:

1. Assembling the scaffolds generated from another genotype of the same species. Can Juicer be used for this?

2. 3D visualisation of the fruit genome of another genotype of the same species.

 

At the moment I am trying to do step 2, visualising the 3D configuration of the fruit genome based on my Hi-C dataset. For this I am having trouble understanding which kinds of files I need to put in the following folders in opt:

 

References (here I am using an already assembled genome FASTA file in the form of pseudomolecules/chromosomes, and NOT the unassembled scaffolds)

 

Restriction sites (if I'm correct, here we have to use the restriction enzyme used for the Hi-C, which in our case is Sau3AI. So we generate a list of the positions where Sau3AI would cut the reference pseudomolecules/chromosomes, i.e. a list of cut positions for every pseudomolecule/chromosome. If this is the case, can you please advise me how to do it?)

 

 

 

Looking forward to your response

 

cheers

Jibran

Muhammad Saad Shamim

Jun 23, 2017, 2:21:06 AM
to Jibran Tahir, 3D Genomics
Hey Jibran,

I think the second question is answered by this post:

i.e. you will need to download the appropriate fasta to the references folder and run bwa index on it first.
Then you can run generate_site_positions.py to create the restriction sites file.
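The idea behind the restriction sites file can be sketched in a few lines of Python. This is not generate_site_positions.py itself, only a simplified illustration that assumes the site file holds one line per chromosome: the chromosome name, the positions of every recognition-site match (GATC for Sau3AI/MboI), and finally the chromosome length. The coordinate convention used here (the 1-based position of the last base of each match) is an assumption, and the script name in the usage comment is hypothetical; for a real Juicer run, use generate_site_positions.py.

```python
import re
import sys
from typing import Iterable, Iterator, List, TextIO, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def site_positions(seq: str, motif: str) -> List[int]:
    """Return the 1-based coordinate of the last base of every motif match.

    A zero-width lookahead is used so overlapping matches are not skipped
    (irrelevant for GATC, but it matters for some other motifs).
    """
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [m.start() + len(motif) for m in pattern.finditer(seq.upper())]


def write_site_file(lines: Iterable[str], motif: str, out: TextIO) -> None:
    """Write one line per chromosome: name, cut positions, chromosome length."""
    for name, seq in read_fasta(lines):
        fields = [name] + [str(p) for p in site_positions(seq, motif)]
        fields.append(str(len(seq)))
        out.write(" ".join(fields) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sites_sketch.py mygenome.fasta > mygenome_Sau3AI.txt
    with open(sys.argv[1]) as fasta:
        write_site_file(fasta, "GATC", sys.stdout)
```

Matching is case-insensitive, so soft-masked (lowercase) bases in the reference are still searched.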

Best,

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/9c126cb0-3d65-4927-b73a-9a4b1117ea3e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jibran Tahir

Jun 23, 2017, 2:51:15 AM
to 3D Genomics, jibran...@gmail.com
OK Muhammad, thanks for your help.

So let's go step by step.

First, I would like to ask whether Juicer can help in assembling the genome of a different genotype of the same species. Can it perform the same function as LACHESIS?
Now for the reference genome: I have downloaded the reference genome and ran BWA, which generated the appropriate files. For the enzyme, I looked at the script you mentioned. Because I am not a bioinformatician as such, could you please guide me on how to generate the restriction site file for the Sau3AI enzyme? Its target sequence is GATC. One more related question: Sau3AI is an isoschizomer of MboI, and the script you mentioned already contains MboI, so technically it should work. But then should I delete the other enzyme info in the script? I am so confused.
thanks
Tahir

Neva Durand

Jun 23, 2017, 4:28:03 AM
to 3D Genomics, Jibran Tahir, Olga Dudchenko
Hello Jibran,

For assembly, you can use the assembly pipeline. Olga can say more; the diploid pipeline may be more appropriate.

The restriction site file only looks for the cutting sequence. Since the cutting sequence is the same for Sau3AI and MboI, you can use the already created restriction site file. All of these can be found on our Box mirror and AWS mirror. And everything is detailed here:

I don't understand the question about deleting other enzyme info. If I were you, I would just call juicer with default parameters, possibly pointing to your reference sequence via the -z flag (you can also edit the script and replace the default file for hg19). This will use MboI as the default, but as the information used is simply the sequence GATC, the results will be the same.
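The point that the site file depends only on the recognition sequence can be illustrated with a small lookup table: Sau3AI, MboI and DpnII all recognise GATC, so any search for their sites returns identical positions. The table and the function name are hypothetical, for illustration only; HindIII (AAGCTT) is included purely as a contrasting enzyme with a different site.

```python
import re
from typing import List

# Recognition sequences; the table itself is only an illustration.
RECOGNITION_SITES = {
    "MboI": "GATC",
    "Sau3AI": "GATC",  # isoschizomer of MboI: same site, so same cut positions
    "DpnII": "GATC",
    "HindIII": "AAGCTT",
}


def cut_positions(seq: str, enzyme: str) -> List[int]:
    """Return the 0-based start of each recognition-site match for the enzyme."""
    site = RECOGNITION_SITES[enzyme]
    return [m.start() for m in re.finditer(re.escape(site), seq.upper())]


if __name__ == "__main__":
    seq = "ttGATCaaAAGCTTgatc"
    # Isoschizomers share a recognition sequence, so their site lists match.
    print(cut_positions(seq, "Sau3AI") == cut_positions(seq, "MboI"))  # True
    print(cut_positions(seq, "HindIII"))  # [8]
```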

Best
Neva



Jibran Tahir

Jun 23, 2017, 5:03:35 AM
to 3D Genomics, jibran...@gmail.com, Olga.Du...@bcm.edu
Thanks very much, Neva.

That's good. The samples are from a diploid. I will wait for Olga's response and the link to the assembly pipeline, to see whether it differs from what Juicer does to create the .hic file. Is there a YouTube video of the whole process, covering both the command-line work and workflow diagrams of the biological outputs of the Juicer pipelines, to help me better understand the computational biology?

I now understand the use of the restriction site file. Many thanks
Jibran

Jibran Tahir

Jun 26, 2017, 10:22:37 PM
to 3D Genomics, Jibran Tahir, Olga.Du...@bcm.edu
Hi Olga
I am waiting for your reply
cheers
Jibran

Jibran Tahir

Jun 26, 2017, 11:08:04 PM
to 3D Genomics, jibran...@gmail.com
Hi Muhammad
I have the following problem with Python syntax. Can you please advise what's wrong?
File "<ipython-input-9-05b3a5e573de>", line 9
    print 'Usage: %s <restriction enzyme> <genome> [hrpjxt/bioinf_HiC/myjuicer/opt/juicer/restriction_sites]' % (sys.argv[0])
                                                                                                                   ^
SyntaxError: invalid syntax
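This SyntaxError is what Python 3 raises for a Python 2 print statement such as `print 'Usage: ...' % (...)`: in Python 3, print is a function and its argument must be in parentheses. So the script was apparently written for Python 2; either running it with a Python 2 interpreter or converting its print statements should get past this line (the rest of the script is not checked here). A minimal sketch of the Python 3 form of this one line, with the bracketed path kept exactly as in the error message; the helper function name is hypothetical:

```python
import sys


def usage_message(program: str) -> str:
    """Build the usage text; the bracketed path is copied from the error above."""
    # Adjacent string literals are joined first, then % formats the whole string.
    return ("Usage: %s <restriction enzyme> <genome> "
            "[hrpjxt/bioinf_HiC/myjuicer/opt/juicer/restriction_sites]" % (program,))


if __name__ == "__main__":
    # Python 2 form (a SyntaxError under Python 3):
    #     print 'Usage: ...' % (sys.argv[0])
    # Python 3 form: print is a function, so the argument goes in parentheses.
    print(usage_message(sys.argv[0]))
```

Since the script reads its arguments from sys.argv, it is presumably meant to be run from a terminal with the enzyme and genome as arguments rather than pasted into an IPython cell.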


Olga Dudchenko

Jun 27, 2017, 11:03:45 AM
to 3D Genomics, jibran...@gmail.com, Olga.Du...@bcm.edu
Jibran, 
I would be happy to help, but I am not sure I fully understand your question about genotype. If you can elaborate, I hope I can give you a more satisfactory answer.
With respect to LACHESIS, these are pipelines with somewhat different functionality (3d-dna includes misassembly detection and does not rely on preliminary clustering of sequences based on chromosome territories), but they are designed with the same goal in mind: assembling chromosome-length genomes from draft sequences using Hi-C signal.
Best,
Olga

Jibran Tahir

Jun 27, 2017, 5:19:50 PM
to 3D Genomics, jibran...@gmail.com, Olga.Du...@bcm.edu
I used the word genotype in the sense of an accession/cultivar within a species. A species can have multiple genotypes/accessions, which won't be clones of each other, but you would still expect significant similarity in the genes and chromosome structure of two genotypes from the same species. You would also expect a broadly similar order and orientation of genetic regions along the chromosomes, except for haplotype blocks and rearrangements such as those caused by retrotransposons or transposons.

Now, coming back to genome assembly using Juicer: can you please tell me which pipeline Juicer uses to scaffold contigs (generated from paired-end Illumina reads) into pseudomolecules or chromosomes using Hi-C data?
cheers
Jibran

Olga Dudchenko

Jun 28, 2017, 7:59:36 PM
to 3D Genomics, jibran...@gmail.com, Olga.Du...@bcm.edu
Jibran,

You can find the pipeline here: https://github.com/theaidenlab/3d-dna

The description of the pipeline can be found here: http://science.sciencemag.org/content/356/6333/92.full

Please note that the current version of the pipeline is primarily aimed at reproducing the results of the paper. We will be making more tools available soon.

Best,
Olga

Jibran Tahir

Aug 21, 2017, 10:23:50 PM
to 3D Genomics, jibran...@gmail.com
Hi Saad
Sorry, I just couldn't make further progress on the script. I am running it today.
Can you please tell me how to generate the file mygenome.chrom.sizes?
Jibran


Muhammad Saad Shamim

Aug 21, 2017, 11:35:31 PM
to Jibran Tahir, 3D Genomics
Hey Jibran,

No worries.
The chrom.sizes file is a tab-delimited file which contains the chromosome names and sizes.

Instructions for generating from fasta here: https://www.biostars.org/p/173963/

You could also make it in a text editor if you already know the sizes of the chromosomes.
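As a rough sketch of generating it from the FASTA (an illustration under stated assumptions, not the method in the linked post): read each record, sum the lengths of its sequence lines, and write one tab-delimited name/length line per chromosome. The script name in the usage comment is hypothetical, and keeping only the first word of each FASTA header is an assumption meant to match the sequence names used elsewhere in the pipeline; check that the names agree with your site file.

```python
import sys
from typing import Dict, Iterable, TextIO


def chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word after '>') to its sequence length."""
    sizes: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def write_chrom_sizes(sizes: Dict[str, int], out: TextIO) -> None:
    """Write tab-delimited 'name<TAB>length' lines, one per chromosome."""
    for name, length in sizes.items():
        out.write(f"{name}\t{length}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python chrom_sizes.py mygenome.fasta > mygenome.chrom.sizes
    with open(sys.argv[1]) as handle:
        write_chrom_sizes(chrom_sizes(handle), sys.stdout)
```

Chromosomes are written in the order they appear in the FASTA.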

Jibran Tahir

Aug 22, 2017, 12:04:26 AM
to 3D Genomics, jibran...@gmail.com
Hi Saad
One more question.
I am all set to run Juicer. I have my genome, the BWA index files and the restriction site file generated by the Python script in the same folder. Now, to call Juicer, should I run it from that same folder? The R1 & R2 raw Hi-C files are in a separate folder (work).
To run Juicer, will I just type the following command in bash? (What do you mean by flags?)
juicer.sh -z <path to genome fasta file> -p <path to mygenome.chrom.sizes> -y <path to mygenome_myenzyme.txt>
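Flags are the dash-prefixed options on the command line (-z, -p and -y in the command above); each one is followed by the value it sets. A small Python sketch that assembles the same command as an argument list and prints it, using only those three flags; the file paths and the helper name are hypothetical placeholders. It does not run Juicer, and which directory juicer.sh should be run from depends on your installation and is not covered here.

```python
import shlex
from typing import List


def juicer_command(genome_fasta: str, chrom_sizes: str, site_file: str) -> List[str]:
    """Build a juicer.sh argument list from the three files discussed above."""
    return [
        "juicer.sh",
        "-z", genome_fasta,  # reference genome FASTA (bwa index already run on it)
        "-p", chrom_sizes,   # tab-delimited chromosome names and sizes
        "-y", site_file,     # restriction site positions file
    ]


if __name__ == "__main__":
    # Hypothetical file names; substitute your own paths.
    cmd = juicer_command("references/mygenome.fasta",
                         "references/mygenome.chrom.sizes",
                         "restriction_sites/mygenome_Sau3AI.txt")
    # Print the shell-quoted command line; passing cmd to
    # subprocess.run(cmd, check=True) would execute it instead.
    print(shlex.join(cmd))
```

Building the command as a list keeps each path a single argument even if it contains spaces; shlex.join (Python 3.8+) adds quoting only where the shell would need it.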

Muhammad Saad Shamim

Aug 22, 2017, 12:11:38 AM
to Jibran Tahir, 3D Genomics