Using the Hi-C pipeline on Hi-C data from Arima Genomics


tan...@gmail.com

Mar 2, 2021, 7:54:00 PM
to 3D Genomics
Dear all, 

If I may ask, has anyone used the Juicer pipeline to analyze Hi-C data from Arima Genomics (they use a cocktail of restriction enzymes)? Thanks a lot :)

-- bogdan

Neva Durand

Mar 2, 2021, 8:03:43 PM
to tan...@gmail.com, 3D Genomics
Yes, you can use the Juicer pipeline with the site flag "-s Arima".

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/2eccc568-98f7-4019-a7ce-ff0de0227b1cn%40googlegroups.com.


--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Matt Romero

Mar 2, 2021, 8:10:00 PM
to 3D Genomics
Wow, I didn't know this! Is this flag specific to a particular Juicer script version?
Thanks!



Neva Durand

Mar 2, 2021, 8:18:26 PM
to Matt Romero, 3D Genomics
It's been there for a while in all versions. It just sets the ligation junction, but do note that when you read the junction counts you have to take into account how often these motifs occur over background.

It's also supported in generate_site_positions.



Bogdan Tanasa

Mar 2, 2021, 8:25:45 PM
to Neva Durand, Matt Romero, 3D Genomics
Dear Neva,

thanks a lot for the quick reply and gracious help. If I may ask for a tiny bit more information, please:

-- As Arima uses a cocktail of restriction enzymes, we do not have to trim (cut) the Illumina sequencing reads based on restriction sites, correct?

I assume that BWA does a good job with the split alignment.

-- And at which step in the pipeline is "-s arima" critically important? (I believe that without "-s arima" the number of loops decreases significantly?)

Thanks a lot,

bogdan


Matt Romero

Mar 2, 2021, 8:27:14 PM
to 3D Genomics
Thanks, Neva! Maybe I am a little confused - if I have a restriction site file then do I still need to worry about the ligation junctions?
Also, I saw the -s option for juicer version 1.6 on SLURM, but for AWS version 1.5.6 Arima doesn't seem to be an option. I am using 1.5.6 on AWS, so that's why I'm a bit confused. 
Would another option be to copy the ligation junction options from the juicer script and use that as a -b flag?
Perhaps like this:
juicer.sh -b ligation="'(GAATAATC|GAATACTC|GAATAGTC|GAATATTC|GAATGATC|GACTAATC|GACTACTC|GACTAGTC|GACTATTC|GACTGATC|GAGTAATC|GAGTACTC|GAGTAGTC|GAGTATTC|GAGTGATC|GATCAATC|GATCACTC|GATCAGTC|GATCATTC|GATCGATC|GATTAATC|GATTACTC|GATTAGTC|GATTATTC|GATTGATC)'"
or perhaps
juicer.sh -y path/to/restriction_site_file -b ligation="'(GAATAATC|GAATACTC|GAATAGTC|GAATATTC|GAATGATC|GACTAATC|GACTACTC|GACTAGTC|GACTATTC|GACTGATC|GAGTAATC|GAGTACTC|GAGTAGTC|GAGTATTC|GAGTGATC|GATCAATC|GATCACTC|GATCAGTC|GATCATTC|GATCGATC|GATTAATC|GATTACTC|GATTAGTC|GATTATTC|GATTGATC)'"
Not sure whether you'd want to use both or not...
I hope this question makes sense!
Thanks!



Neva Durand

Mar 2, 2021, 8:43:46 PM
to Bogdan Tanasa, Matt Romero, 3D Genomics
On Tue, Mar 2, 2021 at 8:25 PM Bogdan Tanasa <tan...@gmail.com> wrote:
> Dear Neva,
>
> thanks a lot for the quick reply and gracious help. If i may ask for a tiny bit of more information please :
>
> -- as ARIMA uses a cocktail of restriction enzymes, we do not have to trim (cut) the Illumina sequencing reads based on restriction sites, correct ?

These days we keep intrafragment reads. That said, even with a cocktail you can still create fragment-delimited maps and choose to discard intrafragment reads. The fragments are just smaller.

> i assume that BWA does a good job in the split alignment.

Yes, we find the BWA split alignment reliable as long as flags are set to align each end independently.

> -- and, at which step in the pipeline, is "-s arima" critically important ? (i believe that by not using "-s arima", the number of loops decrease significantly ?)

The flag "-s Arima" is used only to set the ligation junction and, depending on your flags, to search for the restriction site file. The ligation junction is used only for statistics; it is not used at all in analysis. You do not need to use '-s Arima' at all if you plan to keep all intrafragment reads (the default) and if you don't want fragment-delimited maps (also the default). You may just use the restriction site "none" (which is also the default, following the theme).
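
To make the "statistics only" role of the ligation junction concrete, here is a minimal Python sketch that counts how many reads contain at least one junction motif. It is not Juicer's own code. The pattern is the Arima junction list from juicer.sh as quoted in this thread; the helper name `count_junction_reads` and the toy reads are assumptions made up for illustration.

```python
import re

# Arima ligation-junction motifs, copied from the juicer.sh line quoted in this thread.
ARIMA_JUNCTION = re.compile(
    "GAATAATC|GAATACTC|GAATAGTC|GAATATTC|GAATGATC|"
    "GACTAATC|GACTACTC|GACTAGTC|GACTATTC|GACTGATC|"
    "GAGTAATC|GAGTACTC|GAGTAGTC|GAGTATTC|GAGTGATC|"
    "GATCAATC|GATCACTC|GATCAGTC|GATCATTC|GATCGATC|"
    "GATTAATC|GATTACTC|GATTAGTC|GATTATTC|GATTGATC"
)


def count_junction_reads(reads):
    """Return (reads with a junction motif, total reads). Hypothetical helper."""
    total = 0
    with_junction = 0
    for seq in reads:
        total += 1
        if ARIMA_JUNCTION.search(seq.upper()):
            with_junction += 1
    return with_junction, total


# Toy reads, invented for this example.
reads = ["ACGTGATCGATCTTAG", "ACGTACGTACGTACGT", "TTGAGTACTCAA"]
hits, total = count_junction_reads(reads)
print(f"{hits} of {total} reads contain a ligation junction")
```

With 25 eight-base motifs, a random position matches by chance far more often than it would for a single junction (about 25 in 4^8 positions if bases were uniform), which is one way to read the earlier remark about frequency over background.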

Neva Durand

Mar 2, 2021, 8:47:41 PM
to Matt Romero, 3D Genomics
Yes, I'm sorry, but we are woefully behind on support for AWS. We are planning a push with the ENCODE Juicer pipeline, which will replace the AWS version and should be ready in the coming weeks.

As I said above, it's supported in new Juicer, and what you wrote above is indeed how you would call it in AWS. However, our defaults these days don't really use that information (we no longer discard intrafragment reads, in particular), in which case you should be able to send in '-s none'. The ligation junction information is just used for counting ligation junctions, a sanity check to be sure, but less meaningful with Arima where there are many possible ligation junctions.

Best
Neva


Bogdan Tanasa

Mar 2, 2021, 9:01:20 PM
to Neva Durand, Matt Romero, 3D Genomics
Dear Neva, thanks a lot for the very helpful and very prompt messages!

I may have some more questions along the way, so I hope you will not mind if I post new questions on the forum in the near future.

Thank you in advance for your help :) !

-- bogdan




tan...@gmail.com

Mar 5, 2021, 8:03:47 PM
to 3D Genomics

Dear Neva and Matt, we wish you a good weekend, and thank you for our conversations.

Just to double check with you, please: I have set up the script for the Arima Hi-C data in the following way. Would you please let me know if it is correct?

# -D is the directory with the scripts folder
/labs/ARIMA_HiC_now_analysis_JUICER/juicer/CPU/juicer.sh \
-g mm10 \
-s Arima \
-a 'C_1m' \
-p /labs/ARIMA_HiC_now_analysis_JUICER/juicer/chrom.size.mm10/chrom.size.mm10.txt \
-y /labs/ARIMA_HiC_now_analysis_JUICER/juicer/restriction_sites_mm10_for_ARIMA_with_generate_site_positions/mm10_Arima.txt \
-z /labs/ARIMA_HiC_now_analysis_JUICER/juicer/references_bwa_mm10/mm10.fa \
-D /scg/ARIMA_HiC_now_analysis_JUICER/juicer/SLURM \
-t 8

Robert King

Mar 6, 2021, 2:52:35 PM
to tan...@gmail.com, 3D Genomics
Arima changed their enzyme mix recently. The script is likely based on version 1 of the mix rather than Arima's version 2 (not sure if that is widely released yet), so if you use the Arima flag, just be careful which version of the mix it is for; I think it is the old one.

Neva Durand

Mar 6, 2021, 3:40:14 PM
to Robert King, tan...@gmail.com, 3D Genomics
This is easy enough to fix but I just want to emphasize that the restriction enzyme(s) only matter if you're explicitly excluding intrafragment reads. If not, it does not affect the final product in any way.

Bogdan Tanasa

Mar 6, 2021, 5:56:24 PM
to Neva Durand, Robert King, 3D Genomics
Dear Neva and dear Robert, thank you very much for your replies and suggestions! I will also email Arima to ask. If I may ask a few more questions, please:

1. Arima released some fragment files (for HiC-Pro) based on the enzyme cocktail: ftp://ftp-arimagenomics.sdsc.edu/pub/HiCPro_GENOME_FRAGMENT_FILES

Is there a way (or is it useful) to use these fragment files in the Juicer pipeline? (The folks from Arima recommended using these fragments for HiC-Pro.)

What I have done was to use the script "generate_site_positions.py", which takes into consideration the enzyme cocktail ('Arima'  : [ 'GATC', 'GANTC' ],) that was used for our data too.

2. When I run the Juicer pipeline, I do the following (below). The question is: is it necessary to include both "-s Arima" and "-y" (with the file "mm10_Arima.txt") in the script?

# -D is the directory with the scripts folder
/labs/ARIMA_HiC_now_analysis_JUICER/juicer/CPU/juicer.sh \
-g mm10 \
-s Arima \
-a 'C_1m' \
-p /labs/ARIMA_HiC_now_analysis_JUICER/juicer/chrom.size.mm10/chrom.size.mm10.txt \
-y /labs/ARIMA_HiC_now_analysis_JUICER/juicer/restriction_sites_mm10_for_ARIMA_with_generate_site_positions/mm10_Arima.txt \
-z /labs/ARIMA_HiC_now_analysis_JUICER/juicer/references_bwa_mm10/mm10.fa \
-D /scg/ARIMA_HiC_now_analysis_JUICER/juicer/SLURM \
-t 8

3. Dear Neva, when you write "excluding intrafragment reads", what exactly does it mean? Where do the "intrafragment reads" impact the final results?

Thanks a lot for your time over the weekend, and for the conversations!

-- bogdan




Neva Durand

Mar 10, 2021, 4:06:45 PM
to Bogdan Tanasa, Robert King, 3D Genomics
On Sat, Mar 6, 2021 at 5:56 PM Bogdan Tanasa <tan...@gmail.com> wrote:
> Dear Neva, and Dear Robert, thank you very much for your replies and suggestions ! I will email also to Arima to ask. If I may ask a few more questions please :
>
> 1. Arima released some fragment files (for Hi-C Pro) based on enzyme cocktail : ftp://ftp-arimagenomics.sdsc.edu/pub/HiCPro_GENOME_FRAGMENT_FILES
>
> is there a way (or is it useful)  to use these fragment files in JUICER pipeline ? (the folks from Arima recommended to use these fragments for Hi-C Pro)
>
> what i have done was to use the script "generate_site_positions.py", that takes into consideration the enzyme cocktail ('Arima'  : [ 'GATC', 'GANTC' ],) that was used for our data too.

You can use theirs, but I think it's a BED file (not the format Juicer reads), so if you used generate_site_positions, that's right.

> 2. when I run the Juicer pipeline, I do the following (below). The question becomes please : is it necessary to include both "-s Arima \" and "-y" (with the file "mm10_Arima.txt") in the script ?

Yes, using both is best, since you put the file in a folder not labeled simply "restriction_sites". Using "-s" (site) will set the ligation junction.

> # -D is the directory with the scripts folder
> /labs/ARIMA_HiC_now_analysis_JUICER/juicer/CPU/juicer.sh \
> -g mm10 \
> -s Arima \
> -a 'C_1m' \
> -p /labs/ARIMA_HiC_now_analysis_JUICER/juicer/chrom.size.mm10/chrom.size.mm10.txt \
> -y /labs/ARIMA_HiC_now_analysis_JUICER/juicer/restriction_sites_mm10_for_ARIMA_with_generate_site_positions/mm10_Arima.txt \
> -z /labs/ARIMA_HiC_now_analysis_JUICER/juicer/references_bwa_mm10/mm10.fa \
> -D /scg/ARIMA_HiC_now_analysis_JUICER/juicer/SLURM \
> -t 8
>
> 3. Dear Neva, when you write "excluding intrafragment reads", what does it exactly mean ? where do the "intrafragment reads" impact the final results ?


Up until fairly recently, Hi-C maps excluded intrafragment reads by default since this is mostly nonspecific pulldown (unligated fragments). We no longer do that so if you're using a new jar, you don't need to worry about this one way or the other. And in fact, setting the sites overall will make no difference except in the statistics calculation (for QC). If you choose to include fragment delimited maps, which we don't do by default, that's another place you would need the sites. But overall you can process your Arima library with "-s none" instead of Arima and the resulting contact maps will be the same.
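
For readers unsure what "intrafragment" means in practice, the idea can be sketched in Python: a read pair is intrafragment when both ends fall between the same two neighbouring restriction sites on the same chromosome. This is only an illustration under stated assumptions, not Juicer's implementation. The function names, the `sites` dictionary and the boundary convention (a position equal to a site belongs to the downstream fragment) are all assumptions.

```python
from bisect import bisect_right


def fragment_index(site_positions, pos):
    """Index of the fragment containing pos, given sorted cut-site coordinates."""
    return bisect_right(site_positions, pos)


def is_intrafragment(sites_by_chrom, chrom1, pos1, chrom2, pos2):
    """True when both read ends land in the same restriction fragment."""
    if chrom1 != chrom2:
        return False
    positions = sites_by_chrom[chrom1]
    return fragment_index(positions, pos1) == fragment_index(positions, pos2)


# Toy site map, invented for this example: chromosome -> sorted site coordinates.
sites = {"chr1": [100, 250, 900]}

print(is_intrafragment(sites, "chr1", 120, "chr1", 240))  # both between 100 and 250
print(is_intrafragment(sites, "chr1", 120, "chr1", 300))  # a site at 250 lies between them
```

A pipeline that discards intrafragment reads would drop the first pair and keep the second. With an enzyme cocktail the sites are denser, so the fragments are smaller and fewer pairs fall into this category.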

Bogdan Tanasa

Mar 10, 2021, 4:14:28 PM
to Neva Durand, 3D Genomics
Dear Neva, thanks a lot :) !

have a happy spring time ! 

and, i may still ask you folks a few questions in the near future :)

Edoardo Marcora

Apr 30, 2021, 1:40:53 PM
to 3D Genomics
I am new to Juicer and when I run it with the -s Arima option I do get an error about missing the hg19_Arima.txt file. I looked for it in the Box mirror but it's not there! https://bcm.app.box.com/v/juicerawsmirror/folder/11284713908

Do I have to generate it myself? If so, how?

Thanks

Edoardo

Neva Durand

Apr 30, 2021, 1:45:03 PM
to Edoardo Marcora, 3D Genomics
Just run with -s none. You can also search other threads here, which have long discussions about why you can just use none. If you do want to generate it anyway, you can via the restriction enzyme script here: https://github.com/aidenlab/juicer/tree/master/misc
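
As a rough picture of what generating a site file involves, the following Python sketch scans a sequence for the Arima recognition sites mentioned earlier in the thread ('GATC' and 'GANTC'), treating N as any base. It is a simplified illustration, not the script in the Juicer repository; the function names and the toy sequence are assumptions, and it reports the 1-based start of each recognition site, which may not be the coordinate or file layout the real script writes.

```python
import re

# Map each base to a regex fragment; N stands for any of the four bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def site_regex(recognition_sites):
    """Build one regex matching any of the given recognition sites."""
    alternatives = ["".join(IUPAC[base] for base in site) for site in recognition_sites]
    # A lookahead lets overlapping sites all be reported.
    return re.compile("(?=(" + "|".join(alternatives) + "))")


def find_sites(sequence, recognition_sites):
    """Return 1-based start positions of every recognition site in sequence."""
    pattern = site_regex(recognition_sites)
    return [match.start() + 1 for match in pattern.finditer(sequence.upper())]


ARIMA_SITES = ["GATC", "GANTC"]  # enzyme list as quoted in this thread

# Toy sequence, invented for this example.
print(find_sites("AAGATCTTGAGTCCGATCGACTC", ARIMA_SITES))
```

For a whole genome the same scan would be repeated per chromosome, which is the job the script in the repository's misc folder handles.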


Neil Young

Jul 4, 2021, 7:33:52 PM
to 3D Genomics
I love this forum!
I was looking at using Juicer with the new Arima four-enzyme high-coverage kit.
Everything was clear until I reached the Arima line in juicer.sh.
It is currently written on the assumption that two enzymes are used.

grep Arima ./juicer/CPU/juicer.sh 

      Arima) ligation="'(GAATAATC|GAATACTC|GAATAGTC|GAATATTC|GAATGATC|GACTAATC|GACTACTC|GACTAGTC|GACTATTC|GACTGATC|GAGTAATC|GAGTACTC|GAGTAGTC|GAGTATTC|GAGTGATC|GATCAATC|GATCACTC|GATCAGTC|GATCATTC|GATCGATC|GATTAATC|GATTACTC|GATTAGTC|GATTATTC|GATTGATC)'" ;;

Here, I would need to work out every possible ligation junction combination for the four enzymes and add these to this line?
Any chance your team has made a handy script to do this? :)
Alternatively, I could just run with the -s none mode, I guess.
All the best,
Neil
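
One way to enumerate the junctions is sketched below in Python. It assumes 5'-overhang cutters whose ends are filled in and blunt-ligated, so each junction is the filled-in upstream end of one site joined to the filled-in downstream end of another. The cut offsets used here (^GATC and G^ANTC) are assumptions not stated in the thread, and the function names are made up for illustration. Under those assumptions the two-enzyme case reproduces the 25 motifs on the juicer.sh line above. The sites and cut positions for the four-enzyme kit are not given in this thread and would have to come from the vendor's documentation.

```python
from itertools import product


def expand_n(seq):
    """Expand every N in seq into A, C, G and T."""
    choices = [("ACGT" if base == "N" else base) for base in seq]
    return ["".join(combo) for combo in product(*choices)]


def ligation_junctions(enzymes):
    """Enumerate junction motifs for 5'-overhang cutters after fill-in.

    enzymes is a list of (recognition_site, cut_offset), where cut_offset is
    the number of bases before the top-strand cut (an assumption per enzyme).
    """
    left_ends, right_ends = [], []
    for site, cut in enzymes:
        left_ends.append(site[: len(site) - cut])  # upstream end after fill-in
        right_ends.append(site[cut:])              # downstream end after fill-in
    junctions = set()
    for left, right in product(left_ends, right_ends):
        junctions.update(expand_n(left + right))
    return sorted(junctions)


# Assumed cut positions: ^GATC (offset 0) and G^ANTC (offset 1).
two_enzyme = ligation_junctions([("GATC", 0), ("GANTC", 1)])
print(len(two_enzyme))
print("'(" + "|".join(two_enzyme) + ")'")
```

Adding further (site, offset) pairs to the list extends the pattern in the same way. As noted earlier in the thread, the junction list only affects the statistics, so running with -s none remains the simpler route.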
 

Shawn Bai

Feb 21, 2023, 10:55:47 AM
to 3D Genomics
Hi, Neil. Have you figured this out?