Arima v2.0 kit cuts digest_genome.py script

Robert King

Jan 4, 2021, 7:13:26 AM
to HiC-Pro

I have been running the script linked below with "-r ^GATC ^TTAA G^AATC G^ATTC G^AGTC G^ACTC", which worked for Arima version 1 kits, but the enzymes have changed in v2.0. The new kit adds an enzyme that recognizes CTNAG, and I'm not sure where the ^ goes. Does anyone have any ideas?

https://github.com/nservant/HiC-Pro/blob/master/bin/utils/digest_genome.py

I'm guessing something like this?

digest_genome.py -r ^GATC ^TTAA G^AATC G^ATTC G^AGTC G^ACTC ^CTAAG ^CTCAG ^CTGAG ^CTTAG

nservant

Jan 4, 2021, 7:47:18 AM
to HiC-Pro
Hi,
I do not know. Do you have a link or manual for the new kit version?
The one I found here (https://arimagenomics.com/conformation) still refers to v1, I guess.
Of note, the digest_genome.py script should now support 'N' bases, if you do not want to spell out every case.
Best
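
To make the 'N' shorthand concrete, here is a minimal Python sketch of how a motif containing N could be expanded into explicit motifs. It is an illustration only and is not taken from digest_genome.py; the function name expand_n is made up for this example.

```python
from itertools import product


def expand_n(motif):
    """Expand every 'N' in a motif into A, C, G and T (hypothetical helper).

    The '^' cut marker and all other characters are kept unchanged.
    """
    # Each position contributes either one fixed character or four choices.
    choices = ["ACGT" if base == "N" else base for base in motif]
    return ["".join(combo) for combo in product(*choices)]


print(expand_n("G^ANTC"))
```

Under this sketch, G^ANTC stands for the same four motifs (G^AATC, G^ACTC, G^AGTC, G^ATTC) that were listed one by one in the v1 command.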

Robert King

Jan 4, 2021, 8:15:17 AM
to HiC-Pro
I think it is only early access at the moment, but looking at this page

I think it is this: ^GATC ^TTAA G^AATC G^ATTC G^AGTC G^ACTC C^TAAG C^TCAG C^TGAG C^TTAG

I'll try the N version of the script, but will see how this goes first.

Robert King

Jan 4, 2021, 9:24:52 AM
to HiC-Pro
Now I'm stuck on the ligation site information. I originally had the setting below, but I don't know how to add the extra enzyme.
LIGATION_SITE =ATCGATC,GAATGATC,GATTGATC,GAGTGATC,GACTGATC,GAATAATC,GAATATTC,GAATAGTC,GAATACTC,GATTAATC,GATTATTC,GATTAGTC,GATTACTC,GAGTAATC,GAGTATTC,GAGTAGTC,GAGTACTC,GACTAATC,GACTATTC,GACTAGTC,GACTACTC,GATCAATC,GATCATTC,GATCAGTC,GATCACTC
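
A quick sanity check on a list like this can be sketched in Python. The snippet below only assumes that every entry should be 8 bases long, as 24 of the 25 entries are; the variable names are made up for this example.

```python
# LIGATION_SITE value copied from the message above.
ligation_site = (
    "ATCGATC,GAATGATC,GATTGATC,GAGTGATC,GACTGATC,"
    "GAATAATC,GAATATTC,GAATAGTC,GAATACTC,"
    "GATTAATC,GATTATTC,GATTAGTC,GATTACTC,"
    "GAGTAATC,GAGTATTC,GAGTAGTC,GAGTACTC,"
    "GACTAATC,GACTATTC,GACTAGTC,GACTACTC,"
    "GATCAATC,GATCATTC,GATCAGTC,GATCACTC"
)

sites = ligation_site.split(",")

# Assumption: each motif joins two 4-base ends, so it should have 8 bases.
odd_length = [site for site in sites if len(site) != 8]

print(len(sites), "entries; unexpected length:", odd_length)
```

The check flags the first entry, ATCGATC, which is one base shorter than the others. It may have lost a leading G when it was pasted (GATCGATC), but that is a guess and worth checking against the original config.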

nservant

Jan 4, 2021, 12:23:47 PM
to HiC-Pro
Hi Robert,

This exercise is always painful :) I have done it hundreds of times, and I always have doubts.
So, if I understood correctly, you now have three enzymes with the following restriction motifs: ^GATC, G^ANTC, C^TNAG
I'm no longer replacing the 'N', as HiC-Pro is expected to do it automatically.

Then, in terms of possible ligation motifs, I would expect: GATCGATC, GATCANTC, GATCTNAG, GANTGATC, GANTANTC, GANTTNAG, CTNAGATC, CTNAANTC, CTNATNAG

I'm attaching a picture to explain how I found these ligation motifs.
At the top left, you have your 3 enzymes. At the top right, the different possible fragment ends after fill-in.
Then, I just made the ligations, taking into account that a 3' end can only be linked with a 5' end (and vice versa).
The arrow represents the 5'->3' orientation for sequencing.
Here, I'm just using one read end, but the other one gives exactly the same motifs.

Let me know if it sounds good. Anyway, it is still worth double checking ;)
Best
Nicolas
20210104_175541.jpg
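
The fill-in and ligation reasoning in this message can be sketched in Python. This is only an illustration under stated assumptions: each site is palindromic, the '^' marks the top-strand cut, and the cut leaves a 5' overhang that is filled in before ligation. The helper names filled_ends and ligation_motifs are made up for this example and are not part of HiC-Pro.

```python
from itertools import product


def filled_ends(site):
    """Return (left_end, right_end) after digestion and fill-in.

    Assumes a palindromic site cut to leave a 5' overhang, with '^'
    marking the cut on the top strand (hypothetical helper).
    """
    cut = site.index("^")
    seq = site.replace("^", "")
    # On a palindromic site the bottom strand is cut at the mirrored
    # position, so fill-in extends the left fragment up to that point.
    mirrored = len(seq) - cut
    return seq[:mirrored], seq[cut:]


def ligation_motifs(sites):
    """Join every filled-in left end with every filled-in right end."""
    ends = [filled_ends(site) for site in sites]
    lefts = [left for left, _ in ends]
    rights = [right for _, right in ends]
    return [left + right for left, right in product(lefts, rights)]


motifs = ligation_motifs(["^GATC", "G^ANTC", "C^TNAG"])
print(",".join(motifs))
```

For the three enzymes above, the sketch gives the left ends GATC, GANT and CTNA and the right ends GATC, ANTC and TNAG, and their nine combinations are the nine motifs listed in this message. It would not be valid for enzymes that leave a 3' overhang.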

Konstantin Okonechnikov

Dec 23, 2021, 4:37:49 AM
to HiC-Pro
Hi,

I actually have a related question. We got data from the novel Arima Genomics capture protocol ( https://arimagenomics.com/products/custom-capture-hic/ ), targeted at promoters and some other genomic fragments, and I'm now trying to analyze the data with HiC-Pro.

The company told me that the "chromatin is digested at ^GATC and G^ANTC".

So I have 2 questions:
1) How do I digest the genome correctly?
Is this correct:
digest_genome.py -r ^GATC G^ANTC

2) What should be set for LIGATION_SITE in the config?
My suggestion (I'm not sure, so I would be especially thankful for comments here):
LIGATION_SITE =GATCGATC,GAATTAAG, GAGTTGAG, GACTTCAG,  GATTTTAG

Thanks in advance,
   Konstantin
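
As a point of comparison, the fill-in reasoning from Nicolas's Jan 4 message can be applied to these two enzymes. The Python sketch below is a hedged illustration, not a confirmed answer: the fragment ends are written out by hand under the assumption of 5' overhangs that are filled in, and the variable names are made up for this example.

```python
from itertools import product

# Filled-in fragment ends, assuming 5' overhangs that are filled in:
#   ^GATC  -> left end GATC, right end GATC
#   G^ANTC -> left end GANT, right end ANTC
left_ends = ["GATC", "GANT"]
right_ends = ["GATC", "ANTC"]

# Any left end can be ligated to any right end.
candidates = [left + right for left, right in product(left_ends, right_ends)]

print("LIGATION_SITE = " + ",".join(candidates))
```

Under these assumptions the candidates are GATCGATC, GATCANTC, GANTGATC and GANTANTC, which differ from the list suggested above. Keeping the N in the motifs relies on HiC-Pro handling it automatically, as stated earlier in the thread, and the result is still worth double checking.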

nservant

Jun 29, 2022, 5:49:06 AM
to HiC-Pro
Hi Konstantin,
Best
