Restriction Site File


Aiden Lab

Feb 23, 2016, 3:40:19 PM
to Juicer
Original at GitHub: https://github.com/theaidenlab/juicer/issues/1

------------------------------------------------------------------------------------------

Hi, 
Thanks for sharing the code in this much detail!

Just wondering where I can find the restriction site file:

$site_file = "/opt/juicer/restriction_sites/hg19_DpnII.txt";

Is it generated by HiCUP? Just want to know what the format looks like.

Thanks!
Hurley


------------------------------------------------------------------------------------------


Hi Hurley,

It's a whitespace-delimited file. There is no header. Each line has as the first field the chromosome, followed by the list of restriction sites, in increasing order. The last field is the size of the chromosome. E.g.:

1 11160 12411 12461 ... 249250621
2 11514 11874 12160 ... 243199373
...

The restriction sites are the locations of the motif of the restriction enzyme in the reference genome. So in this example, the first location of GATC in hg19 on chromosome 1 is 11160, then 12411, then 12461, etc.

Best,
Neva
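
For anyone who wants to sanity-check a file in this format, here is a minimal Python sketch of a parser for a single line, based only on the description above. The function name `parse_site_line` is hypothetical (it is not part of Juicer), and the sample line keeps just the three chromosome 1 sites quoted above plus the chromosome size, so it is an abbreviated illustration rather than a real line from an hg19 file.

```python
def parse_site_line(line):
    """Split one line into (chromosome, site positions, chromosome size).

    Assumes the layout described in this thread: whitespace-delimited,
    chromosome name first, sites in increasing order, chromosome size last.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected a chromosome name and a chromosome size")
    chrom = fields[0]
    numbers = [int(f) for f in fields[1:]]
    sites, chrom_size = numbers[:-1], numbers[-1]
    # The sites are described as being listed in increasing order.
    if any(a >= b for a, b in zip(sites, sites[1:])):
        raise ValueError(f"sites for chromosome {chrom} are not increasing")
    return chrom, sites, chrom_size


# Abbreviated illustration: first three chr1 sites, then the chromosome size.
chrom, sites, chrom_size = parse_site_line("1 11160 12411 12461 249250621")
print(chrom, sites, chrom_size)
```

Reading a whole file would just mean calling this once per non-empty line.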


------------------------------------------------------------------------------------------


Got it! Thanks Neva!

jenniferwalsh

Feb 23, 2016, 4:04:06 PM
to Juicer
Hi Neva,

Where can the text file (in this format) be found?

Thanks,
Jennifer

Muhammad Shamim

Feb 23, 2016, 5:03:46 PM
to Juicer
Hi Jennifer,

You will need to generate the file as it depends on the restriction enzyme and genome used for your experiment.
You can use the following script from GitHub to build the file:

Lin

Mar 15, 2016, 7:27:06 PM
to 3D Genomics, juicer_...@googlegroups.com
Hi Muhammad,

I just tried 'generate_site_positions.py' with the HindIII cutter and hg19. It seems to generate positions only for restriction sites on the positive strand. Do we need to count the sites on the negative strand as well? For example, it counts the occurrences of 'AAGCTT' on one strand of the reference genome, but do we also need to know the occurrences of its complement 'TTCGAA' on the reference genome?

Thanks 

Suhas Rao

Mar 15, 2016, 7:49:05 PM
to 3D Genomics, juicer_...@googlegroups.com
Hey Lin,

The reverse complement of 'AAGCTT' is 'AAGCTT', not 'TTCGAA', thus all sites for HindIII on the forward strand are also sites on the reverse strand and vice versa. In short, 'generate_site_positions.py' will identify all sites for any restriction enzyme that recognizes a palindromic site. 

Cheers,
Suhas 
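
The point about palindromic sites can be checked with a few lines of Python. This is a small sketch, not code from Juicer: the helper names `complement`, `reverse_complement`, and `is_palindromic` are hypothetical, and it assumes motifs contain only the bases A, C, G, and T. It shows that 'TTCGAA' is the plain complement of 'AAGCTT', while the reverse complement (the sequence read 5' to 3' on the opposite strand) is 'AAGCTT' again.

```python
# Translation table mapping each base to its complementary base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, without reversing the sequence."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def is_palindromic(motif):
    """True if the motif reads the same on both strands."""
    return reverse_complement(motif) == motif.upper()


print(complement("AAGCTT"))          # plain complement
print(reverse_complement("AAGCTT"))  # what actually appears on the other strand
print(is_palindromic("AAGCTT"), is_palindromic("GATC"))
```

For a motif where `is_palindromic` returns False, a scan of one strand alone would miss the sites that lie on the other strand, which is why the palindromic condition matters.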