Hello Viv,
It is a little tricky to get correct ligation counts in this case, and the result depends on the grep installed on your system. The easiest thing to do is run Juicer as usual but count the ligation junctions separately. You would count the ligation junctions in the contacts using a command like this:
grep -cE '(GATCGATC|GAATGATC|GATTGATC|GACTGATC|GAGTGATC)' merged_nodups.txt
NB: I only listed the ligation junctions generated from the first two; you would need to generate all the possibilities (there are many).
For the fastqs, you would do something similar, but paste the two read files together first:
paste <(zcat fname_R1.fastq.gz) <(zcat fname_R2.fastq.gz) | grep -cE '(GATCGATC|GAATGATC|GATTGATC|GACTGATC|GAGTGATC)'
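If the behaviour of the local grep is a concern, the same count can be sketched in Python. This is only an illustration: the function name count_junction_pairs and any file names passed to it are hypothetical, and it counts a read pair once if either mate's sequence line contains any listed junction. Because grep scans every line of the pasted records rather than only the sequence lines, the two counts may differ slightly.

```python
import gzip
import re
from itertools import islice

# Junctions copied from the grep example; extend with the full set for your enzymes.
JUNCTIONS = ["GATCGATC", "GAATGATC", "GATTGATC", "GACTGATC", "GAGTGATC"]


def count_junction_pairs(r1_path, r2_path, junctions=JUNCTIONS):
    """Count read pairs in which either mate contains any junction motif."""
    pattern = re.compile("|".join(re.escape(j) for j in junctions))
    hits = 0
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        # The sequence is the 2nd line of every 4-line FASTQ record.
        seqs1 = islice(r1, 1, None, 4)
        seqs2 = islice(r2, 1, None, 4)
        for seq1, seq2 in zip(seqs1, seqs2):
            if pattern.search(seq1) or pattern.search(seq2):
                hits += 1
    return hits
```

Calling count_junction_pairs("fname_R1.fastq.gz", "fname_R2.fastq.gz") would then return the number of pairs with at least one junction.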
Finally, you will need to generate a restriction site file that looks for the union of all these cut sites. You should modify this script to do so:
https://github.com/theaidenlab/juicer/blob/master/misc/generate_site_positions.py
In particular you need to test against multiple test strings.
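As a rough illustration of testing against multiple strings, the sketch below reports every position in a sequence where any of several motifs starts. It is not the code from generate_site_positions.py: the function name, the use of a regular expression, and the reporting convention (1-based start of each motif match) are all assumptions, so check the position convention of the real script before adapting it. The motifs GATC and GANTC are used as an example because they are the two cut sites mentioned elsewhere in this thread.

```python
import re

# Assumed motifs for illustration: GATC and GANTC (N = any base).
MOTIFS = ["GATC", "GA[ACGT]TC"]

# The lookahead makes each match zero-width, so overlapping sites are all found.
SITE_RE = re.compile("(?=(" + "|".join(MOTIFS) + "))", re.IGNORECASE)


def union_site_positions(sequence):
    """Return sorted 1-based start positions where any motif occurs."""
    return [m.start() + 1 for m in SITE_RE.finditer(sequence)]


print(union_site_positions("AAGATCGAATCGAGTC"))
```

A single combined pattern keeps the positions sorted and avoids merging separate per-enzyme lists afterwards.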
If you aren’t worried about reads mapping to the same fragment, you could run Juicer with the flags “-x -s none”.
Best
Neva
--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/2e6859b9-68a9-43f0-bc22-cc38cf15f5bb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Sequenced Read Pairs: 335,962,570
Normal Paired: 294,375,197 (87.62%)
Chimeric Paired: 20,978,603 (6.24%)
Chimeric Ambiguous: 15,480,378 (4.61%)
Unmapped: 5,128,392 (1.53%)
Ligation Motif Present: 52,126,754 (15.52%)
Alignable (Normal+Chimeric Paired): 315,353,800 (93.87%)
Intra-fragment Reads: 10,708,012 (3.19% / 3.54%)
Below MAPQ Threshold: 105,092,480 (31.28% / 34.77%)
Hi-C Contacts: 186,467,439 (55.50% / 61.69%)
Ligation Motif Present: 22,382,290 (6.66% / 7.40%)
3' Bias (Long Range): 55% - 45%
Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
Inter-chromosomal: 15,892,647 (4.73% / 5.26%)
Intra-chromosomal: 170,574,792 (50.77% / 56.43%)
Short Range (<20Kb): 55,468,879 (16.51% / 18.35%)
Long Range (>20Kb): 115,104,816 (34.26% / 38.08%)
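The lines above that carry two percentages appear to use two different denominators. As a quick arithmetic check (an inference from these numbers, not something stated in the output itself), the first value matches the count divided by Sequenced Read Pairs, and the second matches the count divided by the sum of Intra-fragment Reads, Below MAPQ Threshold, and Hi-C Contacts:

```python
# Counts copied from the statistics above.
sequenced = 335_962_570
intra_fragment = 10_708_012
below_mapq = 105_092_480
hic_contacts = 186_467_439

# Assumed second denominator: the three categories added together.
subtotal = intra_fragment + below_mapq + hic_contacts


def pct(count, total):
    """Percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


for name, count in [
    ("Intra-fragment Reads", intra_fragment),
    ("Below MAPQ Threshold", below_mapq),
    ("Hi-C Contacts", hic_contacts),
]:
    print(f"{name}: {pct(count, sequenced):.2f}% / {pct(count, subtotal):.2f}%")
```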
cat merged_nodups.txt | awk 'BEGIN{N0=0;N30=0;total=0};$9<30 {N30=N30+1}; $9<=0 {N0=N0+1}; {total = total+1}; END{percent30=N30/total;percent0 = N0/total; print "no of below 30 MAPQ "percent30; print "no of zero or below MAPQ "percent0}'
no of below 30 MAPQ 0.249307
no of zero or below MAPQ 0.225292
When you begin analyzing your Arima-HiC data, it is likely that you will need to tailor your analysis pipeline to account for our unique restriction enzyme cocktail. This will allow for the generation of genome cut-site files as well as the identification of ligation junctions in your dataset. The following information should allow you to input the restriction sites used with Arima HiC into Juicer.
The restriction enzymes in the Arima-HiC cocktail cut at the following motifs, where ‘^’ is the cut site on the + strand and ‘N’ can be any of the four genomic bases:
^GATC
G^ANTC
You will need to provide a custom Arima-HiC-specific Juicer.sh command line script. Arima can provide this file for mm9, mm10, hg19, and hg38. Please copy and paste this FTP link into your browser to access the files (ftp://ftp-arimagenomics.sdsc.edu/pub/JUICER_CUTSITE_FILES).
By default, Juicer calculates the number of chimeric reads assuming a single possible ligation junction (GATCGATC), which is one of the expected ligation junctions produced by our kit. However, the Arima-HiC chemistry uses multiple restriction enzymes and can produce 25 possible ligation junction motifs:
LIGATION_SITE = GAATAATC,GAATACTC,GAATAGTC,GAATATTC,GAATGATC,GACTAATC,GACTACTC,GACTAGTC,GACTATTC,GACTGATC,GAGTAATC,GAGTACTC,GAGTAGTC,GAGTATTC,GAGTGATC,GATCAATC,GATCACTC,GATCAGTC,GATCATTC,GATCGATC,GATTAATC,GATTACTC,GATTAGTC,GATTATTC,GATTGATC
Unfortunately, Juicer cannot accommodate more than one possible ligation junction motif, so if you want the true number (which will be higher than the number Juicer calculates), you will need to obtain it manually using grep. We can provide instructions for how to do this if need be.
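The 25 motifs in LIGATION_SITE can be reproduced from the two cut sites, which is a useful check before building a grep pattern by hand. The sketch below assumes the usual Hi-C fill-in model for palindromic sites that leave a 5' overhang: after cutting, the overhang is filled in, so each junction is a filled-in left end followed by the start of a right-hand fragment. The helper names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def expand(motif):
    """Expand each N in a motif into the four concrete bases."""
    choices = [BASES if base == "N" else base for base in motif]
    return ["".join(p) for p in product(*choices)]


def junction_halves(site):
    """Split a site such as 'G^ANTC' into (filled-in left end, right start).

    Assumes a palindromic site cut symmetrically with a 5' overhang.
    """
    cut = site.index("^")
    motif = site.replace("^", "")
    left = motif[: len(motif) - cut]  # bases before the cut plus the overhang
    right = motif[cut:]               # the overhang plus bases after it
    return left, right


def ligation_junctions(sites):
    """Return every left-end + right-start combination, sorted."""
    lefts, rights = set(), set()
    for site in sites:
        left, right = junction_halves(site)
        lefts.update(expand(left))
        rights.update(expand(right))
    return sorted(l + r for l, r in product(lefts, rights))


junctions = ligation_junctions(["^GATC", "G^ANTC"])
print(len(junctions))
print("|".join(junctions))
```

Joining the list with | gives an alternation that can be passed to grep -E in place of a shortened pattern.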
In addition to -y, the other argument variables we define are:
-d
-p
-z
-D
-t
Hello Neva, I have a similar question about processing multiple-enzyme data. The two cut site motifs are ^GATC and G^ANTC; how should I set the site parameter for Juicer? The usage webpage says the default is MboI. Is it OK to keep this default value and use a restriction site file consistent with the multiple enzymes? I am not clear about the relationship between the site parameter and the restriction site file, or about their impact on the analysis results. Thank you.