input files with no restriction fragments information


nicolas.se...@gmail.com

Mar 30, 2016, 7:37:31 AM
to 3D Genomics
Hi Neva,

I'm wondering how I should tell Juicebox that I do not have restriction fragment information.
I ran the clt jar without -f, and it generated the .hic file. But when I look at my map, I have no cis interactions.
I currently put a 0 in the resfrag field. I tried putting "NA", but an integer is expected.
Thank you for your help
Nicolas

>>head tmp/13576_allValidPairs.pre_juicebox_sorted
SRR400264.100043 1 1 73312058 0 0 1 105973177 0 42 42
SRR400264.100055 0 1 105829190 0 1 1 106032267 0 42 42
SRR400264.100073 0 1 247931515 0 0 1 248450614 0 42 42
SRR400264.100100 0 1 163617012 0 1 1 163633673 0 42 42
SRR400264.100167 0 1 65359270 0 0 1 66674080 0 40 42
SRR400264.100262 0 1 72208769 0 1 1 165975621 0 42 37
SRR400264.100286 0 1 37178207 0 0 1 47228707 0 42 42
SRR400264.10035 0 1 159040736 0 0 1 190019234 0 42 37
SRR400264.100360 1 1 15448618 0 1 1 15456905 0 42 42
SRR400264.100387 1 1 208030494 0 1 1 243864469 0 42 42


Neva Durand

Mar 30, 2016, 8:45:05 AM
to nicolas.se...@gmail.com, 3D Genomics
Hi Nicolas,

Juicebox automatically removes reads whose two ends are on the same restriction fragment. If you're supplying a fake restriction fragment number, it should be different on each end: for example, 0 on one end and 1 on the other. Your first read should be:

SRR400264.100043 1 1 73312058 0 0 1 105973177 1 42 42

and so on for all the reads.
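
As a rough illustration of this advice, the following Python sketch rewrites the two fragment columns so that they always differ. It assumes the 11-column, whitespace-separated layout seen in the sample lines above (read name, strand1, chr1, pos1, frag1, strand2, chr2, pos2, frag2, mapq1, mapq2). The function name `set_dummy_fragments` is hypothetical and not part of Juicebox.

```python
def set_dummy_fragments(line, frag1="0", frag2="1"):
    """Return the line with the two fragment columns set to different dummy values."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
    fields[4] = frag1  # fragment column of the first read end (assumed position)
    fields[8] = frag2  # fragment column of the second read end (assumed position)
    return " ".join(fields)


# First read from the original post, with 0 in both fragment columns.
example = "SRR400264.100043 1 1 73312058 0 0 1 105973177 0 42 42"
print(set_dummy_fragments(example))
# prints: SRR400264.100043 1 1 73312058 0 0 1 105973177 1 42 42
```

Applying the function to every line of the input file gives the "0 on one end, 1 on the other" pattern described above.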

Best
Neva

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/03d30c34-157c-4a9f-a023-ef158fe7b3db%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

nicolas.se...@gmail.com

Apr 14, 2016, 4:17:26 AM
to 3D Genomics, nicolas.se...@gmail.com
Hi Neva,

I'm re-opening this discussion because I seem to have another issue.
All the tests I did on human data work well. But for a reason I do not understand, when I do the same on mouse data, I get an error:

 >>/java -jar /bioinfo/local/build/juicebox/juicebox_clt_1.4.jar pre 12612_allValidPairs_G1.pre_juicebox_sorted test mm9
WARNING: Not including fragment map
Start preprocess
Writing header
Writing body
java.lang.RuntimeException: No reads in Hi-C contact matrices. This could be because the MAPQ filter is set too high (-q) or because all reads map to the same fragment.
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.mergeAndWriteBlocks(Preprocessor.java:1387)
    at juicebox.tools.utils.original.Preprocessor$MatrixZoomDataPP.access$000(Preprocessor.java:1158)
    at juicebox.tools.utils.original.Preprocessor.writeMatrix(Preprocessor.java:572)
    at juicebox.tools.utils.original.Preprocessor.writeBody(Preprocessor.java:303)
    at juicebox.tools.utils.original.Preprocessor.preprocess(Preprocessor.java:213)
    at juicebox.tools.clt.old.PreProcessing.run(PreProcessing.java:94)
    at juicebox.tools.HiCTools.main(HiCTools.java:78)

I put 0/1 as the fragment numbers for all reads, as discussed before, and the MAPQ values are high.
My input file looks like this:
10000000 0 1 9094472 0 0 1 8907781 1 40 40
10000001 1 1 14723403 0 1 1 14674631 1 40 40
10000002 1 1 24071965 0 0 1 23867153 1 40 40
10000003 1 1 70231084 0 1 1 95013384 1 40 40
10000004 1 1 70907240 0 0 1 70954861 1 40 40
10000006 1 1 74372607 0 0 1 36287931 1 40 40
10000007 1 1 64851027 0 0 1 84929829 1 40 40

I guess the error is related to the genome itself. For instance, when I take my human data but run juicebox_clt with the mm9 genome, I get the same error. That makes sense, as I guess some reads will fall outside the range of the mouse chromosomes.
Any ideas?
Thank you
Nicolas

Neva Durand

Apr 14, 2016, 1:16:18 PM
to nicolas.se...@gmail.com, 3D Genomics
Hi Nicolas

The chromosome names have to be the same as in the fasta file to which they were aligned. The default mm9 and mm10 genomes have chromosome names "chr1" "chr2" etc. 
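
One way to act on this, sketched in Python, is to prefix the chromosome columns with "chr" when the input was written with bare names such as "1". The column positions (3rd and 7th field) are an assumption based on the sample input above, and `add_chr_prefix` is a hypothetical helper, not a Juicebox function. It only handles the simple prefix case; names that differ in other ways (for example "MT" versus "chrM") would need an explicit mapping.

```python
def add_chr_prefix(line, prefix="chr"):
    """Return the line with the prefix added to both chromosome columns if missing."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
    for i in (2, 6):  # chr1 and chr2 columns (assumed positions)
        if not fields[i].startswith(prefix):
            fields[i] = prefix + fields[i]
    return " ".join(fields)


# First read from the mouse input shown above.
example = "10000000 0 1 9094472 0 0 1 8907781 1 40 40"
print(add_chr_prefix(example))
# prints: 10000000 0 chr1 9094472 0 0 chr1 8907781 1 40 40
```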

Best
Neva

Neva Durand

Apr 14, 2016, 1:31:44 PM
to nicolas.servant, nicolas.se...@gmail.com, 3D Genomics
It depends on the fasta file to which they are aligned. Fly chromosomes, for example, are named completely differently. For hg19 we used the GRCh37 fasta, which doesn't have the "chr" in front. You can always change from the default by supplying your own chrom.sizes file corresponding to the genome to which you aligned.
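
To see quickly whether an input file and a chrom.sizes file agree, a small Python check along these lines may help. It assumes a chrom.sizes file with one whitespace-separated "name size" pair per line and the same 11-column input as earlier in the thread. `load_chrom_sizes` and `find_mismatches` are hypothetical names used only for this sketch, and the sizes in the toy example are made up.

```python
from collections import Counter


def load_chrom_sizes(lines):
    """Parse 'name size' lines into a dict mapping chromosome name to length."""
    sizes = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            sizes[parts[0]] = int(parts[1])
    return sizes


def find_mismatches(read_lines, sizes):
    """Count read ends with an unknown chromosome or a position past its end."""
    problems = Counter()
    for line in read_lines:
        fields = line.split()
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        for chrom, pos in ((fields[2], fields[3]), (fields[6], fields[7])):
            if chrom not in sizes:
                problems[f"unknown chromosome {chrom}"] += 1
            elif int(pos) > sizes[chrom]:
                problems[f"position beyond end of {chrom}"] += 1
    return problems


# Toy example: made-up lengths, not the real mm9 sizes.
sizes = load_chrom_sizes(["chr1 200000000", "chr2 180000000"])
reads = ["10000000 0 1 9094472 0 0 1 8907781 1 40 40"]
print(find_mismatches(reads, sizes))
```

In practice both arguments could be open file handles, since the functions only iterate over lines. A non-empty result suggests that the chromosome names or the genome do not match the input.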

On Thursday, April 14, 2016, nicolas.servant <nicolas...@curie.fr> wrote:
Thank you Neva, I will test that.
But why is that not the case for hg19?
Do you plan to use the same naming rules for all organisms?
Thanks. N



Sent from my Samsung Galaxy smartphone.
-------- Original message --------
From: Neva Durand <ne...@broadinstitute.org>
Date: 14/04/2016 19:16 (GMT+01:00)
Cc: 3D Genomics <3d-ge...@googlegroups.com>
Subject: Re: input files with no restriction fragments information

Nicolas Servant

Apr 14, 2016, 2:04:03 PM
to Neva Durand, nicolas.servant, 3D Genomics
ok. Thank you for your prompt reply.
N