chrom.sizes file doubt

Alicia Talavera

May 6, 2020, 1:33:28 PM
to 3D Genomics
Dear 3D Genomics group, 

I am trying to scaffold one assembly (genome size ~1.17 Gb, assembly size 1.1 Gb with 15,076 sequences and an N50 of 1,314 sequences). 

First, I would like to use Juicer, but I don't understand the chrom.sizes file. I used the command awk 'BEGIN{OFS="\t"}{print $1, $NF}' draft_DpnII.txt > mygenome.chrom.size and obtained the same number of entries as there are sequences in our assembly, but my draft genome is very fragmented. Is this result correct? After reading your comment "The chrom.sizes file is just the sizes of the chromosomes you wish to be displayed in your hic file", I guess that I have to use the number of chromosomes of my species instead. Could you please help me resolve this conflict? I don't think I understood it well.

Thank you very much,

Best regards,

Alicia 
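
For readers following along, here is a minimal sketch of what the awk command above does, run on a made-up two-contig site file. The contig names and positions are invented for illustration, and the sketch assumes each line of the site file is a sequence name followed by numeric fields, with the sequence length as the last field (which is what the command relies on). Because awk prints one line per input line, a fragmented draft yields one entry per sequence.

```shell
# Toy stand-in for draft_DpnII.txt (hypothetical contigs and positions).
printf 'contig_1 120 480 900 1000\ncontig_2 75 300 350\n' > toy_DpnII.txt

# Same awk as in the post: print the first field (name) and the last field,
# separated by a tab. One output line is produced per input line.
awk 'BEGIN{OFS="\t"}{print $1, $NF}' toy_DpnII.txt > toy.chrom.sizes

# Show the resulting two-column file.
cat toy.chrom.sizes
```

So a draft with 15,076 sequences would be expected to give 15,076 lines from this command; whether such a file is needed at this stage is a separate question.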

Olga Dudchenko

May 7, 2020, 3:14:22 PM
to 3D Genomics
Hi Alicia,

Please consider consulting the Genome Assembly Cookbook for an overview of assembly workflow (dnazoo.org/methods).

It sounds like you are on step 2: aligning your Hi-C data to the draft genome assembly. You do not need the chrom.sizes file for this. You can run Juicer with -S early (and without the -p flag) to produce the alignments without building the chromosome-length hic file (which you cannot build anyway, since you do not have the assembly yet).

Your next step would be to run 3D-DNA, which will build the hic files for you automatically as it assembles the draft based on the Hi-C alignments.

Best,
Olga
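
As a rough illustration of the alignment-only run described above, the sketch below assembles such a command and prints it rather than executing it. The genome ID "draft" and the FASTA name "draft.fasta" are placeholders, not names from the thread; draft_DpnII.txt is the site file from the question, and the flags are the ones that appear elsewhere in this discussion.

```shell
# Hypothetical names: "draft" and "draft.fasta" are placeholders for your own
# genome ID and FASTA; draft_DpnII.txt is the site file named in the question.
genome_id="draft"
fasta="draft.fasta"
site_file="draft_DpnII.txt"

# Alignment-only run: "-S early" is included and no "-p" flag is passed.
juicer_cmd="juicer.sh -g $genome_id -s DpnII -z $fasta -y $site_file -S early"

# Print instead of executing, so the sketch runs without Juicer installed.
echo "$juicer_cmd"
```

Check the usage message of your installed Juicer version before running the printed command, since the accepted flags can differ between versions.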

Alicia Talavera

May 7, 2020, 4:57:28 PM
to Olga Dudchenko, 3D Genomics
Hi Olga,

Thank you for your quick response,

I consulted the Genome Assembly Cookbook and I checked a tutorial that confused me. Sorry if the question was basic. 

Thanks for your help, I appreciate it.

Best,

Alicia

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/41d7965b-2b63-48c6-a6fa-d5637f39405d%40googlegroups.com.


--
Alicia Talavera Júdez, Ph.D. Student
Department of Subtropical and Mediterranean Fruit Crops
Estación Experimental IHSM La Mayora (CSIC-UMA)
29750 Algarrobo-Costa, Málaga (Spain)

Olga Dudchenko

May 8, 2020, 8:35:15 PM
to 3D Genomics
My pleasure. Best, -Olga


Jahn Davik

Sep 24, 2020, 11:49:35 AM
to 3D Genomics
I am also confused: What is wrong with my command?

juicer.sh -g Anitra50K -s MboI -z Anitra50K.fasta -y Anitra50K_MboI.txt -t 12 -S early

And here is the reply from juicer:
***! You must define a chrom.sizes file via the "-p" flag that delineates the lengths of the chromosomes in the genome at Anitra50K.fasta; you may use "-p hg19" or other standard genomes


I am not working with a standard genome but with a de novo assembly. Any feedback is appreciated.

here is the folder I work from:


Neva Durand

Sep 24, 2020, 12:05:54 PM
to Jahn Davik, 3D Genomics
You need to pass the chrom.sizes file with the -p flag.

--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine

Olga Dudchenko

Sep 25, 2020, 9:57:44 AM
to 3D Genomics
Hi all,

Juicer appears to be run for assembly purposes here. For assembly, you do not need and should not use a chrom.sizes file. Newer versions of Juicer handle this correctly and do not require -p with -S early. For earlier versions of Juicer, pass "-p assembly" to get around the check (this is the workaround described in the Genome Assembly Cookbook).

Best,
Olga
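
A small sketch of the workaround, using the command posted earlier in the thread. The needs_p_flag variable is a hypothetical switch introduced only for this illustration; set it according to whether your Juicer version still stops with the "-p" error. The command is printed, not executed.

```shell
# The command from the thread, kept as posted.
base_cmd="juicer.sh -g Anitra50K -s MboI -z Anitra50K.fasta -y Anitra50K_MboI.txt -t 12 -S early"

# Hypothetical switch (not a Juicer option): "yes" if your version still
# demands -p together with -S early, "no" otherwise.
needs_p_flag="yes"

if [ "$needs_p_flag" = "yes" ]; then
  # Older versions: pass the literal value "assembly" to -p.
  cmd="$base_cmd -p assembly"
else
  # Newer versions: no -p needed with -S early.
  cmd="$base_cmd"
fi

# Print instead of executing, so the sketch runs without Juicer installed.
echo "$cmd"
```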
