Conversion of the annotation


Pavla Navratilova

Aug 20, 2021, 2:58:53 AM
to 3D Genomics
Hi all,
we have reassembled our genome using 3D-DNA, followed by review in Juicebox. For the reviewed final assembly, we generated .cprops and .asm files using convert-assembly-to-cprops-and-asm.awk. We converted the original .gff annotation file to bed using bedops gff2bed into:
scaffold_1      587     3589    GSOIDG00000001001       2089.2116       +       Gaze    gene    .       ID=GSOIDG00000001001;Name=GSOIDG00000001001;Note=Complete 1;
scaffold_1      587     3589    GSOIDT00000001001       2089.2116       +       Gaze    mRNA    .       ID=GSOIDT00000001001;Name=GSOIDT00000001001;Note=Gene GSOIDG00000001001;Parent=GSOIDG00000001001
scaffold_1      587     738     .       105.48  +       Gaze    CDS     0       Parent=GSOIDT00000001001
scaffold_1      810     922     .       74.4    +       Gaze    CDS     1       Parent=GSOIDT00000001001
scaffold_1      1018    1086    .       80.33   +       Gaze    CDS     2       Parent=GSOIDT00000001001
scaffold_1      1177    1337    .       73.33   +       Gaze    CDS     1       Parent=GSOIDT00000001001
scaffold_1      1621    1662    .       38      +       Gaze    CDS     2       Parent=GSOIDT00000001001
............................

and attempted to lift it over to the new assembly with:

awk -v scale=1 -f lift-input-annotations-to-asm-annotations.awk new.cprops new.asm old_annotation.bed > new_annotation.bed

The result, however, looks odd and cannot be visualized in Juicebox.

scaffold_1      587     3589    GSOIDG00000001001       2089.2116       +       Gaze    gene    .       ID=GSOIDG00000001001;Name=GSOIDG00000001001;Note=Complete 1;
assembly        53230635        53230635        assembly        53230635        53230635        Gaze    gene    53230635        53230635        53230635        53230635
assembly        53230635        53230635        assembly        53230635        53230635        Gaze    mRNA    53230635        53230635        53230635        53230635
assembly        53230635        53230635        assembly        53230635        53230635        Gaze    UTR     53230635        53230635        53230635        53230635
assembly        53230635        53230635        assembly        53230635        53230635        Gaze    CDS     53230635        53230635        53230635        53230635
assembly        53230635        53230633        assembly        53230635        53230633        Gaze    CDS     53230635        53230633        53230635        53230633
assembly        42871075        42871075        assembly        42871075        42871075        Gaze    gene    42871075        42871075        42871075        42871075

 ...........................................

Could you help me spot what I am missing?
Thank you!
Pavla

Olga Dudchenko

Aug 26, 2021, 3:29:16 PM
to 3D Genomics
Hi Pavla,

Sorry for the delay in responding.

The lift-input-annotations script is for bedpe annotations rather than bed. I've just updated some code we had for lifting bed files onto the dev branch (phasing); you are welcome to give it a try. Note that you want to lift to _HiC.assembly (previously called .FINAL.assembly).
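
As an illustration of the bed vs. bedpe distinction, here is a minimal Python sketch (not part of 3D-DNA) that guesses whether a row is laid out as bedpe, i.e. two intervals in the first six columns (chrom1, start1, end1, chrom2, start2, end2). The function name `looks_like_bedpe` is hypothetical, and the check is only a heuristic based on which columns hold integers.

```python
def looks_like_bedpe(line: str) -> bool:
    """Heuristic: a bedpe row has integer coordinates in columns 2, 3, 5 and 6."""
    fields = line.split()
    if len(fields) < 6:
        return False
    return all(fields[i].isdigit() for i in (1, 2, 4, 5))


# A row from the gff2bed output above: column 5 is a score, column 6 a strand.
bed_row = (
    "scaffold_1\t587\t3589\tGSOIDG00000001001\t2089.2116\t+\tGaze\tgene\t.\t"
    "ID=GSOIDG00000001001;Name=GSOIDG00000001001;Note=Complete 1;"
)
# A made-up bedpe row, for comparison only.
bedpe_row = "scaffold_1\t100\t200\tscaffold_2\t300\t400"

print(looks_like_bedpe(bed_row))    # False
print(looks_like_bedpe(bedpe_row))  # True
```

A plain bed row fails the check because its fifth and sixth columns are a score and a strand rather than a second pair of coordinates, so a bedpe-oriented lift script would misread those columns.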


Hope this helps,
Olga

Pavla Navratilova

Aug 31, 2021, 7:35:23 PM
to 3D Genomics

Hi Olga, 
thank you for your response and the script. 
We ran

awk -v sandbox=0 -v outlabel="assembly" -v editlabel=":::" -f lift-input-bed-to-HiC-bed.awk our_reviewed.assembly annot.new.bed > annot.mod.bed

However, the outlabel then overwrites the original scaffold name, which is not what we want.

So our questions are: 1) how can we fix this (so that we get the updated scaffold names in the mapping), and

 2) can we convert the annotation of the original assembly directly into an annotation for the 3D-DNA-corrected assembly that was manually reviewed in Juicebox, or do we need to convert step-wise?

Pavla

Olga Dudchenko

Sep 1, 2021, 12:33:28 AM
to 3D Genomics
Hi Pavla,

If you want this mapped to HiC_scaffold_***, that is the "sandboxed" assembly (see the discussion of sandboxed vs. assembly mapping elsewhere in the forum). So you want sandbox=1. The outlabel would be HiC_scaffold_ (I think). The lift should work from the input to the final _HiC.assembly (.FINAL.assembly in the previous nomenclature), i.e. the final assembly with added gaps, corresponding to the chromosome-length fasta. Note that when a bed annotation spans a breakpoint, you will lose that bed entry (and get a relevant message).
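
To illustrate why an entry spanning a breakpoint is lost, here is a toy Python sketch of lifting an interval. It is not the logic of lift-input-bed-to-HiC-bed.awk; the `Fragment` layout, the `lift` function and all coordinates are made up for illustration, and strand flipping for reversed fragments is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Fragment:
    scaffold: str    # input scaffold name
    start: int       # 0-based start of the fragment on the input scaffold
    end: int         # exclusive end of the fragment on the input scaffold
    new_name: str    # output scaffold the fragment was placed on
    new_offset: int  # 0-based start of the fragment on the output scaffold
    reverse: bool    # True if the fragment was placed reverse-complemented


def lift(fragments: List[Fragment], chrom: str, start: int, end: int
         ) -> Optional[Tuple[str, int, int]]:
    """Lift [start, end) if it lies entirely within one fragment, else return None."""
    for f in fragments:
        if f.scaffold == chrom and f.start <= start and end <= f.end:
            if f.reverse:
                # Coordinates are mirrored within a reversed fragment.
                return (f.new_name,
                        f.new_offset + (f.end - end),
                        f.new_offset + (f.end - start))
            return (f.new_name,
                    f.new_offset + (start - f.start),
                    f.new_offset + (end - f.start))
    return None  # the interval spans a breakpoint or is not covered


# Hypothetical layout: scaffold_1 was split at position 2000.
layout = [
    Fragment("scaffold_1", 0, 2000, "HiC_scaffold_1", 0, False),
    Fragment("scaffold_1", 2000, 5000, "HiC_scaffold_2", 1000, True),
]

print(lift(layout, "scaffold_1", 587, 738))    # ('HiC_scaffold_1', 587, 738)
print(lift(layout, "scaffold_1", 2500, 3000))  # ('HiC_scaffold_2', 3000, 3500)
print(lift(layout, "scaffold_1", 587, 3589))   # None
```

In this toy layout the CDS at 587-738 lifts cleanly, while the gene at 587-3589 crosses the hypothetical breakpoint at 2000 and is dropped.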

Hope this helps,
Olga

Pavla Navratilova

Sep 2, 2021, 3:03:16 AM
to 3D Genomics
Hi Olga, 
this works well for converting the annotation between the input and the 3D-DNA-generated _HiC.assembly. However, we have also edited that assembly manually in Juicebox, producing .review.assembly (and generated .asm and .cprops using convert-assembly-to-cprops-and-asm.awk).
So far we have not managed to get the conversion Input > .review.assembly working. Should that work directly in the same way, or should it be done step-wise as Input > _HiC > .review?

Best,
Pavla

Olga Dudchenko

Sep 2, 2021, 11:27:34 AM
to 3D Genomics
After JBAT review, you should run the 3D-DNA post-review script to generate the corresponding _HiC.fasta and _HiC.assembly. See the Genome Assembly Cookbook for more detail. -Olga