[Genome] occludin-gene duplication

Taipalensuu Jan

Nov 10, 2005, 10:46:12 AM

Is this due to an artefact during sequence assembly or is it a real duplication?

While using the UCSC May 2004 assembly of the human genome for PCR primer design, we identified a genomic region that is highly similar to part of the occludin gene. The sequence identity is 99.8%, and the similarity extends from exon 4_+19781 nt to at least 500 nt downstream of the last exon (exon 9) of the occludin gene. The duplicated region is thus located on the minus strand of chr5 at 70425579-70403677 bp. The corresponding occludin region (plus strand) is at 68865473-68887387 bp of chr5.
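
The sizes of the two regions can be compared directly from the coordinates given above. This is a minimal sketch that assumes the positions are 1-based and inclusive, as they would be typed into the browser position box; the tuple names and the helper `span_length` are illustrative only.

```python
# Coordinates copied from the message above. Treating them as 1-based,
# inclusive positions is an assumption.
OCCLUDIN_REGION = ("chr5", 68865473, 68887387, "+")
DUPLICATE_REGION = ("chr5", 70403677, 70425579, "-")


def span_length(start, end):
    """Number of bases in a 1-based, inclusive interval (order-independent)."""
    return abs(end - start) + 1


occludin_len = span_length(OCCLUDIN_REGION[1], OCCLUDIN_REGION[2])
duplicate_len = span_length(DUPLICATE_REGION[1], DUPLICATE_REGION[2])

# Print both lengths and their difference in bases.
print(occludin_len, duplicate_len, occludin_len - duplicate_len)
```

A small difference in length between the two spans would be consistent with an inexact copy, but on its own it does not distinguish a real duplication from an assembly artefact.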

In the July 2003 assembly there are two high-similarity duplications of the same part of the occludin gene in this genomic region.

Sincerely,

Jan Taipalensuu

Jennifer Jackson

Nov 10, 2005, 12:26:28 PM

Hello Jan,
While the final decision about whether this region represents a
biological duplication event or an assembly artifact is up to you, I can
point you to some additional evidence contained in our browser tracks
that you may find helpful.

Variation and Repeats: Segmental Dups
Duplications of >1000 Bases of Non-RepeatMasked Sequence
With the browser region centered at position chr5:68,865,473-68,887,387,
set this track to full. The duplication between the two regions that you
mention is annotated here. If the position is changed to
chr5:70,403,678-70,425,578, the reverse duplication is also annotated.
Position: chr5:68865474-68944233 <-> Other Position:
chr5:70346809-70425579 (-)
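
To check how much of each queried region falls inside the annotated duplication pair, the interval overlap can be computed from the positions listed above. This is a rough sketch, assuming 1-based inclusive coordinates; `parse_position` and `overlap_bases` are hypothetical helper names, not part of any UCSC tool.

```python
import re


def parse_position(pos):
    """Parse a browser-style position such as 'chr5:68,865,473-68,887,387'."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", pos.strip())
    if m is None:
        raise ValueError(f"unrecognized position: {pos!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    return chrom, min(start, end), max(start, end)


def overlap_bases(a, b):
    """Bases shared by two 1-based, inclusive intervals on the same chromosome."""
    if a[0] != b[0]:
        return 0
    lo = max(a[1], b[1])
    hi = min(a[2], b[2])
    return max(0, hi - lo + 1)


# Queried regions and the Segmental Dups annotation quoted above.
query_plus = parse_position("chr5:68,865,473-68,887,387")
query_minus = parse_position("chr5:70,403,678-70,425,578")
dup_plus = parse_position("chr5:68865474-68944233")
dup_minus = parse_position("chr5:70346809-70425579")

print(overlap_bases(query_plus, dup_plus))
print(overlap_bases(query_minus, dup_minus))
```

Under these assumptions, both queried regions lie almost entirely within the annotated pair, which matches what the track shows when set to full.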

Genes and Gene Prediction Tracks: Yale Pseudo
This track shows identified pseudogenes as recorded in the Yale
Pseudogene Database.
With the browser region centered at position chr5:70,403,678-70,425,578,
this track contains no data. This tells me this group has not yet
identified the region as a pseudogene.

Genes and Gene Prediction Tracks: RefSeq Genes
The RefSeq Genes track shows known protein-coding genes taken from the
NCBI mRNA reference sequences collection (RefSeq).
With the browser region centered at position chr5:70,403,678-70,425,578,
this track contains no data. This means that the other position
(chr5:68,865,473-68,887,387) is the best match and that these two
regions differ in the coding region.

Please see each track's description page for methods used to generate
the data sets. From this limited analysis, it appears that the region is
a known inexact duplication but that the cause is undetermined.
Exploring other annotation tracks may also be useful; these are just
examples of where I would start.

Hope this helps. Please let us know if we can be of additional assistance,
Jennifer Jackson
UCSC Genome Bioinformatics Group

Heather Trumbower

Nov 10, 2005, 7:41:11 PM

Jan:

I'd like to add a bit to my colleague's useful suggestions.

I believe that the region at about chr5:68,750,000-71,000,000 in the
May 2004 human assembly is a region of high duplication. In
addition to the Segmental Dups track that Jen mentioned, you can also
see this in the WSSD Duplication and Self Chain tracks. OCLN lies
just on the edge of this range. You can see by viewing the Known Genes
and RefSeq tracks that there are a number of full gene annotations that
are mapped twice within this region, such as SERF1A, SMN2 and GTF2H2.

Cheng, She, Church, Eichler, et al. recently published the results of
their work to identify segmental duplications in chimp, and compare
these to human. ("A genome-wide comparison of recent chimpanzee and human
segmental duplications", Nature 437:88-93, Sept. 1, 2005). They write that
"there is a particular bias for human-specific duplications noted on chromosomes
5 and 15."

The chromosome coordinator for chr5 is Jeremy Schmutz from the Stanford
Human Genome Center (jeremy at shgc.stanford.edu), as listed at our
page http://genome.ucsc.edu/goldenPath/credits.html#human_credits. I will
write to Jeremy about this region, and copy you on the message. Perhaps
he will have further insights that he can share with us.

Heather Trumbower
UCSC Genome Bioinformatics Group
