HW3 2c


Orion

Dec 1, 2010, 9:24:01 PM
to csc2417-f10
I was hoping for a little clarification of 2.c. My current
interpretation is that you want:

1) a table of 64 codons and their frequency in ORFs, in the reading
frame,
2) a table of 21 amino acids and their frequency in ORFs, and
3) a table of 64 codons and their frequency in all DNA not spanned by
any ORF, for all frame shifts of that DNA.

Is this correct?

Thanks,
Orion
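
For readers following along, here is a minimal sketch of tables 1 and 2 under this interpretation. It is not from the assignment: the function names are made up for illustration, it assumes each ORF is already given as an uppercase ACGT string read 5'->3' on its own strand, and it assumes the standard genetic code, with the stop signal counted as the 21st symbol.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def in_frame_codons(orf):
    """Split an ORF (start codon first) into non-overlapping codons."""
    usable = len(orf) - len(orf) % 3  # drop any trailing partial codon
    return [orf[i:i + 3] for i in range(0, usable, 3)]


def orf_tables(orfs):
    """Return (codon counts, amino acid counts) over all ORFs, in frame."""
    codon_counts = Counter()
    for orf in orfs:
        codon_counts.update(in_frame_codons(orf))
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa_counts[CODON_TABLE[codon]] += n
    return codon_counts, aa_counts


codons, amino_acids = orf_tables(["ATGCCCTAG"])
print(codons)       # one count each for ATG, CCC, TAG
print(amino_acids)  # one count each for M, P, *
```

Dividing each count by the table total turns the counts into frequencies.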

Michael Brudno

Dec 1, 2010, 9:31:51 PM
to csc24...@googlegroups.com
Yes.

Orion Buske

Dec 1, 2010, 11:07:35 PM
to csc24...@googlegroups.com
I presume that you want the ORF frequencies to be with respect to the strand of the ORF, yes? And what about the non-ORF codon frequencies? Should those be with respect to the '+' strand, or averaged over both?

Thanks for your patience,
Orion

Michael Brudno

Dec 1, 2010, 11:11:36 PM
to csc24...@googlegroups.com
The strand of the ORF. For the others it really does not matter; the
numbers will be very similar. + strand is fine.
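
Counting "with respect to the strand of the ORF" means a '-' strand ORF has to be reverse-complemented before it is split into codons. A minimal sketch follows; the function names and the coordinate convention (0-based, half-open, given on the + strand) are assumptions for illustration, not something specified in the assignment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def orf_sequence(genome, start, end, strand):
    """Return the ORF read 5'->3' on its own strand.

    start/end are 0-based, half-open coordinates on the + strand.
    """
    region = genome[start:end]
    return reverse_complement(region) if strand == "-" else region


# A '-' strand ORF "ATG TAA" appears on the + strand as "TTACAT".
genome = "GGTTACATCC"
print(orf_sequence(genome, 2, 8, "-"))  # ATGTAA
print(orf_sequence(genome, 2, 8, "+"))  # TTACAT
```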

Brian

Dec 5, 2010, 6:33:18 PM
to csc2417-f10
Q2C says: Determine and submit the same table (codon frequency) for
DNA sequences outside of these ORFs (putatively non-coding DNA).

So hypothetically, I have an ORF like this:

CCC ATG ... ORF continues that-a-way--> ...

For the table above, would it contain just one instance of CCC, or one
instance each of CCC, CCA, and CAT?

Then if we look at the complementary strand (reversed):

<-- ORF on opposite strand... TAC GGG

Now, technically on this strand, TAC is not in an ORF, but it is
complementary to part of an ORF. Should the table include TAC, ACG,
and/or CGG?


Orion Buske

Dec 5, 2010, 6:48:58 PM
to csc24...@googlegroups.com
I interpreted it as all strand shifts of all DNA base-PAIRS that aren't included in any ORF.

Because, if you included sequence that is on the complementary strand of an ORF, you'll still get an ORF frequency bias.

So, for the following example:

CCC TTT ATG ----ORF---> TAG CGC TAT

You'd get the following DNA triplets added to the non-ORF frequencies:
CCC, CCT, CTT, TTT (+ strand)
ATA, TAG, AGC, GCG (- strand)

Just my $0.02. :)

-Orion
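
The triplets in this example can be reproduced with a sliding window over each flanking region. This is only a sketch with made-up function names; it assumes the two flanks "CCCTTT" and "CGCTAT" have already been cut out of the sequence as separate uppercase ACGT strings.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def sliding_triplets(seq):
    """All overlapping triplets of seq, i.e. every frame shift."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


upstream, downstream = "CCCTTT", "CGCTAT"
print(sliding_triplets(upstream))                        # + strand, before the ORF
print(sliding_triplets(reverse_complement(downstream)))  # - strand, after the ORF
```

Calling sliding_triplets(downstream) without the reverse complement gives the + strand reading of the region after the ORF instead.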

Michael Brudno

Dec 5, 2010, 8:32:57 PM
to csc24...@googlegroups.com
Yes, this is the correct interpretation. However, if you do it differently, e.g. count triplets from only one strand, that's fine as long as you state this.

-M

Brian

Dec 5, 2010, 9:34:27 PM
to csc2417-f10
Why are you adding triplets from both strands?

Or rather, why is it that before the ORF, you're adding triplets from
the + strand, but after the ORF, you're adding triplets from the -
strand? Shouldn't you be consistently using just the + or the -?

I suppose it doesn't matter, since it's non-coding anyways, so the
codon distribution should be indistinguishable between the two
strands. Just seems an odd choice. My current implementation does:

CCC, CCT, CTT, TTT, CGC, GCT, CTA, TAT.

Orion Buske

Dec 5, 2010, 9:41:48 PM
to csc24...@googlegroups.com
Sorry, good point. I was just trying to address the shifting triplet effect, but didn't follow through fully.

I think Mike recommended just using the positive strand a few days ago (thus getting exactly the codons you got), but it doesn't sound like it matters, as long as you explain what you do.

Yoni

Dec 7, 2010, 3:55:02 PM
to csc2417-f10
I'd like to know about frame-shifted codons inside an ORF when we're
looking for the non-coding DNA.

For example if we have a single ORF and we're just looking at the +
strand:

ATG CCC TAG

I understand that CCC should not be included, but I would think that
TGC, GCC, CCT, CTA should be included, because they're not coding
triplets.

Is this correct?
Thanks,
Yoni

Recep Çolak

Dec 7, 2010, 4:03:12 PM
to csc24...@googlegroups.com
I guess you shouldn't, because the fact that a region is an ORF determines its codon characteristics (GC content, for example, or evolutionary constraints on the region) regardless of the frame.
For example, if you have a long ORF with high GC content, the codons that include G and/or C will have a higher frequency no matter which frame shift you use to read the sequence.

Yeleiny Bonilla

Dec 7, 2010, 4:15:04 PM
to csc24...@googlegroups.com
Hey guys, in the example you were discussing above:


CCC TTT ATG ----ORF---> TAG CGC TAT

Can we do something like putting all the non-coding DNA together, like:

CCC TTT ATG TAG CGC TAT

And then get all the codons, in all frames, from this?


Yele.


Orion Buske

Dec 7, 2010, 4:19:23 PM
to csc24...@googlegroups.com
I don't think so.

First, you wouldn't want to include the ATG and TAG codons (those are part of the ORF).

Second, even if you remove those, you would get:
CCC TTT CGC TAT

The problem with this is that if you analyze it now, you'll get the following codons, which didn't actually occur:
TTC, TCG
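
A quick way to see this is to compare the triplets counted per segment against the triplets of the concatenated string. A minimal sketch, with names chosen only for illustration:

```python
def sliding_triplets(seq):
    """All overlapping triplets of seq, i.e. every frame shift."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


segments = ["CCCTTT", "CGCTAT"]  # the two non-ORF flanks, kept separate

per_segment = {t for seg in segments for t in sliding_triplets(seg)}
concatenated = set(sliding_triplets("".join(segments)))

# Triplets that exist only because the two flanks were glued together.
spurious = sorted(concatenated - per_segment)
print(spurious)  # ['TCG', 'TTC']
```

Counting each segment on its own avoids the triplets that straddle the junction.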

Yeleiny Bonilla

Dec 7, 2010, 4:28:14 PM
to csc24...@googlegroups.com
Yeah, sorry, taking out the ATG and TAG for sure.

Michael Brudno

Dec 7, 2010, 5:50:09 PM
to csc24...@googlegroups.com
No; you should ignore anything that is inside an ORF, even if that is out of frame, as those sequences will have very different properties and frequencies.
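
One way to implement this rule is to mask every base pair covered by any ORF (on either strand) and count triplets only within the stretches that remain. The sketch below is an assumption about how one might do it, not the required method: the function names are made up, and the ORF coordinates are assumed to be 0-based, half-open, and given on the + strand.

```python
from collections import Counter


def non_orf_segments(genome, orf_intervals):
    """Return maximal stretches of genome not covered by any ORF interval."""
    covered = [False] * len(genome)
    for start, end in orf_intervals:
        for i in range(max(start, 0), min(end, len(genome))):
            covered[i] = True
    segments, current = [], []
    for base, inside in zip(genome, covered):
        if inside:
            if current:  # an ORF begins: close the open segment
                segments.append("".join(current))
                current = []
        else:
            current.append(base)
    if current:
        segments.append("".join(current))
    return segments


def non_orf_triplet_counts(genome, orf_intervals):
    """Count + strand triplets, all frames, lying wholly outside every ORF."""
    counts = Counter()
    for seg in non_orf_segments(genome, orf_intervals):
        counts.update(seg[i:i + 3] for i in range(len(seg) - 2))
    return counts


genome = "CCCTTT" + "ATGAAATAG" + "CGCTAT"
print(non_orf_segments(genome, [(6, 15)]))        # ['CCCTTT', 'CGCTAT']
print(non_orf_triplet_counts("ATGCCCTAG", [(0, 9)]))  # empty: all inside the ORF
```

Because each segment is processed separately, out-of-frame triplets inside an ORF and triplets that straddle an ORF boundary are never counted.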

Yue Li

Dec 7, 2010, 7:59:31 PM
to csc24...@googlegroups.com
Do we need to determine the frequencies of the amino acids or just the frequencies of the codons? If it is the latter, then it shouldn't be just "a 21 element frequency table" but rather a 64-element frequency table (preferably with the degenerate codons grouped together, as in the table on lecture slide 31).

Also, the table on lecture slide 31 displays "the frequency of usage of each codon (per thousand)" and the "Relative frequency of each codon among synonymous codons". But in our assignment, it seems to me that we just need to determine the frequency of usage and/or relative frequency of each codon among ALL the codons (i.e., not only among the codons corresponding to the same amino acid) from the putative ORFs.

I am not sure which interpretation is correct.
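
To make the two normalizations concrete, here is a small sketch of both: frequency per thousand among all codons, and relative frequency among synonymous codons. The function names and the toy counts are invented for illustration, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks the three stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def per_thousand(codon_counts):
    """Frequency of each codon per 1000 codons, among ALL codons."""
    total = sum(codon_counts.values())
    return {c: 1000.0 * n / total for c, n in codon_counts.items()}


def relative_synonymous(codon_counts):
    """Fraction of each codon among codons for the same amino acid."""
    aa_totals = defaultdict(int)
    for codon, n in codon_counts.items():
        aa_totals[CODON_TABLE[codon]] += n
    return {c: n / aa_totals[CODON_TABLE[c]] for c, n in codon_counts.items()}


toy_counts = Counter({"CCC": 3, "CCT": 1, "ATG": 4})  # made-up numbers
print(per_thousand(toy_counts))         # CCC: 375.0, CCT: 125.0, ATG: 500.0
print(relative_synonymous(toy_counts))  # CCC: 0.75, CCT: 0.25, ATG: 1.0
```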

Yue Li

Dec 8, 2010, 10:45:58 AM
to Yue Li, csc24...@googlegroups.com
Sorry. I missed the earlier message where my question was already addressed. My apologies.