On Aug 7, 9:44 am, "C. Titus Brown" <
c...@msu.edu> wrote:
> Any comments on the alignment tutorial I sent out?
All right, I am going to respond to the prodding and comment. Take
everything with a grain of salt; it is just an opinion. (Plus, I
wander off in a totally different direction toward the end of the post.)
Note that of the group here I am probably the only one who does not
regularly work with multiple sequence alignments. We usually work with
a single organism (yeast, Drosophila or human) and look at the
relative positions of various annotations. So you could say that I
approach the alignment tutorial from an outsider's perspective. My
overall feeling is that it is a bit complicated and counterintuitive.
In general I found that every time I needed to use these modules/classes
I had to look up the usage and spend some time thinking about how it
works. I think that is a warning sign.
For example, using the += sign to add anything feels *very* unusual to
me. Adding two numbers with the + sign has nothing in common with the
operation of adding an element to a set. They just happen to be called
by the same word. On top of that, += can be a destructive (in-place)
operation and its semantics can be subtle (leading to potentially
mind-bending bugs). For example try:
a = b = [1, 2, 3]
then contrast a = a + [4] versus a += [4] and see what happens to b.
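The contrast can be run directly as the short sketch below. The names a and b come from the example above; the result variables are only there to hold the outcome of each case, and everything shown is plain Python list behaviour.

```python
# Case 1: a = a + [4] builds a new list and rebinds only the name a.
a = b = [1, 2, 3]
a = a + [4]
rebind_result = (a, b, a is b)
print(rebind_result)    # ([1, 2, 3, 4], [1, 2, 3], False)

# Case 2: a += [4] extends the existing list in place, so b sees it too.
a = b = [1, 2, 3]
a += [4]
inplace_result = (a, b, a is b)
print(inplace_result)   # ([1, 2, 3, 4], [1, 2, 3, 4], True)
```

In the first case b keeps the old contents; in the second case b changes even though it was never mentioned on that line.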
For lists it is far preferable to use a.append(4) because it
makes it clear what happens. Now, moving on, when I read:
simple_al += mouse
ival = mouse[40:60]
simple_al[ival] += rat[42:62]
simple_al[ival] += frog[38:58]
I get stumped again. First, because it introduces several layers of
indirection: is it indexing with an object while at the same time
mutating it? Second, at this point I am no longer sure whether the +=
operator in the first line is doing the same thing as the += operators
in the last two lines. Of course it is not.
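To make the difference concrete, here is a minimal sketch of how Python dispatches the two forms. It is generic Python behaviour, not a description of pygr internals: Container and Slot are made-up stand-ins, and the strings are placeholders for sequence objects.

```python
calls = []  # records which special methods run, in order

class Slot(object):
    """Hypothetical stand-in for whatever container[key] returns."""
    def __iadd__(self, other):
        calls.append("Slot.__iadd__(%r)" % (other,))
        return self

class Container(object):
    """Hypothetical stand-in for the container object."""
    def __init__(self):
        self.slots = {}
    def __iadd__(self, other):
        calls.append("Container.__iadd__(%r)" % (other,))
        return self
    def __getitem__(self, key):
        calls.append("Container.__getitem__(%r)" % (key,))
        return self.slots.setdefault(key, Slot())
    def __setitem__(self, key, value):
        calls.append("Container.__setitem__(%r)" % (key,))
        self.slots[key] = value

al = Container()
al += "mouse"          # one call, on the container itself
al["ival"] += "rat"    # three calls: getitem, the slot's iadd, setitem

for line in calls:
    print(line)
```

The first += runs the container's own __iadd__, while the indexed += runs the __iadd__ of a different class, sandwiched between a __getitem__ and a __setitem__ on the container. So the same symbol can trigger unrelated code depending on what sits to its left.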
The main problem lies in not knowing exactly what objects are being
manipulated. I can't look up +=, so my avenues to understanding what
is happening are very limited. At least for me, the only way to
understand this is to do:
print type(ival)
print type(simple_al[ival])
Then start looking up what these objects do. Basically I think there
are a lot of tacit assumptions, and even this tutorial expects a lot of
background information to be useful.
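One way to go a step beyond type() is to ask which class actually supplies the operator. The helper below is only a sketch: defining_class is a name I made up, it assumes new-style classes, and it is demonstrated on built-in types rather than on pygr objects.

```python
def defining_class(obj, method_name):
    """Return the first class in obj's MRO that defines method_name, or None."""
    for cls in type(obj).__mro__:
        if method_name in vars(cls):
            return cls
    return None

print(defining_class([1, 2, 3], "__iadd__"))   # list defines in-place add
print(defining_class(3, "__iadd__"))           # int does not, so None
```

Once the class is known, help() on that class or on its __iadd__ shows whatever docstring the author wrote, if any. It still takes detective work, which is rather the point.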
Now obviously changing how pygr works internally would be too much to
ask, but I wonder, would it be possible to create a "lite" submodule
that would expose some of pygr's functionality in a much more
accessible way?
What would make a lot more sense is to make the steps more explicit,
so that you don't have to guess from the operator what happens. Instead
you could just read the code and understand approximately what is
happening behind the scenes, like this:
>>> sequences = seqdb.SequenceFileDB('data/sp_all_hbb')
>>> mouse_seq = sequences['HBB1_MOUSE']
>>>
>>> alignments = pygr.AlignmentDB('hbb', mode='memory', pairwiseMode=True)
>>>
>>> mouse_ref = alignments.get_reference(mouse_seq)
>>> mouse_ref[40:60].align(rat[42:62])
>>>
>>> alignments.build()
>>>