tutorials

C. Titus Brown

Aug 7, 2009, 9:44:40 AM

to pygr...@googlegroups.com

Hi all,

How are the tutorials coming along? Does anyone have anything written yet?

*prod*

Any comments on the alignment tutorial I sent out?

*prod*

--titus
--
C. Titus Brown, c...@msu.edu

Istvan Albert

Aug 7, 2009, 12:33:54 PM

to pygr-dev

On Aug 7, 9:44 am, "C. Titus Brown" <c...@msu.edu> wrote:

> Any comments on the alignment tutorial I sent out?

All right, I am going to respond to the prodding and comment. Take
everything with a grain of salt; it is just an opinion. (Plus I
wander off in a totally different direction toward the end of the post.)

Note that of the group here I am probably the only one who does not
regularly work with multiple sequence alignments. We usually work with
a single organism at a time (yeast, Drosophila, or human) and look at the
relative positions of various annotations. So you could say that I
approach the alignment tutorial from an outsider's perspective. My
overall feeling is that it is a bit complicated and counterintuitive.

In general I have found that every time I need to use these modules/classes
I have to look up the usage and spend some time thinking about how it
works. I think that is a warning sign.

For example, using the += sign to add anything feels *very* unusual to
me. Adding two numbers with the + sign has nothing in common with
adding an element to a set. The two operations just happen to share a
name. On top of that, += is a destructive operation and its
semantics can be subtle (leading to potentially mind-bending bugs). For
example, try:

a = b = [1, 2, 3]

Then contrast a = a + [4] with a += [4] and see what happens to b.
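
Spelled out as a runnable sketch (the names rebound and in_place are added here only to hold the results for comparison):

```python
# Rebinding: a + [4] builds a new list, so b still points at the old one.
a = b = [1, 2, 3]
a = a + [4]
rebound = (a, b)
print(rebound)        # ([1, 2, 3, 4], [1, 2, 3])

# In place: a += [4] mutates the list that a and b share, so b changes too.
a = b = [1, 2, 3]
a += [4]
in_place = (a, b)
print(in_place)       # ([1, 2, 3, 4], [1, 2, 3, 4])
print(a is b)         # True
```

The second case is the subtle one: nothing in the statement mentions b, yet b now has four elements.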

For lists it is far preferable to use a.append(4) because it
makes it clear what happens. Now, moving on, when I read:

simple_al += mouse
ival = mouse[40:60]
simple_al[ival] += rat[42:62]
simple_al[ival] += frog[38:58]

I get stumped again. For one, it introduces several layers of
indirection: is it indexing with an object while at the same time mutating
it? Plus, at this point I am no longer sure whether the += operator in
the first line is doing the same thing as the += operators in the last
two lines. Of course it is not.

The main problem lies in not knowing exactly what objects are
being manipulated. I can't look up +=, so my avenues to
understanding what is happening are very limited. At least for me, the
only way to understand this is to do:

print(type(ival))
print(type(simple_al[ival]))

Then start looking up what these objects do. Basically, I think there
are a lot of tacit assumptions, and even this tutorial expects a lot of
background information to be useful.
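
One possible way to "look up" += for an unfamiliar object is to ask Python which class, if any, defines the in-place add hook __iadd__. The helper name find_iadd_owner below is hypothetical, and the sketch uses built-in types rather than pygr objects, whose classes are not shown in this thread.

```python
def find_iadd_owner(obj):
    """Return the name of the class that defines += for obj, or None."""
    # Walk the method resolution order so inherited definitions are found.
    for cls in type(obj).__mro__:
        if "__iadd__" in vars(cls):
            return cls.__name__
    return None


class MyList(list):
    """A subclass that inherits += from list without redefining it."""


print(find_iadd_owner([1, 2, 3]))   # list
print(find_iadd_owner(MyList()))    # list
print(find_iadd_owner(5))           # None
```

When the result is None, Python falls back to x = x + y using __add__, so the name is rebound instead of the object being mutated. Once the owning class is known, help() on that class is the place to read about what += does there.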

Now, obviously changing how pygr works internally would be too much to
ask, but I wonder: would it be possible to create a "lite" submodule
that would expose some of pygr's functionality in a much more
accessible way?

What would make a lot more sense is to make the steps more explicit,
so that you don't have to guess from the operator what happens. Instead
you could just read the code and understand approximately what is
happening behind the scenes, like this:

>>> sequences = seqdb.SequenceFileDB('data/sp_all_hbb')
>>> mouse_seq = sequences['HBB1_MOUSE']
>>>
>>> alignments = pygr.AlignmentDB('hbb', mode='memory', pairwiseMode=True)
>>>
>>> mouse_ref = alignments.get_reference(mouse_seq)
>>> mouse_ref[40:60].align(rat[42:62])
>>>
>>> alignments.build()
>>>

Christopher Lee

Aug 7, 2009, 12:50:57 PM

to pygr...@googlegroups.com

Hi Istvan,
Thank you! This is exactly the kind of feedback we need. It would be
easy to add new method names as you suggested in your example.

A general comment: most of what seems "weird" about Pygr to an
outsider is that it follows a consistent interface for all its
different types of data; namely, in Pygr everything is a graph
database, and all graph databases have the same basic interface, i.e.

g[node1][node2] = edge

There are little shortcuts, e.g.

g += node1         # add a node to the graph
g[node1] += node2  # equivalent to g[node1][node2] = None, i.e. no edge info...

Such patterns are harder to learn initially but later make it easier
to start using other parts of Pygr by just re-using the same patterns.

We can add clear method names that are specific to a particular
subdomain (e.g. Alignment) very easily, to help people get into Pygr
quickly.

-- Chris

Istvan Albert

Aug 7, 2009, 1:14:19 PM

to pygr-dev

On Aug 7, 12:50 pm, Christopher Lee <l...@chem.ucla.edu> wrote:
>
> We can add clear method names that are specific to a particular
> subdomain (e.g. Alignment) very easily, to help people get into Pygr
> quickly.

Sounds good. It would be neat to mock up a few potential versions and
run them by people who do not know pygr at all and ask them what they
thought the code did. Often that leads to surprising outcomes.

I have a few students who could give me some opinions; you probably
do too.

Istvan

Istvan Albert

Aug 7, 2009, 1:20:18 PM

to pygr-dev

> Any comments on the alignment tutorial I sent out?

And of course, my apologies for forgetting to mention that I actually
liked the tutorial - but I got too worked up in what I did not
like ... ooops

Overall lots of good stuff in it!!! Need to try out some things I did
not know were possible!

Istvan

C. Titus Brown

Aug 7, 2009, 1:46:41 PM

to pygr...@googlegroups.com

On Fri, Aug 07, 2009 at 10:20:18AM -0700, Istvan Albert wrote:
-> > Any comments on the alignment tutorial I sent out?
->
-> And of course, my apologies for forgetting to mention that I actually
-> liked the tutorial - but I got too worked up in what I did not
-> like ... ooops

That's fine -- I take the critique as a sign that you now understand the
NLMSA stuff well enough to figure out where it doesn't match your
internal models, which means the tutorial fulfilled its mission ;)

cheers,
