I'm done with my first pass through seqdb.py. The results are available
on github; as before, I changed only seqdb.py and seqdb_test.py.
---
You can see seqdb.py here,
http://github.com/ctb/pygr/blob/6f73203f6da3d85e24a958911765c07a896ce1de/pygr/seqdb.py
and the new test diff here,
http://github.com/ctb/pygr/commit/f4fc1d59025859d33b79c4b160697c645215059e#diff-3
---
I didn't find any more bugs -- darn it ;).
These changes increased the code coverage by about 20% to 85%, which is
getting pretty respectable!
Review comments follow. Comments solicited; where I suggest backwards
incompatible changes, I don't see any likely breakages within pygr
itself, but I don't know what would happen within existing software
that uses pygr, so please let me know if there's a good reason to
keep things the way they are.
(If you want to just browse the source file, look for '@CTB' comments.)
After dealing with the issues below, and maybe figuring out how to test
the caching, I'll start writing up more documentation.
General comments
----------------
The class names in this file still confuse me. Do we have any rules
on how to build names for classes in inheritance hierarcy? e.g.
FileDBSequence and SequenceFileDB -- which one is the sequence db,
which one is the sequence??
The invert behavior is confusing (__invert__ points to
classutil.standard_invert which then redirects to self._inverse,
creating it from self._inverseClass if necessary). Can we use
descriptors or something "simpler" here?
Have we decided anything about logging, as Istvan established in his
branch? I'd like to get rid of explicit writes to sys.stderr, in
particular.
I'd like to establish some documentation or tests on exactly what
db.seqInfoDict should do. So far I know that it should contain info
objects with a length attribute, for NLMSAs to use; and an id and
sequence, presumably. What else? Ultimately it'd be good to
establish a simple executable test for "proper" seqInfoDict behavior.
Specific issues
---------------
I don't like some aspects of the 'ifile' handling in SequenceFileDB.
In particular, I dislike the part where it re-opens the file to
re-read it and then closes 'ifile'; this may not be expected behavior
('ifile' itself is only used to grab the filename!) and I don't like
closing user-passed-in file handles. Plus, pulling the filename out
of a file handle disallows StringIO and other "file-like" handles such
as sockets... Is the ifile mechanism actually used anywhere?
The 'set_seqtype' function on SequenceDB depends on things being a FASTA file
instead of just iterating. Why?? Can we remove that?
Also, 'set_seqtype' returns the sequence type as well as setting it on
the db object. This behavior isn't used anywhere in the code base. Can
we trash it?
In fact, it seems like 'set_seqtype' is used only within SequenceDB
and its children. Why not rename it to be private, i.e. prepend it
with '_'?
Should SequenceDB._cacheMax be configurable? Right now it's hardcoded.
I don't know what idFilter does in _store_seqlen_dict, and it seems
like it would be easier to just implement that in your own
seqfmt-based reader, anyway. Can we trash it? It's not used anywhere
in the code base.
The 'Sequence' classes in seqdb (e.g. FileDBSequence) should never be
created directly by a user, right? Should we make them private by
prepending a '_'?
SeqPrefixUnionDict is oddly named.
PrefixUnionDict could contain __iadd__ behavior (instead of
SeqPrefixUnionDict, which inherits from it solely to add __iadd__
behavior). Why not do things this way?
There's an 'if' statement in SeqPrefixUnionDict.__iadd__ that can
never be reached, as far as I can tell. Could someone (Chris) please
verify that for me?
In two places (SeqPrefixUnionDict.__iadd__, find @CTB untested; and
_PrefixUnionDictInverse.__getitem__, find @CTB abstraction) explicit
knowledge of annotation objects is assumed in classes that otherwise
don't know anything special about them. Any way to deal with this
better or should I just suck up and deal & write some tests?
A few PrefixUnionDict issues:
- PrefixUnionDict.__contains__ can take either a sequence ID or a
sequence object. This is inconsistent with __getitem__ (which only
takes seq IDs). Ugly.
- PrefixUnionDict.__init__ takes an argument 'trypath' that is really a
list of paths to try. Could we rename this to 'trypaths'? Or should
I just suck up and deal & write a nice doc?
- PrefixUnionDict.writeHeaderFile function (etc.) isn't used anywhere
in the code. How useful is it?
- PrefixUnionDict.newMemberDict(...) and the associated
_PrefixUnionDictMember class don't have a clear purpose to me.
Could someone give me a concrete use case? If not, maybe they should
be removed or something; they're confusing!
- _PrefixUnionDictMember.default handling could probably be written
in a more standard way using dict.setdefault, no? How much code depends
on this specific behavior?
The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and
unused in the rest of the code base. I'd appreciate some guidance on
what they do, and perhaps some tests to get started. (@CTB issue)
The SQLSequence stuff at the bottom of seqdb.py could probably go in a
different file, like sqlgraph.py. They don't depend on anything in
seqdb.py.
cheers,
--titus
--
C. Titus Brown, c...@msu.edu
On Mar 14, 2009, at 6:14 PM, C. Titus Brown wrote:
> I didn't find any more bugs -- darn it ;).
Just because you're not paranoid doesn't mean they aren't out to ...
whatever.
>
>
> These changes increased the code coverage by about 20% to 85%, which
> is
> getting pretty respectable!
I'm sure Oscar Wilde said something witty and cutting about
respectability, but it has slipped my mind.
>
> The class names in this file still confuse me. Do we have any rules
> on how to build names for classes in inheritance hierarcy? e.g.
I believe Steve McConnell's books (e.g. Code Complete) cover these
naming convention issues. We could look to see what he says.
> FileDBSequence and SequenceFileDB -- which one is the sequence db,
> which one is the sequence??
I guess I'm just following English, e.g. "sequence database" is a
database; vs. "database sequence" is a sequence. If someone has a
better convention, please describe it.
>
>
> The invert behavior is confusing (__invert__ points to
> classutil.standard_invert which then redirects to self._inverse,
> creating it from self._inverseClass if necessary). Can we use
> descriptors or something "simpler" here?
Please elaborate further; I don't understand what you're suggesting.
The Python standard defines __invert__ as the invert operator method
name; I follow that standard. How can a descriptor help us here?
>
>
> Have we decided anything about logging, as Istvan established in his
> branch? I'd like to get rid of explicit writes to sys.stderr, in
> particular.
Seems like a worthy goal. Has anyone entered an issue for this in the
tracker? I'm not aware that anyone has explicitly proposed this for
0.8.
>
>
> I'd like to establish some documentation or tests on exactly what
> db.seqInfoDict should do. So far I know that it should contain info
> objects with a length attribute, for NLMSAs to use; and an id and
> sequence, presumably. What else? Ultimately it'd be good to
> establish a simple executable test for "proper" seqInfoDict behavior.
Hmm... I certainly don't envision *requiring* a "sequence" attribute.
I made seqInfoDict a standard mechanism for getting information about
a specific sequence ID without having to do the full set of operations
for instantiating a sequence object (which, depending on the backend
implementation, might involve actually retrieving the sequence, and
thus could be slow). The only seqInfoDict value attribute currently
required in Pygr is the length. However you and Alex Nolley pointed
out the value of being able to associate arbitrary attributes with
sequence objects (initially, the FASTA description line), so I
formulated seqInfoDict as a general-purpose mechanism for such
attributes.
I suggest that the list of attributes required on all seqInfoDict
values be minimal -- only what Pygr core code simply can't live
without. (e.g. in the case of NLMSA, it must know the length in order
to pack the coordinate system). Of course, users' code can look for
other seqInfoDict attributes if they need to. We should just define
consistent names, so that different backends will use the same
attribute name for a given kind of data (e.g. free-form text
description like the FASTA description line could be called
"description"). But unless there is a compelling reason why Pygr's
core code needs the "description", I don't think we should add a test
that asserts the absence of that attribute is a bug.
So the executable test you're proposing only needs to look for the
"length" attribute.
>
>
> Specific issues
> ---------------
>
> I don't like some aspects of the 'ifile' handling in SequenceFileDB.
> In particular, I dislike the part where it re-opens the file to
> re-read it and then closes 'ifile'; this may not be expected behavior
> ('ifile' itself is only used to grab the filename!) and I don't like
> closing user-passed-in file handles. Plus, pulling the filename out
> of a file handle disallows StringIO and other "file-like" handles such
> as sockets... Is the ifile mechanism actually used anywhere?
This was used by some coordinator module functionality that is made
obsolete by the much more powerful capabilities of pygr.Data. We can
certainly get rid of this ifile mechanism.
>
>
> The 'set_seqtype' function on SequenceDB depends on things being a
> FASTA file
> instead of just iterating. Why?? Can we remove that?
The reason this exists is historical. At one point Namshin reported
that opening a sequence database had become very slow, and it was due
to the set_seqtype() method invoking the iterator instead of our
original read_fasta_one_line(), which I had removed during our seqdb
refactoring. The seqdb iterator invoked the Python shelve __iter__(),
which turned out to load the entire list of keys into memory before
returning the iterator; very slow for solexa data! To get a fix to
Namshin quickly, I first restored the use of read_fasta_one_line,
which solved the problem. I later was able to solve the shelve
problem; because we had previously found btree to be more scalable
than the default hash-based shelve, we had created our own subclass of
shelve. So I just added a proper __iter__() method to our subclass.
Conclusion: we could again get rid of read_fasta_one_line().
>
>
> Also, 'set_seqtype' returns the sequence type as well as setting it on
> the db object. This behavior isn't used anywhere in the code base.
> Can
> we trash it?
Sounds fine to me.
>
>
> In fact, it seems like 'set_seqtype' is used only within SequenceDB
> and its children. Why not rename it to be private, i.e. prepend it
> with '_'?
Fine by me. Actually, this seems like a topic ripe for code review.
Python is not that zealous about the whole public vs. private
distinction, and I've accordingly been lax about this. We could go
through the code and designate all purely internal methods, then
prefix them with _ .
>
>
> Should SequenceDB._cacheMax be configurable? Right now it's
> hardcoded.
Sure, why not.
>
>
> I don't know what idFilter does in _store_seqlen_dict, and it seems
> like it would be easier to just implement that in your own
> seqfmt-based reader, anyway. Can we trash it? It's not used anywhere
> in the code base.
idFilter addressed the nasty reality that one major database source
(NCBI) abuses their ID field (treating it as a blob into which they
can cram as many junk fields as they want... and then blastall returns
an ID that does not match the original in the file). idFilter allows
you to do any transformations on the input IDs that you want.
However, as a result of the seqdb refactoring, this may no longer be
needed in the sequence file db class itself. All the code for the
NCBI blast ID munging got moved to blast.py and is handled there. We
could get rid of idFilter in _store_seqlen_dict.
>
>
> The 'Sequence' classes in seqdb (e.g. FileDBSequence) should never be
> created directly by a user, right? Should we make them private by
> prepending a '_'?
Throughout pygr we allow users to pass an itemClass to a database
constructor, so the item classes are public. The _init_subclass
pattern tended to modularize all the backend details in the item
class, making the database class very generic (works with any
itemClass). Based on your advice, I've reversed that policy, so that
the backend details are on the database class. But still, I can
easily imagine scenarios in which a user would want to subclass
FileDBSequence, so I don't see these as private.
>
> The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and
> unused in the rest of the code base. I'd appreciate some guidance on
> what they do, and perhaps some tests to get started. (@CTB issue)
I don't think anyone has used these classes in at least a couple of
years. We could delete them.
>
>
> The SQLSequence stuff at the bottom of seqdb.py could probably go in a
> different file, like sqlgraph.py. They don't depend on anything in
> seqdb.py.
That seems like a reasonable idea.
I'll answer the PrefixUnionDict questions in a separate email.
Yours with thanks,
Chris
(...without looking at the code or understanding the issues in
detail...)
I think we might as well try. I will put this on my list of things to
look at this weekend.
thanks,
--titus
Welcome!
-> > The class names in this file still confuse me. Do we have any rules
-> > on how to build names for classes in inheritance hierarcy? e.g.
->
-> I believe Steve McConnell's books (e.g. Code Complete) cover these
-> naming convention issues. We could look to see what he says.
->
-> > FileDBSequence and SequenceFileDB -- which one is the sequence db,
-> > which one is the sequence??
->
-> I guess I'm just following English, e.g. "sequence database" is a
-> database; vs. "database sequence" is a sequence. If someone has a
-> better convention, please describe it.
So this is more of a pragmatic issue for me than a style question: I
just get confused when reading those names in code. At this point I've
mentally over-analyzed the situation so I can't think clearly about it
any more.
I think FileDB_Sequence and Sequence_FileDB would be clearer to me,
emphasizing that the adjectival associations are (FileDB)+Sequence and
Sequence+(FileDB). But those are ugly names so I don't want to do that.
Is this a mental problem unique to me? Or does anyone else get
confused?
-> > The invert behavior is confusing (__invert__ points to
-> > classutil.standard_invert which then redirects to self._inverse,
-> > creating it from self._inverseClass if necessary). Can we use
-> > descriptors or something "simpler" here?
->
-> Please elaborate further; I don't understand what you're suggesting.
-> The Python standard defines __invert__ as the invert operator method
-> name; I follow that standard. How can a descriptor help us here?
It's not __invert__ itself, it's the optimization that constructs
__invert__ only when needed. I don't think
class SomeClass(...):
__invert__ = classutil.standard_invert
...
_inverseClass = FooBar
is an obvious construct, although I do understand why you defined things
that way. How about,
__invert__ = classutil.lazy_create(FooBar)
instead, for example? Then FooBar would be obviously and directly
connected to __invert__.
-> > Have we decided anything about logging, as Istvan established in his
-> > branch? I'd like to get rid of explicit writes to sys.stderr, in
-> > particular.
->
-> Seems like a worthy goal. Has anyone entered an issue for this in the
-> tracker? I'm not aware that anyone has explicitly proposed this for
-> 0.8.
Istvan? How much work would this be, do you think?
Maybe we could at least put the mechanisms in, and then convert the code
base slowly?
-> > I'd like to establish some documentation or tests on exactly what
-> > db.seqInfoDict should do. So far I know that it should contain info
-> > objects with a length attribute, for NLMSAs to use; and an id and
-> > sequence, presumably. What else? Ultimately it'd be good to
-> > establish a simple executable test for "proper" seqInfoDict behavior.
[ ... info on seqInfoDict ... ]
OK, I'll write this up in some nice way in the module itself, and then
you can check it over. I think I understand seqInfoDict now, at any
rate ;)
-> So the executable test you're proposing only needs to look for the
-> "length" attribute.
[ ... ]
OK. I'll just put such a statement into the documentation on building
your own seqdb implementation, for now.
-> > I don't like some aspects of the 'ifile' handling in SequenceFileDB.
-> > In particular, I dislike the part where it re-opens the file to
-> > re-read it and then closes 'ifile'; this may not be expected behavior
-> > ('ifile' itself is only used to grab the filename!) and I don't like
-> > closing user-passed-in file handles. Plus, pulling the filename out
-> > of a file handle disallows StringIO and other "file-like" handles such
-> > as sockets... Is the ifile mechanism actually used anywhere?
->
-> This was used by some coordinator module functionality that is made
-> obsolete by the much more powerful capabilities of pygr.Data. We can
-> certainly get rid of this ifile mechanism.
OK, that will definitely simplify the code -- fantastic!
-> > The 'set_seqtype' function on SequenceDB depends on things being a
-> > FASTA file
-> > instead of just iterating. Why?? Can we remove that?
[ ... __iter__ problem on bsddb ... ]
-> Conclusion: we could again get rid of read_fasta_one_line().
OK, great!
-> > Also, 'set_seqtype' returns the sequence type as well as setting it on
-> > the db object. This behavior isn't used anywhere in the code base.
-> > Can
-> > we trash it?
->
-> Sounds fine to me.
Great!
-> > In fact, it seems like 'set_seqtype' is used only within SequenceDB
-> > and its children. Why not rename it to be private, i.e. prepend it
-> > with '_'?
->
-> Fine by me. Actually, this seems like a topic ripe for code review.
-> Python is not that zealous about the whole public vs. private
-> distinction, and I've accordingly been lax about this. We could go
-> through the code and designate all purely internal methods, then
-> prefix them with _ .
I'll just assume this as part of the module-level code review, then. It
works well with systematically improving code coverage analysis, as you
can quickly discover where you forgot compensatory _ prefixes when you
have near-100% code coverage...
-> > Should SequenceDB._cacheMax be configurable? Right now it's
-> > hardcoded.
->
-> Sure, why not.
ok
-> > I don't know what idFilter does in _store_seqlen_dict, and it seems
-> > like it would be easier to just implement that in your own
-> > seqfmt-based reader, anyway. Can we trash it? It's not used anywhere
-> > in the code base.
->
-> [ ... ] We could get rid of idFilter in _store_seqlen_dict.
Excellent!
-> > The 'Sequence' classes in seqdb (e.g. FileDBSequence) should never be
-> > created directly by a user, right? Should we make them private by
-> > prepending a '_'?
->
-> Throughout pygr we allow users to pass an itemClass to a database
-> constructor, so the item classes are public. The _init_subclass
-> pattern tended to modularize all the backend details in the item
-> class, making the database class very generic (works with any
-> itemClass). Based on your advice, I've reversed that policy, so that
-> the backend details are on the database class. But still, I can
-> easily imagine scenarios in which a user would want to subclass
-> FileDBSequence, so I don't see these as private.
OK, makes sense. I was -0 on doing this and you've explained why ;)
-> > The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and
-> > unused in the rest of the code base. I'd appreciate some guidance on
-> > what they do, and perhaps some tests to get started. (@CTB issue)
->
-> I don't think anyone has used these classes in at least a couple of
-> years. We could delete them.
OK, great!
-> > The SQLSequence stuff at the bottom of seqdb.py could probably go in a
-> > different file, like sqlgraph.py. They don't depend on anything in
-> > seqdb.py.
->
-> That seems like a reasonable idea.
I'll try it out & see what happens...
OK, I see how this is clearer in a single line than setting __invert__
and _inverseClass in two consecutive lines in my original code. Your
lazy_create() would return a callable function that behaves like my
existing standard_invert. Do you want to implement this, or should I?
-- Chris
I suspect that incorporating your library blast processor into the 0.8
release may be complicated by the fact that BlastDB itself is now
deprecated. We refactored all the blast support functionality into a
separate module (pygr.blast); BlastDB is retained only for backward
compatibility. I'll have to look at your code to see exactly what
it's assuming and how easily this could be "ported" to our new
BlastMapping construct. I guess your processor could be made into a
BlastMapping method. Then there is the issue of testing -- do you
have (or could you write) a test suite that would test this
thoroughly, that we could incorporate into our standard tests or
megatests (if the tests would take too long or require resources too
big to be incorporated in the standard release package)?
FYI, I have summarized the blast refactoring in the Pygr Dev
discussion group and also somewhat in the issue tracker.
Unfortunately, I haven't updated the regular documentation yet,
because Istvan changed our documentation format to Restructured Text,
and we (I) have to convert all the existing docs (50,000 words worth)
to REST before updating its content. The switch to REST seems like a
great idea, but it has slowed the updating of the docs to reflect all
the 0.8 changes.
-- Chris
> There is an issue in the tracker for this:
>
> http://code.google.com/p/pygr/issues/detail?id=61
Great!
> I'd say that to get it incorporated the logger module now currently
> under tests/testlib/logger.py would need to be copied/moved to the
> main pygr package, then each sys.stderr message would need be replaced
> with a call to logger.debug(), info(), warn() or error().
I guess that would mean we'd need a test suite for logger's
functionality, since it's now being incorporated as part of Pygr?
>
>
> Probably it's best if only one module is migrated at once, to see how
> it works out in practice.
> The issue above also mentions the verbosity of these messages that
> would be best to be made shorter. It is fairly straightforward to do,
> all it needs some on-the-fly decision making on what severity level
> corresponds to each of the messages. (severity levels rise in the
> order listed above).
There aren't *too* many cases where we currently print to stdout /
stderr. Changing these to logger calls should not break algorithmic
correctness, so once logger itself is tested, I see no reason not to
make this change in one clean sweep.
Release Management Question: is this a functionality change (and
therefore delays the alpha release), or is this "debugging" that can
occur after the alpha?
> The way I typically do it is that I have debug messages be littered
> throughout the code but these are suppressed by default and get turned
> on only when testing with higher verbosity modes (1 or 2). Info and
> above are printed by default.
I agree, with only one caveat. Unlike a compiled language like C
where debug messages can even be placed deep in inner loops, because
compiling without debug flags takes them out during preprocessing, in
an interpreted language like Python there is a performance hit for
every debug statement (presumably an IF statement has to be executed
for each one). So this becomes unattractive in code where performance
is important. Do you know of any good way around this in Python?
-- Chris
I'd like to be jealous and keep my metaphysical "lock" on seqdb, if you
don't mind... I'll make the patch discrete & clean and send it on to you
for review when I do it.
So: I will.
Removed.
-> > The 'set_seqtype' function on SequenceDB depends on things being a
-> > FASTA file
-> > instead of just iterating. Why?? Can we remove that?
[ ... ]
-> Conclusion: we could again get rid of read_fasta_one_line().
Removed.
-> > Also, 'set_seqtype' returns the sequence type as well as setting it on
-> > the db object. This behavior isn't used anywhere in the code base.
-> > Can
-> > we trash it?
->
-> Sounds fine to me.
Removed.
-> > In fact, it seems like 'set_seqtype' is used only within SequenceDB
-> > and its children. Why not rename it to be private, i.e. prepend it
-> > with '_'?
->
-> Fine by me.
done.
-> > I don't know what idFilter does in _store_seqlen_dict, and it seems
-> > like it would be easier to just implement that in your own
-> > seqfmt-based reader, anyway. Can we trash it? It's not used anywhere
-> > in the code base.
->
-> [ ... ] We could get rid of idFilter in _store_seqlen_dict.
Done; I removed the 'ifile' mechanism from there, too, since it's no
longer needed.
-> > The SliceDB, VirtualSeq, and VirtualSeqDB classes are untested and
-> > unused in the rest of the code base. I'd appreciate some guidance on
-> > what they do, and perhaps some tests to get started. (@CTB issue)
->
-> I don't think anyone has used these classes in at least a couple of
-> years. We could delete them.
Removed.
All changes tested & pushed to my github pygr repo, branch seqdb_review.
done; now you can do
__invert__ = classutil.lazy_create_invert(_SequenceDBInverse)
in a class definition to get a one-time-only lazily created __invert__
object per parent class. This object is saved in self._inverseObj.
I could have made the function entirely generic, e.g.
__invert__ = classutil.lazy_create(_SequenceDBInverse)
and stored the created object in an isolated and unreachable scope, but
then the object would have been inaccessible for debugging purposes,
too. If it is (or ends up being) a frequent pattern we might want to do
this anyway, but I thought it was a poor tradeoff for now.
Note, I didn't remove standard_invert from throughout the codebase, only
from within seqdb.py. We can change it elsewhere as we do code reviews.
Moved. Works fine!
-> I'll answer the PrefixUnionDict questions in a separate email.
I didn't see anything on this -- no hurry, just wanted to make sure I
didn't miss something in the e-mail flurry ;)
I agree, at least for the moment. The logger module is pretty trivial
and if we use it everywhere we'll quickly find out when it breaks ;)
-> > Release Management Question: is this a functionality change (and ?
-> > therefore delays the alpha release), or is this "debugging" that can ?
-> > occur after the alpha?
->
-> I think the logger should be added into the pygr package as soon as
-> possible and new
-> code should make use of it. Switching all prints to logging calls is
-> sort of an 'scratch an itch' like task, one evening when the time is
-> right someone can just swap out the prints to logger calls in 10
-> minutes. The important thing is to have the logger module present so
-> that we can actually do it.
So, I put this change into my seqdb_review branch and changed seqdb.py
to make use of the new pygr.logger module; you can see the resulting
code changes here,
http://github.com/ctb/pygr/commit/e96845367ecd7c404c26a623be5adb705b0b4ba9
The only problem I ran into is that I would like to turn off the logging
messages from the main code when running the test suite! Right now I've
done that by using 'logger.debug' in seqdb.py; to see how annoying it is
to do anything else, try changing them to 'logger.info' and running the
tests...
Yes, but I think the info/warn/error messages from the test code should
be printed under different circumstances than the messages from the pygr
code.
-> PS> So, I put this change into my seqdb_review
->
-> hey Titus, you have a seqdb_review and a seqdb-review branch up there,
-> something gut pushed onto a different branch
Yeah, I know. it's seqdb_review that you should look at; I'll delete
the other one.
--t
No, I mean redirecting the pygr.logger stuff while the tests themselves
are running. This would probably mean reintroducing a separate test
logging module.
I'll play around with it next time it irritates me and see if I can come
up with a clean solution.
Exactly.
-> One solution that I briefly entertained is to rewrite (provide my own)
-> TextTestRunner class, but was never sufficiently mad to actually do
-> it.
That way lies madness and/or nose.
;)
--titus
I noticed that in seqdb-review you reverted some of my changes, e.g.
you changed 'seqID' back to 'id'. I prefer 'seqID', both because it
states specifically what the variable is, and because of the concern
stated above. I propose that we be consistent throughout Pygr in
naming such variables 'someID', i.e. with a meaningful prefix that
tells you what kind of identifier it is, and with ID in upper-case.
That's the pattern I am now following generally.
What do you think?
-- Chris
sure -- I thought you'd standardized on 'id' and was moving the other
direction ;)
What do you think? The challenge will be to identify what if any code
relies on the old behavior... grrr. It's quite possible that no
existing code uses that behavior, but checking that will take some work.
-- Chris
-- Chris
Hmm, probably. It's pretty easy to implement... sorry I missed that,
I'm not very conversant with __invert__ behavior ;)
How about deprecating it -- raising a warning in 0.8, perhaps? -- and
then removing it in 0.9?
Or, we could remove it in 0.8 after we get the megatests working....
Many thanks to Titus, who has done an incredible amount of work to
clean up and improve the seqdb module!
We have now completed a series of major refactor / reintegrate cycles
for 0.8 using our various git branches:
- Istvan's reworking of the test framework
- Istvan's replacement of the documentation framework
- Titus code review and cleanup of seqdb
- my pygr.Data / metabase refactoring
- Marek's & my updating of the megatests
Taken together, these position us very well for the upcoming 0.8
release. We can release the 0.8 alpha (function freeze) within a week
I think. Congratulations and thanks to everyone who has helped to
make all this happen!
Next we need to update the docs to reflect everything new / changed in
0.8!
Yours,
Chris
hurray!
--titus