Request for Feedback: PySplicer 0.2

Cathal Garvey

Apr 24, 2013, 8:26:09 AM
to diyb...@diybio.eu, diy...@googlegroups.com

Hi all,
As some of you may know, a while ago I wrote a byzantine python script
to perform frequency-based reverse translation from amino acids to
codons, with optional exclusion of arbitrary strings of RNA/DNA from
the output.
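
For anyone unfamiliar with the idea, frequency-based reverse translation can be sketched roughly as below. This is not PySplicer's code: the function name `reverse_translate` and the tiny table with its weights are invented purely for illustration, and real use would load a full codon usage table for the target organism.

```python
import random

# Hypothetical, illustrative weights only: NOT a real codon usage table.
CODON_WEIGHTS = {
    "M": {"AUG": 1.0},
    "K": {"AAA": 0.7, "AAG": 0.3},
    "F": {"UUU": 0.5, "UUC": 0.5},
    "*": {"UAA": 0.6, "UAG": 0.1, "UGA": 0.3},
}


def reverse_translate(protein, table=CODON_WEIGHTS, rng=None):
    """Pick one codon per amino acid, weighted by its relative frequency."""
    rng = rng or random.Random()
    codons = []
    for aa in protein:
        options = table[aa]
        # random.Random.choices draws according to the supplied weights.
        codon = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        codons.append(codon)
    return "".join(codons)


if __name__ == "__main__":
    print(reverse_translate("MKF*"))
```

Passing in an explicitly seeded `random.Random` makes a run repeatable, which helps when comparing outputs.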

It worked, just about, but it was truly awful to read and was
fundamentally badly designed, architecture-wise. In fact, it took me a
while to notice that the exclusion feature had a critical flaw:
sequences containing "T" could be overlooked entirely.
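
One way to guard against that class of bug, sketched under assumptions rather than taken from PySplicer: convert both the sequence and the excluded pattern to a single alphabet (here RNA, so T becomes U) before matching, and expand the IUPAC ambiguity codes into a regular expression. The names `normalise`, `iupac_to_regex` and `contains_excluded` are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written in the RNA alphabet.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def normalise(seq):
    """Upper-case the sequence and treat DNA 'T' as RNA 'U'."""
    return seq.upper().replace("T", "U")


def iupac_to_regex(pattern):
    """Turn an IUPAC pattern into a compiled regex over A, C, G and U."""
    return re.compile("".join("[" + IUPAC[base] + "]" for base in normalise(pattern)))


def contains_excluded(seq, pattern):
    """True if the (DNA or RNA) sequence contains the IUPAC pattern."""
    return iupac_to_regex(pattern).search(normalise(seq)) is not None
```

Because both inputs pass through `normalise`, a pattern written with "T" matches a sequence written with "U" and vice versa.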

Well, I've finally rewritten PySplicer from the ground up. Much of the
"boilerplate" stuff for handling DNA/RNA sequences has been shovelled
into a library called "sequtils", and the rest is the original
pysplicer concept, but managed by clearly separated and more cleanly
written objects.

Also, I've added a new, fairly significant feature: Hairpin detection.
Previously, PySplicer could only do IUPAC-notation exclusion of
pre-determined DNA/RNA patterns, but it will now also detect fairly
simple hairpins in the nucleotide sequences it is assembling and
modifying, and will attempt (by default at least) to remove them.
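
To give a flavour of what detecting a "fairly simple" hairpin can mean, here is a minimal sketch. How PySplicer actually detects hairpins is not described in this message, so the approach below (an exact inverted repeat separated by a short loop) and the names `find_hairpins`, `stem`, `min_loop` and `max_loop` are assumptions for illustration only.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (stem_start, loop_length) pairs for perfect-match stems.

    A hit means seq[i:i+stem] is followed, after a loop of
    min_loop..max_loop bases, by its exact reverse complement.
    G-U wobble pairs, mismatches, bulges and folding energies
    are all ignored.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # the second stem arm would run off the end
            if seq[j:j + stem] == target:
                hits.append((i, loop))
    return hits
```

For example, `find_hairpins("GGGCAAAAGCCC")` reports one stem starting at position 0 with a four-base loop. A real tool would more likely score candidate structures by folding energy than by exact matching.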

I'm emailing this all here to ask for feedback; PySplicer was written
for my own use, and I'm planning to put it to use ASAP. But if anyone
with the Python/Bioinfo chops is willing to give it a try and point
out any glaring errors, they could save me a few hundred euro. :)

Of course, like all good things in life, PySplicer is Free/Libre Open
Source Software (FLOSS), so provided you're willing to take the entire
risk of it generating terrible output on your own shoulders, go forth
and do as thou wilt!

Gitorious Repo (Main):
https://gitorious.org/pysplicer

Github Repo (Secondary):
https://github.com/cathalgarvey/PySplicer

Best,
Cathal
--
Please note my new email: cathal...@cathalgarvey.me
PGP Key: 988B9099
Bitmessage: BM-opSmZfNZHSzGDwdD5KzTnuKbzevSEDNXL
Twitter: @onetruecathal
Code: https://gitorious.org/~cathalgarvey
Blog: http://www.indiebiotech.com

Bryan Bishop

Apr 24, 2013, 1:31:43 PM
to diy...@googlegroups.com, Bryan Bishop, Cathal Garvey
On Wed, Apr 24, 2013 at 7:26 AM, Cathal Garvey wrote:
> I'm emailing this all here to ask for feedback; PySplicer was written
> for my own use, and I'm planning to put it to use ASAP. But if anyone

Complaints:

* No tests. That should be fixed, especially if you plan on your work
being verifiable multiple months into the future. I believe biopython
has a good test suite that you can go look at (and if not, just use
python-requests' tests as a gold standard).

* AGPL. No thanks.. too many problems with that license.

* You have a hash-bang in setup.py, but nobody ever marks their
setup.py as executable so I'm not sure why you would do that..

* easy_install is usually spelled easy_install, not easy-install

* There's a bunch of lowercase class names.. should you want to be
really truly pedantic, then use pylint or pyflakes. Ultimate pedantry
achieved.
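
As a starting point for the missing tests, a minimal `unittest` sketch follows. The function under test, `reverse_complement`, is a made-up stand-in; PySplicer's real functions and module layout will differ.

```python
import unittest


def reverse_complement(dna):
    """Hypothetical stand-in for a function under test."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


class ReverseComplementTest(unittest.TestCase):
    def test_known_sequence(self):
        self.assertEqual(reverse_complement("ATGC"), "GCAT")

    def test_round_trip(self):
        # Applying the operation twice must give back the input.
        seq = "GATTACA"
        self.assertEqual(reverse_complement(reverse_complement(seq)), seq)

    def test_palindromic_site(self):
        # The EcoRI recognition site is its own reverse complement.
        self.assertEqual(reverse_complement("GAATTC"), "GAATTC")
```

Saved as, say, test_sequences.py (a hypothetical filename), this runs with `python -m unittest test_sequences`.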

- Bryan
http://heybryan.org/
1 512 203 0507

Cathal Garvey (Phone)

Apr 24, 2013, 1:48:43 PM
to diy...@googlegroups.com
Thanks Bryan!

There are some assorted tests in the individual modules, but they were written for testing by relative import so they may not function as a package. Consider "decent test suite" a roadmap objective.

Can you point me to a good discussion of the AGPL's shortcomings? Nobody say "it's viral!"; I know, and that's exactly the point.

Hashbang is silly but not a bug. Where's the easy install bug? I copied all that boilerplate. Plus, I don't even use easy-install ;)

Will go look at the style suggesters. If they reference Strunk and White (as does PEP8), I'm blaming you!
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Jonathan Cline

Apr 28, 2013, 1:21:52 AM
to diy...@googlegroups.com, diyb...@diybio.eu, cathal...@cathalgarvey.me, jcline


On Wednesday, April 24, 2013 5:26:09 AM UTC-7, Cathal Garvey wrote:

> Gitorious Repo (Main):
> https://gitorious.org/pysplicer
>
> Github Repo (Secondary):
> https://github.com/cathalgarvey/PySplicer

Random nitpick stuff:

You could release executables which might be easier to run for many bio types.

In my Python coding standards, all filenames must match class names. In my source code control standards, all filenames must be lowercase, with words separated by underscores. python.org's recommendations suggest not using a py prefix for anything.

Internal attributes on self should start with an underscore.

Your verbose messages should go to stdout and errors to stderr, if you insist on using the console. Otherwise, write both to a file with a timestamped name, created on startup and written as .csv using the csv package.
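
A minimal sketch of the stdout/stderr split using the standard `logging` module. The logger name `pysplicer_demo` and the helper `make_logger` are made up for this example and are not part of PySplicer.

```python
import logging
import sys


def make_logger(name="pysplicer_demo", verbose=False):
    """Route routine messages to stdout and warnings/errors to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep the root logger from echoing messages

    out = logging.StreamHandler(sys.stdout)
    # Only records below WARNING pass this handler's filter.
    out.addFilter(lambda record: record.levelno < logging.WARNING)

    err = logging.StreamHandler(sys.stderr)
    err.setLevel(logging.WARNING)

    logger.addHandler(out)
    logger.addHandler(err)
    return logger


if __name__ == "__main__":
    log = make_logger(verbose=True)
    log.debug("trying an alternative codon")  # goes to stdout
    log.error("could not remove hairpin")     # goes to stderr
```

With this split, `script.py > result.txt` captures normal output while errors still show in the terminal.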

You should probably time each run in microseconds. This has the beneficial side effect of pointing out areas that need optimization (perhaps in C).
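
A minimal sketch of per-run timing with the standard library; the helper name `timed` is made up for this example.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_microseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_us = (time.perf_counter() - start) * 1e6
    return result, elapsed_us


if __name__ == "__main__":
    result, micros = timed(sorted, [3, 1, 2])
    print("sorted in %.1f microseconds" % micros)
```

To find out which functions dominate a run, rather than just how long the whole run took, the standard `cProfile` module (`python -m cProfile -s cumulative script.py`) is usually the more direct tool.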

Each file should have your author comment-banner.

I never use "from abc import xyz" because I think it confuses the namespace. I only use "import abc" and use full class names where necessary.


Go ahead and include XNA, you know you want to.

You made sure your use of random is in fact random, right? I haven't used random in Python, but I assume you need a seed, and a quick glance doesn't show you seeding.
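
On the seeding question: Python's `random` module seeds its shared generator automatically on import (from operating-system entropy where available, otherwise from the system time), so an explicit seed is only needed when a run must be reproducible. A small sketch, with illustrative values:

```python
import random

# No explicit seed needed: the module-level generator seeds itself on import.
codon = random.choice(["AAA", "AAG"])

# For reproducible runs (e.g. in tests), use a dedicated generator
# with an explicit seed instead of the shared module-level one.
rng_a = random.Random(42)
rng_b = random.Random(42)
picks_a = [rng_a.choice("ACGU") for _ in range(10)]
picks_b = [rng_b.choice("ACGU") for _ in range(10)]
assert picks_a == picks_b  # same seed, same sequence of picks
```

The generator is a pseudo-random one (Mersenne Twister), which is fine for weighted codon choice but not meant for cryptographic use.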

I only use Python 3. I didn't notice a comment anywhere in the source stating the required Python version. Python version changes are a pain; I always list compatibility in every source file.
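
Beyond a comment, the requirement can be enforced at startup. A minimal sketch; the minimum of (3, 0) is an assumed value for illustration, since the thread does not state which versions PySplicer supports, and `check_python` is a made-up helper name.

```python
import sys

MIN_PYTHON = (3, 0)  # assumed minimum, purely for illustration


def check_python(minimum=MIN_PYTHON):
    """Raise a clear error when the interpreter is older than `minimum`."""
    found = tuple(sys.version_info[:2])
    if found < minimum:
        raise RuntimeError(
            "Python %d.%d or newer is required, found %d.%d" % (minimum + found)
        )


check_python()
```

Calling this at the top of the entry script fails early with a readable message instead of a syntax error deep in the code.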



Tangent.
Your constant PGP signing is bad forum practice IMHO, similar to having an 8-line signature, etc. I doubt anyone is impersonating you. If you'd like authentication, then repost all your posts to your own site for verification.


## Jonathan Cline
## jcl...@ieee.org
## Mobile: +1-805-617-0223
########################
