bioinformatics toolkit in clojure: what would that look like?

jandot

Jun 27, 2010, 6:15:47 PM
to Clojure
Hi all,

I have been a ruby user for several years and have contributed to the
bioruby toolkit for bioinformatics. Lately however I got interested in
clojure as it's a functional language and should be very good for
working with the huge datasets we have to handle.

Although there are bioinformatics toolkits for many OO languages
(biojava, bioperl, biopython and bioruby), nothing similar exists for
clojure yet. And I'd be interested to start building such a toolkit
while I learn the language. At first for my own use, but maybe
later... who knows.

Being new to functional languages, I wonder how such a toolkit would
be best approached. In an OO language you create classes with
properties and methods that describe one particular entity in the
field. For example: you define a DNASequence class with a "name" and
"sequence" property, and a method to print it out in an international
standard text format, and another method for translating the DNA
sequence into that of the resulting protein. Much of the functionality
of these toolkits is about retrieving a bit of information,
manipulating it and ultimately writing it to screen/file.

As functional languages are more about verbs than nouns: how could a
bioinformatics toolkit be idiomatically set up? Would it still be the
Right Way (TM) to create some type of classes, a-la OO?

For more information on the OO toolkits, see www.bioperl.org, www.biojava.org,
bioruby.org and biopython.org.

As clojure (especially combined with incanter) seems to be a very good
candidate for future work in bioinformatics, I would very much welcome
a little discussion on this.

Many thanks,
jan.

Nicolas Oury

Jun 28, 2010, 2:22:57 PM
to clo...@googlegroups.com
Hi, 

I am using Clojure for bioinformatics, but not the same kind of stuff.
I am writing a stochastic simulator.

Would love to discuss more about your idea though.

You can have something quite close to what you describe as the OO approach with
protocols and datatypes.
Plus, by using defrecord, you can easily have generic functions.

That would be the easier approach, I think.
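
A minimal sketch of what the record-and-protocol route could look like. The names FastaWritable, to-fasta, alphabet, DNASequence and ProteinSequence are hypothetical and not taken from any existing library, and the FASTA output is simplified (no line wrapping):

```clojure
;; Hypothetical protocol: one generic function set shared by several record types.
(defprotocol FastaWritable
  (to-fasta [this] "Return the record as a FASTA-formatted string.")
  (alphabet [this] "Return a keyword naming the kind of sequence."))

;; Shared helper; records support keyword lookup just like maps.
(defn- fasta-string [rec]
  (str ">" (:name rec) "\n" (:residues rec)))

(defrecord DNASequence [name residues]
  FastaWritable
  (to-fasta [this] (fasta-string this))
  (alphabet [_] :dna))

(defrecord ProteinSequence [name residues]
  FastaWritable
  (to-fasta [this] (fasta-string this))
  (alphabet [_] :protein))

(def dna (DNASequence. "seq1" "ATGGCC"))

(to-fasta dna)   ; => ">seq1\nATGGCC"
(alphabet dna)   ; => :dna
(:name dna)      ; => "seq1"
```

The records stay immutable values, so this is closer to "typed maps with generic functions" than to classes with mutable state.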

I will have a look at bio*.

But if you want to start a project, count me in.
 

--
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clo...@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to
clojure+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Rob Lachlan

Jun 28, 2010, 5:04:48 PM
to Clojure
The clojure way seems to be to do as much as possible with functions
on raw (immutable) data, rather than building up object systems. The
sequence is already one of clojure's primary abstractions, and it may
not always *need* to be wrapped in something like defrecord. (Though for
some applications, I'm sure it will.)
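
As a rough illustration of functions on raw data, here is a sketch that treats a DNA sequence as nothing more than a collection of characters. The names complement-base, reverse-complement and gc-content are made up for this example, and only the four unambiguous bases are handled:

```clojure
;; Lookup table for the four unambiguous DNA bases (an assumption of this sketch).
(def complement-base {\A \T, \T \A, \C \G, \G \C})

(defn reverse-complement
  "Reverse complement of any seqable collection of base characters."
  [bases]
  (map complement-base (reverse bases)))

(defn gc-content
  "Fraction of bases that are G or C. Throws on empty input (division by zero)."
  [bases]
  (/ (count (filter #{\G \C} bases))
     (count bases)))

(reverse-complement [\A \T \G \C \C])   ; => (\G \G \C \A \T)
(gc-content "ATGCC")                    ; => 3/5
```

Nothing here is wrapped in a type; the same functions accept a vector, a string or a lazy seq.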

Anyway, I'd be interested in contributing, whatever design decisions
are ultimately made.

Rob

Nicolas Oury

Jun 28, 2010, 5:27:42 PM
to clo...@googlegroups.com


On Mon, Jun 28, 2010 at 10:04 PM, Rob Lachlan <robert...@gmail.com> wrote:
> The clojure way seems to be to do as much as possible with functions
> on raw (immutable) data, rather than building up object systems. The
> sequence is already one of clojure's primary abstractions, and it may
> not always *need* to be wrapped in something like defrecord. (Though for
> some applications, I'm sure it will.)
 
I agree. One totally different approach to OO would be to use Seq as much as possible and add
information about them in metadata.

There is also room for purely functional OO-style with records and a few protocols when needed...

Moritz Ulrich

Jun 28, 2010, 5:30:22 PM
to clo...@googlegroups.com
On Mon, Jun 28, 2010 at 11:27 PM, Nicolas Oury <nicola...@gmail.com> wrote:
> I agree. One totally different approach to OO would be to use Seq as much as
> possible and add
> information about them in metadata.

Please don't store important information (like equality-relevant
information) in metadata, that's not what metadata is for. (Ignore
this if I misunderstand you)
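
A small sketch of why this matters: Clojure's = ignores metadata, and seq functions generally return results without it. The :name key and the sample values are only illustrative:

```clojure
;; Two vectors with identical contents but different metadata.
(def a (with-meta [\A \C \G \T] {:name "seq1"}))
(def b (with-meta [\A \C \G \T] {:name "seq2"}))

(= a b)                   ; => true, metadata plays no part in equality
(:name (meta a))          ; => "seq1"
(meta (map identity a))   ; => nil, the lazy seq returned by map carries no metadata

;; If the name is part of the value, the two sequences are distinct.
(= {:name "seq1" :residues [\A \C \G \T]}
   {:name "seq2" :residues [\A \C \G \T]})   ; => false
```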


--
Moritz Ulrich
Programmer, Student, Almost normal Guy

http://www.google.com/profiles/ulrich.moritz

Edmund Jackson

Jun 29, 2010, 4:25:13 AM
to clo...@googlegroups.com
Hi Jan,

Perhaps R's excellent Bioconductor project could be mapped nicely onto Incanter (Clojure's R)?

Edmund


Jeff Rose

Jun 29, 2010, 5:50:50 AM
to Clojure
Hi Jan,
After coming from Ruby and previous OO languages I think many of us
have the same questions. For starters, I'd recommend reading a couple
other libraries to get a sense for how people organize libraries.
That will probably give you the most concrete sense for how to really
get started. Beyond that, I think it is best to just start out
lightweight and see how far you can go. Represent anything you currently
think of as an object, a.k.a. a bag of properties, as a regular
clojure map. Don't use records or protocols or structs or metadata or
anything fancy, just regular old maps. For modeling sequential data,
like DNA base pairs, use vectors. Then create a series of functions
to read these things in, write them out, and perform some different
transformations. Don't worry so much about where or how in memory you
are going to "store" stuff. Just write a library of functions that
can read, write and manipulate your objects of interest. That's
pretty much a functional library, and you'll surprise yourself how
much can be done in this way.
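
To make that concrete, here is a minimal sketch using only maps, vectors and functions. The function names (parse-fasta, format-fasta, transcribe) and the :name / :residues keys are assumptions for illustration, and the parser handles a single, well-formed FASTA record only:

```clojure
(require '[clojure.string :as str])

(defn parse-fasta
  "Parse one FASTA record (a '>' header line plus sequence lines) into a plain map."
  [text]
  (let [[header & lines] (str/split-lines text)]
    {:name     (subs header 1)             ; drop the leading '>'
     :residues (vec (apply str lines))}))  ; vector of characters

(defn format-fasta
  "Write a record map back out as a FASTA string (no line wrapping)."
  [{:keys [name residues]}]
  (str ">" name "\n" (apply str residues)))

(defn transcribe
  "DNA -> RNA: replace every T with U, leaving the rest of the map untouched."
  [record]
  (update-in record [:residues] #(replace {\T \U} %)))

(parse-fasta ">seq1\nATGG\nCCTA")
;; => {:name "seq1", :residues [\A \T \G \G \C \C \T \A]}

(-> (parse-fasta ">seq1\nATGG\nCCTA") transcribe format-fasta)
;; => ">seq1\nAUGGCCUA"
```

Each step takes a value and returns a new one, so reading, transforming and writing compose with ordinary threading.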

I'm also interested in learning more about bio-informatics so I'd be
willing to help out. What is your first target application of the
library? What specific kind of research do you want to support?

-Jeff

npt11tpn

Jun 29, 2010, 6:18:17 AM
to Clojure
Hi,
There has been some interest towards Clojure from the cheminformatics
community as well (e.g. http://blog.rguha.net/?tag=clojure ) in
relation to the Chemistry Development Kit (CDK, http://sourceforge.net/projects/cdk)
and the approach seems to be to use the CDK java classes directly in
clojure or write simple wrapper functions around them and build
further abstractions on top. A similar approach here would be to build
bioclojure on top of biojava.
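
The wrapping pattern itself can be sketched without guessing at the biojava or CDK APIs. The example below uses java.lang.StringBuilder from the JDK purely as a stand-in for a library class; reverse-residues and reverse-record are hypothetical names:

```clojure
(defn reverse-residues
  "Thin wrapper around java.lang.StringBuilder#reverse (stand-in for a library call)."
  [^String residues]
  (.toString (.reverse (StringBuilder. residues))))

;; A further abstraction built on top of the wrapper, working on plain maps.
(defn reverse-record
  [record]
  (update-in record [:residues] reverse-residues))

(reverse-residues "ATGC")                          ; => "CGTA"
(reverse-record {:name "seq1" :residues "ATGC"})   ; => {:name "seq1", :residues "CGTA"}
```

Callers only ever see Clojure functions and immutable data; the Java object is created and discarded inside the wrapper.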
Best
Nik

Nicolas Oury

Jun 29, 2010, 6:47:53 AM
to clo...@googlegroups.com
On Tue, Jun 29, 2010 at 10:50 AM, Jeff Rose <ros...@gmail.com> wrote:
> Don't use records or protocols or structs or metadata or
> anything fancy, just regular old maps. For modeling sequential data,
> like DNA base pairs, use vectors. Then create a series of functions
> to read these things in, write them out, and perform some different
> transformations. Don't worry so much about where or how in memory you
> are going to "store" stuff. Just write a library of functions that
> can read, write and manipulate your objects of interest. That's
> pretty much a functional library, and you'll surprise yourself how
> much can be done in this way.


+1 if you make the functions work not just for vectors but for any seq. That way, if we need special BioSeqs someday...

jandot

Jun 28, 2010, 4:06:51 PM
to Clojure
Bonjour Nicolas,

I've started using clojure for my bioinformatics work, but it is still
*very* early days. Will try to become more proficient in it, but
slowly building up a toolkit for myself might just be the seed for
bioclojure. Have no idea to what extent clojure is used at the moment
in the field.

Will have a look at protocols and defrecord. Will keep you informed if/
when I think I can take it further.

jan.

Rob Lachlan

Jun 29, 2010, 1:41:15 PM
to Clojure
> +1 if you make the functions work not just for vectors but for any seq. That way,
> if we need special BioSeqs someday...

Yes, I concur. I think that the default in-memory data format for
DNA, RNA and protein sequences should be a vector, but that the
functions should take sequences. Mind you, in some cases we might want
to exploit transient vectors for better performance.
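
A sketch of how both points could combine: the function below accepts anything seqable, returns a persistent vector, and uses a transient vector only internally. complement-base and complement-strand are hypothetical names, and whether the transient actually pays off would need measuring:

```clojure
;; Lookup table for the four unambiguous DNA bases (an assumption of this sketch).
(def complement-base {\A \T, \T \A, \C \G, \G \C})

(defn complement-strand
  "Complement each base of any seqable input and return a persistent vector.
  The transient never escapes this function."
  [bases]
  (persistent!
    (reduce (fn [acc base] (conj! acc (complement-base base)))
            (transient [])
            bases)))

(complement-strand "ATGC")                      ; => [\T \A \C \G]
(complement-strand [\A \T])                     ; => [\T \A]
(complement-strand (take 3 (cycle [\G \C])))    ; => [\C \G \C]
```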

Rob

Nicolas Oury

Jun 29, 2010, 1:45:18 PM
to clo...@googlegroups.com
If you want to open-source it, one way of learning and keeping others involved
would be to have a repository where you put what you do.
Others can look and comment at first, and maybe commit too once you think your work is ready for more committers.

Moritz Ulrich

Jun 29, 2010, 1:48:12 PM
to clo...@googlegroups.com
I would recommend GitHub as a hosting service for the code. It's free,
collaboration with others is extremely cool, and most other notable
Clojure projects (and even Clojure itself) are hosted there.
