Re: [Genome Informatics] GSoC 2013 - Reactome - Big Picture Visualisation


David Croft

Apr 26, 2013, 6:08:34 AM
to ross...@googlemail.com, genome-in...@googlegroups.com
Hi Ross,
>
> I am hoping to pass the summer as a Google Summer of Code
> developer, and the Reactome Big Picture Visualisation project looks
> very interesting. Would you be able to let me know if you think I may
> be suitable for this?
>
> How big a problem would it be to have only a passing knowledge of
> biology?

This would not be a big problem; you can learn what you need to know
during the project.

> Would it be possible to gain enough of the concepts required for the
> project within a few weeks?

Yes!

> I have spent some time over the last few days trying to understand
> what Reactome represents and what a biological pathway is. I can
> certainly appreciate the usefulness of the right visualisation
> software in this area.
>
> While I have implemented a Voronoi diagram algorithm (at the 'hello
> world' level), I am unsure in what way you intend them used for your
> data sets. For instance, what would be represented by the seed points
> in the diagram? And what distance metric would be used? I found a web
> application, Voronto, that maps gene expression data to biological
> ontologies, which included data from Reactome[1,2] (I have only a
> loose notion of what that means, but I think I understand what it is
> doing). Would that project be of any use as a reference point?

Take a look at this web site:

http://reactomedev.oicr.on.ca/ReactomeGWT/site.html

Click on the button marked "Pathway network". That shows you more or
less the information that we would like to show in the Voronoi map. The
idea would be to make the cells in the map represent our
top-level-pathways, with their size depending on the number of proteins
in the pathway. The lines you see criss-crossing on the web page
represent the number of proteins shared by any given pair of pathways,
which you could use in a distance metric.
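
One way to turn shared-protein counts into a distance is the Jaccard distance between the two pathways' protein sets. The sketch below is only an illustration under that assumption: the class name, method names, and protein identifiers are made up, and nothing here says this is the metric Reactome would settle on.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: turns the protein overlap of two pathways into a distance. */
final class PathwayDistance {

    /** Number of proteins that appear in both pathways. */
    static int sharedCount(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b); // keep only proteins present in both sets
        return shared.size();
    }

    /**
     * Jaccard distance: 0.0 for identical protein sets, 1.0 for disjoint ones.
     * One possible choice of metric, assumed here for illustration.
     */
    static double jaccardDistance(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // two empty pathways are treated as identical
        }
        return 1.0 - (double) sharedCount(a, b) / union.size();
    }

    public static void main(String[] args) {
        // Made-up protein identifiers, for illustration only.
        Set<String> metabolism = Set.of("P1", "P2", "P3", "P4");
        Set<String> disease = Set.of("P3", "P4", "P5");
        System.out.println(sharedCount(metabolism, disease));
        System.out.println(jaccardDistance(metabolism, disease));
    }
}
```

Pathways that share many proteins get a small distance, so a layout driven by this value would tend to place them near each other.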
>
> If you think the lack of domain experience can be overcome, I am
> currently a first year undergraduate on the Informatics degree
> programme at Edinburgh University, returning to education as an older
> student. I have spent the last 20 years or so working in areas
> unrelated to computer science (like medical journals publishing) but
> with a longstanding interest in computers and their uses to society.
>
> I am still a novice Java programmer, but I hope I am advancing
> quickly. The last piece of software I wrote can be found at link [3]
> below, if it is of use to you to see it. It's a very basic agent-based
> simulation intended to replicate a paper from 2000 on behavioural
> traits in an evolutionary context.
>
> I also have had no experience with the Google Web Toolkit, but would
> be very keen to start early and learn as much as possible.
>
Check it out, it is pretty cool!

https://developers.google.com/web-toolkit/overview

It allows you to program web clients in pure Java, in a similar style to
AWT. This Java then gets compiled to JavaScript and is then
incorporated into your web page.

You need to look over the Reactome website that you see in the above
link (this is our new website) and have a think about where you would
fit the word cloud and the Voronoi map into this. You would need to
research the technologies that you will need to understand, such as
GWT and RESTful web services. You would need to investigate existing
packages for creating word clouds and Voronoi maps, and decide whether
to re-implement yourself or whether you can use them. Finally, you
would need to come up with a design proposal. This would form the
technical part of a proposal.

Then you would need to decide how to break down the tasks into a linear
sequence, and make a timeline out of them. Don't forget to include time
needed to research/learn new stuff.

All of this would become the proposal you submit to the GSoC Melange system.

Cheers,

David Croft.
>
> 1. http://vis.usal.es/voronto/Intro.html
> 2. http://bioinformatics.oxfordjournals.org/content/early/2012/07/04/bioinformatics.bts428
> 3. https://bitbucket.org/rosshadam/agentsim

cr...@ebi.ac.uk

Apr 30, 2013, 8:38:19 AM
to genome-in...@googlegroups.com
Hi Ross,
>
> Thanks very much for your reply. Apologies for the delay in replying,
> there were quite a few things to look at first!
>
>
> And, apologies for the length of this post, there are a lot of questions!
> Particularly on what you want from the Voronoi diagram (details further
> down).
>
> I'm glad that you think that it will be possible to gain enough
> familiarity with the domain to be able to do useful work.
>
>
> One of your last points was on the re-use of existing packages. If
> successful, I would prefer to go this route if possible and use libraries
> for the word cloud and voronoi diagram, so that more time can be spent on
> learning the Reactome architecture, GWT, and making sure the interface is
> well done and that it provides a reasonable and useful user experience.
>
Sounds reasonable.
>
> I'm not sure yet where the visualisations would fit into the website.
> Perhaps I could ask a potential user what they would like to see, and
> where? I guess that it should respond to what would be most useful to the
> users. Perhaps that would fit into the project itself, as the first task
> parallel to the 'community bonding period'?
>
If you go to the beta version of our new website here:

http://reactomedev.oicr.on.ca/ReactomeGWT/site.html

...and click on the "Pathway network" button, you will see one overview
that we have already experimented with. So a possibility would be to put
a button onto the front page, leading to, say, the Voronoi map. I think
it might actually be reasonable to put the word cloud on the front page
itself. At least, this would be something to show potential users, and
ask them how useful it could be.
>
> Would there perhaps be a 'visualisation page' where the pathway network
> diagram could co-exist with the new ones? Would it be possible to jump
> from one to the other, at the click of a button, each being different
> renderings of the same point in the data?

That would be another possibility.
>
>
> I have had a quick look at some available packages that could be used,
> details below. One worry would be licensing – are there any licences that
> Reactome may be allergic to?

We try to be as open source as possible, claiming that anybody can
download our code and use it. So commercial code, or code with a
restrictive license, would be bad.

>
> Word cloud
> At the moment, it looks as if the word cloud would be the easier
> visualisation to implement.
> Perhaps this could be adapted, a small set of classes to provide simple
> word clouds:
> OpenCloud: http://opencloud.mcavallo.org/index.jsp

It would be good if you could collect together a few packages and review
them in your project proposal. Then select one, and explain why you chose
it.
>
> I will have to look into how GWT positions things on the page and how
> interactivity is achieved. It looks from the pathway browser that a high
> level of interactivity can be achieved.
> Perhaps there could be more than one word cloud displayed at a time - one
> for the top-level pathways, another for shared proteins when a word in the
> first is clicked on.

It's a thought.
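
A rough sketch of how word sizes in such a cloud might be derived, assuming each word is a top-level pathway weighted by its protein count (the thread does not fix the weighting, and the class and parameter names are hypothetical). It maps a count linearly onto a font-size range.

```java
/** Hypothetical helper: maps a word's weight onto a font size in pixels. */
final class WordCloudScale {

    /**
     * Linear interpolation between minPx and maxPx.
     * If every word has the same count, the midpoint size is used.
     */
    static int fontSize(int count, int minCount, int maxCount, int minPx, int maxPx) {
        if (maxCount == minCount) {
            return (minPx + maxPx) / 2;
        }
        double t = (double) (count - minCount) / (maxCount - minCount);
        return (int) Math.round(minPx + t * (maxPx - minPx));
    }

    public static void main(String[] args) {
        // Made-up counts: smallest pathway has 10 proteins, largest has 110.
        System.out.println(fontSize(60, 10, 110, 12, 48));
    }
}
```

A logarithmic scale may read better if a few pathways dwarf the rest; the linear version is just the simplest starting point.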
>
> Maybe some animation effects? Drag a slider to see changes? As seen here:
> http://chir.ag/projects/preztags/
>
>
> Voronoi diagram (query)
> To see if I've understood what you are looking for, would this
> interpretation be correct?
> The top level of cells would be the top-level pathways, size adjusted to
> the number of proteins they involve (as you said). For example, going by
> the pathway network diagram for Homo sapiens, the largest cell would be
> Metabolism.
> Sub cells of those, level 2 cells, would be named for the other top-level
> pathways and the size adjusted by the number of proteins they share with
> the parent cell. So the largest cell within the Metabolism level 1 cell
> would be Disease as that looks to be the thickest line joining the two.
>
> Please put me right if I have misinterpreted you!

Would it be possible to have only level 1 cells representing top level
pathways, with proximity "coding" for the number of shared proteins?
>
> However, once the basic algorithms are in place alternate visualisations
> would likely be achieved without too much trouble.
>
> If the above is the case... It looks to me that the creation of a Voronoi
> map from data that is not inherently geometric will be more complex than
> the simplest case as presented in the Wikipedia entry. I think that I
> would have to use a weighted Voronoi map as described here:
> http://www.inf.uni-konstanz.de/gk/pubsys/publishedFiles/BaDe05b.pdf
>
> This makes the implementation a little more involved and I would like
> to adapt a working program, if possible, rather than reimplement it, even
> if the algorithms are established and published.

Do you have something in mind?
>
> Voronto software - MIT licence.
> http://vis.usal.es/voronto/Intro.html
> Having had a skim of some of the literature referenced here, and assuming
> no misinterpretation, I think that adapting some of the code here may be a
> good option. A fair bit of further evaluation of the source code is
> required!
>
If you could list the different options that you looked at and give the
reasons why you chose Voronto, that would be useful and could also go into
your project proposal.

> Reactome itself
>
> I've quickly tried installing Reactome at home using the instructions on
> the website. However, I have been unable to find an architectural overview
> of the system, such as a developer might use. Would that be available? The
> installation is not complete yet, mysql is having problems, but I should
> get that sorted out soon.
>
Sadly, as with most such projects, documentation is sparse for Reactome.
>
> GWT
>
> I've added this to my Eclipse installation, compiled and ran the
> introductory example code and tested it in a browser. Pretty cool, indeed.
>
> If there is some similarity with using Swing I will hopefully be able to
> pick up the basics without too much trouble. Even if I can't get a
> suitable application together I'll still look further into this.
>
OK.
>
> I will try to get a rough proposal worked out by Monday. If the above is
> not too far from what you are looking for, would you be able to have a
> quick look at it? No problem if not! There will probably be more questions
> from me, also...
>
I would certainly be prepared to look at your proposal.

Cheers,

David.

Ross Adam

Apr 30, 2013, 2:44:56 PM
to genome-in...@googlegroups.com, cr...@ebi.ac.uk
Hi David,
Still having a problem with the Voronoi concept...

>> Voronoi diagram (query) To see if I've understood what you are
>> looking for, would this interpretation be correct? The top level of
>> cells would be the top-level pathways, size adjusted to the number
>> of proteins they involve (as you said). For example, going by the
>> pathway network diagram for Homo sapiens, the largest cell would
>> be Metabolism. Sub cells of those, level 2 cells, would be named
>> for the other top-level pathways and the size adjusted by the
>> number of proteins they share with the parent cell. So the largest
>> cell within the Metabolism level 1 cell would be Disease as that
>> looks to be the thickest line joining the two.
>>
>> Please put me right if I have misinterpreted you!
>
> Would it be possible to have only level 1 cells representing top
> level pathways, with proximity "coding" for the number of shared
> proteins?

In short, I don't know.

I don't quite have the mathematical skill to properly interpret the
exposition papers on Voronoi diagrams, and give an absolute answer.

I do think that the size of the cells can be adjusted to indicate the
number of proteins related to a top-level pathway (can I shorten to
TLP?) that are also shared with other TLPs.

I'm not sure that the cells could be positioned such that a cell's
closest neighbours are those cells with which it shares the most proteins.

I think in this instance it may only be relative area of cells that can
be used as a visual measure, to represent a ratio of quantity. I'm not
sure that distance on the map can be usefully interpreted as anything.
But as stated I do not have enough skill to faithfully interpret the
mathematics.

Do you mean that the cell size would be adjusted to the total number of
shared proteins? That every protein shared with another top-level
pathway increases the size of the cell? And if the same protein is
shared with more than one other TLP then it would count as many times as
it is shared?

E.g. if a protein is shared between Metabolism and Disease, and also
between Metabolism and Immune System then that would count twice in
determining the size of the Metabolism cell?
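
The counting rule in that example can be sketched as follows. This is an illustration only: the class name, method name, and protein identifiers are invented, and it assumes the protein set for each TLP is already available.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: cell weight based on proteins shared with other TLPs. */
final class SharedProteinWeight {

    /**
     * Sums, over every other pathway, the number of proteins shared with it.
     * A protein shared with two other pathways is therefore counted twice.
     */
    static int weight(String pathway, Map<String, Set<String>> proteinsByPathway) {
        Set<String> own = proteinsByPathway.get(pathway);
        if (own == null) {
            throw new IllegalArgumentException("Unknown pathway: " + pathway);
        }
        int total = 0;
        for (Map.Entry<String, Set<String>> other : proteinsByPathway.entrySet()) {
            if (other.getKey().equals(pathway)) {
                continue; // do not compare a pathway with itself
            }
            Set<String> shared = new HashSet<>(own);
            shared.retainAll(other.getValue());
            total += shared.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up data: P1 is shared by all three pathways.
        Map<String, Set<String>> tlps = new LinkedHashMap<>();
        tlps.put("Metabolism", Set.of("P1", "P2", "P3"));
        tlps.put("Disease", Set.of("P1", "P4"));
        tlps.put("Immune System", Set.of("P1", "P2", "P5"));
        System.out.println(weight("Metabolism", tlps));
    }
}
```

Counting each shared protein once instead would mean taking the union of the intersections rather than summing their sizes.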

The data itself.
Where to get the data for both word clouds and Voronoi maps - Would the
data for the pathways be from the XML document for
pathwayHierarchy from the Reactome RESTful API? And the proteins from
pathwayParticipants?
Would we have to build a set of proteins for each TLP and find the
intersections?

Yours,
Ross

cr...@ebi.ac.uk

May 1, 2013, 9:32:37 AM
to Ross Adam, genome-in...@googlegroups.com, cr...@ebi.ac.uk
Hi Ross,
Yes, that's what I was thinking, perhaps naively, as someone who doesn't
know that much about the details of how Voronoi maps work!
>
> E.g. if a protein is shared between Metabolism and Disease, and also
> between Metabolism and Immune System then that would count twice in
> determining the size of the Metabolism cell?
>
> The data itself.
> Where to get the data for both word clouds and Voronoi maps - Would the
> data for the pathways be from the XML document for
> pathwayHierarchy from the Reactome RESTful API?

Basically yes. JSON rather than XML though.

> And the proteins from
> pathwayParticipants?
> Would we have to build a set of proteins for each TLP and find the
> intersections?
>
Either that or write a new web service request to get that information.
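
A minimal sketch of the first option, assuming the protein set for each TLP has already been fetched and parsed (the calls to the RESTful API are deliberately left out, and the class name and key format are hypothetical). It computes the shared-protein count for every unordered pair of pathways.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: pairwise protein overlap between top-level pathways. */
final class PathwayOverlap {

    /** Returns shared-protein counts keyed by "A|B" for each unordered pair. */
    static Map<String, Integer> pairwiseShared(Map<String, Set<String>> proteinsByPathway) {
        List<String> names = new ArrayList<>(proteinsByPathway.keySet());
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            for (int j = i + 1; j < names.size(); j++) {
                Set<String> shared = new HashSet<>(proteinsByPathway.get(names.get(i)));
                shared.retainAll(proteinsByPathway.get(names.get(j)));
                counts.put(names.get(i) + "|" + names.get(j), shared.size());
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up protein identifiers, for illustration only.
        Map<String, Set<String>> tlps = new LinkedHashMap<>();
        tlps.put("Metabolism", Set.of("P1", "P2", "P3"));
        tlps.put("Disease", Set.of("P1", "P4"));
        tlps.put("Immune System", Set.of("P1", "P2", "P5"));
        System.out.println(pairwiseShared(tlps));
    }
}
```

With a few dozen top-level pathways the quadratic number of pairs is small, so doing this on the client may be acceptable; a dedicated web service request would move the same computation to the server.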

If you are interested in this project, I suggest you put together a draft
proposal now, and submit it in the next couple of days, because Friday is
the deadline. You can always refine the proposal once it has been
submitted. It would be quite legitimate to say that one of the problems
that you need to solve is the exact mapping from Reactome to a Voronoi
map.

Cheers,

David.

Ross Adam

May 1, 2013, 10:08:29 AM
to genome-in...@googlegroups.com, cr...@ebi.ac.uk
Hi David - that's great, thanks very much.

I have a draft proposal worked out. I'll have to submit in a few hours
as I've studies to get on with!

But that info helps quite a bit.

Yours,
Ross

cr...@ebi.ac.uk

May 2, 2013, 7:01:55 AM
to Ross Adam, genome-in...@googlegroups.com, Ross Adam, cr...@ebi.ac.uk
Hi Ross,

How long are you planning to go on vacation?

Cheers,

David.

>> If you are interested in this project, I suggest you put together a
>> draft proposal now, and submit it in the next couple of days, because
>> Friday is the deadline. You can always refine the proposal once it has
>> been submitted. It would be quite legitimate to say that one of the
>> problems that you need to solve is the exact mapping from Reactome to a
>> Voronoi map.
>>
>> Cheers,
>>
>> David.
>>
> Hi David,
> This is just to let you know that I've posted an application in the
> melange/GSoC website.
>
> I think it may be a bit long on words and short on details, but I would
> like to know what you think of it. I'll then try and improve it as best I
> can.
>
> There is one issue that may be important. I have a pre-booked holiday in
> June. It was booked around January and I'd put it to the back of my mind!
> It is detailed in the application, with a possible work around, if the
> application looks suitable to be accepted.
>
> Thanks,
> yours,
> Ross
>


Ross Adam

May 2, 2013, 9:06:53 AM
to genome-in...@googlegroups.com
Hi David,
Sorry, forgot to put that in... 18 June to 2 July.
Ross

Ross Adam

May 2, 2013, 7:46:27 PM
to genome-in...@googlegroups.com, cr...@ebi.ac.uk
Hi David,
Thanks for looking at the proposal.

> It would be good if you could add some more detail about GWT and how
> you plan to integrate with it - a couple of paragraphs would suffice.

I've added some text at the end of the background required section. Is
that the sort of thing you are looking for?

> Also, please add some more detail to the timeline entries.

I've added a bit more detail, but I'm worried that it's still a bit vague.

Cheers,
Ross

David Croft

May 3, 2013, 5:13:11 AM
to Ross Adam, genome-in...@googlegroups.com
Hi Ross,
>
> Thanks for looking at the proposal.
>
> > It would be good if you could add some more detail about GWT and how
> > you plan to integrate with it - a couple of paragraphs would suffice.
>
> I've added some text at the end of the background required section. Is
> that the sort of thing you are looking for?

Yes, that's good.
>
> > Also, please add some more detail to the timeline entries.
>
> I've added a bit more detail, but I'm worried that it's still a bit
> vague.
>
The timeline is mostly fine now, but you mention in two places
"Integration with Reactome website". What does that mean? I think I
can guess, but for other readers of the proposal, you will need to be
more explicit.

Cheers,

David.

Ross Adam

May 3, 2013, 7:32:34 AM
to genome-in...@googlegroups.com, David Croft
Thanks David,

On 03/05/13 10:13, David Croft wrote:
>>
> The timeline is mostly fine now, but you mention in two places
> "Integration with Reactome website". What does that mean? I think I
> can guess, but for other readers of the proposal, you will need to be
> more explicit.

I guess that could cover a number of things!

I really meant deploying the files/code modules on the server (or test
server for the beta site) and getting the visualisation window/area or
page to appear within the existing pages and having associated controls
appear appropriately, as desired, and having it access and display live
data.

I'll update that in a short while.

Cheers,
Ross