The Annotator - Midterm prototype

Maximilian Ludvigsson

Jul 9, 2012, 4:51:37 PM

to crow...@googlegroups.com

Hey!

Just finished a blog post where I go through my current prototype so that someone not familiar with the project can get up to speed:

http://mludv.github.com/2012/07/09/midterm/

There is a link to the current prototype in the post, but here it is again for reference:

http://50.112.124.237/

Check it out and tell me what you guys think,

Max

Benjamin Good

Jul 9, 2012, 5:34:53 PM

to crow...@googlegroups.com

Hi Max, looking good. It's running really slowly, though. I'm seeing this JavaScript error:

http://50.112.124.237/projects/sviewer/seqconfig.cgi 404 (Not Found)

and I can see that you are getting a lot of 502 errors from BioGPS. Have you mentioned that to the BioGPS guys? We should know if that error is on your side or on theirs.

To answer the question at the bottom of your blog post about uses: there are many, which is perhaps why it seems a little nebulous. As one very simple example, I might be interested in finding novel functions of genes that are missing from structured databases but present in text about the gene (like the summary you captured from Protein Atlas and from hundreds of other sites like it). This system would let me issue a single query, "get me all the summary text about all human genes", that spans hundreds of different websites and databases. I could then feed the output of that query into my text-mining pipeline and very rapidly identify undocumented functions. This saves me the trouble of building my own web scraper for all of those sites. Your tool basically provides a way to crowdsource the first step of information-extraction applications. I might also be interested in images, antibodies, orthologous genes, related researchers, etc.

Hopefully the last phase of your project will focus on building out more end-user focused applications.

-Ben

--
You received this message because you are subscribed to the Google
Groups "Crowdsourcing Biology" group.
To post to this group, send email to crow...@googlegroups.com
To unsubscribe from this group, send email to
crowdbio+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/crowdbio?hl=en

GSoC Organization page: http://www.google-melange.com/gsoc/org/google/gsoc2012/scripps_crowdbio
GSoC Ideas page: http://sulab.org/gsoc/

Ian MacLeod

Jul 9, 2012, 5:43:46 PM

to crow...@googlegroups.com

We were seeing some 502 errors in BioGPS over the last few days, so this is likely coming from our side. I just committed code that will hopefully resolve this.

Ian

Maximilian Ludvigsson

Jul 10, 2012, 9:25:24 AM

to crow...@googlegroups.com

Thanks!

I sometimes get 404 errors when the site in the iframe requests relative resources. I have some thoughts about solving this, but it hasn't been a large problem yet.
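A minimal sketch of why relative resources can 404, assuming the prototype serves the annotated page's HTML from its own host: the browser then resolves a path such as `/projects/sviewer/seqconfig.cgi` against the prototype's server rather than the original site. The hosts `proxy.example` and `origin.example` and the helper `rewrite_relative_urls` are hypothetical, not part of the prototype; the rewrite uses a regex for brevity where a real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Matches src="..." / href='...' attributes. A regex is a simplification;
# a real implementation would likely use a proper HTML parser.
ATTR_RE = re.compile(r"""(\b(?:src|href)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)


def rewrite_relative_urls(html: str, origin_page: str) -> str:
    """Make every src/href in `html` absolute, relative to `origin_page`.

    Hypothetical helper. Without it, the browser resolves a reference like
    "/projects/sviewer/seqconfig.cgi" against the proxy's own host, which
    does not serve that path, so the request returns 404.
    """
    def absolutize(m: re.Match) -> str:
        prefix, quote, ref = m.group(1), m.group(2), m.group(3)
        # urljoin leaves already-absolute URLs (and javascript: links) unchanged.
        return f"{prefix}{quote}{urljoin(origin_page, ref)}{quote}"

    return ATTR_RE.sub(absolutize, html)


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    proxy_page = "http://proxy.example/view"
    origin_page = "http://origin.example/gene/1017"
    snippet = '<script src="/projects/sviewer/seqconfig.cgi"></script>'

    # Without rewriting, the path resolves against the proxy's host.
    print(urljoin(proxy_page, "/projects/sviewer/seqconfig.cgi"))
    # After rewriting, it points at the original site's host.
    print(rewrite_relative_urls(snippet, origin_page))
```

A static rewrite like this cannot see URLs that scripts assemble at runtime, which may be how some requests (possibly including the seqconfig.cgi one) are made.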

As for speed, the web scraper is more than ten times faster on my machine.

Maximilian Ludvigsson

Jul 11, 2012, 12:19:04 PM

to crow...@googlegroups.com

I've implemented a few performance tweaks, and the pages should now load faster!

Check it out at http://ec2-50-112-60-250.us-west-2.compute.amazonaws.com

Max
