New game idea 0 for GSOC 2013: Improving Gene Ontology and Biocuration by Crowdsourcing Annotation

Mel K.

Apr 30, 2013, 2:13:23 AM

to crow...@googlegroups.com

Hello, this is Maeoll Kim with another game idea. This time, my game idea seeks to improve biocuration by raising the accuracy of electronic annotation, using crowdsourcing-supported text mining and neural networks trained by supervised learning.

Problem: Effective biocuration is currently held back by the sheer number of publications produced every year. Manual curation, in which humans classify and annotate these publications, is highly accurate, but it is costly and has low throughput. Electronic annotation is used in its place because of its high throughput, but it has lower accuracy and specificity. Two features of publication text make text mining difficult: equivocal, ambiguous words (for example, 'transduction' has different meanings depending on the context) and the many different terms used to describe the same thing (gluconeogenesis, glucose formation, glucose biosynthesis, etc.).

Solution: We can reach a level of accuracy between that of electronic and manual curation by using general consensus. In this way, consensus can approximate natural language processing. One way to achieve this is to have text-mining algorithms make an initial, untreated electronic annotation for each equivocal word or synonym, and then shift it to a different meaning once a majority of players select that new meaning. Another way is to reinforce electronic annotations that players confirm as correct, which will improve electronic annotation accuracy. This is similar to how reCAPTCHA relies on general consensus.
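A minimal sketch of the majority-shift rule described above, assuming each snippet has one label proposed by the text-mining step and a list of player votes. The function name `resolve_meaning`, the `min_votes` threshold, and the strict-majority rule are illustrative assumptions, not a settled design; the meaning labels in the usage example are made up.

```python
from collections import Counter

def resolve_meaning(text_mining_label, votes, min_votes=10):
    """Return the accepted meaning of one ambiguous word in one snippet.

    text_mining_label: meaning initially assigned by the text-mining step.
    votes: meanings that players chose for this snippet.
    min_votes: hypothetical number of votes required before the crowd
        is allowed to override the text-mining label.
    """
    if len(votes) < min_votes:
        # Too few votes so far: keep the electronic annotation as-is.
        return text_mining_label
    # most_common breaks ties in favor of the meaning that was voted first.
    top_meaning, top_count = Counter(votes).most_common(1)[0]
    if top_meaning != text_mining_label and 2 * top_count > len(votes):
        # A strict majority chose a different meaning: shift the annotation.
        return top_meaning
    # Otherwise the text-mining label stands (and is reinforced when the
    # majority agreed with it).
    return text_mining_label
```

For example, with 7 of 10 votes for "genetic transduction", a text-mining label of "signal transduction" would be replaced; with a 5/5 split it would be kept.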

Players: Each player will be asked to create a profile stating their level of biological science knowledge, which we will use to decide how much weight to give that player's answers. Outliers (players with a high level of education who give unusual answers) will be given little weight, but if a few more players of similar knowledge agree with an outlier's answer, that answer will be validated.
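One way the weighting could work, sketched under assumptions: `knowledge_rank` is a hypothetical 1 to 5 self-reported scale used directly as the vote weight, "outlier" is taken to mean a high-rank player who disagrees with the head-count majority, and `outlier_weight` and `min_peers` are illustrative parameters rather than tuned values.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

@dataclass
class Vote:
    player_id: str
    knowledge_rank: int  # hypothetical scale, e.g. 1 (layperson) to 5 (PhD)
    meaning: str

def weighted_tally(votes, high_rank=4, outlier_weight=0.1, min_peers=2):
    """Weight each vote by knowledge rank, damping unsupported expert outliers.

    An outlier here is a player with knowledge_rank >= high_rank whose answer
    differs from the most common answer by head count. Such a vote keeps its
    full weight only if at least min_peers other players within one rank agree.
    """
    if not votes:
        return {}
    consensus = Counter(v.meaning for v in votes).most_common(1)[0][0]
    totals = defaultdict(float)
    for v in votes:
        weight = float(v.knowledge_rank)
        if v.meaning != consensus and v.knowledge_rank >= high_rank:
            # Count other players of similar knowledge who gave the same answer.
            peers = sum(
                1
                for other in votes
                if other.player_id != v.player_id
                and other.meaning == v.meaning
                and abs(other.knowledge_rank - v.knowledge_rank) <= 1
            )
            if peers < min_peers:
                weight *= outlier_weight
        totals[v.meaning] += weight
    return dict(totals)
```

With four rank-1 players choosing one meaning and a single rank-5 player choosing another, the expert's vote drops to 0.5; if two more rank-4 or rank-5 players back the expert, it keeps full weight.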

How the game will look: It will be a medieval action game with simulated 8-bit pixel graphics for simplicity's sake, in which the player fights opponents with a sword and shield against a time limit. During each exchange of blows, the action will switch to slow motion and the player will be shown a snippet of an article containing an ambiguous or equivocal word, and asked to choose which of several meanings fits the context; these snippets are the in-game pieces. The correct answer will be either the answer found by the text-mining algorithm or, after enough votes in its favor, the new consensus answer.

How the game works: Points will be awarded for the number of correct hits within the time limit. The faster the player reads the snippet and plays the correct answer, the shorter the slow-motion phase, which is what the player wants. Players will also learn the correct connotations, even if they have a low biological science knowledge rank. This develops into a skill, which has been shown to be one of the components of game addiction, and players may even be pleased to find they are learning something in a fun way.
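A rough sketch of the scoring loop, assuming that slow-motion time counts against the round's time limit (which is why faster answers are rewarded). The `Exchange` type, the fixed `blow_time_s` per exchange, and the `max_slowmo_s` cap are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Exchange:
    correct: bool         # did the player pick the right meaning?
    answer_time_s: float  # seconds spent reading the snippet and answering

def round_score(exchanges, time_limit_s=60.0, blow_time_s=2.0, max_slowmo_s=8.0):
    """Count correct hits landed before the round's time limit runs out.

    Assumes each exchange costs a fixed amount of normal-speed combat time
    plus a slow-motion phase lasting as long as the player takes to answer
    (capped at max_slowmo_s), so faster answers leave room for more hits.
    """
    elapsed = 0.0
    hits = 0
    for ex in exchanges:
        elapsed += blow_time_s + min(ex.answer_time_s, max_slowmo_s)
        if elapsed > time_limit_s:
            break  # time ran out during this exchange
        if ex.correct:
            hits += 1
    return hits
```

Under these defaults, a player answering in 1 second fits 10 correct exchanges into 30 seconds, while one taking 8 seconds per answer fits only 6 into the full minute.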

How the game progresses: Players can quickly defeat low-level opponents by getting many answers correct in a row, and will eventually be matched with opponents appropriate to their skill level. The early levels will be easy, and later levels will use progressively more difficult words.
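A simple streak-based rule could handle the matching, sketched here as an assumption: a run of correct answers moves the player up a level, a run of misses moves them down, and one level number indexes both opponent strength and word difficulty. The thresholds and the `next_level` name are hypothetical.

```python
def next_level(level, results, up_streak=5, down_misses=3, max_level=10):
    """Move the player up or down one difficulty level based on recent answers.

    results: answer outcomes at the current level, oldest first (True = correct).
    In this sketch one level number sets both opponent strength and how
    difficult the ambiguous words are.
    """
    if len(results) >= up_streak and all(results[-up_streak:]):
        return min(level + 1, max_level)
    # Drop a level only when the last down_misses answers were all wrong.
    if results[-down_misses:].count(False) == down_misses:
        return max(level - 1, 1)
    return level
```

An Elo-style rating would be a finer-grained alternative; the streak rule is shown only because it maps directly onto "many answers correct in a row".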

The benefit from this game: By crowdsourcing connotations, we will improve a text-mining program that attempts to assign publications to their correct Gene Ontology IDs by electronic annotation. We can then add this information to BioGPS, potentially providing a large body of literature to users who search for a specific gene or gene product.

Backend: The text-mining algorithm will be a neural network trained in supervised mode that organizes publications according to their respective GO IDs. There will also be a clustering algorithm that gathers articles together as game pieces.
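As a stand-in for the supervised network, here is a minimal sketch using a multi-class perceptron (the simplest single-layer neural network) over bag-of-words features, assuming training pairs of (text, GO ID) come from curator- or crowd-confirmed annotations. The class and function names are hypothetical, the whitespace tokenizer is deliberately naive, and the clustering step is not shown.

```python
from collections import defaultdict

def tokenize(text):
    # Naive whitespace tokenizer; a real pipeline would normalize synonyms too.
    return text.lower().split()

class MultiClassPerceptron:
    """Single-layer network mapping bag-of-words features to GO ID labels."""

    def __init__(self):
        # weights[label][token] -> float, created lazily as zeros.
        self.weights = defaultdict(lambda: defaultdict(float))

    def score(self, tokens, label):
        w = self.weights[label]
        return sum(w[t] for t in tokens)

    def predict(self, text, labels):
        tokens = tokenize(text)
        # Ties go to the label that appears first in `labels`.
        return max(labels, key=lambda lab: self.score(tokens, lab))

    def train(self, examples, epochs=10):
        """examples: list of (text, go_id) pairs. Returns the label list."""
        labels = sorted({go_id for _, go_id in examples})
        for _ in range(epochs):
            for text, gold in examples:
                guess = self.predict(text, labels)
                if guess != gold:
                    # Standard perceptron update: reward the right label's
                    # weights and penalize the wrongly predicted one.
                    for t in tokenize(text):
                        self.weights[gold][t] += 1.0
                        self.weights[guess][t] -= 1.0
        return labels
```

A usage example with two GO terms, GO:0006094 (gluconeogenesis) and GO:0007165 (signal transduction), and toy one-line "publications":

```python
examples = [
    ("glucose formation in liver from lactate", "GO:0006094"),
    ("glucose biosynthesis during fasting", "GO:0006094"),
    ("receptor kinase signal transduction cascade", "GO:0007165"),
    ("ligand binding triggers transduction of signal", "GO:0007165"),
]
model = MultiClassPerceptron()
labels = model.train(examples)
print(model.predict("signal transduction in neurons", labels))
```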

My Background: Programming experience: 2 years of C/C++, half a year of MATLAB, 3 years of Java, 1 year of SQL, and beginner-level Python. I am currently enrolled in a Data Mining class, which has a text mining module, and I have done a data mining project focused on facial recognition. I am interested in text mining and am looking for opportunities to practice it. I am also taking a bioinformatics class with a neural networks component, and have a grasp of the concept.

Benjamin Good

Apr 30, 2013, 12:56:54 PM

to crow...@googlegroups.com

Interesting in general, but the specifics need more thought. The idea of embedding tasks in a semantically distinct game (i.e. biological text-processing tasks required to defeat opponents in a fighting game) is something I'm curious to explore. (I have no idea whether it will work, but I think there are some reasonable experiments to run.) But if I were doing it, I would try to tap into an existing game world (and its gigantic player population). Could you embed the tasks in WoW or Second Life, etc., somehow? The tasks themselves could also be improved. Correctly identifying ambiguous terms (e.g. transduction) is a useful step, but it would be much more interesting if the players were identifying relationships between entities.

--
You received this message because you are subscribed to the Google
Groups "Crowdsourcing Biology" group.
To post to this group, send email to crow...@googlegroups.com
To unsubscribe from this group, send email to
crowdbio+u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/crowdbio?hl=en?hl=en

2012 GSoC Organization page: http://www.google-melange.com/gsoc/org/google/gsoc2012/scripps_crowdbio
GSoC Ideas page: http://sulab.org/gsoc/

Mel K.

Apr 30, 2013, 2:24:01 PM

to crow...@googlegroups.com

It looks like it's possible to create a slow-motion mod in Minecraft, which has a large player base, and Second Life, which also has a large player base, appears to have a slow-motion option, so it might be possible to script the game in either. However, there could be a better way to present these tasks to game players.

Could you explain further what you mean when you say "players were identifying relationships between entities"? Are you talking about relationships between GO IDs? If so it could be added as an additional task alongside identifying ambiguous terms.

Benjamin Good

Apr 30, 2013, 4:54:39 PM

to crow...@googlegroups.com

Basically, entity A (e.g. a gene) is related in some way (e.g. upregulates) to some other entity B (e.g. another gene or a process).

Maeoll Kim

Apr 30, 2013, 6:19:52 PM

to crow...@googlegroups.com

That could also be crowdsourced, implemented in a similar way to the equivocal words, or it could even replace them as the main benefit of the game if you feel it is more valuable. Once players have provided enough samples to use as a training set, we can use supervised learning to propose possible answers for us.

We may also leave that to the harder levels or to players with higher knowledge levels of biology.
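A minimal sketch of how relation tasks might be represented, gated to harder levels, and turned into training data for the supervised step. The `RelationTask` fields, the `min_level` gate, and the `min_votes` / `min_share` thresholds are hypothetical, and the gene names in the usage example are placeholders.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class RelationTask:
    snippet: str
    entity_a: str       # e.g. a gene
    entity_b: str       # e.g. another gene or a process
    candidates: tuple   # relation types offered to the player
    min_level: int = 5  # hypothetical: reserved for harder levels

def tasks_for_player(tasks, player_level):
    """Offer relation tasks only to players who have reached the required level."""
    return [t for t in tasks if player_level >= t.min_level]

def to_training_triples(answered, min_votes=5, min_share=0.6):
    """Turn crowd answers into (entity_a, relation, entity_b) training examples.

    answered: list of (RelationTask, list_of_chosen_relations) pairs.
    Only tasks with enough votes and a clear majority are kept.
    """
    triples = []
    for task, votes in answered:
        if len(votes) < min_votes:
            continue
        relation, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_share:
            triples.append((task.entity_a, relation, task.entity_b))
    return triples
```

For example, a snippet about GENE_A and GENE_B where 4 of 5 players chose "upregulates" would yield the triple ("GENE_A", "upregulates", "GENE_B"), while a 2/2/1 split would be held back until more votes arrive.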

--
Sincerely,

Maeoll Kim
Lab Coordinator and System Administrator
Graduate Assistant
Bioinformatics and Medical Informatics Department
San Diego State University
