Worm Behavior: Great chatting!


Stephen Larson

Apr 2, 2013, 4:48:32 PM
to Ev Yemini, openworm-discuss
Hi Ev,

  Thanks for the chat today.  I'm cc'ing the OpenWorm discussion list (which you should feel free to join, more info here).

   All, Ev is the main person behind the C. elegans behavioral database in the UK.  You can get a sense of what it contains by checking the attached paper, which Ev is a co-author on.  They are currently working on a manuscript to document the full contents of the database.  Ev gave me a walkthrough today via Hangout and it is nothing short of impressive.

   To give you a sense, he has 10,000 independent movies (examples here) of both wild type and mutant worms moving around that have been segmented and digitized.  From this digitization, he has extracted hundreds of features about the movement of these worms, from a few minutes to several hours.  He is happy to make these data available (raw and processed) as long as we cite him (not a problem in our community).  

   I think a dataset like this would be crucial for us to validate the OpenWorm model, and for those of you involved in the "Turing test" conversations, I think it makes some of our more abstract ideas a lot more concrete :)

   Ev, since we rely on email as our primary form of communication, can you please send any additional materials you feel comfortable sharing so they can have a better sense?  I note that the FAQ page on the site seems to be a dead link.

   Ev and I figured we would have a lot more to talk about, but I wanted to open up the conversation to anyone else who would like to join.  Just let me know and I'll add you to the scheduling process.

Thanks,
  Stephen
Brown et al. - 2013 - A dictionary of behavioral motifs reveals clusters of genes affecting Caenorhabditis elegans locomotion.pdf

Balazs Szigeti

Apr 3, 2013, 4:12:29 AM
to openworm...@googlegroups.com
hey,

I would like to join this conversation, please add me to scheduling (won't have internet access till next Tuesday, so won't be responsive till then)
best,
b






--
You received this message because you are subscribed to the Google Groups "OpenWorm-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openworm-discu...@googlegroups.com.
To post to this group, send email to openworm...@googlegroups.com.
Visit this group at http://groups.google.com/group/openworm-discuss?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Ev Yemini

Apr 3, 2013, 2:18:46 PM
to Stephen Larson, openworm-discuss
Hi Stephen & everyone,

It's a pleasure to share our work with you! My biggest fear has always been that no one would make use of the data, save for our own lab. I've been ecstatic to see Worm Tracker 2.0  (the system used to collect our data set) used by so many labs worldwide (around 20+ now -- the full hardware plans & software are available freely online at http://www.mrc-lmb.cam.ac.uk/wormtracker/ in case anyone's interested).

In summary we have:

1. 10,000+ single-worm experiments (young adult hermaphrodites & males, spontaneously behaving on food -- mostly 15 minutes long at 20-30 Hz, though we also have 10-hour experiments) annotated in videos and stored as a time series of frame-by-frame worm skeletons and features.

2. These experiments represent 300+ strains (roughly 20+ worms, per strain, and often much more) with N2 controls run at the same time. The strains have nervous system mutations. There are multiple alleles for several of the genes, double & triple crosses alongside the corresponding single gene mutants, wild isolates from multiple global locations, several lineages of the N2 (our lab stock, the CGC N2, and LSJ1 which was maintained in liquid culture and is as close as I know to the original N2), 3 years worth of our own N2, and many other possibilities to choose from.

3. For each experiment, everything is organized simply in a struct within HDF5 formatted files (an open format that works with pretty much every language I know: Matlab, R, C/C++, Java, Python, ...). The data is freely available now.

If other people are interested, I can present the phenomic database in an organized session. I can also walk everyone through the experiment files. I'd love to share the manuscript but I'm fearful that this would violate some journal submission policy and thereby jeopardize our publication. I've asked my PhD advisor Bill Schafer how much I can share of the paper itself.

I love science. I left my work in finance and startups to do what I love. Nothing would make me happier than to play some small part in understanding an organism intimately, with true insight into how brains work holistically. I believe C. elegans will be the first animal for which we accomplish this goal. And I believe efforts like OpenWorm will provide the platform with which to build and test new theory in order to understand, from the ground up, behavior and rich biological systems. Engineering provides an entirely different insight than empirical observation; I'm very excited to see what comes of your work.

All the best,
Ev

Stephen Larson

Apr 4, 2013, 8:00:37 PM
to Ev Yemini, openworm-discuss
Ev -- terrific.  Great intro.

One thing that I'm wondering about is if there are URLs that are open to the public right now to snag the HDF5 files in some organized directory structure?  I see some links on the website, but I'm looking for the 'hacker' view on this.  Is the idea to click on each of the files independently or can we click on a directory somewhere to see the full overview in a file system?

Thanks,
  Stephen

Ev Yemini

Apr 4, 2013, 10:38:30 PM
to Stephen Larson, openworm-discuss
Definitely. It's rather simple. Everything is located under:

I would strongly suggest using an FTP client to download large subdirectory structures that suit your needs.

The next part looks confusing but is straightforward once you get used to it. The subdirectories are organized as follows (don't worry about parsing annotations from the directory structure; all annotations are present within the feature files as well):

1. When present, the first subdirectory is the gene name (e.g., "unc-8"); otherwise, for wild isolates and N2 the subdirectory is "gene_NA".

2. When present, the next subdirectory is the allele (e.g., "n491n1192"); otherwise, for wild isolates and N2 the subdirectory is "allele_NA".

3. The subdirectory thereafter is the strain name (e.g., "AQ2947" is the Schafer lab copy of the CGC's N2). The strain name is always present.

4. Beyond this point, the subdirectories describe whether the worm is on food ("on_food" or "off_food" -- only a small subset of N2s and MECs were done off food), the sex ("XX" or "XO" -- the only males are N2), and whether a habituation period was observed ("30m_wait" or "no_wait" -- 25 N2 experiments were done with no habituation and recorded for 2 hours straight; otherwise, we always observed a 30 minute habituation period).

5. The final subdirectories will be far less meaningful to you. They indicate the ventral side ("L" = anti-clockwise or "R" = clockwise -- this can be confusing due to the orientation of the video vs. the experimenter's annotation), the tracker we used (1 through 8), the date (YYYY-MM-DD___HH_MM_SS) and, finally, the experiment's filename. The actual feature files contain further annotations (e.g., the room we used, the frame rate, ...).

Here are 2 examples:

1. unc-8(n491n1192)

2. CB4856 - the famous Hawaiian wild isolate
ftp://anon...@ftp.mrc-lmb.cam.ac.uk/pub/tjucikas/wormdatabase/results-12-06-08/Laura%20Grundy/gene_NA/allele_NA/CB4856/on_food/XX/30m_wait/L/tracker_1/2010-11-25___11_33_52/399%20CB4856%20on%20food%20R_2010_11_25__11_33_52___1___1_features.mat
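
The directory layout described above can be split into annotations mechanically. What follows is a minimal Python sketch, not part of the database tooling: the field names and the function `parse_experiment_path` are hypothetical labels chosen for illustration, and the sketch assumes every path ends with the ten components listed in points 1 to 5, in that order. As noted above, the same annotations are also stored inside the feature files, so this is only a convenience for browsing.

```python
from urllib.parse import unquote

# Illustrative field names (assumptions), in the order the thread describes them.
FIELDS = ("gene", "allele", "strain", "food", "sex", "habituation",
          "ventral_side", "tracker", "timestamp", "filename")


def parse_experiment_path(path):
    """Map the last ten components of a database path to named fields."""
    parts = [unquote(p) for p in path.strip("/").split("/")]
    if len(parts) < len(FIELDS):
        raise ValueError("path has fewer than %d components" % len(FIELDS))
    info = dict(zip(FIELDS, parts[-len(FIELDS):]))
    # "gene_NA" and "allele_NA" are placeholders used for wild isolates and N2.
    for key in ("gene", "allele"):
        if info[key] == key + "_NA":
            info[key] = None
    return info


# Path portion of the CB4856 example above.
example = ("pub/tjucikas/wormdatabase/results-12-06-08/Laura%20Grundy/"
           "gene_NA/allele_NA/CB4856/on_food/XX/30m_wait/L/tracker_1/"
           "2010-11-25___11_33_52/"
           "399%20CB4856%20on%20food%20R_2010_11_25__11_33_52___1___1_features.mat")
info = parse_experiment_path(example)
print(info["strain"], info["food"], info["sex"], info["habituation"])
```

Counting components from the end of the path keeps the sketch independent of whatever prefix (host, results folder, experimenter name) precedes the gene directory.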

I hope this wasn't too confusing.

All the best,
Ev

Stephen Larson

Apr 9, 2013, 2:13:25 AM
to Ev Yemini, openworm-discuss
This is great.  I've gone ahead and created a starter task for the community to begin digging into these files here:


Alex Dibert -- I'm wondering if you'd be interested to take a look...?  You've already played with several of these libraries :)

Best,
  Stephen

Ev Yemini

Apr 9, 2013, 1:01:32 PM
to Stephen Larson, openworm-discuss
Forgot to hit reply all.

I just want to give a sense of how easy it is with some Matlab code. I'm lazy so I made it easy for me to do everything quickly. The only tricky part is navigating the struct but I'll supply the methods which should make it all much simpler.

*** Using the feature files:

1. Compute the mean worm length over a video (unsegmented frames have NaN length measures) in microns:

meanLength = nanmean(worm.morphology.length)

2. Plot the worm speed vs. time (microns/seconds vs. seconds):

speed = worm.locomotion.velocity.midbody.speed;
time = (0:(length(speed) - 1)) / info.video.resolution.fps;
plot(time, speed);

*** Using the histogram files:

3. Show a histogram of backward-movement, ventral-sided bends at the tail (in degrees where 0 = no bend & 90 = bent perpendicular to body):

bends = worm.posture.bends.tail.mean.backward.histogram;
ventralBins = bends.bins < 0; % ventral-side is signed negatively
plot(abs(bends.bins(ventralBins)), bends.PDF(ventralBins));
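
For readers working outside Matlab, the arithmetic in the three examples above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `nan_mean`, `time_axis` and `ventral_side` are hypothetical, the input values are made up, and reading the arrays out of the files is not shown because it would need an HDF5 reader and the exact layout inside the files is not described here.

```python
import math


def nan_mean(values):
    """Mean over the entries that are not NaN (NaN marks unsegmented frames)."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def time_axis(n_frames, fps):
    """Time in seconds of each frame, with the first frame at 0."""
    return [i / fps for i in range(n_frames)]


def ventral_side(bins, pdf):
    """Keep negatively signed (ventral) bins; return (|angle|, density) pairs."""
    return [(abs(b), p) for b, p in zip(bins, pdf) if b < 0]


# Made-up values for illustration; real ones come from the feature and histogram files.
lengths = [1000.0, float("nan"), 1010.0, 990.0]
mean_length = nan_mean(lengths)
times = time_axis(len(lengths), fps=25.0)
ventral = ventral_side([-30.0, -10.0, 10.0, 30.0], [0.1, 0.4, 0.3, 0.2])
print(mean_length, times, ventral)
```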

Best,
Ev

Balazs Szigeti

Apr 17, 2013, 5:44:38 AM
to openworm...@googlegroups.com
Hello Ev,

I wanted to write this email ages ago, but have been very busy in the lab :( First of all, very cool paper! Shortly after the Stephens paper on eigenworms came out, we briefly discussed with Netta Cohen that it would be good to see whether the approach still works on mutants. Great to see that someone went ahead and did it! With such a large number of mutations and alleles, it is a very valuable dataset.

I have a few questions about some of the details; I hope you won't mind. First, if you take the behavioural motif dictionary (the 700 after the mRMR) and overlay it on the time series of the behaviour of a particular worm, how much of the time series is typically covered by the motifs? In your experience, does the behaviour correspond to a motif most of the time, or are motifs only sparsely distributed along the time series?

Have you guys thought about taking into account the temporal distribution of the motifs when constructing the behavioural phenotypes? What I mean is that if two strains perform motifs A and B identically, they might still be distinguished by the order in which these motifs typically occur. For example, one phenotype might prefer AABB while the other prefers ABAB. In this case the phenotypes would show very different behaviour even though their motifs are identical. I am not sure how realistic this example is, but I think it is worth discussing.
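
The AABB versus ABAB example can be made concrete with a small Python sketch. It is a toy illustration only, assuming each strain's behaviour has already been reduced to a string of motif labels; the sequences and the helper `transition_counts` are hypothetical and not taken from the motifs paper.

```python
from collections import Counter


def transition_counts(sequence):
    """Count ordered pairs of consecutive motif labels."""
    return Counter(zip(sequence, sequence[1:]))


# Toy label sequences: same motifs, same frequencies, different ordering.
strain_1 = "AABB" * 3
strain_2 = "ABAB" * 3

same_usage = Counter(strain_1) == Counter(strain_2)
trans_1 = transition_counts(strain_1)
trans_2 = transition_counts(strain_2)
print(same_usage, dict(trans_1), dict(trans_2))
```

Both sequences use A and B equally often, so a phenotype built only on motif frequency cannot separate them, while the pair counts differ: only the first sequence ever repeats a motif.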

Last but not least, I was wondering whether the analysis code that you have developed in the lab is available upon request. Hopefully in the coming month OpenWorm will be functional and we will be eager to start verifying the virtual worm. While it would be straightforward to reproduce the methods of analysis, it would take a significant number of working hours to reproduce the tools that are ready in your lab. If the code is not available, that is understood; however, if it is, then our eternal gratitude will be with the entire Schafer lab till the end of time! :)

Best,
Balazs

Stephen Larson

Apr 17, 2013, 11:18:56 AM
to openworm-discuss, Ev Yemini
Great questions.  Ev's not on the list yet, so I'm cc'ing him on this email.



Balazs Szigeti

Apr 17, 2013, 11:21:58 AM
to openworm...@googlegroups.com
oh, did not notice. thanks for the help! b



Ev Yemini

Apr 17, 2013, 12:56:56 PM
to Stephen Larson, Andre Brown, openworm-discuss
Hi,

Andre Brown is the postdoc behind the motifs paper. I've included him on this email so he can answer all relevant questions. We've had many similar discussions in the lab and he's thought deeply with regards to them.

To be clear, the PNAS motifs paper is quite different from the one we hope will soon be forthcoming. The forthcoming paper describes a database of all experiments in a convenient format for further research. Among the data are many time-series and event-based (e.g., omega turns) features, including a frame-by-frame 49-point skeleton and Stephens-like eigenfeatures (6 eigenfeatures adapted for worms on food). The features are further analyzed to produce 702 measures describing strain phenotype.

The motifs paper is a beautiful, unbiased algorithm for discovering behavioral motifs, exploiting these fingerprints to identify similar behaviors among worm strains. It makes heavy use of the aforementioned behavioral database.

Most of the analysis (including all the features I've described) is provided through software here:

The software receives regular updates to include new functionality. At present, I don't believe the motif-based work has been incorporated into any standalone package, but Andre can discuss this further.

Best,
Ev


Balazs Szigeti

Apr 18, 2013, 4:58:41 AM
to openworm...@googlegroups.com, Stephen Larson, Andre Brown
Dear Ev,

Thank you very much for the information, already looking forward to reading the new paper! I will download the analysis software and start to play with it later today!

Best,
Balazs



Andre Brown

Apr 19, 2013, 1:39:02 PM
to Ev Yemini, Stephen Larson, openworm-discuss
Hi Balazs, Ev, and everyone else,

Thanks for the questions.  They get at the heart of what I'm working on now, so naturally I'm very excited about the answers, I just don't have good ones yet!  There are a couple of ways you might answer the first question.  The simplest would probably be to simply associate all the motifs with their best matches in a given segment of behaviour without regard to overlap and see what's left.  Another might be to use a greedy approach and take the best match moving left-to-right through the data without overlaps (possibly with a bias to longer motifs).  I didn't really find either of those very satisfying so I never tried them.
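
The two simple coverage estimates mentioned above could look roughly like the following Python sketch. It rests on several assumptions: motif matches are given as hypothetical (start, end) frame intervals with an exclusive end, the interval values are made up, and the greedy pass is a simplification that takes the earliest-starting match (the longer one on ties) and ignores match quality, so it is not the procedure from the paper.

```python
def coverage_with_overlap(matches, n_frames):
    """Fraction of frames covered by at least one match, overlaps allowed."""
    covered = [False] * n_frames
    for start, end in matches:
        for i in range(max(start, 0), min(end, n_frames)):
            covered[i] = True
    return sum(covered) / n_frames


def greedy_left_to_right(matches):
    """Scan left to right, keeping each match that does not overlap the last kept one."""
    chosen, cursor = [], 0
    # Sort by start frame; on ties, put the longer match first.
    for start, end in sorted(matches, key=lambda m: (m[0], -(m[1] - m[0]))):
        if start >= cursor:
            chosen.append((start, end))
            cursor = end
    return chosen


# Made-up matches on a 50-frame stretch of behaviour.
matches = [(0, 10), (5, 20), (12, 18), (30, 40)]
overlapping = coverage_with_overlap(matches, 50)
chosen = greedy_left_to_right(matches)
non_overlapping = coverage_with_overlap(chosen, 50)
print(overlapping, chosen, non_overlapping)
```

Whatever is left uncovered in either estimate is the part of the time series that the dictionary does not explain.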

Instead I've been trying some other approaches to temporally chunk the time series and cluster similar segments with the goal of identifying a minimal set of templates that explain all of a given behaviour.  This looks more promising but I'm still working on some variations.  I think this presents two cool possibilities.  The first is that we can quantify the size (and bound the complexity) of the worm's behavioural repertoire.  The basic idea is to incorporate more and more behavioural data until the number of required templates plateaus.  That is, at some point, you will have seen everything the worm does and no new sequences will be surprising.  In one of the flavours I've been playing with you can get ~90% of the variance of the modes with ~500 short templates (mean length is about a second).  The templates accumulate exponentially with a time constant of about 30 mins for worms crawling on food (i.e. you've seen pretty much everything after about an hour).
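
As a rough check on the "about an hour" figure, here is a small Python sketch. It assumes that "accumulate exponentially" means a saturating curve, with the fraction of templates seen after time t equal to 1 - exp(-t / tau) and tau = 30 minutes; that functional form and the name `fraction_seen` are assumptions made for illustration, not a fit to the actual data.

```python
import math


def fraction_seen(t_minutes, tau_minutes=30.0):
    """Fraction of the eventual template count seen after t minutes (assumed model)."""
    return 1.0 - math.exp(-t_minutes / tau_minutes)


for t in (30, 60, 90):
    print(t, round(fraction_seen(t), 3))
```

Under this assumed form, two time constants (60 minutes) corresponds to roughly 86% of the plateau.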

The other cool possibility, as you say, is the temporal sequence of the behaviours.  Just because you've seen all the short sequences after an hour doesn't mean you've seen all of the longer sequences.  Basically now we're doing grammar inference on the sequence of states to see what comes out.  This should give some insight into how the worm structures its behaviour in general and I think also make for a nice phenotyping tool.

As to your final question, the code from the motifs paper is here:

And some sample data is here:

There's a demo script in the main Motif_Analysis directory that basically reproduces the paper on the sample data.  I think everything runs on base Matlab (maybe with the stats or bioinformatics toolboxes, but I don't think you need them).  There are a couple of mex files that you'll have to compile on your system if you're not using a Mac, but the C code is included and there's nothing complicated in them, so 'mex filename' in Matlab should do the trick.

Let me know if you have any other questions or comments.  I'm looking forward to seeing how the OpenWorm project develops!

Andre

Balazs Szigeti

Apr 25, 2013, 10:10:35 AM
to openworm...@googlegroups.com
Hey guys,

Thank you very much for sharing this with us. I think these questions are really the logical next steps from the Stephens article, really glad that someone is doing this work! We are super excited to read your article whenever it is published!

I have downloaded the code and some sample data and started to play with them. So far everything is under control! :)

best,
balazs



Jim Hokanson

Jun 20, 2013, 1:01:59 AM
to openworm...@googlegroups.com
Ev & Andre,

   I'm new to OpenWorm and hoping to help out with incorporating the worm tracker results into further analysis. Do you have any documentation on the file structure? It is nice that you have compiled the Matlab code so anyone can use it but would it be possible to share the source code? I too come from a lab where we have a lot of data and source code that is all hidden behind a local server. I'm slowly working on doing all of my coding with GitHub repos so that anyone can improve upon the code.

Thanks,
Jim

Stephen Larson

Jun 20, 2013, 1:12:43 AM
to openworm-discuss, abr...@mrc-lmb.cam.ac.uk, Ev Yemini, jim.ho...@gmail.com
I'm cc'ing Ev and Andre explicitly here as they may not have joined the list yet.  Thanks all of you for your efforts!!

Ev Yemini

Jun 20, 2013, 11:56:56 AM
to Stephen Larson, openworm-discuss, Andre Brown, jim.ho...@gmail.com
Hi all,

The paper has been accepted to Nature Methods but is currently under embargo till publication. I'm discussing how to go about placing all methodology in Nature's Protocol Exchange where, as I understand, it will be publicly accessible without fees. Additionally, I will be speaking about it at UCLA next week, details at the end of this email.

In the meantime, I'm setting up a website that will go through several examples on how to use the data set. Beyond this, please don't hesitate to contact me if you have any questions on how to pursue a particular idea. My goal is to make the data as easily accessible as I can. Hopefully, the combination of the methods & examples will make everything clear.

My SourceForge project is woefully out of date. I intend to upload everything to GitHub but, for now, contact me & I will send the code I have.

The only thing is, Andre & I will be occupied till just after the UCLA Worm Meeting (ending June 30th). Thereafter, I can devote my full attention to this.

All the best,
Ev

Talk details below:
Session Information
Session Title: Neurobiology II: Neuronal Development
Session Type: Parallel
Session Location: Northwest Auditorium
Session Time: FRI June 28, 2013 09:00AM - 12:00NOON
Abstract Information
Program Number: 128
Presentation Time: 11:36AM-11:48AM
Abstract Content
A database of C. elegans behavioral phenotypes. Eviatar I. Yemini (1,2), Laura J. Grundy (2), Tadas Jucikas (2), Andre E.X. Brown (2), William R. Schafer (2). (1) Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University Medical Center, New York, NY, USA; (2) MRC Laboratory of Molecular Biology, University of Cambridge, Cambridge, UK.




Stephen Larson

Jun 21, 2013, 9:01:57 AM
to Ev Yemini, openworm-discuss, Andre Brown, Jim Hokanson
Congrats, Ev!  I'm contacting you to get the code you have if at all possible.  Would really appreciate it.

We can help you put the code on GitHub if you like.  You can just start a new repo and we can upload the rest for you.

BTW, I will be at the UCLA meeting -- let's meet up!

Thanks,
  Stephen

Ev Yemini

Jun 21, 2013, 1:01:26 PM
to Stephen Larson, openworm-discuss, Andre Brown, Jim Hokanson
Thanks!

Jim Hokanson made a similar request for the code. It's not as organized as I'd like and so I think it's best to hand it over with an explanation that goes as far into the details as people would like. Perhaps we can organize some sort of e-conferencing to hand everything over with someone taking an organized set of notes?

In the meantime, I'm attaching my thesis which details the hardware & software/code to a level of detail that many would perhaps avoid ;) But I think it's a relatively easy read where one can skim the details and only address their particular interests.

See you at the worm meeting!

All the best,
Ev

Ev Yemini

Jun 21, 2013, 1:21:00 PM
to Stephen Larson, openworm-discuss, Andre Brown, Jim Hokanson
Hi all,

As a precaution, I'm taking down my thesis just until I verify that it doesn't violate the embargo in any way. The paper accepted to Nature Methods doesn't resemble my thesis write up but there is the potential for a tiny bit of overlap. I'd prefer to have written approval for disseminating so let's hold off till then. Hopefully everyone understands.

All the best,
Ev

Stephen Larson

Jun 21, 2013, 2:46:40 PM
to openworm-discuss, Andre Brown, Jim Hokanson, Ev Yemini
Not a problem.  Would you still be open to setting up a time to do a group discussion about a hand over?  We can set it for after UCLA.

Who else would be interested in attending?  I'm in.  I'll send a poll around for times to whoever is interested and indicates so by replying to the thread.

Thanks,
  Stephen



Ev Yemini

Jun 21, 2013, 3:29:41 PM
to Stephen Larson, Jim Hokanson, Andre Brown, openworm-discuss

Definitely! The embargo prevents me from discussing the date of publication but, if all goes well, we shouldn't have to wait very long.

I'm excited to see other people take over. I was worried that the project would die when I moved on to my postdoc.

There's a lot of opportunity for improvement. For example, I'm hoping to show a new feature at the conference: automatically detecting defecation (a welcome relief from doing it by eye). It looks to be just a few lines of Matlab code that detect the coupled sequence of posterior then anterior body contractions.

Cheers,
Ev

Jim Hokanson

Jun 21, 2013, 4:48:35 PM
to Ev Yemini, Stephen Larson, Andre Brown, openworm-discuss
Worm defecation, now that's some real science! I meant to mention that I had talked with Ev in private about the code. I am really curious as to what is in the presentation and how the database will work. I think it is best to wait until after his presentation and after he releases that information. Also I'd like to make a quick skim over what Ev shares before any meeting.

Jim

Stephen Larson

Jun 24, 2013, 8:46:33 PM
to Jim Hokanson, Ev Yemini, Andre Brown, openworm-discuss, Andrew Papadopoli
Terrific all.  Whoever is interested in this meeting via Google+ hangout, please fill out this poll and we'll slot you in.  If you sign up but have been lurking on the list, please just post here and let us know your interest so we can iron out the agenda to suit everyone.


Thanks,
  Stephen