To be honest, I'm not sure whether a standardized format is really necessary, as the output of most experiments is so simple that a .csv file will do.
But on the other hand, there is no reason _not_ to have one, and there might be some benefits, so I'm all for it!
Regarding the format, it seems like you've pretty much got it covered. Perhaps you could write a full specification, so that we can discuss it here and anyone can have his or her say?
Open Behavioural Format? (*.obf isn't currently used by any software as far as I can find). That would convey that this is suitable for behavioural data - I think you don't want EEG, single units and other such recordings in flat-text formats like this.
1. need for a new format
I'm not overly interested in what the actual file syntax is (yaml, xml...). I think the harder part is coming up with a structure that is a) easy enough to read for a simple experiment and b) flexible enough to store the data of a complex one.
So far the PsychoPy approach has been to output csv/xlsx files to achieve (a) and Python-native ('pickle') files to achieve (b). I think the solution you're suggesting is on the (b) end of the scale (for now you'd have to write some script to visualise anything from the below), but is certainly more readable than Python pickle files.
Actually, PsychoPy can also output log files, because some people like data organised chronologically, rather than in the logical structure of the experiment. I personally find those files harder to analyse so never use them.
What do other packages do? (these are not much more than guesses)
- Psychtoolbox: the user simply handles this themselves?
- Presentation: a log of events
- PyEPL uses a log of events
- E-Prime: proprietary binary format, with E-DataAid to convert to others (e.g. Excel)
- OpenSesame: ?
2. how to structure it
So, if the harder part is choosing a structure, what will be the issues here? Most experiments revolve around the concept of trials of different types that are looped over. That's easy enough to handle. Your form below pretty much does it.
What about something that wasn't in the standard loop (e.g. a single datapoint that precedes or follows the main loop of experiments)?
What about nested loops (multiple blocks of multiple trials)?
Or maybe the format shouldn't care *why* a particular event (e.g. trial) occurred, only *when* it occurred, so it shouldn't care about the existence of loops and whether or not they nest etc.
The second issue is what to do with an experiment that doesn't fit into 'trials' e.g. recording the keypresses of a subject viewing an ambiguous stimulus wouldn't be done in trials as such, and the resulting output would be a series of events, rather than an individual response.
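To make the "events rather than trials" idea concrete, here is a rough sketch (not a proposal for the actual syntax). It assumes each record is just a timestamp, an event name, an optional context label and a value; the field layout, the `blocks.1/trials.1` labels and the `group_by_context` helper are all made up for illustration. Loop structure is then only a label that an analysis script can group on if it wants to:

```python
from collections import defaultdict

# Hypothetical flat, chronological records: (time in s, event name, context label, value).
# The format itself only knows *when* something happened; the context label is optional metadata.
events = [
    (5.064, "trial_start", "blocks.1/trials.1", None),
    (5.508, "keypress", "blocks.1/trials.1", "3"),
    (8.321, "trial_start", "blocks.1/trials.2", None),
    (8.884, "keypress", "blocks.1/trials.2", "2"),
    (9.102, "keypress", "blocks.1/trials.2", "3"),
]


def group_by_context(records):
    """Rebuild a trial-like view by collecting events that share a context label."""
    grouped = defaultdict(list)
    for time, name, context, value in records:
        grouped[context].append((time, name, value))
    return dict(grouped)


trials = group_by_context(events)
```

An experiment with no trials at all (e.g. continuous keypresses to an ambiguous stimulus) would simply use one context label, or none, and be analysed as the raw event series.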
3. a new package for viewing/exporting and batch analysis (akin to E-DataAid)
This sounds interesting, but could snowball. I guess the question would be how to focus on the aspects that aren't already done by other packages (like excel or SPSS). Going down that route will always result in complaints that the viewer can't do <insert feature of excel/spss that hasn't been implemented yet>. Also it potentially stops users/students from learning how to use those more-general packages. If they can run the repeated-measures ANOVA in the new data viewer, they don't learn to use a stats package and then get stuck when the viewer can't do a mixed-design ANOVA. So I think it could be dangerous.
*But* there are some things that might be very useful here, like having a viewer to:
- export to other formats (right now a PsychoPy 'psydat' file can be reopened and saved to .csv or .xlsx, but I bet nobody ever did that because they have to write a script for it)
- combine data from multiple runs, or repeat an analysis over multiple files
- export a copy of the original experiment file. I was planning for PsychoPy to start saving the experiment inside the data file. Again, users will never make use of that unless they can load the file into a viewer and see a button that says something like "Export Experiment File"
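As an illustration of the "combine data from multiple runs" item, a minimal sketch using only the Python standard library might look like the following. It assumes each run is a plain .csv file with a header row; the `combine_runs` helper, the `source` column and the file names are invented for the example, and in-memory strings stand in for real files:

```python
import csv
import io


def combine_runs(named_sources):
    """Merge several CSV sources into one list of row dicts, tagging each row with its source."""
    combined = []
    for name, handle in named_sources:
        for row in csv.DictReader(handle):
            row["source"] = name  # remember which run the row came from
            combined.append(row)
    return combined


# In-memory stand-ins for two data files from separate runs.
run1 = io.StringIO("trial,RT\n1,0.444\n2,0.563\n")
run2 = io.StringIO("trial,RT\n1,0.501\n")

rows = combine_runs([("run1.csv", run1), ("run2.csv", run2)])

# The same analysis can then be repeated over all runs at once.
mean_rt = sum(float(r["RT"]) for r in rows) / len(rows)
```

With real files, the `io.StringIO` objects would be replaced by file handles opened with `open(path, newline="")`.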
I don't have time to look at this in detail (quite a bit to do before
going to the vision science society meeting on thursday) but I had a
look at the specimen file at the end of the document.
What springs to my mind is that the code specifying the 'stimulus' might
be better as 'condition' (or 'parameters'?) and give all the parameters
that defined the trial type (i.e. that varied from one trial to another)::
    loop.1, trial.0002:
        onset: 18.345
        tag: type1
        condition:
            text: press 2
            pos: [-2,0]
        response:
            key: 3
            correct: False
            RT: 0.444
    loop.1, trial.0003:
        onset: 20.500
        tag: type2
        condition:
            text: press 2
            pos: [+2,0]
        response:
            key: 3
            correct: False
            RT: 0.563
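For what it's worth, a nested record like this can still be turned into the one-row-per-trial layout people expect from a .csv file. The sketch below assumes the trial has already been loaded into a plain Python dict (no particular parser is implied), and the `flatten` helper and the dotted column names are just one possible convention:

```python
def flatten(record, prefix=""):
    """Turn nested dicts into a single-level dict with dotted keys, e.g. 'response.RT'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# One trial from the example above, written as a Python dict.
trial = {
    "onset": 18.345,
    "tag": "type1",
    "condition": {"text": "press 2", "pos": [-2, 0]},
    "response": {"key": 3, "correct": False, "RT": 0.444},
}

row = flatten(trial)  # keys such as 'condition.text' and 'response.RT'
```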
Also, I wondered whether, instead of the "loop.1 + trial.003" as the tag
you could use a parameter within your trial objects that gave the hierarchy::
    trial:
        hierarchy: blocks.1, trials.1
        onset: 5.064
        ...
    trial:
        hierarchy: blocks.1, trials.2
        onset: 8.321
        ...
(actually, I'm not sure under that scheme what the outer name would be,
instead of trial)
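A value like "blocks.1, trials.2" would be easy for an analysis script to pull apart. As a rough sketch, assuming the value is a comma-separated list of `name.index` pairs (the `parse_hierarchy` name is made up here):

```python
def parse_hierarchy(text):
    """Split 'blocks.1, trials.2' into [('blocks', 1), ('trials', 2)], outermost loop first."""
    levels = []
    for part in text.split(","):
        # Split on the last dot so that loop names may themselves contain dots.
        name, _, index = part.strip().rpartition(".")
        levels.append((name, int(index)))
    return levels


path = parse_hierarchy("blocks.1, trials.2")
```

Nesting depth is then just `len(path)`, so the format itself would not need to know how many loops an experiment has.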
Jon
--
Dr. Jonathan Peirce
Nottingham Visual Neuroscience
On Fri, 29 Apr 2011, Jeremy Gray wrote:
> Hi all,
> gee, writing a format is more work than I thought. I'm attaching a file
> with my notes (400+ lines of notes). These are followed in the file by
> a demo (100 lines). I put a copyright on this just because I saw that
> the YAML people put a copyright on YAML. maybe it looks more official
> that way, or something.
having the copyright is the right way ;-) but with
Copyright:
- (c) 2011 Jeremy R. Gray
- This document may be freely copied, as long as it is not modified.
I cannot even forward you any in-text spelling fixes (if there were any ;) )...
why bother with such restrictions instead of just releasing it under
some appropriate license, e.g. CC BY-SA 3.0?
--
=------------------------------------------------------------------=
Keep in touch www.onerussian.com
Yaroslav Halchenko www.ohloh.net/accounts/yarikoptic