Using Python to automate the data translation process from E-Prime to SPSS


Kyle

Aug 18, 2012, 7:33:05 PM
to e-p...@googlegroups.com
Hello,
I am currently struggling to automate my data translation process from E-Prime to SPSS. I am not using E-Merge because I need to create individual SPSS data files for each of my subjects. I have written syntax that automates the translation from the .edat output (tab-separated) to an SPSS data file, but I need a way to circumvent E-DataAid so I can automate the whole process and reduce the chance of errors along the way.

I was planning to use Python so that the user can select the files they want to convert to SPSS and the rest happens automatically. From what I gathered on Python help forums, I need to parse the raw data .txt document into a tab-separated file. Is there a way to do this, or documentation on how to do it somewhere? The workflow would be something like: raw data .txt file --> tab-separated file --> SPSS syntax (which reads from a list file I created in SPSS) --> SPSS data file. The main spot I am stuck at is parsing the data from the raw file so I can circumvent E-DataAid; from there it seems fairly straightforward.

Thank you in advance for any help you can provide. Any tips for the Python part of this process would be appreciated as well, but I know the group is for E-Prime and I do not expect help with that.
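The batch flow I have in mind, as a minimal Python 3 sketch (parse_eprime_txt is just a placeholder here for whatever ends up doing the raw-to-tab conversion, and the SPSS step is only indicated in comments):

from tkinter import Tk, filedialog

def parse_eprime_txt(raw_path, tab_path):
    """Placeholder for the raw-to-tab conversion step (the part I am stuck on)."""
    raise NotImplementedError

def main():
    root = Tk()
    root.withdraw()  # show only the file dialog, not an empty Tk window
    raw_files = filedialog.askopenfilenames(
        title="Select raw E-Prime .txt files",
        filetypes=[("E-Prime raw data", "*.txt")])
    for raw_path in raw_files:
        tab_path = raw_path.replace(".txt", "_tab.txt")
        parse_eprime_txt(raw_path, tab_path)
        # then point the SPSS import syntax at tab_path, e.g. by writing a
        # per-subject copy of the syntax file and running it in SPSS

if __name__ == "__main__":
    main()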
Thank you,
-Kyle


Peter Quain

Aug 18, 2012, 8:11:24 PM
to e-p...@googlegroups.com

I assume you are concerned with 'extra' variables and think you must use E-DataAid to muck about with them (if I've misunderstood your post, excuse me). This works at least in venerable E-Prime 1; I don't know if there have been any changes that break it in subsequent incarnations. After each subject's run, try opening the .edat file (double click, of course), then choose 'Save As' from the File menu and pick the 'SPSS or StatView' option. That saves a tab-delimited text document (the .txt output is tab-delimited by default), and your SPSS syntax can pick it up from there, no worries. Don't worry about the extra variables (e.g., random seed); just work out the order of the variables and which ones you want to keep. Import the tab-delimited file using the point-and-click commands and paste the syntax (rather than running it). Then you can edit the import syntax to name your important variables as you want them, and subsequently use the 'Keep' and 'Drop' subcommands to keep those you want and order them as you please in the SPSS .sav file. A rough example follows (which you could easily include in a Python, or a WinWrap Basic ... formerly Sax Basic, script):

----- From a previous post -------------------------
We export the entire unmodified .edat file to tab-delimited text, then import the whole thing into SPSS using the point-and-click Read Text Data wizard, allowing the program to modify variable names however it wishes. We save and then run the syntax, print the variable names with the 'Display Labels' command, identify the variables of interest, and use the 'Keep', 'Rename' and 'Drop' subcommands of the 'Save' command to sort the columns and provide meaningful variable names and/or labels.

Once this is done for the first subject's data file, the syntax can be run for each subsequent file simply by changing the source file name and the outfile names. Any combination of variables can be grabbed from the master file with a few changes to the Keep, Drop and Rename subcommands. Because the variable names are the same in both individual-subject and merged files, the syntax can be used to import and organise single-subject data (to combine behavioural data with EEG files, for instance) or merged files. Some rough example syntax (minus the Display Labels command) for single-subject data (no subject identifier included) is below:

* Import e-dat info for ENUM3

GET DATA  /TYPE = TXT
 /FILE = 'C:\AAPete\PhDData\Enum3\E3_5\e3_5_edat.txt'
 /DELCASE = LINE
 /DELIMITERS = "\t"
 /ARRANGEMENT = DELIMITED
 /FIRSTCASE = 2
 /IMPORTCASE = ALL
 /VARIABLES =
 Experime A10
 Subject F1.0
 Session F1.0
 Age F2.1
 V4 F6.2
 Gender A1
 Group F1.0
 Handed A1
 RandomSe F10.2
 SessionD A10
 SessionT A8
 Block F1.0
 BlockLis F1.0
 V14 F1.0
 V15 F1.0
 Practice F1.0
 V17 F1.0
 V18 F1.0
 V19 A8
 Procedur A9
 V20 A17
 Trial F2.0
 CheckAcc F1.0
 V24 F1.0
 V25 F1.0
 Code F2.0
 CollectC F6.0
 V28 F4.0
 corransw F1.0
 Fixation F4.2
 V31 F4.2
 V32 F2.1
 None F2.1
 NoWords F1.0
 numobs F1.0
 PracList F2.1
 V37 F1.0
 V38 F2.1
 V39 A9
 Recall.A F1.0
 V40 A9
 Stim1 A9
 ThreeSyl F2.1
 ThreeWor F1.0
 TrialLis F2.1
 V46 F1.0
 V47 F2.1
 TwoSyl F1.0
 TwoWords F1.0
 Type F1.0
 Word1 A3
 .
CACHE.
EXECUTE.

Save Outfile= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat_MASTER.sav' .

Get
file= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat_MASTER.sav' .
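* Keep only the variables of interest from the master file.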

Save Outfile= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat.sav' / Keep block trial checkacc code collectc v28 corransw  recall.a stim1.

Get
file= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat.sav' .
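* Add an observation number (row index) so trial order is preserved.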

Compute ob = $CASENUM.
Execute.
Formats ob (F8.0).

Save Outfile= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat_2.sav' / Keep ob code corransw checkacc v28 all.

Get
file= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat_2.sav' .
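* Rename the key variables (code, corransw, checkacc, v28 become type, resp, acc, rt) and drop the rest.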

Save Outfile= 'C:\AAPete\PhDData\Enum3\E3_5\E3_5_edat_3.sav' / Rename (code corransw checkacc v28 = type resp acc rt) / Drop= block To stim1.





Kyle

Aug 18, 2012, 11:27:23 PM
to e-p...@googlegroups.com
Thank you for your reply Peter, but that is not exactly what I was looking for; I probably did not clarify enough. I know how to use E-DataAid, I just want a way to circumvent it. If you look in the folder where your .edat files are stored, there is also a .txt file with the data in it. I was wondering if I could somehow translate that file into something usable and skip E-DataAid altogether. I am trying to automate the process so I can handle a large amount of data all at once and ensure that it is all done the same way, without having to worry about human error (like when someone accidentally forgets to uncheck the Unicode box or clicks on the wrong thing). The .txt file appears to contain the data from the experiment, but it looks all jumbled up and in one column. Someone on the Python forums said they had created a script that would parse the data out of that text file and make it usable, but unfortunately they did not really elaborate on how. Thank you again for your reply Peter, and I hope this clears things up a bit.
-Kyle

Peter Quain

Aug 19, 2012, 5:55:45 AM
to e-p...@googlegroups.com

Don't know why you'd want to go to this effort to save a single menu-driven operation (Save As), but at least as a start it seems that: asterisks are section delimiters (e.g., *** Header Start *** and *** Header End ***, or *** LogFrame Start *** and *** LogFrame End ***); within every section each line records a variable name and value; the colon (:) delimits variable name and value on each line within a section; and Level numbers occur on lines between an asterisk-delimited section End and the next Start statement. The experiment structure exists as a hierarchy of n levels, descending: Session (Experiment) / Block / Trial / ... with SubTrial elements 1...n, or something like that. SubTrial elements are logged attributes from a trial sequence, and in my sample file they all occur as children of Level 3. Now it is a matter of constructing an algorithm which transposes this (well, the correct ...) file structure appropriately, to dispense with useless data and collect single variables of interest in single columns rather than multiple rows across sections.
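(For reference, a raw E-Prime .txt log laid out as described above looks roughly like this; the variable names and values shown are illustrative only.)

*** Header Start ***
Experiment: Enum3
Subject: 5
Session: 1
*** Header End ***
Level: 3
*** LogFrame Start ***
Trial: 1
Stim1: cat
CheckAcc: 1
*** LogFrame End ***
Level: 2
*** LogFrame Start ***
Block: 1
*** LogFrame End ***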

You could perhaps accomplish this within SPSS, using string-function syntax embedded in Python looping structures as a starting point. The raw .txt file will import into SPSS as a single string variable if you define no delimiter and set the text qualifier to 'Nothing'. So it is in fact easy to circumvent E-DataAid in terms of getting the .txt-format data file into SPSS; as you say, it is the parsing which presents the exercise. Maybe you would be looking for SubTrial elements logged after a Level 3 section is identified, and prior to the *** LogFrame End *** statement. I think, though, that it might be equally possible to parse the file using Python alone (as has obviously been done by the mysterious Python forum person(s) with their tab-delimited output): string functions to parse the file, arrays or lists to format the data, and write commands to set the file to disk. Of course, there are many ways to skin a cat, but either way, correctly capturing the data elements sequentially from each Level 3 child (in my example) is the part that wins the day.
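A rough Python sketch of that Python-only approach, assuming the section and colon delimiters described above; the Level of interest, the output layout and the file encoding (newer E-Prime versions write Unicode/UTF-16 logs, hence Kyle's mention of the Unicode box) are all assumptions to adjust for your own files:

import csv

def parse_eprime_txt(raw_path, tab_path, level="3", encoding="utf-16"):
    """Collect the name: value pairs from LogFrames at the given Level and
    write one tab-delimited row per frame."""
    rows = []
    current = None          # dict for the LogFrame currently being read
    current_level = None    # most recent "Level: n" seen
    with open(raw_path, encoding=encoding) as f:
        for line in f:
            line = line.strip()
            if line.startswith("Level:"):
                current_level = line.split(":", 1)[1].strip()
            elif line == "*** LogFrame Start ***" and current_level == level:
                current = {}
            elif line == "*** LogFrame End ***":
                if current is not None:
                    rows.append(current)
                current = None
            elif current is not None and ":" in line:
                name, value = line.split(":", 1)
                current[name.strip()] = value.strip()
    # Header row: the union of variable names, in first-seen order.
    header = []
    for row in rows:
        for name in row:
            if name not in header:
                header.append(name)
    with open(tab_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=header, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)

# Example: parse_eprime_txt("e3_5.txt", "e3_5_tab.txt")

The resulting tab-delimited file could then be fed to GET DATA syntax like the example earlier in the thread, although the column order will follow the raw log rather than E-DataAid's ordering.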

Best
Peter

David McFarlane

Aug 20, 2012, 2:37:56 PM
to e-p...@googlegroups.com
Kyle,

Not sure I entirely understand your request, but I gather it has something vaguely to do with wanting to automate the processing of batches of single data files. I realize E-DataAid does not include a mechanism to handle *batches* of .edat files, but you might try the free AutoIt (http://www.autoitscript.com/site/autoit/ ) as a scripting tool to automate such a batch process using E-DataAid. If you do that, please write back!

Addressing your more specific question, AFAIK there is no documentation of E-Prime's data .txt format. But really, the format is quite readable, and anyone with a modicum of skill could figure it out just by generating a judicious set of examples and reading them. Students here have done it from time to time (goodness knows why), and I could do it myself, but I have never found a good reason to bother.

-----
David McFarlane
E-Prime training
online: http://psychology.msu.edu/Workshops_Courses/eprime.aspx
Twitter: @EPrimeMaster (twitter.com/EPrimeMaster)