Extracting Data from .txt file


Madison Paige Hunt

Dec 11, 2019, 2:06:02 PM
to Mnemonic Similarity Task (MST)
Hi, 

I am trying to load the test data from the .txt log file into Python. I have been using pandas' read_csv with the line of code below, but I get an error because the column headers are not in the first row of the text file. Has anyone found an easy way to extract the test data for analysis?

data = pd.read_csv("/Users/madisonhunt/Desktop/MSTlog_1185.txt", delimiter="\t", error_bad_lines=False, header=None, usecols=['Trial', 'Img', 'Cond', 'LBin', 'Resp', 'Acc', 'RT'])

Craig E.L. Stark

Dec 11, 2019, 3:03:26 PM
to Mnemonic Similarity Task (MST)
There's a lot in the log file, and pandas' read_csv, while a great function, probably won't be able to do it all for you without a lot of work. For example, you'll see a ton of header information and then the line "Study phase started at: DATECODE". That line is your key that you're about to get the nicely formatted table with the study phase data. You'll want to do one of two things: either read the whole file in with Python, look for that key text along with the text at the end of the run, extract what lies between, and save it as a separate file; or find that line number and pass it to skiprows in read_csv.

For example:

import pandas as pd


def findlines(filename):
    # Collect the 1-based line numbers on which each phase starts
    s_starts = []
    t_starts = []
    with open(filename) as file:
        for linenum, linetxt in enumerate(file, 1):
            if "Study phase started at" in linetxt:
                s_starts.append(linenum)
            elif "Test phase started at" in linetxt:
                t_starts.append(linenum)
    return (s_starts, t_starts)


fname = 'MST_128210.txt'
s_starts, t_starts = findlines(fname)
n_stim_per = 64
study_len = n_stim_per * 2
test_len = n_stim_per * 3

# Skip everything up to and including the marker line; the next line is the header
study_data = pd.read_csv(fname, skiprows=s_starts[0], header=0, nrows=study_len)
test_data = pd.read_csv(fname, skiprows=t_starts[0], header=0, nrows=test_len)

The function looks for the line numbers of any study phase(s) and test phase(s) and returns them. Here, I hard-coded n_stim_per as 64, but you can grab that out of the file too if you like. I also just grabbed the first (0th) study and test phase out of the log file, but you get the idea, I hope.
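
The other option mentioned earlier, pulling the table out by its marker text rather than by a fixed row count, could be sketched roughly as follows. This is only a sketch under assumptions the log format may not satisfy: the function name read_phase_table and its sep parameter are hypothetical, the header row is assumed to sit directly after the marker line, and the table is assumed to end at the first blank line or the first line whose field count differs from the header's.

```python
import io

import pandas as pd


def read_phase_table(path, marker, sep="\t"):
    """Return the table following the first line that contains `marker`.

    Assumptions (not confirmed for the real log format): the header row
    comes directly after the marker line, and the table ends at the first
    blank line or the first line whose field count differs from the header.
    """
    with open(path) as f:
        lines = f.read().splitlines()

    # Index of the first line containing the marker text, or None
    start = next((i for i, line in enumerate(lines) if marker in line), None)
    if start is None or start + 1 >= len(lines):
        raise ValueError(f"no table found after marker {marker!r}")

    header = lines[start + 1]
    n_fields = len(header.split(sep))

    block = [header]
    for line in lines[start + 2:]:
        if not line.strip() or len(line.split(sep)) != n_fields:
            break  # first line that does not look like a table row
        block.append(line)

    # Parse only the extracted block, so the rest of the log is ignored
    return pd.read_csv(io.StringIO("\n".join(block)), sep=sep)
```

Something like read_phase_table(fname, "Test phase started at") would then return the first test-phase table without needing nrows. Pass sep="," instead if the log turns out to be comma-delimited, and check the result against the raw file before analyzing it.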
         
Craig

