Extracting Data from .txt file

Madison Paige Hunt

Dec 11, 2019, 2:06:02 PM

to Mnemonic Similarity Task (MST)

Hi,

I am trying to load the test data from the .txt log file into Python. I have been using pandas' read_csv with the line of code below, but I get an error because the column headers are not in the first row of the text file. Has anyone found an easy way to extract the test data for analysis?

data = pd.read_csv("/Users/madisonhunt/Desktop/MSTlog_1185.txt", delimiter="\t", error_bad_lines=False, header=None, usecols=['Trial', 'Img', 'Cond', 'LBin', 'Resp', 'Acc', 'RT'])

Craig E.L. Stark

Dec 11, 2019, 3:03:26 PM

There's a lot in the log file, and pandas' read_csv, while a great function, probably won't be able to handle it all for you without a lot of work. For example, you'll see a ton of header information and then the line "Study phase started at: DATECODE". That line is your key: the nicely formatted table with the study phase data comes right after it. You'll want to either read the whole file in with Python, look for that key text along with the text at the end of the run, extract what lies between them, and save it as a separate file, or find that line's number and pass it to skiprows in read_csv.

For example:

import pandas as pd


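def findlines(filename):
    """Return the 1-based line numbers where each study and test phase starts."""
    s_starts = []
    t_starts = []
    with open(filename) as file:
        # Count lines from 1 so each value equals the number of rows to skip
        for linenum, linetxt in enumerate(file, 1):
            if "Study phase started at" in linetxt:
                s_starts.append(linenum)
            elif "Test phase started at" in linetxt:
                t_starts.append(linenum)
    return s_starts, t_starts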


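fname = 'MST_128210.txt'
s_starts, t_starts = findlines(fname)
n_stim_per = 64  # hard-coded here
study_len = n_stim_per * 2
test_len = n_stim_per * 3


# Skip everything up to and including the "... phase started at" line,
# so the next line is read as the column header (header=0).
# read_csv splits on commas by default; add sep="\t" if the table is tab-delimited.
study_data = pd.read_csv(fname, skiprows=s_starts[0], header=0, nrows=study_len)
test_data = pd.read_csv(fname, skiprows=t_starts[0], header=0, nrows=test_len)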

The function looks for the line numbers of any study phase(s) and test phase(s) and returns them. Here, I hard-coded n_stim_per as 64, but you can grab that out of the file too if you like. I also grabbed only the first (0th) study and test phase from the log file, but I hope you get the idea.
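
The other route mentioned above (read the whole file, find the key text, and pull out just the table) might look roughly like the sketch below. It is only an illustration under assumptions that should be checked against a real log file: that the table ends at the first blank line, and that the columns are tab-separated (as the delimiter in the original question suggests). The helper name read_phase is made up for this example.

```python
import io

import pandas as pd


def read_phase(filename, key_text, sep="\t"):
    """Return the table that follows the first line containing key_text.

    Assumptions (not confirmed by the log format): the header row comes
    directly after the key line, and a blank line marks the end of the table.
    """
    with open(filename) as f:
        lines = f.readlines()

    # Index of the first line that contains the key text
    start = next((i for i, line in enumerate(lines) if key_text in line), None)
    if start is None:
        raise ValueError(f"{key_text!r} not found in {filename}")

    # Collect lines after the key line until the first blank line
    block = []
    for line in lines[start + 1:]:
        if not line.strip():
            break
        block.append(line)

    # Parse the extracted text as if it were its own small file
    return pd.read_csv(io.StringIO("".join(block)), sep=sep)
```

With this, test_data = read_phase(fname, "Test phase started at") would avoid hard-coding the number of rows, provided the blank-line assumption holds for your log files.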