Traceback (most recent call last):
  File "/Users/JMJ/thesis_experiment copy/analysis_new.py", line 40, in <module>
    new_df = df[['uniqueid', 'test_question', 'accuracy', 'task', 'task_condition', 'test_condition', 'question_number']]
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 3030, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexing.py", line 1266, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexing.py", line 1308, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['uniqueid', 'test_question', 'accuracy', 'task', 'task_condition', 'test_condition', 'question_number'], dtype='object')] are in the [columns]"
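Reading the error: pandas raises this KeyError when not one of the requested labels exists in df.columns. Because the script below copies uniqueid into every trial record, df should contain at least that column whenever any record was parsed. A plausible explanation (an assumption, not something the traceback proves) is that no rows passed the status/exclude filter, so data was empty and pd.DataFrame(data) has no columns at all. A minimal reproduction of that situation:

```python
import pandas as pd

# Assumption: simulate the case where no database rows passed the
# status/exclude filter, so the list of trial records is empty.
data = []
df = pd.DataFrame(data)

print(df.shape)          # (0, 0): no rows and no columns
print(list(df.columns))  # []

try:
    # Same kind of column selection as line 40 of analysis_new.py
    df[['uniqueid', 'test_question', 'accuracy']]
except KeyError as err:
    # pandas reports that none of the requested labels are in df.columns
    print("KeyError:", err)
```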
Note: this happens when I take the DATABASE_URL that heroku config gives me and plug it into db_url in the data-parsing script, see here:
from sqlalchemy import create_engine, MetaData, Table
import json
import psycopg2
import pandas as pd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

db_url = "postgres://xxxxxxx:c2a497d64a8ccbd04b67a0fe3b4cc6b9...@ec2-107-21-10-179.compute-xxxx:5432/dc88oul40tgjtq"
table_name = 'thesis_exp'
data_column_name = 'datastring'

# boilerplate sqlalchemy setup
engine = create_engine(db_url)
metadata = MetaData()
metadata.bind = engine
table = Table(table_name, metadata, autoload=True)

# make a query and loop through
s = table.select()
rows = s.execute()

data = []
# status codes of subjects who completed experiment
statuses = [3, 4, 5, 7]
# if you have workers you wish to exclude, add them here
exclude = [0]
for row in rows:
    # only use subjects who completed experiment and aren't excluded
    if row['status'] in statuses and row['uniqueid'] not in exclude:
        data.append(row[data_column_name])

# parse each participant's JSON datastring and keep its list of trial records
data = [json.loads(part)['data'] for part in data]

# copy the participant's uniqueid into each trial's data
for part in data:
    for record in part:
        record['trialdata']['uniqueid'] = record['uniqueid']

# flatten to one dict per trial and build the DataFrame
data = [record['trialdata'] for part in data for record in part]
df = pd.DataFrame(data)
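One way to test the "no rows matched" hypothesis against the Heroku database is to count rows per status and fail with a descriptive message when nothing matches. The sketch below uses hypothetical helper names (summarize_statuses, build_trial_df); it reimplements the script's filtering and flattening steps rather than calling any psiTurk API, and it operates on plain dicts with 'status', 'uniqueid' and 'datastring' keys so it can be tried without a database. With the real table you could build such dicts using the same row['status'], row['uniqueid'] and row[data_column_name] lookups the script already uses, and pass them as a list, since the helper iterates over the rows twice.

```python
import json
from collections import Counter

import pandas as pd

# Hypothetical helpers: illustrative names, not part of psiTurk or SQLAlchemy.
REQUIRED = ['uniqueid', 'test_question', 'accuracy', 'task',
            'task_condition', 'test_condition', 'question_number']


def summarize_statuses(rows):
    """Count how many rows exist for each status code."""
    return Counter(row['status'] for row in rows)


def build_trial_df(rows, statuses=(3, 4, 5, 7), exclude=(0,)):
    """Rebuild the trial-level DataFrame, failing loudly if it is empty."""
    records = []
    for row in rows:
        if row['status'] in statuses and row['uniqueid'] not in exclude:
            for record in json.loads(row['datastring'])['data']:
                # copy so the parsed JSON is not mutated in place
                trial = dict(record['trialdata'])
                trial['uniqueid'] = record['uniqueid']
                records.append(trial)
    df = pd.DataFrame(records)
    if df.empty:
        raise ValueError(
            f"No trial records matched statuses {list(statuses)}; "
            f"status counts in the table: {dict(summarize_statuses(rows))}")
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise KeyError(f"Missing columns {missing}; available: {list(df.columns)}")
    return df[REQUIRED]


# Usage with made-up rows: one incomplete participant, one completed one.
rows = [
    {'status': 1, 'uniqueid': 'w1:a1', 'datastring': None},
    {'status': 3, 'uniqueid': 'w2:a2', 'datastring': json.dumps({'data': [
        {'uniqueid': 'w2:a2', 'trialdata': {
            'test_question': 'q1', 'accuracy': 1, 'task': 't',
            'task_condition': 'c', 'test_condition': 'tc',
            'question_number': 1}}]})},
]
print(summarize_statuses(rows))
df = build_trial_df(rows)
print(df.shape)  # (1, 7)
```

Raising early with the status counts turns an opaque column KeyError into a message that shows directly whether the deployed database has any participants with status 3, 4, 5 or 7.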