Hello, All,
I am working with a dataset where I want to identify locations where values change. Specifically, I have a series of columns (named "color1", "color2"... to "color36") and all but one are populated with the names of two colors - one of the colors is always silver. One column in the dataset will have only 1 color in it: silver. In the example dataset below, "color1" happens to be the only-silver column. The column next to it,"color10" has teal and silver. "color33" has purple and silver (though you can't see the row with silver in this example image).
There are multiple subjects (column = "Subjects") in this dataset, and each subject completed several versions of the task (column = "ExperimentName), though in the example image you see only 1 subject (91) and 1 task (FacFac). Each task contains the same # of color columns.
I need to identify the order in which the colors change from ___ to silver, per Subject per ExperimentName.
Essentially, what I think I need to do is find the first instance of a 'silver' in each column, get its index, and then create a column that places the name of the original column (e.g. color3, color35) in order by its index (because the index will indicate when it was clicked in relation to the other colors.) The problem is that this dataset includes multiple participants and experiments, so I can't just find the first instance of a silver in a given column (that will tell me only when participant 91 for experiment FacFac had a color change); I have to find the first instance per Subject per ExperimentName.
The second problem is that I'm very new to programming and python, and am struggling with wrapping my head around this conceptually as well as syntactically.
What I have done so far is to create a table of Booleans where True (1) = Silver and False (0) = any other color.
If anyone has any ideas and/or advice, I would very much appreciate it. I want to try and accomplish this in Python to improve my skills at it and not resort to point-and-click in Excel! Thank you SO much in advance!
This is the example dataset:
Here is the relevant code so far:
data = pd.read_excel("Data.xlsx")
data.filter(like='color').isin(['silver'])
boolSilver = data.filter(like='color').isin(['silver']).astype(int)
grouped = df.groupby(['ExperimentName', 'Subject'])
df['color10'].idxmax()
for group in grouped: cols = group[1].filter(regex='color1.', axis=1).reset_index(drop=True) order = cols.apply(lambda s: s.idxmax()).sort_values() print(order.index.tolist())
cols = group[1].filter(regex='color1.', axis=1).reset_index(drop=True)
order = cols.apply(lambda s: s.idxmax()).sort_values()
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: from pandas.api.types import CategoricalDtype
In [4]: colors = ['silver', 'teal', 'purple']
In [5]: color_category = CategoricalDtype(categories=colors, ordered=False)
In [6]: df = pd.DataFrame(data={
...: 'color0': pd.Series(['silver'] * 10).astype(color_category),
...: 'color1': pd.Series(np.random.choice(colors, 10)).astype(color_category),
...: 'color2': pd.Series(np.random.choice(colors, 10)).astype(color_category),
...: 'color3': pd.Series(np.random.choice(colors, 10)).astype(color_category),
...: })
In [6]: df
Out[6]:
color0 color1 color2 color3
0 silver teal silver teal
1 silver teal silver silver
2 silver silver teal silver
3 silver teal silver teal
4 silver purple teal teal
5 silver purple silver teal
6 silver teal teal teal
7 silver teal teal teal
8 silver purple purple silver
9 silver purple purple teal
In [7]: df.apply(lambda r: r != r.shift(1), axis=1)
Out[7]:
color0 color1 color2 color3
0 True True True True
1 True True True False
2 True False True True
3 True True True True
4 True True True False
5 True True True True
6 True True False False
7 True True False False
8 True True False True
9 True True False True