Dear Keith,
Thank you very much for your valuable perspective. If you check the dataset, I have 16004 unique items (item_id) given to 2000 test takers (person_id). But if you look at how these items have been assigned to the persons, you see that some items have been taken by as many as 347 persons, while others have been taken by only 1 person. Overall, the median number of persons taking a specific item is 3 (a red flag!); see the figure below — only 5 unique items were taken by >= 200 persons.
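In case it helps, this is roughly how I computed those exposure counts. A minimal sketch with toy data; the column names (`person_id`, `item_id`) match the dataset, but the values here are made up for illustration:

```python
import pandas as pd

# Toy long-format responses: one row per (person_id, item_id) attempt.
# Column names mirror the real dataset; values are illustrative only.
responses = pd.DataFrame({
    "person_id": [1, 1, 2, 2, 3, 3, 3],
    "item_id":   ["a", "b", "a", "c", "a", "b", "d"],
})

# Item exposure: how many distinct persons took each item.
exposure = responses.groupby("item_id")["person_id"].nunique()

print(exposure.median())  # median exposure across items
print(exposure.max())     # exposure of the most frequently used item
```

On the real data, `exposure.median()` is the 3 I quoted above, and `exposure.max()` is 347.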
So, to make the data wide would mean two things:
(1) I could widen the data such that I have 2000 rows (persons) and `16004` columns, one for each unique item.
(2) I could widen the data such that I have 2000 rows (persons) and `37` columns, one for each item_type, holding the number of attempts (e.g., person #1 took 3 items of type X and 4 of type Y).
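For concreteness, the two widenings can be sketched like this. Again a toy example with assumed column names (`person_id`, `item_id`, `item_type`, `score`); the real dataset would be the long-format frame in place of `long_df`:

```python
import pandas as pd

# Toy long-format data; column names mirror the real dataset,
# values are made up for illustration.
long_df = pd.DataFrame({
    "person_id": [1, 1, 1, 2, 2],
    "item_id":   ["i1", "i2", "i3", "i1", "i4"],
    "item_type": ["X",  "X",  "Y",  "X",  "Y"],
    "score":     [1, 0, 1, 1, 0],
})

# Option 1: one column per unique item -> a sparse person-by-item
# matrix, mostly NaN wherever a person never saw an item.
wide_items = long_df.pivot(index="person_id",
                           columns="item_id",
                           values="score")

# Option 2: one column per item_type, counting each person's attempts.
wide_types = pd.crosstab(long_df["person_id"], long_df["item_type"])
```

With the real data, `wide_items` would be 2000 x 16004 and extremely sparse, while `wide_types` would be a dense 2000 x 37 table of counts.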
Option 1 for widening the data would therefore leave the difficulty parameters of the majority of items barely estimable, right? (because only a few people have taken most of them)
Option 2 for widening the data, however, would reduce the data to attempt counts per item_type (e.g., each person taking x items of type Y, w items of type Z, etc.).
If my reasoning regarding the data widening is correct, then it may seem more reasonable to estimate the item difficulty parameters by item type, and not for each unique item per se, right?
If so, I'm attaching the option 2 widened dataset and would be really interested to know how you would model it and whether you agree with this approach.
Thanks, Simon
#== New widened dataset (by item_type):