Hello ADNIers,
After reviewing the inst_about_data.pdf document, I still have a few uncertainties regarding how missing values are represented.
According to the documentation, missingness is most often indicated by the code -4, but other codes (e.g., -1) can also signify missing data. Additionally, blank fields may represent missingness as well.
I would like to confirm: do these rules apply simultaneously to the same variable, or are they variable-specific?
For example, in the DXSUM dataset, many indicator variables such as DXNORM and DXAD are described in the data dictionary with only:
1 = Yes
There is no explicit mention of 0 = No for these variables (whereas for other variables, such as DXPARK, a 0 is defined).
Upon inspecting the data, for variables like DXNORM, I observe values of 1, -4, and blanks.
My question is:
Are both -4 and blanks intended to represent missing data in these variables?
Or does a blank carry a different meaning (e.g., potentially indicating "No") in some cases?
In my current processing, I convert both blanks and -4 to NA in R. However, I am concerned that I might be conflating two distinct types of information (i.e., true missingness vs. an implicit "No").
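To make the concern concrete, here is a minimal sketch of the recoding I mean, run on a toy data frame rather than the real DXSUM file. The values are made up for illustration, and `DXNORM_src` and `DXNORM_clean` are hypothetical helper columns, not ADNI variables. It assumes blanks have already been read in as NA, and it records how each value was originally coded before collapsing, so the two cases can still be told apart later.

```r
# Toy data standing in for DXSUM; values are illustrative, not real ADNI records.
# Blank fields are assumed to have been read in as NA.
dx <- data.frame(
  DXNORM    = c(1L, -4L, NA, NA),
  DIAGNOSIS = c(1L, 2L, 1L, NA)
)

# Record how each value was originally coded before collapsing to NA.
# DXNORM_src is a hypothetical helper column, not part of the ADNI data.
dx$DXNORM_src <- ifelse(
  is.na(dx$DXNORM), "blank",
  ifelse(dx$DXNORM == -4L, "code_-4", "observed")
)

# Collapse both -4 and blanks to NA, as in my current processing.
dx$DXNORM_clean <- ifelse(dx$DXNORM %in% -4L, NA_integer_, dx$DXNORM)

# Cross-tabulate the original coding against DIAGNOSIS.
table(dx$DXNORM_src, dx$DIAGNOSIS, useNA = "ifany")
```

Keeping the source column costs little and would let me switch blanks to "No" later, if that turns out to be the intended meaning, without re-reading the raw file.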
For reference, here is a summary of DXNORM and DIAGNOSIS counts from the raw DXSUM data:
> dx_sum() |> dplyr::count(DXNORM, DIAGNOSIS)
# A tibble: 7 × 3
  DXNORM DIAGNOSIS     n
   <int>     <int> <int>
1     -4         2  1606
2     -4         3  1132
3      1         1  1130
4     NA         1  4564
5     NA         2  4574
6     NA         3  1744
7     NA        NA    37
Any clarification you could provide would be greatly appreciated.
Thank you very much!
Ramiro