Threshold selection and group graph creation


Sean Ma

Feb 6, 2018, 12:03:37 PM
to brainGraph-help
Hi Chris, 

I have two questions about how `brainGraph` (1) sets up thresholds and (2) creates graph matrices for groups.

For the first question, can you point me to the references that shaped your design of the `threshold.by` functionality? Since I'm working primarily with fMRI data right now, are you aware of major differences between how thresholds are generated for structural versus functional networks?

For the second question, I understand that group graph matrices in `brainGraph` are mainly created by taking the `mean` (i.e., A.norm.mean). How is individual variation conveyed or visualized with such a method? In addition, are you aware of other methods for creating group graph matrices, such as ICA?

Thanks again!

Sean

Chris Watson

Feb 6, 2018, 1:46:20 PM
to brainGr...@googlegroups.com
Hi Sean, please read Section 7.2 of the User Guide, where the function create_mats is discussed.
  1. There are 4 options for the threshold.by argument (a toy sketch illustrating the consensus and consistency criteria follows this list).
    • consensus - this performs so-called consensus-based thresholding. In this case, the "consensus" is among all subjects in a given group. If you read de Reus et al. [1], this is what they call a "group threshold". For an example in the literature, a popular one is van den Heuvel & Sporns [2].
    • density - this results in networks with the given density. It is useful if you wish to control for network size when comparing groups. However, it will (probably) result in a different threshold being applied to each group in order to achieve the same density. This is also known as proportional thresholding; see van den Heuvel et al. [3] for a discussion of why this probably isn't the best method.
    • mean - this thresholds the networks by keeping only connections for which "mean(A_ij) + 2*SD(A_ij) > mat.thresh", where the mat.thresh value is given by the user. The "mean" and "SD" are taken *across all subjects in the entire study*, so this option results in exactly the same edges for all study subjects. See for example Cao et al. [4] and Gong et al. [5].
    • consistency - this performs so-called consistency-based thresholding, keeping connections only if "CV(A_ij) < thresh". Here, CV is the coefficient of variation (SD(x) / mean(x)); a lower CV implies a higher consistency (for a given edge) across all subjects. The "thresh" is derived from the input value(s) mat.thresh, such that it represents the CV threshold that would result in a given network density. Therefore, this also results in networks with the same edges for all study subjects. As far as I know, this was introduced by Roberts et al. (2017) [6].
  2. Individual variation is not conveyed at all in the group mean matrices; they are simply the mean (for each group) of the edge values. I am not aware of other methods for creating group networks, but admittedly I have not been looking. I actually don't even use the group mean networks in my own work; I prefer to work with the subject-level graphs and do GLM analysis, MTPC, NBS, or mediation with those, precisely because using group means results in a *loss* of information. With structural covariance networks (or any networks without subject-level data) you don't really have a choice, but if you have subject-specific data, I think you should use it.
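To make the consensus and consistency criteria above a bit more concrete, here is a toy sketch (not the package's internal code; it just assumes a [node x node x subject] array of raw connectivity values for one group, and arbitrary cutoffs):

# Toy illustration only; this is not how create_mats is implemented internally.
# A is assumed to be a node x node x subject array for one group.
set.seed(1)
A <- array(abs(rnorm(10 * 10 * 8)), dim=c(10, 10, 8))
raw.thresh <- 0.5    # example raw-value threshold
sub.thresh <- 0.5    # keep edges exceeding raw.thresh in at least 50% of subjects

# "consensus": proportion of subjects whose raw value exceeds the threshold
prop.present <- apply(A > raw.thresh, c(1, 2), mean)
A.consensus <- ifelse(prop.present >= sub.thresh, apply(A, c(1, 2), mean), 0)

# "consistency": coefficient of variation across subjects; keep only low-CV edges.
# (In the package, the CV cutoff is derived from the desired density, as described above.)
cv <- apply(A, c(1, 2), sd) / apply(A, c(1, 2), mean)
A.consistency <- ifelse(cv < 0.5, apply(A, c(1, 2), mean), 0)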

I don't have experience with fMRI networks, but reference [3] discusses it.

I use consensus thresholding in my own work (probabilistic tractography). I have found that density and consistency thresholding essentially remove any group differences (though I have not investigated this systematically, only at a glance). The mat.thresh values that I use are admittedly arbitrary; I have chosen them so that the networks show a similar increase in density from one threshold to the next. I should look into a more principled method, though.
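For example, a quick way to eyeball how density changes across candidate thresholds (just a sketch, not part of the package; "A.group" here is a stand-in for a group-mean connectivity matrix) would be:

library(igraph)

# Sketch: how does graph density fall off across candidate mat.thresh values?
# A.group is assumed to be a symmetric group-mean connectivity matrix.
candidate.thresh <- seq(0.1, 0.5, by=0.05)
densities <- sapply(candidate.thresh, function(th) {
  A.bin <- ifelse(A.group > th, 1, 0)
  diag(A.bin) <- 0
  g <- graph_from_adjacency_matrix(A.bin, mode='undirected', diag=FALSE)
  edge_density(g)
})
data.frame(threshold=candidate.thresh, density=round(densities, 3))
# diff(densities) then shows how much the density changes between successive thresholds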

I hope this answers your questions! Let me know if you have others.

Chris


References:

[1] https://doi.org/10.1016/j.neuroimage.2012.12.066

[2] https://doi.org/10.1523/JNEUROSCI.3539-11.2011

[3] https://doi.org/10.1016/j.neuroimage.2017.02.005

[4] https://doi.org/10.1523/JNEUROSCI.4793-12.2013

[5] https://doi.org/10.1523/JNEUROSCI.2308-09.2009

[6] https://doi.org/10.1016/j.neuroimage.2016.09.053



Sean Ma

Feb 6, 2018, 4:23:02 PM
to brainGraph-help
Thank you for your in-depth reply, Chris.

Regarding thresholding, is there a custom thresholding option where user-specified threshold levels can be passed into the `create_mats()` function?

I've adapted your sample code from Section 7.2.3, in which I did not pass a value for the `threshold.by` option and set `sub.thresh` to 0 to remove the subject constraint.
Does this mean each subject's adjacency matrix correlation values will be thresholded explicitly at the 3 levels 0.1, 0.3, and 0.5? Or does it still default to the `consensus` option?

Thanks again for your help!

Sean

thresholds <- c(0.1, 0.3, 0.5)
atlas <- atlas.name     # needs to be in path for `eval(parse(text=atlas))`
sub.thresh <- 0
modality <- 'fmri'

# creating graph matrices from files
my.mats <- create_mats(f.A,
                       modality = modality,
                       mat.thresh = thresholds,
                       sub.thresh = sub.thresh,
                       inds = inds)


Chris Watson

Feb 6, 2018, 6:18:13 PM
to brainGr...@googlegroups.com
Hi Sean, the effect of your code would be to keep, for each subject, only those edges with values greater than 0.1, 0.3, and 0.5, respectively. Since the default is "consensus" and you set "sub.thresh=0", there is no subject constraint and you are essentially thresholding by those raw values, irrespective of how many other subjects also exceed the value for a given A_ij element.

You can test whether it works properly with just one threshold, for example:
X.sub <- ifelse(X > 0.1, X, 0)

It should retain all elements meeting that criterion.
Chris


Sean Ma

Feb 15, 2018, 1:07:49 PM
to brainGraph-help
Hi Chris, 

I would like to follow up on this topic and report some unexpected behavior I've found with the "sub.thresh=0" type of thresholding (which, as I understand it, should threshold by the raw values).

I have included example code below showing my test and its results. I have also attached 2 duplicate conmat files for replication.

Test code:
# reading the 2 conmat files
f.A <- list.files("~/Projects/thresh_test", '*.txt', full.names = T)
f.A

# initiating thresholds
thresholds <- c(0.1, 0.3, 0.5)
sub.thresh <- 0
modality <- 'fmri'

inds <- list(1:2)

# creating graph matrices from files
test.mats <- create_mats(f.A,
                         modality = modality,
                         mat.thresh = thresholds,
                         sub.thresh = sub.thresh,
                         inds = inds)

# comparing 2 original duplicated conmats from A
setdiff(test.mats$A[,,1], test.mats$A[,,2])

# comparing 2 original duplicated conmats from A.norm
setdiff(test.mats$A.norm[,,1], test.mats$A.norm[,,2])

# comparing original A to A.norm
setdiff(test.mats$A[,,1], test.mats$A.norm[,,1])

# comparing original A to A.norm.sub (w/ threshold=0.1)
setdiff(test.mats$A[,,1], test.mats$A.norm.sub[[1]][,,1])

# finding element values smaller than threshold=0.1 in original conmat
which(test.mats$A[,,1] < 0.1)

# finding element values smaller than threshold=0.1 in the thresholded A.norm.sub conmat
which(test.mats$A.norm.sub[[1]][,,1] < 0.1)

Below are the results for each segment of the comparison code:
> # comparing 2 original duplicated conmats from A
> setdiff(test.mats$A[,,1], test.mats$A[,,2])
numeric(0)
>
> # comparing 2 original duplicated conmats from A.norm
> setdiff(test.mats$A.norm[,,1], test.mats$A.norm[,,2])
numeric(0)
>
> # comparing original A to A.norm
> setdiff(test.mats$A[,,1], test.mats$A.norm[,,1])
numeric(0)
>
> # comparing original A to A.norm.sub (w/ threshold=0.1)
> setdiff(test.mats$A[,,1], test.mats$A.norm.sub[[1]][,,1])
 [1] 0.007186012 0.087695136 0.010113229 0.052246395 0.059311042 0.049008269 0.084500544 0.015638971 0.072155727 0.034845783
[11] 0.899023654 0.041251892 0.052992066 0.067633726 0.181263116 0.029645438 0.088317549
>
> # finding element values smaller than threshold=0.1 in original conmat
> which(test.mats$A[,,1] < 0.1)
 [1]   5   6  13  14  19  20  21  27  28  33  34  35  36  37  41  42  47  48  49  50  51  52  55  56  57  58  59  60  64  65
[31]  66  67  68  71  72  73  74  79  80  81  82  83  86  87  88  95  96 101 102 103 109 110 115 116 117 118 123 124 130 131
[61] 132 137 139 145 146 147 148 149 150 153 154 159 160 161 162 163 167 168 169 170 171 172 174 178 179 180 183 184 185 186
[91] 193 194
>
> # finding element values smaller than threshold=0.1 in the thresholded A.norm.sub conmat
> which(test.mats$A.norm.sub[[1]][,,1] < 0.1)
 [1]   5   6  13  14  19  20  21  27  28  33  34  35  36  37  41  42  47  48  49  50  51  52  55  56  57  58  59  60  64  65
[31]  66  67  68  71  72  73  74  79  80  81  82  83  86  87  88  95  96 101 102 103 109 110 115 116 117 118 123 124 130 131
[61] 132 137 139 145 146 147 148 149 150 153 154 159 160 161 162 163 167 168 169 170 171 172 174 178 179 180 183 184 185 186
[91] 193 194

What really struck me was the last comparison: the matrix thresholded at 0.1 (`test.mats$A.norm.sub[[1]][,,1]`) still contains values that are smaller than 0.1.

Can you confirm this behavior, as well as the correct option values I should be using to achieve raw-value thresholding?

Thank you for your help and guidance on this matter!

Best, 

Sean

p01_ZPos_corr_copy.txt
p01_ZPos_corr.txt

Chris Watson

Feb 15, 2018, 2:08:36 PM
to brainGr...@googlegroups.com
Hi Sean, I am starting to look into this, but could you tell me if "A.bin" shows a value of "1" where the actual matrix value is < 0.1? i.e.,

which(A.bin[[1]][, , 1] == 1)


Chris Watson

Feb 15, 2018, 2:32:52 PM
to brainGr...@googlegroups.com
Hi Sean, actually can you try the following:

which(test.mats$A.norm.sub[[1]][, , 1] > 0 & test.mats$A.norm.sub[[1]][, , 1] < 0.1)

and see if it returns "integer(0)".

Sean Ma

Feb 16, 2018, 2:50:42 PM
to brainGraph-help
Hi Chris, 

Yes, it returns "integer(0)", which is not what I was seeing in my local observation. I got carried away, forgot about the "0" values, and thought I had presented a valid case for you.

To provide a case where I have seen raw thresholding not holding up, I am attaching a data set (data.zip) that I have been working on, as well as the code for calculating the `A.norm.sub` matrices.

# initiating thresholds
# thresholds <- c(0.1, 0.3, 0.5)
thresholds <- rev(seq(0.1, 0.95, 0.05))
sub.thresh <- 0
modality <- 'fmri'

# trying another way to define inds
groups <- c('Pre', 'Post')
covar <- data.frame(Study.ID = c('p21','p22','p32','p38'), Group = rep(groups, each = 4))
covar <- data.table::data.table(covar)
covar[, Group := as.factor(Group)]
setkey(covar, Group, Study.ID)
inds <- lapply(seq_along(groups), function(x) covar[, which(Group == groups[x])])   # group indicator; list

# creating graph matrices from files
test.mats <- brainGraph::create_mats(f.A,
                                     modality = modality,
                                     mat.thresh = thresholds,
                                     sub.thresh = sub.thresh,
                                     inds = inds)

The `thresholds` vector has 18 levels:

> 0.95 0.90 0.85 0.80 0.75 0.70 0.65 0.60 0.55 0.50 0.45 0.40 0.35 0.30 0.25 0.20 0.15 0.10

If I extract the matrix for the 1st threshold level (0.95), `test.mats$A.norm.sub[[1]][, , 1]`, I get the following values:

> test.mats$A.norm.sub[[1]][,,1]
      [,1] [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]     [,10] [,11] [,12]     [,13]     [,14]
 [1,]  Inf    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [2,]    0  Inf 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [3,]    0    0       Inf 0.5943696 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [4,]    0    0 0.5943696       Inf 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [5,]    0    0 0.0000000 0.0000000       Inf 0.7173606 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [6,]    0    0 0.0000000 0.0000000 0.7173606       Inf 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [7,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000       Inf 0.6354096 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [8,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.6354096       Inf 0.0000000 0.0000000     0     0 0.0000000 0.0000000
 [9,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000       Inf 0.5459645     0     0 0.0000000 0.0000000
[10,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.5459645       Inf     0     0 0.0000000 0.0000000
[11,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000   Inf     0 0.0000000 0.0000000
[12,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0   Inf 0.0000000 0.0000000
[13,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0       Inf 0.5847243
[14,]    0    0 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000     0     0 0.5847243       Inf

You can see that values smaller than 0.95 are still present in this thresholded matrix; the same holds for the other threshold levels. I have also attached a saved copy of my `test.mats` list in this post (threshold_conmat.RDA).

Can you see if you can replicate this issue and point out where I made a mistake? Thanks again for your help!

Sean
data.zip
threshold_conmat.RDA

Chris Watson

Feb 16, 2018, 5:55:57 PM
to brainGr...@googlegroups.com
Hi Sean,

1) It doesn't look like you are specifying "inds" correctly. Since it depends on the ordering of "covars", *in addition to* the ordering of the text files on the filesystem, you should instead be typing
setkey(covars, Study.ID, Group)

since on the filesystem (mine at least, a Linux system), the ordering is
p21_scan1    Pre
p21_scan3    Post
p22_scan1    Pre
p22_scan3    Post
etc.

That is,
> inds[[1]] <- c(1, 3, 5, 7)
> inds[[2]] <- c(2, 4, 6, 8)

The key point is that the order of the files on the filesystem (which is the order in which the data are loaded by R) is what matters most. The reason I introduced the "inds" variable was to allow different users/groups to have their own set-up, instead of conforming to the code I had written to work with my own projects.


2) The effect you are seeing is due to "sub.thresh=0". This means that, for a given patient group, the values will be kept if *even a single subject in that group* exceeds the threshold. Take the first threshold, 0.95:

which(test.mats$A.norm.sub[[1]][, , 1] < .95 & test.mats$A.norm.sub[[1]][, , 1] > 0, arr.ind=TRUE)

The first element returned is at (4, 3), so we can look at the values for all subjects:

> test.mats$A.norm.sub[[1]][4, 3, ]
[1] 0.5943696 0.7960921 0.8850027 0.9502833 1.2280962 0.9053104 1.5152909 1.4893588

As you can see, the values for subjects 4, 5, 7, and 8 all exceed the given threshold of 0.95. That means that *all* values at the (4, 3) location will be kept.
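Or, to pull those indices out directly:

# which subjects exceed the 0.95 threshold at the (4, 3) edge?
which(test.mats$A.norm.sub[[1]][4, 3, ] > 0.95)
# [1] 4 5 7 8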

----
If you could tell me the end result you would like, I could be of better help. The reason the function behaves this way is that (according to the papers I've read) most thresholding is applied to *groups* of subjects, not to individual subjects. However, I think that functionality could be introduced if it is desired. And please let me know if anything needs clarifying; since I am so "close" to the code, I may not be aware of stumbling blocks for other users.

Chris


Sean Ma

Feb 17, 2018, 6:20:50 AM
to brainGraph-help
Hi Chris, 

Thanks for the further explanation. I think this explains the puzzling behavior I am seeing in my analysis.

For point 1), I was indeed confused about how `brainGraph` matches the physical files with the info from the `covar` data frame that one defines, and I had posted another question to confirm that.

As for changing `setkey` to `setkey(covars, Study.ID, Group)`: if the `covar` data frame is already sorted to match the `list.files` order, i.e.,
p21_scan1    Pre
p21_scan3    Post
p22_scan1    Pre
p22_scan3    Post
...
Do I still need to call `setkey()` at all? It seems I can define `inds` directly with `seq_along` to get the desired `inds[[1]] = c(1, 3, 5, 7)`, `inds[[2]] = c(2, 4, 6, 8)` order.
Is there any specific reason `setkey` is used here other than reordering the data.table? I adopted this from Section 7.1.1 of your User Guide.

For point 2), I now fully understand how `sub.thresh=0` behaves: as long as any subject within a given group has an element value greater than the threshold, that element will be kept for all subjects in that group.

Based on this understanding, if I would like the threshold to be imposed at the individual connectivity matrix level (i.e., applying the threshold to each subject's conmat separately), is there a quick hack you can point me to while you decide whether to include this in the next version of `brainGraph`?

I was reading your `create_mats` function and think I can get away with defining a new function for my purpose, since the only output I need is the `A.norm.sub` matrix. I assume what needs to be redefined starts around line 169 of `create_mats`:
      # Back to a list of arrays for all subjects
      A.norm.sub <-
        lapply(seq_along(mat.thresh), function(z)
               lapply(seq_along(inds), function(x)
                      array(sapply(inds[[x]], function(y)
                                   ifelse(A.inds[[z]][[x]] == 1, A.norm[, , y], 0)),
                            dim=dim(A.norm[, , inds[[x]]]))))
      A.norm.sub <- lapply(A.norm.sub, function(x) do.call(abind, x))

Chris Watson

Feb 17, 2018, 9:19:13 AM
to brainGr...@googlegroups.com
1) Ultimately, it will depend on your project setup. That is, brainGraph does not "match the physical files with the info in covars". For example, let's say that you *do* have different directories (on the filesystem) for different groups, each containing connectivity matrices for your subjects. The naming convention for this example will be "SID_mat.txt".

  • If I have 2 groups, and they are named "Control" and "Patient", then I would either make sure that the order in "covars.csv" (the file itself) matches that, or I would make sure the order is set when it is loaded into R. For this example, I would type "setkey(covars, Group, Study.ID)"; because "Control" comes before "Patient" alphabetically.
  • If I have *the exact same 2 groups*, but instead I decide to name them "TD" (typically-developing) and "Patient", then I will still use the same "setkey" command. The Study ID's in the filesystem will still be sorted *within* group, but now all of the "SID_mat.txt" from the "Patient" group will be imported *first*.
  • If I have a single directory with all "SID_mat.txt" files, then of course it is a moot point and a simple "setkey(covars, Study.ID)" will suffice; or, again, ensure that the ordering in "covars.csv" (or "covars.xls", or whatever) is already in the correct order.
I vaguely remember a very early version of the function looking at the character vector of filenames to extract the Study.ID; however, since I think using "full.names=TRUE" as the option to "list.files" is good practice, there is no single implementation that will work for all possible project set-ups. Of course I could impose the constraint that *all* data files be "SID_XYZ", and use some sort of "basename" command ("basename" being a Bash/Linux/Unix utility) to extract the filename itself, but I ultimately decided to leave the creation of "inds" up to the user. I can try to be more clear in the User Guide.

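For what it's worth, here is one way a user could build "inds" directly from the filenames, so that it always matches the load order regardless of how "covars" is keyed (just a sketch; it assumes the hypothetical "SID_mat.txt" naming convention from the example above, a directory path of your own, and a "covars" data.table with Study.ID and Group columns):

# Sketch: derive inds from the filenames themselves, so the grouping always
# follows the order in which the matrices are read from disk
f.A <- list.files('path/to/matrices', pattern='_mat\\.txt$', full.names=TRUE, recursive=TRUE)
sids <- sub('_mat\\.txt$', '', basename(f.A))    # Study.ID of each file, in load order
groups <- c('Control', 'Patient')
inds <- lapply(groups, function(g) which(sids %in% covars[Group == g, Study.ID]))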
----
2) I assumed you were interested in that functionality; i.e., doing a simple thresholding for each subject, irrespective of the rest of the group data. In that case, I think a solution for you would be pretty simple. First, there is a hidden function in the package, called "read.array", that is used to read all of the data files into an array (it's just a simple helper function). You can access it with the *triple colon* operator, as I show below. You can check out the code in the R console or see the R file at lines 265-273. You correctly assume that line 169 is the main function call that you want. The following lines of code should be sufficient for your function (I exclude the input checks done in "create_mats", lines 91-95 of the R file).

A <- brainGraph:::read.array(f.A)
Nv <- nrow(A)
A[is.nan(A)] <- 0
A.norm.sub <- lapply(seq_along(mat.thresh), function(z)
    lapply(seq_along(inds), function(x)
        array(sapply(inds[[x]], function(y)
            ifelse(A[, , y] > z, A[, , y], 0))))
A.norm.sub <- lapply(A.norm.sub, function(x) do.call(abind::abind, x))

And then you will probably want to symmetrize these matrices (since this is what "igraph" does when creating graphs, anyway). But it is not strictly necessary.

for (i in seq_along(mat.thresh)) A.norm.sub[[i]] <- symmetrize_array(A.norm.sub[[i]], ...)

where the "..." will be your choice of "symm.by", the argument to the function "symmetrize_mats" (default: "max").

----
This should get you your desired result. Please let me know.
Chris


Sean Ma

Feb 21, 2018, 9:42:27 AM
to brainGraph-help
Hi Chris, 

Thanks for your detailed explanation and sample code. 

There were 2 minor bugs, which I have fixed (see the code below), and I was able to threshold each individual's correlation matrix by raw values. I've restructured the code and pasted it below so it is more readable for others who are interested, as well as for those (like me) not so familiar with nested `apply()` functions.

# restructured code for raw value thresholding
A.norm.sub <- lapply(seq_along(thresh), function(z) {
  # cat("Current threshold: ", thresh[z], "\n")
  lapply(seq_along(inds), function(x) {
    array(sapply(inds[[x]], function(y) {
      # cat(" --- current inds: ", inds[[x]][y], "\n")
      ifelse(A[, , y] > thresh[z], A[, , y], 0)
    }),
    dim = dim(A[, , inds[[x]]]))
  })
})

There are 2 points I would like to follow up on:

1. The thresholded `A.norm.sub` output list seems to be "symmetrized" already; at least that's the case when I feed in a "symmetrized" conmat file. So the latter part of your code to symmetrize the output wasn't needed in this case.

2. The other point regards this part of your response: "If I have a single directory with all "SID_mat.txt" files, then of course it is a moot point and a simple "setkey(covars, Study.ID)" will suffice."

The `setkey(covars, Study.ID)` call does the trick, but it affects the later indexing behavior of the `covar` data.table when extracting the subject ID in your sample code below, `covar[groups[i], Study.ID[k]]`. I had to index the data frame differently to extract the subject ID (the commented line below).
 
      # calculating all graph metrics and setting graph attributes
      g.tmp <- set_brainGraph_attr(g.tmp, atlas, modality = 'fmri',
                                   weighting = 'pearson', threshold = thresholds[j],
                                   subject = covar[groups[i], Study.ID[k]], group = groups[i],
                                   #subject = covar$Study.ID[inds[[i]][k]], group = groups[i],
                                   use.parallel = FALSE, A = A.norm.sub[[j]][, , inds[[i]][k]])

Similarly, in the sample code for cross-checking each individual graph's `Study.ID`, the line `x <- all.equal(sapply(g.norm[[i]][[1]], graph_attr, 'name'), covar[groups[i], Study.ID])` is affected:

# Double checking `Study.ID` for individual subject graph and removing it if found same.
for (i in seq_along(groups)) {
  g.norm[[i]] <- fnames[[i]] <- vector('list', length = length(thresholds))
  for (j in seq_along(thresholds)) {
    fnames[[i]][[j]] <- list.files(savedir, sprintf('g%i_thr%02i.*', i, j), full.names = T)
    g.norm[[i]][[j]] <- lapply(fnames[[i]][[j]], readRDS)
  }
  x <- all.equal(sapply(g.norm[[i]][[1]], graph_attr, 'name'), covar[groups[i], Study.ID])
  if (isTRUE(x)) lapply(fnames[[i]], file.remove)
}
saveRDS(g.norm, file = file.path(savedir, 'g.norm.rds'))

Since I'm not very familiar with `data.table` (and want to learn more), is there a better way to index `covar` in the two sample code snippets above using a `data.table` method when `setkey()` is called with only `Study.ID`?

Thanks so much again for your help!

Best, 

Sean

Sean Ma

Feb 21, 2018, 2:12:48 PM
to brainGraph-help
Hi Chris, 

I was verifying the data generated by your graph creation sample code in Section 7.3, and was puzzled about why you index the `A` matrix in the `graph_from_adjacency_matrix()` and `set_brainGraph_attr()` calls with

A.norm.sub[[j]][, , inds[[i]][k]]

when looping over group > thresholds > subject k (within inds[[i]]), assuming we are using the 2-group example above where inds[[1]] <- c(1, 3, 5, 7) and inds[[2]] <- c(2, 4, 6, 8).

`A.norm.sub` from `my.mats` should now be stacked with the thresholded conmats in group order, i.e., [f.A[1], f.A[3], f.A[5], f.A[7], f.A[2], f.A[4], f.A[6], f.A[8]], rather than in the original `list.files()` order of `f.A`. Therefore, using `inds[[i]][k]` to index `A.norm.sub` actually pulls slices from the group-stacked array as if it were still in the `f.A` order, which retrieves the wrong subject.
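To make the ordering concrete, here is a small sketch (my own toy illustration, using the 2-group example above) of where a given subject ends up along the third dimension after the per-group stacking:

inds <- list(c(1, 3, 5, 7), c(2, 4, 6, 8))   # example grouping from above

# After the per-group abind step, the stacked order along dim 3 is unlist(inds)
stacked.order <- unlist(inds)                # 1 3 5 7 2 4 6 8

# so the k-th subject of group i sits at this *position* in the stacked array...
pos <- function(i, k) sum(lengths(inds[seq_len(i - 1)])) + k
pos(2, 3)                                    # 7 (i.e., original file f.A[6])

# ...whereas indexing with inds[[i]][k] pulls position 6 of the stack,
# which actually holds original file f.A[4], a different subject
inds[[2]][3]                                 # 6
stacked.order[6]                             # 4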

Does this make sense? Or did I make a mistake somewhere when interpreting your code?

Thanks again for your help!

Sean 

Chris Watson

Feb 21, 2018, 4:33:49 PM
to brainGr...@googlegroups.com
Hi Sean, thanks for spotting those bugs!

1. Are your input matrices already symmetric? I cannot think of another reason why A.norm.sub would be symmetric.
2. If your input files (and `covars` table) are only ordered by *Study.ID*, as opposed to "Group then Study.ID", then you can remove the "groups[i]" part. That should get you the correct ordering.
If you wanted to do it "the data.table way", the code could be "covars[k, Study.ID]".
Otherwise, you would have to issue another "setkey" command before the loop to create graphs; specifically, "setkey(covars, Group, Study.ID)".
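To spell out the two styles with the example table from earlier in this thread (a quick sketch):

library(data.table)

# Example covariates table (same rows as earlier in the thread; Group kept as
# a character column here for simplicity; a factor behaves similarly)
groups <- c('Pre', 'Post')
covars <- data.table(Study.ID=rep(c('p21', 'p22', 'p32', 'p38'), 2),
                     Group=rep(groups, each=4))

# Style 1: key by Group then Study.ID; covars[groups[i]] is a keyed subset on
# the first key, so Study.ID[k] is the k-th subject *within* that group
setkey(covars, Group, Study.ID)
covars[groups[1], Study.ID[2]]   # 'p22', the 2nd subject of the 'Pre' group

# Style 2: key (or simply order) by Study.ID only, then index rows by position
setkey(covars, Study.ID)
covars[2, Study.ID]              # the Study.ID sitting in row 2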


Sean Ma

Feb 21, 2018, 4:38:50 PM
to brainGraph-help
Thanks, Chris!

To respond to your 2 points.

1. Yes, my input matrices are already symmetric. I assume that is why I won't need to "symmetrize" my conmats.

2. I've looked into it and confirmed that the `data.table` way, "covars[k, Study.ID]", works.

One last issue I have is the A.norm.sub[[j]][, , inds[[i]][k]] indexing issue in my post from earlier today. I will wait for your response.

Thanks again!

Sean

Chris Watson

Feb 21, 2018, 6:47:19 PM
to brainGr...@googlegroups.com
Hi Sean, no, you are not mistaken. The code works if all of the Study.ID's are already in order (and therefore the matrix files as well). I will think of code that is more flexible and get back to you.

Chris


Chris Watson

Feb 22, 2018, 2:31:58 PM
to brainGr...@googlegroups.com
Hi Sean, thank you again for doing some due diligence! As I mentioned in the last email, the code works perfectly well in the case that matfiles and covars are *already* ordered by:
1. All of Group 1, ordered by Study.ID
2. All of Group 2, ordered by Study.ID
3. etc.

This was the case in the example I gave in the User Guide, since "Control" (Group 1) comes alphabetically before "Patient" (Group 2), and the matfiles were in different directories. So, in fact, I *was* imposing a constraint on the ordering of the matfiles, although this was not my intent.

In order to get your "A.norm.sub" arrays to match the input order, you may use the following code:

for (i in seq_along(mat.thresh)) {
  A2 <- array(0, dim=dim(A.norm.sub[[i]]))        # A2 is just a temp variable
  A2[, , unlist(inds)] <- A.norm.sub[[i]]
  A.norm.sub[[i]] <- A2
}

I have not checked, but doing this should make the graph creation code work properly (i.e., match up with the correct subject).
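If you want to double-check, something along these lines (a sketch using the objects from your custom raw-thresholding snippet; I have not run it) should return TRUE for any valid choice of i, j, and k:

# Sanity check (sketch): after the reordering above, the slice the graph
# creation code pulls for group i, subject k, threshold j should equal the
# raw matrix of original file inds[[i]][k] thresholded at mat.thresh[j].
# Here A is the NaN-cleaned array from brainGraph:::read.array(f.A), and
# mat.thresh is your vector of thresholds (your 'thresholds'/'thresh').
i <- 1; j <- 1; k <- 2
subj <- inds[[i]][k]
all.equal(unname(A.norm.sub[[j]][, , subj]),
          unname(ifelse(A[, , subj] > mat.thresh[j], A[, , subj], 0)))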


Note that this issue is not applicable when "threshold.by=density" or "threshold.by=mean".
Chris

Sean Ma

Feb 26, 2018, 10:52:30 AM
to brainGraph-help
Hi Chris, 

Thanks again for your reply! 

I have confirmed that the `unlist(inds)` solution works when subjects from different groups are mixed within the same directory. I can also confirm there is no error with your original code if the groups of subjects are explicitly separated into different folders in the first place.

Thanks again!

Sean

Chris Watson

Feb 26, 2018, 11:43:06 AM
to brainGr...@googlegroups.com
Great, thanks for reporting back.
Chris
