Normalization of Graph Indices


Philipp Riedel

Aug 2, 2019, 4:31:03 PM
to brainGraph-help
Hi everybody,

I was wondering what the correct way would be to compute normalized graph measures.  Chapter 12.1 of the brainGraph User Guide explains how to get Lp.norm and Cp.norm.  However, how do I compute E.global.norm and mod.norm in an identical fashion, given their random-graph counterparts in 'rand.dt' (I just picked the name based on the code in 12.1.1)?

I thought about taking the mean and standard deviation of the 'E.global' in rand.dt ("E.global.rand") regardless of Subject.ID and regardless of Group but with regard to density threshold and just normalize by computing: E.global.norm <- (E.global.subj - mean(E.global.rand))/sd(E.global.rand).
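
In code, that idea would look something like this (a rough sketch, untested; 'dt' as a stand-in for the table of observed subject-level measures, and the 'threshold' column name, are my assumptions):

library(data.table)
# Pooled null distribution per density threshold, ignoring Subject.ID and Group
rand.stats <- rand.dt[, .(mu = mean(E.global), sigma = sd(E.global)), by = threshold]
dt <- merge(dt, rand.stats, by = 'threshold')  # 'dt': observed subject-level measures (assumed name)
dt[, E.global.norm := (E.global - mu) / sigma]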

But I'm not sure whether there is a reason why analysis_random_graphs() computes kNumRand random graphs for each density threshold separately for each Subject.ID, and whether I should take that into account.  The subject's specific matrix shouldn't affect the random graph generation [in the case that threshold.by = 'density' was used in create.mats()], should it?

Thanks,
Philipp

Chris Watson

Aug 3, 2019, 3:48:42 AM
to brainGr...@googlegroups.com
Hi Philipp,
I think the easiest way would be to adapt the code from the "small.world"
function. So, something like the following, assuming that "g" is a *list* of
graphs (one per subject) and "rand" is a *list of lists* holding the N random
graphs for each subject.

E.global.wt <- sapply(g, efficiency, 'global')
E.glob.rand <- colMeans(sapply(rand, sapply, efficiency, 'global'))
E.glob.norm <- E.global.wt / E.glob.rand
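
The same pattern should carry over to mod.norm; a rough sketch (I use Louvain
community detection here just for illustration, which may not match what
brainGraph computes internally):

mod.obs <- sapply(g, function(x) modularity(cluster_louvain(x)))
mod.rand <- colMeans(sapply(rand, sapply, function(x) modularity(cluster_louvain(x))))
mod.norm <- mod.obs / mod.rand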

Let me know if that works, or if that is even what you are looking for.

Chris


Philipp Riedel

Aug 3, 2019, 1:29:46 PM
to brainGraph-help
Hi Chris,

Thanks.  I think this is what I was looking for.  I have 3 more questions though:

(1) I am a little confused why you are using "E.global.wt", because I'm actually interested in the unweighted index, and that is what the rest of the code seems to address, isn't it?

(2) Is the "rand" variable (a *list of lists* of the N random graphs for that subject) created somewhere in the process of 'rand_vars <- analysis_random_graphs(g, kNumRand, savedir=outdir, clustering=clustering)'?  Or do I have to create this list from the "rand1_thr01_subj001.rds" etc files?  Would it be a list including [all subjects][all thresholds][kNumRand number of random graphs]?

(3) Is it common to "normalise" the indices by just dividing by "E.glob.rand" (the mean of the index in the kNumRand number of random graphs)?  Standard deviation of the random graphs is not taken into account?

Forgive all the inquiries.  And thanks for your help.  I really appreciate it.
Philipp




Chris Watson

Aug 3, 2019, 6:33:56 PM
to brainGr...@googlegroups.com
1. Sorry, I've been using weighted global efficiency a lot for one of my projects and I typed it out of habit.
2. It would be all the random graphs for all subjects at a single density/threshold, so it might be called "rand1_thr01.rds".
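
A sketch of loading such a file back in (the internal structure, one list
element per subject each holding that subject's random graphs, is my
assumption; check against the saved object):

rand <- readRDS(file.path(outdir, 'rand1_thr01.rds'))  # group 1, threshold 1 (assumed layout)
E.glob.rand <- colMeans(sapply(rand, sapply, efficiency, 'global'))
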
3. I am not sure, but that is how normalized efficiency and characteristic path length are calculated. You could certainly use the mean and standard deviation of the null distribution, similar to how a z-score is calculated. I.e.,
(x - mean(x.rand)) / sd(x.rand), where x is the observed value and x.rand is the null distribution.

Philipp Riedel

Aug 4, 2019, 2:43:50 PM
to brainGraph-help
Thanks.  I had some trouble adapting the code the way you suggested and ended up just pulling mean(E.global) and mean(mod) from rand.dt to calculate E.global.norm and mod.norm.

However, I came across a peculiarity I didn't expect.  I have three groups of subjects (2 patient, 1 control).  I used density thresholding for the matrices.  Later, I created 20 random graphs per subject using analysis_random_graphs().  In the resulting rand.dt, the random graph measures (Cp, Lp, E.global, mod) significantly differ between the three groups.  Given that all subjects have the same number of edges per threshold (density thresholding) and the graphs are randomly created based on density, how can that be the case?  Am I missing an important detail on how the random graphs are created?

For now, I am inclined to normalize the graph measures of each subject (irrespective of group) by dividing by the mean of the random measures in the control group only.

Thanks,
Philipp


Chris Watson

Aug 5, 2019, 3:32:17 AM
to brainGr...@googlegroups.com
I think there may be a significant difference because you created only
20 random graphs per subject. It depends "how significant" the difference
is. What statistical test are you using to determine significance? What
is the code you used? Did you make sure to calculate the means and get
the correct values per group and density? E.g., you could do something
like

rand.dt[, kruskal.test(E.global ~ as.numeric(Group))$p.value, by=density]

Furthermore, the *pattern* of connections contributes to the metrics.
Granted, it is reasonable to expect that any differences would
"average out" with multiple random graphs, but I would have to see the
data to know more.

Philipp Riedel

Aug 5, 2019, 9:49:51 AM
to brainGraph-help
Hi Chris,

Thanks for the quick reply.

I used ezANOVA(), but using a non-parametric test as you suggested results in very low p-values as well:


rand.dt[, kruskal.test(E.global ~ as.numeric(Group))$p.value, by=threshold]


    threshold           V1
 1:         1 1.167238e-23
 2:         2 3.373995e-37
 3:         3 1.645668e-38
 4:         4 1.335406e-47
 5:         5 3.148767e-50
 6:         6 2.059087e-55
 7:         7 3.512499e-59
 8:         8 2.105415e-64
 9:         9 6.152468e-67
...

Because I have about 40 subjects in each group, I am comparing 800 random graphs per group.  That seems to be a sufficient amount.

This is how I created the random graphs:

rand_vars <- analysis_random_graphs(g, 20, savedir = outdirA, clustering = FALSE)
rand.dt <- rand_vars$rand

"g" came from:

##########################
#    Graph Creation      #
##########################

## STEP 1: create and save one graph per subject and threshold ##

setwd(savedir)

g.group <- g <- fnames <- vector('list', length = length(groups))

for (i in seq_along(groups)) {
  for (j in seq_along(thresholds)) {
    print(paste0('Threshold ', j, '/', length(thresholds), '; group ', i, '; ',
                 format(Sys.time(), '%H:%M:%S')))
    for (k in seq_along(inds[[i]])) {
      g.tmp <- graph_from_adjacency_matrix(A.norm.sub[[j]][, , inds[[i]][k]],
                                           mode = 'undirected', diag = FALSE,
                                           weighted = TRUE)
      g.tmp <- set_brainGraph_attr(g.tmp, atlas = atlas, modality = modality,
                                   weighting = 'pearson', threshold = thresholds[j],
                                   subject = covars[groups[i], Study.ID[k]],
                                   group = groups[i], use.parallel = FALSE,
                                   A = A.norm.sub[[j]][, , inds[[i]][k]])
      saveRDS(g.tmp, file = paste0(savedir, sprintf('g%i_thr%02i_subj%03i%s', i, j, k, '.rds')))
    }
  }
}

## STEP 2: read the graphs back into a nested list and clean up ##

for (i in seq_along(groups)) {
  g[[i]] <- fnames[[i]] <- vector('list', length = length(thresholds))
  for (j in seq_along(thresholds)) {
    fnames[[i]][[j]] <- list.files(savedir, sprintf('g%i_thr%02i.*', i, j), full.names = TRUE)
    g[[i]][[j]] <- lapply(fnames[[i]][[j]], readRDS)
  }
  # Remove the per-subject files only if the graph names match the covariates table
  x <- all.equal(sapply(g[[i]][[1]], graph_attr, 'name'), covars[groups[i], Study.ID])
  if (isTRUE(x)) lapply(fnames[[i]], file.remove)
}

saveRDS(g, file = paste0(savedir, 'g.rds'))


I will try to set up data for a subset of the sample.  However, it might take a while, because analysis_random_graphs() runs forever on my machine.




Philipp Riedel

Aug 5, 2019, 11:28:28 AM
to brainGraph-help
I uploaded the code and the data to the Dropbox folder we've been using previously.  Thanks!

Chris Watson

Aug 6, 2019, 2:12:17 AM
to brainGr...@googlegroups.com
I looked at the data. It looks like the mean values for those
variables per group are very close to one another. They are
*statistically* significantly different, but they are likely not of
*practical/clinical* significance. For example, I ran

rand.dt[threshold == 1, summary(lm(E.global ~ Group))]

The parameter estimates are all very small (< 0.01), and the r-squared
is extremely small (~0.06). They are indeed statistically significant,
but then again this is technically repeated measures data (i.e., there
are 20 data points per subject & threshold) so the model is not exact.
Another thing to consider is that you created only 20 random graphs per
subject; I recall reading that 100 is the minimum for stable estimates,
but I do not remember which paper it was.
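
One way to respect that repeated-measures structure would be a random
intercept per subject; a rough sketch with lme4 (not part of brainGraph;
"Study.ID" is assumed to be the subject column in rand.dt):

library(lme4)
# Random intercept per subject absorbs the repeated random-graph draws
fit <- lmer(E.global ~ Group + (1 | Study.ID), data = rand.dt[threshold == 1])
summary(fit)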

Either way, the between-group differences in the random-graph metrics
are very small, and of the same order of magnitude as (or smaller than)
the observed between-group differences.

RE the runtime: yes, "analysis_random_graphs" runs every step from
random graph generation to small-world calculations to rich-club
calculations to simple calculation of some popular graph metrics (such
as modularity). So it is a huge convenience for the user who would be
running all those commands anyway, but the runtime increases quite a
bit. The solution to this issue, specifically, is to run on a computer
with more cores and/or higher CPU clock speeds, and necessarily more
RAM.

Philipp Riedel

Aug 6, 2019, 9:03:13 AM
to brainGraph-help
Thanks for taking the time to explain.  That makes sense.  However, it still has a huge effect on the "group" differences of the normalized graph measures.  Since "group" should not have an effect on the random measures, wouldn't it be reasonable to normalize each subject's measure by dividing it by the mean of all random graphs that were generated by analysis_random_graphs() (at the respective threshold, but irrespective of group and subject)?  This might only be feasible for density thresholding, but it would speed up the process and improve the stability of the normalization (because there would be kNumRand * length(Subject.ID) * length(groups) random graphs to normalize with).  10 random graphs per subject should be sufficient then.  Just a thought.
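
In code, that pooled normalization might look like this (untested sketch; 'dt' again stands in for the table of observed subject-level measures):

# Pool the random-graph means per threshold, irrespective of group and subject
pool <- rand.dt[, .(E.global.rand = mean(E.global)), by = threshold]
dt <- merge(dt, pool, by = 'threshold')
dt[, E.global.norm := E.global / E.global.rand]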

I agree with you that analysis_random_graphs() is very convenient.  Usually, I just let it run overnight and it has finished by the next morning without any issue.

Thanks,
Philipp



Philipp Riedel

Aug 6, 2019, 8:01:01 PM
to brainGraph-help
I ran analysis_random_graphs() with kNumRand = 100.  The group differences are still there, and the p-values are even lower.  It seems like a systematic artifact to me; I just don't know whether it originates within analysis_random_graphs() or within the list "g" I took from the first analysis of the individual graph measures.

Chris Watson

Aug 6, 2019, 9:02:12 PM
to brainGr...@googlegroups.com
Hi Philipp, I think I understand better what you have been proposing after
thinking about it a bit more. And in fact, I thought of another reason for the
differences you're seeing and wrote a function a long time ago to deal with
those kinds of issues. I'll try to explain better below and refer to your
previous emails.

In your initial email and throughout the thread, you mentioned using the mean
and/or SD of all random graphs. I think this is a reasonable approach; if you
think about a measure like IQ that is standardized to "mean = 100; sd = 15", the
standard scores are calculated for *everyone*. And under the null hypothesis,
you would (as you had suggested) assume that the graph metrics for any given
subject's randomized graph should be the same.

*But*, regarding your quotes "the subject's specific matrix shouldn't affect the
random graph generation" and "...group should not have an effect on the random
measures" (cf. your initial email and your email earlier today),
this is not quite true. Particularly for (observed) graphs created from
correlation- or covariance-based data. This is actually why I wrote the
"sim.rand.graph.clust" function which can be used by passing "clustering=T"; it
uses an algorithm from Bansal et al. (2009) to generate equivalent random graphs
with both identical degree distributions *and* matching transitivity/clustering
coefficient. So I think that the group differences you are seeing are due to the
"simple" random graph method not accounting for higher-order interactions (i.e.,
transitivity).
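
For reference, switching to those clustering-matched nulls is the same call
used earlier in the thread, only with the clustering flag enabled (runtime
will increase substantially):

rand_vars <- analysis_random_graphs(g, kNumRand, savedir = outdir, clustering = TRUE)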

The reason "Lp.norm" and "Cp.norm" are typically calculated by dividing the
observed metric by the mean of the *subject-specific* random metrics is that
they are used specifically to determine whether a subject's graph has
small-world organization; it is not necessarily related to groupwise
differences.

However, since you have subject-level data I don't think it is necessary to
worry about the random graph metrics anyway. You could do a GLM/ANOVA/etc. of
the observed metrics directly to test for group differences. It depends on what
you are interested in testing; if you want to know if a given subject's global
efficiency is different from that of an equivalent random graph, then you would
"normalize" by the mean global efficiency of *that subject's* random graphs
only.

Sorry for the long email and any confusion it may lead to. But I hope this
clarifies things for you. I think you can safely skip the random graph steps if
you are most interested in testing group differences in global efficiency, and
instead use "brainGraph_GLM" (or "mtpc") to directly test. Let me know if you
have more questions.
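
A rough sketch of that direct route (argument names should be checked against
your brainGraph version's documentation; the contrast matrix and "g.all", one
graph per subject at a single threshold pooled across groups, are illustrative):

# Example contrast for a 3-group design with an intercept column (check your design matrix)
con.mat <- matrix(c(0, 1, -1), nrow = 1)
res <- brainGraph_GLM(g.list = g.all, covars = covars, measure = 'E.global',
                      contrasts = con.mat, con.type = 't', level = 'graph')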

Chris

Philipp Riedel

Aug 7, 2019, 9:02:07 AM
to brainGraph-help
Hi Chris

Your message was helpful and not confusing at all.  A lot of studies report normalized graph measures for group comparisons and cite van Wijk et al. (2010) [https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0013701].  But I guess you're right that, since I have the same N of nodes (same atlas) and the same k of edges (same density thresholds) across subjects, it is not crucial to normalize.  Comparison to other studies might be compromised, though.  Hence, I will run the "sim.rand.graph.clust" function ("clustering=T" in "analysis_random_graphs") at a later time on a computer with a faster CPU and report back here whether the group differences in the random graphs are still apparent.

Thanks,
Philipp

Philipp Riedel

Aug 7, 2019, 9:46:27 AM
to brainGraph-help
Hi Chris,

I actually have two more short questions:

(1) Why are both weighted and unweighted E.global and mod computed by the library, but only unweighted Cp and Lp?  E.global.wt seems to be an interesting measure to look at, but reporting it without any weighted Cp and Lp seems incomplete, unless there is a rationale.  For example, does weighting not have as big an impact on Cp and Lp?

(2) I Fisher-z-transformed the covariance matrices after Pearson correlation of the time series.  But I still use weighting = 'pearson' in set_brainGraph_attr().  That shouldn't affect the graphs, should it?

Thanks,
Philipp  

Chris Watson

Aug 28, 2019, 9:56:44 PM
to brainGr...@googlegroups.com
Hi Philipp, I just noticed that I never replied to this, so I apologize.

1. The reason is basically "historical"; i.e., there wasn't
functionality in igraph so I never added it. It is computed through
"set_brainGraph_attr", though: for weighted graphs it returns both a
graph- and a vertex-level attribute called "Lp.wt". I will add a
function to do this in the next release.
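
To pull that attribute out afterwards, the same pattern used earlier in the
thread should work (assuming graphs built with set_brainGraph_attr, as above):

# Graph-level weighted characteristic path length, one value per subject
Lp.wt <- sapply(g[[1]][[1]], graph_attr, 'Lp.wt')  # group 1, threshold 1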

2. No, the "weighting" argument is strictly for your own record. i.e.,
it is stored so that if you need to look back at the data in the
(distant) future, you will be able to tell what kind of data it
represents. You can assign any valid character string you want for
this argument.

Chris