categorical explanatory variables

Charles Wan

Nov 27, 2024, 5:53:42 AM

to eventnet-users

Hi all,

The data I am trying to analyze involves a bipartite network of team members and tasks. I would like to be able to create explanatory variables (statistics) that are categorical. For example, if three of the team members are working on a task at t=0, what is the probability that one of the remaining team members would pivot to this task at t=1. Is there a way to implement this in eventnet? Maybe with DHE_NEIGHBOR_STAT?

Thank you for any and all help.

Best,

Charles

juergen lerner

Nov 28, 2024, 1:32:56 AM

to eventnet-users

Dear Charles,

I'm not a hundred percent sure whether I've understood the data and objectives correctly. But here is my suggestion based on how I interpret it.

You want to estimate the probability, or rate, of events connecting a person A to a task X. You also know that some persons are in the same team (a dyadic relation "team_member" on person-person pairs) and that some of the persons are already connected to some tasks (a dyadic relation "connected" on person-task pairs). Then one of the explanatory variables should count how many team members of person A are already connected to task X.

If this interpretation is correct, the covariate can be specified as a closure statistic (e.g., DHE_CLOSURE_STAT). I would count the number of "third" nodes A1, A2, ... such that the source node of the dyad (A) is team_member of the third node and the third node is connected to the target node of the dyad (X). Many variations are possible: counting the number of such third nodes, dichotomizing this count (is there any such third node?), etc.
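To make the counting concrete, here is a minimal sketch in plain Python. It is not eventnet code or configuration; it only mimics what a closure-type count would compute under the interpretation above. The relation contents and the helper name `closure_count` are made up for illustration.

```python
# Illustration only: plain Python, not eventnet code. All names are hypothetical.

# Dyadic relation "team_member" on person-person pairs (stored in both directions).
team_member = {
    ("A", "A1"), ("A1", "A"),
    ("A", "A2"), ("A2", "A"),
    ("A", "A3"), ("A3", "A"),
}

# Dyadic relation "connected" on person-task pairs.
# B is connected to X but is not a team member of A.
connected = {("A1", "X"), ("A2", "X"), ("A3", "Y"), ("B", "X")}


def closure_count(source, target, team_member, connected):
    """Count third nodes that are team members of `source` and connected to `target`."""
    thirds = {b for (a, b) in team_member if a == source}
    return sum(1 for third in thirds if (third, target) in connected)


# Count variant: how many team members of A are already connected to X?
count = closure_count("A", "X", team_member, connected)

# Dichotomized variant: is there any such third node?
any_third = 1 if count > 0 else 0

print(count, any_third)
```

In this toy data only A1 and A2 are both team members of A and connected to X, so B does not contribute to the count.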

The statistics of type DHE_NEIGHBOR_STAT could be used, e.g., to count the number of neighbors (team_members) of A, or to aggregate some node-level attribute on these neighbors - but not to check how many of these neighbors are already connected to task X.

I don't really understand why the covariate should be categorical. In my interpretation it would be a count, or a binary covariate.

My interpretation assumes that there are several teams, so that some persons are team members and some are not. If, on the other hand, all persons in your data are in the same team, then the effect would just count the number of persons already connected to task X. Then it could be implemented by DHE_NEIGHBOR_STAT on the target node.

I'm also not sure whether the events "person connects to task" are always dyadic, that is, one person node interacts with one task node. If this is the case, hyperedge statistics would not be needed and you could model it as a dyadic REM (in contrast to a RHEM). On the other hand, dyadic events could also be modeled with a RHEM - it's just not necessary.

I hope it helps.

Best wishes

Juergen

Charles Wan

Nov 30, 2024, 1:35:47 AM

to eventne...@googlegroups.com

Dear Jürgen,

Many thanks for your reply. Your interpretation is correct and apologies about not having been clearer.

While any given person is always connected to one task only, there may be multiple persons connected to the same task at any given time. In this case a RHEM would still make sense with a hyperedge connecting one or more persons to a task?

There is only one team. If DHE_NEIGHBOR_STAT is used on the target node, then presumably what is being estimated is the probability/rate of events connecting person A to task X given that in the previous timestep(s), a number of persons (which could include person A) have been connected to task X, and not the probability/rate of events connecting person A to task X given that in the previous timestep(s), a number of person A's teammates (excluding person A) have been connected to task X?

Yes, I meant a binary covariate rather than a categorical covariate. Would it be possible to construct a binary covariate -- e.g., whether the specific number of teammates of person A being connected to task X in previous timestep(s) is 3? That is, we are interested in testing whether groupings of persons-task (hyperedges) of specific magnitudes have an impact on the probability/rate of events in how the rest of the team (exclusive of the grouping) decides to allocate their time and effort.

Best,

Charles

juergen lerner

Dec 1, 2024, 3:29:03 AM

to eventnet-users

Dear Charles,

Let me provide my suggestions one by one below.

> While any given person is always connected to one task only, there may be multiple persons connected to the same task at any given time. In this case a RHEM would still make sense with a hyperedge connecting one or more persons to a task?

Yes, a RHEM would make sense. The hyperevents contain one or more persons and always one task (you have to use the observation generator "COND_SIZE_DHE_OBS" to ensure that the sampled non-events have the same number of person and task nodes).

I'm not totally sure but I think that your observations "(group of) person(s) connect(s) to task" are time-stamped events. That is, the persons contribute something to the task at a given point in time. The other interpretation would be that the persons could connect to a task - and then stay connected until they disconnect. In the latter case you would have to model these change events ("connect_to_task", "disconnect_from_task"). But I think that the first interpretation is the more plausible one. (Of course it depends on your data / social setting / system.)

> There is only one team. If DHE_NEIGHBOR_STAT is used on the target node, then presumably what is being estimated is the probability/rate of events connecting person A to task X given that in the previous timestep(s), a number of persons (which could include person A) have been connected to task X, and not the probability/rate of events connecting person A to task X given that in the previous timestep(s), a number of person A's teammates (excluding person A) have been connected to task X?

Yes, good point. If there is only one team - *and* if all pairs of persons are team members of each other (that is, every person is a member of this single team) - then the DHE_NEIGHBOR_STAT would still count A (if A is already connected to X). You could "correct" this by also defining a statistic counting the number of persons in the hyperedge ({A1,A2, ...}, X) that are already connected to X (use DHE_SUB_REPETITION_STAT with the aggregation function set to SUM) and then subtract this count from the value returned by DHE_NEIGHBOR_STAT.

The alternative is to use DHE_CLOSURE_STAT with dyad attribute name 1 set to "team_member" and dyad attribute name 2 set to "prior_connection_to_task" (a binary dyadic attribute). But if all persons are team members of each other, this is inefficient compared to the other possibility (depending on many aspects of your data and objectives).

> Yes, I meant a binary covariate rather than a categorical covariate. Would it be possible to construct a binary covariate -- e.g., whether the specific number of teammates of person A being connected to task X in previous timestep(s) is 3? That is, we are interested in testing whether groupings of persons-task (hyperedges) of specific magnitudes have an impact on the probability/rate of events in how the rest of the team (exclusive of the grouping) decides to allocate their time and effort.

Statistics in eventnet can have a "function to transform statistic value", providing a limited list of functions, among them "DICHOTOMIZE", which you could use to binarize a count (for instance, to return one if the count is larger than a given value and zero otherwise). But there is no function that returns one if and only if the count is equal to a given value (e.g., equal to 3, neither larger nor smaller). However, in most use cases these functions to transform the statistic value are not needed at all, since the output of eventnet is typically modeled/analyzed by some statistical or data analysis software, and you can simply transform output values in that software.
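As a rough sketch of such a post-hoc transformation, the snippet below adds an "exactly 3" indicator to a table of statistic values using only the Python standard library. The inline table stands in for an eventnet output file; the column names `is_observed` and `n_others_on_task` and the helper `add_equals_indicator` are assumptions made for this example, not names that eventnet produces.

```python
import csv
import io

# Hypothetical stand-in for an eventnet output table; column names are invented.
raw = io.StringIO(
    "is_observed,n_others_on_task\n"
    "1,3\n"
    "0,1\n"
    "0,4\n"
    "1,3\n"
)


def add_equals_indicator(rows, column, value, new_column):
    """Return copies of `rows` with a 0/1 column that is 1 iff row[column] == value."""
    out = []
    for row in rows:
        row = dict(row)
        row[new_column] = 1 if float(row[column]) == value else 0
        out.append(row)
    return out


rows = add_equals_indicator(
    csv.DictReader(raw), "n_others_on_task", 3, "exactly_3_others"
)
indicators = [r["exactly_3_others"] for r in rows]
print(indicators)
```

The same idea carries over to whatever software is used for model fitting: the count is exported as is, and the indicator is derived there before estimation.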

I hope it helps.

Best wishes,

Juergen

Charles Wan

Dec 6, 2024, 7:04:06 AM

to eventne...@googlegroups.com

Dear Jürgen,

Many thanks for your very helpful response and engagement. Regarding the below, I’m confused: aren’t the two statistics (the one counted by DHE_NEIGHBOR_STAT and the one counted by DHE_SUB_REPETITION_STAT) the same? What am I missing?

Shouldn’t it rather be DHE_NEIGHBOR_STAT (how many persons are already connected to a task X1) minus a (binary) statistic of whether a particular person A1 is already connected to the task X1? This would give us the number of persons other than that particular person A1 who are already connected to the task X1.

Best,

Charles

juergen lerner

Dec 7, 2024, 4:44:25 AM

to eventnet-users

Dear Charles,

The two statistics (DHE_NEIGHBOR_STAT and DHE_SUB_REPETITION_STAT) are not the same.

You compute statistics for given hyperedges of the form ({A1,A2,...},X), that is, any number of actor nodes A1, A2, ... and exactly one task node X.

With DHE_NEIGHBOR_STAT you can for instance compute the number of all actors that are already (or at the moment) connected to the task X.

With DHE_SUB_REPETITION_STAT you can compute the number of actors *in the given set {A1,A2, ...}* that are already connected to task X. I seem to recall that your goal is to get the number of *other* actors (not those who are currently attaching to X) who are already connected to X. So it would be the value of the neighbor statistic minus the value of the subset repetition statistic.
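
The arithmetic of this subtraction can be illustrated with a small Python sketch. This is not eventnet code, and the functions below only imitate what the two statistics are described as counting in this thread; the data and all function names are hypothetical.

```python
# Illustration only: plain Python, not eventnet code. All names are hypothetical.

# Actors already connected to each task before the current hyperevent.
already_connected = {"X": {"A1", "B1", "B2", "B3"}}


def neighbor_count(task, already_connected):
    """All actors already connected to `task` (the neighbor-style count)."""
    return len(already_connected.get(task, set()))


def sub_repetition_count(actors, task, already_connected):
    """Actors in the given hyperedge that are already connected to `task`."""
    return len(set(actors) & already_connected.get(task, set()))


def others_count(actors, task, already_connected):
    """Actors outside the hyperedge that are already connected to `task`."""
    return (neighbor_count(task, already_connected)
            - sub_repetition_count(actors, task, already_connected))


# Hyperedge ({A1, A2}, X): A1 was connected to X before, A2 was not.
hyperedge_actors = {"A1", "A2"}

n = neighbor_count("X", already_connected)
s = sub_repetition_count(hyperedge_actors, "X", already_connected)
o = others_count(hyperedge_actors, "X", already_connected)
print(n, s, o)
```

Here four actors are already connected to X, one of them (A1) is part of the hyperedge, so three *other* actors remain after the subtraction.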