School Leader Collaboration and Gender: A Social Network Analysis using ERGMs
Introduction
As stated in Carolan’s (2013) chapter on Network Data and Statistical Models, using statistical techniques to model social network data are important as they allow you to employ social network analysis in ways that “move beyond description toward explanation.” These inferential statistics can be used to make predictions from the data and test propositions about network-related properties. As Carolan noted, the statistical approach is concerned on the reproducibility of the network data and understanding the probablistic realizations of an underlying true tendency.
In order to analyze this probability in network studies, simulations are used. Exponential random graph models, or ERGMs, provide a way to determine whether network properties occur by chance as a result of other network properties or whether the observed properties are unlikely given other parameters in the network.
So….ERGMs - what are they, and what are they used for? 🤔 I’m still not totally sure either, but we will both find out together by diving into some data.
The Data
In Daly’s Network of School District Leaders data, information regarding the leadership network were collected at two school districts over 3 consecutive years. The particular dataset being used here describes the collaboration network of 43 school leaders in year 3 of the three-year study. It is a directed valued (weighted) network using a 5 point scale from 0-4 with higher values indicating more frequent collaborations. It also contains attribute data gender (male, (1=yes, 0=no)), efficacy (on a 9-point Likert scale), trust (on a 7-point Likert scale), and whether or not the school leader is at the school or district level (1=yes, 0=no).
Summary Statistics for Daly’s School Leader Network
Purpose
In order to more fully understand how ERGMs work, we will try to answer to the question:
Is a school leader’s gender predictive of the number of ties he/she has in the year 3 collaborative network?
We can compare our findings with that of the analysis conducted in Carolan (2013) which showed that gender is not significantly related to either sending or receiving a tie for the confidential exchange network. Again, with the use of ERGMs we can ascertain whether gender is significant in explaining the ties in the collaboration network of the school leaders and predictive of the outcomes.
Methods
In order to get the data ready for ERGM analysis, we will use the following packages - readxl
, tidyverse
, igraph
, tidygraph
, ggraph
- and do the following:
- Dichotomize the matrix data
- Create Network Graph Object
- Look at some descriptive statistics on our network
- Visualize our network
For the ERGM analysis, we will use the statnet
package to do the following:
- Create network object for ERGM analysis
- Run summaries of our model
- Model using ERGM
- Look at Goodness of Fit
- Examine Monte-Carlo Markov Chain (MCMC) Diagnostics to see if model is good representation of our network
Code
Read In Data
First we read in the data using readxl
which is take from Carolan’s (2013) student resource’s companion site. The first data set contains node information about the school leaders and their attributes and the second is a square adjacency matrix that contains information about the ties in the network.
# read in data nodes
<- read_excel("school leader attributes.xlsx", col_types = c("text",
leader_collab_nodes "numeric", "numeric", "numeric", "numeric"))
# read in data leader matrix
<- read_excel("school leader collab adj matrix.xlsx", col_names = FALSE) leader_collab_matrix
Adjacency Matrix & Graph Object
Next, we dichotomize the information (make it either 0 or 1 for ease of analysis) in the leader collaboration matrix.
# dichotomize matrix
<- leader_collab_matrix %>%
leader_matrix as.matrix()
<= 2] <- 0
leader_matrix[leader_matrix
>= 3] <- 1 leader_matrix[leader_matrix
Then we add the row names from the node data into the leader matrix
# add row names
rownames(leader_matrix) <- leader_collab_nodes$ID
colnames(leader_matrix) <- leader_collab_nodes$ID
Then create the adjacency matrix and convert to a standard edge-list for analysis
# get edges
<- graph.adjacency(leader_matrix, diag = FALSE)
adjacency_matrix
# convert matrix to standard edge-list
<- get.data.frame(adjacency_matrix) leader_edges
However, in order to calculate our descriptive statistics and plot our network, we will use the tidygraph
function tbl_graph
to create a network graph object for calculating our descriptive statistics and plotting our network.
# create network graph object
<- tbl_graph(edges = leader_edges, nodes = leader_collab_nodes) leader_graph
Descriptive Stats
Let’s look at some descriptive statistics of our network. In order to do that, we will calculate degrees of centrality, in- and out-degree, to see who sent and received the most ties.
# Node degree
<- leader_graph %>%
leader_measures activate(nodes) %>%
mutate(in_degree = centrality_degree(mode = "in")) %>%
mutate(out_degree = centrality_degree(mode = "out"))
ID | EFFICACY | TRUST | DISTRICT/SITE | MALE | in_degree | out_degree |
---|---|---|---|---|---|---|
1 | 6.06 | 4.00 | 1 | 0 | 12 | 6 |
2 | 6.56 | 5.63 | 1 | 0 | 2 | 1 |
3 | 7.39 | 4.63 | 1 | 0 | 13 | 10 |
4 | 4.89 | 4.00 | 1 | 0 | 5 | 7 |
5 | 6.06 | 5.75 | 0 | 1 | 9 | 1 |
6 | 7.39 | 4.38 | 0 | 0 | 6 | 1 |
7 | 5.56 | 3.63 | 0 | 1 | 6 | 2 |
8 | 7.50 | 5.63 | 1 | 1 | 20 | 24 |
9 | 7.67 | 5.25 | 0 | 0 | 10 | 5 |
10 | 6.64 | 4.78 | 0 | 0 | 9 | 17 |
11 | 5.06 | 5.88 | 0 | 1 | 7 | 23 |
12 | 7.44 | 3.75 | 0 | 1 | 7 | 1 |
13 | 8.28 | 5.13 | 0 | 0 | 6 | 8 |
14 | 5.67 | 5.50 | 0 | 0 | 8 | 15 |
15 | 6.72 | 5.63 | 0 | 1 | 12 | 12 |
16 | 4.61 | 3.75 | 1 | 0 | 8 | 1 |
17 | 7.17 | 5.13 | 0 | 0 | 6 | 2 |
18 | 5.00 | 4.13 | 1 | 1 | 12 | 12 |
19 | 7.89 | 3.88 | 0 | 1 | 9 | 1 |
20 | 7.17 | 5.75 | 0 | 1 | 6 | 7 |
21 | 7.44 | 5.38 | 0 | 1 | 9 | 5 |
22 | 4.94 | 3.75 | 1 | 0 | 9 | 7 |
23 | 8.50 | 5.50 | 0 | 0 | 7 | 3 |
24 | 5.22 | 5.13 | 1 | 0 | 7 | 6 |
25 | 6.94 | 4.63 | 1 | 0 | 6 | 8 |
26 | 7.00 | 4.88 | 0 | 0 | 9 | 2 |
27 | 5.06 | 4.50 | 1 | 0 | 12 | 8 |
28 | 7.61 | 4.50 | 1 | 1 | 11 | 42 |
29 | 5.67 | 4.13 | 1 | 1 | 4 | 2 |
30 | 5.50 | 4.88 | 0 | 1 | 5 | 0 |
31 | 6.78 | 4.25 | 0 | 1 | 6 | 1 |
32 | 5.88 | 4.13 | 1 | 0 | 8 | 3 |
33 | 5.89 | 5.13 | 0 | 0 | 8 | 7 |
34 | 5.12 | 4.25 | 1 | 1 | 12 | 17 |
35 | 7.44 | 3.88 | 0 | 1 | 7 | 1 |
36 | 7.72 | 5.75 | 1 | 0 | 8 | 12 |
37 | 6.64 | 4.78 | 1 | 1 | 11 | 32 |
38 | 7.56 | 5.75 | 1 | 1 | 14 | 29 |
39 | 8.00 | 5.63 | 0 | 0 | 6 | 0 |
40 | 7.33 | 5.38 | 0 | 0 | 6 | 0 |
41 | 8.17 | 4.13 | 0 | 0 | 7 | 4 |
42 | 8.06 | 5.25 | 0 | 1 | 9 | 7 |
43 | 6.72 | 3.88 | 0 | 0 | 8 | 10 |
Looks like leader 8 is on 🔥
Here is a summary of the node measure we have just calculated:
# node measures
<- leader_measures %>%
node_measures activate(nodes) %>%
data.frame()
ID | EFFICACY | TRUST | DISTRICT.SITE | MALE | in_degree | out_degree | |
---|---|---|---|---|---|---|---|
Length:43 | Min. :4.610 | Min. :3.630 | Min. :0.0000 | Min. :0.0000 | Min. : 2.000 | Min. : 0.000 | |
Class :character | 1st Qu.:5.670 | 1st Qu.:4.130 | 1st Qu.:0.0000 | 1st Qu.:0.0000 | 1st Qu.: 6.000 | 1st Qu.: 1.500 | |
Mode :character | Median :6.780 | Median :4.780 | Median :0.0000 | Median :0.0000 | Median : 8.000 | Median : 6.000 | |
NA | Mean :6.649 | Mean :4.783 | Mean :0.4186 | Mean :0.4419 | Mean : 8.419 | Mean : 8.419 | |
NA | 3rd Qu.:7.470 | 3rd Qu.:5.440 | 3rd Qu.:1.0000 | 3rd Qu.:1.0000 | 3rd Qu.: 9.500 | 3rd Qu.:11.000 | |
NA | Max. :8.500 | Max. :5.880 | Max. :1.0000 | Max. :1.0000 | Max. :20.000 | Max. :42.000 |
Click on the tabs to find summaries of the following measures grouped by whether a leader is at the school (0) or district level (1):
In and out degree
DISTRICT.SITE | n | mean | sd |
---|---|---|---|
0 | 25 | 7.520000 | 1.661325 |
1 | 18 | 9.666667 | 4.228753 |
DISTRICT.SITE | n | mean | sd |
---|---|---|---|
0 | 25 | 5.40000 | 5.993051 |
1 | 18 | 12.61111 | 11.712934 |
Looks like school leaders at the district level sent out way more ties than at the school level but the results were pretty comparable for receiving ties between school and district.
Trust
DISTRICT.SITE | n | mean | sd |
---|---|---|---|
0 | 25 | 4.9044 | 0.7191666 |
1 | 18 | 4.6150 | 0.6880514 |
Efficacy
DISTRICT.SITE | n | mean | sd |
---|---|---|---|
0 | 25 | 7.022000 | 0.9534149 |
1 | 18 | 6.131667 | 1.1151906 |
Plot
This is the fun part - let’s plot our network using ggraph
! 💹
You can see the density of the network/number of ties in the sociogram. It seems like there might be some correlation of gender to the number of ties that are sent out. We can explore further using ERGMs.
# Visualization
ggraph(leader_measures, layout = "kk") + geom_node_point(aes(size = in_degree, alpha = out_degree,
colour = factor(`DISTRICT/SITE`), shape = factor(MALE))) + geom_node_text(aes(label = ID),
repel = TRUE) + geom_edge_link(color = "grey", arrow = arrow(type = "closed",
angle = 15, length = unit(0.5, "mm")), show.legend = FALSE) + theme_graph()
ERGM
Now it is time for the pièce de résistance, the ERGM.
We’ll first have to create a network object to run our summaries and models. The statnet
package command as.network
will take care of this for us:
# create network object
<- as.network(leader_edges, vertices = leader_collab_nodes) leader_network
As stated in the vignette for the package, it is good practice to run summary stats on the network measures before running the model so you understand (🤔) what you are looking at. We’ll use the summary
function:
# Estimate ERGM Model, good practice to run summary first
summary(leader_network ~ edges + mutual) %>%
kable(caption = "Summary of Leader Network: Edges and Tie Reciprocation") %>%
kable_paper("hover", full_width = T, fixed_thead = T)
x | |
---|---|
edges | 362 |
mutual | 91 |
As you can see, there are 362 edges, and 91 were mutual dyad pairs.
The first model we will run will look at just the network structure, the edges and reciprocity (or mutual). We will set a specific seed so we can reproduce our results:
The results of the model is shown below:
summary(ergm_mod_1)
:
Callergm(formula = leader_network ~ edges + mutual)
:
Monte Carlo Maximum Likelihood Results
Pr(>|z|)
Estimate Std. Error MCMC % z value -1.95952 0.08525 0 -22.99 <1e-04 ***
edges 1.95741 0.19500 0 10.04 <1e-04 ***
mutual ---
: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Signif. codes
: 2504 on 1806 degrees of freedom
Null Deviance: 1699 on 1804 degrees of freedom
Residual Deviance
: 1703 BIC: 1714 (Smaller is better. MC Std. Err. = 1.206) AIC
For this network, it seems like the parameter ‘mutual’ or reciprocated ties between school leaders occurs more than would be expected by chance, very similar to results found in the confidential network analysis. The negative estimate of the number of edges implies the probably of a collaborative tie in year 3 is low, again much like the confidential network analysis.
Now let’s look at our specific research question, is gender predictive of the number of ties a school leader has in the year 3 collaborative network. To figure this out, we’ll add the co-variate male
using the nodefactor
specification.
First, we run our summary statistics on the model:
summary(leader_network ~ edges + mutual + gwesp(0.25, fixed = T) + nodefactor("MALE"))
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
362.0000 91.0000 431.5217 395.0000
Then add the co-variate information using the nodefactor
specification:
<- ergm(leader_network ~ edges + mutual + gwesp(0.25, fixed = T) + nodefactor("MALE")) ergm_3
Here are the results of the model:
summary(ergm_3)
:
Callergm(formula = leader_network ~ edges + mutual + gwesp(0.25,
fixed = T) + nodefactor("MALE"))
:
Monte Carlo Maximum Likelihood Results
Pr(>|z|)
Estimate Std. Error MCMC % z value -5.22419 0.35661 0 -14.650 < 1e-04 ***
edges 1.42892 0.19020 0 7.513 < 1e-04 ***
mutual 0.25 2.20071 0.27991 0 7.862 < 1e-04 ***
gwesp.fixed..1 0.15007 0.04992 0 3.006 0.00265 **
nodefactor.MALE---
: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Signif. codes
: 2504 on 1806 degrees of freedom
Null Deviance: 1546 on 1802 degrees of freedom
Residual Deviance
: 1554 BIC: 1576 (Smaller is better. MC Std. Err. = 0.9141) AIC
In comparison with the confidential network analysis, there seems to be a slight, but this time moderately statistically significant, tendency towards male leaders being more likely to send or receive ties in the year 3 data.
Goodness of Fit
Before we leave the world of ERGMs, we want to make sure the model we created is a “good fit” of the data, or “how well it reproduces the observed global network properties that are not in the model.” To do this, we will use the gof()
function.
<- gof(ergm_3) ergm_3_gof
The black line, which is our actual observed network measure, more or less follows the blue aggregate measures generated from the simulations. The model could probably be a little better, but hey we’ll take it. 🤷🏿
MCMC Diagnostics
Last but not least, we should look at the Monte-Carlo Markov Chain (MCMC) diagnostics to really make sure our model simulation would actually produce a similar network to the one that was observed. We do this by running the mcmc.diagnostics()
function.
mcmc.diagnostics(ergm_3)
Sample statistics summary:
Iterations = 1843200:7561216
Thinning interval = 4096
Number of chains = 1
Sample size per chain = 1397
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
edges 1.2341 31.314 0.8378 1.9608
mutual 0.1875 9.477 0.2536 0.5102
gwesp.fixed.0.25 1.8149 41.423 1.1083 2.5004
nodefactor.MALE.1 1.8604 29.277 0.7833 1.3837
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
edges -67.00 -18.00 4.000 24.00 56.00
mutual -18.00 -6.00 0.000 6.00 18.00
gwesp.fixed.0.25 -87.25 -23.18 4.815 31.98 74.89
nodefactor.MALE.1 -59.00 -17.00 3.000 22.00 57.00
Are sample statistics significantly different from observed?
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
diff. 1.2340730 0.1875447 1.8149324 1.8604152
test stat. 0.6293765 0.3676094 0.7258652 1.3444751
P-val. 0.5291026 0.7131645 0.4679214 0.1787948
Overall (Chi^2)
diff. NA
test stat. 7.0079517
P-val. 0.1383314
Sample statistics cross-correlations:
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
edges 1.0000000 0.8219976 0.9955156 0.7550109
mutual 0.8219976 1.0000000 0.8312809 0.6410802
gwesp.fixed.0.25 0.9955156 0.8312809 1.0000000 0.7669329
nodefactor.MALE.1 0.7550109 0.6410802 0.7669329 1.0000000
Sample statistics auto-correlation:
Chain 1
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
Lag 0 1.0000000 1.0000000 1.0000000 1.00000000
Lag 4096 0.6388993 0.4456455 0.6038312 0.45171927
Lag 8192 0.4632409 0.3405171 0.4365858 0.26873912
Lag 12288 0.3470160 0.2346855 0.3231989 0.17792970
Lag 16384 0.2438624 0.1818677 0.2249331 0.10618711
Lag 20480 0.1765139 0.1210351 0.1604500 0.07718246
Sample statistics burn-in diagnostic (Geweke):
Chain 1
Fraction in 1st window = 0.1
Fraction in 2nd window = 0.5
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
-1.408 -1.893 -1.457 -1.165
Individual P-values (lower = worse):
edges mutual gwesp.fixed.0.25 nodefactor.MALE.1
0.1589864 0.0583848 0.1451023 0.2440729
Joint P-value (lower = worse): 0.6055984 .
MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
The charts look roughly balanced, so we’ll call this a day - our model did not fail to converge.
Conclusion
By using ERGMs, we have established a slight correlation between gender and the number of ties in the year-3 collaborative network of school leaders. This differs from the year-3 confidential ties network which showed no significant correlation. These results warrant further investigation into what role gender may play in the formation of these networks and how these relationships effect the learning environment within these schools and districts.
Final Thoughts
This is the last assignment in completion of the of the Learning Analytics graduate certificate program at North Carolina State University. 🥳🎉I want to thank our Professor @sbkellogg for such a great learning experience and all the great colleagues/classmates I’ve had throughout the program that helped me to be better in #RStats.
Por último, pero no menos importante, les quisera agraceder a mis niñas pequeñitas por su paciencia cuando mami hacia su ‘trabajo por la computadora.’ Las quiero mucho ❤️.
About Me
By day, I am Assistant Director for the Global Education Office at Duke University. By (late) night, I mess around with data. Follow me on Twitter @sorayaworldwide