---
title: Analysing the "Divided They Blog" network with R/igraph
subtitle: (ACSPRI Summer Program 2017, 6-10 February 2017)
author: "Robert Ackland"
date: "4 February 2017"
output: pdf_document
graphics: yes
---
## Introduction
In this exercise we will analyse the "Divided They Blog" (40 A-listers) dataset.
First, load up the igraph package.
```{r eval=TRUE}
library(igraph)
```
Now we will create a data frame containing the edgelist, and then create our igraph graph object.
```{r eval=TRUE}
#load edgelist in csv file format
edge_dat <- read.csv("DividedTheyBlog_40Alist_Edges.csv",header=TRUE)
#igraph likes two-column matrix format
el <- as.matrix(edge_dat)
#create igraph graph object from the edgelist
g <- graph.edgelist(el,directed=TRUE)
```
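To make the edgelist step concrete, here is a minimal sketch on a toy network. The blog names are invented for illustration and are not part of the dataset; `graph_from_edgelist()` is the current igraph name for `graph.edgelist()`.

```r
library(igraph)

# Hypothetical three-blog edgelist: one row per directed tie (from, to)
toy_el <- matrix(c("a.com", "b.com",
                   "b.com", "c.com",
                   "c.com", "a.com",
                   "a.com", "c.com"),
                 ncol = 2, byrow = TRUE)

# A character matrix produces a named graph: each distinct string becomes a vertex
toy_g <- graph_from_edgelist(toy_el, directed = TRUE)

vcount(toy_g)   # number of distinct blogs appearing in the edgelist
ecount(toy_g)   # number of rows in the edgelist
V(toy_g)$name
```

Only blogs that appear in at least one row of the matrix become vertices, which matters for the isolate discussed below.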
We can get descriptive information about the network:
```{r eval=TRUE}
g
```
This informs us that there are 39 nodes and 363 edges in the network. It tells us that our graph is *D*irected, *N*amed, the edges are not *W*eighted, and it is not a *B*ipartite graph.
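The four flags in the summary line can also be queried one at a time. The sketch below uses a small made-up graph rather than the blog network, so the vertex names are assumptions for illustration only.

```r
library(igraph)

# Toy directed, named graph: a -> b and b -> c
toy_g <- graph_from_literal(a -+ b, b -+ c)

is_directed(toy_g)   # D flag
is_named(toy_g)      # N flag: vertices carry a "name" attribute
is_weighted(toy_g)   # W flag: edges carry a "weight" attribute
is_bipartite(toy_g)  # B flag: vertices carry a "type" attribute
```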
```{r eval=TRUE}
#Note: there are only 39 vertices, not 40, because one vertex (blog.johnkerry.com) is an isolate and hence not yet in the network...
length(V(g)$name)
#So let's add that vertex manually
g <- add.vertices(g,1,name="blog.johnkerry.com")
```
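The reason an isolate goes missing is that an edgelist can only mention vertices that have at least one tie. A minimal sketch with invented names, assuming we know from elsewhere that a fourth blog (`d.com`) exists:

```r
library(igraph)

# Toy graph built from ties only; "d.com" has no ties, so it cannot appear here
toy_g <- graph_from_literal(a.com -+ b.com, b.com -+ c.com)
vcount(toy_g)

# Add the isolate by hand, giving it a name so later attribute matching works
toy_g <- add_vertices(toy_g, 1, name = "d.com")
vcount(toy_g)

# An isolate has degree zero
degree(toy_g, v = "d.com")
```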
Next, we can visualise the network by plotting it directly in R:
```{r eval=TRUE}
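#make sure the output folder exists before writing the image
dir.create("figures", showWarnings=FALSE)
png("figures/divided.png", width=800, height=700)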
plot(g,edge.width=1.5,edge.curved=.5,edge.arrow.size=0.5)
dev.off()
```
This results in the following:
\begin{center}
\includegraphics{figures/divided.png}
\end{center}
Next we will do some more descriptive analysis:
```{r eval=TRUE}
#list of nodes
V(g)
#list of edges
E(g)
#accessing particular node
V(g)[2]
#accessing particular edge
E(g)[1]
#list of "name" (node) attributes - use head() to print the first 6
head(V(g)$name)
#number of nodes in network
vcount(g)
#another way
length(V(g))
#number of edges
ecount(g)
#another way
length(E(g))
#list of the node attributes
list.vertex.attributes(g)
#list of the edge attributes (we don't have any)
list.edge.attributes(g)
```
We will now look at some measures of node centrality:
```{r eval=TRUE}
#node indegree
head(degree(g, mode="in"))
#node outdegree
head(degree(g, mode="out") )
#top-5 nodes, based on indegree
V(g)[order(degree(g, mode="in"), decreasing=TRUE)[1:5]]
#closeness centrality
head(closeness(g))
#betweenness centrality
head(betweenness(g))
```
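It can help to check these measures on a network small enough to work out by hand. The sketch below uses a made-up four-node graph in which `b` receives a tie from every other node and sends one tie back to `a`.

```r
library(igraph)

# Toy directed graph: a, c and d all link to b; b links back to a
toy_g <- graph_from_literal(a -+ b, c -+ b, d -+ b, b -+ a)

# Indegree counts incoming ties, outdegree counts outgoing ties
indeg  <- degree(toy_g, mode = "in")
outdeg <- degree(toy_g, mode = "out")
indeg
outdeg

# Node with the highest indegree
names(which.max(indeg))

# Betweenness: b lies on the only shortest paths from c to a and from d to a
betweenness(toy_g)
```

Here `b` has indegree 3 and betweenness 2, while the other nodes have betweenness 0 because no shortest path passes through them.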
## Getting attributes into the network
```{r eval=TRUE}
#load attributes in csv file format
attr <- read.csv("DividedTheyBlog_40Alist_Vertices.csv",header=TRUE)
#We are now going to create a vertex attribute called "Stance" by extracting the value
#of the column "Stance" in the attributes file when the Vertex matches the
#vertex name.
#First, let's look at the first 6 vertex names using head()
head(V(g)$name) #head() prints the first 6 elements by default
#the vertex names in the attributes data frame
head(attr$Vertex)
length(attr$Vertex) #we have all 40 of the vertices here
#match searches for each of the vertex names (in the igraph object) and returns their
#row position in the attributes data frame
match(V(g)$name,attr$Vertex)
#so this says that "mypetjawa.mu.nu" is row 2 of attr$Vertex, "wizbangblog.com" is in
#row 17 etc. (confirm for yourself that this is the case)
#so match returns an integer vector (indicating the correct rows in the data frame)
#this is used to return a character vector of "Stance" that is in the correct order
#and can be input as a new vertex attribute in the graph object
V(g)$Stance <- as.character(attr$Stance[match(V(g)$name,attr$Vertex)])
head(V(g)$Stance)
```
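The `match()` step is easier to follow on a tiny example. The names and stances below are invented for illustration; the point is only that the graph's vertex order and the attribute file's row order differ, and `match()` bridges the two.

```r
# Vertex names in the order the graph stores them (hypothetical)
graph_names <- c("c.com", "a.com", "b.com")

# Attribute table in a different row order (hypothetical)
toy_attr <- data.frame(Vertex = c("a.com", "b.com", "c.com"),
                       Stance = c("liberal", "conservative", "liberal"),
                       stringsAsFactors = FALSE)

# For each graph vertex, the row of toy_attr holding its attributes
idx <- match(graph_names, toy_attr$Vertex)
idx

# Reorder the Stance column so it lines up with the graph's vertex order
stance <- toy_attr$Stance[idx]
stance

# match() returns NA for names it cannot find, so this is worth checking
anyNA(idx)
```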
Now let's plot the network again, this time using the vertex attribute "Stance" for the node colour:
```{r eval=TRUE}
#the vertex attribute "color" will be used by the plot function for node color
V(g)$color <- ifelse(V(g)$Stance=="conservative","red","blue")
png("figures/divided2.png", width=800, height=700)
plot(g,edge.width=1.5,edge.curved=.5,edge.arrow.size=0.5)
dev.off()
```
This results in the following:
\begin{center}
\includegraphics{figures/divided2.png}
\end{center}
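The `ifelse()` call colours every non-conservative node blue, which is fine with two stances. As a sketch of an alternative, assuming there could be more categories or missing values, a named lookup vector makes the mapping explicit. The stance values below are made up.

```r
# Hypothetical stance values, including one missing value
stance <- c("conservative", "liberal", "conservative", NA)

# Named vector acting as a lookup table from stance to colour
stance_cols <- c(conservative = "red", liberal = "blue")

# Index the lookup table by stance; unmatched or missing stances give NA
node_cols <- unname(stance_cols[stance])

# Give nodes without a known stance a neutral colour
node_cols[is.na(node_cols)] <- "grey"
node_cols
```

The resulting vector could be assigned to `V(g)$color` in the same way as the `ifelse()` result.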
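## Calculating the homophily index
In igraph, we will calculate the mixing matrix using a function written by Gary Weissman (see https://gist.github.com/gweissman/2402741 and http://www.babelgraph.org/wp/?p=351). Each cell of the mixing matrix counts the ties sent from nodes of the row category to nodes of the column category.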
```{r eval=TRUE}
mixmat <- function(mygraph, attrib, use.density=TRUE) {
  require(igraph)
  # get unique list of characteristics of the attribute
  attlist <- sort(unique(get.vertex.attribute(mygraph,attrib)))
  numatts <- length(attlist)
  # build an empty mixing matrix by attribute
  mm <- matrix(nrow=numatts,
               ncol=numatts,
               dimnames=list(attlist,attlist))
  # calculate edge density for each matrix entry by pairing type
  # lends itself to parallel if available
  el <- get.edgelist(mygraph,names=FALSE)
  for (i in 1:numatts) {
    for (j in 1:numatts) {
      mm[i,j] <- length(which(apply(el,1,function(x) {
        get.vertex.attribute(mygraph, attrib, x[1] ) == attlist[i] &&
          get.vertex.attribute(mygraph, attrib, x[2] ) == attlist[j] } )))
    }
  }
  # convert to proportional mixing matrix if desired (ie by edge density)
  if (use.density) mm/ecount(mygraph) else mm
}
mixmat(g, "Stance", use.density=FALSE)
```
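As a cross-check on what the mixing matrix contains, the same counts can be obtained by cross-tabulating the sender's and receiver's attribute for every edge. This is a sketch of an alternative approach on a made-up graph, not a replacement for the function above.

```r
library(igraph)

# Toy directed graph with invented vertices a, b (conservative) and c, d (liberal)
toy_g <- graph_from_literal(a -+ b, a -+ c, b -+ a, c -+ d, d -+ c)
V(toy_g)$Stance <- ifelse(V(toy_g)$name %in% c("a", "b"),
                          "conservative", "liberal")

# Edgelist as vertex indices: column 1 is the sender, column 2 the receiver
el <- as_edgelist(toy_g, names = FALSE)

# Look up the stance at each end of every edge, keeping all levels in both
lev  <- sort(unique(V(toy_g)$Stance))
from <- factor(V(toy_g)$Stance[el[, 1]], levels = lev)
to   <- factor(V(toy_g)$Stance[el[, 2]], levels = lev)

# Rows are the sender's stance, columns the receiver's stance
toy_mm <- table(from, to)
toy_mm
```

In this toy graph there are two conservative-to-conservative ties, one conservative-to-liberal tie, no liberal-to-conservative ties and two liberal-to-liberal ties, so the cells sum to the edge count of 5.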
Now, let's calculate the homophily index for conservatives.
```{r eval=TRUE}
#create the mixing matrix
mm <- mixmat(g, "Stance", use.density=FALSE)
#population share of conservative bloggers
w_c <- length(which(V(g)$Stance=="conservative"))/length(V(g))
w_c #OK, this dataset is not too interesting for calculating homophily....
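#homogeneity index of conservative bloggers: the share of ties sent by
#conservatives that go to other conservatives (indexing by name avoids
#relying on the row order of the mixing matrix)
H_c <- mm["conservative","conservative"]/sum(mm["conservative",])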
H_c #76.5% of conservative blogger ties are directed to other conservatives
#Homophily index of conservative bloggers
Hstar_c <- (H_c-w_c)/(1-w_c)
Hstar_c #conservatives display slight tendency towards homophily
```
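The calculation above follows the formula $H^*_c = (H_c - w_c)/(1 - w_c)$, where $H_c$ is the share of a group's ties that stay within the group and $w_c$ is the group's population share. The sketch below wraps it in a small helper and applies it to a made-up mixing matrix; the function name `homophily_index` and all of the counts are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Hypothetical helper: homogeneity (H) and homophily (Hstar) for one group,
# given a mixing matrix of tie counts and the group's population share
homophily_index <- function(mm, group, share) {
  H <- mm[group, group] / sum(mm[group, ])
  c(H = H, Hstar = (H - share) / (1 - share))
}

# Invented tie counts: rows are senders, columns are receivers
toy_mm <- matrix(c(30, 10,
                    5, 35),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("conservative", "liberal"),
                                 c("conservative", "liberal")))

# Assume each group makes up half of the population
res_c <- homophily_index(toy_mm, "conservative", share = 0.5)
res_l <- homophily_index(toy_mm, "liberal", share = 0.5)
res_c
res_l
```

With these numbers, conservatives send 30 of their 40 ties within the group, so $H = 0.75$ and $H^* = (0.75 - 0.5)/(1 - 0.5) = 0.5$. A value of 0 would mean ties are formed in proportion to population shares, and a value of 1 would mean all ties stay within the group.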