Thursday, November 12, 2015

Network Graph

Network Data Analysis: Online University
  1. Load the dataset called studentNetwork.RData. Read the datasets negative-words.txt and positive-words.txt into R. The file studentNetwork.RData is also in the Files section.
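library(network)  # network objects, plot() and the %v% attribute operator
library(sna)      # degree() and betweenness()
#dir <- "C:/Users/Bhupendra Mishra/Desktop/donotbackup/"
load("C:/Users/Bhupendra Mishra/Desktop/donotbackup/studentNetwork.RData")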
  2. Make a plot of the network
plot(studentNetwork, main = "Student Network")

  3. How many nodes and edges are there in studentNetwork?
summary(studentNetwork)
## Network attributes:
##   vertices = 205
##   directed = FALSE
##   hyper = FALSE
##   loops = FALSE
##   multiple = FALSE
##   bipartite = FALSE
##  total edges = 203 
##    missing edges = 0 
##    non-missing edges = 203 
##  density = 0.009708274 
## Vertex attributes:
##  Course_of_Study:
##    character valued attribute
##    attribute summary:
##          Business         Fine_Arts      Liberal_Arts Physical_Sciences 
##               109                 4                68                 6 
##        Technology 
##                18 
##  Sex:
##    character valued attribute
##    attribute summary:
##   F   M 
##  99 106 
##  StudentID:
##    integer valued attribute
##    205 values
##  Tweets:
##    character valued attribute
##    attribute summary:
##    the 10 most common values are:
##                                                    abnormal|arbitrary|better-than-expected|dirt-cheap|foolish|lawful|lonesome|pretty|supremely|trump|unconditional|unthinkable 
##                                                                                                                                                                              1 
##                                       abominably|affably|benefit|enchant|enraptured|finagle|fugitive|gleeful|ingenious|nourish|premier|priceless|rapturously|vexingly|wasteful 
##                                                                                                                                                                              1 
##                abominate|adventuresome|affluent|blatantly|conveniently|dummy-proof|hedonistic|idol|improvement|irking|laudable|refresh|rumbling|silent|sweetness|titillatingly 
##                                                                                                                                                                              1 
##                                                                                   abort|altruistically|barbarously|disgruntle|faith|imaginative|indebted|ingenious|unwatchable 
##                                                                                                                                                                              1 
## abound|amenable|anomalous|baffling|dominates|drab|enjoy|flawlessly|happily|humorous|illness|reaffirm|shiny|stupendously|taboo|thoughtfulness|treasure|well-being|well-educated 
##                                                                                                                                                                              1 
##                                                                          abound|clouding|comfortable|expansive|glorious|impartial|principled|reforming|statuesque|troubled|woo 
##                                                                                                                                                                              1 
##                                                                        absurdly|aspiration|brainwash|clear|ergonomical|eye-catch|immaculate|inevitable|nurturing|punk|rumbling 
##                                                                                                                                                                              1 
##                                                                                                       abundant|entranced|hoodwink|outperforms|regress|solemn|thriving|upseting 
##                                                                                                                                                                              1 
##                                              acclaimed|accomplishment|believable|boisterous|breach|flawlessly|fondness|frail|hooray|idolized|peerless|randomly|spew|temptingly 
##                                                                                                                                                                              1 
##                                                                                                       accolade|calming|calumniation|cure|effusively|offending|saint|stupendous 
##                                                                                                                                                                              1 
##  Year:
##    numeric valued attribute
##    attribute summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   3.000   2.732   4.000   6.000 
## No edge attributes
Answer: The network has 205 nodes (vertices) and 203 edges.
  4. What are the attributes of this network object? What do they each contain?
Answer: There are five vertex attributes and no edge attributes: 1. Course_of_Study (character: Business, Fine_Arts, Liberal_Arts, Physical_Sciences, Technology) 2. Sex (character: F, M) 3. StudentID (205 integer values) 4. Tweets (character: pipe-delimited strings of words) 5. Year (numeric, ranging from 1 to 6)
  5. What proportion of the students are studying Business? What proportion are in their 2nd Year?
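symbol = c(6,5,3,7,4,2)
# Pull the vertex attributes out of the network object
student.sex = studentNetwork%v%"Sex"
student.course_study = studentNetwork%v%"Course_of_Study"
student.year = studentNetwork%v%"Year"
table(student.sex)
##   F   M 
##  99 106
barplot(table(student.sex), main = "Sex of student", col=symbol)

table(student.course_study)
##          Business         Fine_Arts      Liberal_Arts Physical_Sciences 
##               109                 4                68                 6 
##        Technology 
##                18
barplot(table(student.course_study), main="Course of study of student", col=symbol)

table(student.year)
##  1  2  3  4  5  6 
## 62 40 42 25 24 12
barplot(table(student.year), main="Year of study of student", col=symbol)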
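The tables above give counts, while question 5 asks for proportions. A minimal sketch of that step follows, with the counts copied by hand from the printed output (the names course.counts and year.counts are introduced here for illustration and are not from the original post):

```r
# Counts copied from the table() output printed above
course.counts <- c(Business = 109, Fine_Arts = 4, Liberal_Arts = 68,
                   Physical_Sciences = 6, Technology = 18)
year.counts <- c("1" = 62, "2" = 40, "3" = 42, "4" = 25, "5" = 24, "6" = 12)

# Proportion = count / total number of students
course.prop <- course.counts / sum(course.counts)
year.prop <- year.counts / sum(year.counts)

round(course.prop["Business"], 3)  # 0.532
round(year.prop["2"], 3)           # 0.195
```

So roughly 53% of the students study Business and about 20% are in their 2nd year. With the network object loaded, prop.table(table(studentNetwork %v% "Year")) should give the same proportions directly.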
  6. Write a function that splits the pipe delimited string into a vector of single words, counts which ones are positive and which ones are negative, assigns a score of +1 for each positive word and -1 for each negative word, and sums them for a total score.
#Create legend for network graph
# NOTE: the first shape value was lost from the original post; 4 is an assumed stand-in
symbol.sex = c(4,12)[match(student.sex, c("M","F"))]
symbol.course_study=c(1,2,3,4,5)[match(student.course_study,c("Business", "Fine_Arts", "Liberal_Arts", "Physical_Sciences", "Technology"))]

plot(studentNetwork, vertex.sides = symbol.sex, vertex.rot = 45, vertex.cex = 2, vertex.col = symbol[student.year], edge.lwd = 2, cex.main = 1, displayisolates = TRUE, main = "Network Diagram - Student Year")

legend("bottomright", c("Year1", "Year2", "Year3", "Year4", "Year5", "Year6"), fill = symbol, cex=0.6)
Adjacency matrix: A network of nodes and their interconnections can be represented with an adjacency matrix. The adjacency matrix of a finite graph G on n vertices is the n x n matrix in which the non-diagonal entry a(ij) is the number of edges from vertex i to vertex j, and the diagonal entry a(ii), depending on convention, is either once or twice the number of edges (loops) from vertex i to itself. Undirected graphs often use the latter convention of counting loops twice, whereas directed graphs typically use the former. Up to a reordering of the vertices (which permutes rows and columns together), each isomorphism class of graphs has its own adjacency matrix, which is not the adjacency matrix of any other isomorphism class. In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric.
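To make the definition concrete, here is a small base-R sketch that builds the adjacency matrix of a made-up 4-node undirected simple graph (the edge list is hypothetical and not taken from studentNetwork) and checks the two properties mentioned above: symmetry and a zero diagonal.

```r
# Hypothetical undirected graph on 4 nodes with edges 1-2, 1-3, 3-4
edges <- rbind(c(1, 2), c(1, 3), c(3, 4))
n <- 4

adj <- matrix(0, nrow = n, ncol = n)
for (k in seq_len(nrow(edges))) {
  i <- edges[k, 1]
  j <- edges[k, 2]
  adj[i, j] <- 1
  adj[j, i] <- 1  # undirected: mirror every edge
}

adj
isSymmetric(adj)     # TRUE for an undirected graph
all(diag(adj) == 0)  # TRUE for a simple graph (no loops)
sum(adj) / 2         # number of edges: 3
```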
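#Creating Adjacency Matrix
# (the original line was lost from the post; this is a reconstruction)
student.matrix <- as.matrix(studentNetwork, matrix.type = "adjacency")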
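Degree: A node’s degree in an undirected network is defined as its number of edges to other nodes.
# sna's degree() defaults to gmode = "digraph" (in-ties plus out-ties),
# which would explain why every value printed below is even
student.degree <- degree(student.matrix)
student.degree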
##   [1] 26  8  0  0  2  0  0  6  8  0  4  0  4  2  8  4  2  8  2  0  6 12  2
##  [24]  0 14  0  4  0  4  8  4  2  2  4  0  6  0  2  0  2  0  0  6  6  0  0
##  [47] 18  0  0  0  6  8  4  6 18  4  2  4  6  4  6  0  4 10  4  8  0  2  0
##  [70]  6  2  0  0 10  4  4  4  2 10  0  2  2  4  0  0  2 20  4  4  2  2  8
##  [93]  2  0  0 14  2  2  6 10  2  8  4 10  4  0  0  6  4 10  2  2  0  6  6
## [116]  0  2  0  0  0  2  2 14  6  2  0 10  2  6  0  4  4  2  6  0 10  8  6
## [139] 12 10  2  6  0  2  0  2  2  2  6  8  4  0  6  0  2  2  8  8  0 16  6
## [162]  0  0  6  8  2  2  0  0  2  0  0  8  2  0  2  0  6  8  2  2  4  6  0
## [185] 10  2  8  0 14  8  4  4  2  6  6  4  0  2  2  2  4  2  0  6  2
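Every degree printed above is even. That is what one would expect if each undirected tie were counted once as an in-tie and once as an out-tie, as happens when a symmetric matrix is treated as a directed graph. That reading is an assumption about how the numbers arose. The base-R sketch below shows the effect on a made-up 4-node matrix:

```r
# Hypothetical symmetric adjacency matrix (edges 1-2, 1-3, 3-4)
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 0, 0,
                1, 0, 0, 1,
                0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Undirected degree: number of ties per node
deg <- rowSums(adj)
deg          # 2 1 2 1

# Treating the same matrix as directed and adding in-ties to out-ties
deg.in.out <- rowSums(adj) + colSums(adj)
deg.in.out   # 4 2 4 2, exactly twice the undirected degree
```

If the tie count itself is wanted from sna, passing gmode = "graph" to degree() should avoid the doubling.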
  7. Plot a histogram of the scores. What does it indicate?
hist(student.degree, col=symbol, main="Distribution of Nodes' Degree", ylab="Number of Students", xlab="Number of Connections")
Betweenness: A deeper measure of network structure is obtained through betweenness. Betweenness is a centrality measure of a node/vertex within a graph. Nodes that occur on many shortest paths between other nodes have higher betweenness than those that do not.
student.betweenness <- betweenness(student.matrix)

plot(student.betweenness, col="green", main="Betweenness Centrality", ylab="Betweenness")
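To show what "occurring on many shortest paths" means in practice, here is a self-contained base-R sketch of betweenness for an unweighted, undirected graph, using the breadth-first accumulation approach usually attributed to Brandes. The function name betweenness.sketch is hypothetical. It is not the sna implementation, and sna's betweenness() may scale its values differently depending on gmode.

```r
# Betweenness for a symmetric 0/1 adjacency matrix (illustrative sketch)
betweenness.sketch <- function(adj) {
  n <- nrow(adj)
  bc <- numeric(n)
  for (s in seq_len(n)) {
    d <- rep(-1, n); d[s] <- 0           # BFS distance from s (-1 = unseen)
    sigma <- numeric(n); sigma[s] <- 1   # number of shortest paths from s
    preds <- vector("list", n)           # predecessors on shortest paths
    visited <- integer(0)                # nodes in order of discovery
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visited <- c(visited, v)
      for (w in which(adj[v, ] != 0)) {
        if (d[w] < 0) {                  # first time w is reached
          d[w] <- d[v] + 1
          queue <- c(queue, w)
        }
        if (d[w] == d[v] + 1) {          # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    delta <- numeric(n)                  # dependency of s on each node
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) bc[w] <- bc[w] + delta[w]
    }
  }
  bc / 2  # each unordered pair was counted from both of its endpoints
}

# Hypothetical path graph 1 - 2 - 3 - 4
path <- matrix(0, 4, 4)
path[cbind(1:3, 2:4)] <- 1
path <- path + t(path)
betweenness.sketch(path)  # 0 2 2 0
```

In the path graph the two end nodes sit on no shortest path between other nodes, while each inner node sits on two, which matches the intuition in the definition.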

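library(foreach)
n.words <- read.table("C:/Users/Bhupendra Mishra/Desktop/donotbackup//negative-words.txt", header=TRUE, quote="\"")
p.words <- read.table("C:/Users/Bhupendra Mishra/Desktop/donotbackup//positive-words.txt", header=TRUE, quote="\"")
student.tweets <- studentNetwork%v%"Tweets"
# Without a registered parallel backend, %dopar% runs sequentially and issues a warning
tweet.score <- foreach(i=1:205, .combine='rbind') %dopar% {
   # Split the pipe-delimited tweet into single words
   words <- unlist(strsplit(student.tweets[i], split ='\\|'))
   # match() returns NA for words that are not in the list
   pos.matches = match(words, unlist(p.words))
   neg.matches = match(words, unlist(n.words))
   pos.matches = !is.na(pos.matches)
   neg.matches = !is.na(neg.matches)
   # +1 for each positive word, -1 for each negative word
   score = sum(pos.matches) - sum(neg.matches)
   score
}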
hist(tweet.score, main="Sentiment analysis of the Students",xlab="Tweet Score",col=symbol)
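The question asks for a function, so the scoring step can also be sketched as a standalone helper. The name score.tweet and the short word lists below are made up for illustration. In the real analysis the lists would come from positive-words.txt and negative-words.txt.

```r
# Score one pipe-delimited tweet: +1 per positive word, -1 per negative word
score.tweet <- function(tweet, pos.words, neg.words) {
  words <- unlist(strsplit(tweet, split = "|", fixed = TRUE))
  sum(words %in% pos.words) - sum(words %in% neg.words)
}

# Tiny hypothetical word lists
pos.demo <- c("pretty", "lawful", "benefit")
neg.demo <- c("foolish", "lonesome", "wasteful")

# Two positive and two negative words, one word in neither list
score.tweet("foolish|lawful|lonesome|pretty|unthinkable", pos.demo, neg.demo)  # 0

# Apply the function to a vector of tweets
demo.tweets <- c("benefit|pretty", "wasteful")
demo.scores <- vapply(demo.tweets, score.tweet, integer(1),
                      pos.words = pos.demo, neg.words = neg.demo,
                      USE.NAMES = FALSE)
demo.scores  # 2 -1
```

Using %in% rather than match() plus !is.na() is only a stylistic choice; both test whether each word appears in a list.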

  8. Do the distributions for 2nd Year and 4th Year students look different? How about for those studying Business and those studying Technology?
#Histograms of Tweet Scores for Year 2 and Year 4 respectively
par(mfrow = c(1,2))
hist(tweet.score[student.year==2], main="Sentiment - Year 2", xlab="Tweet Score", col=symbol)

hist(tweet.score[student.year==4], main="Sentiment - Year 4", xlab="Tweet Score", col=symbol)

#Histograms of Tweet Scores for Business and Technology students respectively

hist(tweet.score[student.course_study=="Business"], main="Sentiment - Business", xlab="Tweet Score", col=symbol)

hist(tweet.score[student.course_study=="Technology"], main= "Sentiment - Technology", xlab="Tweet Score", col=symbol)
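Side-by-side histograms are easier to compare when both use the same bins and when group summaries are printed next to them. The sketch below illustrates this on invented scores (the demo data frame is hypothetical and says nothing about the real students):

```r
# Invented scores for two groups, for illustration only
demo <- data.frame(score = c(2, -1, 0, 3, 1, -2, 4, 0),
                   year  = c(2,  2, 2, 2, 4,  4, 4, 4))

# Group means and standard deviations
demo.means <- tapply(demo$score, demo$year, mean)
demo.sds <- tapply(demo$score, demo$year, sd)
demo.means  # year 2: 1.00, year 4: 0.75

# Shared integer-centred bins so both histograms use the same x axis
breaks <- seq(min(demo$score) - 0.5, max(demo$score) + 0.5, by = 1)
h2 <- hist(demo$score[demo$year == 2], breaks = breaks, plot = FALSE)
h4 <- hist(demo$score[demo$year == 4], breaks = breaks, plot = FALSE)
rbind(year2 = h2$counts, year4 = h4$counts)
```

The same pattern could be applied to tweet.score with student.year or student.course_study as the grouping variable. Note that groups of very different size (for example 109 Business versus 18 Technology students) are better compared on proportions than on raw counts.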