Now that we've got all the data in one place and in a common format, we can start having fun. I've been working on network problems recently for my research, so I thought I'd whip up a few graphs.
Here's a quick graph of the game network for the Homestory Cup #3, just to make sure everything's working before we look at the whole dataset. Here the nodes represent players, and each line connecting two players represents one game.
Everything looks good, nothing out of the ordinary. You can see the groups of 4 pretty clearly, with the players who advanced out of the group stage in the middle.
Now let's try something more difficult. I thought it might be really cool to look at the whole game network of the TLPD. However, a huge number of players have played only a handful of games, so plotting every player becomes overwhelming. Instead, I've limited the sample to the network's k-core: the largest group of players in which every player has played at least k different opponents who are also in the group. I tried out a bunch of values, but k=12 seemed to work best (you're welcome to experiment on your own). Each node is drawn as the player's name, colored by their most-played race.
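To make the k-core idea concrete, here is a minimal base-R sketch of the usual peeling procedure: repeatedly drop every player with fewer than k distinct opponents left in the group, until nobody else needs dropping. The helper name `k_core_players` and its two-column edge-list input are assumptions for illustration; the actual plotting code relies on igraph instead.

```r
# Hypothetical helper: names of the players in the k-core of an undirected
# game network given as a two-column data frame of player names.
k_core_players <- function(edges, k) {
  a <- as.character(edges[[1]])
  b <- as.character(edges[[2]])
  keep <- a != b                      # ignore self-loops
  a <- a[keep]; b <- b[keep]
  # Count each opponent once: order every pair, then drop repeated pairs
  pairs <- unique(data.frame(lo = pmin(a, b), hi = pmax(a, b),
                             stringsAsFactors = FALSE))
  lo <- pairs$lo; hi <- pairs$hi
  alive <- unique(c(lo, hi))
  repeat {
    # Degree of each remaining player, counting only edges inside the group
    in_core <- lo %in% alive & hi %in% alive
    deg <- table(factor(c(lo[in_core], hi[in_core]), levels = alive))
    drop <- names(deg)[deg < k]
    if (length(drop) == 0) break      # everyone left has >= k opponents
    alive <- setdiff(alive, drop)
  }
  alive
}

# A-B-C form a triangle (A-B listed twice); D only ever played C
games <- data.frame(p1 = c("A", "B", "C", "C", "A"),
                    p2 = c("B", "C", "A", "D", "B"))
k_core_players(games, 2)  # returns "A" "B" "C"
```

On a graph without repeated edges, igraph's `coreness()` gives the same result: a player is in the k-core exactly when their coreness is at least k.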
(edit: this graph originally counted repeated games between the same pair of players. I've removed the duplicates, which had biased the network against the Koreans, who don't have as many games on record.)
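Collapsing repeated games so that each pair of players contributes a single edge can be sketched in base R as follows. The helper name `collapse_pairs` is hypothetical; it treats A-vs-B and B-vs-A as the same pair and keeps the number of games as a count, in case you want it as an edge weight. On an existing igraph graph, `simplify()` removes multiple edges and self-loops in one step.

```r
# Hypothetical helper: one row per unordered pair of players,
# with the number of games that pair played.
collapse_pairs <- function(winner, loser) {
  winner <- as.character(winner)
  loser <- as.character(loser)
  # Order each pair alphabetically so A-vs-B and B-vs-A share a key
  p1 <- pmin(winner, loser)
  p2 <- pmax(winner, loser)
  counts <- aggregate(list(games = rep(1L, length(p1))),
                      by = list(p1 = p1, p2 = p2), FUN = sum)
  counts <- counts[order(counts$p1, counts$p2), ]
  rownames(counts) <- NULL
  counts
}

# A and B played twice, A and C played twice
collapse_pairs(c("A", "B", "A", "C"), c("B", "A", "C", "A"))
```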
This graph clearly shows the three scenes: Koreans on the right, Europeans on the bottom left, and Americans on the bottom right. Even though the three are fairly distinct, there's a lot of crossover, especially among the players in the middle.
That's all I've got for today; getting the data into R took longer than I thought. For my next analysis, I'm going to work on a new algorithm that ranks players based on their performance and the difficulty of their opponents, and that might replace Elo. I've almost finished working out the math, but my plan is to use Metropolis-Hastings to estimate the players' skill levels from the likelihood of the realized game outcomes. I'm still a little new to Bayesian methods, so ideas/comments would be much appreciated!
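As a starting point for that rating idea, here is a minimal base-R sketch of random-walk Metropolis-Hastings. The model is an assumption for illustration, not the method described above: a Bradley-Terry (logistic) model in which player i beats player j with probability `plogis(s_i - s_j)`, plus an independent normal prior on each skill `s`. Metropolis-Hastings draws samples from the posterior rather than maximizing the likelihood directly; averaging the draws after a burn-in gives point estimates that can be sorted into a ranking. The names `log_post` and `mh_skills` and the tuning values are hypothetical.

```r
# Log-posterior: log-likelihood of every observed game under the assumed
# Bradley-Terry model, plus an assumed N(0, prior_sd^2) log-prior per skill.
log_post <- function(s, w, l, prior_sd = 1) {
  sum(plogis(s[w] - s[l], log.p = TRUE)) +
    sum(dnorm(s, 0, prior_sd, log = TRUE))
}

# Random-walk Metropolis-Hastings over all player skills at once.
mh_skills <- function(winner, loser, n_iter = 5000, step = 0.5, prior_sd = 1) {
  players <- sort(unique(c(winner, loser)))
  w <- match(winner, players)        # integer index of each game's winner
  l <- match(loser, players)         # integer index of each game's loser
  s <- setNames(rep(0, length(players)), players)
  current <- log_post(s, w, l, prior_sd)
  draws <- matrix(NA_real_, n_iter, length(players),
                  dimnames = list(NULL, players))
  for (it in seq_len(n_iter)) {
    proposal <- s + rnorm(length(s), 0, step)   # symmetric proposal
    cand <- log_post(proposal, w, l, prior_sd)
    # Accept with probability min(1, posterior ratio)
    if (log(runif(1)) < cand - current) {
      s <- proposal
      current <- cand
    }
    draws[it, ] <- s
  }
  draws
}

# Toy data: A beat B ten times, B beat C ten times
set.seed(1)
draws <- mh_skills(c(rep("A", 10), rep("B", 10)), c(rep("B", 10), rep("C", 10)))
sort(colMeans(draws[-(1:1000), ]), decreasing = TRUE)  # posterior mean skills
```

Updating every player at once is fine for a handful of players, but with thousands of TLPD players the acceptance rate would collapse, so a fuller version would update one player (or a small block of players) at a time.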
PS: Here's the R code I used to consolidate the data and generate the network graphs.
```r
# Consolidate the three TLPD exports into one data frame
rm(list = ls())
options(stringsAsFactors = FALSE)
# `dir` must be set to the folder holding the TLPD CSV files before this line
setwd(dir)
int <- read.csv("tlpd_international.csv")
int <- int[ , 1:12]
int$edition <- "International"
kor <- read.csv("tlpd_korean.csv")
kor <- kor[ , 1:12]
kor$edition <- "Korean"
beta <- read.csv("tlpd_beta.csv")
beta <- beta[ , 1:12]
beta$edition <- "Beta"
tlpd <- rbind(int, kor, beta)
# write.csv() always overwrites the file and ignores `append`
write.csv(tlpd, file = "tlpd.csv")

library(igraph)

# Color each player's label by their most-played race
getRaceColors <- function(players, tlpd){
  palette <- c(T = "#00005d", Z = "#890000", P = "#006e2f")
  races <- character(length(players))
  for (i in seq_along(players)){
    # Column 8 is used as the winner's race and column 11 as the loser's race
    games <- c(tlpd[tlpd$Winner == players[i], 8],
               tlpd[tlpd$Loser == players[i], 11])
    counts <- c(T = sum(games == "T", na.rm = TRUE),
                Z = sum(games == "Z", na.rm = TRUE),
                P = sum(games == "P", na.rm = TRUE))
    # Pick the largest count (ties go to the first of T, Z, P)
    races[i] <- palette[[names(which.max(counts))]]
  }
  races
}

# Game network for one event: one vertex per player, one edge per game
event <- "2011 Homestory Cup #3"
hsc <- tlpd[tlpd$Tournament %in% event, ]
hsccondensed <- hsc[ , c(7, 10)]   # the two players' name columns
hscgraph <- graph_from_data_frame(hsccondensed, directed = FALSE)
plot(hscgraph, vertex.label = V(hscgraph)$name, layout = layout_with_kk(hscgraph),
     vertex.size = 20, vertex.label.cex = 0.7, main = "Homestory Cup #3 Game Network")

# Game network for the whole TLPD
tlpdcondensed <- tlpd[ , c(7, 10)]
tlpdgraph <- graph_from_data_frame(tlpdcondensed, directed = FALSE)
V(tlpdgraph)$label <- V(tlpdgraph)$name
V(tlpdgraph)$size <- 0
V(tlpdgraph)$label.cex <- 0.75

# Keep only the k-core (current igraph numbers vertices from 1, so no "- 1")
k <- 30  # coreness threshold; try other values
cores <- coreness(tlpdgraph)
tlpdgraph2 <- induced_subgraph(tlpdgraph, which(cores >= k))
V(tlpdgraph2)$label.color <- getRaceColors(V(tlpdgraph2)$label, tlpd)
plot(tlpdgraph2, layout = layout_with_fr(tlpdgraph2), main = "TLPD Game Network",
     sub = paste0("Only players with ", k, "+ games against top opponents are shown (",
                  k, "-core)"),
     margin = c(-1, -1, -1, -1))
```