Terran’s relative racial advantage for being placed in the top 50 and top 100 on the Korean and US ladders is 1.6-1.9; that is, Terran is 60%-90% more likely to be placed there than would be expected given differences in race-choice base rates on the diamond ladders. This suggests that, for some reason and regardless of differences in regional race preference, Terran is just shy of being twice as likely as the average of zerg and protoss to be in the top 50 or top 100.
However, and more interestingly, zerg’s and toss’s representation in the top 50 and top 100 differs CONSIDERABLY depending on whether you look at the Korean or the NA ladder. In fact, their positions flip. Toss in NA sits right around what would be expected in both the top 50 and the top 100, whereas zerg has a major relative disadvantage in NA, being around 40-50% less likely to be placed in the top 50/100. In Korea, by contrast, zerg has a 60-70% advantage for being placed in the top 50 or 100, controlling for how many people play each race. This is pretty interesting given that Korea plays zerg the least of any of the major regions. Toss’s disadvantage in Korea (.57) is almost as great as zerg’s disadvantage in NA (.44).
Read the long post below for more details, caveats, limitations, and practical implications.
Original Post:
Below is an analysis I did using relative risk ratios to account for the fact that 1.) different numbers of people play different races, 2.) race representation differs from country to country, and 3.) race representation in the top 50 and top 100 differs by region. I did this because a recent post by a Blizzard employee acknowledged that there are balance differences across regions that should be considered in discussions of who is or is not OP/IMBA. Another poster posted a different analysis that was also interesting (http://www.teamliquid.net/forum/viewmessage.php?topic_id=144694), but it didn’t really get at what I was curious about, and it is confounded by the fact that win % is unreliable, because Blizzard tries to force your win% toward 50%. There are several decent posts in that thread, btw, so check it out. I used a simple statistical procedure called relative risk analysis.
The procedure I used is modified slightly from most textbook explanations to fit the question I had. It is very simple and explained below. I have included a summary (above) for people who feel this is TLDR. For people who are interested, want to argue, correct me, or replicate after patches etc., I have explained why I used relative risk and how I used this statistic.
What is Relative Risk?
Relative Risk is a statistic used in psychology and medical science to identify risk (e.g. smokers are 20 times as likely as nonsmokers to develop cancer), disproportionate representation (e.g. African American children are twice as likely as their white counterparts to be identified as having an emotional disturbance in the preschool public education system), or athletic advantage (e.g. players are 1.5 times as likely to win against a particular team on their home field as in away games).
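For anyone who hasn’t seen the statistic before, here is a minimal Python sketch of textbook relative risk from a 2x2 table of counts. The counts are made-up illustration values, not data from this post, and the function name relative_risk is just my own label.

```python
def relative_risk(exposed_events: int, exposed_total: int,
                  unexposed_events: int, unexposed_total: int) -> float:
    """Risk of the outcome in the exposed group divided by risk in the unexposed group."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts: 30 of 100 exposed people and 15 of 100 unexposed
# people have the outcome, so the exposed group is twice as likely to.
rr = relative_risk(30, 100, 15, 100)
print(f"Relative risk: {rr:.2f}")  # prints Relative risk: 2.00
```

A value of 1 means no difference between the groups; above 1 means the outcome is more likely in the first group, below 1 means less likely.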
What is my question?
I was interested to see, controlling for differences in representation by region, what the statistical likelihood is that a given race would be represented in the top 50 and top 100 players in North America and Korea. Specifically, I hypothesized that Terran would be overrepresented in the top 50 and top 100, controlling for differences in base rates of diamond play (i.e. controlling for the fact that terran/toss are overrepresented and zerg is underrepresented), and that zerg would be underrepresented in diamond, controlling for the fact that not many people play zerg. I thought that these patterns would be consistent across regions. The data suggest I was right about the first hypothesis and wrong about the second.

Method & Results:
I calculated relative risk (hereafter called relative advantage) of diamond-level players being represented in the top 100 and top 50, controlling for differences in race representation among all diamond players. This lets me answer the question: “Controlling for the fact that some races (e.g. terran & protoss) are overrepresented in the diamond base rate, how likely is each race to be represented in the top 100 and top 50 players on the ladder?” I chose diamond as my base rate because reaching diamond suggests at least some skill or knowledge of competitive play. I used the web page http://rts-sanctuary.com/ to get the data.
To calculate a relative advantage, I first find the percentage of diamond players within a region who play each race. For example, a few days ago (when I did these analyses), NA was ~34% toss, 29% terran, and ~23% zerg. I’ll call this the base rate. I then divide each race’s top 50 (or top 100) percentage by this base rate (top50% / baserate%). In NA, the top 50 representation is ~36% toss, ~48% terran, and ~14% zerg.
I can then calculate a ratio that tells me whether a race is over- or underrepresented in the top 50 and top 100, relative to what would be expected given its base rate. A value of 1 on this statistic means the race is represented exactly as would be expected given its diamond-level base rate. For example, in NA, if I divide the top 50 zerg % (which was 14% when I did this) by the base rate of 23%, I get about .6, which suggests that zerg is 40% less likely (or .6 times as likely) to be represented in the top 50 than would be expected given its overall diamond representation. Terran’s value of 1.6 suggests that this race is 1.6 times as likely (or 60% more likely) to be placed in the top 50 given its overall representation in diamond. (Toss’s value of about 1 indicates they are represented around what would be expected.)
Finally, I can get a relative advantage statistic by dividing a target race’s representation ratio by the average ratio of the comparison races. If I divide terran’s relative representation statistic by the AVERAGE of the zerg and toss statistics, I get the relative advantage statistic I’m interested in. In NA, this value is 1.9, which suggests that, compared to toss and zerg (each adjusted for its base rate), terran is almost twice as likely to be represented in the top 50 players. You could reverse this, putting the terran and protoss average in the denominator and zerg in the numerator, to get the relative disadvantage for zerg, which would be .44.
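The base-rate step can be sketched in Python with the approximate NA percentages quoted above. The dictionary and function names are my own, and the inputs are the rounded "~" values, so the ratios only roughly match the ones in the post.

```python
# Approximate NA shares (in percent) quoted in the post.
na_diamond_base_rate = {"toss": 34, "terran": 29, "zerg": 23}
na_top50_share = {"toss": 36, "terran": 48, "zerg": 14}


def representation_ratio(top_share: float, base_share: float) -> float:
    """A race's top-N share divided by its diamond base rate.

    1.0 means the race shows up in the top N exactly as often as its
    share of the diamond population would predict.
    """
    return top_share / base_share


for race, base in na_diamond_base_rate.items():
    ratio = representation_ratio(na_top50_share[race], base)
    print(f"{race:8s} top50/base = {ratio:.2f}")
```

With these inputs the sketch gives roughly 1.06 for toss, 1.66 for terran and 0.61 for zerg.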
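Translating a ratio into "X% more/less likely" wording is just (ratio - 1) expressed as a percentage. A small sketch, where describe is a hypothetical helper name and the inputs are the rounded NA top-50 ratios from the paragraph above:

```python
def describe(ratio: float) -> str:
    """Turn a representation ratio into a 'percent more/less likely' phrase."""
    pct = round((ratio - 1) * 100)  # e.g. 0.6 -> -40, 1.6 -> 60
    if pct > 0:
        return f"{pct}% more likely than expected"
    if pct < 0:
        return f"{-pct}% less likely than expected"
    return "represented about as expected"


# Rounded NA top-50 ratios quoted in the post.
for race, ratio in {"zerg": 0.6, "terran": 1.6, "toss": 1.0}.items():
    print(f"{race}: {describe(ratio)}")
```

Note that a ratio of .6 means ".6 times as likely", which is the same thing as "40% less likely".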
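Here is a rough Python sketch of the full NA top-50 calculation, from base rates to relative advantage. It assumes the rounded "~" percentages quoted in the post, and relative_advantage is my own name for the "target ratio divided by the average of the other races’ ratios" step.

```python
from statistics import mean

# Approximate NA shares (percent) quoted in the post.
base_rate = {"toss": 34, "terran": 29, "zerg": 23}
top50 = {"toss": 36, "terran": 48, "zerg": 14}

# Step 1: representation ratio per race (top-50 share / diamond base rate).
ratios = {race: top50[race] / base_rate[race] for race in base_rate}


def relative_advantage(target: str, ratios: dict[str, float]) -> float:
    """Target race's ratio divided by the AVERAGE ratio of the other races."""
    others = [r for race, r in ratios.items() if race != target]
    return ratios[target] / mean(others)


for race in ratios:
    print(f"{race}: relative advantage {relative_advantage(race, ratios):.2f}")
```

With these rounded inputs the sketch gives about 1.99 for terran and 0.45 for zerg, close to the 1.9 and .44 above; the small gaps likely reflect rounding in the quoted percentages.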
I can do the same thing in Korea too. In Korea, terran’s relative advantage statistic is lower but of around the same magnitude: 1.5. HOWEVER, if you look at zerg and toss you find something interesting: their relative representation statistics are practically reversed, with toss around 50% less likely to be represented in the top 50 and zerg 60% MORE likely to be represented in the top 50. In fact, zerg’s relative advantage statistic is 1.7, which suggests zerg is 1.7 times as likely to be represented in the top 50, controlling for base rates, as the average of terran and protoss in Korea.
Although these results are pretty interesting, I was worried that they might be due to the top 50 being too small a sample to give stable estimates, so I redid the analysis on the top 100, which should provide more stable estimates. To do this, the denominator of the relative representation statistic stays the same, but the numerator changes to include the next 50 players. Top 100 race percentages in NA are .31, .45, and .22 for toss, terran, and zerg. In Korea they are .29, .42, and .27 for toss, terran, and zerg. Running the same statistics as above, a similar trend holds. Terran holds an advantage in both countries, and this advantage is larger in NA (1.6X) than in Korea (1.3X). As in the top 50, zerg’s and toss’s representation is drastically different (and therefore so are their relative advantage/disadvantage statistics) when comparing NA (.77) to Korea (1.25).
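The top-100 recalculation for NA can be sketched the same way: only the numerator changes. Korea’s diamond base rates aren’t listed in the post, so this sketch covers NA only, and it assumes the approximate NA base rates quoted earlier (~34% toss, 29% terran, ~23% zerg).

```python
from statistics import mean

# NA diamond base rates (denominator, unchanged from the top-50 analysis)
# and NA top-100 shares, both as fractions, from the post's approximate figures.
na_base_rate = {"toss": 0.34, "terran": 0.29, "zerg": 0.23}
na_top100 = {"toss": 0.31, "terran": 0.45, "zerg": 0.22}

ratios = {race: na_top100[race] / na_base_rate[race] for race in na_base_rate}


def relative_advantage(target: str, ratios: dict[str, float]) -> float:
    """Target race's ratio divided by the average ratio of the other races."""
    others = [r for race, r in ratios.items() if race != target]
    return ratios[target] / mean(others)


for race in ratios:
    print(f"{race}: ratio {ratios[race]:.2f}, "
          f"relative advantage {relative_advantage(race, ratios):.2f}")
```

This gives terran about 1.66 and zerg about 0.78, in line with the NA figures above (1.6X for terran; the .77 quoted for NA matches zerg’s value here).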
Limitations:
I don’t know how accurate RTS Sanctuary’s data is compared to what Blizzard has in its database.
While these statistics are useful for examining representation, they do not support causal inferences. I also didn’t calculate confidence intervals or p values, which weakens generalizability. Granted, these estimates will change frequently anyway. We can simply say that, on average, across America and Korea, Terran is overrepresented in the top 100. We can also say that the relative advantage or disadvantage of zerg and protoss depends on the region you are from, with NA placing relatively more toss in the top 50/100 and Korea placing a shockingly high number of zerg in the top 50/100.
This is even more noteworthy given that Korea has the LOWEST base rate of diamond zerg players of all the major regions (only LA is lower). Why these statistics are the way they are will be left to speculation, opinion, or anecdotal observation unless Blizzard lets us data mine other statistics, as there is no hard and fast way to “prove” why this is. Some explanations are probably more plausible than others, but I prefer not to draw those inferences here.
Practical Implications for NA Ladderers:
If you are a pragmatist and do not necessarily care why these statistics are the way they are, yet want practical implications for laddering, here are some:
1.) Play terran instead of the race you are playing
2.) If you are a zerg and do not want to play terran, watch Korean replays instead of American replays
3.) You will probably be better off watching Korean replays regardless of the race you play
