Back in August, I was kicking back watching some of the top Korean pros battling it out on the MLG Raleigh stream. It's such a pleasure to see SC2 becoming an international esport, and it warms my heart to see the competition between Koreans and foreigners at MLG. It warms my heart even more to see foreigners winning over Koreans - go HuK! However, as I watched throughout the final day of MLG Raleigh (a.k.a. battle of the Koreans), something seemed amiss. First, Puma seemed to be getting supply blocked multiple times in every game. Then, NaDa started to look sloppy; in one game he was floating 1600 minerals while on one base! Hurricane Irene was passing through the area, and the amount of lag was getting serious, so it seemed only natural to attribute these slips to the less than ideal playing environment.
However, even taking that into account, I was a little shocked. From regularly watching GSL I felt that the top Koreans usually macroed much better than this. And how could NaDa, the macro-God, ever be floating that much money? His macro skills are off-the-charts! But what "charts" am I talking about? How would one actually go about proving that a player is macroing better or worse than they usually do? And if one had a reliable way of assessing that, wouldn't it be interesting to see how different players and regions of the world compare? At the time I was stumped; there didn't seem to be any straightforward way of measuring how well a player was performing in this respect.
Fast forward a few weeks, and I had just completed a large study of macro across the North American ladder. One of the outcomes of this analysis was a scale called the Spending Quotient (SQ), which measures numerically how well a player performed in terms of keeping their unspent resources low relative to their income (resource collection rate). Of course, this was not a comprehensive measure of macro, but it seemed to be the type of metric I was looking for.
Many readers suggested that it would be interesting to use this scale to compare pros from different ladder regions, and to see just how insanely the world's best players score on this scale. So, that's exactly what I set about doing!
This time around, I was fortunate enough to have some help in collecting data. Kudos goes to Flew for his superb work collecting data for Europe.
To begin the analysis, we collected ladder data for top level players on the American ladder. First, the order of the top 10 players in Grandmaster league was recorded. For each of these players, we then viewed their last 30 ladder 1v1 games. For most pros, this covered a time period of days to weeks. From each game, the following data were collected:
• Average Unspent Resources
• Average Income (Resource Collection Rate)
We then computed the SQ for each of the 300 collected games, and calculated an average SQ score for each player.
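For reference, the per-game calculation can be sketched as below. The formula and its constants (35, 0.00137, 240) are assumed from the earlier North American ladder write-up rather than stated in this article, and the function names are purely illustrative.

```python
import math

def spending_quotient(avg_income, avg_unspent):
    """SQ for a single game.

    Assumed formula (from the earlier SQ write-up, not this article):
    SQ = 35 * (0.00137 * income - ln(unspent)) + 240
    """
    if avg_income < 0 or avg_unspent <= 0:
        raise ValueError("income must be >= 0 and unspent resources > 0")
    return 35 * (0.00137 * avg_income - math.log(avg_unspent)) + 240

def average_sq(games):
    """Mean SQ over a list of (avg_income, avg_unspent) pairs."""
    scores = [spending_quotient(income, unspent) for income, unspent in games]
    return sum(scores) / len(scores)

# Sanity check against the one game for which this article quotes both
# inputs: Resource Collection Rate 2239, Average Unspent Resources 653.
best_game = spending_quotient(2239, 653)
print(f"SQ = {best_game:.1f}")
```

Under this assumed formula, that game comes out at about 120.5, which is consistent with the score of 121 reported for it further down.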
This procedure was then repeated for the European ladder, and for other well-known players on both the American and European ladders who were not in the top 10 at the time of data collection.
We would very much like to extend this analysis to include players on the Korean, Chinese and South East Asian servers. If you are interested in contributing to this analysis and have an account on one of these servers, please PM me for more details. The names of contributors will be added to the article's header.
An important first question is whether SQ is even an appropriate scale for comparing top level players. Since SQ is only a measure of spending efficiency, it is quite possible that increased harassment results in lower SQ scores. Since higher level players are better at both macro and harassment, it is feasible that the two effects cancel out, meaning SQ can no longer distinguish differences in spending skill at the highest levels.
To investigate this, I tested whether the top 10 players on the NA ladder score significantly better SQs than the top 100 players. Previously, I had collected the last 3 games from each of the top 100 players on the American ladder, yielding a total of 300 games. Here, I compared the SQ scores for those 300 games to the 300 games collected from the top 10 players.
As can be seen above, the top 10 players achieve significantly higher SQ scores than the top 100 players on average (p<0.0001, two-tailed t-test). Particularly impressive is the fact that the 6-point gap between the top 10 (avg. SQ = 88) and the top 100 (avg. SQ = 82) is not far short of the 10-point gap between the top 100 and Masters (avg. SQ = 72). This finding is encouraging, since it shows that even at the top level, SQ continues to correlate with overall skill. Also interesting to note is that SQ varies less at higher skill levels; the SQ standard deviation is 12 for Masters, 11 for GM top 100, and 9 for GM top 10.
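As a rough check, the significance of the top 10 vs. top 100 gap can be reproduced from the summary numbers above alone (means of 88 and 82, standard deviations of 9 and 11, 300 games in each group). The sketch below assumes an unequal-variance (Welch) t statistic and a normal approximation to the p-value, which is reasonable at this sample size. It is not necessarily the exact test that was run on the raw data.

```python
import math

def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t statistic for a difference in means, without assuming equal variances."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

def two_tailed_p_normal(t):
    """Two-tailed p-value using a normal approximation (assumes large samples)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Summary statistics quoted in the text: top 10 vs. top 100, 300 games each.
t_stat = welch_t(88, 9, 300, 82, 11, 300)
p_value = two_tailed_p_normal(t_stat)
print(f"t = {t_stat:.2f}, p = {p_value:.1e}")
```

With 300 games per group, the 6-point difference sits several standard errors from zero, so the p-value lands far below the 0.0001 threshold quoted above.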
Plotting unspent resources against income also showed robust differences between the top 10 and top 100 players. Both groups followed an approximately exponential relationship between unspent resources and income, but with different levels of spending efficiency. Here, green corresponds to Masters, red corresponds to top 100, and blue corresponds to top 10.
Next, I tabulated the data from all the American accounts for which data were collected (including the top 10 and the other notable pros). The players were then ordered by average SQ score over their last 30 games.
Note that these are all extremely high average scores. However, it is interesting to note the amount of variability, even at the very highest levels of play.
The same analysis was performed for the European top 10, and other notable pros. The results are tabulated below, with players ordered by average SQ score over their last 30 games.
Coming soon.
Coming soon.
Coming soon.
Before getting into the analysis, there are numerous caveats that must be considered.
1) While SQ generally increases with skill, it is possible that different playstyles may lead to higher or lower SQ scores. And it may be that different playstyles are more common in particular regions. For example, let's suppose two base all-ins result in a higher SQ score, and two base all-ins are more common on a particular server. The average SQ score may be inflated for that region, despite the players not being inherently any better at spending. I have not found any correlations between SQ and game duration in any of the leagues, but that does not entirely rule out this possibility.
2) In my previous analysis, I found that each race scored equally well in terms of SQ at all levels of the ladder. However, it is possible that at the absolute highest level, the races run into different ceilings on spending. This would introduce a bias against certain players, and may even slightly affect the average SQ for Grandmaster league between regions if there are different racial compositions. I don't yet have enough data to test this hypothesis.
3) If SQ is indeed balanced for all races, it is nonetheless possible that different match ups could differentially affect SQ. For example, Protoss could score the same as the other races on average, but score worse in PvP and better in PvT and PvZ. This is an analysis that I've not yet performed, so I don't know whether this is the case or not.
4) It's important to remember that many of the top players in each region are frequently in other regions (e.g., Korea). Furthermore, players often play on multiple regions. This means the top 10 for a particular region is not necessarily representative of the depth of the skill pool in each region. Caution needs to be exercised in analyzing the results - they should not be used to conclude that players from region A are better than players from region B.
5) It's been said before, but I'll say it again: SQ is not everything. The Spending Quotient does not take any strategic factors into account. It simply measures how well a player spends their money relative to all other players at the same level of income. When averaged across many games, it has been shown to correlate well with overall skill, and it measures one of the core skills. But it does not directly measure anything else.
With those points all in mind, let's go ahead. To compare the spending performances for the top players between ladder regions, I plotted the distributions of SQ scores for the top 10 ladder-ranked players in each region.
As you can see, the distributions for the American and European servers are actually very similar. However, the average SQ is slightly higher for America than Europe (88 vs. 84), and the difference is statistically significant (p<0.0001, two-tailed t-test).
Two players appear on both the European and American servers. These are ToD (currently based in Europe) and EGDeMusliM (currently based in the US). These players provide a unique opportunity to test whether SQ is robust between servers, especially when playing on a non-local server (presumably with slightly higher lag).
In both cases, the players score very similarly on both servers. Neither difference is found to be statistically significant (p = 0.4 for EGDeMusliM; p = 0.9 for ToD). EGDeMusliM's standard deviation is a little larger for the American games than for the European games; this may just be a quirk of the 30 game sample.
Below is the leaderboard for all pro players that have been analyzed to date with an average SQ greater than 80. Colors correspond to regions, and some players appear on multiple servers.
EGDeMusliM, who appears near the top of this list on two servers, seems to be almost as good at spending as he is at breaking his arm.
Comparing SQ scores on the ladder is one thing. But ideally, one would want to compare pros playing in the exact same environment, such as a tournament. To address this, I also analyzed the SQ scores of all top 8 players at MLG Orlando 2011. I did this by searching through the match histories of the players during the tournament - a different way of following the action live!
Below are the summary statistics for each player, presented as player cards.
This was a really nice opportunity to compare players from multiple regions, and it generated some interesting findings. Chiefly:
1) Pros are incredibly good at spending
Well, duh. But still, it's impressive to see just how good. Bomber scored 115 in one game at MLG. Try doing that! The ladder analysis turned up only two scores better than this: 116 by EGDeMusliM, and an astounding 121(!) by DroneKing (LiquidRet). For those interested, that last score corresponded to a Resource Collection Rate of 2239 and an Average Unspent Resources of just 653. Truly incredible efficiency.
2) Pros turn it up to 11 at tournaments
One might expect that pros macro even better in a tournament environment than on the ladder. This appears to be true, at least based on a comparison of IdrA's ladder and tournament performances:
The difference in means is statistically significant (p = 0.02, two-tailed t-test). It would be intriguing to extend this analysis to include other pros to see the gulf between their ladder and tournament performances. It may even be that nerves cause some players to macro worse at tournaments. In the case of IdrA, however, the ladder-Gracken doesn't seem to be at full capacity.
3) Protoss scored lower SQs at MLG
Among the top 8 players, Protoss scored notably lower than the other races. While this is an extremely small sample of players, I found this surprising, especially since the sample included two of the best Protoss players in the world (HuK and MC). Whether this is down to their particular playstyles, skill sets, or some sort of systematic SQ bias against Protoss due to the race's mechanics can only be tested with additional data from other top Protoss players. It certainly seems possible for Protoss players to score extremely high SQs in individual games, but the highest average SQ observed for a Protoss player to date is 89 (ToD). If no higher SQs are found once the Korean ladder is analyzed, it may be necessary to adjust the way SQ is scored at the top level.
4) Spending ain't everything
Of the top 8 players, the lowest SQ belongs to the winning player, HuK! This shouldn't be so surprising, since there are clearly other factors that contribute to overall skill, and HuK is particularly good in the micro department. This doesn't mean SQ isn't a useful metric, but it does emphasize that it is not a comprehensive measure of skill - and we shouldn't expect it to be.
I hope you found this analysis interesting! I think it turned up a number of notable findings, and it again confirms that SQ is a good metric for self-assessment. However, it also opens some new questions, including whether SQ is being fair to Protoss at the highest level. Since the production cycle works differently for different races, it is possible that the races run into different ceiling values of SQ at the very highest level. This doesn't seem to be an important factor for regular Grandmaster players down to Bronze, but for pros it may be important to consider.
I look forward to hearing feedback from the TL community. I'd especially like to know whether people would want to see further analyses like this in the future, and whether there are any suggestions. Finally, if anybody could help to collect data for the remaining servers, I would be very thankful.