|
I like competition statistics. They make a competitive scene that much more interesting, and tournaments have a very simple opportunity to add another layer of map statistics that would make their presentation of map balance more informative.
Watching the Korean Weekly, I often see map match-up statistics. Sometimes, however, the sample size is so small that it's hard to judge how "balanced" a map really is. While not perfect, a simple Chi-Square test can assign these numbers a p-value, and it's a calculation Excel is completely capable of.
The Chi-square test compares observed numbers to expected numbers, accounts for sample size, and produces a p-value reflecting how likely the observed results would be if the expected probabilities held up over the long term.
http://en.wikipedia.org/wiki/Chi-squared_test
Excel: =CHISQ.TEST([observed array],[expected array])
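For anyone curious what CHISQ.TEST is doing under the hood, here is a minimal Python sketch of the same computation (the function name chi_square_p is just for illustration):

from scipy.stats import chi2

def chi_square_p(wins, losses):
    # Pearson chi-square statistic against a 50/50 expectation
    expected = (wins + losses) / 2
    statistic = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # Upper-tail probability with 1 degree of freedom
    return chi2.sf(statistic, df=1)

print(chi_square_p(12, 6))  # ~0.157, same as =CHISQ.TEST({12,6},{9,9})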
Here are a few sample win/loss records with p-values calculated using Excel's Chi-Square test.
For reference, the usual convention is p < .05 => statistically significant => map is "imbalanced"
12 v 6 => p = 0.157 => not imbalanced
5 v 1 => p = 0.10 => not imbalanced
42 v 60 => p = 0.075 => not necessarily imbalanced
88 v 90 => p = 0.88 => not imbalanced
120 v 90 => p = 0.038 => imbalanced
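These numbers are easy to verify; a quick sketch using scipy.stats.chisquare, which defaults to equal expected frequencies:

from scipy.stats import chisquare

records = [(12, 6), (5, 1), (42, 60), (88, 90), (120, 90)]
for wins, losses in records:
    p = chisquare([wins, losses]).pvalue  # expected defaults to an even split
    verdict = "imbalanced" if p < 0.05 else "not imbalanced"
    print(f"{wins} v {losses}: p = {p:.3f} => {verdict}")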
Ladder statistics, because of the elaborate matchmaking system, would not be 100% appropriate to digest this way. It should be fine, however, for players at the highest level of play who are matched up essentially at random. If tournaments already have the numbers, they're probably in a pre-formatted Excel file, so a Chi-square test should be an easy-to-implement addition that makes the stats more valuable and interesting.
|
While I applaud the effort to bring some rigor to statistical analyses of balance, let us remember a few things:
1) The value of p = 0.05 for significance is for a single analysis. Once we start making multiple comparisons, we need to adjust the p-value appropriately. Even if the null hypothesis of perfect balance is true, if we look at statistics for 20 maps, we would expect 1 map to yield p < 0.05 by pure chance (see the simulation sketch after this list).
2) The value of p = 0.05 is arbitrary. The cut-off value for any statistical analysis should be chosen with two things in mind:
(i) How important is it to find a difference? In other words, are we more worried about type I errors (claiming the game is imbalanced when it is really balanced) or type II errors (claiming the game is balanced when it is really imbalanced)?
(ii) What is the Bayesian prior probability? Based on all the evidence to date, how likely is our null hypothesis that the game is balanced? My interpretation is that the game is quite balanced (i.e., the probability of the null hypothesis being true is pretty high), in which case I need to see a very low p-value to make me reconsider that position.
3) The p-value is not the probability that the null hypothesis is true (i.e., that the game is balanced). It is the probability that we would observe a given result if the null hypothesis were true. This is why the Bayesian prior is important; without it, the p-value alone is meaningless.
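To put a number on point 1, here is a quick simulation sketch (assuming 20 perfectly balanced maps with 100 games each; all values are made up for illustration):

import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(0)
n_maps, games, trials = 20, 100, 2000

hits = 0
for _ in range(trials):
    wins = rng.binomial(games, 0.5, size=n_maps)  # every map is truly 50/50
    pvals = [chisquare([w, games - w]).pvalue for w in wins]
    hits += any(p < 0.05 for p in pvals)

# Chance alone flags at least one "imbalanced" map well over half the time
# (about 1 - 0.95**20 in the idealized continuous case).
print(f"P(at least one map with p < 0.05) ~ {hits / trials:.2f}")

A Bonferroni correction (using 0.05/20 as the per-map cut-off) is the simplest way to restore the intended 5% family-wise error rate.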
|
The limits of the p-value are definitely worth noting. There may still be a way to use p-values that makes more sense.
Instead of using the p-value to declare "this map is balanced" or "this map is imba," we could compare map to map and say "this map is more likely to be balanced." The p-value wouldn't be ideal for arguing "Map A is more/less balanced than Map B," but I think the "more likely to be balanced" framing is both more interesting and more useful.
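If a direct map-to-map comparison is wanted, one option (not from the post above, just a sketch) is a chi-square test of homogeneity on a 2x2 table, which asks whether two maps share the same underlying win rate; here using two of the sample records from earlier:

from scipy.stats import chi2_contingency

map_a = [120, 90]  # wins v losses on hypothetical Map A
map_b = [88, 90]   # wins v losses on hypothetical Map B

statistic, pvalue, dof, expected = chi2_contingency([map_a, map_b])
print(f"p = {pvalue:.3f}")  # small p => the two maps' win rates likely differ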