Map stats: testing whether a map is balanced

Bill307

Canada 9103 Posts

December 25 2005 08:58 GMT

With the unveiling of PGT's map stats (here: http://www.pgtour.net/ladder.stats.php ) I decided to try to answer the question: is map M balanced? To do this, I used what I learned in statistics last term. Note that some of this might be wrong, and I encourage any up-and-coming statisticians in the audience to correct me if I've made a mistake.

We all know that there is a certain amount of variance in the win %s on maps. For example, suppose map M is balanced for some matchup. If 4 games in this matchup are played on M, we expect 2 wins and 2 losses for each race. But we know there is a chance we will get 3-1 or even 4-0, just by luck. So when we see a 47.7% win ratio for PvZ (as is the case with Luna at the moment), we have to wonder: is it possible the map is perfectly balanced for PvZ, but by random luck zerg ended up winning more matches than protoss?

Unfortunately we don't know what exactly this random variance is. But we can estimate it using statistics.

(WARNING: DO NOT READ IF MATH IS BORING!)

Let's say a win = 1 and a loss = 0, and the probability of a win is "pwin". Also, let n = #wins + #losses. Then we have:

s^2 = [ (#wins)*(1 - pwin)^2 + (#losses)*(pwin)^2 ] / [ n - 1 ]

This is our estimate for the variance of the data.
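Here is a minimal sketch of that variance estimate, written in Python rather than the R used later in this post. The function name is made up for illustration, and the 242:265 record is the Luna The Final PvZ result from the table at the end of the post.

```python
def variance_estimate(wins, losses, pwin):
    """s^2 from the formula above: each win counts as 1, each loss as 0,
    and deviations are measured from the hypothesised pwin."""
    n = wins + losses
    return (wins * (1 - pwin) ** 2 + losses * pwin ** 2) / (n - 1)

# Luna The Final PvZ (242 wins, 265 losses) under the hypothesis pwin = 0.5
s2 = variance_estimate(242, 265, 0.5)
print(round(s2, 6))
```

With pwin = 0.5 every game contributes 0.25 to the sum, so s^2 is always just a bit above 0.25 here.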

Now, we want to test our hypothesis that the probability of a win "pwin" = 0.5, i.e. the map is perfectly balanced. To do this, we'll ask the question: what is the probability of getting a result that is more extreme than a 47.7% win ratio? Statistical theory tells us that we know the probability distribution of the following quantity:

#wins/n - pwin
---------------
sqrt(s^2 / n)

(for the curious: it approximately follows a Student's t distribution with n-1 degrees of freedom)
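As a rough sketch, the quantity above can be computed like this in Python (the function name is hypothetical; the win/loss counts are the Luna and Rush Hour PvZ records from the table at the end of this post):

```python
import math

def t_statistic(wins, losses, pwin):
    """(#wins/n - pwin) / sqrt(s^2 / n), using the variance estimate s^2."""
    n = wins + losses
    s2 = (wins * (1 - pwin) ** 2 + losses * pwin ** 2) / (n - 1)
    return (wins / n - pwin) / math.sqrt(s2 / n)

print(t_statistic(242, 265, 0.5))  # Luna The Final PvZ
print(t_statistic(631, 891, 0.5))  # Rush Hour 2.0 PvZ
```

These should come out very close to the -1.02046 and -6.66229 used further down in this post.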

This lets us calculate the probability we want.

(THIS PART EXPLAINS HOW TO CALCULATE RESULTS YOURSELF: SKIP TO THE BOTTOM IF YOU JUST WANT THE RESULTS I'VE ALREADY DONE!)

Now here's an actual explanation of HOW to do this with PGTour's map stats. You will need the statistical software "R" ( http://cran.r-project.org/bin/windows/base/ ). Actually any statistical software should do, but my instructions will use R.

Let's check whether PvZ is perfectly balanced on Luna The Final. I did the calculations in Mathematica, using 242 wins, 265 losses (n = 507) and pwin = 0.5.

So in the end we have the number -1.02046. Start up R and enter: "pt(-1.02046, 506)" where 506 = n - 1. This gives us the value 0.153999, which is the probability of getting a number even lower than -1.02046. This is also the probability of getting a win ratio even worse than 47.7%, which is precisely what we want.

Since 0.153999 is bigger than 0.1 and smaller than 0.9, by convention in statistics we say that we have "no evidence" against our hypothesis. In other words, it is entirely possible that Luna is perfectly balanced in PvZ, and that the win % 47.7 differs from 50% by pure luck and random variance.
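If you don't have R handy, here is a rough Python sketch of the same step. Note the swap: Python's standard library has no Student's t distribution, so this uses the normal distribution instead, which is only an approximation that gets close when n is in the hundreds. Both function names are made up for illustration.

```python
from statistics import NormalDist

def lower_tail_probability(t):
    """Normal approximation to R's pt(t, df); reasonable when df is large."""
    return NormalDist().cdf(t)

def verdict(p):
    """The convention described above: between 0.1 and 0.9 is 'no evidence'."""
    if 0.1 < p < 0.9:
        return "no evidence against balance"
    return "some evidence against balance"

p = lower_tail_probability(-1.02046)  # Luna PvZ statistic from this post
print(round(p, 3), verdict(p))
```

The approximate value lands within about 0.001 of the 0.153999 that R reports, so the conclusion is the same.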

Now let's try a more extreme example: PvZ on Rush Hour, which has a win ratio of 41.5% after 1522 PvZ matches (631 wins, 891 losses). Again I did the calculations in Mathematica.

This time we have -6.66229. In R we write "pt(-6.66229, 1521)" to get the probability of getting an even worse win ratio than 41.5%: 1.878543 x 10^-11, a VERY small number. Basically, after 1522 games are played, we should essentially NEVER see a win ratio of 41.5% if PvZ is perfectly balanced on the map.

Maybe the actual probability of winning PvZ is 45% rather than 50%? But doing the calculations with pwin=0.45, we end up with the probability 0.002617247, which is still "strong evidence against the hypothesis" (in other words, the chance of winning PvZ is even less than that).

Let's try to find best-case and worst-case scenarios for PvZ using this method (there's a better way to do this, called finding a "confidence interval" for pwin, but I don't feel like doing it right now).

pwin = 43% : probability of more extreme result is 0.1113971
pwin = 40% : probability of more extreme result is 0.1243038
(R returns 0.8756962, but because 40% is lower than 41.5%, we're actually interested in the probability 1 - 0.8756962 = 0.1243038)
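The scan over different pwin values can be sketched in Python as below. This is an approximation under stated assumptions: it uses the normal distribution in place of R's pt (fine for 1522 games), the function name is hypothetical, and the 631:891 record is Rush Hour PvZ from the table at the end of this post.

```python
import math
from statistics import NormalDist

def prob_more_extreme(wins, losses, pwin):
    """One-sided probability of a result further from pwin than the observed one."""
    n = wins + losses
    s2 = (wins * (1 - pwin) ** 2 + losses * pwin ** 2) / (n - 1)
    t = (wins / n - pwin) / math.sqrt(s2 / n)
    lower = NormalDist().cdf(t)
    # If the observed ratio is above pwin, the "more extreme" side is the upper tail.
    return lower if t < 0 else 1 - lower

for pwin in (0.50, 0.45, 0.43, 0.40):
    print(pwin, prob_more_extreme(631, 891, pwin))
```

The printed values should agree with the numbers above to about three decimal places; the small differences come from using the normal distribution instead of Student's t.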

(RESULTS START HERE)

So in the best case, the probability of winning PvZ on Rush Hour is 43%, and in the worst case it is 40%. And Rush Hour is definitely NOT balanced for PvZ in general. (But keep in mind, this data was collected from all of PGTour!)

And that's it. Now anyone (given enough motivation) can ask the question "is map M balanced for this matchup?" and answer it for themselves.

I would like to try this method on some professional game map stats (e.g. from OGN) to evaluate whether certain maps are balanced at the professional level. But unfortunately I don't have any of these stats. If anyone wants me to do these calculations based on some promap stats, feel free to post the stats here and I'll do it. Also, if anyone wants to see confidence intervals for the probability of winning on different maps, I can do that too.

In closing, here are some more hypothesis tests for pwin = 0.5 (perfect balance) on several maps and in all matchups:

Legend:
matchup wins:losses (win%) - balanced? (probability of observing a more extreme win ratio)

Lost Temple 2.4
PvZ 113:128 (46.9%) - balanced (0.1679511)
PvT 286:276 (50.9%) - balanced (0.3367943)
TvZ 242:235 (50.7%) - balanced (0.3744892)

Luna The Final
PvZ 242:265 (47.7%) - balanced (0.153999)
PvT 471:347 (57.6%) - definitely NOT balanced (8.273895e-06)
TvZ 318:266 (54.5%) - probably not balanced (0.01598572)

Rush Hour 2.0
PvZ 631:891 (41.5%) - definitely NOT balanced (1.878543e-11)
PvT 1346:1171 (53.5%) - definitely NOT balanced (0.0002479879)
TvZ 1027:981 (51.1%) - balanced (0.1524416)

(note that by "balanced" what I really mean is, "there is no evidence here to show that the map is not balanced")

Before you jump to conclusions, don't forget that
a) these stats are based on ALL PGTour games, and
b) these calculations tell us PvZ and PvT on Rush Hour are both definitely not 50/50, but PvZ is pretty badly imbalanced (41.5/58.5) whereas PvT isn't really that bad (53.5/46.5). This is why it might be nice to have confidence intervals instead: an interval like 53%-54% might be acceptable imbalance, whereas 40%-43% might be unacceptable.

Ghin

United States 2391 Posts

December 25 2005 09:04 GMT

Why does this matter?

I tried to think of a nice way to say it but I couldn't.

Empyrean

16982 Posts

December 25 2005 09:12 GMT

I actually enjoyed reading that, something I probably shouldn't admit.

By the way, my math topic died a while ago. I had more problems :[

Bill307

Canada 9103 Posts

December 25 2005 09:12 GMT

On December 25 2005 18:04 Ghin wrote:
Why does this matter?

I tried to think of a nice way to say it but I couldn't.

The statistics part is just for the mathematically-curious. The results are for the map-balance-curious. For example, now people will realize that the 47.7% win ratio for PvZ on Luna is NOT evidence to say that Luna is imbalanced PvZ.

Edit: Furthermore, OGN and MBCGame map stats can be particularly misleading because the # of games is so small. For example, if you saw a map's stats were 5:10 PvZ, you might conclude that the map must be imbalanced for PvZ. But actually, if we run 5:10 through this statistical method, we find that this is NOT evidence to show imbalance either. (although it would definitely affect my liquibets)
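For a sample this small the probability can be worked out exactly. The sketch below is not the t-based method from the first post; it swaps in a plain binomial tail sum, and the function name is made up for illustration.

```python
from math import comb

def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) when X counts wins in n games, each won with probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# 5 wins and 10 losses: how likely is 5 or fewer wins on a balanced map?
p = binom_lower_tail(5, 15)
print(p)
```

This comes out to 4944/32768, or about 0.15, which is above the 0.1 cutoff from the first post, so a 5:10 record on its own is no evidence of imbalance.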

Pat

Canada 721 Posts

December 25 2005 09:45 GMT

I think that a minimum of like... 300 games per matchup (or even more) should be needed to be able to conclude anything. With 300 games per matchup, you can kinda "assume" that players' skills follow a "normal curve".

ManaBlue

Canada 10458 Posts

December 25 2005 09:50 GMT

Cool. Nice presentation Bill. I guess they really do teach you kids something down the road at UW.

gravity

Australia 1847 Posts

December 25 2005 09:53 GMT

Even if an imbalance isn't statistically proven, that doesn't mean you can't see one anyway, when you combine the stats with more subjective analysis.

Pat

Canada 721 Posts

December 25 2005 09:56 GMT

If you have enough games (300 + IMO) I think that these stats will be pretty accurate about balancing.

Liquid`Jinro

Sweden 33719 Posts

December 25 2005 10:05 GMT

Blah, luna pvt is FAIRLY balanced.. It's just that it's a million times easier for protoss on lower levels -,-

hasuprotoss

United States 4612 Posts

December 25 2005 10:08 GMT

#10

I guess using these stats could help show what makes a map balanced. However, there are a lot of things that go into these things. Like Luna's PvT imbalance could be caused by things that LT has/lacks. You can't point your finger at one thing. Is it just the cliff over the nat? The middle not being buildable? Maybe lack of islands? All of these things combine.

SoMuchBetter

Australia 10606 Posts

December 25 2005 10:14 GMT

#11

are the stats from the previous season still available? we could make better assumptions if we had those

doodoohead101

United States 2 Posts

December 25 2005 10:27 GMT

#12

The probabilities make sense. But you can use the exact distribution (binomial) and avoid the approximation using the t-distribution. For example, the 95% interval for the number of toss wins in PvZ on LT, assuming a balanced map, is:
qbinom(c(.025, .975), 113 + 128, .5) = [105, 136]. Since the 113 toss wins are within these bounds, we can't reject the "balanced" hypothesis here. (As a comparison, the exact p-value in this case is pbinom(113, 113 + 128, .5) = 18.359% ~~ Bill's 16.7%) As you concluded, Rush Hour PvZ does not look balanced: the interval is qbinom(c(.025, .975), 631 + 891, .5) = [723, 799]. 631 is far below the 95% lower bound of 723.
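For anyone without R, here is a rough Python equivalent of those qbinom and pbinom calls for the balanced case only (p = 0.5). The function names are made up for illustration. Exact integer arithmetic is used because 0.5^1522 is too small for ordinary floating point.

```python
from fractions import Fraction
from math import comb

def binom_cdf_half(k, n):
    """P(X <= k) for X ~ Binomial(n, 1/2), computed exactly."""
    return Fraction(sum(comb(n, i) for i in range(k + 1)), 2 ** n)

def binom_quantile_half(q, n):
    """Smallest k with P(X <= k) >= q, the same convention R's qbinom uses."""
    target = Fraction(q) * 2 ** n
    total = 0
    for k in range(n + 1):
        total += comb(n, k)
        if total >= target:
            return k
    return n

def balanced_range(n):
    """Range of win counts expected 95% of the time on a balanced map."""
    return binom_quantile_half("0.025", n), binom_quantile_half("0.975", n)

print(balanced_range(113 + 128))                 # Lost Temple PvZ
print(balanced_range(631 + 891))                 # Rush Hour PvZ
print(float(binom_cdf_half(113, 113 + 128)))     # exact p-value for LT PvZ
```

Strictly speaking these ranges describe what a balanced map would produce, so the check is whether the observed win count falls inside them.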

Pat: the point of these calculations is that they take sample size into account, so there is no need for rule of thumb numbers like "300+".

Bill307

Canada 9103 Posts

December 25 2005 10:58 GMT

#13

Ooooh good, we DO have at least one person who's into stats here. And he shows how unnecessary all that work was for determining whether or not we can argue that maps are imbalanced based on the stats.

"qbinom(c(.025, .975), 113 + 128, .5)"

This didn't even cross my mind. I have failed at trying to be a statistician on TLnet

RowdierBob

Australia 13004 Posts

December 25 2005 11:17 GMT

#14

Nice post Bill. I'm shocked that PvT on Luna is so imba. Are you going to do this with other maps? It'd be interesting to see which are most/least balanced.

Bill307

Canada 9103 Posts

December 25 2005 11:34 GMT

#15

On December 25 2005 20:17 RowdierBob wrote:
Nice post Bill. I'm shocked that PvT on Luna is so imba. Are you going to do this with other maps? It'd be interesting to see which are most/least balanced.

Yes, I could. But I'll use doodoo's method since it's a lot simpler and gives better precision

Bill307

Canada 9103 Posts

December 25 2005 12:13 GMT

#16

On December 25 2005 18:53 gravity wrote:
Even if an imbalance isn't statistically proven, that doesn't mean you can't see one anyway, when you combine the stats with more subjective analysis.

On December 25 2005 19:05 FrozenArbiter wrote:
Blah, luna pvt is FAIRLY balanced.. It's just that it's a million times easier for protoss on lower levels -,-

Agreed with both.

On December 25 2005 18:56 Pat wrote:
If you have enough games (300 + IMO) I think that these stats will be pretty accurate about balancing.

These statistical methods take the # of games into account. You don't need to have 300+ games to make a conclusion: 50-0 for example would be very damning for a map.

Also, I think you'd find that with 300 games, the stats are actually not THAT accurate. For example, with stats as extreme as 133:167 (44.3%), the actual winning chance on the map could realistically be anywhere from 38.7% to 50%. Move up to 1000 games -- 443:557 (44.3%) -- and the actual winning chance can still lie anywhere between 41.3% and 47.4% (there is only a 5% chance that the actual winning chance is outside of this range, while anything inside the range is fair game). And there's a pretty big difference between 41vs59 and 47vs53 odds of winning.
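One simple way to get ranges like these is the textbook normal-approximation ("Wald") 95% interval, sketched below in Python. This is an assumption on my part about the kind of interval meant, not necessarily the exact method used for the numbers above, and the function name is made up.

```python
import math
from statistics import NormalDist

def wald_interval(wins, losses, level=0.95):
    """Normal-approximation confidence interval for the true win probability."""
    n = wins + losses
    p_hat = wins / n
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

print(wald_interval(133, 167))   # 300 games at 44.3%
print(wald_interval(443, 557))   # 1000 games at 44.3%
```

The results land within about a tenth of a percentage point of the ranges quoted above, and the interval shrinks only with the square root of the number of games, which is why going from 300 to 1000 games does not even halve it.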

LordOfDabu

United States 394 Posts

December 25 2005 13:52 GMT

#17

It'd be nice if there was a way to restrict PGT's statistics for players of a certain level. I wouldn't think it'd be too difficult to implement, either. Since I believe most games on PGT are going to be at the lower levels (look how many people are in the D channels compared to the higher ones), a map that's imbalanced at higher levels of play may not be accurately presented here since it's balanced at lower levels (which are the majority of games played on it).

mitsy

United States 1792 Posts

December 25 2005 13:59 GMT

#18

part 1

PGT - Rush Hour 2.0 [06] 9896 (3.3x more than luna)
PvZ 40.8/59.2
PvT 53.5/46.5
TvZ 51.7/48.3

PGT - Luna The Final [06] 2985 (1.5x more than lotem)
PvZ 302:326 48.1/51.9
PvT 553:432 56.1/43.9
TvZ 378:344 52.4/47.6

PGT - Lost Temple 2.4 [06] 1994 (2.1x more than r-point)
PvZ 135:162 45.5/54.5
PvT 358:349 50.6/49.4
TvZ 304:291 51.1/48.9

PGT - R - Point 1.0 [06] 931 (1.4x more than p2h)
PvZ 62:51 54.9/45.1
PvT 212:182 53.8/46.2
TvZ 88:76 53.7/46.3

PGT - Plains to Hill 2.1 [06] 684 (3.4x more than forte2)
PvZ 90:100 47.4/52.6
PvT 108:100 51.9/48.1
TvZ 65:68 48.9/51.1

it's safe to cutoff this far. this is the biggest drop until the "unplayed" maps.

PGT - Neo Forte 2.1 [06] 201 (1.4x more than rov)
PGT - Ride of Valkyries [06] 145 (1.3x more than gaia)
PGT - Gaia 1.0 [06] 111 (1.3x more than pa)
PGT - ParanoidAndroid1.0 [06] 87 (1.1x more than requiem)
PGT - Neo Requiem 2.0 [06] 78 (1.3x more than azalea)
PGT - Azalea 1.0 [06] 59 (1.4x more than forte)
PGT - Forte 1.0 [06] 41 (1.4x more than nost)
PGT - Nostalgia 1.3 [06] 29 (1.3x more than bs)
PGT - Blade Storm 1.5 [06] 22 (1.6x more than estrella)
PGT - Estrella 1.0 [06] 14 (1.3x more than 815)
PGT - Sin 815 2.0 [06] 11 (1.4x more than cult)
PGT - Cultivation Period [06] 8 (1.3x more than namja)
PGT - Namja Iyagi [06] 6 (1.2x more than hunters)
PGT - The Hunters-Gamei [06] 5 (2.5x more than enmity)
PGT - Enmity 1.1 [06] 2 (2x more than usan)
PGT - Usan Nation [06] 1 (n/a)

part 2

least balanced to most
59.2 ZvP Rush Hour 2.0
56.1 PvT Luna The Final
54.9 PvZ R - Point 1.0
54.5 ZvP Lost Temple 2.4
53.8 PvT R - Point 1.0
53.7 TvZ R - Point 1.0
53.5 PvT Rush Hour 2.0
52.6 ZvP Plains to Hill 2.1
52.4 TvZ Luna The Final
51.9 ZvP Luna The Final
51.9 PvT Plains to Hill 2.1
51.7 TvZ Rush Hour 2.0
51.1 TvZ Lost Temple 2.4
51.1 ZvT Plains to Hill 2.1
50.6 PvT Lost Temple 2.4

most balanced to least
49.4 TvP Lost Temple 2.4
48.9 ZvT Lost Temple 2.4
48.9 TvZ Plains to Hill 2.1
48.3 ZvT Rush Hour 2.0
48.1 PvZ Luna The Final
48.1 TvP Plains to Hill 2.1
47.6 ZvT Luna The Final
47.4 PvZ Plains to Hill 2.1
46.5 TvP Rush Hour 2.0
46.3 ZvT R - Point 1.0
46.2 TvP R - Point 1.0
45.5 PvZ Lost Temple 2.4
45.1 ZvP R - Point 1.0
43.9 TvP Luna The Final
40.8 PvZ Rush Hour 2.0

protoss, easiest to hardest
56.1 PvT Luna The Final
54.9 PvZ R - Point 1.0
53.8 PvT R - Point 1.0
53.5 PvT Rush Hour 2.0
51.9 PvT Plains to Hill 2.1
50.6 PvT Lost Temple 2.4
48.1 PvZ Luna The Final
47.4 PvZ Plains to Hill 2.1
45.5 PvZ Lost Temple 2.4
40.8 PvZ Rush Hour 2.0

zerg, easiest to hardest
59.2 ZvP Rush Hour 2.0
54.5 ZvP Lost Temple 2.4
51.9 ZvP Luna The Final
52.6 ZvP Plains to Hill 2.1
51.1 ZvT Plains to Hill 2.1
48.9 ZvT Lost Temple 2.4
48.3 ZvT Rush Hour 2.0
47.6 ZvT Luna The Final
46.3 ZvT R - Point 1.0
45.1 ZvP R - Point 1.0

terran, easiest to hardest
53.7 TvZ R - Point 1.0
52.4 TvZ Luna The Final
51.7 TvZ Rush Hour 2.0
51.1 TvZ Lost Temple 2.4
49.4 TvP Lost Temple 2.4
48.9 TvZ Plains to Hill 2.1
48.1 TvP Plains to Hill 2.1
46.5 TvP Rush Hour 2.0
46.2 TvP R - Point 1.0
43.9 TvP Luna The Final
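A quick way to rebuild rankings like these is to sort by distance from 50%. A minimal Python sketch, using the win percentages from part 1 (first-listed race's win %); the variable names are made up for illustration:

```python
# (map, matchup, win % for the first-listed race), copied from part 1
results = [
    ("Rush Hour 2.0", "PvZ", 40.8), ("Rush Hour 2.0", "PvT", 53.5),
    ("Rush Hour 2.0", "TvZ", 51.7),
    ("Luna The Final", "PvZ", 48.1), ("Luna The Final", "PvT", 56.1),
    ("Luna The Final", "TvZ", 52.4),
    ("Lost Temple 2.4", "PvZ", 45.5), ("Lost Temple 2.4", "PvT", 50.6),
    ("Lost Temple 2.4", "TvZ", 51.1),
    ("R - Point 1.0", "PvZ", 54.9), ("R - Point 1.0", "PvT", 53.8),
    ("R - Point 1.0", "TvZ", 53.7),
    ("Plains to Hill 2.1", "PvZ", 47.4), ("Plains to Hill 2.1", "PvT", 51.9),
    ("Plains to Hill 2.1", "TvZ", 48.9),
]

# Least balanced first: largest gap between the win % and 50
ranked = sorted(results, key=lambda r: abs(r[2] - 50), reverse=True)
for map_name, matchup, pct in ranked:
    print(f"{abs(pct - 50):4.1f} off 50  {matchup}  {map_name}")
```

Keep in mind this ranks raw percentages only and ignores how many games each number is based on.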

maps summary

Plains to Hill 2.1 (bad zvp) "most balanced?"
-3rd hardest match for protoss (pvz)
-3rd easiest match for zerg (zvp)

Lost Temple 2.4 (bad zvp) "2nd most balanced?"
-2nd hardest match for protoss (pvz)
-2nd easiest match for zerg (zvp)

Luna The Final (bad tvp, tvz, and zvp) "3rd most balanced?"
-2nd worst balance matchup of all matchups on all these maps (pvt)
-easiest match for protoss (pvt)
-2nd easiest match for terran (tvz)
-hardest match for terran (tvp)
-3rd hardest match for zerg (zvt)

Rush Hour 2.0 (bad zvp, tvz, and tvp) "4th most balanced?"
-worst balance matchup of all matchups on all these maps (zvp)
-hardest match for protoss (pvz)
-easiest match for zerg (zvp)
-3rd easiest match for terran (tvz)
-3rd hardest match for terran (tvp)

R - Point 1.0 (bad tvp, zvp, and tvz) "5th most balanced?"
-2nd easiest match for protoss (pvt)
-3rd easiest match for protoss (pvz)
-2nd hardest match for zerg (zvt)
-hardest match for zerg (zvp)
-easiest match for terran (tvz)
-2nd hardest match for terran (tvp)
-3rd worst matchup (pvz)