With the unveiling of PGT's map stats (here: http://www.pgtour.net/ladder.stats.php ) I decided to try to answer the question: is map M balanced? To do this, I decided to employ the knowledge I learned in statistics last term. Note that some of this might be wrong, and I encourage any up-and-coming statisticians in the audience to correct me if I do something wrong by accident.
We all know that there is a certain amount of variance in the win %s on maps. For example, suppose map M is balanced for some matchup. If 4 games in this matchup are played on M, we expect 2 wins and 2 losses for each race. But we know there is a chance we will get 3-1 or even 4-0, just by luck. So when we see a 47.7% win ratio for PvZ (as is the case with Luna at the moment), we have to wonder: is it possible the map is perfectly balanced for PvZ, but by random luck zerg ended up winning more matches than protoss?
Unfortunately we don't know what exactly this random variance is. But we can estimate it using statistics.
(WARNING: DO NOT READ IF MATH IS BORING!)
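Let's say a win = 1 and a loss = 0, and the probability of a win is "pwin". Also, let n = #wins + #losses, and let x_i be the result (1 or 0) of game i. Then we have:

  s^2 = (1 / (n - 1)) * sum over all n games of (x_i - pwin)^2

This is our estimate for the variance of the data.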
Now, we want to test our hypothesis that the probability of a win "pwin" = 0.5, i.e. the map is perfectly balanced. To do this, we'll ask the question: what is the probability of getting a result that is more extreme than a 47.7% win ratio? Statistical theory tells us that we know the probability distribution of the following quantity:
   #wins/n - pwin
  ----------------
   sqrt(s^2 / n)
(it follows a Student's t distribution on n-1 degrees of freedom, for the curious)
This lets us calculate the probability we want.
(THIS PART EXPLAINS HOW TO CALCULATE RESULTS YOURSELF: SKIP TO THE BOTTOM IF YOU JUST WANT THE RESULTS I'VE ALREADY DONE!)
Now here's an actual explanation of HOW to do this with PGTour's map stats. You will need the statistical software "R" ( http://cran.r-project.org/bin/windows/base/ ). Actually any statistical software should do, but my instructions will use R.
Let's check whether PvZ is perfectly balanced on Luna The Final. I've done the calculations in Mathematica:
So in the end we have the number -1.02046. Start up R and enter: "pt(-1.02046, 506)" where 506 = n - 1 (n = 242 + 265 = 507 games). This gives us the value 0.153999, which is the probability of getting a number even lower than -1.02046. This is also the probability of getting a win ratio even worse than 47.7%, which is precisely what we want.
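For anyone without Mathematica, the number above can be reproduced with a few lines of Python. This is only a sketch under two assumptions: that the variance estimate is s^2 = sum((x_i - pwin)^2) / (n - 1), which is the form that gives back the -1.02046 quoted here, and that a normal approximation is an acceptable stand-in for R's pt() when n is in the hundreds. The function names are made up for illustration.

```python
import math

def t_statistic(wins, losses, pwin=0.5):
    """(#wins/n - pwin) / sqrt(s^2 / n), counting a win as 1 and a loss as 0."""
    n = wins + losses
    # Assumed variance estimate: squared deviations from the hypothesized pwin.
    s2 = (wins * (1 - pwin) ** 2 + losses * (0 - pwin) ** 2) / (n - 1)
    return (wins / n - pwin) / math.sqrt(s2 / n)

def lower_tail_normal(t):
    """Normal approximation to R's pt(t, df); only reasonable for large df."""
    return 0.5 * math.erfc(-t / math.sqrt(2))

t = t_statistic(242, 265)              # Luna The Final, PvZ
print(round(t, 5))                     # should land very close to -1.02046
print(round(lower_tail_normal(t), 3))  # roughly 0.154, slightly off from pt()
```

The approximate probability differs from pt() only in the later decimals at these sample sizes, so the "no evidence" conclusion is the same either way.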
Since 0.153999 is bigger than 0.1 and smaller than 0.9, by convention in statistics we say that we have "no evidence" against our hypothesis. In other words, it is entirely possible that Luna is perfectly balanced in PvZ, and that the win % 47.7 differs from 50% by pure luck and random variance.
Now let's try a more extreme example: PvZ on Rush Hour, which has a win ratio of 41.5% after 1522 PvZ matches. Again I've done the calculations in Mathematica:
This time we have -6.66229 . In R we write "pt(-6.66229, 1521)" to get the probability of getting an even worse win ratio than 41.5%: 1.878543 x 10^-11, a VERY small number. Basically, after 1522 games are played, we should NEVER see a win ratio of 41.5% if PvZ is perfectly balanced on the map.
Maybe the actual probability of winning PvZ is 45% rather than 50%? But doing the calculations with pwin=0.45, we end up with the probability 0.002617247, which is still "strong evidence against the hypothesis" (in other words, the chance of winning PvZ is even less than that).
Let's try to find best-case and worst-case scenarios for PvZ using this method (there's a better way to do this, called finding a "confidence interval" for pwin, but I don't feel like doing it right now).
pwin = 43% : probability of more extreme result is 0.1113971
pwin = 40% : probability of more extreme result is 0.1243038 (R returns 0.8756962, but because 40% is lower than 41.5%, we're actually interested in the probability 1 - 0.8756962 = 0.1243038)
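The trial-and-error search for the best and worst case can also be automated. The sketch below assumes the Rush Hour PvZ record is 631:891 (the 1522 games quoted later in the thread), uses the same assumed variance estimate s^2 = sum((x_i - pwin)^2) / (n - 1), and swaps pt() for a normal approximation, so the probabilities match R only to about three decimals. The function names and the 0.1 cutoff parameter are illustrative.

```python
import math

def more_extreme_p(wins, losses, pwin):
    """One-sided probability of a result more extreme than the observed one."""
    n = wins + losses
    s2 = (wins * (1 - pwin) ** 2 + losses * pwin ** 2) / (n - 1)
    t = (wins / n - pwin) / math.sqrt(s2 / n)
    lower_tail = 0.5 * math.erfc(-t / math.sqrt(2))  # normal stand-in for pt()
    # Use the lower tail when the observed ratio is below pwin, else the upper.
    return min(lower_tail, 1 - lower_tail)

def plausible_range(wins, losses, cutoff=0.1):
    """Smallest and largest pwin (in 0.1% steps) that the data do not contradict."""
    keep = [k / 1000 for k in range(1, 1000)
            if more_extreme_p(wins, losses, k / 1000) > cutoff]
    return min(keep), max(keep)

print(more_extreme_p(631, 891, 0.43))  # compare with 0.1113971
print(more_extreme_p(631, 891, 0.40))  # compare with 0.1243038
print(plausible_range(631, 891))
```

The range it prints is just the set of pwin values for which the "more extreme" probability stays above 0.1, which is the same rule used by hand above.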
(RESULTS START HERE)
So in the best case, the probability of winning PvZ on Rush Hour is 43%, and in the worst case it is 40%. And Rush Hour is definitely NOT balanced for PvZ in general (but keep in mind, this data was collected from all of PGTour!).
And that's it. Now anyone (given enough motivation) can ask the question "is map M balanced for this matchup?" and answer it for themselves.
I would like to try this method given some professional game map stats (e.g. from OGN) to evaluate whether certain maps are balanced on the professional level. But unfortunately I don't have any of these stats. If anyone wants me to do these calculations based on some promap stats, feel free to post the stats here and I'll do it. Also, if anyone wants to see confidence intervals for the probability of winning on different maps, I can do that too.
In closing, here are some more hypothesis tests for pwin = 0.5 (perfect balance) on several maps and in all matchups:
Legend: matchup wins:losses (win%) - balanced? (probability of observing a more extreme win ratio)
Luna The Final
PvZ 242:265 (47.7%) - balanced (0.153999)
PvT 471:347 (57.6%) - definitely NOT balanced (8.273895e-06)
TvZ 318:266 (54.5%) - probably not balanced (0.01598572)
(note that by "balanced" what I really mean is, "there is no evidence here to show that the map is not balanced")
Before you jump to conclusions, don't forget that a) these stats are based on ALL PGTour games, and b) these calculations tell us PvZ and PvT on Rush Hour are both definitely not 50/50, but PvZ is pretty badly imbalanced (41.5/58.5) whereas PvT isn't really that bad (46.5/53.5). This is why it might be nice to have confidence intervals instead: an interval like 53%-54% might be acceptable imbalance, whereas 40%-43% might be unacceptable.
On December 25 2005 18:04 Ghin wrote: Why does this matter?
I tried to think of a nice way to say it but I couldn't.
The statistics part is just for the mathematically-curious. The results are for the map-balance-curious. For example, now people will realize that the 47.7% win ratio for PvZ on Luna is NOT evidence to say that Luna is imbalanced PvZ.
Edit: Furthermore, OGN and MBCGame map stats can be particularly misleading because the # of games is so small. For example, if you saw a map's stats were 5:10 PvZ, you might conclude that the map must be imbalanced for PvZ. But actually, if we run 5:10 through this statistical method, we find that this is NOT evidence to show imbalance either (although it would definitely affect my liquibets).
I think that a minimum of like... 300 games per matchup should be needed to be able to conclude anything (or even more). With 300 games per matchup, you can kinda ''assume'' that players' skills are displayed in a ''normal curve''.
I guess using these stats could help show what makes a map balanced. However, there are a lot of things that go into these things. Like Luna's PvT imbalance could be caused by things that LT has/lacks. You can't point your finger at one thing. Is it just the cliff over the nat? The middle not being buildable? Maybe lack of islands? All of these things combine.
The probabilities make sense. But you can use the exact distribution (binomial) and avoid the approximation using the t-distribution. For example, the 95% confidence interval for PvZ on LT is: qbinom(c(.025, .975), 113 + 128, .5) = [105, 136]. Since the 113 toss wins are within these bounds, we can't reject the "balanced" hypothesis here. (As a comparison, the exact p-value in this case is pbinom(113, 113 + 128, .5) = 18.359% ~~ Bill's 16.7%.) As you concluded, Rush Hour PvZ does not look balanced: the confidence interval is qbinom(c(.025, .975), 631 + 891, .5) = [723, 799]. 631 is far below the 95% lower bound of 723.
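If you don't have R handy, the same exact binomial numbers can be had from plain Python. This is a rough sketch, not R's actual implementation: pbinom and qbinom here are hand-rolled stand-ins that only mimic the behaviour of the R functions of the same name for this use (lower-tail probability, and smallest k whose cumulative probability reaches q).

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def pbinom(k, n, p):
    """P(X <= k): the exact lower-tail probability."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def qbinom(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    total = 0.0
    for k in range(n + 1):
        total += binom_pmf(k, n, p)
        if total >= q:
            return k
    return n

print(pbinom(113, 113 + 128, 0.5))                    # compare with 18.359%
print([qbinom(q, 113 + 128, 0.5) for q in (0.025, 0.975)])
print([qbinom(q, 631 + 891, 0.5) for q in (0.025, 0.975)])
```

Working in log space matters for the Rush Hour case: 0.5 to the power 1522 is too small for an ordinary floating-point number, so multiplying the terms out directly would just give zero.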
Pat: the point of these calculations is that they take sample size into account, so there is no need for rule of thumb numbers like "300+".
Ooooh good, we DO have at least one person who's into stats here. And he shows how unnecessary all that work was for determining whether or not we can argue that maps are imbalanced based on the stats.
"qbinom(c(.025, .975), 113 + 128, .5)"
This didn't even cross my mind. I have failed at trying to be a statistician on TLnet.
Nice post Bill. Im shocked that PvT on Luna is so imba. Are you going to do this with other maps? It'd be interesting to see which are most/least balanced.
On December 25 2005 20:17 RowdierBob wrote: Nice post Bill. Im shocked that PvT on Luna is so imba. Are you going to do this with other maps? It'd be interesting to see which are most/least balanced.
Yes, I could. But I'll use doodoo's method since it's a lot simpler and gives better precision.
On December 25 2005 18:53 gravity wrote: Even if an imbalance isn't statistically proven, that doesn't mean you can't see one anyway, when you combine the stats with more subjective analysis.
On December 25 2005 19:05 FrozenArbiter wrote: Blah, luna pvt is FAIRLY balanced.. It's just that it's a million times easier for protoss on lower levels -,-
Agreed with both.
On December 25 2005 18:56 Pat wrote: If you have enough games (300 + IMO) I think that these stats will be pretty accurate about balancing.
These statistical methods take the # of games into account. You don't need to have 300+ games to make a conclusion: 50-0 for example would be very damning for a map.
Also, I think you'd find that with 300 games, the stats are actually not THAT accurate. For example, with stats as extreme as 133:167 (44.3%), the actual winning chance on the map could realistically be anywhere from 38.7% to 50%. Move up to 1000 games -- 443:557 (44.3%) -- and the actual winning chance can still lie anywhere between 41.3% and 47.4% (there is only a 5% chance that the actual winning chance is outside of this range, while anything inside the range is fair game). And there's a pretty big difference between 41vs59 and 47vs53 odds of winning.
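The 38.7% to 50% range for 133:167 is what you get from the usual normal-approximation ("Wald") 95% confidence interval, so that is presumably the kind of interval meant here. A sketch under that assumption (wald_interval is an illustrative name, not something from R or PGTour):

```python
import math
from statistics import NormalDist

def wald_interval(wins, losses, confidence=0.95):
    """Normal-approximation confidence interval for the true win probability."""
    n = wins + losses
    p_hat = wins / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_interval(133, 167)
print(f"{low:.1%} to {high:.1%}")
```

The half-width shrinks with the square root of n, which is why going from 300 to 1000 games narrows the range by less than half.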
It'd be nice if there was a way to restrict PGT's statistics for players of a certain level. I wouldn't think it'd be too difficult to implement, either. Since I believe most games on PGT are going to be at the lower levels (look how many people are in the D channels compared to the higher ones), a map that's imbalanced at higher levels of play may not be accurately presented here since it's balanced at lower levels (which are the majority of games played on it).
PGT - Rush Hour 2.0 [06] 9896 (3.3x more than luna)
PvZ 40.8/59.2
PvT 53.5/46.5
TvZ 51.7/48.3

PGT - Luna The Final [06] 2985 (1.5x more than lotem)
PvZ 302:326 48.1/51.9
PvT 553:432 56.1/43.9
TvZ 378:344 52.4/47.6

PGT - Lost Temple 2.4 [06] 1994 (2.1x more than r-point)
PvZ 135:162 45.5/54.5
PvT 358:349 50.6/49.4
TvZ 304:291 51.1/48.9

PGT - R - Point 1.0 [06] 931 (1.4x more than p2h)
PvZ 62:51 54.9/45.1
PvT 212:182 53.8/46.2
TvZ 88:76 53.7/46.3

PGT - Plains to Hill 2.1 [06] 684 (3.4x more than forte2)
PvZ 90:100 47.4/52.6
PvT 108:100 51.9/48.1
TvZ 65:68 48.9/51.1
it's safe to cutoff this far. this is the biggest drop until the "unplayed" maps.
PGT - Neo Forte 2.1 [06] 201 (1.4x more than rov)
PGT - Ride of Valkyries [06] 145 (1.3x more than gaia)
PGT - Gaia 1.0 [06] 111 (1.3x more than pa)
PGT - ParanoidAndroid1.0 [06] 87 (1.1x more than requiem)
PGT - Neo Requiem 2.0 [06] 78 (1.3x more than azalea)
PGT - Azalea 1.0 [06] 59 (1.4x more than azalea)
PGT - Forte 1.0 [06] 41 (1.4x more than nost)
PGT - Nostalgia 1.3 [06] 29 (1.3x more than bs)
PGT - Blade Storm 1.5 [06] 22 (1.6x more than estrella)
PGT - Estrella 1.0 [06] 14 (1.3x more than 815)
PGT - Sin 815 2.0 [06] 11 (1.4x more than cult)
PGT - Cultivation Period [06] 8 (1.3x more than namja)
PGT - Namja Iyagi [06] 6 (1.2x more than hunters)
PGT - The Hunters-Gamei [06] 5 (2.5x more than emnity)
PGT - Enmity 1.1 [06] 2 (2x more than usan)
PGT - Usan Nation [06] 1 (n/a)
part 2
least balanced to most
59.2 ZvP Rush Hour 2.0
56.1 PvT Luna The Final
54.9 PvZ R - Point 1.0
54.5 ZvP Lost Temple 2.4
53.8 PvT R - Point 1.0
53.7 TvZ R - Point 1.0
53.5 PvT Rush Hour 2.0
52.6 ZvP Plains to Hill 2.1
52.4 TvZ Luna The Final
51.9 ZvP Luna The Final
51.9 PvT Plains to Hill 2.1
51.7 TvZ Rush Hour 2.0
51.1 TvZ Lost Temple 2.4
51.1 ZvT Plains to Hill 2.1
50.6 PvT Lost Temple 2.4

most balanced to least
49.4 TvP Lost Temple 2.4
48.9 ZvT Lost Temple 2.4
48.9 TvZ Plains to Hill 2.1
48.3 ZvT Rush Hour 2.0
48.1 PvZ Luna The Final
48.1 TvP Plains to Hill 2.1
47.6 ZvT Luna The Final
47.4 PvZ Plains to Hill 2.1
46.5 TvP Rush Hour 2.0
46.3 ZvT R - Point 1.0
46.2 TvP R - Point 1.0
45.5 PvZ Lost Temple 2.4
45.1 ZvP R - Point 1.0
43.9 TvP Luna The Final
40.8 PvZ Rush Hour 2.0

protoss, easiest to hardest
56.1 PvT Luna The Final
54.9 PvZ R - Point 1.0
53.8 PvT R - Point 1.0
53.5 PvT Rush Hour 2.0
51.9 PvT Plains to Hill 2.1
50.6 PvT Lost Temple 2.4
48.1 PvZ Luna The Final
47.4 PvZ Plains to Hill 2.1
45.5 PvZ Lost Temple 2.4
40.8 PvZ Rush Hour 2.0

zerg, easiest to hardest
59.2 ZvP Rush Hour 2.0
54.5 ZvP Lost Temple 2.4
52.6 ZvP Plains to Hill 2.1
51.9 ZvP Luna The Final
51.1 ZvT Plains to Hill 2.1
48.9 ZvT Lost Temple 2.4
48.3 ZvT Rush Hour 2.0
47.6 ZvT Luna The Final
46.3 ZvT R - Point 1.0
45.1 ZvP R - Point 1.0

terran, easiest to hardest
53.7 TvZ R - Point 1.0
52.4 TvZ Luna The Final
51.7 TvZ Rush Hour 2.0
51.1 TvZ Lost Temple 2.4
49.4 TvP Lost Temple 2.4
48.9 TvZ Plains to Hill 2.1
48.1 TvP Plains to Hill 2.1
46.5 TvP Rush Hour 2.0
46.2 TvP R - Point 1.0
43.9 TvP Luna The Final
maps summary
Plains to Hill 2.1 (bad zvp) "most balanced?"
-3rd hardest match for protoss (pvz)

Lost Temple 2.4 (bad zvp) "2nd most balanced?"
-2nd hardest match for protoss (pvz)
-2nd easiest match for zerg (zvp)

Luna The Final (bad tvp, tvz, and zvp) "3rd most balanced?"
-2nd worst balance matchup of all matchups on all these maps (pvt)
-easiest match for protoss (pvt)
-2nd easiest match for terran (tvz)
-hardest match for terran (tvp)
-3rd easiest match for zerg (zvp)
-3rd hardest match for zerg (zvt)

Rush Hour 2.0 (bad zvp, tvz, and tvp) "4th most balanced?"
-worst balance matchup of all matchups on all these maps (zvp)
-hardest match for protoss (pvz)
-easiest match for zerg (zvp)
-3rd easiest match for terran (tvz)
-3rd hardest match for terran (tvp)

R - Point 1.0 (bad tvp, zvp, and tvz) "5th most balanced?"
-2nd easiest match for protoss (pvt)
-3rd easiest match for protoss (pvz)
-2nd hardest match for zerg (zvt)
-hardest match for zerg (zvp)
-easiest match for terran (tvz)
-2nd hardest match for terran (tvp)
-3rd worst matchup (pvz)
what about the strength of the players playing each other? is that taken into account in any way? or did you just make the assumption, that this factor is of no importance when you have a great enough database?
edit: for example: when a player, who will reach A rank, bashes through the lower ranks, those games can hardly give any information about the balance on the played maps. imho, only the games in which both opponents are on a somewhat equal skill level produce usable statistical data.
Another way to measure the relative balance of the maps is to take the square-root of the sums of the squares of the imbalance relative to a 50:50 matchup. For example, Lost Temple would get a sqrt(4.5^2+0.6^2+1.1^2) = 4.67 "imbalance rating". I like this method because it is harsher on maps that have a single extremely imbalanced matchup.
For these 5 maps, the "imbalance ratings" are as follows (lower is better):
Plains to Hill 3.4 Lost Temple 4.67 Luna 6.82 R-Point 7.22 Rush Hour 10.0
These statistics largely agree with misty's, but they put Rush Hour at the bottom. They would also provide a relatively convenient/objective way to compare the relative balance of different maps.
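For anyone who wants to recompute the "imbalance rating" when the stats change, here is a small Python sketch. The win percentages are the PvZ/PvT/TvZ figures from misty's list; the function name and the dictionary layout are just for illustration.

```python
import math

def imbalance_rating(win_percentages):
    """Square root of the summed squared distances from a 50:50 matchup."""
    return math.sqrt(sum((w - 50.0) ** 2 for w in win_percentages))

# (PvZ, PvT, TvZ) win percentages for the first-named race.
maps = {
    "Plains to Hill": (47.4, 51.9, 48.9),
    "Lost Temple": (45.5, 50.6, 51.1),
    "Luna": (48.1, 56.1, 52.4),
    "R-Point": (54.9, 53.8, 53.7),
    "Rush Hour": (40.8, 53.5, 51.7),
}

ranking = sorted(maps, key=lambda name: imbalance_rating(maps[name]))
for name in ranking:
    print(f"{name:15s} {imbalance_rating(maps[name]):.2f}")
```

Because the differences are squared before summing, one matchup that is 9 points off counts for far more than three matchups that are 3 points off each, which is the "harsher on a single extreme matchup" property described above.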
On December 26 2005 04:21 Plexa wrote: notice that the general trend is that pvz on *most* maps z>p (excluding rpoint) and the reverse with t where p>t =/
For some reason people keep denying there is any PvZ imbalance though. Sure, it's possible to make a map where P>Z, but that doesn't matter if the vast majority of played maps favour Z. The PvT thing I think is more influenced by the lack of skills of the lower level players on PGTour compared to pros, since T is the most micro-intensive race, but there could be an issue there too (it's hard to get pro-level information, but at least at pro level it isn't common to say "oh no, that T player got a lot of P matches, he's screwed" compared to saying the same about P and Z).
No, the problem is that most P players are fucking TERRIBLE at PvZ -_- You might not have noticed, but as of late zergs have been losing tons to Ps (reach is the exception) at the professional level, and zergman even said he thinks the new generation of protoss players are a lot scarier PvZ.
PvZ is balanced, the only 'imbalance' I guess is that it takes a lot longer to learn than ZvP does.. Kind of like how you can learn PvT 10 times faster than you can learn TvP =.=
Just because P have been doing a bit better recently doesn't wipe out the historical record and suddenly make PvZ balanced on "normal" maps. Maybe if things stay this way for a long time and there isn't a wave of anti-Protoss Zergs the way July sparked a wave of anti-Terran zergs, or Zergs learn how to beat the newer Protoss style, or whatever.
For some reason people keep denying there is any TvP imbalance though. Sure, it's possible to make a map where T>P, but that doesn't matter if the vast majority of played maps favour P. The ZvP thing I think is more influenced by the lack of skills of the lower level players on PGTour compared to pros, since P is the more micro-intensive race, but there could be an issue there too
I would love to be able to restrict stats to only specific ranks or ranges of ranks. It would be really neat to compare imbalances at the top with imbalances at the bottom. It would also allow us to measure imbalance at D/C/B/A ranks separately.
I don't think it'd be too much of an additional load on the server, unless the feature becomes really popular. In that case, maybe you could allow people to download a "database" of stats (updated every 24 hours), which would be less than 8 KB, and a simple VB program to search and sort the data to our heart's content. It could also be programmed to perform statistical analyses like the ones in this topic. e.g. you might see something like:
On December 26 2005 07:04 SoMuchBetter wrote: For some reason people keep denying there is any TvP imbalance though. Sure, it's possible to make a map where T>P, but that doesn't matter if the vast majority of played maps favour P. The ZvP thing I think is more influenced by the lack of skills of the lower level players on PGTour compared to pros, since P is the more micro-intensive race, but there could be an issue there too
Hurrr hurrr. The difference is that the TvP imbalance is both typically smaller than the PvZ one at all levels, and more lessened at pro level. PvZ imbalance is a much bigger problem, even if there is still some small TvP imbalance. Again, it's not like people go "omg Oov has to fight 3 P's this OSL, he's fucked", but people (not everyone) *do* say "OMG Reach has to fight 3 Z's, he's fucked".
Does anyone have the records of various pro players divided up by matchup, by the way? As far as I can remember from seeing them in the past, no P does as well vs. Z as Oov does vs P, for example.
edit: also, just changing the races around doesn't make sense with that post. Since when is P the micro race? T is, and that's part of the reason why they do disproportionately badly at lower levels compared to pro and strong amateur levels. Besides, one imbalance isn't an excuse to ignore another (worse) one.
if oov had to face 3 p's in an osl, i'd vote for him in liquibet. P not the micro race on pvz? excuse me? are we playing the same game?
I think that imbalance is mostly seen in ground PvZ (mostly because of ultra+ling) and air ZvP (sair+reaver). Rebalancing units may be hard, mostly because the game is played professionally. Some little changes may have unwanted effects, resulting in one-race domination. The only way to try out new changes is to make a new bw server (with ladder), where these balance issues would be practically tested for some time before being taken into professional play. Maybe adding a particular unit for each race could fix some major problems (like some kind of anti-air unit for zerg at lair level). But why add new units if certain units/upgrades/abilities are almost never used (in any matchup)? Scouts, Valkyries, Dark Archons, Ghosts, Queens could be boosted in some way to regain usability and develop new strategies. Also it somehow irritates me that some units are overused in particular matchups lately - like vessels' irradiate and defilers' swarm in TvZ.
With all these statistics, you are also making the assumption that the average p, z and t pgt player is of the same skill level. For example, newer players to the game may choose a certain race slightly over another, and therefore, by losing more often, give the impression that such a race wins a lower percentage of games than another race, giving the impression of an imbalance. Therefore, say PvT on map M has 60% win for P, it does not mean that when two equally skilled players, one p and one t play on map M, that P will win 60% of the games.
On December 26 2005 11:20 no_re wrote: With all these statistics, you are also making the assumption that the average p, z and t pgt player is of the same skill level. For example, newer players to the game may choose a certain race slightly over another, and therefore, by losing more often, give the impression that such a race wins a lower percentage of games than another race, giving the impression of an imbalance. Therefore, say PvT on map M has 60% win for P, it does not mean that when two equally skilled players, one p and one t play on map M, that P will win 60% of the games.
There's no reason to assume that the skills are *not* equal over a large number of players, unless you have any real (ie not wild speculation) evidence that they aren't. Also, if good players *do* gravitate more to one race, it's also probably because that race is the strongest, so that only makes things worse.
hey, i don't think you can set pwins to .5 except in mirror matchups. to evaluate the empirical fairness of a map you must get the map-neutral winning percentage of each MU.
On December 26 2005 11:20 no_re wrote: With all these statistics, you are also making the assumption that the average p, z and t pgt player is of the same skill level. *snip*
There's no reason to assume that the skills are *not* equal over a large number of players, unless you have any real (ie not wild speculation) evidence that they aren't. Also, if good players *do* gravitate more to one race, it's also probably because that race is the strongest, so that only makes things worse.
and what reason is there to assume that the skills ARE equal over a large number of players? there is no evidence for that either, so it's merely wild speculation.
there are several things in the game which could be imbalanced, the most important ones being: skill of the players, strength of the race and the maps.
if you want to draw conclusions about the possible imbalance of maps out of pgt statistics, you can't do this without taking the other potential imbalances into account as well.
The sample being used is the entire population. There is no inherent bias in that except perhaps a bias in the population itself. If you want to know, as a PGT player, what your chances are in a given match-up, then there is no reason not to accept this data.
The only thing that concerns me is how "balance" changes with different skill sets. Some maps emphasize micro/macro skills like Luna and Azalea, while others like 815 emphasize strategy. There's no way to quantify that. What I'm talking about DOES NOT affect the inherent "balance" of a map, but rather the "balance" as determined by an INDIVIDUAL. For instance, a progamer like NaDa with superior macro skills might go so far as to say that Luna favors him over Protoss. That doesn't mean this is true for the entire population. Lots of progamers talk about how 815 is a P map, but every time I play it, it seems that T >> P (and believe me, I've been both the T and the P).
In other words, this data is interesting, but I feel it doesn't give the whole story. That said, I'm not sure it's possible to give the whole story. T_T You can't quantify someone's strategic ability.
Great job Pat (or whoever programmed it) for those cute statistics from pgtour. Finally we can do some serious statistical backup for our imbalance discussions ;)
And great posts Bill, doodoohead101 and mitsy, it is important to analyse those statistics because not everyone is able to really understand them and they might make BS points by reading statistics wrong -_-
While stats depending on rank are interesting too, I think you should use the stats from all games for the evidence-giving stats:
- PGT is not only gosus, so we don't need the maps balanced only for gosus either. Balancing the maps for the level of play of PGT is best for PGT, obviously.
- When considering every game, the sample size is bigger, and a bigger sample size is always good to make quality interpretations.
- The level of play on PGT is pretty good anyway, it's not public battle.net or something
well Gaia TvT is imbalanced. bottom right is impossible to tank gas from before bridge and yet the others it is very easy. i've lost a few times because of this...
with all the race picking on pgt and the predictable but disproportionate behavior of newbies, it seems easy to speculate that the "winning strength" avg of races will vary a few points.
Well, what strikes me most is the question that needs answering: what defines imbalance. Is it purely based on statistics? If so, at what skill level should we sample? Or do we take a simple random sample of gamers and look at their win percentages against each race?
In my mind, these approaches miss something: there is a difference between imbalance and "tricky." So when is a map "tricky" versus imbalanced? In my mind, it's "tricky" if there are only a couple of viable approaches to playing the map, which require some minimal degree of skill to pull off. For instance, ZvP on Estrella. Is it imbalanced or is it tricky? From my experience, nobody I can beat consistently on land maps ZvP is someone I can't beat consistently ZvP on Estrella. Granted, I have never played someone whose PvZ is equal in skill to my ZvP on Estrella, but it should be telling that it is winnable. By comparison, ZvP on Gorky island takes all my energy. I have never lost on Gorky, but I refuse to ever play it again because I know that even players who I can whomp all over have a good chance against me there. The difference, of course, is the easy-to-expand-to tiny island above the main. P with 2 gases means lots of upgraded corsairs, and by taking other mains around the map, the Protoss can wear me down and keep me from playing a centrally dominated late game as I can on Estrella.
On December 27 2005 07:05 oneofthem wrote: hey, i don't think you can set pwins to .5 except in mirror matchups. to evaluate the empirical fairness of a map you must get the map-neutral winning percentage of each MU.
pwin is our hypothesized win ratio, so that is not a problem.
When considering every game, the sample size is bigger, and a bigger sample size is always good to make quality interpretations.
Actually that is not necessarily true. Having such a large number of lower-level PGT players in our sample makes our analyses less applicable to higher-level PGT players. Ultimately it depends on who your target population is.
But you can't ignore higher level ranks either, because some of those games played at the D ranks are from people who inevitably made B or higher.
As interesting as it may be to look at the actual statistics, they don't quite tell the full story. There are too many factors that are not quantifiable.
Er, you're trying to figure out if a map is balanced for a certain matchup based on statistics and play. This begs the question and brings other factors into play. What players want to be convinced of is that a certain race matchup on a certain map is fair before they play: that the match will come down to skill and choice, not things that are present beforehand. Is there a way to consider just the maps and races? Plug in the specifications and get results...
My second question is what kind of system do the pro leagues use for picking maps? Do they just have them made and hope for the best? Do they test them?