|
On November 25 2011 20:35 Belha wrote: Nice analysis. However the stats are flawed for a simple reason. Every match up win% in every map must be considered in the same balance patch. So old classics like shattered and shakuras have gone through different patches that favored different races.
Oh c'mon... how many times do I have to say that that's the point? And that's why I've been saying that the map changes were not necessary, since the meta-game and other balance changes could have sorted out the problems, but were never given the chance because the map was changed beforehand.
|
On November 25 2011 20:54 Hassybaby wrote: Oh c'mon... how many times do I have to say that that's the point? And that's why I've been saying that the map changes were not necessary, since the meta-game and other balance changes could have sorted out the problems, but were never given the chance because the map was changed beforehand.
That's a pretty flawed argument. You could easily reverse this to say map changes are necessary as that makes balancing the game easier. If maps are flawed you don't get to see the full spectrum of strategies and it gets more difficult to determine the general balance. Also, map makers and tournament hosts have more information than is given in your thread; they also have the opinions of pros and perhaps even access to data like in-house training sessions (directly or indirectly).
Also it's a completely subjective argument what counts as enough proof that a map is balanced or not.
|
20 games, let alone 10 is enough to give any indication of what race a map favors. Some of those statistics should not even be considered at all.
|
I think you forgot a 'not' in there.
|
Yeah
Also, the gold base changes on antiga and dual sight obviously weren't going to change much. If you let a terran safely get a 4th on either of those bases then you're in a bad position regardless of whether it's gold or not. It's not like xel naga where it's an easy 3rd.
|
On November 25 2011 23:28 Itsmedudeman wrote: Yeah
Also, the gold base changes on antiga and dual sight obviously weren't going to change much. If you let a terran safely get a 4th on either of those bases then you're in a bad position regardless of whether it's gold or not. It's not like xel naga where it's an easy 3rd.
The problem was the gold being taken as the first base, or the first expansion in certain situations. That has basically been happening on most maps lately, which is why tournaments removed the golds. Rocks would have had the same effect. But if you have won an engagement and are able to contain the opponent, you take the gold, and even if you mess up a lot they still won't be able to break the contain. Thus golds prevent comebacks, and that's why they got removed (and because golds without rocks force a protoss to do one base play).
|
Doesn't make much sense to say we had shakuras since the beta, it was a party map back then and I doubt any of your sample games come from before the time it was released for the ladder.
|
Really objective post.
54% win rate for Protoss - "seems really balanced" 55% win rate for Terran - "seems to favor terran"
let me guess, you are a protoss player?
|
On November 25 2011 23:02 Markwerf wrote: That's a pretty flawed argument. You could easily reverse this to say map changes are necessary as that makes balancing the game easier. If maps are flawed you don't get to see the full spectrum of strategies and it gets more difficult to determine the general balance. Also, map makers and tournament hosts have more information than is given in your thread; they also have the opinions of pros and perhaps even access to data like in-house training sessions (directly or indirectly). Also it's a completely subjective argument what counts as enough proof that a map is balanced or not.
You do get the full spectrum, because players start to try anything they can do to help them on a map that is possibly flawed. It's the same problem that 1-1-1 had. Players tried pretty much everything to stop it, but couldn't when it was well executed. At that point, we had balance changes to help the case.
On November 26 2011 00:03 doko100 wrote: Really objective post.
54% win rate for Protoss - "seems really balanced" 55% win rate for Terran - "seems to favor terran"
let me guess, you are a protoss player?
Random actually
|
On November 26 2011 02:42 Hassybaby wrote:
On November 26 2011 00:03 doko100 wrote: Really objective post.
54% win rate for Protoss - "seems really balanced" 55% win rate for Terran - "seems to favor terran"
let me guess, you are a protoss player?
Random actually
so a 1% difference in win rate is enough for you to go from "pretty balanced" to "favors race x". can you explain the thought behind this because I don't understand it.
|
On November 26 2011 02:44 doko100 wrote: so a 1% difference in win rate is enough for you to go from "pretty balanced" to "favors race x". can you explain the thought behind this because I don't understand it.
Lemme double check which exact numbers you're referring to. the 54% is for TDA 1.0 PvT, and the 55% is Shakuras v2.0 TvP, correct?
|
I don't know about anyone else's opinion, but personally, since the last patch I've felt so much better about the maps. It's been such an improvement to me.
|
Besides the arguments already made, I decided to spot check some of your calculations.
For the original Belshir ZvT with 10-6:
μ = 62.5%, which agrees with you.
σ = 12.5%, which is smaller than your 15%.
95% confidence interval = 24.5%, which is larger than your 15%.
I thought perhaps you'd used the sample se instead of the hypothesis testing se for a binomial distribution, so I checked that too:
σ = 12.1%, which is smaller than your 15%.
95% confidence interval = 23.7%, which is larger than your 15%.
So I'm not sure how you got your standard errors since that's the only common mistake that comes to mind.
I suspect the same mistakes are made in the other stats threads that pop up. In your case, I think it's commendable that your presentation is transparent because that allows your results to be verified. I am troubled by the other stats threads that aren't verifiable because I suspect they may also have mistakes in their error calculations - after all, in those other threads, the errors were added as afterthoughts upon community request, which is not a good sign since errors are such a fundamental part of statistics.
For the other threads, this means that their results show significance when actually there is no significance. In your presentation, I decided to check some other results for significance:
Belshir Winter TvZ with 5-2:
μ = 71.4%, which agrees with you.
σ = 18.9%
95% confidence interval = 37.0%.
Your results show that TvZ is significant but in reality it is not. Similarly for PvZ and TvP.
However, in the case of Belshir, there's a bigger issue: you analysed your results even though your sample sizes are so small that the normal approximation is not valid. In these cases it is best not to draw any conclusions at all.
Shakuras 2.0 TvZ with 356-292:
μ = 54.9%, which agrees with you.
95% confidence interval = 3.85%, a larger interval than yours, but the result remains significant, which agrees with you.
For future reference, your standard errors for hypothesis testing should be calculated using σ = 0.5/sqrt(n), where n is your sample size.
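To make the spot checks above easy to redo, here is a minimal Python sketch of the two standard errors mentioned in this post: the hypothesis-testing one (σ = 0.5/sqrt(n), i.e. assuming a true 50% win rate) and the sample one (based on the observed win rate). The function names `null_se` and `sample_se` are made up for illustration, and 1.96 is the usual normal multiplier for a 95% interval.

```python
import math

def null_se(n):
    """Standard error of a win rate under the hypothesis p = 0.5."""
    return 0.5 / math.sqrt(n)

def sample_se(wins, losses):
    """Standard error computed from the observed win rate instead."""
    n = wins + losses
    p = wins / n
    return math.sqrt(p * (1 - p) / n)

# Win-loss records quoted in this post
records = [
    ("Belshir ZvT", 10, 6),
    ("Belshir Winter TvZ", 5, 2),
    ("Shakuras 2.0 TvZ", 356, 292),
]

for label, wins, losses in records:
    n = wins + losses
    se = null_se(n)
    print(f"{label}: win rate {wins / n:.1%}, "
          f"null se {se:.2%}, sample se {sample_se(wins, losses):.2%}, "
          f"95% half-width {1.96 * se:.2%}")
```

With only 7 or 16 games the normal approximation behind the 1.96 multiplier is shaky, so the small-sample rows are best read as a rough guide only.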
|
I actually used σ = sqrt(a)/n, where a is the smaller of the wins and the losses, and n is the sample size. Errors have never been my strong point, so I had help there. If you want, you can have a look at the data I was using.
https://rapidshare.com/files/1308210131/Map_stats_article.xlsx
Maybe it was a bad idea to draw conclusions on Bel'Shir, but it felt very weak to just give the results and then not conclude anything, so I gave a personal opinion. Not my best move in hindsight
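For anyone who wants to reproduce the error bars in the article, here is a small Python sketch of the σ = sqrt(a)/n formula next to the ordinary sample standard error. The function names are hypothetical. Algebraically, sqrt(a)/n equals sqrt(q/n) with q the smaller of the win and loss fractions, so it is always at least as large as sqrt(q(1-q)/n); that observation is plain arithmetic on the formula, not something stated by whoever suggested it.

```python
import math

def smaller_count_se(wins, losses):
    """sigma = sqrt(a) / n, with a the smaller of wins and losses."""
    a = min(wins, losses)
    n = wins + losses
    return math.sqrt(a) / n

def sample_se(wins, losses):
    """Usual binomial standard error from the observed win rate."""
    n = wins + losses
    p = wins / n
    return math.sqrt(p * (1 - p) / n)

for label, wins, losses in [("Belshir ZvT", 10, 6),
                            ("Shakuras 2.0 TvZ", 356, 292)]:
    mine = smaller_count_se(wins, losses)
    usual = sample_se(wins, losses)
    print(f"{label}: sqrt(a)/n = {mine:.2%}, sample se = {usual:.2%}, "
          f"ratio = {mine / usual:.2f}")
```

The ratio approaches sqrt(2) as the record gets closer to 50-50, which is where the two error bars differ the most.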
|
All right, I'll recheck my equation. I suspect I may have confused the sample distribution with the sampling distribution.
|
Terran had a good run the last few months off the back of Blue Flame, 1-1-1, and heavy ghost usage. That's all been dealt with and the dust hasn't settled from the latest changes. Wait to see how the overall percentages pan out over the next couple of months, then revisit the maps.
|
Sample size in some cases is way too small, ~200 games played is probably a reasonable sample.
|
Including Antiga and Bel-shir in this analysis was a mistake. Especially trying to draw conclusions about the effects of the map changes from one version to the next. You say there is no choice but to "take them as they are" but that is not true. The choice is to not include them because drawing conclusions from that small of a sample is idiotic. I just went to a "coin flipping" website flipped 10 coins and got 3 heads and 7 tails. Should I conclude that tails is extremely favored over heads or should I not conclude anything because the sample is far too small to mean anything?
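The coin example can be made exact. Below is a short Python sketch that computes how often a fair coin gives a split at least as lopsided as 3-7 in 10 flips; `prob_at_least_as_lopsided` is just an illustrative name.

```python
from math import comb

def prob_at_least_as_lopsided(heads, flips):
    """Chance that a fair coin gives a split at least as uneven as the one seen."""
    k = min(heads, flips - heads)  # size of the smaller side
    if 2 * k == flips:
        return 1.0  # a perfectly even split is the least lopsided outcome
    one_tail = sum(comb(flips, i) for i in range(k + 1))
    return 2 * one_tail / 2**flips  # both directions count

print(prob_at_least_as_lopsided(3, 10))
```

For 3 heads in 10 flips this works out to 352/1024, roughly one time in three, so a 3-7 split says nothing about the coin.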
|
if there isn't enough data, then you kind of wasted your time because I look at belshir beach and how little results there are and I ignore that post. Thanks for taking your time to do this, some of this I already knew but should help some players (or hurt them by making them realize they can't ever win a game on a certain map so they QQ and blame it on blizzard instead of themselves)
|
What's the theory behind the formula you use for your standard errors? Does it have a name?
I checked your formula against mine for some dummy scenarios and your formula certainly overestimates the standard errors (a good thing), sometimes by a very wide margin (a bad thing). My formula tended to underestimate the standard errors, but they were much closer to the real value, especially when P = 50%.
However, considering the relatively small sample sizes, you don't actually need to use approximations (which are only reliable for massive sample sizes anyway). Your largest sample is only about 700 observations, so you can solve all of them directly using the binomial distribution. The downside is that you can't easily solve this using a basic calculator.
Here's a sample calculation comparing our methods:
Using your data for Shakuras Plateau 2.0 (from your post, not your raw data):
TvZ: 356-292 (54.94% ± 2.6% to Terran)
Using my formula, we get a se of 1.96%, and a p-value of 0.012. This is significant at the 95% level.
Using your formula, we get a se of 2.64% and a p-value of 0.0614. This is not significant at the 95% level.
What is the real value? Solving using the binomial distribution directly, I obtained a 1-sided p-value of 0.006634, i.e. a 2-tailed p-value of 0.013, which is significant at the 95% level.
As you can see, my formula resulted in a much closer approximation, with a tendency for underestimation of the se. The outcome is that we now have sufficient evidence that Shakuras Plateau 2.0 is imbalanced in TvZ whereas using your formula we couldn't say that.
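For anyone who wants to rerun the comparison, here is a rough Python sketch using only the standard library. It computes the two-sided normal p-value for each standard error and the exact one-sided binomial tail for the 356-292 record. The function names are hypothetical, and the numbers it prints may differ slightly in the last digit from the ones quoted above because of rounding along the way.

```python
import math
from math import comb

WINS, LOSSES = 356, 292  # Shakuras Plateau 2.0 TvZ, as quoted above
N = WINS + LOSSES
P_HAT = WINS / N

def normal_two_sided_p(p_hat, se):
    """Two-sided p-value for H0: p = 0.5 using the normal approximation."""
    z = abs(p_hat - 0.5) / se
    return math.erfc(z / math.sqrt(2))

def exact_upper_tail(wins, n):
    """Exact P(X >= wins) for X ~ Binomial(n, 0.5)."""
    favourable = sum(comb(n, k) for k in range(wins, n + 1))
    return favourable / 2**n

se_null = 0.5 / math.sqrt(N)                    # sigma = 0.5/sqrt(n)
se_smaller = math.sqrt(min(WINS, LOSSES)) / N   # sigma = sqrt(a)/n

print(f"0.5/sqrt(n): se {se_null:.2%}, p = {normal_two_sided_p(P_HAT, se_null):.4f}")
print(f"sqrt(a)/n:   se {se_smaller:.2%}, p = {normal_two_sided_p(P_HAT, se_smaller):.4f}")

one_sided = exact_upper_tail(WINS, N)
print(f"exact binomial: one-sided {one_sided:.6f}, doubled {2 * one_sided:.4f}")
```

Doubling the one-sided tail is valid here only because the null hypothesis is p = 0.5, which makes the binomial distribution symmetric.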
I think you will find that the tighter se calculations will be useful for your cause. A quick glance through your results, using your numbers, shows that:
There is insufficient evidence of imbalance on any version of TDA, and thus it never needed updates.
There is insufficient evidence of imbalance on any version of Antiga, even if we ignore the small sample sizes, and thus it never needed any updates.
There is evidence that TvP is imbalanced on Shakuras Plateau 2.0 and none before, so the changes were actually detrimental.
There is evidence of imbalance on Lost Temple in TvZ, and on Shattered Temple 1.0 in TvZ, and on Shattered Temple 1.1 TvZ and ZvP.
So overall it looks like Shattered Temple 1.1 has been made more favourable for zerg, going from 70% in TvZ to just 58%, and from 55% in ZvP to 60%. So I agree with your assessment about the large effects of removing close spawns, and it certainly agrees with conventional wisdom that close spawns are bad for zerg. And we can see that close spawns are bad for zerg in both ZvP and ZvT.
And we ignore Belshir Beach due to the small sample size.
Imagine how much more you could say if you calculated your se's using the binomial distribution directly?
Just be careful that there are some tricky details involved. If your results don't exactly match mine, you should recheck your methodology.
I think these sorts of statistics are fun to look at. I just wish people didn't have such knee-jerk reactions to them as we've seen in many of the responses here, and I also wish people wouldn't use them as fuel for balance whines (cheese and whine seem to be SC2's primary industries). Overall I think it's a good effort and regardless of significance levels, it's interesting to see the ideas involved.
There are simple ways to account for skill levels, but the data preparation is tedious without cooperation from a source like the TLPD.
As for accounting for metagame changes, if it is possible to break down the data for each map into small chunks, we can get a better idea of how gameplay on a particular map has developed over time. For example, with Shakuras Plateau 2.0, you have about 600 observations for each matchup. If there was a way to separate the data into, say, 6 parts based on when they were played, each with 100 observations per matchup, it would be easier to get a clearer idea of how gameplay has evolved despite there being no changes to the map. I think the TLPD already has the facilities to do this, although it does require some work. Then it is simply a matter of choosing which time periods to break the data over. It can even be done to get an idea of the effects of balance changes. One interesting outcome to look out for is a sudden change in the matchup statistics in the later life of a map before it is updated. For example, a map that looks TvZ favoured overall could be TvZ favoured in the first 5 periods and then swing towards a slight ZvT favour in the final period before it is updated to a new version. This would indicate that the update occurred at an inopportune time.
And, yes, this would actually involve deliberately reducing the sample sizes in your analysis. Oh, the horror. :-P
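The period breakdown described above could be sketched roughly like this in Python. Everything here is an assumption for illustration: the `(date, terran_won)` record layout, the function name, and the toy results are made up, and real data would have to come from a source like the TLPD.

```python
from datetime import date, timedelta

def win_rates_by_period(games, periods):
    """Split (date, won) records into consecutive chunks and report each win rate."""
    ordered = sorted(games, key=lambda g: g[0])
    if not ordered:
        return []
    size = -(-len(ordered) // periods)  # ceiling division
    summary = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        wins = sum(1 for _, won in chunk if won)
        summary.append((chunk[0][0], len(chunk), wins / len(chunk)))
    return summary

# Toy TvZ results on one map version, one game per day (invented data)
first_day = date(2011, 6, 1)
results = [True, True, False, True,
           True, True, False, True,
           False, False, True, False]
games = [(first_day + timedelta(days=i), won) for i, won in enumerate(results)]

for period_start, n, rate in win_rates_by_period(games, 3):
    print(f"from {period_start}: {n} games, Terran win rate {rate:.0%}")
```

In the toy data the win rate drops in the last chunk, which is the kind of late swing described above. With real data each chunk would still need its own error bar before reading anything into a swing.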
|