|
On May 20 2011 14:08 Warble wrote: For a start, I believe that beginning with the idea of imbalance is the wrong way to start. What a lot of people do is begin with the idea of imbalance, and then seek data to back up their opinion.
Exactly, this is the reason why P says P is UP, T says T is UP and Z says Z is UP,
and the other way around with OP.
Especially when the argument comes up that a matchup is "hard". What does that even mean? A matchup has to be "hard" for both sides to be considered balanced -.-
|
Amazing, I really enjoyed that ^^
|
On May 20 2011 14:08 Warble wrote:
Only masters/pros should talk about balance.
I am of the opinion that no player can truly talk about balance. Psychological studies have found that everybody has an inherent bias that favours themselves regardless of how unbiased they try to be. In other words, any balance suggestion by a pro will necessarily try to make his own race OP rather than to achieve objective balance, no matter how well-intentioned the pro is.
Firstly, post-hoc analysis for hypothesis generation is perfectly fine and quite common. Internal biases of the investigator are also fine as long as their methodology is solid. It's pointless to say an expert in the field should be barred from forming hypotheses because they have some bias X. Anyone immersed in a topic will inherently have biases. So what if they have them? Are their proofs solid? If not, you discredit them based on their data rather than hand-waving their efforts away because their beliefs don't match some mythical perfect neutrality.
Asking Idra for his opinion on ZvP balance is more meaningful than asking some disinterested passer-by what he thinks about ZvP. Sure, Idra is known for his rigidity and strong opinions, but his reasons for hypothesizing that ZvP is imbalanced are A) more informed because he is a pro gamer and B) falsifiable because he'll provide the premise for why he believes it's broken, which leaves room to prove him wrong.
On May 20 2011 14:08 Warble wrote: We can only find imbalance by looking at the top level of play.
The problem here is that people tend to post simple summary statistics and graphs and call it a day. Consider, for example, if this game only had terran and no other races. Our statistics from GSL would show MKP and MVP winning most of their games against other terrans. Conclusion? It’s not terran that’s OP, it’s the players.
Now add the other races and players back in. When MKP plays against a protoss and wins, how much is a result of his race and how much of it is a result of his hard work and raw talent? The summary statistics do not show this.
The first paragraph is faulty in its logic and quite meaningless. If the game had only one race, you wouldn't argue about imbalance at all, nor would you look at statistics to find "imbalance".
Then your second example adds the other races back and says MKP's stats don't tell us anything. Yes, but not because MKP is a top player; it's because you're only looking at one individual's information, so of course it's meaningless. It has nothing to do with looking at a population of high-level players to find imbalances.
Your basis for not looking at high-level play for balance doesn't work, because you can apply the same faulty logic at any skill level. It's even more damning there, because in the middle of the distribution there is even more variance in individual skill, since it's easier to improve when you're starting from rock bottom.
So you basically make no convincing argument for why we shouldn't look at summary statistics for high-level players to find imbalances.
I appreciate the effort post, and trying to back your claim is especially refreshing, but as many people have pointed out, it has many flaws in its theoretical assumptions.
|
On May 20 2011 14:21 Nik0 wrote: Probably
You forgot Baller's first post... Epic.
|
I just want to say that even though I can't read this atm (just woke up and it's soo much), it actually looks very promising. I almost had a heart attack when I kept opening spoilers and there were more spoilers inside! And not little things either, multiple paragraphs as standard. Really like it; I'll save it and read it later when I'm fresher. Without a doubt there will be at least a couple of perspectives I've never thought about. Nice job, must've been a lot of work!
|
Just read it all, fantastic post man. I agree with almost everything you're saying; then again, that's kind of my own bias (but if you think about it, all thoughts and feelings are biased!). I hope a lot of people give it a read!
|
I would love to read your mechanics analysis
|
Great read. Would love to see the maths behind everything.
|
Not quite sure what to say, other than that you've done an amazing job at this. Thanks a lot.
|
On May 20 2011 20:03 wassbix wrote: On May 20 2011 14:08 Warble wrote: We can only find imbalance by looking at the top level of play.
[...] So you basically make no convincing argument for why we shouldn't look at summary statistics for high-level players to find imbalances.
He's saying that looking at high-performing players for balance might skew the picture IF a couple more great players happen to use a specific race. Looking at top players is not worthwhile if one race is just more popular (as terran seems to be in Korea) or if there are just a couple more really great players using that race.
|
Mods, spotlight this..NOW! Excellent (first) post, that's all I've gotta say.
|
On May 20 2011 15:00 avilo wrote: Just because someone made an incredibly long post does not make it mega awesome or even remotely accurate.
Exactly, it also makes it easier to hide absurdities.
Such a long post makes it impossible to point out all the flaws/absurdities (and I think there are a lot of them in the OP). This is in no way helpful to any debate.
I also can't stand people replying "it's amazing" without even reading what they agree with. 2 minutes to read such a long post? Really?
|
We can only apply an approximate model of balance; we can't achieve mathematically proven balance.
Mine would try to roughly even out the win ratios per matchup, per map and per game stage, at least from diamond upward, and I'd especially emphasize looking at the statistics of professionals.
At a first glance, the balance situation still looks pretty grim and I don't like Blizzard's half-balance and non-transparency philosophy either.
|
Excellent post. Blizzard hire this guy!
|
On May 20 2011 22:58 FutureArchon wrote: Excellent post. Blizzard hire this guy!
That would cost a lot more than their current solution of Monkey + Dartboard.
|
Thank you for actually understanding statistics and sample size. I remember reading a post a while back where someone posted an interesting statistic (I forget exactly what) that was around 94%. He was attacked for several pages by people demanding a sample size of 10,000 or more, compared to his sample of 700. Made me pretty angry.
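The post doesn't say what that 94% statistic was, so assume for illustration that it was a simple proportion. A minimal sketch using the normal-approximation (Wald) margin of error, a standard textbook formula swapped in here rather than whatever that poster actually used, shows why 700 samples already pin such a figure down fairly tightly:

```python
import math

def wald_margin(p_hat, n, z=1.96):
    """Half-width of the normal-approximation (Wald) confidence interval
    for a proportion p_hat observed in n samples; z = 1.96 gives ~95% coverage."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Assumed scenario: a 94% proportion measured with the two sample sizes from the story
for n in (700, 10_000):
    m = wald_margin(0.94, n)
    print(f"n = {n:>6}: 94% +/- {100 * m:.1f} percentage points")
```

That comes out to roughly plus or minus 1.8 points at n = 700 versus about 0.5 at n = 10,000: more data helps, but the 700-sample figure was hardly meaningless. (Near 0% or 100% a Wilson interval is usually preferred over Wald.)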
Great read though.
|
This was written a bit tongue-in-cheek starting from the very first line since I see SC2 as something to be enjoyed rather than, well...work.
The idea wasn't to provide a Theory of Everything on using statistics to analyse balance, but to lay some groundwork for ideas to develop, since I saw that the community kept going back to the same old ideas without moving forward. I thought that with a bit of groundwork everyone could pitch in and develop something better. While it would be nice to have a solid and thorough Theory of Everything, such theories take considerable work to develop. Those who have attempted a thesis (or even succeeded) know how much work goes into contributing to theory, and how small each individual's contribution is. My favourite example is how it took centuries for the best minds in maths to prove that polynomials of degree 5 or higher cannot, in general, be solved using radicals (and one of those minds died young over a dispute regarding the affections of a young lady). Blizzard probably has its own in-house statisticians who spend 8 hours a day on this problem while I do it on the bus. The difference is that Blizzard isn't sharing their findings with the community.
I guess a bit more background would be useful here. I am interested in how we might go about balancing a game, and I look at SC2 because it's the game I enjoy most. The aim was to think about it the way I would if I were working on Blizzard's balance team. I mentioned in the Background that I was working on a more generalised model, but the levels of complexity make it difficult. Many of you have already pointed out factors that such a model would need to consider, and I'd be interested in seeing what methods you use to integrate those factors into your models to reach useful conclusions. After all, one of my aims in posting here was to get the ball rolling and see what others can come up with. It's always good to learn from others.
One impression I hoped to convey with this model was how hard it is to use statistics when analysing balance. I talked about this a bit in the Extensions. This model was very simple and set in an idealised world, yet already it imposes so many restrictions on the conclusions we can draw when looking at the data. We then relaxed a few assumptions to make it more rigorous and the restrictions grew, as did the uncertainties. So if you take some win/loss statistics from masters and graph them, what does it really show?
I took a bit of a lazy way out when saying that high-level games should only be analysed using proper statistical analysis rather than summary statistics. After all, considerable work is involved in formulating a proper statistical analysis. I posted a link to an attempt by another user on TL and I think he should be commended for the effort. However, I think it was also written for a school assignment, so maybe he was pressed for time.
Here's an example just to illustrate one of the tougher barriers to a meaningful statistical analysis:
Consider a GSL with just 2 pros, MKP and MKQ. MKP plays terran while MKQ plays zerg. TvZ is perfectly balanced and both players are of equal talent (t). We define their skill as S(t, p), where p is the hours of practice they put in, and dS/dp > 0 for all p, i.e. the more they practise, the better they play.
We observe 30 games between these 2 players on XNC and conduct an analysis using parameters representing their skills, say BMKP to represent how much higher MKP's skill is compared to MKQ, and BTvZ for how overpowered terran is against zerg.
Take a moment to see why this doesn't work.
So if MKP wins 20 games and MKQ wins 10 games, what does that tell us about our parameters?
Nothing!
Why? Because we don't know which one is relevant. Did MKP win more because he practised more, or did he win because TvZ is imba? Sure, I told you that the game is balanced, so you know MKP practised more - but when conducting the analysis, we don't have this information - the point of the analysis is to figure out using only their results how skilful they are and how imbalanced the matchup is.
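To make the two-player case concrete, here is a minimal sketch that assumes a logistic win model, P(MKP wins) = 1 / (1 + exp(-(skill gap + BTvZ))). That functional form is my assumption, not something the post specifies; skill_gap plays the role of BMKP and b_tvz the role of BTvZ. Very different splits of the same total give exactly the same likelihood for a 20-10 record:

```python
import math

def log_likelihood(wins, losses, skill_gap, b_tvz):
    """Log-likelihood of MKP's record against MKQ under an assumed logistic
    model in which skill_gap and b_tvz only ever enter as their sum."""
    p = 1.0 / (1.0 + math.exp(-(skill_gap + b_tvz)))
    return wins * math.log(p) + losses * math.log(1.0 - p)

# The sum that makes P(MKP wins) = 20/30 = 2/3, i.e. the best fit to a 20-10 record
best_sum = math.log(20 / 10)

# Three very different stories: all skill, all imbalance, or a mix
for gap, b in [(best_sum, 0.0), (0.0, best_sum), (0.3, best_sum - 0.3)]:
    ll = log_likelihood(20, 10, gap, b)
    print(f"skill gap {gap:.2f}, B_TvZ {b:.2f}: log-likelihood {ll:.4f}")
```

All three lines print the same log-likelihood, so the data can estimate the sum but say nothing about how it splits between practice and imbalance.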
This applies even if you add more players. Say we clone both players a few times, and each clone is slightly weaker than the last. So from the best terrans we have MKP, MKPa, MKPb, and from the best zergs we have MKQ, MKQa, MKQb. This is a bit trickier to explain, so let's say we break them up and analyse just the terrans first and then just the zergs. Our analysis would find their relative rankings within their own races and work out their skill levels relative to an arbitrary baseline. Say we get 2, 1, 0 for the terrans and 2, 1, 0 for the zergs.
Now we let them play each other and conduct an analysis on that. Let's assume that the terrans practised more and are one skill level ahead of their zerg opponents. So we would rate their skills MKP = 3, MKPa = MKQ = 2, MKPb = MKQa = 1, MKQb = 0.
But the problem is that our model has the parameter BTvZ, which actually makes it impossible to solve for this. We would get the above if BTvZ = 0, but it could also be 1, in which case we would subtract 1 from all the terran skills we calculated above. Or BTvZ = 0.5. There are an infinite number of possibilities. I can't remember exactly, but I think you wouldn't actually be able to find a unique solution to this at all. And you can see that even if you do find a solution, it's actually meaningless.
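The cloned-pros case can be checked the same way. This sketch again assumes a logistic (Bradley-Terry-style) win model, which is my choice and not stated in the post, and uses the skill ratings from the example. It predicts every pairing, mirror matches included, and shows that lowering all terran skills by c while raising BTvZ by c leaves every prediction, and hence the likelihood of any set of results, unchanged:

```python
import math
from itertools import combinations

def win_prob(skill_a, skill_b, bonus):
    """P(A beats B) under an assumed logistic (Bradley-Terry-style) model."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b + bonus)))

def predictions(terran_skills, zerg_skills, b_tvz):
    """Predicted win probability for every pairing of the cloned pros.
    b_tvz plays the role of BTvZ and only applies to cross-race games."""
    players = [(name, "T", s) for name, s in terran_skills.items()]
    players += [(name, "Z", s) for name, s in zerg_skills.items()]
    probs = {}
    for (name_a, race_a, s_a), (name_b, race_b, s_b) in combinations(players, 2):
        if race_a == race_b:
            bonus = 0.0      # mirror match: no racial term
        elif race_a == "T":
            bonus = b_tvz    # terran on side A gets the TvZ advantage
        else:
            bonus = -b_tvz   # zerg on side A suffers it
        probs[(name_a, name_b)] = win_prob(s_a, s_b, bonus)
    return probs

def same_predictions(p, q):
    """True if two parameter sets predict identical outcomes for every pairing."""
    return p.keys() == q.keys() and all(math.isclose(p[k], q[k]) for k in p)

zergs = {"MKQ": 2, "MKQa": 1, "MKQb": 0}
fit_a = predictions({"MKP": 3, "MKPa": 2, "MKPb": 1}, zergs, b_tvz=0.0)
fit_b = predictions({"MKP": 2, "MKPa": 1, "MKPb": 0}, zergs, b_tvz=1.0)
fit_c = predictions({"MKP": 2.5, "MKPa": 1.5, "MKPb": 0.5}, zergs, b_tvz=0.5)

print(same_predictions(fit_a, fit_b), same_predictions(fit_a, fit_c))  # True True
```

Because race is fixed per player, the terran skills and BTvZ only ever appear as a combined quantity in TvZ games, so no amount of extra games between these same players can separate them.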
To further compound the problem, the players in reality won't keep their relative performances. Maybe some are better at TvZ than TvT and others are worse at TvZ than TvT. Then your model becomes a mess and you're left wondering if the results of your analysis are actually meaningful because you haven't really addressed the underlying problem of attribution.
What is the source of the problem? It is because the player and the race they play are not independent variables. MKP, MKPa, MKPb always play terran and hence we cannot distinguish between when an effect came from the player or from the race.
So it's quite possible that we can't use available statistics to make any inferences about imbalance at all.
But that's where the fun is. Maybe there are creative ways around it. Maybe there are things we can measure that will allow us to draw conclusions. I guess you can also wonder if Blizzard has figured it out and what work-arounds they have found for it.
Personally, I suspect this is one of the benefits of forcing players to log in to play the game. It allows them to collect more data on player behaviour, like how often they play custom games and against whom. I would love to know what Blizzard is doing but I doubt they will share it with us. If there is a solution, I think it lies somewhere in those sorts of data. After all, when you play a custom game, it shows up in your match history with your build order and everything. This means all that data is collected... And I have always felt that they save every replay too (how else would they catch cheaters?), which means they have data on APM and all that and can control for games where players aren't actually trying. I'm not sure how yet, but it seems like the most promising direction.
This means that, as a pro, it's definitely a good idea to do all your practice games legitimately on Bnet rather than on private hacked LANs, because when Blizzard analyses the data and sees your stellar performance with only a handful of hours played, your race will look easier than it is.
I have fixed up a few issues with wording in my OP after reading some suggestions here. There's actually an interesting story behind why I made some of the mistakes.
When I began writing it (it's been sitting there for a while now), I had recently taken an aptitude test when applying for a new job. One of the first questions was a basic statistical one. I can't remember it exactly, but it basically came down to a ratio of 2:1 between two statistics, and it asked how many times more often the first thing occurred than the second. However, the options given were ridiculous, like 50%, 200%, 250%, 300%, 400%. Needless to say, I was nonplussed.
As for questions about things like map balance, it would be quite simple for Blizzard to look at balance on individual maps. Unless we as players can get similar data, it will be harder for us.
As for the Myth Busting section, it was more subjective than the analysis, which is why I put it after Outcomes, where the analysis ended. I also used the words "my opinion" in the Myth Busting section where I believed there was a significant subjective element to what I was saying, although I tried to stick with things that have objective support (such as the finding that humans remain biased even when they earnestly try to be unbiased, which is why I believe we shouldn't give too much credence to a pro claiming the game is imbalanced).
EDIT: I made a mistake in this post explaining why a statistical analysis wouldn't be able to differentiate between the effect of a player's skill and racial imbalance. This has been fixed.
|
Imagine if SC didn't allow a player to pick their race. The quality of balance discussions would skyrocket.
I liked this post though.
|
The issue is that your definition of balanced is based on the outcomes of the games, not the game itself.
Players would like to believe that, even though they are able to win now, they see a fundamental issue with a unit of the opposing race that can be abused given higher APM, better multitasking and more experience.
What statisticians lack is foresight. Statisticians can only make predictions through models and trend analysis. It has been seen again and again that the RTS genre does not develop in a plottable fashion along any given axis, and instead goes through paradigm shifts.
There are certain examples where one can roughly predict the exponential growth of technology (for example, http://en.wikipedia.org/wiki/Moore's_law). But there is no way we can say "because today we have been able to map the DNA of a human being, in three years we will have an X% likelihood of achieving time travel."
The game is dynamic, and a simple change in playstyle can open up a completely new paradigm; see TvP tank play and ZvP ling/bling into ultras play. One can argue that the underpoweredness of a race displayed in the short run motivates its players to improve faster than those of the overpowered race, who have no need to improve, hence balancing things out in the long run. This, however, is an assumption based on a flimsy model.
The conclusion one should reach is that the game evolves nonlinearly. It is not something we can understand by plotting it on a line. Hence, mathematical models are extremely hard to use for any long-term conclusions, and short-term conclusions are useless in most senses of the word.
PS: In the words of Blizzard themselves, the statisticians they hire are only one way of measuring balance. They take into account many other factors, including replays sent to them by pros. On one occasion I remember, it was MKP(?) who sent Blizzard replays showing an absolute imbalance with void rays, hence the VR nerf early on in the game.
|
On May 20 2011 23:35 RoachyRoach wrote: Imagine if SC didn't allow a player to pick their race. The quality of balance discussions would skyrocket.
I liked this post though.
No, that doesn't work, since people probably wouldn't play the game if they were forced to play random -.-
|