Scientific proof that SC2 is imbalanced (sorta) - Page 11

silencesc

United States, 464 Posts

August 19 2010 16:25 GMT

#201

On August 19 2010 08:12 GagnarTheUnruly wrote:

p-values are nearly zero for all leagues except silver, where it is above 0.2 or 0.3 if memory serves. That's with n = ~50 million.


Unless I'm misunderstanding something, that's the test I've done, but with a different sample size. If you use percentages as your counts, then your sample size is 100, unless you've adjusted them for the actual sample size (as I have). For a chi-square test it's important that the correct sample size be used.
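To see why the sample size matters, here is a minimal sketch of a chi-square goodness-of-fit test for one race's wins and losses against an expected 50/50 split. The 50.2% win rate and the one-million-game count are hypothetical illustration values, not the thread's data. With two categories there is one degree of freedom, so the p-value can be computed with `math.erfc` instead of a stats library.

```python
import math

def chi_square_win_loss(wins, losses):
    """Chi-square goodness-of-fit statistic and p-value for wins vs. losses,
    assuming an expected 50/50 split (one degree of freedom)."""
    total = wins + losses
    expected = total / 2
    chi2 = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical 50.2% win rate, once with percentages used as counts (n = 100)
# and once with counts scaled to a hypothetical one million games.
as_percentages = chi_square_win_loss(50.2, 49.8)
as_real_counts = chi_square_win_loss(502_000, 498_000)

print(f"n = 100:       chi2 = {as_percentages[0]:.4f}, p = {as_percentages[1]:.3f}")
print(f"n = 1,000,000: chi2 = {as_real_counts[0]:.1f}, p = {as_real_counts[1]:.2e}")
```

The same 50.2% win rate is nowhere near significant when the percentages are fed in as counts (p around 0.97), but it drops below p = 0.0001 once the counts reflect a million games.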

@ all the people who think my results are meaningless due to matchmaking:

I've said this a lot by now, but I'll state it again for the last time here:

I have reasoned that if the game is imbalanced, that imbalance must manifest as either 1) a difference in win ratios for the different races or 2) a difference in race prevalence as you increase player skill level, except under one of several unlikely scenarios and one likely one.

The degree to which it shifts from condition 1) to condition 2) depends on the strength of the matchmaking system. Since we don't see 1) (as my data show), and we don't see 2) (as I have said and escapeartist has shown), we can conclude that the game is balanced, at least for regular league play.
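The matchmaking argument can be illustrated with a toy rating model. This is a minimal sketch that assumes an Elo-style expected score and update rule with a K-factor of 32; Blizzard's actual matchmaking formula is not given in this thread, so the formula, the numbers, and the rule that the player is always paired with an equally rated opponent are all illustrative assumptions. It tracks the expected (not sampled) outcome of each game, so it is deterministic.

```python
def expected_score(rating_diff):
    """Elo-style win probability for a side that is `rating_diff` points stronger."""
    return 1 / (1 + 10 ** (-rating_diff / 400))

true_strength = 1700.0  # hypothetical: skill plus an assumed race advantage
rating = 1500.0         # hypothetical starting rating
k_factor = 32.0         # assumed Elo K-factor

for game in range(1, 301):
    # The opponent is rated at, and truly plays at, the player's current
    # rating, so the real edge in each match is true_strength - rating.
    p_win = expected_score(true_strength - rating)
    # Expected Elo update: the system expects 0.5 against an equally rated
    # opponent, so the rating moves by K * (p_win - 0.5).
    rating += k_factor * (p_win - 0.5)
    if game in (1, 10, 50, 300):
        print(f"game {game}: win probability {p_win:.3f}, new rating {rating:.1f}")
```

In this toy model the win probability starts well above 50% and decays toward 50% as the rating catches up with true strength, so an advantage shows up as a higher rating (condition 2) rather than a lasting win-rate edge (condition 1).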

The unlikely scenarios:

a. Blizzard's matchmaking system is wise to racial imbalances, and chooses lower level opponents for a player of a given ranking if they play as a weak race vs. a strong race. The only reason they would do this would be to 'hide' racial imbalances from the player and/or the community.

b. Blizzard's matchmaking system does nothing, and each league is a random sample of the regional player population.

c. People have no race loyalty, and randomly pick their race before each match.

The likely scenario:

d. The races are balanced overall but matchups are imbalanced, in a rock-paper-scissors fashion. I favor protoss, and I really feel that I struggle against Terran.

@ all the people who think the test is inappropriate because I haven't modeled enough variables that affect win rate

I don't have access to data that would allow me to do that. I'd like to, but I can't. In science, when that's the case, you have to look for other ways to test a question. In my case, I initially reasoned that an imbalance would lead to a difference in win rates among races. People immediately pointed out that that wasn't the case, due to matchmaking. However, I then realized that the matchmaking system would force weak races into the lower leagues.

I checked to see if that happened and amended my analysis with a graph showing that it doesn't. Escapeartist has since analyzed this in more detail and come to the same conclusion, although nobody to my knowledge has done an analysis for lower league play. People have shown, however, that it's not true for the top hundred or so players in each region.

@ the people who think stats are useless

It's been shown before that stats are a much better way of assessing the truth than anecdotal knowledge. Even experts often have misperceptions, and misperceptions often produce feedback loops. Stats are at least partially resistant to this.

That said, I think opinions and impressions of top-level players (IdrA's thoughts on high level ZvT matchups, e.g.) still warrant attention, and consistently held beliefs warrant scientific investigation. In fact, that's what I did with respect to win rates for league play!

Finally, thanks everyone for your interest! I'll keep trying to answer questions but I know I'll miss some and for that I apologize.

If the p-value is nearly zero, you have to say it's balanced based on your results. Unless it's .05 about zero, you cannot say that you have voided your null hypothesis, so how did you come to the conclusion that the game is imbalanced?

humansherdog

Canada, 85 Posts

August 19 2010 16:31 GMT

#202

On August 17 2010 07:16 GagnarTheUnruly wrote:
The sample size is so vast that random chance can't explain the differences, but at the same time they are so small as to be meaningless.

Oh god this.
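The gap between "random chance can't explain it" and "meaningful" is the gap between statistical significance and effect size. A minimal sketch: the same one-degree-of-freedom chi-square test against a 50% win rate, plus Cohen's w = sqrt(chi2 / n) as an effect size that does not grow with n. The 58 million games echoes the size of the OP's data set, but the 50.2% win rate is a hypothetical value, not a figure from the thread.

```python
import math

def significance_and_effect(win_rate, games):
    """Return (chi2, p_value, cohens_w) for a win rate tested against 50%."""
    wins = win_rate * games
    losses = games - wins
    expected = games / 2
    chi2 = ((wins - expected) ** 2 + (losses - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # 1 degree of freedom
    cohens_w = math.sqrt(chi2 / games)        # effect size, independent of n
    return chi2, p_value, cohens_w

chi2, p, w = significance_and_effect(0.502, 58_000_000)
print(f"chi2 = {chi2:.0f}, p = {p:.3g}, Cohen's w = {w:.4f}")
```

With these inputs the p-value is astronomically small while w is 0.004, far below the 0.1 that Cohen's conventions label a small effect.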

TanGeng

Sanya, 12364 Posts

August 19 2010 16:59 GMT

#203

On August 20 2010 01:03 escapeArtist wrote:
I also don't see how skill curve equals imbalance. I do agree that zerg needs higher apm than, say, terran, but if they perform at the same level then I still don't see any imbalance issues, since unless proven otherwise we must assume that they have hit the relative skill cap when playing in the upper diamond level. As I have shown, in the upper diamond level Zerg is gaining in population. And without any intelligent discussion we can safely assume that random takes more skill than ALL the other races. Even they are gaining in popularity in the lower diamond league. This alone strongly supports my statement that difficulty is not equal to balance.

If imbalance means performing at a different level at the same rating on the ladder, then all you have proven is that the ladder works. This definition of imbalance is trivial.

If imbalance means sharply different learning curves, where one race is harder at one point or at all points, then you haven't proven anything about that. If imbalance means applying a different set of skills, then I would think that is true by definition, since the races play differently.

If imbalance is the difference in win rates that you get when all races are being played optimally (at the very, very end of the learning curve), you will never get it with statistics, because they are retrospective and only capture the history of the metagame.

If imbalance is the difference in win rates between the best that the metagame has to offer, then you shouldn't look at anything below upper diamond, because those players aren't at the level of the best the metagame has to offer.

My thesis is you haven't shown anything substantive to be true on the issue of imbalance. This tool you are using isn't convincing and shouldn't be convincing.

On August 17 2010 07:16 GagnarTheUnruly wrote:
The sample size is so vast that random chance can't explain the differences, but at the same time they are so small as to be meaningless.

You CANNOT interpret it this way. This is a huge leap in logic unsubstantiated by your statistical tool.

TanGeng

Sanya, 12364 Posts

August 19 2010 17:31 GMT

#204

I'm going to call pure BS on this "scientific" process. It's a simple failure in logic and you're trying to pawn it off on us. When you prove P implies Q, you have also proven the contrapositive, ~Q implies ~P.

Now you attach some statistical mumbo-jumbo to your P implies Q and expect us to believe the converse, Q implies P? Fuck that.
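TanGeng's point is standard propositional logic: an implication is equivalent to its contrapositive but not to its converse. A quick truth-table check makes this concrete (P and Q here are abstract booleans, not the thread's specific claims):

```python
from itertools import product

def implies(a, b):
    """Material implication: a -> b is false only when a is true and b is false."""
    return (not a) or b

rows = list(product([True, False], repeat=2))

# P -> Q agrees with its contrapositive ~Q -> ~P on every row...
contrapositive_ok = all(implies(p, q) == implies(not q, not p) for p, q in rows)
# ...but not with its converse Q -> P.
converse_ok = all(implies(p, q) == implies(q, p) for p, q in rows)

print("P      Q      P->Q   ~Q->~P  Q->P")
for p, q in rows:
    print(f"{p!s:6} {q!s:6} {implies(p, q)!s:6} {implies(not q, not p)!s:7} {implies(q, p)!s}")
print("contrapositive equivalent:", contrapositive_ok)
print("converse equivalent:", converse_ok)
```

The converse fails on the row where P is true and Q is false, and again where P is false and Q is true.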

texmix

United States, 106 Posts

August 19 2010 19:10 GMT

#205

If every terran unit suddenly had +25% hit points (making them obviously overpowered), the original post methodology would still conclude the races are equal and have the same 2 pages of statistical garbage backing it up.

A simple way to get a better result would be taking the top couple hundred players who choose Random, downloading their match histories, and looking at how often each of them wins as P, T, and Z. The results would at least be closer to something meaningful (though that doesn't say much).
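The tallying step of this proposal is simple once the match histories are in hand. A minimal sketch, with made-up players (`alice`, `bob`, `carol`) and results standing in for real downloaded histories; it reports pooled win rates per race and can also group by (player, race) so each Random player serves as their own comparison.

```python
from collections import defaultdict

def win_rates(records, key):
    """Group (player, race, won) records by key(record) and return
    {group: (wins, games, win_rate)}."""
    wins, games = defaultdict(int), defaultdict(int)
    for record in records:
        group = key(record)
        games[group] += 1
        wins[group] += int(record[2])
    return {g: (wins[g], games[g], wins[g] / games[g]) for g in games}

# Hypothetical records (player, race actually played, won?); real data would
# come from downloaded match histories of players who queue as Random.
matches = [
    ("alice", "Protoss", True), ("alice", "Terran", False), ("alice", "Zerg", True),
    ("bob", "Terran", True), ("bob", "Protoss", False), ("bob", "Zerg", False),
    ("carol", "Terran", True), ("carol", "Protoss", True), ("carol", "Zerg", False),
]

per_race = win_rates(matches, key=lambda r: r[1])
per_player_race = win_rates(matches, key=lambda r: (r[0], r[1]))
for race, (w, n, rate) in sorted(per_race.items()):
    print(f"{race}: {w}/{n} = {rate:.1%}")
```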

Yuka

United States, 133 Posts

August 19 2010 19:34 GMT

#206

On August 20 2010 04:10 texmix wrote:
If every terran unit suddenly had +25% hit points (making them obviously overpowered), the original post methodology would still conclude the races are equal and have the same 2 pages of statistical garbage backing it up.

A simple way to get a better result would be taking the top couple hundred players who choose Random, downloading their match histories, and looking at how often each of them wins as P, T, and Z. The results would at least be closer to something meaningful (though that doesn't say much).

As a Random player myself (and looking at all the requests for Random player data throughout the thread) I at first considered that to be a better metric. However after further thought, I don't think that data would be as useful because:

a) true, purely Random players seem to be something of an outlier in the overall data
b) most Random players are not equally skilled with all three races, a key assumption that would have to be valid for the data to be useful

So as you say, it'd be meaningful, but not by much more.

Kudos to OP by the way for trying this undertaking. At the bare minimum, it has generated some interesting discussion and serious thought.

TanGeng

Sanya, 12364 Posts

August 19 2010 20:06 GMT

#207

Random players can have selection bias. Basically, the possibility is that it may take a special kind of player to play and want to play as random, and by using randoms, you're taking an unrepresentative sample.

As a case study it's better than looking at the different players playing different races, but it falls far from a controlled experiment. The experiment would be choosing players at the same skill point (e.g. platinum 500) and forcing those players to play random for a certain number of games, and looking at the results. This is still flawed since it will also measure how well skill or lack thereof transfers between the races for those players dedicated to playing a single race.

As constructed, I doubt these statistics will answer any questions about balance in SC2 under a non-trivial definition without a lot more work.

socal50

United States, 93 Posts

August 19 2010 20:08 GMT

#208

i think the graph shows it's balanced more than anything, despite terrans having a slight edge
the little imbalance could be due to random variation

nihlon

Sweden, 5581 Posts

August 19 2010 20:14 GMT

#209

On August 19 2010 08:12 GagnarTheUnruly wrote:
d. The races are balanced overall but matchups are imbalanced, in a rock-paper-scissors fashion. I favor protoss, and I really feel that I struggle against Terran.

That's retarded logic. How can you consider a game functioning in a rock-paper-scissors fashion balanced overall? If one matchup is imbalanced, so is the overall balance. Which is what most people are arguing.

Does this sound like a reasonable argument?

Blizzard: "The game is balanced perfectly overall"
Players: "How the hell can you say that when I can't win against the X race?"
Blizzard: "Yeah, but we are talking about overall balance here..."

Gentlebite

United States, 132 Posts

August 19 2010 20:18 GMT

#210

Given factors like player skill level and each race's share of the population, this shows the matchmaking is balanced, but it doesn't say anything about the gameplay itself.

andyrichdale

New Zealand, 90 Posts

August 19 2010 21:18 GMT

#211

What?

If Terran units had 25% extra hit points, then Terrans would win considerably more of their matches than they currently do. This would show up as a win% increase large enough to land in the "considerably higher than expected" region, which would lead to the conclusion that Terrans are overpowered.

andyrichdale

New Zealand, 90 Posts

August 19 2010 21:20 GMT

#212

On August 20 2010 05:14 nihlon wrote:


He's just saying that the average win% of each race against every other race is pretty even. Whether or not a game is acceptable given imbalances in certain matchups is another question altogether really.
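The disagreement here is easy to see numerically: a rock-paper-scissors matchup table can give every race an even overall win rate while every non-mirror matchup is lopsided. A minimal sketch with a hypothetical 55/45 table (not real SC2 data), ignoring skill differences; note the overall figures come out exactly even only when the race shares are equal.

```python
RACES = ("P", "T", "Z")

# Hypothetical rock-paper-scissors table: WIN_PROB[a][b] is the chance that
# race a beats race b. Mirrors are 50%, and WIN_PROB[a][b] + WIN_PROB[b][a] == 1.
WIN_PROB = {
    "P": {"P": 0.50, "T": 0.45, "Z": 0.55},
    "T": {"P": 0.55, "T": 0.50, "Z": 0.45},
    "Z": {"P": 0.45, "T": 0.55, "Z": 0.50},
}

def overall_win_rates(win_prob, opponent_share):
    """Overall win rate per race when opponents are drawn according to
    opponent_share (race -> fraction of the player pool)."""
    return {
        race: sum(opponent_share[opp] * win_prob[race][opp] for opp in RACES)
        for race in RACES
    }

equal = overall_win_rates(WIN_PROB, {"P": 1 / 3, "T": 1 / 3, "Z": 1 / 3})
skewed = overall_win_rates(WIN_PROB, {"P": 0.30, "T": 0.45, "Z": 0.25})
print("equal shares: ", {r: round(v, 4) for r, v in equal.items()})
print("skewed shares:", {r: round(v, 4) for r, v in skewed.items()})
```

With equal shares every race averages 50% even though each non-mirror matchup is 55/45; with the skewed (hypothetical) shares the overall rates drift apart.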

ParasitJonte

Sweden, 1768 Posts

August 19 2010 21:23 GMT

#213

On August 20 2010 05:14 nihlon wrote:


You're arguing over semantics. He didn't pass any judgement on whether it was a good thing or not. He simply stated that it was a likely scenario. What you then call it, really doesn't matter.

nihlon

Sweden, 5581 Posts

August 19 2010 21:25 GMT

#214

On August 20 2010 06:20 andyrichdale wrote:


I know what he is saying, it just makes very little sense to use that kind of logic when discussing balance. Saying the game is balanced overall just because of that fact is pointless. Who wants to play a rock-paper-scissors game?

nihlon

Sweden, 5581 Posts

August 19 2010 21:27 GMT

#215

On August 20 2010 06:23 ParasitJonte wrote:


You're arguing over semantics. He didn't pass any judgement on whether it was a good thing or not. He simply stated that it was a likely scenario. What you then call it, really doesn't matter.

No, it's not just arguing semantics in this case. He has been using that point earlier in the thread to argue the game is balanced, when it clearly isn't in such a hypothetical situation. He is the one playing with semantics to prove his own points when he uses the overall win % to prove that the game is balanced. (So yes, he is passing judgement.)

Stargazer

United States, 10 Posts

August 19 2010 21:32 GMT

#216

I did some analysis yesterday of the top 200 players currently on the sc2 ladder and got some interesting results. I agree with the many other posters who have said that analysis should be done in the top levels of play for accurate assessment of any sort on game balance for two reasons.

First, only at high levels of play do balance issues become relevant. Why bother arguing imba in silver if you can just learn how to macro properly to get into gold or platinum?

Secondly, and I think more importantly because people tend to overlook this, it is only at the lowest and highest end of the matchmaking system that the game will show any signs of imbalance among races based on performance. Think of each race's population in sc2 as a ladder--wordplay definitely intended :D--with the highest players at the highest rungs in top diamond and the lowest at the bottom of bronze. Ideally, each of the three ladders (or four if you want to include random) will stand equally tall and have about the same distribution in skill level. If the game is favored toward one race, however, we get a translational effect, where the best of race A are better than the best of race B. Then the good of race A become better than the good of race B and so get matched up with the very good players of race B, and so on. Matchmaking doesn't recognize any 'inherent' skill, only performance. Thus, the players of race A won't have a significantly better performance overall, since the middle population occupies over 99% of the ladder. They will, however, have a stronger performance at the top and bottom (theoretically, but not in reality) of the ladder. Since the lowest end, the worst of the bronze league, won't provide anything useful for us, we need to analyze the high end of the ladder.
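One way to make this "translational effect" concrete is to model each race's effective strength (skill plus any race advantage) as a normal distribution and shift race A's distribution up slightly. This is a minimal sketch under those assumptions, with a hypothetical shift of 0.1 standard deviations, cutoffs in standard-deviation units, and a ladder that ranks players purely by effective strength. It prints how over-represented race A is above each cutoff relative to race B.

```python
import math

def upper_tail(threshold, mean=0.0, sd=1.0):
    """P(X > threshold) for X ~ Normal(mean, sd)."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

shift = 0.1  # hypothetical race-A advantage, in standard deviations
ratios = []
for cutoff in (0.0, 1.0, 2.0, 3.0):
    # Fraction of each race above the cutoff; race B is unshifted.
    ratio = upper_tail(cutoff, mean=shift) / upper_tail(cutoff)
    ratios.append(ratio)
    print(f"above {cutoff:.0f} sd: race A over-represented by a factor of {ratio:.2f}")
```

Even with a small shift, the over-representation ratio grows as the cutoff moves into the tail, which is consistent with the top of the ladder being where a race advantage would be most visible.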

As a disclaimer, this is a preliminary analysis. It is by no means exhaustive and it makes no conclusive claims. It's more of a thumbnail look at the trends, as I used no chi-square tests or other tools to check for statistical significance beyond common sense. If someone wants to run more involved statistics on this data set, by all means please do, but don't criticize it for being too weak to support its conclusions, because I am only using it to hint at and show correlation. I only speculate about causative factors, and I leave evidence of those to further healthy discussion and deeper analysis.

Data taken from http://sc2ranks.com/stats/
From the top 200 players as of yesterday evening, I looked at race frequency, % diamond players of each race in top 200, the mean and median points of each race in the top 200, and the mean points and frequency for each quartile of each race.

Section 1: Race frequencies in the top 200

Top 200 race population
Random  Protoss  Terran  Zerg  Total
3       61       90      46    200

Clearly there are a lot of terran players in the top 200, but let's not jump to any conclusions.

Race breakdown by quartile of the top 200
Quartile  Random  Protoss  Terran  Zerg  Cutoff (points)
1Q        1       17       26      7     1127
2Q        1       15       22      10    1066
3Q        0       13       21      16    1023
4Q        1       16       21      13    991
Overall   3       61       90      46    991

Note: random is not considered for later analysis because of its extremely low representation in the top 200.
Also note: the quartiles didn't break evenly because of point ties at the cutoffs, so the quartiles contain 51, 48, 50, and 51 players.

We see an increasing proportion of the terran players in the top of the top 200, while protoss remains fairly level and zerg shows the opposite trend of terran.
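Since the post invites more involved statistics, here is a minimal sketch of a Pearson chi-square test of independence between race and quartile, using the counts from the quartile table (Random excluded, as in the post). It only asks whether the race mix differs across quartiles more than chance alone would produce; it says nothing about causes. The closed-form p-value used here is valid only for an even number of degrees of freedom, which holds for this table (4 quartiles and 3 races give 6).

```python
import math

# Race counts per quartile of the top 200, from the table above (Random excluded).
# Columns: Protoss, Terran, Zerg.
counts = [
    [17, 26, 7],   # 1Q
    [15, 22, 10],  # 2Q
    [13, 21, 16],  # 3Q
    [16, 21, 13],  # 4Q
]

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def chi2_survival_even_dof(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    assert dof % 2 == 0, "closed form only holds for even dof"
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(dof // 2))

chi2, dof = chi_square_independence(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {chi2_survival_even_dof(chi2, dof):.2f}")
```

For tables with an odd number of degrees of freedom, a library routine such as SciPy's contingency-table test would be the usual choice.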

Section 2: Individual race analysis of the top 200

The following charts break down performance by each quartile of the individual race's top 200.

Protoss average points in the top 200
Quartile  Avg points  Players
1Q        1180.25     16
2Q        1093.87     15
3Q        1044.20     15
4Q        1005.14     15
Mean      1082.49
Median    1074.00

Terran average points in the top 200
Quartile  Avg points  Players
1Q        1177.55     22
2Q        1100.86     22
3Q        1052.09     23
4Q        1007.61     23
Mean      1083.31
Median    1073.00

Zerg average points in the top 200
Quartile  Avg points  Players
1Q        1147.75     12
2Q        1066.09     11
3Q        1037.64     11
4Q        1005.50     12
Mean      1064.78
Median    1049.00

Race performance differentials (points)
Quartile  Pro-Zer  Ter-Zer  Ter-Pro
1Q        32.50    29.80    -2.70
2Q        27.78    34.77    7.00
3Q        6.56     14.45    7.89
4Q        -0.36    2.11     2.47
Mean      17.71    18.53    0.82
Median    25.00    24.00    -1.00

These charts show a bit more in-depth the trends we already noticed in the first section. Terran has a much larger population in the top 200, but they don't dominate in performance compared to protoss. Protoss has a better performance from its top quartile than terran, but terran outperforms protoss in each subsequent quartile. Also, protoss has a slightly higher median while terran has a slight edge on mean.

The most alarming trend is the underperformance of Zerg at this level. Zerg has a deficit of roughly 30 points in average points in each of the top two quartiles, and a much lower mean and median than the other two races.
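As a sanity check on the tables, each race's overall mean can be recovered as the size-weighted average of its quartile averages. A minimal sketch using only the numbers from the tables above:

```python
# (average points, number of players) per quartile, from the tables above.
quartiles = {
    "Protoss": [(1180.25, 16), (1093.87, 15), (1044.20, 15), (1005.14, 15)],
    "Terran": [(1177.55, 22), (1100.86, 22), (1052.09, 23), (1007.61, 23)],
    "Zerg": [(1147.75, 12), (1066.09, 11), (1037.64, 11), (1005.50, 12)],
}

def weighted_mean(groups):
    """Overall mean recovered from (group mean, group size) pairs."""
    total = sum(size for _, size in groups)
    return sum(avg * size for avg, size in groups) / total

means = {race: weighted_mean(groups) for race, groups in quartiles.items()}
for race, mean in means.items():
    print(f"{race}: {mean:.2f}")
print(f"Ter-Zer mean differential: {means['Terran'] - means['Zerg']:.2f}")
```

The recovered means (1082.49, 1083.31, 1064.78) match the Mean rows, and their differences match the Mean row of the differentials table.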

Section 3: Race representation in top 200 from diamond

Race distribution in diamond league
Random  Protoss  Terran  Zerg
4525    15725    13621   10771

Share of each race's diamond players who are in the top 200
Race     % in top 200
Random   0.066%
Protoss  0.388%
Terran   0.661%
Zerg     0.427%

The only thing to point out here is that a lot more terrans perform at the top 200 level proportionally compared to the other races.
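The percentages above follow directly from the two tables: each race's top-200 count divided by its diamond population. A minimal sketch of that arithmetic:

```python
# Top-200 counts (Section 1) and diamond population by race (Section 3).
top200 = {"Random": 3, "Protoss": 61, "Terran": 90, "Zerg": 46}
diamond = {"Random": 4525, "Protoss": 15725, "Terran": 13621, "Zerg": 10771}

share = {race: 100 * top200[race] / diamond[race] for race in top200}
for race, pct in share.items():
    print(f"{race}: {pct:.3f}% of diamond players are in the top 200")
```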

Conclusions
Terran has the lion's share of the top 200 compared to the other two races and has strong performance in each quartile of its top 200 players. Protoss, while underrepresented in the top 200 relative to its number of diamond players, also performs strongly, while Zerg is weak in both number and performance among the top 200. Additionally, Terran has a much larger proportion of its diamond players in the top 200, which, consistent with its trend of performing well at the top of the top 200, suggests that terran probably also performs very well across high-level (diamond) play.

There are many factors to consider here, but it seems reasonable from this data set to at least think that terran needs a nerf and zerg a buff. There could be many plausible reasons for this pattern, from better players playing terran to the metagame being undeveloped and so on, but I think the most reasonable explanation for this correlation is that terran is imba right now and that zerg needs a buff. We can expect a high amount of randomness with a small sample size (n=200). This analysis is very preliminary and I did not test for statistical significance, although I hope you will see its relevance even without those helpful tools.

Other factors to consider:
win/loss ratio, games played

I hope this provides some helpful food for thought on the current balance issues.

LlamaNamedOsama

United States, 1900 Posts

August 19 2010 21:52 GMT

#217

On August 17 2010 17:07 GagnarTheUnruly wrote:


Science is a process by which you formulate a hypothesis based on a theory, a prior hypothesis, or an observation, and then devise a method to objectively test that hypothesis.

I heard a hypothesis that the races were imbalanced, observed a dataset that suggested that win rates were race independent, and formulated a hypothesis that win rates were race independent. I then found support for my hypothesis by analyzing a data set of 58 million cases using a time-honored statistical technique. Where does that deviate from the definition of science?

Also, I'm not a layman, and despite what you may have heard, a Chi-square analysis is in fact a statistical technique. In fact, it's possibly the most widely used technique in scientific literature, particularly in advanced statistical modeling where it's used for model verification.

The definition of the scientific process includes both experimentation and observation in the acquisition of data. If you claim to know your statistics, then you should easily know that there's a clear distinction between an experiment and an observational survey, and you clearly didn't alter any of the variables.

Also, as far as I recall, none of the statistics were present in the original post. The post appears to have been edited a couple of hours after my post, and as far as I know you were updating it with the actual statistical substance while I was posting. Part of your discussion reflects this and my recollection of your original statements.

For example:
" Within those leagues, Terran has a slight advantage (see, Terran is IMBA!), meaning you’ll win about 2 games in a thousand more often than you should"

or

"A Diamond Solo Zen Master would have to play 1801 games to win as random and 1800 to win as zerg, but only 1794 to win as Protoss and 1784 to win as Terran. So if you want Terran mastery, you’ll get it in 17 fewer games than a random player!"

or

"you’d have to play about a million games before you started to notice that the races were imbalanced in the diamond league"

These are incorrect interpretations of statistics/data. For example, in the very last one, the identification of a p-value less than the alpha for statistical significance only indicates that it's probable that the initial results were not by chance, not an actual quantification of bias, just a determination that there may be some.
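One way to put a number on the size of a bias, rather than only asking whether one exists, is a confidence interval for the win rate. A minimal sketch using the normal-approximation (Wald) interval, with hypothetical counts of 502,000 wins in 1,000,000 games; the approximation is reasonable for large samples and win rates away from 0 or 1.

```python
import math

def win_rate_interval(wins, games, z=1.96):
    """Approximate confidence interval for a win rate (z = 1.96 gives ~95%),
    using the normal approximation to the binomial."""
    p_hat = wins / games
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / games)
    return p_hat - half_width, p_hat + half_width

low, high = win_rate_interval(502_000, 1_000_000)
print(f"95% CI for the win rate: {low:.4f} to {high:.4f}")
```

Here the interval excludes 0.5, so the bias is detectable, but it also bounds the edge at roughly 0.1 to 0.3 percentage points.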

figq

12519 Posts

August 19 2010 22:03 GMT

#218

Even if we had real random matchmaking to draw conclusions from, I fail to see why balance is properly measured by winning %. Let me try to explain. One race could still be significantly easier and lower skill capped, but get even winning % with the other races - in that case this race is just more "cheesy", designed around risky all-or-nothing plays, which are not difficult, but also don't get an imbalanced amount of wins. Meanwhile, another race could be really hard to play, but still get a high enough winning % to be even with the rest - enough people are able to put in enough effort to get wins. In other words, some races could be ridiculous and others could be serious, and still their win/loss ratios could be even in truly random matchmaking. Such a state is officially regarded as balanced, but that is misleading.

PlagueRat

United States, 39 Posts

August 19 2010 22:06 GMT

#219

Cool cool, but the MMS (matchmaking system) screws with your numbers pretty badly. I'd like to see percentages for match-ups; that would be interesting.

Hunch

Canada, 336 Posts

August 19 2010 22:09 GMT

#220

of course sc2 is imbalanced, there is no question whatsoever, the point is that 10 years from now nothing that we talk, discuss and rage about balance will matter because the game will change, im sure that when blizz puts out their latest patch ppl are going to change their minds on what is balanced and what isn't balanced.

its just funny to me how much people stress about how balanced or unbalanced the game is, instead why dont we try and talk about what could be improved or just stfu and enjoy the game as it is right now, which some ppl wont do im sure but its just a thought.

i mean it looks like you wrote a lot of interesting stuff but after the first paragraph i kinda just skimmed the rest and looked at the nice little pictures there which im sure 80% of the posters did.

well gl with w/e you're trying to do here
