|
I really enjoyed your model and I look forward to you improving it and maybe adding variables to better estimate match outcomes. It would be great to see more than just the GSL run through this sort of model, but I understand it would be a ton of work. Maybe one solution is to have people email you data in a form that you can use, or to partner with a few spectators to keep track of game stats.
I mean, someone just needs to have an Excel workbook open and type in the things you measured, so that you wouldn't have to go through it all. Anyway, I look forward to any future analysis; I feel like this was a damn good start at estimating balance. Thanks.
|
The imbalance issue is not necessarily related to the probability of a player winning. The usual notion of "imbalance" refers to specific issues rather than to XvY being imbalanced overall.
Consider the notorious example of the Protoss death ball: many people complain that Zerg players can't win against it. Let's assume that is correct. Then the obvious thing to do if you are a Zerg player is to avoid getting into the late game against a Protoss. This may be a very effective strategy, and you may even measure a high probability of Zerg players winning. The game would still be imbalanced, though!
Therefore the question of imbalance is a matter of strategies. To quantify it you need to consider a specific case of imbalance. If, for example, you can show statistically that ZvP rarely goes into the late game, and that when it does Protoss has an extreme win-loss ratio, then you can conclude either that 1) there is an imbalance issue or 2) Zerg players are bad at playing the late game.
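A hedged sketch of how one might test for "an extreme win-loss ratio" in the late-game subset, using a simple binomial test; the counts are invented for illustration, not taken from any data set:

```python
from scipy.stats import binomtest

# Invented counts: of 40 PvZ games that reached the late game,
# suppose Protoss won 32. Test against the null of a fair 50/50 matchup.
result = binomtest(k=32, n=40, p=0.5, alternative='two-sided')
print(f"p-value: {result.pvalue:.4f}")  # a small p-value means such a ratio is unlikely under balance
```

Even a significant result only establishes the extreme ratio; it cannot by itself separate explanation 1) from explanation 2) above.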
|
Quoting Day9: "You can't really talk about balance before you take a hell of a lot of time analyzing the data." Unfortunately, your article has 5 references, uses a small sample size, and jumps to conclusions based on your model. Obviously you're industrious and want to prove a point, but without gathering and analyzing the entire database of games played (on this patch), there is no room for an argument about balance. Also, balance is so tied to maps that it nearly becomes a moot point to measure racial imbalances rather than map imbalances. There are just so many factors.
How do you measure balance in terms of the whole game? If we measure balance using data from pro players, that doesn't reflect the whole game, only a subset.
How do you measure balance in terms of a race? If we measure by race, how do we correlate that data to maps?
How do you measure balance in terms of games? If we measure imbalance by games, how do we account for strategies that are intended to kill the opponent before he or she has expansions, i.e., cheeses or all-ins?
How do you measure balance by all of the above? If we measure imbalance by the whole game, the race, and the games, how do those relate to one another? If we analyze all of the data and it shows a statistical win ratio favoring Zerg, but our subsets of data then show that Zerg is weaker on certain maps, against certain strategies, etc., that totally negates our first assumption.
You see why it's almost pointless to try to argue imbalance?
|
IMO there is no way Toss is better than Zerg; I don't care what the stats say.
|
On May 07 2011 02:01 ffdestiny wrote: Quoting Day9: "You can't really talk about balance before you take a hell of a lot of time analyzing the data." [...] You see why it's almost pointless to try to argue imbalance?
First of all, the analysis takes the effect of the map into account, so it can actually be thought of as asking "DO WE HAVE BALANCED MAPS?" and trying to see how many P>Z imba or T>Z imba maps there are, and so on.
Secondly, I understand you feel uncomfortable with statistical analysis. Say there are 50 students in a class, and the mean of their heights is 170cm. What does that tell you about an individual student? Nothing. Any student in the class could be 150cm tall, or 200cm tall. However, the mean itself is still not meaningless. To gain information, we sometimes have to find a clever way of summarizing things. Of course, the more complex the situation, the harder and less intuitive the statistics become.
If you think statistical analysis explains the detail of EVERY GAME, I think you are misled. That is not the point of conducting an analysis. The point is to find out whether there is an overall trend. In one game, a Terran gamer can cheese a Zerg gamer. However, can he do it in every game? Absolutely not. But there are maps where a cheese succeeds with high probability (ex: Steppes of War). In such a case, it is not hard to see there is a balance issue (ex: the infamous Mercury map in BW).
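To make the "overall trend" idea concrete, here is a minimal sketch of per-map win rates with confidence intervals; the records are invented, and maps whose interval sits far from 50% would be the candidates for a balance issue:

```python
from statsmodels.stats.proportion import proportion_confint

# Invented per-map TvZ records: (terran_wins, total_games).
records = {"Steppes of War": (28, 40), "Metalopolis": (21, 40)}

for game_map, (wins, n) in records.items():
    # Wilson interval behaves reasonably even for small samples.
    lo, hi = proportion_confint(wins, n, alpha=0.05, method="wilson")
    print(f"{game_map}: {wins}/{n} = {wins / n:.0%}, 95% CI [{lo:.0%}, {hi:.0%}]")
```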
|
Well done. Stats are always fun, not so much as proof of anything because they can always be argued, but because numbers are fun.
|
On May 06 2011 22:25 Elean wrote: It looks like this model assumes that Protoss players are extremely skilled (6 Protoss in the top 10 skilled players), and reaches the conclusion that Protoss is underpowered. Basically, it has exactly the same value as Idra saying "I'm the best player, I don't win, thus there is an imbalance". (Actually, this model can converge to different solutions; the particular solution the author got was "Protoss players are skilled and Protoss is underpowered...", but it could very well have converged to "Protoss players have no skill and Protoss is overpowered".)
All the people reading this should understand that this is not a scientific peer-reviewed paper.
There is no way this would be accepted as it is now. If I were to review this paper I would ask for several modifications, and I would actually reject the paper unless the author answered this question: how can you tell there is no offset on the "skill parameter" of all the players of one race?
I would also ask for a plot of the "skill parameter" distribution for each race.
First of all, I think you read it very carefully. Thank you very much for your interest. I'll answer in a technical sense, since it seems like you have a good background in statistics.
The problem you're worried about can happen in "unidentifiable" cases, that is, when multiple parameter settings can represent the same model. That is not the case for this problem, since I
1) used LASSO as an L_1 regularizer, and 2) used non-informative gamers as a baseline.
Therefore, the situation you describe cannot happen. The regularizer tries to avoid estimating any gamer as extraordinary unless he wins a great many games.
It is very important to check the identifiability of the model before conducting an analysis, and you are right to raise this issue. I understand why you missed this point, since 1) I agree that the document is poorly written (it would be rejected by every journal/conference) and 2) you shouldn't have had to read it as a professional reviewer.
And it is also good to point out that THIS IS NOT A SCIENTIFIC PEER-REVIEWED PAPER. I DID IT FOR FUN, and the fact that I am a Statistics major does not guarantee that the analysis is correct. I didn't worry much about this point at the time of posting, but people without the proper background could've been misled. Thanks.
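For readers curious what such a model looks like in practice, here is a minimal, purely illustrative sketch of an L1-regularized ("LASSO") logistic fit using scikit-learn; the toy data and the +1/-1 encoding are my assumptions about the setup, not the paper's actual code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy design matrix: one column per player plus one matchup dummy.
# Each row is a game: +1 in the first player's column, -1 in the second's,
# and +1 in the last column if the game is, say, PvZ.
X = np.array([
    [1, -1,  0, 1],   # player 0 (P) vs player 1 (Z), PvZ
    [1,  0, -1, 1],   # player 0 (P) vs player 2 (Z), PvZ
    [0,  1, -1, 0],   # player 1 vs player 2, mirror: no matchup term
])
y = np.array([1, 0, 1])  # 1 if the first-listed player won

# The L1 penalty shrinks every coefficient (players and matchup alike)
# toward zero unless the data insist otherwise; C sets the penalty strength.
model = LogisticRegression(penalty="l1", solver="liblinear",
                           fit_intercept=False, C=1.0)
model.fit(X, y)
print(model.coef_)  # [beta_player0, beta_player1, beta_player2, beta_PvZ]
```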
|
On May 07 2011 01:49 Mactator wrote: The imbalance issue is not necessarily related to the probability of a player winning. [...] you can conclude either that 1) there is an imbalance issue or 2) Zerg players are bad at playing the late game.
As far as changing the game with a patch goes, you are definitely correct. However, there ARE imbalances sometimes. If you watched BW for a long time, do you remember the infamous map Mercury? What was the P score? Did P win more than 2 games on that map?
From game to game, yes, there are differences. Even July was defeated on Mercury in an OSL final. However, everyone who has been playing SC1/SC2 for a long time KNOWS that certain maps REQUIRE PLAYERS of a certain race to do things x, y, z, ... and thus lead to imbalance issues.
|
I like your initiative, but this analysis is almost a joke. It is badly written, poorly justified, and pretty naive for something that tries to pass as a scientific paper. I only say this because I don't like that things like this end up on arXiv.
I would rather it were limited to discussing possible statistical models and methods to approach the problem.
|
On May 07 2011 02:37 d_ijk_stra wrote: [...] It is very important to check the identifiability of the model before conducting an analysis [...] THIS IS NOT A SCIENTIFIC PEER-REVIEWED PAPER. I DID IT FOR FUN [...]
Your model is:
logit(P) = beta_player1 - beta_player2 + beta_matchup
You use the LASSO method to fit the beta_player and beta_matchup values.
You get ONE fit, but there are other degenerate solutions; here is the proof: take the values of your solution, then decrease all the beta_player of Protoss players by 10000, and increase beta_PvZ and beta_PvT by 10000. If you do that, you get another fit that is just as good as the one you first had (i.e., all the logit(P) are unchanged). However, now beta_PvZ and beta_PvT are extremely high, and Protoss becomes clearly overpowered.
Your model is probably good for estimating how likely a player is to win a match, but it is 100% blind to balance.
The problem is that all the players play only one race, and you will never be able to tell the difference between "all the Protoss players are way better than the others, but Protoss is underpowered" and "all the Protoss players are noobs, but it's OK since Protoss is way overpowered". There is absolutely nothing you can do about it. Not with this sample of data.
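The shift Elean describes can be checked with a couple of lines; a tiny sketch with made-up coefficients, showing that the logit of a non-mirror game is unchanged:

```python
import math

# Made-up fitted values for one PvZ game: protoss player, zerg player, matchup.
beta_p, beta_z, beta_pvz = 1.2, 0.7, -0.3
logit = beta_p - beta_z + beta_pvz

# The proposed shift: lower the protoss player's beta by c, raise beta_PvZ by c.
c = 10000.0
logit_shifted = (beta_p - c) - beta_z + (beta_pvz + c)

print(math.isclose(logit, logit_shifted))  # True: the fit to the data is identical
```

Whether the L1 penalty is a principled way to choose among these equally-fitting solutions is the question the rest of the exchange turns on.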
|
On May 05 2011 10:21 professorjoak wrote: The data set had only about ~620 non-mirror games in it. It would be interesting to use this methodology on the Brood War TSL Season 1 and 2 full ladder replay packs, which have several times more data in them.
I looked into trying a statistical analysis for TSL Season 1 at one point, to see if the distribution of build orders on a map had any correlation with win percent. A first glance at the data showed all matchups balanced within 52-48 on any map where I had 100+ games in that specific map and matchup. (Which is different from the Korean results in the TLPD, which usually split 60-40 or 55-45, though those are based on far fewer games.) However, I then realized the data set had many duplicate games, because a game between two top ladder players was counted in each player's replay pack, and decided it would be too much trouble to properly sort them out, so I quit there and didn't take the analysis much further.
Well, what's wrong with duplicates? It's not as if the winner would somehow change in the replay from the opposite player. Even if many replays are duplicated and many are not, it is still OK as long as the duplication is random (it can hurt the result, but it's much more probable that the difference would be minor).
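For what it's worth, removing such duplicates is mostly a matter of keying on game identity rather than on whose replay pack a file came from; a hypothetical sketch (the field names are invented):

```python
# Invented replay records: the same game appears once in each player's pack.
replays = [
    {"players": ("PlayerA", "PlayerB"), "map": "Python", "start": "2009-03-01T12:00", "winner": "PlayerA"},
    {"players": ("PlayerB", "PlayerA"), "map": "Python", "start": "2009-03-01T12:00", "winner": "PlayerA"},
]

seen, unique = set(), []
for r in replays:
    # Key on an order-insensitive player pair, the map, and the start time.
    key = (frozenset(r["players"]), r["map"], r["start"])
    if key not in seen:
        seen.add(key)
        unique.append(r)

print(len(unique))  # 1
```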
|
On May 07 2011 02:42 d_ijk_stra wrote: [...] everyone who has been playing SC1/SC2 for a long time KNOWS that certain maps REQUIRE PLAYERS of a certain race to do things x, y, z, ... and thus lead to imbalance issues.
You are right about maps being important. Some maps can be abused if you are playing a specific race, but I don't think that is the issue that frustrates people.
It would be nice to have a site where, for a specific patch, you could see things like 1) the average game length (perhaps with standard deviation) for a specific map and matchup (X vs Y), 2) the most popular units/army compositions in the early, mid, and late game, i.e., at a specific time, and 3) correlation plots, etc. It would also be good to have the division or tournament, such as GSL, MLG, etc., as a variable. Like sc2ranks, although with different data.
This would add some useful data to the discussion about imbalance and strategy.
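A sketch of the first item on that wishlist, average game length per map and matchup, with invented column names and records (pandas assumed):

```python
import pandas as pd

# Invented game records purely for illustration.
games = pd.DataFrame({
    "map":     ["Metalopolis", "Metalopolis", "Xel'Naga Caverns"],
    "matchup": ["ZvP", "ZvP", "TvZ"],
    "minutes": [14.5, 22.0, 11.2],
})

# Mean and standard deviation of game length per (map, matchup) pair.
summary = games.groupby(["map", "matchup"])["minutes"].agg(["mean", "std", "count"])
print(summary)
```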
|
I don't know stats, but I believe it. When Blizzard used to release numbers, they showed the same thing, with P on the short end. When you look at the very top of the ladders, Terran just dominates everywhere. When you combine bunches of tournaments, Terran is on top.
Maybe Terrans are just more skilled, though? How do you know?
Saying Terran is IMBA is like saying basketball is imbalanced toward the USA, rather than that the USA has better players. No?
I prefer to look at individual strategies instead. If something cannot be beaten, like three 50-DPS VRs in a Zerg's base early with nothing you can do about it, that's imbalanced, so it was patched.
Everything else, including these stats, IMO is just whining, and could just as well be attributed to superior/inferior play if we step back and look objectively with neutral glasses on.
|
On May 07 2011 03:02 Elean wrote: [...] Your model is probably good for estimating how likely a player is to win a match, but it is 100% blind to balance. [...] There is absolutely nothing you can do about it. Not with this sample of data.
By LASSO, I mean the presence of the (L_1) regularizer. When you add 10,000 to a parameter, you are penalized a lot. I doubt you understand the concept of regularization, sorry.
|
On May 07 2011 05:54 d_ijk_stra wrote: [...] By LASSO, I mean the presence of the (L_1) regularizer. When you add 10,000 to a parameter, you are penalized a lot. [...]
As far as I can tell, LASSO is a least-squares method that puts a constraint on the L1 norm, a constraint that has no justification in this case.
You have to understand that if 2 models give the exact same results for every match, there is no way to tell which one is better. I explained that there is an infinite number of models that will give the same results with different "balance between 2 races". This means you cannot tell whether there is an imbalance.
I will explain with an example why the L1 constraint has no justification.
For simplicity's sake, let's consider only 2 races, T and Z, and let's assume that all the players of one race have the same skill. Suppose TvZ is imbalanced, and the actual value of beta_TvZ is 500. Since all the players made it into the tournaments, they are likely to have roughly the same strength (skill + balance). This means the Z players likely have a beta_player that is 500 above the beta_player of the T players.
Now run your model with an extremely large sample size. You get the solution beta_TvZ=0, beta_playerZ=0, and beta_playerT=0. This solution clearly minimizes the L1 norm and also gives the exact result for the probability of each match. However, it is completely wrong and fails to see the imbalance.
Your method fails to catch any imbalance for the exact same reason nobody can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.
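Plugging the post's numbers into the model makes the degeneracy explicit (taking the Terran as player 1):

```latex
\operatorname{logit}(P) \;=\; \beta_T - \beta_Z + \beta_{TvZ}
\;=\; 0 - 500 + 500 \;=\; 0
\;=\; \underbrace{0 - 0 + 0}_{\text{all-zero solution}} .
```

Both parameterizations predict even odds for every TvZ game, and the all-zero one has the smaller L1 norm, so a penalized fit prefers it.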
|
On May 07 2011 15:31 Elean wrote: [...] Your method fails to catch any imbalance for the exact same reason nobody can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.
That part was already (implicitly) mentioned by other users. If there were NO MIRROR MATCHES, your point would be right. The existence of mirror matches enables you to do such an analysis.
Of course (as another user already pointed out), you can question this: every gamer may have a different level of skill depending on the race of his/her opponent. But I think that assumption is not so strong as to make everything nonsense: we know that most top-level players are also good at mirror matches.
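For reference, under the model above (and assuming mirror games carry no matchup term), the two kinds of games contribute differently:

```latex
\text{mirror: } \operatorname{logit}(P) = \beta_i - \beta_j, \qquad
\text{non-mirror: } \operatorname{logit}(P) = \beta_i - \beta_j + \beta_{XvY} .
```

Mirror games pin down skill differences within a race; the matchup term enters only through non-mirror games, where it always appears bundled with a cross-race skill difference. Whether the regularizer justifiably splits that bundle is exactly what is in dispute here.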
|
On May 07 2011 15:31 Elean wrote: [...] we don't know whether the Zerg players are more or less skilled than the Terran players.
Oh, and it seems like you missed this point: every beta_player of each user is ALSO penalized by LASSO. This is a very important point, but I thought that when I said LASSO, everyone would understand that every variable is penalized. Isn't that the usual case? I think not penalizing certain variables is the exceptional case when using LASSO.
|
On May 07 2011 23:47 d_ijk_stra wrote: [...] If there were NO MIRROR MATCHES, your point would be right. The existence of mirror matches enables you to do such an analysis. [...]
Obviously, mirror matches change nothing. My example still stands with an extremely large number of mirror matches.
I didn't say that everything was nonsense. Your model is probably good for estimating the odds of a match, or for telling which player is the best within one race. However, it is completely blind to balance.
|
On May 07 2011 23:56 d_ijk_stra wrote: [...] every beta_player of each user is ALSO penalized by LASSO. [...] I think not penalizing certain variables is the exceptional case when using LASSO.
Yeah, of course, every parameter is constrained. This is why, in my example where all the players have the same strength (skill + race balance), your model will set all the parameters to 0, despite the imbalance.
|
Nice work! I've always been interested in doing a statistical study myself, but I have yet to take stats; lol, in HS it was either stats or calc. It's also refreshing to see another Cornellian on here. I'm a freshman undergrad myself, planning on majoring in physics. Anyway, nice work, and don't listen to the haters who couldn't have done a study like this in the first place. Keep up the good work! Also, did you go see Nelly? lol, he's so bad.
|