Statistical Analysis of StarCraft 2 Balance - Page 3

Apokilipse

United States, 2 Posts

May 05 2011 05:46 GMT

#41

Very interesting to read! Most discussions about balance are simply unproductive rants, and it's fascinating to see someone take a scientific approach to documenting Starcraft balance.

d_ijk_stra

United States, 36 Posts

May 05 2011 05:48 GMT

#42

arbitrageur/ It was based on the parameter I estimated via statistical inference. Of course, the value was not statistically significant, but there was a slight indication. I did not extrapolate from ladder data. It is based on GSL statistics, but instead of using raw ZvP win rates, I took each player's skill into account.

You can still question the adequacy of my model, and therefore the adequacy of my estimated parameters. But at least those values come from data, not from my personal understanding of the game. Personally, I actually think P > Z, T when skills are equal, but this is what the data gave me.

Thrombozyt

Germany, 1269 Posts

May 05 2011 05:57 GMT

#43

I guess it would be better to use a different data set, as the game has changed vastly since October 2010, when Steppes of War and Delta Quadrant were still in the map pool and many balance changes were not yet in place (roach range increase, anyone?).

You cannot really group different patches together, as potential 'imbalance' from a former patch will carry over into the results for current patches. Also, by using only current data (say, March 2011 onwards) but drawing from more tourneys, you actually reduce the number of maps played, and therefore the number of parameters you have to estimate from a limited set of data (each map carries 3 beta values, one per non-mirror matchup).

Edit:
Changing the data set would also improve the quality of the analysis, because you wouldn't have to assume that the Korean style is the 'gold standard'; instead you could take data from all over the world and avoid local bias.
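To make the parameter-count argument concrete: if the model has one skill value per player plus three beta values per map (TvZ, ZvP, PvT), every extra map in the pool adds three parameters that the same pool of games has to pin down. A minimal sketch, assuming exactly that structure; the function name and the player/map counts are hypothetical, not taken from the paper.

```python
def count_parameters(n_players: int, n_maps: int, betas_per_map: int = 3) -> int:
    """Hypothetical helper: one latent skill per player plus one racial-bias
    beta per (map, non-mirror matchup) pair, i.e. TvZ, ZvP and PvT."""
    return n_players + betas_per_map * n_maps

# Hypothetical numbers for illustration only: same players, smaller map pool.
wide_pool = count_parameters(n_players=64, n_maps=10)
narrow_pool = count_parameters(n_players=64, n_maps=7)
```

Dropping three maps from the pool removes nine betas, which is the effect described above.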

Primadog

United States, 4411 Posts

May 05 2011 05:57 GMT

#44

On May 05 2011 14:26 Nontrivial wrote:
Although I'm no math major, I'm quite impressed with what I do understand. I do have one question, though: how close is this to what the balance team talked about at BlizzCon?

Here is the link to what I'm referring to: Link

This paper's approach differs from the balance team's.

d_ijk_stra's approach is to create a statistical model for competitive StarCraft that uses only two kinds of variables: (1) player skill and (2) map racial bias. He then shows that the model is a good fit for the GSL data. Finally, he asks: does this model show any strong racial bias (using an average of the map racial bias variables)? He concludes that no significant bias has been observed thus far.

What is significant here is that his approach uses competitive play data, which the community generally considers a better indicator of game balance than the ladder. Second, he created a model that fits this data while separating player skill from map racial preference, which is important for studying whether there's an imbalance in the game.
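Written out, a two-variable model of this kind is commonly expressed as a Bradley-Terry-style logistic model: the log-odds that player A beats player B on a given map equal A's skill minus B's skill, plus that map's bias for the matchup. The sketch below assumes that form. The logistic link, the sign convention for the bias, and all names and numbers are illustrative assumptions, not details taken from the paper.

```python
import math

# Canonical ordering for the three non-mirror matchups. Assumed sign
# convention: a positive beta favors the first race listed.
MATCHUPS = {("T", "Z"): "TvZ", ("Z", "P"): "ZvP", ("P", "T"): "PvT"}

def win_probability(skill_a, race_a, skill_b, race_b, map_betas):
    """P(player A beats player B) under an assumed logistic model:
    logit = skill_a - skill_b + (this map's bias toward A's race)."""
    bias = 0.0
    if race_a != race_b:  # mirror matchups carry no racial bias
        if (race_a, race_b) in MATCHUPS:
            bias = map_betas[MATCHUPS[(race_a, race_b)]]
        else:
            bias = -map_betas[MATCHUPS[(race_b, race_a)]]
    logit = skill_a - skill_b + bias
    return 1.0 / (1.0 + math.exp(-logit))

def average_bias(maps, matchup):
    """Average one matchup's beta over the map pool: the kind of summary
    used to ask whether a matchup is favored overall."""
    return sum(m[matchup] for m in maps) / len(maps)

# Hypothetical map: slightly favors Zerg over Protoss, neutral elsewhere.
example_map = {"TvZ": 0.0, "ZvP": 0.2, "PvT": 0.0}
p = win_probability(1.0, "P", 1.0, "Z", example_map)  # equal skill, PvZ
```

Because skill enters only as a difference, equally skilled players split a mirror 50/50, and any deviation in a cross-race game is attributed to the map's bias term.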

palanq

United States, 761 Posts

May 05 2011 05:59 GMT

#45

this is great stuff.

are you going to do more, or was this just for a class or something? if you do more, you should scrape TLPD for Brood War Proleague games or something, which would give you a lot more data, enough to do a multi-period analysis and see how the parameter estimates change over time. plus you don't get as many inter-game dependencies as there are with best-of-X series.

aksfjh

United States, 4853 Posts

May 05 2011 06:01 GMT

#46

I really appreciate your work on the subject. It was done with the intent of academic integrity, and succeeded in that.

The only "beef" I have with it is that it covers a rather volatile period of SC2 (with frequent patches completely changing matchups), along with a region that has been predominantly Terran since release. Not only that, but the Protoss players from that region have also underperformed on an individual basis in individual matches.

space_yes

United States, 548 Posts

May 05 2011 06:03 GMT

#47

An interesting read, though I'm skeptical of your approach, given that you're taking games from different patches and each patch changed the rules of the game. Aside from suggesting that each patch may in fact represent a different population (since each patch is technically a different game), sampling across patches should significantly exacerbate the limitations described in your model (particularly conditional independence and the interactions between players).

space_yes

United States, 548 Posts

May 05 2011 06:05 GMT

#48

I will add that it is nice to see someone actually doing statistics. I'm fucking tired of these "here are some numbers/graphs, now this is what I think" type threads. These threads should be closed by mods and the users warned, imo.

d_ijk_stra

United States, 36 Posts

May 05 2011 06:05 GMT

#49

On May 05 2011 14:57 Thrombozyt wrote:
I guess it would be better to use a different data set, as the game has changed vastly since October 2010, when Steppes of War and Delta Quadrant were still in the map pool and many balance changes were not yet in place (roach range increase, anyone?).

You cannot really group different patches together, as potential 'imbalance' from a former patch will carry over into the results for current patches. Also, by using only current data (say, March 2011 onwards) but drawing from more tourneys, you actually reduce the number of maps played, and therefore the number of parameters you have to estimate from a limited set of data (each map carries 3 beta values, one per non-mirror matchup).

Edit:
Changing the data set would also improve the quality of the analysis, because you wouldn't have to assume that the Korean style is the 'gold standard'; instead you could take data from all over the world and avoid local bias.

I strongly agree with you and with space_yes's comment. When I was conducting the analysis, it was March, and I didn't have a good understanding of tournaments other than GSL. Moreover, GSL gamers were isolated from the others. But I didn't have enough GSL games for each patch, so I had to aggregate them all. I also feel very uncomfortable about this.

Now the situation is a little different. There are many ongoing "global" leagues like NASL/TSL, which I also enjoy watching, so there are more games available worldwide, and that might be enough to conduct a valid analysis. I hope I can do a follow-up analysis soon!

slyboogie

United States, 3423 Posts

May 05 2011 06:17 GMT

#50

Good read! The regression hammer comes to SC2 =) I'd like to see a larger sample size, but the methodology is fine and the interpretation is sound. Thanks for the work!

Valroth

New Zealand, 28 Posts

May 05 2011 06:24 GMT

#51

A lot of effort for a fundamentally flawed analysis. You say that you've taken player skill into account, which is something that cannot be measured statistically in matches between different races. Measuring player skill based on mirror matches and then using that to add/reduce weight to balance statistics in matches between different races is logically misleading. I found it interesting anyway.

GhettoSheep

United States, 150 Posts

May 05 2011 06:29 GMT

#52

I like how you admit that your results aren't statistically significant.

TheRabidDeer

United States, 3806 Posts

May 05 2011 06:30 GMT

#53

On May 05 2011 15:29 GhettoSheep wrote:
I like how you admit that your results aren't statistically significant.

There is nothing to admit; it's stating a fact. Saying he admits to something makes it sound like it's something bad.

Anyway, look forward to the next one! GL with all of your coursework!

EDIT: Or, I think maybe you misunderstood what statistical significance is?
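To illustrate what "not statistically significant" means: the observed effect is small relative to the variation you would expect from chance alone, not that the result is hidden or wrong. A minimal sketch using an exact two-sided binomial test on a raw win count; this is a simpler test than the parameter-level inference in the paper, and the game counts are hypothetical.

```python
from math import comb

def binomial_two_sided_p(wins: int, games: int, p0: float = 0.5) -> float:
    """Exact two-sided binomial test: the total probability, under a true
    win rate of p0, of every outcome at least as unlikely as the observed one."""
    probs = [comb(games, k) * p0**k * (1 - p0) ** (games - k) for k in range(games + 1)]
    observed = probs[wins]
    # Small relative tolerance so outcomes tied in probability with the
    # observed one are counted despite floating-point rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical: the same 56% win rate over 50 games and over 500 games.
small_sample = binomial_two_sided_p(28, 50)
large_sample = binomial_two_sided_p(280, 500)
```

The same 56% win rate is compatible with chance over 50 games (p well above 0.05) but not over 500 (p well below 0.05), which is why sample size matters so much here.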

d_ijk_stra

United States, 36 Posts

May 05 2011 06:37 GMT

#54

On May 05 2011 15:24 Valroth wrote:
A lot of effort for a fundamentally flawed analysis. You say that you've taken player skill into account, which is something that cannot be measured statistically in matches between different races. Measuring player skill based on mirror matches and then using that to add/reduce weight to balance statistics in matches between different races is logically misleading. I found it interesting anyway.

This is a good point, but I don't think the analysis is fundamentally flawed.

This model assumes that each player's skill is the same in every match. That may not be true; we know from BW that some gamers are really good against a specific race and suck against another. But I think most gamers show a consistent level of skill across games, in which case the overall analysis may not be that misleading. In fact, without such an assumption it's impossible to quantify the balance between two races...

You may still disagree with this, and then reject the results. Every statistical model makes assumptions to overcome data scarcity, and I think discussing whether an assumption is valid is constructive. I think this assumption is not that strong... But it's reasonable to question it. I have some ideas about more sophisticated models to account for this... Hope I can show results soon.

han_han

United States, 205 Posts

May 05 2011 06:43 GMT

#55

Wow, scholarly articles on Starcraft II? I am TOTALLY diggin' this.

Primadog

United States, 4411 Posts

May 05 2011 06:50 GMT

#56

There aren't enough data points available to estimate every player's skill level in particular matchups, but the tests he used showed that his model fits the dataset well despite this flaw. You also mischaracterized how skill is measured and used in the first place.

When you make a statistical model, you have to make certain assumptions that may not completely reflect reality. It's the nature of dealing with any large set of data. If you believe an assumption is incorrect, create a better model and demonstrate that it fits the data better. Believing that making assumptions somehow discredits a model simply shows that you have absolutely no idea how statistics as a hard science works.

Techno

1900 Posts

May 05 2011 14:47 GMT

#57

On May 05 2011 13:54 d_ijk_stra wrote:
Techno/ This is what is called the 'latent variable' method, which lets you model quantities that cannot be observed. The variable need not be defined or observed, although it's convenient to 'interpret' it that way. Latent variable methods are actually a very popular technique these days, although they are not covered in basic statistics courses (even at the graduate level).

I think you confused it with random effects / hierarchical models in ANOVA. You don't really need to assume that the latent variable follows a normal distribution. Of course, without any regularization the model will overfit the data, and assuming a normal distribution is a good way to regularize your parameters. But you can also use other types of regularization... I used an L1 penalty for other reasons. However, I guess you may not want to discuss this level of technical detail.

I really think it would have been better if you had used win rates of certain leagues, assuming skill is either not present or normally distributed, as it is debatable whether skill even exists outside of winning; and if you include skill, you should include variables like:

- Skill's effect on racial performance
- Skill's effect on this map
- Skill's effect on this strategy (perhaps strategy is a part of skill, perhaps not)

I feel like skill is a very abstract concept that cannot be precisely defined even by God. I feel like it has no place in statistical analyses. I may be wrong, but that's just my thoughts. I mean no disrespect to your report; in fact, I respect it.
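The quoted reply contrasts two ways of regularizing latent skill estimates: a normal prior (which, at the posterior mode, is equivalent to an L2 penalty) and an L1 penalty. The sketch below fits a plain Bradley-Terry-style logistic model (skill difference only, no map terms) by proximal gradient descent, where the L1 penalty becomes a soft-thresholding step that can set small skills exactly to zero. This is an illustration of the technique only; d_ijk_stra did not post his fitting code, and the penalty strength, step size, iteration count, and games are hypothetical.

```python
import math

def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink toward zero, clip to exactly zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_skills_l1(games, players, lam=0.05, step=0.1, iters=2000):
    """games: list of (winner, loser) pairs. Minimizes the mean logistic loss
    of P(winner beats loser) = sigmoid(s_w - s_l) plus lam * sum(|s|)
    by proximal gradient descent."""
    skills = {p: 0.0 for p in players}
    n = len(games)
    for _ in range(iters):
        grad = {p: 0.0 for p in players}
        for w, l in games:
            # d/ds_w of -log(sigmoid(s_w - s_l)) is -(1 - sigmoid(s_w - s_l))
            residual = 1.0 - 1.0 / (1.0 + math.exp(-(skills[w] - skills[l])))
            grad[w] -= residual / n
            grad[l] += residual / n
        for p in players:
            skills[p] = soft_threshold(skills[p] - step * grad[p], step * lam)
    return skills

# Hypothetical results: A beats B twice, B beats C twice, C beats B once.
games = [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("C", "B")]
skills = fit_skills_l1(games, ["A", "B", "C"])
```

With these games, B's estimate lands exactly on zero while A and C keep nonzero values, and the penalty is also what keeps undefeated A's skill finite (without it, A's estimate would grow without bound, the overfitting mentioned above). A normal prior (L2 penalty) would generally shrink all three without setting any of them exactly to zero.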

Primadog

United States, 4411 Posts

May 05 2011 20:04 GMT

#58

Skill as a normally distributed variable that influences win rate is a fundamental part of game and sports ratings, dating back to the beginnings of the Elo system in chess. Every Elo, TrueSkill, or computerized/holistic ranking system you see on major sports and gaming sites is based on the concept of skill as a measurable variable. There's nothing innovative or surprising about this assumption.
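For reference, the standard Elo update treats each rating as a point estimate of skill and converts the rating gap into an expected score. Elo's original system assumed normally distributed performance; the form below is the widely used logistic variant on a 400-point scale. The K-factor of 32 is a common choice rather than a universal constant, and the ratings in the example are made up.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B on the logistic, 400-point Elo scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings after one game.
    score_a is 1 for a win by A, 0.5 for a draw, 0 for a loss."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

# Hypothetical: a 1600-rated player beats a 1400-rated player.
new_a, new_b = elo_update(1600.0, 1400.0, 1.0)
```

Like the model discussed in this thread, Elo infers skill only from who beat whom; the favorite gains only about 7.7 points here because the win was already expected.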

awesomoecalypse

United States2235 Posts

May 05 2011 20:12 GMT

#59

On May 06 2011 05:04 Primadog wrote:
Skill as a normally distributed variable that influences win rate is a fundamental part of game and sports ratings, dating back to the beginnings of the Elo system in chess. Every Elo, TrueSkill, or computerized/holistic ranking system you see on major sports and gaming sites is based on the concept of skill as a measurable variable. There's nothing innovative or surprising about this assumption.

this is true, but all these systems tie win rate to skill, which is something some players dispute. a guy like IdrA would argue that cheesy players are "unskilled" even when they win, something these formulas would clearly dispute.

But, as someone who thinks that mindset is counterproductive nonsense, and that a win is a win, I'm all for this system.

hypnobean

89 Posts

May 05 2011 20:20 GMT

#60

Anyone notice the paper identifies Jinro's race as Protoss?
