Statistical Analysis of StarCraft 2 Balance

Forum Index > SC2 General
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 00:41 GMT
#1
This is my first post on TeamLiquid.

I've written a short article on StarCraft 2 balance!
I hope SC2 geeks, especially those with a science/engineering background, will enjoy it...!

http://arxiv.org/abs/1105.0755

The analysis states that T > P, T > Z, and Z > P, though the statistical significance is not very strong. Any feedback would be welcome.

Only recently did I find that there have been other statistical studies/discussions on TeamLiquid. Sorry for not citing them in this version; I'm planning to update the article with references in the future.

The difference between this article and previous ones is that here I tried to take each gamer's ability into account. That is, oGsMC dominating Terran gamers does not necessarily mean that P >> T; it could be that TvP balance is actually quite good and MC is just too strong. By taking each gamer's ability into account (using what statisticians call a 'latent variable'), I think I resolved this problem.

I hope people enjoy it! :D
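The idea can be sketched as a logistic model in which each player's latent skill enters alongside a matchup-bias term. This is only an illustration, not the paper's exact specification; all numbers here are invented.

```python
import math

def win_probability(skill_a, skill_b, race_bias):
    """P(player A beats player B): each player's latent skill plus a
    per-matchup bias term, pushed through a logistic link."""
    return 1.0 / (1.0 + math.exp(-(skill_a - skill_b + race_bias)))

# Hypothetical numbers: if MC's latent skill is much higher than his
# Terran opponent's, he is favored even with zero matchup bias...
print(win_probability(2.0, 0.5, 0.0))
# ...so his wins say more about MC than about PvT balance. The bias
# term only matters once skill is accounted for:
print(win_probability(0.0, 0.0, 0.3))
```

The point of the latent skill terms is exactly the one made above: a lopsided win record can be explained by the skill gap alone, so the matchup-bias estimate is not inflated by one dominant player.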
My Life for IU!
Barca
Profile Blog Joined October 2010
United States418 Posts
May 05 2011 00:45 GMT
#2
I have confirmed your results by looking through my match history.
- I hate threads that end with "Thoughts?" -
awesomoecalypse
Profile Joined August 2010
United States2235 Posts
May 05 2011 00:52 GMT
#3
Very interesting that everyone cries about Protoss being too strong, yet not one statistical analysis backs it up in any way. Thanks very much for posting this, and welcome to TL.
He drone drone drone. Me win. - ogsMC
Archas
Profile Blog Joined July 2010
United States6531 Posts
May 05 2011 00:53 GMT
#4
Nice first post. Even at a glance, this is very interesting stuff. Good work, and welcome to TeamLiquid.
The room is ripe with the stench of bitches!
Malpractice.248
Profile Blog Joined November 2010
United States734 Posts
May 05 2011 00:54 GMT
#5
Wait, you think Z > P, at all levels, or on average?
At top lvl, its quite a diff story...
Zeke50100
Profile Blog Joined February 2010
United States2220 Posts
Last Edited: 2011-05-05 01:05:46
May 05 2011 00:56 GMT
#6
A thread with "Statistical Analysis" in the title with an actual statistical analysis within? Impossible!

This is pretty cool to look at. Unfortunately, the field is so dynamic and littered with confounding variables that it's hard to pinpoint any valid conclusions, but it does show that overall there's not much evidence of something horrendously wrong. A 55% win rate does not mean imbalanced, contrary to popular belief XD
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 00:58 GMT
#7
I've kept reading TL articles, just without posting any

Malpractice/ Yes, in my experience I feel like P > Z... But the technique I used takes every gamer's skill into account, so I think a possible explanation is that low-level Protoss players lose a lot to Zergs, while a small number of top-level Protoss players dominate Zerg players. You've probably seen gamers like IMLosira easily dominating GSL Code A Protoss gamers...
My Life for IU!
HolyArrow
Profile Blog Joined August 2010
United States7116 Posts
May 05 2011 00:59 GMT
#8
This is probably one of the most (if not the most) academically rigorous analyses yet, so it'll be interesting to see how people who think P is imba will spin this... I envision many "you're just too academic and not looking at reality" arguments.
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 01:01 GMT
#9
Zeke50100/ Yes, I strongly agree with you. Maybe this kind of analysis is more adequate for the stabilized BW, but its balance does not interest me anymore :D

Yes, I've seen too many posts that contain 'statistics' in the title with no real serious statistics in them, so I wanted to do one myself
My Life for IU!
Malpractice.248
Profile Blog Joined November 2010
United States734 Posts
May 05 2011 01:01 GMT
#10
Well yes, losira was dominating them, but had a harder time vs code S ones, his skill was just code S material :p
dave333
Profile Joined August 2010
United States915 Posts
May 05 2011 01:04 GMT
#11
Very interesting and it is hard to argue against the analysis, but I guess the only hole is still the lack of enough games played to have statistical significance.
setmeal
Profile Joined March 2011
162 Posts
May 05 2011 01:04 GMT
#12
great thread, do keep them coming. I would love to see more of such analysis in the future!
Dommk
Profile Joined May 2010
Australia4865 Posts
May 05 2011 01:09 GMT
#13
That is, oGsMC dominating Terran gamers does not necessarily mean that P >> T; it could be that TvP balance is actually quite good and MC is just too strong. By taking each gamer's ability into account (using what statisticians call a 'latent variable'), I think I resolved this problem.


I think this is what Liquid`Tyler was saying a few months ago: there are enough progamers and play styles right now that we should be focusing on the player and not the race
PopcornColonel
Profile Joined March 2011
United States769 Posts
May 05 2011 01:14 GMT
#14
Lol cool. I'm looking at Cornell.
Zerg delenda est.
mikyaJ
Profile Joined April 2011
1834 Posts
Last Edited: 2011-05-05 01:19:19
May 05 2011 01:14 GMT
#15
On May 05 2011 09:54 Malpractice.248 wrote:
Wait, you think Z > P, at all levels, or on average?
At top lvl, its quite a diff story...

yea, zergs are winning even more:
[graph: Korean ZvP win rates]

jk that's only Korea, Zergs are ONLY winning 51% of the games versus Protoss in the foreign scene:
[graph: foreign-scene ZvP win rates]
MKP||TSL
JerKy
Profile Blog Joined January 2011
Korea (South)3013 Posts
May 05 2011 01:15 GMT
#16
That was a really good read, good work!
You can type "StarCraft" with just your left hand.
professorjoak
Profile Joined July 2008
318 Posts
May 05 2011 01:21 GMT
#17
The data set had only ~620 non-mirror games in it. It would be interesting to use this methodology on the Brood War TSL Season 1 and 2 full ladder replay packs, which have several times more data in them.

I looked into trying a statistical analysis for TSL Season 1 at one point, to see whether the distribution of build orders on a map had any correlation with win percentage. A first glance at the data showed that every matchup on any map where I had 100+ games in that specific map and matchup was balanced within 52-48. (That differs from the Korean results in the TLPD, which usually split 60-40 or 55-45, though those are based on far fewer games.) However, I then realized the data set had many duplicate games, since a game between two top ladder players was counted in each player's replay pack. I decided it would be too much trouble to properly sort them out, so I quit there and didn't take the analysis much further.
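The duplicate-replay problem described above can be handled with an order-independent key. A minimal sketch (the record fields here are made up for illustration; real replay metadata would differ):

```python
def dedupe_games(games):
    """Drop duplicate records of the same game that appear in both
    players' replay packs, keyed on an order-independent signature."""
    seen = set()
    unique = []
    for g in games:
        # frozenset makes (A vs B) and (B vs A) hash identically
        key = (frozenset([g["p1"], g["p2"]]), g["map"], g["date"])
        if key not in seen:
            seen.add(key)
            unique.append(g)
    return unique

games = [
    {"p1": "A", "p2": "B", "map": "Python", "date": "2008-01-01"},
    # same game, found again in the other player's pack:
    {"p1": "B", "p2": "A", "map": "Python", "date": "2008-01-01"},
]
print(len(dedupe_games(games)))  # 1
```

A timestamp alone may not uniquely identify a game, so in practice one might also hash the replay file itself.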
"The different branches of Arithmetic -- Ambition, Distraction, Uglification, and Derision." --Lewis Carroll
JKira
Profile Blog Joined April 2011
Canada1002 Posts
May 05 2011 01:28 GMT
#18
Can't seem to open the PDF, but from what I gather the data taken for this analysis are from major tournaments in the first half of April? I ask because it's not in the OP
zeek0us
Profile Joined October 2010
United States67 Posts
May 05 2011 01:32 GMT
#19
LOL SC2 on the arXiv??!?!? Nice. I don't usually peruse the statistics section, but maybe I should.
Mordiford
Profile Joined April 2011
4448 Posts
Last Edited: 2011-05-05 01:35:59
May 05 2011 01:35 GMT
#20
Yeah, I think it's important to note that a sample size under 1000 games may not be very accurate for something like this, where variation does occur based on players' performances and all the shit that influences them.

I feel the graphs posted above by mikyaJ are a little more viable, particularly the foreign-tournament one, considering that it's based on a fairly large sample size of over 8000 games. Based on that, it would seem the game is fairly balanced in terms of win/loss.

The Korean sample is much smaller because of the number of tournaments played; it's basically just representative of the GSL right now, if I'm not mistaken.
MusiK
Profile Joined August 2010
United States302 Posts
May 05 2011 01:36 GMT
#21
Don't understand a thing about stats except that the calculator is ur best friend... buttttt I am impressed how fancy this looks.

Hoping to hear from some other stats fanatics confirm or deny these claims.

Cool graphs tho bro. =]
BOOM!!! ~ Tasteless
duk3
Profile Joined September 2010
United States807 Posts
May 05 2011 01:36 GMT
#22
The analysis is based off of 852 GSL games from October 2010 to March 2011.

Interesting read. How did you determine the B of each player?
Time flies like an arrow; fruit flies like a banana.
Mordiford
Profile Joined April 2011
4448 Posts
May 05 2011 01:41 GMT
#23
In a larger sample size, Starcraft 2 looks pretty fucking balanced in the foreigner scene.
wherebugsgo
Profile Blog Joined February 2010
Japan10647 Posts
May 05 2011 02:04 GMT
#24
So I kinda just skimmed the PDF, and I just wanted to say that there seem to be a lot of basic errors in spelling, grammar, etc. that make the paper look less professional.

You might want to go back and fix these things. (I actually haven't taken the time to fully understand the statistics, so I apologize in advance for not providing further insight other than the presentation of the paper)

example: in Appendix A the italicized names of Protoss and Terran are spelled wrong.
Drowsy
Profile Blog Joined November 2005
United States4876 Posts
Last Edited: 2011-05-05 02:39:42
May 05 2011 02:35 GMT
#25
On May 05 2011 09:54 Malpractice.248 wrote:
Wait, you think Z > P, at all levels, or on average?
At top lvl, its quite a diff story...

The sample is 852 GSL games, but it goes back to October and obviously there were patches.


I found it a pretty good read and it points to the game being relatively well balanced.
Our Protoss, Who art in Aiur HongUn be Thy name; Thy stalker come, Thy will be blunk, on ladder as it is in Micro Tourny. Give us this win in our daily ladder, and forgive us our cheeses, As we forgive those who play zerg against us.
Techno
Profile Joined June 2010
1900 Posts
May 05 2011 04:44 GMT
#26
On May 05 2011 09:41 d_ijk_stra wrote:
The difference between this article and previous ones is that here I tried to take each gamer's ability into account. That is, oGsMC dominating Terran gamers does not necessarily mean that P >> T; it could be that TvP balance is actually quite good and MC is just too strong. By taking each gamer's ability into account (using what statisticians call a 'latent variable'), I think I resolved this problem.

Uhhh, unfortunately this sounds brutal. You cannot define skill. Period. You should have assumed it's normally distributed among the population.
Hell, its awesome to LOSE to nukes!
Al Bundy
Profile Joined April 2010
7257 Posts
May 05 2011 04:51 GMT
#27
I'm no statistics expert, so please forgive my stupidity; I don't understand how you calculated β.
Is it explained somewhere? I may have missed it.
o choro é livre
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 04:54 GMT
#28
Techno/ Well, this is what's called the 'latent variable' method, which lets you model quantities that cannot be observed. Skill need not be defined or observed directly, although it's convenient to 'interpret' the variable that way. The latent-variable method is actually a very popular technique these days, although it's not covered in basic statistics courses (even at the graduate level).

I think you confused it with random effects / hierarchical models in ANOVA. You don't really need to assume the latent variable follows a normal distribution. Of course, without any regularization it will overfit the data, and assuming a normal distribution is a good way to regularize your parameters. But you can also use other types of regularization... I used an L1 penalty for other reasons. However, I guess you may not want to go into this much technical detail.
My Life for IU!
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 04:56 GMT
#29
AlBundy/ If you're familiar with linear regression/ANOVA, you estimate unknown parameters using maximum likelihood estimation, right? The same thing happens here. You treat every gamer's ability as an unknown parameter, and then estimate it by maximizing the likelihood function. This is somewhat similar to estimating a group mean in an ANOVA model, but a little more complicated than that.
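As a rough illustration of that estimation procedure (not the paper's actual code: the toy data and the quadratic shrinkage below are my own simplifications, where the paper uses an L1 penalty), latent skills can be fit by gradient ascent on the log-likelihood:

```python
import numpy as np

# Toy data: (winner_index, loser_index) pairs among three players.
games = [(0, 1), (0, 1), (0, 2), (1, 2), (0, 1), (1, 2)]
n_players = 3

# beta[i] is player i's latent skill; maximize the log-likelihood of
# a logistic (Bradley-Terry style) model by simple gradient ascent.
beta = np.zeros(n_players)
for _ in range(2000):
    grad = np.zeros(n_players)
    for winner, loser in games:
        # model: P(winner beats loser) = sigmoid(beta_w - beta_l)
        p = 1.0 / (1.0 + np.exp(-(beta[winner] - beta[loser])))
        grad[winner] += 1.0 - p
        grad[loser] -= 1.0 - p
    grad -= 0.1 * beta   # small quadratic shrinkage keeps skills finite
    beta += 0.1 * grad
beta -= beta.mean()      # skills are only identified up to a constant

print(beta)  # player 0 should come out strongest, player 2 weakest
```

In the paper's setting the win probability would also include the per-map, per-matchup bias terms, estimated jointly with the skills.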
My Life for IU!
Primadog
Profile Blog Joined April 2010
United States4411 Posts
May 05 2011 05:07 GMT
#30
Finally, somebody that actually knows statistics instead of 2-bit chums like me! Gonna grab some premium tea and try to wrap my mind around this.
Thank God and gunrun.
Saechiis
Profile Blog Joined May 2010
Netherlands4989 Posts
May 05 2011 05:13 GMT
#31
Cool read man!

I'd definitely add a note that it's an approximation of racial balance as seen in GSL games played up until March. It doesn't say anything about racial balance now, so you can save this thread a whole lot of "I told you X is OP and Y UP, this proves it!".
I think esports is pretty nice.
Carkis
Profile Joined August 2010
Canada302 Posts
May 05 2011 05:14 GMT
#32
It's nice to see a serious, well-thought-out balance thread, not just Z QQ lol. But seriously, really interesting stuff; I hope you do follow-ups tracking the progression.
Quochobao
Profile Joined October 2010
United States350 Posts
May 05 2011 05:23 GMT
#33
I appreciate the effort of the OP!

But to everyone else, the statistical significance is LOW, meaning that there is too much variance, and the result may be MERE CHANCE.

Please don't say anything about balance based on this >.<
Best or nothing.
Nontrivial
Profile Joined April 2011
United States56 Posts
May 05 2011 05:26 GMT
#34
Although I'm no math major, I'm quite impressed with what I do understand. I do have one question, though: how close is this to what the balance team talked about at BlizzCon?

Here is the link to what I'm referring to: Link
TheRabidDeer
Profile Blog Joined May 2003
United States3806 Posts
May 05 2011 05:27 GMT
#35
On May 05 2011 13:44 Techno wrote:
On May 05 2011 09:41 d_ijk_stra wrote:
The difference between this article and previous ones is that here I tried to take each gamer's ability into account. That is, oGsMC dominating Terran gamers does not necessarily mean that P >> T; it could be that TvP balance is actually quite good and MC is just too strong. By taking each gamer's ability into account (using what statisticians call a 'latent variable'), I think I resolved this problem.

Uhhh, unfortunately this sounds brutal. You cannot define skill. Period. You should have assumed it's normally distributed among the population.

You can define skill, and when working with a definite set of games with a limited number of players, you probably should. He is working within the confines of probably 50-100 players, since he is using GSL games. While that is "statistically" a large enough sample size, it could easily produce errors when there are so many other variables included. Example: some players suck at a certain map or matchup, which skews the data.

This brings up a question for the OP, though:
I know you calculated player skill and map balance, but did you model a given player's matchup skill and their ability on a specific map? Or is there not enough data, or is that too complex a situation to work out the numbers?

i.e., MC may have a very strong PvT on Xel'Naga but a very weak PvT on Tal'darim (don't know if it's true, just an example). A different map may require different strategies in a certain matchup that the player isn't necessarily as good at.
Zedders
Profile Blog Joined April 2010
Canada450 Posts
May 05 2011 05:33 GMT
#36
Alright, I've had enough of these graphs popping up everywhere and people stating that the game is 'balanced now' because it's like 50/50/50...

It'd be interesting to see what the average game length is over time as well. Since cheeses have changed a lot since the game started (5rax reaper and whatnot), people have a) learned to deal with cheeses/all-ins more adequately and b) developed more late-game strategies, so the games are probably longer as a result.

It isn't surprising that Terran was so dominant at the beginning, given the number of people who started out playing Terran. If I recall, the first GSL was vastly Terran-populated. Not to mention vastly cheese-populated too.

Terran, of course, having the strongest tier-one unit, the marine, had (has? I'm not sure anymore) the strongest early game. We all remember the BitByBit strategy (essentially all-inning... and if that all-in doesn't work, all-in again... and if that doesn't work, all-in again... rinse and repeat).

Since Terran had the strongest early game, games ended fast because cheeses were so powerful/prevalent. Therefore Terran won a lot.

The games are getting longer now.... this of course results in more and more mistakes made by each player. Balance, in my opinion, should be weighted on how many mistakes the player can make in proportion to the other player's mistakes. What I mean by this is if one player makes less mistakes in his game decisions, he should ultimately win in a long game.

Why you ask? Because Starcraft 2 is a game of decisions. And the longer the game goes on, the more decisions must be made. The more decisions that are made, the more mistakes there are, which should result in the degree of separation that makes one player better than the other.

In context...let's say X race gets supply blocked 2 times (common macro mistake) but Y race never gets supply blocked. Y race then as a result has a larger army, larger economy etc. X race still wins simply because the units he made counter the units Y race made. Ok...this isn't imbalance...this is strategy right? Y makes a larger mistake by not scouting X and as a result his units crumble to X's.

So we've established that theres different TYPEs of mistakes one can make. And some mistakes are weighed less than others. But at what point do these mistakes balance. What if X can get supply blocked twice, not scout opponent's army (+more mistakes) and still win.

The severity of one race's total mistakes should not be much larger than another's. Ultimately I'd like to see X -not- win and I hope you agree with me, because X is clearly not the better player, his race is.

--------------back to the graphs.....
OK, so these graphs represent both races making an equal number of mistakes, since they are pros, and we are assuming that most pros compete at the same skill level regardless of race.

So the degree of skill separation due to mistakes should be negligible.

To sum up a little.......

The average game length has increased (I'm pretty sure of this, considering map size, cheese prevalence, spawn points).

More game length means more potential for mistakes. Ultimately as e-sports fans, we want to see the better player win. This means the player that made the right call at the right time, with the right micro, while maintaining the right macro.

Now it's super important to note...these graphs don't display anything about HOW the games were won.

Looking at T v P... you might think "oh look it's balanced now because it's 50%/50% wins now"

November 2010 to January 2011... Terran cheese prevails until Protoss finally learns how to stop it (or they patched whatever). Was the game balanced in January 2011 because Protoss learned how to stop strong Terran all-ins? (A 'safe build' to gain an eco lead emerged.)

This isn't balance, this is metagame development: half the people trying the old strategies that used to work 60% of the time failed a lot, while the other half, realizing this, tried new strategies (less developed and therefore not as good) and won because it was something their opponent hadn't seen before. Yay for metagame development!
huameng
Profile Blog Joined April 2007
United States1133 Posts
May 05 2011 05:35 GMT
#37
How did you estimate the Beta values? Sorry if it's in the paper and I missed it, but I didn't see anything.

Thanks for doing this by the way, it's really interesting.
skating
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 05:39 GMT
#38
Nontrivial/ I looked at the slides before, but even for me they were not easy to decipher. It's clear they're using some kind of Bayesian statistics, but I think there's an error in the math in the slides. I hope Blizzard will release more information about it.

TheRabidDeer/ This is a good point, and I strongly agree that the gamer-map interaction should be taken into account. It really needs to be done, but as an initial investigation I wanted to keep the model simple and stopped at this point.

Well, I have a lot more ideas to make this model much more realistic, but as a graduate student I need to work on the project that pays me... I did this during spring break.

If there's an adequate conference/journal, then maybe I can spend more time on it... If anyone has ideas, please let me know.
My Life for IU!
arbitrageur
Profile Joined December 2010
Australia1202 Posts
May 05 2011 05:42 GMT
#39
You commit an error in this paper:

"It is Xel'Naga Caverns, and it turns
out that the map favors Zerg slightly over Terran"

Says who? Says you? You realise ladder data cannot be extrapolated to the highest levels, because the vast majority of people on the ladder do not know how to play.
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 05:42 GMT
#40
On May 05 2011 14:35 huameng wrote:
How did you estimate the Beta values? Sorry if it's in the paper and I missed it, but I didn't see anything.

Thanks for doing this by the way, it's really interesting.


huameng/ You may refer to my comment to AlBundy above.
My Life for IU!
Apokilipse
Profile Joined April 2011
United States2 Posts
May 05 2011 05:46 GMT
#41
Very interesting to read! Most discussions about balance are simply unproductive rants, and it's fascinating to see someone take a scientific approach to documenting Starcraft balance.
Audi > Peugeot
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 05:48 GMT
#42
arbitrageur/ It was based on the parameter I estimated via statistical inference. Of course it was not a significantly large value, but there was a slight indication. I did not extrapolate ladder data; it is based on GSL statistics, but instead of using mere ZvP statistics, I took each player's skill into account.

You can still question the adequacy of my model, and thus further question the adequacy of my estimated parameters. But at least those values come from the data, not from my personal understanding of the game. Actually, I personally think P > Z, T if skills are equal, but this is what I got.
My Life for IU!
Thrombozyt
Profile Blog Joined June 2010
Germany1269 Posts
Last Edited: 2011-05-05 05:59:01
May 05 2011 05:57 GMT
#43
I guess it would be better to use a different data set, as the game has changed vastly since October 2010, with Steppes of War and Delta Quadrant still in the map pool and many balance changes not yet in place (roach range increase, anyone?).

You cannot really group different patches together, as potential 'imbalance' from a former patch will carry over into current patches. Also, by using only current data (say, March 2011 onwards) but drawing from more tourneys, you reduce the number of maps played and therefore the number of parameters you have to determine (as each map carries 3 beta values for the matchups) from a limited set of data.

Edit:
Changing the data set would also improve the quality of the analysis, because you wouldn't have to assume the Korean style is the 'gold standard'; you could instead take data from all over the world, avoiding local bias.
Primadog
Profile Blog Joined April 2010
United States4411 Posts
Last Edited: 2011-05-05 05:59:36
May 05 2011 05:57 GMT
#44
On May 05 2011 14:26 Nontrivial wrote:
Although I'm no math major, I'm quite impressed with what I do understand. I do have one question, though: how close is this to what the balance team talked about at BlizzCon?

Here is the link to what I'm referring to: Link



This paper's approach differs from the balance team's.

d_ijk_stra's approach is to create a statistical model for competitive StarCraft that uses only two kinds of variables: (1) player skill and (2) per-map racial bias. He then shows that the model is a good fit for the GSL data. Finally, he asks whether the model demonstrates any strong racial bias (using an average of the per-map racial bias variables) and concludes that no significant bias has been observed thus far.

What is significant here is that his approach uses competitive play data, which the community generally considers a better indicator of game balance than the ladder. Secondarily, he created a model separating player skill and map racial preference that fits this data, which is important for studying whether there's an imbalance in the game.
Thank God and gunrun.
palanq
Profile Blog Joined December 2004
United States761 Posts
May 05 2011 05:59 GMT
#45
this is great stuff.

are you going to do more, or was this just for a class or something? If so, you should scrape TLPD for Brood War Proleague games or something, which would give you a lot more data: enough to do multi-period analysis and see how the parameter estimates change over time. Plus you don't have as many of the inter-game dependencies that come with best-of-X series.
time flies like an arrow; fruit flies like a banana
aksfjh
Profile Joined November 2010
United States4853 Posts
May 05 2011 06:01 GMT
#46
I really appreciate your work on the subject. It was done with academic integrity in mind, and it succeeds in that.

The only "beef" I have with it is that it covers a rather volatile period of SC2 (with frequent patches completely changing matchups), along with a region that has been predominantly Terran-based since release. Not only that, but the Protoss from that region have also failed to perform on an individual basis in individual matches.
space_yes
Profile Joined April 2010
United States548 Posts
May 05 2011 06:03 GMT
#47
An interesting read though I'm skeptical of your approach given that you're taking games from different patches and each patch changed the rules of the game. Aside from suggesting that each patch may in fact represent a different population (given that each patch is technically a different game) sampling across the patches should significantly impact the limitations described in your model (particularly conditional independence and the interactions between the players).
space_yes
Profile Joined April 2010
United States548 Posts
May 05 2011 06:05 GMT
#48
I will add that it is nice to see someone actually doing statistics; I'm fucking tired of these "here are some numbers/graphs, now this is what I think" type threads. These threads should be closed by mods and the users warned imo.
d_ijk_stra
Profile Joined March 2011
United States36 Posts
Last Edited: 2011-05-05 06:06:19
May 05 2011 06:05 GMT
#49
On May 05 2011 14:57 Thrombozyt wrote:
I guess it would be better to use a different data set, as the game has changed vastly since October 2010, with Steppes of War and Delta Quadrant still in the map pool and many balance changes not yet in place (roach range increase, anyone?).

You cannot really group different patches together, as potential 'imbalance' from a former patch will carry over into current patches. Also, by using only current data (say, March 2011 onwards) but drawing from more tourneys, you reduce the number of maps played and therefore the number of parameters you have to determine (as each map carries 3 beta values for the matchups) from a limited set of data.

Edit:
Changing the data set would also improve the quality of the analysis, because you wouldn't have to assume the Korean style is the 'gold standard'; you could instead take data from all over the world, avoiding local bias.


I strongly agree with you and space_yes's comments. At the time I was conducting the analysis, it was March and I didn't have a good understanding of tournaments other than the GSL. Moreover, gamers in the GSL were isolated from the others. But I didn't have enough GSL games per patch, so I had to aggregate them all. I also feel very uncomfortable about this.

Now the situation is a little different. There are many ongoing "global" leagues like NASL/TSL, which I also enjoy watching, so I have many more games worldwide, and it might be enough to conduct a valid analysis. I hope I can do a follow-up analysis soon!
My Life for IU!
slyboogie
Profile Blog Joined March 2011
United States3423 Posts
May 05 2011 06:17 GMT
#50
Good read! The regression hammer comes to SC2 =) I'd like to see a larger sample size, but the methodology is fine and the interpretation is sound. Thanks for the work!
"We dug coal together." Boyd Crowder, Justified
Valroth
Profile Joined January 2011
New Zealand28 Posts
May 05 2011 06:24 GMT
#51
A lot of effort for a fundamentally flawed analysis. You say that you've taken player skill into account, which is something that cannot be measured statistically in matches between different races. Measuring player skill based on mirror matches and then using that to add/reduce weight to balance statistics in matches between different races is logically misleading. I found it interesting anyway.
GhettoSheep
Profile Joined August 2008
United States150 Posts
May 05 2011 06:29 GMT
#52
I like how you admit that your results aren't statistically significant.
TheRabidDeer
Profile Blog Joined May 2003
United States3806 Posts
Last Edited: 2011-05-05 06:32:09
May 05 2011 06:30 GMT
#53
On May 05 2011 15:29 GhettoSheep wrote:
I like how you admit that your results aren't statistically significant.

There is nothing to admit; it's stating a fact. Saying he admits to something makes it sound like it's something bad.

Anyway, look forward to the next one! GL with all of your coursework!

EDIT: Or maybe you misunderstood what statistical significance is?
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 06:37 GMT
#54
On May 05 2011 15:24 Valroth wrote:
A lot of effort for a fundamentally flawed analysis. You say that you've taken player skill into account, which is something that cannot be measured statistically in matches between different races. Measuring player skill based on mirror matches and then using that to add/reduce weight to balance statistics in matches between different races is logically misleading. I found it interesting anyway.


This is a good point, but I don't think the analysis is fundamentally flawed.

The model assumes that each player's skill is the same for every match. That may not be true; as we know from BW, some gamers are really good against one specific race and bad against another. But I think most gamers show a consistent level of skill across games, so the overall analysis may not be that misleading. In fact, without such an assumption it would be impossible to quantify the balance between two races at all...

You may still disagree with this, and then reject the results. Every statistical model makes assumptions to cope with limited data, and whether an assumption is valid or not is a constructive discussion. I think this assumption is not that strong... but it's reasonable to question it. I have some ideas about more sophisticated models to account for this... I hope I can show results soon.
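For readers who want to see the shape of such a model concretely, here is a minimal sketch in the spirit of the paper's approach (not its exact specification): a logistic regression in which each player carries one skill coefficient and each cross-race matchup carries one balance offset. All player names and game results below are invented purely for illustration.

```python
# Sketch only: one skill coefficient per player, one balance offset per
# cross-race matchup.  Players and results are made up, not GSL data.
import numpy as np
from sklearn.linear_model import LogisticRegression

players = ["A", "B", "C", "D"]          # hypothetical players
matchups = ["TvZ", "TvP", "PvZ"]        # one beta per cross-race matchup
games = [                               # (player 1, player 2, matchup, did player 1 win?)
    ("A", "B", "TvZ", 1),
    ("A", "C", "TvP", 1),
    ("B", "C", "PvZ", 0),
    ("D", "A", "TvZ", 0),
    ("D", "B", "TvP", 1),
    ("C", "D", "PvZ", 1),
]

def encode(p1, p2, mu):
    """+1 indicator for player 1, -1 for player 2, +1 for the matchup played."""
    x = np.zeros(len(players) + len(matchups))
    x[players.index(p1)] = 1.0
    x[players.index(p2)] = -1.0
    x[len(players) + matchups.index(mu)] = 1.0
    return x

X = np.array([encode(p1, p2, mu) for p1, p2, mu, _ in games])
y = np.array([w for *_, w in games])

# An L1 penalty, as the author mentions, shrinks weakly supported coefficients to 0.
model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0,
                           fit_intercept=False).fit(X, y)
skills = dict(zip(players, model.coef_[0][:len(players)]))
betas = dict(zip(matchups, model.coef_[0][len(players):]))
print(skills, betas)
```

The +1/-1 sign pattern is what lets a shared skill scale emerge from mixed-matchup data: the same coefficient for a player appears whether he is on the winning or losing side.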
My Life for IU!
han_han
Profile Joined October 2010
United States205 Posts
May 05 2011 06:43 GMT
#55
Wow, scholarly articles on Starcraft II? I am TOTALLY diggin' this.
Primadog
Profile Blog Joined April 2010
United States4411 Posts
May 05 2011 06:50 GMT
#56
On May 05 2011 15:24 Valroth wrote:
A lot of effort for a fundamentally flawed analysis. You say that you've taken player skill into account, which is something that cannot be measured statistically in matches between different races. Measuring player skill based on mirror matches and then using that to add/reduce weight to balance statistics in matches between different races is logically misleading. I found it interesting anyway.


There's not enough data points available to estimate every player's skill level in particular match-ups, but the tests he used showed that his model fits the dataset well despite this flaw. You also mischaracterized how skill is measured and used in the first place.

When you make a statistics model, you have to make certain assumptions that may not completely reflect reality. It's the nature of dealing with any large set of data. If you believe an assumption is incorrect, create a better model and demonstrate that it better fits the data. Believing that making assumptions somehow discredits a model simply shows that you have absolutely no idea how Statistics as a hard science works.
Thank God and gunrun.
Techno
Profile Joined June 2010
1900 Posts
Last Edited: 2011-05-05 14:51:53
May 05 2011 14:47 GMT
#57
On May 05 2011 13:54 d_ijk_stra wrote:
Techno/ Well, this is what is called the 'latent variable' method, which enables you to model quantities that cannot be observed. A latent variable need not be directly defined or observed, although it's convenient to 'interpret' it that way. The latent variable method is actually very popular these days, although it is not covered in basic statistics courses (even at the graduate level).

I think you confused it with random effects / hierarchical models in ANOVA. You don't really need to assume the latent variable follows a normal distribution. Of course, without any regularization it will overfit the data, and assuming a normal distribution is a good way to regularize your parameters. But you can also use other types of regularization... I used an L1 penalty for other reasons. However, I guess you may not want to go into this much technical detail.

I really think it would have been better if you had used win rates of certain leagues, assuming skill is either not present or normally distributed, since it is debatable whether skill even exists apart from winning. And should you include skill, you should include variables like:

- Skill's effect on racial performance
- Skill's effect on this map
- Skill's effect on this strategy (perhaps strategy is a part of skill, perhaps not)


I feel like skill is a very abstract concept that could not be precisely defined even by God, and that it has no place in statistical analyses. I may be wrong, but those are just my thoughts. I mean no disrespect to your report; in fact, I respect it.
Hell, its awesome to LOSE to nukes!
Primadog
Profile Blog Joined April 2010
United States4411 Posts
Last Edited: 2011-05-05 20:05:14
May 05 2011 20:04 GMT
#58
Skill as a normally distributed variable that influences win rate is the fundamental idea behind games and sports ratings, dating back to the beginnings of chess Elo. Every Elo, TrueSkill, or computerized holistic ranking system you see on major sports and gaming sites is based on the concept of skill as a measurable variable. There's nothing innovative or surprising about this assumption.
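For reference, the Elo machinery is tiny; a sketch (the K-factor and starting ratings are conventional but arbitrary choices):

```python
# Standard Elo update: "skill as a single number that drives win probability".
# K-factor and starting ratings below are arbitrary conventional choices.
def elo_expected(r_a, r_b):
    """Expected score of A against B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=32):
    """Updated ratings after one game (score_a: 1 win, 0 loss, 0.5 draw)."""
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Equal ratings: expected score is 0.5, so a win moves the winner up by k/2.
print(elo_update(1500, 1500, 1))  # -> (1516.0, 1484.0)
```

Note that the total rating is conserved by the update, which is why Elo behaves as a zero-sum redistribution of "skill points".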
Thank God and gunrun.
awesomoecalypse
Profile Joined August 2010
United States2235 Posts
May 05 2011 20:12 GMT
#59
On May 06 2011 05:04 Primadog wrote:
Skill as a normally distributed variable that influences win rate is the fundamental idea behind games and sports ratings, dating back to the beginnings of chess Elo. Every Elo, TrueSkill, or computerized holistic ranking system you see on major sports and gaming sites is based on the concept of skill as a measurable variable. There's nothing innovative or surprising about this assumption.


This is true, but all these systems equate win rate with skill, which is something some players dispute. A guy like IdrA would argue that cheesy players are "unskilled" even when they win, something the formula would clearly dispute.

But, as someone who thinks that mindset is counterproductive nonsense, and that a win is a win, I'm all for this system.
He drone drone drone. Me win. - ogsMC
hypnobean
Profile Joined October 2010
89 Posts
May 05 2011 20:20 GMT
#60
Anyone notice the paper identifies Jinro's race as Protoss?
Clearout
Profile Blog Joined April 2010
Norway1060 Posts
May 05 2011 20:28 GMT
#61
Oh this is really great. I'm studying for my statistics exam at uni and this pops up!? Great timing, great read, learned a bit and understood most. Thank you for making a statistical analysis which is actually a statistical analysis. Seems a lot of work went into it.
really?
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 05 2011 21:21 GMT
#62
On May 06 2011 05:20 hypnobean wrote:
Anyone notice the paper identifies Jinro's race as Protoss?


Oops!? I think I got it right in the introductory chapters, at least...!?
My Life for IU!
OrchidThief
Profile Joined April 2011
Denmark2298 Posts
May 05 2011 21:27 GMT
#63
I know you've explained in this thread where beta comes from, but it really wasn't apparent in the paper and seemed like arbitrary numbers you'd fitted to players based on your own assessment. I've taken several statistics courses but was still confused by it, especially since it's such a significant part of your argument.
FrodaN
Profile Blog Joined October 2010
754 Posts
May 05 2011 21:55 GMT
#64
Great work, OP. I'm still learning statistics in college so this is very helpful to me

I'm not a betting man, but I wager that most people refuting this article didn't even read it thoroughly.
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 06 2011 07:34 GMT
#65
On May 06 2011 06:27 OrchidThief wrote:
I know you've explained in this thread where beta comes from, but it really wasn't apparent in the paper and seemed like arbitrary numbers you'd fitted to players based on your own assessment. I've taken several statistics courses but was still confused by it, especially since it's such a significant part of your argument.


Hmm... Actually, I thought that as soon as I described the model as 'logistic regression', everybody would understand that the parameters are estimated from the data (since this is statistics), but maybe that is not so natural for non-statistics people. I'm glad that even non-statisticians are enjoying the article, though, and when preparing the next version I'll try to be more reader-friendly.
My Life for IU!
WhiteDog
Profile Blog Joined November 2010
France8650 Posts
Last Edited: 2011-05-06 08:44:59
May 06 2011 08:39 GMT
#66
That's funny; each time I read this kind of analysis I conclude exactly the opposite.

The analysis states that T > P, T > Z, and Z > P while the statistical significance is not very strong. Any feedback would be welcome.

For me, as a Zerg player, Z > T and P > Z... strange.

Well, I'm not a pro on this, but in sociology I think it was Bourdieu who showed that the few students from the lower class who actually make it all the way through school are statistically better than students from the upper class. For him, this effect arises because the few who succeed in school, despite coming from a social class that doesn't help them, develop special tactics or skills to overcome their handicap.
What I'm trying to say is that the statistics may, in this case, reflect the exact opposite of reality: to overcome an imbalance (such as, in my view, Z being a little overpowered against Terran), players develop the skill they need to crush their opponents, while their opponents, being favored, don't need to think too much to win.

My argument is pretty slanted, but I think it's true.
"every time WhiteDog overuses the word "seriously" in a comment I can make an observation on his fragile emotional state." MoltkeWarding
AKspartan
Profile Joined January 2011
United States126 Posts
May 06 2011 10:12 GMT
#67
Well-written.
Kazang
Profile Joined August 2010
578 Posts
May 06 2011 12:19 GMT
#68
On May 06 2011 17:39 WhiteDog wrote:
That's funny; each time I read this kind of analysis I conclude exactly the opposite.

The analysis states that T > P, T > Z, and Z > P while the statistical significance is not very strong. Any feedback would be welcome.

For me, as a Zerg player, Z > T and P > Z... strange.



It just goes to show that neither your impression nor the statistical analysis can conclude anything concrete, as the variables of player skill outweigh any perceived or real "imbalance" between the races.
DivineSC
Profile Blog Joined March 2011
United States128 Posts
May 06 2011 12:57 GMT
#69
So you're saying that Protoss is the worst race? LOL.
Follow me on Twitter @vGDivine Vision Gaming. vGCommunity.com
Elean
Profile Joined October 2010
689 Posts
Last Edited: 2011-05-06 13:32:14
May 06 2011 13:25 GMT
#70
It looks like this model concludes that Protoss players are extremely skilled (6 Protoss among the top 10 players by skill), and from that reaches the conclusion that Protoss is underpowered.
Basically, it has exactly the same value as IdrA saying "I'm the best player, I don't win, thus there is an imbalance".

(Actually, this model can converge to different solutions. The particular solution the author got was "Protoss players are skilled and Protoss is underpowered"; it could very well have converged to "Protoss players have no skill and Protoss is overpowered".)

All the people reading this should understand that this is not a scientific, peer-reviewed paper.

There is no way this would be accepted as it is now.
If I were to review this paper, I would ask for several modifications, and I would actually reject it unless the author answered this question:
how can you tell there is no offset in the "skill parameter" of all the players of one race?

I would also ask for a plot of the "skill parameter" distribution for each race.
iNbluE
Profile Joined January 2011
Switzerland674 Posts
May 06 2011 13:38 GMT
#71
On May 06 2011 21:57 DivineSC wrote:
So you're saying that Protoss is the worst race? LOL.


Yeah, that's a very strong point you're making... Seriously, gtfo.

To the OP and everybody: the data collection starts back at GSL Season 1, and several patches have been released since then. I think we just don't have enough data points from recent games to state anything yet.
ლ(╹◡╹ლ)
Smancer
Profile Blog Joined December 2010
United States379 Posts
Last Edited: 2011-05-06 13:43:16
May 06 2011 13:42 GMT
#72
The frequency of PvZ in your model is just about half that of the other two matchups, primarily due to the number of Terran players.

How do you think this affects your model?

Edit: I should be more careful when typing.
A good way to threaten somebody is to light a stick of dynamite. Then you call the guy and hold the burning fuse up to the phone. "Hear that?" you say. "That's dynamite, baby."
nvrs
Profile Joined October 2010
Greece481 Posts
Last Edited: 2011-05-06 13:51:14
May 06 2011 13:47 GMT
#73
"There are three kinds of lies: lies, damned lies, and statistics"

Edit: If i was to start any sort of statistical analysis on the matter of balance, i would look at Blizzard first...
maddogawl
Profile Joined January 2011
United States63 Posts
May 06 2011 14:12 GMT
#74
I think this is a cool thing to try to do, and it brought back a ton of memories of statistics, but to be honest, taking the GSL from October 2010 to March 2011 causes some validity issues. There have been so many changes to the game in that time; remember the untargetable repairing SCVs and the mass of bunker and pylon blocks? Those are all things of the past. Anyone who takes this as the current balance of the game is sorely mistaken: the statistics stop in March 2011, and since then we've had changes to stim research time, BC speed, and the infestor.

My point is, the stats are nice for seeing the general trend of the GSL over that time period, but really nothing else. What we'd really need is a statistical analysis of each patch of the game, but even then, talking about balance is something we should perhaps reserve to Blizzard.

maddogawl
Profile Joined January 2011
United States63 Posts
May 06 2011 14:17 GMT
#75
On May 06 2011 22:25 Elean wrote:
It looks like this model concludes that Protoss players are extremely skilled (6 Protoss among the top 10 players by skill), and from that reaches the conclusion that Protoss is underpowered.
Basically, it has exactly the same value as IdrA saying "I'm the best player, I don't win, thus there is an imbalance".

(Actually, this model can converge to different solutions. The particular solution the author got was "Protoss players are skilled and Protoss is underpowered"; it could very well have converged to "Protoss players have no skill and Protoss is overpowered".)

All the people reading this should understand that this is not a scientific, peer-reviewed paper.

There is no way this would be accepted as it is now.
If I were to review this paper, I would ask for several modifications, and I would actually reject it unless the author answered this question:
how can you tell there is no offset in the "skill parameter" of all the players of one race?

I would also ask for a plot of the "skill parameter" distribution for each race.



Precisely. There are some clear flaws in the statistical analysis, and there's no way to accurately measure the skill of a player. In my opinion, the statistics would be best served by treating all GSL players as relatively equal in skill, since in theory they are all the best in the world.
Nakas
Profile Joined May 2010
United States148 Posts
May 06 2011 14:44 GMT
#76
First off, the statement that Z>P in the OP of this thread is inconsistent with the analysis in the paper. The analysis in section 3.2, page 8, found that P(BETA_protoss,zerg > 0) ≈ 0.290, which implies that P>Z. Please double-check this, as it is likely the most controversial finding of the paper.

Secondly, the analysis of the adequacy of the model is itself inadequate. You show that your model is statistically better than the null model, but this is not enough. Much of the predictive power of the model is contained not just in BETA_race but in BETA_player. Your analysis of model adequacy needs to show that your model can predict based on race, and not just on players.
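The null-model comparison referred to here is typically done with a likelihood-ratio test; a generic sketch (the log-likelihoods and parameter count below are made-up toy values, not figures from the paper):

```python
# Generic likelihood-ratio test sketch: does a model with extra parameters
# fit significantly better than a null model?  The log-likelihoods and
# parameter count below are invented toy values for illustration.
from scipy.stats import chi2

def lr_test(ll_null, ll_full, extra_params):
    """LR statistic and p-value for H0: the extra parameters add nothing."""
    stat = 2.0 * (ll_full - ll_null)            # likelihood-ratio statistic
    return stat, chi2.sf(stat, df=extra_params) # chi-square tail probability

stat, p = lr_test(ll_null=-420.0, ll_full=-400.0, extra_params=10)
print(stat, p)  # stat = 40.0; a small p rejects the null model
```

Nakas's point is that passing this test against an intercept-only null does not isolate the race effect: one would also want to test the full model against a model with BETA_player only.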
Blacklizard
Profile Joined May 2007
United States1194 Posts
May 06 2011 15:13 GMT
#77
On May 05 2011 15:05 d_ijk_stra wrote:
On May 05 2011 14:57 Thrombozyt wrote:
I guess it would be better to use a different data set, as the game has vastly changed since October 2010, with Steppes of War and Delta Quadrant still in the map pool back then and many balance changes not yet in place (roach range increase, anyone?).

You cannot really group different patches together, as potential 'imbalance' from a former patch will reflect on current patches. Also by using only current data (say March 2011 and onwards) but drawing from more tourneys you actually reduce the number of maps played and therefore the number of parameters you have to determine (as each map carries 3 beta values for the matchups) from a limited set of data.

Edit:
Changing the data set would also improve the quality of the analysis, because you wouldn't have to make the assumption that the Korean style is the 'gold standard' and rather take data from all over the world avoiding local bias.


I strongly agree with you and space_yes's comment. At the time I was conducting the analysis, it was March and I didn't have a good understanding of tournaments other than GSL. Moreover, gamers in GSL were isolated from others. But I didn't have enough GSL games per patch, so I had to aggregate them all. I also feel very uncomfortable about this.

Now the situation is a little different. There are many ongoing "global" leagues like NASL/TSL, which I also enjoy watching, so I have more games worldwide, and it might be enough to conduct a valid analysis. I hope I can do a follow-up analysis soon!


NASL is fun to watch, but until global leagues have some way to fairly distribute lag (say, in a series of 5 matches, every other match is played on the other region's servers), you can't read too much into matches between North America and Korea, or something along those lines.
KillerPenguin
Profile Joined June 2004
United States516 Posts
Last Edited: 2011-05-06 15:53:21
May 06 2011 15:20 GMT
#78
Thank you, this paper is very well done. Here are my comments.

The OP should say P>Z if the statement in your paper is correct.

The paper only predicts the balance within the GSL. If 99% of players got into the GSL as Z, and the 1% who made it in as P did well, it would show P>Z balance. Luckily, more T have made it into the GSL than any other race, so your conclusions of T>Z and T>P are only further supported.

The data reaches back up to 7 months; since then the game has evolved and 2 patches have come out. The problem with only using new data is that the sample size is significantly reduced, but honestly it was a completely different game pre-2011, and most players would agree that back then T was OP.

PS: Stop complaining about sample size; you guys don't know statistics. Around 800 games is fine if the imbalance is large enough; 10k is not necessary, since nobody cares if the balance is off by that small a margin.
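To put a rough number on that claim, here is a normal-approximation power calculation for a two-sided binomial test of a 50% win rate; the sample size matches the discussion, but the true win rates tried are hypothetical:

```python
# Sketch: how much imbalance can ~800 games detect?  Normal-approximation
# power of a two-sided binomial test of H0: win rate = 0.5.
from math import sqrt
from scipy.stats import norm

def power(n, p_true, alpha=0.05):
    """Approximate power to detect a true win rate p_true != 0.5 with n games."""
    se0 = sqrt(0.25 / n)                   # std. error under H0 (p = 0.5)
    se1 = sqrt(p_true * (1 - p_true) / n)  # std. error under the alternative
    z = norm.ppf(1 - alpha / 2)
    shift = abs(p_true - 0.5)
    return norm.sf((z * se0 - shift) / se1) + norm.cdf((-z * se0 - shift) / se1)

print(round(power(800, 0.55), 2))  # roughly 0.8: a 55% win rate is usually caught
print(round(power(800, 0.52), 2))  # much weaker for a subtle 52% win rate
```

This supports the post's intuition: ~800 games reliably detect a sizable imbalance, while a 2-point deviation would usually slip through.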
http://www.escapeintolife.com/
Krehlmar
Profile Joined August 2010
Sweden1149 Posts
May 06 2011 15:22 GMT
#79
Anything less than 10,000 games' worth of statistics is just not reliable; no offence, because I love the effort at objective analysis, but 450 games? Come on. I played WoW, and I was such a goddamn baller warrior that I won against 99% of everyone I duelled (at level 70, when druids/frost mages/hunters were OP), and if you took the statistics of my 1000 duels, warriors would look absurdly strong.

My point is: yes, a test of 450 random games from random people would give some insight, but perhaps nowhere close to realistic statistics on how the game balance stands at the moment.
My Comment Doesnt Matter Because No One Reads It
Jayrod
Profile Joined August 2010
1820 Posts
May 06 2011 15:34 GMT
#80
On May 05 2011 14:33 Zedders wrote:
Alright, I've had enough of these graphs popping up everywhere and people stating that the game is 'balanced now' because it's like 50/50/50...

It'd be interesting to see what the average game length is over time as well. Since cheeses have changed a lot since the game started (5-rax reaper and whatnot), people have a) learned to deal with cheeses and all-ins more adequately and b) developed more late-game strategies, so games are probably longer as a result.

It isn't surprising to see that Terran was so dominant at the beginning, because of the number of people who started out playing Terran. If I recall, the first GSL was vastly Terran-populated, not to mention vastly cheese-populated too.

Terran, having the strongest tier-one unit in the marine, had (has? I'm not sure anymore) the strongest early game. We all remember the BitByBit strategy (essentially all-inning... and if that all-in doesn't work, all-in again, and if that doesn't work, all-in again, rinse and repeat).

Since terran had the strongest early game...the game ended fast because cheeses were so powerful/prevalent. Therefore Terran won a lot.

The games are getting longer now.... this of course results in more and more mistakes made by each player. Balance, in my opinion, should be weighted on how many mistakes the player can make in proportion to the other player's mistakes. What I mean by this is if one player makes less mistakes in his game decisions, he should ultimately win in a long game.

Why you ask? Because Starcraft 2 is a game of decisions. And the longer the game goes on, the more decisions must be made. The more decisions that are made, the more mistakes there are, which should result in the degree of separation that makes one player better than the other.

In context...let's say X race gets supply blocked 2 times (common macro mistake) but Y race never gets supply blocked. Y race then as a result has a larger army, larger economy etc. X race still wins simply because the units he made counter the units Y race made. Ok...this isn't imbalance...this is strategy right? Y makes a larger mistake by not scouting X and as a result his units crumble to X's.

So we've established that theres different TYPEs of mistakes one can make. And some mistakes are weighed less than others. But at what point do these mistakes balance. What if X can get supply blocked twice, not scout opponent's army (+more mistakes) and still win.

The severity of one race's total mistakes should not be much larger than another's. Ultimately I'd like to see X -not- win and I hope you agree with me, because X is clearly not the better player, his race is.

--------------back to the graphs.....
Ok, so these graphs are representations of both races making an equal number of mistakes, since they are pros, and we are assuming that most pros compete at the same skill level regardless of race.

So the degree of separation in skill due to the mistakes that are made should be negligible.

To sum up a little.......

The average game length has increased (I'm pretty sure of this, considering map size, cheese prevalence, and spawn points).

More game length means more potential for mistakes. Ultimately as e-sports fans, we want to see the better player win. This means the player that made the right call at the right time, with the right micro, while maintaining the right macro.

Now it's super important to note...these graphs don't display anything about HOW the games were won.

Looking at T v P... you might think "oh look it's balanced now because it's 50%/50% wins now"

From November 2010 to January 2011, Terran cheese prevailed until Protoss finally learned how to stop it (or it got patched). Was the game balanced in January 2011 because Protoss learned how to stop strong Terran all-ins? (A 'safe build' to gain an economic lead had emerged.)

This isn't balance; this is metagame development, meaning half the people trying the old strategies that used to work 60% of the time failed a lot, while the other half who realized this tried new strategies (less developed and therefore not as good) and won because it was something their opponents hadn't seen before. Yay for metagame development!

Despite almost your entire post not making any sense, I'll at least make the point that you can't quantify the importance of the various mistakes that are made. A Zerg player might play impeccably for 20 minutes, but then right-click his army into his opponent's and lose. That level of error has a different value from missing a larva injection or not having perfect creep spread.

If you want a real balance discussion someday, it's going to have to be something extreme, like Blizzard only allowing players to pick Random for a year. That's not going to happen any time soon, so we're just going to have to live with statistics like this mixed in with readers' personal bias.

Case in point: I bet most of the Zerg players listening to IdrA vs. Day9 on State of the Game couldn't explain the most obvious counter-arguments to IdrA's complaints about lack of scouting. Day9 failed to make those points, but understand that he doesn't think of the game in terms of balance, while IdrA seems to constantly come up with ways to explain why the game is broken.
SlipperySnake
Profile Blog Joined November 2010
248 Posts
May 06 2011 16:01 GMT
#81
I really enjoyed your model, and I look forward to you improving it and maybe adding variables to better estimate match outcomes. It would be great to see more than just the GSL run through this sort of model, but I understand it would be a ton of work. Maybe one solution is to have people email you data in a form you can use, or to partner with a few spectators to keep track of game stats.

I mean, someone just needs to have an Excel workbook open and type in the things you measured so that you wouldn't have to go through it all. Anyway, I look forward to any future analysis; I feel like this was a damn good start at estimating balance. Thanks.
Mactator
Profile Joined March 2011
109 Posts
Last Edited: 2011-05-06 16:54:51
May 06 2011 16:49 GMT
#82
The imbalance issue is not necessarily related to the probability of a player winning. The usual notion of "imbalance" refers to specific issues rather than to XvY being imbalanced overall.

Consider the notorious example of the Protoss death ball: many people complain that Zerg players can't win against it. Let's assume that is correct. Then the obvious thing to do as a Zerg player is to avoid getting into the late game against a Protoss. This may be a very effective strategy, and you may even measure a high probability of Zerg players winning. The game would still be imbalanced, though!

Therefore the question of imbalance is a matter of strategies. To quantify it, you need to consider a specific case of imbalance. If, for example, you can show statistically that ZvP rarely goes into the late game, and that when it does, Protoss has an extreme win-loss ratio, then you can conclude either that
1) there is an imbalance issue
or
2) Zerg players are bad at playing the late game.
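If game lengths were recorded alongside results, this check would reduce to a simple conditional win-rate table; a sketch with fabricated records (real data would come from replay or tournament logs):

```python
# Sketch of the suggestion above: split ZvP win rate by game phase.
# The records below are fabricated for illustration only.
import pandas as pd

games = pd.DataFrame({
    "minutes":  [8, 11, 14, 19, 22, 25, 31, 9, 16, 28],  # game length
    "zerg_won": [1,  1,  0,  0,  1,  0,  0, 1,  1,  0],  # 1 = Zerg win
})
# Bucket games into early/mid/late phases by length.
games["phase"] = pd.cut(games["minutes"], bins=[0, 12, 20, 99],
                        labels=["early", "mid", "late"])
print(games.groupby("phase", observed=True)["zerg_won"].mean())
```

A sharp drop in the late-game bucket, combined with evidence that Zerg actively avoids long games, would be the statistical signature of the death-ball complaint.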

ffdestiny
Profile Joined September 2010
United States773 Posts
May 06 2011 17:01 GMT
#83
Quoting Day9: "You can't really talk about balance before you take a hell of a lot of time analyzing the data." Unfortunately, your article has 5 references, uses a small sample size, and jumps to conclusions based on your model. Obviously you're industrious and want to prove a point, but without gathering the entire database of games played (on this patch) there is no room for an argument about balance. Also, balance is so tied to maps that it almost becomes moot to measure racial imbalances rather than map imbalances. There are just so many factors.

How do you measure balance in terms of the whole game? If we measure balance using data from pro players, that doesn't reflect the whole game, only a subset.

How do you measure balance in terms of a race? If we measure by race, then how do we correlate that data to maps?

How do you measure balance in terms of games? If we measure imbalance by games, then how do we account for strategies intended to kill the opponent before he or she has expansions: cheeses and all-ins?

How do you measure balance by all of the above? If we measure imbalance by the whole game, the race, and the games, then how do those relate to one another? Because if we analyze all the data and it comes up with a statistical win ratio favoring Zerg, but our subsets of data show that Zerg is weaker on certain maps, against certain strategies, etc., this totally negates our first conclusion.

You see how it's almost pointless to try to argue imbalance?
Lingy
Profile Joined December 2010
England201 Posts
May 06 2011 17:03 GMT
#84
IMO there is no way Toss is better than Zerg; I don't care what the stats say.
Hydraliskuuuuhh
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 06 2011 17:25 GMT
#85
On May 07 2011 02:01 ffdestiny wrote:
Quoting Day9: "You can't really talk about balance before you take a hell of a lot of time analyzing the data." Unfortunately, your article has 5 references, uses a small sample size, and jumps to conclusions based on your model. Obviously you're industrious and want to prove a point, but without gathering the entire database of games played (on this patch) there is no room for an argument about balance. Also, balance is so tied to maps that it almost becomes moot to measure racial imbalances rather than map imbalances. There are just so many factors.

How do you measure balance in terms of the whole game? If we measure balance using data from pro players, that doesn't reflect the whole game, only a subset.

How do you measure balance in terms of a race? If we measure by race, then how do we correlate that data to maps?

How do you measure balance in terms of games? If we measure imbalance by games, then how do we account for strategies intended to kill the opponent before he or she has expansions: cheeses and all-ins?

How do you measure balance by all of the above? If we measure imbalance by the whole game, the race, and the games, then how do those relate to one another? Because if we analyze all the data and it comes up with a statistical win ratio favoring Zerg, but our subsets of data show that Zerg is weaker on certain maps, against certain strategies, etc., this totally negates our first conclusion.

You see how it's almost pointless to try to argue imbalance?


First of all, the analysis takes the effect of map into account,
so it can actually be thought of as asking "DO WE HAVE BALANCED MAPS?",
trying to see how many P>Z imbalanced or T>Z imbalanced maps there are, and so on.

Secondly, I understand that you feel uncomfortable with statistical analysis.
Say there are 50 students in a class, and the mean of their heights is 170cm.
What does that tell us about any individual student? Nothing. Any student in the class
could be 150cm tall, or 200cm tall. However, the mean itself is still not meaningless.
To gain information, we sometimes have to find a clever way of
summarizing things. Of course, the more complex the situation, the harder
and less intuitive the statistics become.
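
The class-height analogy can be made concrete with a toy example (the numbers below are invented): the mean is a meaningful summary of the group even though it matches no individual.

```python
import statistics

# A hypothetical class of 50 students: nobody is exactly 170 cm tall...
heights = [155.0] * 25 + [185.0] * 25

# ...yet the mean is exactly 170 cm. The summary carries real information
# about the group while pinning down nothing about any one student.
assert statistics.mean(heights) == 170.0
assert 170.0 not in heights
```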

If you think statistical analysis is supposed to explain the details of EVERY GAME,
I think you are misled. That is not the point of conducting an analysis.
The point is to find out whether there is an overall trend.
In one game, a Terran player can cheese a Zerg player.
However, can he do it in every game? Absolutely not.
But there are maps on which a cheese succeeds with high probability (e.g. Steppes of War).
In such cases, it is not hard to see that there is a balance issue (e.g. the infamous Mercury map in BW).
My Life for IU!
Argolis
Profile Joined August 2010
Canada211 Posts
May 06 2011 17:26 GMT
#86
Well done. Stats are always fun, not so much as proof of anything, since they can always be argued with, but because numbers are fun.
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 06 2011 17:37 GMT
#87
On May 06 2011 22:25 Elean wrote:
It looks like this model assumes that Protoss players are extremely skilled (6 Protoss in the top 10 skilled players), and reaches the conclusion that Protoss is underpowered.
Basically, it has exactly the same value as IdrA saying "I'm the best player, I don't win, thus there is an imbalance".

(Actually, this model can converge to different solutions; the particular solution the author got was "Protoss players are skilled and Protoss is underpowered...", but it could just as well have converged to "Protoss players have no skill and Protoss is overpowered".)

Everyone reading this should understand that this is not a scientific peer-reviewed paper.

There is no way this would be accepted as it is now.
If I were to review this paper I would ask for several modifications, and I would actually reject it unless the author answered this question:
how can you tell there is no offset in the "skill parameter" of all the players of one race?

I would also ask for a plot of the "skill parameter" distribution for each race.


First of all, I think you read it very carefully. Thank you very much for your interest.
I'll speak in a technical sense, since it seems like you have a good background in statistics.

The problem you're worried about can happen in "unidentifiable" cases,
that is, when there are multiple parameter settings that represent the same model.
This is not the case for this problem, since I

1) use the LASSO as an L_1 regularizer, and
2) use non-informative gamers as a baseline.

Therefore, things like what you described cannot happen.
The regularizer suppresses the appearance of extraordinary gamers
as much as it can, unless a player wins too many games.

It is very important to check the identifiability of a model before conducting an analysis,
and it is good that you checked this issue. I understand why you missed this point,
since 1) I agree that the document is poorly written (it would be rejected by every journal/conference) and 2) you weren't reading it as a professional reviewer would.

And it is also good to point out that THIS IS NOT A SCIENTIFIC PEER-REVIEWED PAPER.
I DID IT FOR FUN, and the fact that I am a Statistics major does not guarantee that the
analysis is correct. I didn't worry much about this at the time of posting, but people without
the proper background could have been misled. Thanks.
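
The role this reply assigns to the regularizer can be sketched numerically (all coefficient values below are invented for illustration): a degenerate shift of the parameters leaves the likelihood untouched, but the L_1 penalty term explodes, so the penalized objective rejects the shifted solution.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def penalized_objective(beta, games, lam=0.1):
    """Negative log-likelihood of the observed wins plus an L1 penalty.

    beta:  dict of player/matchup coefficients (hypothetical values).
    games: list of (player1, player2, matchup, player1_won).
    """
    nll = 0.0
    for p1, p2, mu, won in games:
        p = sigmoid(beta[p1] - beta[p2] + beta[mu])
        nll -= math.log(p) if won else math.log(1.0 - p)
    return nll + lam * sum(abs(v) for v in beta.values())

games = [("P1", "Z1", "PvZ", 1), ("P1", "T1", "PvT", 0)]
beta = {"P1": 0.5, "Z1": 0.2, "T1": 0.4, "PvZ": 0.1, "PvT": -0.1}

# The degenerate shift leaves every logit (hence the likelihood) unchanged...
shift = 10000.0
beta_shifted = dict(beta, P1=beta["P1"] - shift,
                    PvZ=beta["PvZ"] + shift, PvT=beta["PvT"] + shift)

# ...but the L1 penalty makes the shifted solution vastly worse, so a
# regularized fit cannot wander off to it.
assert penalized_objective(beta_shifted, games) > penalized_objective(beta, games)
```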
My Life for IU!
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 06 2011 17:42 GMT
#88
On May 07 2011 01:49 Mactator wrote:
The imbalance issue is not necessarily related to the probability of a player winning. The usual notion of "imbalance" refers to specific issues rather than XvY as a whole being imbalanced.

If we consider the notorious example of the Protoss death ball, many people complain that Zerg players can't win against it. Let's assume that is correct. Then the obvious thing to do as a Zerg player is to avoid getting into the late game against a Protoss. This may be a very effective strategy, and you may even measure a high probability of Zerg players winning. The game would still be imbalanced, though!

Therefore the question of imbalance is a matter of strategies. To quantify it you need to consider a specific case of imbalance. If, for example, you can show statistically that ZvP never goes into the late game, and that when it does Protoss has an extreme win-loss ratio, then you can conclude either that
1) there is an imbalance issue
or
2) Zerg players are bad at playing the late game.



When it comes to making a balance patch, you are definitely correct.
However, there ARE imbalances sometimes.
If you watched BW for a long time, do you remember the infamous map Mercury?
What was the Protoss score there? Did P win more than 2 games on that map?

From game to game, yes, there are differences.
Even July was defeated on Mercury in an OSL final.
However, everyone who has been playing SC1/SC2 for a long time KNOWS that
certain maps REQUIRE PLAYERS of a certain race to do things x, y, z, ...
and that this leads to imbalance issues.
My Life for IU!
latan
Profile Joined July 2010
740 Posts
May 06 2011 17:50 GMT
#89
I like your initiative, but this analysis is almost a joke: badly written, poorly justified, and pretty naive for something that tries to pass as a scientific paper. I only say this because I don't like that things like this end up on arXiv.

I would rather it be limited to discussing possible statistical models and methods for approaching the problem.
Elean
Profile Joined October 2010
689 Posts
Last Edited: 2011-05-06 18:03:26
May 06 2011 18:02 GMT
#90
On May 07 2011 02:37 d_ijk_stra wrote:

First of all, I think you read it very carefully. Thank you very much for your interest.
I'll speak in a technical sense, since it seems like you have a good background in statistics.

The problem you're worried about can happen in "unidentifiable" cases,
that is, when there are multiple parameter settings that represent the same model.
This is not the case for this problem, since I

1) use the LASSO as an L_1 regularizer, and
2) use non-informative gamers as a baseline.

Therefore, things like what you described cannot happen.
The regularizer suppresses the appearance of extraordinary gamers
as much as it can, unless a player wins too many games.

It is very important to check the identifiability of a model before conducting an analysis,
and it is good that you checked this issue. I understand why you missed this point,
since 1) I agree that the document is poorly written (it would be rejected by every journal/conference) and 2) you weren't reading it as a professional reviewer would.

And it is also good to point out that THIS IS NOT A SCIENTIFIC PEER-REVIEWED PAPER.
I DID IT FOR FUN, and the fact that I am a Statistics major does not guarantee that the
analysis is correct. I didn't worry much about this at the time of posting, but people without
the proper background could have been misled. Thanks.

Your model is:

logit(P) = beta_player1 - beta_player2 + beta_matchup

You use the LASSO method to fit the values of beta_playerX and beta_matchupX.

You get ONE fit, but there are other degenerate solutions; here is the proof:
take the values of your solution, then decrease all the beta_player of Protoss players by 10000, and increase beta_PvZ and beta_PvT by 10000.
If you do that, you get another fit that is just as good as the one you first had (i.e. all the logit(P) values are unchanged). However, now beta_PvZ and beta_PvT are extremely high, and Protoss appears clearly overpowered.


Your model is probably good for estimating how likely a player is to win a match, but it is 100% blind to balance.

The problem is that each player plays only one race, and you will never be able to tell the difference between "all the Protoss players are way better than the others, but Protoss is underpowered" and "all the Protoss players are noobs, but it's OK since Protoss is way overpowered".
There is absolutely nothing you can do about it. Not with this sample of data.
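
The shift argument can be checked with a few lines of code (the coefficient values below are hypothetical, not taken from the paper's fit): subtracting a constant from every Protoss player's beta while adding it to the PvZ and PvT matchup terms leaves every predicted logit unchanged.

```python
import math

# Hypothetical coefficients, purely illustrative (not fitted to any data).
beta_player = {"P1": 1.2, "P2": 0.8, "Z1": 1.0, "T1": 0.9}
race = {"P1": "P", "P2": "P", "Z1": "Z", "T1": "T"}
beta_matchup = {"PvZ": 0.1, "PvT": -0.2, "TvZ": 0.0}

def logit(p1, p2, players, matchups):
    # logit(P) = beta_player1 - beta_player2 + beta_matchup
    return players[p1] - players[p2] + matchups[race[p1] + "v" + race[p2]]

# The degenerate shift: lower every Protoss beta by c, raise PvZ and PvT by c.
c = 10000.0
shifted_players = {k: (v - c if race[k] == "P" else v) for k, v in beta_player.items()}
shifted_matchups = dict(beta_matchup, PvZ=beta_matchup["PvZ"] + c, PvT=beta_matchup["PvT"] + c)

# Every predicted logit is unchanged, so the two fits are indistinguishable.
for p1, p2 in [("P1", "Z1"), ("P2", "T1")]:
    assert math.isclose(logit(p1, p2, beta_player, beta_matchup),
                        logit(p1, p2, shifted_players, shifted_matchups))
```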
Cheerio
Profile Blog Joined August 2007
Ukraine3178 Posts
Last Edited: 2011-05-06 18:40:50
May 06 2011 18:17 GMT
#91
On May 05 2011 10:21 professorjoak wrote:
Data set had only about ~620 nonmirror games in it. It would be interesting to use this methodology on the Brood War TSL Season 1 and 2 full ladder replay packs, which have several times more data in them.

I looked into trying a statistical analysis for TSL Season 1 at one point, to see whether the distribution of build orders on a map had any correlation with win percentage. A first glance at the data showed every matchup on any map where I had 100+ games in that specific map and matchup balanced within 52-48. (This differs from the Korean results in the TLPD, which usually split 60-40 or 55-45, though those are based on far fewer games.) However, I then realized the data set had many duplicate games, from a game between two top ladder players being counted in each player's replay pack, and decided it would be too much trouble to properly sort them out, so I quit there and didn't take the analysis much further.

Well, what's wrong with duplicates? It's not like the winner would somehow change in the replay from the other player's perspective. Even if many replays are duplicated and many are not, it is still OK as long as the duplication is random (it can hurt the result, but it's much more probable the difference would be minor).
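
Sorting out the duplicates is also less trouble than it sounds; a sketch with hypothetical replay fields (the player names, map, and timestamp keys below are made up): key each game on an order-independent tuple so the copy from each player's pack collapses to one record.

```python
# Hypothetical replay records: the same game appears once in each
# player's replay pack, with the players listed in a different order.
replays = [
    {"p1": "PlayerA", "p2": "PlayerB", "map": "Python", "time": "2008-03-01T12:00", "winner": "PlayerA"},
    {"p1": "PlayerB", "p2": "PlayerA", "map": "Python", "time": "2008-03-01T12:00", "winner": "PlayerA"},
    {"p1": "PlayerA", "p2": "PlayerC", "map": "Destination", "time": "2008-03-02T15:30", "winner": "PlayerC"},
]

def game_key(r):
    # Order-independent key: the same game from either pack maps to one key.
    return (frozenset((r["p1"], r["p2"])), r["map"], r["time"])

unique_games = {game_key(r): r for r in replays}
assert len(unique_games) == 2  # the duplicated game counts once
```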
Mactator
Profile Joined March 2011
109 Posts
Last Edited: 2011-05-06 19:59:34
May 06 2011 19:38 GMT
#92
On May 07 2011 02:42 d_ijk_stra wrote:

When it comes to making a balance patch, you are definitely correct.
However, there ARE imbalances sometimes.
If you watched BW for a long time, do you remember the infamous map Mercury?
What was the Protoss score there? Did P win more than 2 games on that map?

From game to game, yes, there are differences.
Even July was defeated on Mercury in an OSL final.
However, everyone who has been playing SC1/SC2 for a long time KNOWS that
certain maps REQUIRE PLAYERS of a certain race to do things x, y, z, ...
and that this leads to imbalance issues.


You are right that maps are important. Some maps can be abused if you are playing a specific race, but I don't think that is the issue that frustrates people.

It would be nice to have a site where, for a specific patch, you could see things like 1) the average game length (perhaps with standard deviation) for a specific map and pair of races (X vs Y), 2) the most popular units/army compositions in the early, mid, and late game, i.e. at specific times, and 3) correlation plots, etc. It would also be good to have the division or tournament, such as GSL, MLG, etc., as a variable. Like sc2ranks, although with different data.

This would add some useful data to the discussion about imbalance and strategy.
tdt
Profile Joined October 2010
United States3179 Posts
Last Edited: 2011-05-06 20:05:35
May 06 2011 19:41 GMT
#93
I don't know stats, but I believe it. When Blizzard used to release numbers, they showed the same thing, with P on the short end. When you look at the very top of the ladders, Terran just dominates everywhere. When you combine a bunch of tournaments, Terran is on top.

Maybe Terrans are just better skilled, though? How do you know?

Saying Terran is IMBA is like saying basketball is imbalanced towards the USA, rather than that they have better players. No?

I prefer to look at individual strategies instead. If something cannot be beaten, like three 50-DPS void rays in a Zerg's base early on with nothing you can do about it, that's imbalanced, so it was patched.

Everything else, including these stats, IMO is just whining, and could just as well be attributed to superior/inferior play if we step back and look objectively with neutral glasses on.
MC for president
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 06 2011 20:54 GMT
#94
On May 07 2011 03:02 Elean wrote:
Your model is:

logit(P) = beta_player1 - beta_player2 + beta_matchup

You use the LASSO method to fit the values of beta_playerX and beta_matchupX.

You get ONE fit, but there are other degenerate solutions; here is the proof:
take the values of your solution, then decrease all the beta_player of Protoss players by 10000, and increase beta_PvZ and beta_PvT by 10000.
If you do that, you get another fit that is just as good as the one you first had (i.e. all the logit(P) values are unchanged). However, now beta_PvZ and beta_PvT are extremely high, and Protoss appears clearly overpowered.


Your model is probably good for estimating how likely a player is to win a match, but it is 100% blind to balance.

The problem is that each player plays only one race, and you will never be able to tell the difference between "all the Protoss players are way better than the others, but Protoss is underpowered" and "all the Protoss players are noobs, but it's OK since Protoss is way overpowered".
There is absolutely nothing you can do about it. Not with this sample of data.


By LASSO, I mean the existence of the L_1 regularizer.
When you add 10,000 to a parameter, you are penalized a lot.
I doubt you understand the concept of regularization, sorry.
My Life for IU!
Elean
Profile Joined October 2010
689 Posts
Last Edited: 2011-05-07 06:45:47
May 07 2011 06:31 GMT
#95
On May 07 2011 05:54 d_ijk_stra wrote:

By LASSO, I mean the existence of the L_1 regularizer.
When you add 10,000 to a parameter, you are penalized a lot.
I doubt you understand the concept of regularization, sorry.

As far as I can tell, LASSO is a least-squares method that sets a constraint on the L1 norm, a constraint that has no justification in this case.

You have to understand that if two models give the exact same predictions for every match, there is no way to tell which one is better. I explained to you that there is an infinite number of models that give the same results with a different "balance between two races". This means you cannot tell whether there is an imbalance.

I will explain with an example why the L1 constraint has no justification here.

For simplicity's sake, let's consider only two races, T and Z, and let's assume that all the players of one race have the same skill.
Suppose TvZ is imbalanced, and the actual value of beta_TvZ is 500.
Since all the players made it into the tournaments, they are likely to have roughly the same strength (skill + balance). This means the Z players likely have a beta_player that is 500 above the beta_player of the T players.

Now run your model with an extremely large sample size. You get the solution beta_TvZ=0, beta_playerZ=0 and beta_playerT=0. This solution clearly minimizes the L1 norm, and also gives the exact results for the probability of each match. However, it is completely wrong and fails to detect the imbalance.

Your method fails to catch any imbalance for the exact same reason no one can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.
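
This two-race counterexample can be verified directly; a sketch using the made-up numbers from the post (the player names T1/T2/Z1/Z2 are hypothetical):

```python
# Elean's two-race scenario: TvZ is imbalanced by 500 in Terran's favor
# (beta_TvZ = 500 from the Terran side), and the Zerg pros compensate
# with skill coefficients 500 above the Terrans'.
true_beta = {"T1": 0.0, "T2": 0.0, "Z1": 500.0, "Z2": 500.0, "TvZ": 500.0}
flat_beta = {k: 0.0 for k in true_beta}

def logit(t, z, beta):
    # First player is the Terran in every TvZ game here.
    return beta[t] - beta[z] + beta["TvZ"]

# Both parameter sets predict every TvZ game as a 50/50 coin flip...
for t in ("T1", "T2"):
    for z in ("Z1", "Z2"):
        assert logit(t, z, true_beta) == logit(t, z, flat_beta) == 0.0

def l1(beta):
    return sum(abs(v) for v in beta.values())

# ...but the all-zero model has the smaller L1 norm, so an L1-penalized
# fit returns it, and the 500-point imbalance is invisible in beta_TvZ.
assert l1(flat_beta) < l1(true_beta)
```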
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 07 2011 14:47 GMT
#96
On May 07 2011 15:31 Elean wrote:

As far as I can tell, LASSO is a least-squares method that sets a constraint on the L1 norm, a constraint that has no justification in this case.

You have to understand that if two models give the exact same predictions for every match, there is no way to tell which one is better. I explained to you that there is an infinite number of models that give the same results with a different "balance between two races". This means you cannot tell whether there is an imbalance.

I will explain with an example why the L1 constraint has no justification here.

For simplicity's sake, let's consider only two races, T and Z, and let's assume that all the players of one race have the same skill.
Suppose TvZ is imbalanced, and the actual value of beta_TvZ is 500.
Since all the players made it into the tournaments, they are likely to have roughly the same strength (skill + balance). This means the Z players likely have a beta_player that is 500 above the beta_player of the T players.

Now run your model with an extremely large sample size. You get the solution beta_TvZ=0, beta_playerZ=0 and beta_playerT=0. This solution clearly minimizes the L1 norm, and also gives the exact results for the probability of each match. However, it is completely wrong and fails to detect the imbalance.

Your method fails to catch any imbalance for the exact same reason no one can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.


That point was already (implicitly) raised by other users. If there were NO MIRROR MATCHES, you would be right. The existence of mirror matches is what enables this kind of analysis.

Of course, (as another user already pointed out) you can question it: every gamer may have a different level of skill depending on the race of his/her opponent. But I don't think this assumption is strong enough to make everything nonsense: we know that most top-level players are also good at mirror matches.
My Life for IU!
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 07 2011 14:56 GMT
#97
On May 07 2011 15:31 Elean wrote:
Show nested quote +
On May 07 2011 05:54 d_ijk_stra wrote:
On May 07 2011 03:02 Elean wrote:
On May 07 2011 02:37 d_ijk_stra wrote:
On May 06 2011 22:25 Elean wrote:
It looks like this model assumes that protoss players are extremely skilled (6 Protoss in the top 10 skilled player), and get to the conclusion that Protoss is underpowered.
Basicaly, it has exactly the same value as Idra saying "I'm the best player, I don't win, thus there is an imbalance".

(actually, this model can converge to different solutions, the particular solution the author got was "protoss players are skilled and protoss are underpowered...", it could very well have converged to "protoss players have no skill and protoss are overpowered")

All the people reading this should understand that this is not a scientific peer reviewed paper.

There is no way, this would be accepted as it is now.
If I were to review this paper I would ask for several modifications, and I would actually reject the paper unless the author answer this question:
How can you tell there is no offset on the "skill parameter" of all the players of 1 race ?

I would also ask a plot of the "skill pararemeter" distribution for each race.


First of all, I think you read it very carefully. Thank you very much for your interest.
I'll talk in technical sense, since it seems like you have good background in statistics.

The problem you're worried of can happen in "unidentifiable" cases,
that is, there are multiple parameters that can represent the same model.
This is not the case for this problem, since I used either

1) Use LASSO as a L_1 regularier,
2) Use non-informative gamers as baseline

Therefore, things like what you described cannot happen.
The existence of regularizer tries to not have the presence of extraordinary gamers
as much as it can, unless he wins too many games.

It is very important to check identifiability of the model before conducting an analysis,
and it is good for you to check out this issue. I understand for you to miss this point
since 1) I agree that the document is poorly written. It should be rejected in every journal/conference 2) you should've not read it as a professional reviewer

And it is also good to point out that THIS IS NOT A SCIENTIFIC PEER-REVIEWED PAPER.
I DID IT FOR FUN, and the fact that I am a Statistics major does not guarantee that the
analysis is correct. I didn't worry much at this point at the time posting it, but people without
proper background could've misled. Thanks.

Your model is:

logit(P) = beta_player1 - beta_player2 + beta_matchup

You use the LASSO method to fit the values of beta_player_x and beta_matchup_x.

You get ONE fit, but there are other degenerate solutions. Here is the proof:
take the values of your solution, then decrease all the beta_player values of the Protoss players by 10000, and increase beta_PvZ and beta_PvT by 10000.
If you do that, you get another fit that is just as good as the one you first had (i.e. all the logit(P) values are unchanged). However, now beta_PvZ and beta_PvT are extremely high, and Protoss becomes clearly overpowered.


Your model is probably good for estimating how likely a player is to win a match, but it is 100% blind to balance.

The problem is that each player plays only one race, so you will never be able to distinguish between "all the Protoss players are way better than the others, but Protoss is underpowered" and "all the Protoss players are noobs, but it's OK since Protoss is way overpowered".
There is absolutely nothing you can do about it. Not with this sample of data.
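The degeneracy described here can be checked in a few lines. A minimal sketch in Python: the model form is the one quoted above, while the specific parameter values are made up purely for illustration.

```python
import math

def logit_p(beta_p1, beta_p2, beta_matchup):
    # Model form from the thread: logit(P) = beta_player1 - beta_player2 + beta_matchup
    return beta_p1 - beta_p2 + beta_matchup

# Made-up fitted values for one PvZ game (player1 = a Protoss player):
beta_protoss, beta_zerg, beta_PvZ = 2.0, 1.5, -0.3

original = logit_p(beta_protoss, beta_zerg, beta_PvZ)

# The shift described above: -10000 on every Protoss player's skill term,
# +10000 on the Protoss matchup terms.
shifted = logit_p(beta_protoss - 10000, beta_zerg, beta_PvZ + 10000)

# Every predicted win probability is unchanged (up to floating-point rounding),
# yet the second fit says Protoss players are terrible and the matchup is
# hugely Protoss-favoured.
print(original, shifted)
```

Any constant can be moved between a race's player terms and its matchup terms without changing a single prediction, which is exactly the unidentifiability being claimed.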


By LASSO, I mean the presence of an (L_1) regularizer.
When you add 10,000 to a parameter, you are penalized a lot.
I'm not sure you understand the concept of regularization, sorry.

As far as I can tell, LASSO is a least-squares method that puts a constraint on the L1 norm, a constraint that has no justification in this case.

You have to understand that if two models give the exact same predictions for every match, there is no way to tell which one is better. I explained to you that there is an infinite number of models that give the same results with a different "balance between two races". This means you cannot tell whether there is an imbalance.

I will explain with an example why the L1 constraint has no justification.

For simplicity's sake, let's consider only two races, T and Z, and let's assume that all the players of one race have the same skill.
Suppose TvZ is imbalanced, and the actual value of beta_TvZ is 500.
Since all the players made it into the tournaments, they are likely to have roughly the same strength (skill + balance). This means the Z players likely have a beta_player that is 500 above the beta_player of the T players.

Now run your model with an extremely large sample size. You get the solution beta_TvZ = 0, beta_playerZ = 0 and beta_playerT = 0. This solution clearly minimizes the L1 norm, and it also gives exactly the right probability for each match. However, it is completely wrong and fails to see the imbalance.

Your method fails to catch any imbalance for the exact same reason nobody can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.
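The TvZ counterexample can also be written out directly. A sketch using the numbers from the example above; the sign convention (player1 = the Terran) is an assumption of this sketch:

```python
import math

def p_t_beats_z(beta_T, beta_Z, beta_TvZ):
    # Model form from the thread, taking player1 to be the Terran (assumed here):
    # logit(P) = beta_player1 - beta_player2 + beta_matchup
    logit = beta_T - beta_Z + beta_TvZ
    return 1.0 / (1.0 + math.exp(-logit))

# The "true" world in the example: TvZ is imbalanced by 500 in Terran's favour,
# and the Zerg qualifiers are 500 skill points stronger, exactly compensating.
true_fit = dict(beta_T=0.0, beta_Z=500.0, beta_TvZ=500.0)
# The degenerate all-zero fit: no skill differences, no imbalance.
zero_fit = dict(beta_T=0.0, beta_Z=0.0, beta_TvZ=0.0)

# Both fits predict exactly the same 50/50 outcome for every TvZ game...
print(p_t_beats_z(**true_fit), p_t_beats_z(**zero_fit))  # 0.5 0.5

# ...but the all-zero fit has a far smaller L1 norm, so an L1-penalized fit
# prefers the parameter setting that reports "no imbalance".
l1 = lambda fit: sum(abs(v) for v in fit.values())
print(l1(true_fit), l1(zero_fit))  # 1000.0 0.0
```

The two parameter settings are observationally identical, so no amount of data from these players can distinguish them; the L1 penalty simply breaks the tie in favour of the all-zero answer.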


Oh, and it seems like you missed this point: the beta_player of every user is ALSO penalized by LASSO. This is a very important point, but I thought that when I said LASSO, everyone would assume every variable is being penalized. Isn't that the usual case? I think not penalizing certain variables is the exceptional case when using LASSO.
My Life for IU!
Elean
Profile Joined October 2010
689 Posts
May 07 2011 15:06 GMT
#98
On May 07 2011 23:47 d_ijk_stra wrote:
That part is already (implicitly) addressed by other users. If there were NO MIRROR MATCHES, your point would be right. The existence of mirror matches enables this kind of analysis.

Of course (as another user already pointed out), you can question that too: every gamer may have a different level of skill depending on the race of his/her opponent. But I don't think this assumption is so strong that it makes everything nonsense: we know that most top-level players are also good at mirror matches.

Obviously, mirror matches change nothing. My example still stands with an extremely large number of mirror matches.

I didn't say that everything was nonsense. Your model is probably good for estimating the odds of a match, or for telling which player is the best within one race. However, it is completely blind to balance.
Elean
Profile Joined October 2010
689 Posts
Last Edited: 2011-05-07 15:08:28
May 07 2011 15:07 GMT
#99
On May 07 2011 23:56 d_ijk_stra wrote:
Oh, and it seems like you missed this point: the beta_player of every user is ALSO penalized by LASSO. This is a very important point, but I thought that when I said LASSO, everyone would assume every variable is being penalized. Isn't that the usual case? I think not penalizing certain variables is the exceptional case when using LASSO.


Yeah, of course every parameter is constrained. This is why in my example, where all the players have the same strength (skill + race balance), your model sets all the parameters to 0 despite the imbalance.
FoxNews
Profile Joined February 2011
1 Post
May 07 2011 15:18 GMT
#100
Nice work! I've always been interested in doing a statistical study myself, but I have yet to take stats--lol, in HS it was either stats or calc. It's also refreshing to see another Cornellian on here. I'm a freshman undergrad myself, planning on majoring in physics. Anyway, nice work, and don't listen to the haters who couldn't have done a study like this in the first place.
Keep up the good work!
Also, did you go see nelly? lol he's so bad.
Nagisama
Profile Blog Joined April 2010
Canada4481 Posts
Last Edited: 2011-05-07 15:22:41
May 07 2011 15:22 GMT
#101
Sorry this is more helping in editing the grammar/spelling in your paper than the actual content.

In your actual paper you spelled Terran properly, but in the appendix you used Teran. Was that just to save space, or a simple misspelling?

Also, regarding section 5.2: I think the currency should be written as Korean Won instead of Korea Won. Just simple fixes for future versions =)

Regarding content, I was a bit confused at how you calculated B, but you answered it earlier. Not having taken a statistics course in a year has made me forget logistic regressions.
"Everyone who has accomplished more than you has no life; Everyone who has accomplished less than you is a noob." | Elem: "nagi is actually really smart"
GeorgeForeman
Profile Joined April 2005
United States1746 Posts
Last Edited: 2011-05-07 15:36:21
May 07 2011 15:35 GMT
#102
Nice analysis. I went in anticipating some intro-level stat student doing a regression and calling it science, but you did a good job here!

There are a couple of implicit assumptions about the player pool being representative, etc., and the way tournaments are structured introduces some bias in how many games you see from each player, but I think the approach is sound. A better data set for this type of analysis (IMO) might be the NASL: since there are no eliminations, you get to see all of the players the same number of times.

Great work!

PS- I used Agresti for my Categorical class, too!
like a school bus through a bunch of kids
mike1290
Profile Joined January 2011
United States88 Posts
May 07 2011 15:36 GMT
#103
I have looked through many balance threads since sc2 came out and people started whining about X or Y. Many of these threads, well some at least, incorporated some kind of statistical analysis like this one to back up various claims about balance.

I do not think this is the correct way to evaluate balance. Statistics are nice, but they do not accurately reflect a game's true balance. Correct me if I'm wrong, but I do not believe anyone has learned how to play sc2 perfectly at this point in time. So I don't see the value of analyzing statistics for players that are not playing optimally. Currently, X race may have a much lower win % than the other races, but this in no way reflects the game balance. This is merely a reflection of where the players are right now.

My approach to balancing the game is to stop patching it and let it develop. I have no idea how long this will take, but I'm sure it's longer than the few months Blizzard puts between patches.

I watched state of the game last week and I found Idra's points very interesting. I agree completely with him that if there is no way for zerg to scout and if zerg has no build that makes it safe against any possible build then the game is imbalanced. However, I do not necessarily agree that this is the case yet in sc2 for zerg. The game is very new and for Idra to claim this definitely seems a little premature. Idra is a top top zerg player and I am obviously not so I also have to take into consideration that there are very few people with Idra's insight into the game as a zerg player.
HateRock
Sueco
Profile Joined September 2009
Sweden283 Posts
Last Edited: 2011-05-07 15:38:27
May 07 2011 15:38 GMT
#104
"There's lies, damn lies and statistics."

- Mark Twain
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 07 2011 16:02 GMT
#105
On May 07 2011 15:31 Elean wrote:
For simplicity's sake, let's consider only two races, T and Z, and let's assume that all the players of one race have the same skill.
Suppose TvZ is imbalanced, and the actual value of beta_TvZ is 500.
Since all the players made it into the tournaments, they are likely to have roughly the same strength (skill + balance). This means the Z players likely have a beta_player that is 500 above the beta_player of the T players.

Now run your model with an extremely large sample size. You get the solution beta_TvZ = 0, beta_playerZ = 0 and beta_playerT = 0. This solution clearly minimizes the L1 norm, and it also gives exactly the right probability for each match. However, it is completely wrong and fails to see the imbalance.

Your method fails to catch any imbalance for the exact same reason nobody can tell the balance: we don't know whether the Zerg players are more or less skilled than the Terran players.


Due to the penalty on the beta_player terms, it cannot naturally happen that all Zerg users end up with parameter values 500 higher than the Terran users. You know that using the L2 norm is equivalent to using normal priors centered at the origin? By specifying a regularizer, you are implicitly assuming that those parameters are centered around zero.

Of course, this can be attacked from another angle: can you say that the average skill levels of all races are (at least approximately) the same? This is a valid question. If good gamers all choose Terran and bad ones choose Zerg, it's hard to analyze the balance.
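The L2-penalty/Gaussian-prior equivalence mentioned here can be verified numerically: with penalty weight lambda = 1/(2*sigma^2), the penalized objective and the negative log-posterior differ only by a constant, so they share the same minimizer. A sketch with an arbitrary one-parameter logistic likelihood and made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data for a one-parameter logistic model (purely illustrative).
x = rng.normal(size=20)
y = (rng.random(20) < 0.5).astype(float)

def neg_log_lik(beta):
    # Standard logistic negative log-likelihood for labels y in {0, 1}.
    signs = 2.0 * y - 1.0
    return np.sum(np.log1p(np.exp(-signs * (x * beta))))

sigma2 = 4.0                # prior variance: beta ~ N(0, sigma2)
lam = 1.0 / (2.0 * sigma2)  # the equivalent L2 penalty weight

def ridge_objective(beta):
    return neg_log_lik(beta) + lam * beta**2

def neg_log_posterior(beta):
    # -log N(beta | 0, sigma2) = beta^2/(2*sigma2) + 0.5*log(2*pi*sigma2)
    return neg_log_lik(beta) + beta**2 / (2.0 * sigma2) + 0.5 * np.log(2.0 * np.pi * sigma2)

# The two objectives differ by a constant (the prior's normalizer), so their
# minimizers -- the penalized estimate and the posterior mode -- coincide.
diffs = [neg_log_posterior(b) - ridge_objective(b) for b in np.linspace(-3, 3, 7)]
print(diffs)
```

The same argument applies to the L1 penalty with a Laplace prior centered at zero, which is the implicit assumption being debated in this thread.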
My Life for IU!
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 07 2011 16:08 GMT
#106
On May 08 2011 00:38 Sueco wrote:
"There's lies, damn lies and statistics."

- Mark Twain


It's funny that you are not the first person to quote this in this thread. :D

Yes, I do not think this analysis is 100% correct. But we do statistics because we want to give some answer to questions people are interested in, not just say, "Who knows? This could be right, and that could be right!" Actually, if you are a scientist or engineer, this is what we do for EVERY scientific problem. No one has proved that Newton's laws are correct, or that Schroedinger's equation is correct. It is just an attempt to "explain" the phenomena.

But statistics can be a serious lie if someone claims that his analysis is perfectly correct. Mark Twain should have said that, because too many people just copy the last line of a paper's conclusion and then say, "This is the damn truth!"

But I am really happy to see these replies on TeamLiquid: not only are people interested in the problem itself, they question assumptions and they question methods, so that no damn liar (or statistician) can deceive them. I am quite impressed!
My Life for IU!
awesomoecalypse
Profile Joined August 2010
United States2235 Posts
May 07 2011 16:14 GMT
#107
I watched state of the game last week and I found Idra's points very interesting. I agree completely with him that if there is no way for zerg to scout and if zerg has no build that makes it safe against any possible build then the game is imbalanced.


Zerg, or any race, being 100% safe against any possible build is not a prerequisite for balance. It is just something IdrA wants because he thinks the game should conform to some platonic ideal which rewards "skill" (by which he means mechanics) above all else, and he hates the idea of losing to anyone who isn't "better" than him (by which he means more mechanically capable). The so-called "safe" builds lose to cheese all the time even in BW (witness Jaedong's legendary success via the 6 pool), unless your name is Flash and you have jedi gamesense. But IdrA himself was famous for getting cheesed out of tournaments, then bitching about imbalance, Protoss in general, and his opponents lacking "skill". For all that people have this idea that BW evolved into perfectly safe builds that made you immune to cheese and guaranteed you'd go into the mid and late game on decent footing...that really wasn't the case. All "safe" means is that if you play perfectly, scout like a motherfucker and have ludicrously good gamesense, you could theoretically react in time to anything. But if you had any lapses whatsoever--which, you know, basically every pro does, which is why BW pros are far from immune to cheese--you'd still lose to cheese.

Even if Overlords were faster, IdrA would still lose to nonstandard play a lot, and he'd still bitch about it whenever he did, the same way he bitched every time he lost to cheese in BW (which was also a lot). At the end of the day, IdrA's problem isn't with the game, it's himself--he's not as good as he thinks he is, and for some reason he has this platonic ideal of how the game "should" be played that has very little to do with how it actually is. That was true in BW and it will always be true in SC2, regardless of what patches may come.
He drone drone drone. Me win. - ogsMC
mike1290
Profile Joined January 2011
United States88 Posts
May 07 2011 16:47 GMT
#108
On May 08 2011 01:14 awesomoecalypse wrote:


I don't think I said anything about BW being balanced, but correct me if I'm wrong. I am simply saying that for the game to be as fit for competition as possible, the "luck" portion of the game should be as small as possible.

As far as the "safe" build goes, I agree with you that this should not be an option for Zerg. I think both the problem and the solution lie with scouting. Having a build that is 100% safe would be bad for the game, and I don't think IdrA was really suggesting such a build. He was really arguing for better scouting options, and presented the safe build as the only other option if the scouting situation stays the same for Zerg. Again, I'm not sure whether what IdrA describes is the case for Zerg in SC2 yet, but if it ever gets there, something will need to be done to fix the game.
HateRock
StarDrive
Profile Joined September 2010
90 Posts
May 07 2011 16:59 GMT
#109
It looks like your model does not take into account balance changes via Blizzard's patches over time, since you use 852 games from the entire GSL. Is there a way you could augment your model to handle this?
Elean
Profile Joined October 2010
689 Posts
Last Edited: 2011-05-07 17:16:22
May 07 2011 17:09 GMT
#110
If good gamers are all choosing Terran and bad ones are choosing Zerg, it's hard to analyze the balance.


This is exactly the problem here.
And you have to keep in mind the nature of your DATA. It's from players who managed to qualify for the GSL.
It's not just a problem of good players choosing one race rather than another.

The problem is that if Terran is underpowered (I don't like realistic examples...) and zerg is overpowered, then the Terran players who manage to qualify for the GSL are most likely more skilled on average than the Zerg players in the GSL.

So if there is an imbalance in the game, you will have differences in the average player skill of each race in your DATA. And that does not simply make it "hard" to analyze the balance; it makes it impossible with GSL as the DATA.


Now, the fact that you got 6P, 3Z and 1T in the top 10 skilled players suggests that you are overestimating the average skill of the protoss players and underestimating the average skill of the terran players. And this would naturally lead you to the conclusion that P is underpowered and T is overpowered.
RaLakedaimon
Profile Joined August 2010
United States1564 Posts
May 07 2011 17:16 GMT
#111
Wow nice job on this man, love getting solid info like this.
mavyric
Profile Joined November 2010
Taiwan104 Posts
Last Edited: 2011-05-07 17:16:48
May 07 2011 17:16 GMT
#112
A number of people have brought up good points.

My 2 cents on using this statistics model:

You cannot just add up the seasons of the GSL, because there were balance changes (patches) and map changes. Therefore, you cannot infer from the aggregated data alone that one side of a matchup was favored over the other.

Vive Hodie
Cifer
Profile Joined April 2011
United Kingdom69 Posts
May 07 2011 17:33 GMT
#113
There are also factors like the "meta game" and opposing styles clashing. For example, player A wins 70% of his PvZ games, but he will most likely lose to a player with drop play regardless of his opponent's ability. The "ability" of a player is such a dynamic variable that I don't think it's reasonable to use it in this fashion.
W2
Profile Blog Joined January 2011
United States1177 Posts
May 07 2011 17:45 GMT
#114
Seems about right. Good work! More people should read this!
Hi
d_ijk_stra
Profile Joined March 2011
United States36 Posts
May 08 2011 01:25 GMT
#115
On May 08 2011 02:16 mavyric wrote:
A number of people have brought up good points.

My 2 cents on using this statistics model:

You cannot just add up the seasons of the GSL, because there were balance changes (patches) and map changes. Therefore, you cannot infer from the aggregated data alone that one side of a matchup was favored over the other.



Totally agree. I hope the patches stabilize at some point so that I have enough data to analyze a single patch. I promise to do some follow-up analysis with a more reasonable model in the near future. At this point this is just for fun, please don't take it too seriously :D You know, nobody will tell whether SC2 is IMBA or not based on this!
My Life for IU!
Breaker 1st Class
Profile Joined May 2010
Australia47 Posts
May 15 2011 08:22 GMT
#116
Firstly, this is probably the most academic/scholarly thread I have read in a while on TL.net. Kudos to the OP; his/her receptivity to feedback and criticism warms my heart. I hope you can improve your model soon, because I look forward to reading a revised version in the near future

P.S. Your article and the ensuing debate and discussion in this thread have rekindled the fires of my inner scientific self.
Paradice
Profile Joined October 2010
New Zealand431 Posts
Last Edited: 2011-05-17 08:37:22
May 17 2011 08:30 GMT
#117
On May 07 2011 01:49 Mactator wrote:
The imbalance issue is not necessarily related to the probability of a player winning. The usual notion of "imbalance" refers to specific issues rather than XvY being imbalanced.

If we consider the notorious example of the protoss death ball, then many people complain that zerg players can't win against it. Let's assume that is correct. Then the obvious thing to do if you are a zerg player is to avoid getting into the late game against a protoss. This may be a very effective strategy, and you may even measure a high probability for zerg players to win. The game would still be imbalanced, though!


What? No! I realise this is 1 page ago but I have to call this one out. If there is an effective strategy that lets Zerg win despite the existence of the "unbeatable" Protoss deathball, then that means the opposite of your conclusion.

Consider the game of Connect 4. If player A gets into a position where he has three counters in a line and open spaces on both sides of the line, his position is unbeatable by player B. The existence of that situation does not mean that Connect 4 is imbalanced - the existence of that situation actually means that the game is winnable which is rather a desired property for a competitive game. It would be imbalanced only if player B had (1) no viable means to stop player A from getting into that position, and (2) had no viable means themselves to win the game before that position was reached by player A.

If you want to prove an imbalance you have to argue for both (1) and (2); if you listen to IdrA on SotG with Day9, this is the type of argument he's making. You can also use statistics to try and prove or disprove (1) and (2) - e.g. if you show that Zerg players are winning, it's a fairly logical conclusion that they must have a viable means to win.


And now in regards to the OP - a fantastic effort; I look forward to any future developments. I would also love it if you expanded the sample size and listed the identified beta values for everyone's favourite pro players - but I'd mainly love it because of the shitstorm it would create
paperwing
Profile Joined February 2011
49 Posts
Last Edited: 2011-05-20 08:06:13
May 20 2011 08:05 GMT
#118
The paper was referenced in the following thread:

http://www.teamliquid.net/forum/viewmessage.php?topic_id=224881

You may be interested in his commentary
Thrombozyt
Profile Blog Joined June 2010
Germany1269 Posts
May 20 2011 08:32 GMT
#119
It would be interesting to see your analysis applied to the NASL stats after the end of group play. With their results being split by map, too, it should be easier to separate map bias from racial matchup bias.
Primadog
Profile Blog Joined April 2010
United States4411 Posts
May 20 2011 09:34 GMT
#120
Featured on Weapon of Choice, congrats!
Thank God and gunrun.
Dst
Profile Joined May 2011
United States1 Post
May 20 2011 09:58 GMT
#121
Would it be possible to do an analysis on game duration to see if there is an effect on win rates? It would be interesting to see how ZvP progresses as games go longer, given the dislike of the deathball that builds up.
DrShaiHulud
Profile Joined April 2011
United States14 Posts
May 24 2011 04:36 GMT
#122
Can you upload the raw data somewhere just as plain text or something?
LaughingTulkas
Profile Joined March 2008
United States1107 Posts
May 24 2011 04:57 GMT
#123
Nice work! I read the whole paper, and while perhaps I might have some stylistic things I would edit if it were one of my scholarly papers, I think your method was novel and interesting. I think it's definitely a great addition and a basis for more in-depth statistical analysis of balance. You seem to realize that your results are interesting and illuminating, but you also realize their limitations, and you don't stretch your conclusions further than the method/data would allow.

Thanks for your contributions to the body of knowledge; you've set the bar, and further work will need to build off of and surpass what you've done if it wants to be taken seriously.
"I love noobies, they're so happy." -Chill
contraSol
Profile Blog Joined December 2010
United States185 Posts
May 24 2011 05:15 GMT
#124
Solid paper, it's cool to see some scholarly work being put into SC2. A bigger sample to work with would be cool, too, though I think you did well to limit the scope to the top levels of competition. Maybe GSL+TSL+MLG+Dreamhack/etc?

You might have answered this already, but what program did you use for the analysis?
KermitTheFro
Profile Joined April 2010
United States25 Posts
May 24 2011 06:47 GMT
#125
Very cool to see a paper bringing some college/graduate level statistics into play!

A couple suggestions -- it would be helpful for clarity if you explained the details of your model at some point in the paper. From what I gathered, you had a binary indicator variable for each Player and a ternary indicator variable showing the Map/Player1/Player2 combination. If you could write this out formally, it would greatly simplify your description of the logit equation you are using to calculate the odds of winning.
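To illustrate what I mean, here's a hypothetical Python sketch of how I read the logit equation (the names and structure are my guesses, not the paper's):

```python
import numpy as np

def win_probability(skill_p1, skill_p2, map_matchup_effect):
    """P(player 1 wins): the log-odds are the skill difference between
    the two players plus an imbalance term for this map/matchup combo."""
    log_odds = (skill_p1 - skill_p2) + map_matchup_effect
    return 1.0 / (1.0 + np.exp(-log_odds))

# Two equally skilled players on a perfectly balanced map: 50/50.
print(win_probability(0.0, 0.0, 0.0))
```

Writing the equation out this way would make it much clearer which parameters are being estimated.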

Second, I would be very curious to see how many data points (games) you had for each Player1/Player2/Map combination in your data set. You mention the obvious concern that you don't have enough data for some games to be included, but you say you fixed these players' skill parameters to 0...won't this just skew your data? For instance, if MC had relatively few data points, you would set him to zero which would artificially make PvX look a lot better, since his wins are forced to be explained by the PvX regression term increasing.

Finally, what is your reasoning behind using L1 regularization? Since it applies equally across all parameters, you are forcing all your regression terms toward zero. This will be effective in making regression terms for players which have only a couple games in the data go to zero, since they can't (by definition) have a large effect on the final accuracy of your regression, but the resulting effect on all of your other parameters seems unintended and hard to justify. In reality, it seems that you expect very few of your regression terms to be zero a priori.

Like I said...very cool and props to you for writing this all up. Would be very cool to make this model slightly more complicated (could easily be done just by factoring in some basic time-series information to the racial balance) and see if you can capture meaningful shifts in the metagame.
Pro]ChoSen-
Profile Joined December 2008
United States318 Posts
May 24 2011 06:49 GMT
#126
The link in the OP doesn't work? ^^
Warble
Profile Joined May 2011
137 Posts
Last Edited: 2011-05-25 05:41:46
May 24 2011 09:57 GMT
#127
The link is still working for me. If you look to the top-right there should be a search bar and right underneath that it says Download: and you can choose PDF.

He wouldn't have used a ternary variable. He would have had separate binary variables for each map and matchup - one for XNC(TvZ), one for XNC(TvP), one for XNC(PvZ), and so on for each map.



I was reluctant to post technical comments earlier because it's so easy to destroy and so hard to create and I think this sort of thing is a good step up from what we've been seeing. I really saw that you put a lot of work into this.

Since more people are interested in this now and the threads have been cross-referenced I think I'll write up some proper feedback. In the other thread I mostly just talked about the lack of data you showed us - no ANOVA tables for instance. Particularly because others like Shai have expressed interest in the data and we may see more people do tests, it would be good to improve things a bit before people put a lot of effort into analysing poor models.

I've finished reading your article, footnotes, and this thread. I don't think the references require checking, so I'll start writing it up when I have some spare time. I'll probably have it ready in a fortnight.

In the meantime I think it's important to emphasise a point before others get the wrong idea:

Your model concludes that there is no evidence for racial imbalance.

I got the feeling when reading it that you were trying to show that the game was actually imbalanced since all the steps you took were geared in that direction and you didn't talk about doing other checks against concluding that there is imbalance.

I think it's an excellent start and certainly on the right track. We'll need a different model, though. More on that in 2 weeks.
starcraft911
Profile Blog Joined July 2008
Korea (South)1263 Posts
May 24 2011 12:35 GMT
#128
On May 05 2011 09:52 awesomoecalypse wrote:
Very interesting that everyone cries about Protoss being too strong, yet not one statistical analysis backs it up in any way. Thanks very much for posting this, and welcome to TL.


That's because the sample regression is based on a pool that you're not part of. Hypothetically, say that SC2 is a balanced game at the highest tier; then it is almost certain to be imbalanced at the lowest tier. This is where the concept of skill cap comes in, and this statistic doesn't address that at all, which is understandable, as trying to do so would require a statistic on each player under various conditions... i.e. map, opponent, game in the series, previous experiences, known patterns, etc.

tldr: protoss deathball owns noobs hence the qq.
KermitTheFro
Profile Joined April 2010
United States25 Posts
Last Edited: 2011-05-24 19:43:52
May 24 2011 19:42 GMT
#129
On May 24 2011 18:57 Warble wrote:
He wouldn't have used a ternary variable. He would have had separate binary variables for each map and matchup - one for XNC(TvZ), one for XNC(TvP), one for XNC(PvZ), and so on for each map.


Ah of course...laziness on my brain's part =)


I was reluctant to post technical comments earlier because it's so easy to destroy and so hard to create and I think this sort of thing is a good step up from what we've been seeing. I really saw that you put a lot of work into this.

Since more people are interested in this now and the threads have been cross-referenced I think I'll write up some proper feedback. In the other thread I mostly just talked about the lack of data you showed us - no ANOVA tables for instance. Particularly because others like Shai have expressed interest in the data and we may see more people do tests, it would be good to improve things a bit before people put a lot of effort into analysing poor models.

I've finished reading your article, footnotes, and this thread. I don't think the references require checking, so I'll start writing it up when I have some spare time. I'll probably have it ready in a fortnight.


We will all appreciate the feedback, I'm sure. It would be really exciting to start getting some good, well-explained methodology into these types of questions. Given the sheer amount of data that can be collected from something like SCGears, it seems that the only thing stopping deeper analysis is the availability of deep data sets on SC2 games.
Warble
Profile Joined May 2011
137 Posts
Last Edited: 2011-06-08 03:38:01
June 03 2011 03:18 GMT
#130
Edit: I realised that it was a mistake to claim that ∑maps = 1. I had forgotten to account for mirror matchups. I have left this in because it doesn't make much difference.



Since Stra hasn’t replied for a while I’ll assume he’s gone on a prolonged absence so this won’t be directed at him but to others interested in conducting an analysis.

Stra’s work was definitely a step up from everything else we’ve seen. With some more work we might get meaningful analysis of publicly available game data. I think it was definitely an excellent effort and in the right direction and the sort of thing we should see more of.

One thing that bugged me most about his article was how keen he was on trying to show there was imbalance. The proper conclusion from his findings is that there is no evidence of racial imbalance at the GSL level, but he only talked about how there are signs of imbalance, and everything was geared around trying to show that. It's very easy to use statistics to show whatever you want to show; it just depends on how deeply you are willing to bury the deception. I think in this case we can bill it to eagerness – after all, all of us here know the emotions that are stirred when thinking about imbalance. A side effect of his eagerness would have been haste – since he's the first to do this, I find it completely understandable that he'd focus more on just getting something done to present to the community even if it wasn't the most robust way to go about it. In terms of pioneering this sort of work, I think he's done well and has possibly spared us from a lot of bad "analysis" others may have posted in the meantime, since he's raised the bar. My point here is that I don't want this to be seen as criticism of him, but as ideas on how to improve on this in future work by anyone in the community willing to do it.

Since this can get rather involved, I’ll focus on what I think is most important for others to keep in mind when conducting an analysis in the future.

Model Specification

+ Show Spoiler +

I’ll admit that when I had a look at the model I was puzzled: “How did he get that to solve?” Sure, he’d applied lasso – but he implied that he’d run the regression without lasso since he said that he’d compared the two results. More on that in a moment. Let’s take a look at the model first to see why I was puzzled.

As we shall see, specifying models is hard. This model is misspecified because the variables are linear combinations of each other and so he should have perfect collinearity. Which means it shouldn’t solve. At all. You would get an error if you tried to solve it. If you keep insisting, your computer will literally grow a leg and kick you in the groin. I suspect this is the real reason he hasn’t been revisiting his thread: I was in hospital for 3 months when I tried it for the first (and last!) time, so he’ll probably be gone for another 8 weeks.

In this instance there are quite a few linear combinations present and they should be quite obvious when you know to look for them:


∑players = 2
∑maps = 1
∑maps(TvZ) = ∑players(T+Z)


You get the idea. There’s a bunch of them.
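To make the first two combinations concrete, here's a toy construction of my own (nothing to do with Stra's actual data): one dummy column per player plus one dummy column per map. Since every game activates exactly 2 player dummies and exactly 1 map dummy, the player columns always sum to twice the map columns on every row, so the design matrix can never be full rank:

```python
import numpy as np

rng = np.random.default_rng(0)
n_games, n_players, n_maps = 40, 6, 3

X_players = np.zeros((n_games, n_players))
X_maps = np.zeros((n_games, n_maps))
for g in range(n_games):
    p1, p2 = rng.choice(n_players, size=2, replace=False)
    X_players[g, p1] = X_players[g, p2] = 1.0   # two players per game
    X_maps[g, rng.integers(n_maps)] = 1.0       # one map per game

X = np.hstack([X_players, X_maps])
# sum(player columns) = 2 = 2 * sum(map columns) on every row,
# so the columns are linearly dependent: perfect collinearity.
print(np.linalg.matrix_rank(X), "of", X.shape[1], "columns independent")
```

No matter how many games you add, the rank stays short of the column count, which is why the regression shouldn't solve.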

The first 2 are instances of the dummy variable trap. He eliminated the constant, which can compensate for 1 instance of the dummy variable trap, but cannot compensate for 2 instances. The proof:

+ Show Spoiler +

It is easier to show with a simpler model. Consider the classic male-female black-white model. That gives us 2 unique sets of linear combinations:


M + F = 1
B + W = 1


Note that M+F=B+W is just a combination of the unique sets. Consider the properly specified model:


y = β0* + β1*F


Note that the stars denote our well-specified parameters (they're not multiplication signs).

Consider a model that compensates for the dummy variable trap by removing the constant:


y = β1.F + β3.M
  = β1.F + β3(1 - F)
  = (β1 - β3)F + β3


In this case we still have a closed form solution:


β0* = β3
β1* = β1 - β3


This was only possible because we'd dropped the constant. Otherwise we wouldn't have a closed form solution:


β0* = β0 + β3
β1* = β1 - β3


Now consider the properly specified model:


y = β0* + β1*F + β2*B


Consider a model that tries to compensate for the 2 linear combinations by removing the constant:


y = β1.F + β3.M + β2.B + β4.W
  = β1.F + β3(1 - F) + β2.B + β4(1 - B)
  = (β3 + β4) + (β1 - β3)F + (β2 - β4)B


In this case we don't have a closed form solution:


β0* = β3 + β4
β1* = β1 - β3
β2* = β2 - β4


Removing the constant only allows us to compensate for one set of linear combinations.


The dummy variable trap is very easy to avoid so I won’t go into the details here.

The primary trouble comes from the third unique linear combination I presented above. The 2 sets of dummy variables are actually related to each other by several sets of linear combinations and I don’t think there’s an easy way around this. I think this model must be abandoned.

And even if the data allowed us to solve it, consider what that actually means. Aside from mistakes in the data, it means that some players switched races. How reliable do you think it would be using the same skill variable for the player in both races? That would make us unable to trust our estimates, right? So to account for this, we’d create a new variable for them in their off race – and end up back with perfect collinearity.

Another thing to avoid is dropping the constant, like in this model. This is generally bad practice and biases our estimators. Sure, it can help us avoid the dummy variable trap – but that doesn’t mean we should do it that way. Before removing the constant, we must always consider the consequences. In this case there was no justification for setting the intercept to 0. Even without all the other problems compounding to it, consider the logic behind the idea: it’s because we’re forcing the unmodified win rate to be 50%. And, sure, that’s what we would expect from our data – except some of the observations were removed and so we no longer expect a 50% win rate. And even if we had kept those observations, it would still be desirable to leave the constant in and let it solve to 0 by itself, with the benefit that if it doesn’t solve to 0 then we know to look deeper at the data. There’s no reason to remove it, since all that will come from it is biased estimators, which means we cannot trust our results.

So how did he get it to solve?

He got it to solve after all, didn’t he? And he did it without using lasso as well. The problem is…the way he did it basically imitated lasso and resulted in biased estimators.

The intuition of Elean and others was correct in that it didn't make sense to use lasso to solve this, even if they couldn't quite explain why. What he did was set some of the parameters to 0 - which is what lasso does - and that immediately eliminates the perfect collinearity problem. The only problem is that it makes the results meaningless, since it undermines the logic of the model.

No matter how insignificant a parameter looks, if the logic behind the model dictates that a variable must be present, then we must keep it in our model. Setting its parameter to 0 eliminates the variable from the model and biases our other estimators. In this model, he wanted to control for map and racial imbalance and player skill, so setting a player's parameter to 0 basically says, "This player has a base 50% chance of winning based on their skill." Considering that he only removed the variables for players who played few games, we can generally say that many such players would have been eliminated early and some may even have dropped out of the GSL, so their base win rate would likely have been lower than 50%. However, we cannot say this for all players (maybe some were upstarts who had only recently entered the GSL).

What this means is that we cannot use lasso on this model. Doing so artificially deflates the variances, which makes the results look more significant than they really are, and introduces bias. That's a very bad combination:

It makes the results biased and makes them look significant at the same time.

I would extend this by saying that we cannot use regularization techniques to solve this model at all.
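For anyone who wants to see the mechanism for themselves, here's a small demonstration on purely synthetic data (my own toy setup, using scikit-learn): as the L1 penalty strengthens, more and more coefficients are forced to exactly 0, including ones that genuinely belong in the model, which is precisely the bias problem described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.array([1.5, -1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05])
p = 1 / (1 + np.exp(-(X @ true_beta)))           # every coefficient is truly nonzero
y = (rng.random(200) < p).astype(int)

zeros = {}
for C in (10.0, 0.1, 0.01):                      # smaller C = stronger L1 penalty
    fit = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(X, y)
    zeros[C] = int(np.sum(fit.coef_ == 0))
    print(f"C={C}: {zeros[C]} of 10 coefficients forced to exactly 0")
```

Every one of those zeroed coefficients is wrong by construction, and every other estimate is biased to compensate.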

My impression is that Stra was overly worried about overspecification and introduced bias into the model as a result of his fears. Risk of overspecification is preferable to bias. The irony is that he was right because the model was also overspecified. It’s just that the only real solution is to create a new model but he tried to salvage the model.


Analysis

+ Show Spoiler +

My biggest complaint is the lack of results that were presented. There were no ANOVA tables, no tables summarising the estimates and standard errors, nor anything else to help us evaluate the results for ourselves. So future analyses should publish their tables. Just put them in the appendices.

We’ve already discussed Stra’s concerns regarding overspecification. This issue was compounded by the fact that his tests showed that overfitting was not a big problem. (Although I would question the validity of those tests in this instance, I don’t think it’s an important topic to discuss here.) I can’t comment much more on this due to the lack of tables summarising his estimates. For now I’ll proceed under the assumption that the estimates for player skills were significant and in accordance with his tests showing that the model wasn’t overspecified.

In that case, the lack of significance in the racial imbalance parameters means there’s no evidence of racial imbalance while there is evidence that player skill plays a role in GSL results.

Interestingly, this lack of significance for the racial imbalance parameters is despite the estimates being biased and having inflated significance. It may be possible that the estimates for the player skills were also not significant, which is quite plausible considering the high level of multicollinearity we expect from this model. Our inability to assess this goes back to my primary complaint: lack of presented results in the report.

He displayed graphs showing that the estimates for player skill centred above 0, but didn't talk about the primary cause of this, which was that he had set the parameters for players with few games to 0. If he hadn't done that, those players would probably have had negative estimates and the estimates for player skill would have centred closer to 0. This centring near 0 would not necessarily have been the case if he'd had a properly specified model with a constant.

I would advise downplaying imbalance. A lot of the tests in this analysis seemed geared to show imbalance, and he didn’t highlight the point that the results showed no evidence of imbalance. Since he should have known that most of those reading his article would not have much of an understanding of statistics and would thus jump immediately onto the numbers for imbalance that appear non-zero while ignoring standard errors, it would have been prudent if he’d downplayed his numbers and placed more emphasis on the fact that they don’t show imbalance. This is something that’s too easy to forget and I urge those who publicly release the results of statistical tests on imbalance in the future to keep in mind.

The problem with hypothesis testing is that we can never prove the null hypothesis and only fail to disprove it. Considering the community’s propensity towards assuming imbalance, they will likely misinterpret any statistical conclusions by saying, “But the possibility still exists…” or, “But it almost looks significant,” or even the reverse, “The data shows no imbalance,” without realising it’s a moot point. So care must be taken when presenting the conclusions of these tests and maybe we can find a way to present any conclusions that can minimise these misunderstandings (I will be interested in hearing what methods others come up with).

I was quite surprised not to see a test on whether the variables for each matchup were jointly significant. That is to say, all of the maps jointly for TvZ, then for TvP, then PvZ, or even all 3 combined. If they all come back jointly insignificant, we have more weight to declare that there is no racial imbalance. With that said, even if we were as concerned as he was about overspecification, I wouldn’t respecify the model without them even if they were jointly insignificant since that would bias our other variables.

The benefit of such a test is that it also provides much better conclusive proof if imbalance does exist as well. Even if the estimates for imbalance on each map were insignificant, if they were jointly significant then we know that racial imbalances can affect the matchups, i.e. that they do exist in some form, and it’s just that it’s difficult to pinpoint where.
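A sketch of what such a joint test could look like, on invented data (the setup is purely illustrative, with a hand-rolled Newton-Raphson logit fit so nothing beyond numpy/scipy is needed): fit the model with and without the matchup dummies and compare log-likelihoods via a likelihood ratio test.

```python
import numpy as np
from scipy import stats

def fit_logit(X, y, iters=25):
    """Logit MLE via Newton-Raphson; returns (beta_hat, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = (X * (p * (1 - p))[:, None]).T @ X          # Fisher information
        beta += np.linalg.solve(H, X.T @ (y - p))
    p = 1 / (1 + np.exp(-X @ beta))
    return beta, np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(2)
n = 500
skill_diff = rng.normal(size=n)                  # stand-in skill-difference regressor
D = np.eye(3)[rng.integers(3, size=n)]           # matchup dummies: TvZ, TvP, PvZ
y = (rng.random(n) < 1 / (1 + np.exp(-skill_diff))).astype(float)

X_full = np.column_stack([np.ones(n), skill_diff, D[:, 1:]])   # constant + 2 dummies
X_restr = np.column_stack([np.ones(n), skill_diff])            # dummies dropped

_, llf_full = fit_logit(X_full, y)
_, llf_restr = fit_logit(X_restr, y)
lr_stat = 2 * (llf_full - llf_restr)             # ~ chi-squared with 2 df under H0
p_value = stats.chi2.sf(lr_stat, df=2)
print(f"LR = {lr_stat:.3f}, p = {p_value:.3f}")
```

A joint p-value like this is exactly the single number we'd want reported for "is there any matchup imbalance at all".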

I’m not sure why he did bootstrap tests as I couldn’t see any rationale for them. The bootstrap tests essentially found (1-p) and they were in line with what we would have found calculating the p-value using just the estimates and standard errors. Hence they also supported our conclusion that there is no evidence of imbalance. I’m not quite convinced regarding his reasoning that bootstrap tests are necessary just because the logit model has no closed-form solution. I’m not too big on the maths but I think the lack of a closed-form solution is due to transforming the observations for the dependent variable via log(pi/(1-pi)), which for a binary variable is undefined. So I think the values are just adjusted a little so they’re not precisely 0 and 1 and converted that way, and this obviously has no closed-form solution. If there’s anyone here studying statistics who is familiar with the process, I would love clarification from you. In any case, assuming I’m right, while this means we can scale the estimates, it doesn’t actually affect their significance nor introduce any relative bias, so I believe we can just use basic inference methods. So bootstrapping is probably unnecessary for our purposes.
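For comparison, here's a basic nonparametric bootstrap of a logit slope on synthetic data (again my own toy example, with the same hand-rolled Newton fit). In a well-behaved case like this, the bootstrap standard error should land in the same ballpark as the asymptotic one, which is part of why I suspect the bootstrap was unnecessary here.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Logit MLE via Newton-Raphson; returns (beta_hat, asymptotic std errors)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ beta))
        H = (X * (p * (1 - p))[:, None]).T @ X          # Fisher information
        beta += np.linalg.solve(H, X.T @ (y - p))
    return beta, np.sqrt(np.diag(np.linalg.inv(H)))

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(float)
X = np.column_stack([np.ones(n), x])

beta_hat, se = fit_logit(X, y)

# Nonparametric bootstrap: resample games with replacement, refit, collect slope.
boot = [fit_logit(X[idx], y[idx])[0][1]
        for idx in (rng.integers(n, size=n) for _ in range(200))]

print(f"asymptotic SE = {se[1]:.3f}, bootstrap SE = {np.std(boot):.3f}")
```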


Further Design

+ Show Spoiler +

I think it’s important to make the data available to others. This will allow others to verify the work. So I would encourage anybody publishing their analysis also to publish their data sets. The lack of tables in Stra’s article was also troublesome, so I think it is a good idea to make them available in the appendices in the future.

I think he did well to identify the other drawbacks and uses of the analysis. I agree with his conclusion that this sort of analysis will be useful in identifying imbalanced maps. That sort of information would be useful to players, map makers and in balance discourse. I think imbalance on individual maps would provide an easy channel to help balance the game, and if there is also imbalance in aggregate then we could start thinking about tweaking the races themselves.

As an extension, I think it's important to consider spawn positions. For example, I believe that although Metalopolis looks balanced overall, it is heavily imbalanced based on spawn positions. Consider if overall statistics for TvZ show that both races have a 50% win rate on Metalopolis, and we ignore close air spawns. It's commonly accepted that close spawns favour T. If T has a 70% win rate on close spawns, then for Metalopolis to have a 50% win rate overall, far spawns must necessarily favour Z at 70% (assuming close and far spawns occur equally often). This essentially introduces luck into the TvZ matchup on the map, with spawns determining which race is favoured, and the map feels horrible to play as a result.
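The arithmetic, under the simplifying assumption that close and far spawns occur equally often:

```python
# Overall win rate is the average of the close- and far-spawn win rates
# (assuming a 50/50 spawn split, which is a simplification):
#   overall = 0.5 * close + 0.5 * far  =>  far = 2 * overall - close
overall, close = 0.50, 0.70
far = 2 * overall - close
print(f"Terran far-spawn win rate: {far:.2f}")   # i.e. Zerg wins 70% of far spawns
```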

Hence it is desirable to account for spawn positions if possible.

This is particularly salient considering that many tournaments now exclude close spawns on this map. This represents a significant change in the map and so statistics for Metalopolis on old policies allowing close spawns will not be applicable to Metalopolis under current policies.

Stra identified a major difficulty with the data regarding the fact that many players only had 2 observed games, and almost half of the players had 5 or fewer games recorded. This makes it difficult to use a model that specifies player skill as a key variable, and we would expect high variances. We either need a model that doesn't specify player skill, or we need to transform the data in some way.

As discussed earlier, we cannot set any of the parameters to 0. So we may be better off just removing observations for players with few games from the data so that we can remove their variables from the model. I believe Stra considered this since he discussed the need for data reduction.

That’s right – we can improve our analysis by using less data.

Let’s see if that gets quoted out of context, shall we? :-)

We may also consider looking for data in round robin tournaments, if any are frequently held.

With regards to formulating a new model, I have a few ideas but am hesitant to post them without having fully analysed them myself since I don’t want others to do a lot of work based on something I post only for me to later say, “Oh, but I found a drawback.” However, I have nothing against suggesting a few likely directions and letting you run with the ideas and doing your own models since then it’s all on you and I have already provided this caution. :-)

The challenge is that we can only use publicly available data, and I’m assuming that we only want to balance for the top level, so that means we can use tournament results. Since we are mostly interested in racial imbalance, that means we will need to retain those variables (while avoiding the dummy variable trap). Paradoxically, this means we cannot have variables for player skills. As we have seen here, that would just lead to problems with perfect collinearity.

This is not necessarily an intractable problem: the imbalance variables will still capture the effects of imbalance, so long as we can account for player skill in a way that keeps bias from becoming a big problem.

In my opinion, the most likely avenues to explore at the moment are the use of proxy and instrumental variables. In particular, I have been looking at possible proxy variables that can stand in for player skill. I’ll leave it at that.


Final Words

I think Stra’s effort is definitely much better than the other statistical “analyses” we have been seeing, although those shouldn’t be discouraged altogether (as per my other thread: if a current graph of win rates shows a rock-paper-scissors situation, that is a strong indication that racial imbalance exists somewhere). I think it should be possible to get a meaningful model, although it is harder than it first seems, as we’ve seen here. I think we can get some meaningful results, and it’s a matter of getting people with the right knowledge and time together to do it.

There will need to be some caution when it comes to publishing our findings, though, since we will need to keep in mind how the results will appear to those without training in statistics. I say this because I believe there is a good reason Blizzard has stopped releasing many statistics from the game: the community is apt to get overexcited.

There is also a question of motivation. Even if we do find imbalance, it won’t matter for the majority of players, since we’re only examining the dynamics at the very top level. The exercise could still be interesting: it may have applications in improving the game as a spectator sport, may be of interest to players considering going pro, and could be useful for map designers.

However, all we’ll be able to estimate are the balances for the game in its current state. Further strategic development by the races, without any balance changes by Blizzard, could just as easily change the estimated imbalances in the future. In common parlance: the metagame may still evolve. So there is a risk that any results could be used to push for unwarranted balance changes. This probably isn’t a big enough concern to stop further analysis from being conducted, since curiosity is a powerful force and the analysis will happen anyway; perhaps things will still turn out well if those conducting the analyses are moderate in their conclusions.