|
Analyzing Ladder Map Balance Statistics
Introduction:
The balance of maps in Starcraft II is discussed quite a bit, and as with all things balance-related, a lot of people have very strong opinions about it. Almost every Starcraft series is preceded by a ban phase for maps, and the presence or lack of features like overlord pods or reaper jump-up spots has led to a surprising amount of drama. And of course the discussion about maps ties into the broader discussion about balance, for instance on whether the recent resurgence of Terrans is due to #terranpatch or #terranmaps.
All these arguments about map balance are usually quite subjective with pros, mapmakers and the community all having their differing takes on the state of things. We are lucky enough to have Liquipedia to provide some data about map balance, but that data can be misleading at times. Given that I find maps interesting and had some time on my hands I decided to do some more analysis based on that data.
Most of the time the data agrees with the prevailing community opinion about maps, which often follows from the prevailing opinion of pros. But sometimes the data disagrees, or there isn't a consensus opinion among pros, or the community feels differently about a map than the pros do. That's where extra analysis of the data is useful. Here are a few examples I can think of where opinions have differed:
- When Zen came out the community immediately decided that PvZ was unwinnable on it because of the natural. However, even before Blizzard changed the natural in response to the complaints, Zen was statistically close to the best map in the pool for PvZ. Based on vetoes, I do think most of the pros knew that the map was actually just fine for PvZ.
- Golden Wall is a very rare and exciting case where the pros haven't figured out the best way to play the map, and even after more than a season on ladder still aren't sure which way all of the match-ups lean (I think some European pro said that on stream during an event recently).
- On Cyber Forest, lots of casters and high level Protoss players thought that tank pushes were extremely strong, making the map good for Terran. The Liquipedia statistics said that PvT was at 60% in Protoss' favor at the time. And the vetoes of Korean Terrans (once Year Zero and Automaton rotated out) make it quite clear that they thought the map was really bad for Terran.
- During an episode of In-Depth, which I'd highly recommend watching, Scarlett mentioned that she didn't veto Nightshade because unlike some other Zergs she didn't think it was much better for Terran than the other maps. This was partially due to Liquipedia having the winrate at 52%, which wasn't so different from the other maps. However (and this is where statistics can be misleading), if you filter out games between lower level players, Nightshade was quite decidedly the second best map for Terran in the pool (after Purity and Industry).
Methodology:
Liquipedia was the obvious starting point, as the only large repository of game results by maps available for SCII. The main problem with it was that it sometimes includes games from lower level players that aren't too meaningful to map balance at the highest level. The Alpha SC2 Team League/Season 16 is a great event, but I still wouldn't want to include most of its games in map balance analysis.
Luckily Liquipedia has an API, so I could retrieve the list of games being used for its map statistics (if you want to look up games without the API you can also do something like this: index.php?title=Special:SearchByProperty&limit=500&offset=0&property=Is+played+on&value=Golden+Wall+LE ). After retrieving the list I did some cleaning up: I ended up with a few duplicate games in cases where games were on more than one page, such as Path of Star and Path of Star/30 (I assume there might be other APIs that avoid that problem). After that I had a dataset I could use.
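For anyone curious what the duplicate cleanup could look like, here is a minimal sketch. The `Game` fields and the choice of key are assumptions made for illustration, not the actual structure returned by the Liquipedia API; the idea is just that the same game listed on two pages collapses to one entry once the page name is ignored.

```python
from typing import NamedTuple

class Game(NamedTuple):
    # Hypothetical record layout, not the real API schema.
    date: str      # e.g. "2020-07-01"
    map_name: str
    winner: str
    loser: str
    game_no: int   # position of the game within its series
    page: str      # page the result was listed on

def dedupe(games):
    """Keep the first copy of each game, ignoring which page listed it."""
    seen = set()
    unique = []
    for g in games:
        key = (g.date, g.map_name, g.winner, g.loser, g.game_no)
        if key not in seen:
            seen.add(key)
            unique.append(g)
    return unique

# Made-up example: one game that shows up on a parent page and a subpage.
games = [
    Game("2020-07-01", "Golden Wall LE", "PlayerA", "PlayerB", 1, "Path of Star"),
    Game("2020-07-01", "Golden Wall LE", "PlayerA", "PlayerB", 1, "Path of Star/30"),
]
unique_games = dedupe(games)
```

Two players could in principle play the same map twice on the same day, which is why the sketch includes a game number in the key; whether that field is available is also an assumption.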
My initial goal was to filter out games between low level players to obtain more meaningful statistics. To figure out which players are low level, I turned to Aligulac (which also has a nice API). I chose to only include games between players in the top 100 of Aligulac, which cut out roughly half the games for most maps, and using that I obtained some nice filtered winrates which I thought would be more meaningful. The results of this exercise can be seen in this spreadsheet: LotV Map Balance
There were often interesting differences between the raw results from Liquipedia and the results including only games between top 100 players. In most cases the filtering magnified the imbalance (e.g. Nightshade goes from 55% TvZ to 63%). Here's a chart I made for the maps from last season:
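The filtering step can be sketched roughly as below. The dictionary keys and the `top_players` set are hypothetical names for illustration; the sketch assumes the games have already been narrowed down to one map and one non-mirror match-up.

```python
def filtered_winrate(games, top_players, race):
    """Winrate of `race`, counting only games where both players are in top_players.

    Returns None if no game survives the filter.
    """
    kept = [
        g for g in games
        if g["winner"] in top_players and g["loser"] in top_players
    ]
    if not kept:
        return None
    wins = sum(1 for g in kept if g["winner_race"] == race)
    return wins / len(kept)

# Made-up example: three TvZ games, one of which involves a player outside the top list.
sample = [
    {"winner": "A", "loser": "B", "winner_race": "T"},
    {"winner": "B", "loser": "A", "winner_race": "Z"},
    {"winner": "C", "loser": "A", "winner_race": "Z"},  # C is not in the top list
]
top = {"A", "B"}
tvz = filtered_winrate(sample, top, "T")
```

Requiring both players to be in the list matches the "games between top 100 players" wording above; requiring only one of them would be a looser filter and keep more games.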
However, this still fails to account for the difference in skill between players. At the moment of writing this, Aligulac tells me that Reynor should win roughly 2/3 of his games against Neeb. Now if they both play a bunch of games on a map and Reynor wins 2/3 of the games as expected, then we'd have the map results for ZvP be ~67%, which looks very Zerg favored. And this is where the concept of rating adjusted map winrate comes in. If our goal is to figure out the balance of a map and to ignore the overall balance of the game as much as possible, we can weight the games by their predicted outcome. In the aforementioned example, Reynor defeating Neeb would only be weighted half as much as Neeb defeating Reynor, since Reynor is expected to win twice as often. That would leave this imaginary map's balance at 50% despite two thirds of the games being won by Zerg.
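One way to express that weighting in code is sketched below. The specific weight used here (each win counts as one minus the winner's predicted win probability) is an assumption chosen because it reproduces the Reynor/Neeb example; the post does not spell out the exact weights.

```python
def rating_adjusted_winrate(games):
    """Rating adjusted winrate for race A.

    games: iterable of (a_won, p_a), where a_won is True if the race-A
    player won and p_a is the predicted probability that they would win.
    Each win is weighted by how unexpected it was (assumed weighting).
    """
    weight_a = 0.0
    weight_b = 0.0
    for a_won, p_a in games:
        if a_won:
            weight_a += 1.0 - p_a   # expected wins count for less
        else:
            weight_b += p_a         # upsets count for more
    total = weight_a + weight_b
    return weight_a / total if total else None

# The example from the text: the Zerg is a 2/3 favorite and wins 2 of 3 games.
example = [(True, 2 / 3), (True, 2 / 3), (False, 2 / 3)]
adjusted = rating_adjusted_winrate(example)
raw = sum(1 for a_won, _ in example if a_won) / len(example)
```

Here the raw winrate is about 67% for Zerg while the adjusted one comes out at 50%, since the two expected wins together carry the same weight as the single upset.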
This is not a measure of absolute map balance, but rather a relative one measured against the other maps we've had recently. Given that the ratings of players also adjust based on their result, the average of the 'rating adjusted map results' for all the maps in the pool should always end up around 50% in the long term. This happens even if all the maps favor one race or if the balance of the game favors one race. 'Rating adjusted map winrates' are completely useless for figuring out the balance of the game, but very useful for comparing maps in the same pool against each other.
Most of the map results from Liquipedia had dates that could be parsed out, so I could use the elo and deviation data from Aligulac to predict the outcome of those games at the beginning of the time period in which they happened (using the formula from https://github.com/TheBB/aligulac/blob/master/aligulac/simul/playerlist.py), and use this to weight the games accordingly.
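As a rough illustration of turning two ratings and their deviations into a predicted outcome, here is a generic sketch. It assumes the win probability is a normal CDF of the rating gap, scaled by the combined deviations; this form and the parameter names are assumptions for illustration, and the exact expression should be taken from the linked Aligulac file rather than from this snippet.

```python
from math import sqrt
from statistics import NormalDist

def win_probability(rating_a, dev_a, rating_b, dev_b):
    """Predicted probability that player A beats player B.

    Assumed model: normal CDF of the rating difference, with the spread
    widened by both players' rating deviations. Ratings and deviations are
    taken to be on the same internal scale (hypothetical inputs).
    """
    scale = sqrt(1.0 + dev_a ** 2 + dev_b ** 2)
    return NormalDist().cdf((rating_a - rating_b) / scale)

# Made-up numbers: A is rated somewhat higher than B.
p_a = win_probability(1.9, 0.1, 1.5, 0.1)
p_b = win_probability(1.5, 0.1, 1.9, 0.1)
```

The matchup-specific rating (for example a player's vs-Zerg rating) and the ratings from the start of the relevant period would be the natural inputs, per the description above.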
Before I present the data for the current map pool, I'd be remiss if I didn't briefly discuss confidence intervals. They're an invaluable part of entry level statistics courses, and somehow completely ignored by all manner of SCII balance discussions where people try to convince you that ZvT is broken based on the twenty games played in ESL Pro Tour/2020/21/Masters/Summer.
There aren't a ton of map results recorded, so the confidence interval is usually quite large. For Ever Dream there are currently 525 TvZs recorded on Liquipedia, of which Terran won 283, which gives us 53.9% ± 3.7 with a 90% binomial confidence interval. And that's after a season and a half on ladder and before filtering out half the games for involving weaker players. Most maps (especially the ones that just debuted a month and a half ago) have far fewer results and accordingly much larger error bars.
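For reference, the Ever Dream numbers can be checked with a few lines. This sketch uses the simple normal approximation to the binomial, which is an assumption on my part; an exact binomial interval is slightly wider.

```python
from math import sqrt
from statistics import NormalDist

def winrate_interval(wins, games, confidence=0.90):
    """Return (winrate, half_width) using the normal approximation."""
    p = wins / games
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.645 for 90%
    half_width = z * sqrt(p * (1 - p) / games)
    return p, half_width

p, half = winrate_interval(283, 525)
print(f"{100 * p:.1f}% ± {100 * half:.1f}")
```

With 283 wins in 525 games this gives 53.9% with a half-width of roughly 3.6 points, in line with the figure quoted above; the small gap to ±3.7 is presumably down to the interval method. Halving the sample to around 260 games pushes the half-width to about 5 points, which is why the filtered numbers need extra care.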
With these caveats let's look at the current ladder maps:
Current Ladder Maps:
I assume everyone skipped over the methodology and just jumped ahead to this. With that in mind here are the unfiltered balance results from Liquipedia as of July 25th 2020:
This doesn't look all that bad. Ice and Chrome might be a bit too Zerg favored in ZvP, but the median map in each of the match-ups isn't too far off from 50%.
Here are the results filtered to only include games between top 100 players:
This is not as nice. Ice and Chrome is way too good for zerg in ZvP, and Submarine is way too good for terran in general. ZvP's very favorable for zerg, and TvZ and TvP are both good for terran.
Here are the results filtered and also rating adjusted (useful to compare maps, not to find the absolute balance of maps):
The 'rating adjusted winrates' seem to consistently adjust things in Protoss' favor currently, since they are usually the underdog elo-wise, but it could also suggest that they have been performing a bit better since the patch. This is also the chart you'd look at to see what maps the data suggests to veto (though in practice personal preferences and builds also play a big part in this). Of course, as mentioned above, Deathaura, Pillars of Gold, Submarine and Ice and Chrome are quite recent so I wouldn't be surprised to see winrates shift quite a bit.
This was mainly a fun side-project for me, so yeah if anyone finds this useful, or has any questions please tell me. I might do an update at the end of the season to see how things evolved.
|
Nice work mate. I love when people do this shit for fun.
Kinda feels like it needs some sort of big overarching conclusion or argument to tie everything together. (Sorry, my experience as a teacher and history research related work drills into me the "yeah, but what's your point, how are you breaking ground?" line of thought). I'd like to see some work done on exactly how certain elements of a map give advantages to different races, e.g. whether the presence of gold bases is an element which heavily impacts ZvP winrates.
I'm always interested to see how data changes based on its filter criteria.
Some random conclusions from the work - These maps look slightly Protoss favored, and they still have kinda sucked a bit. Looks like everyone but Terrans should ban Submarine constantly, and it probably just isn't a good map. Eternal Empire looks like the most balanced of the maps. Deathaura and Ever Dream are comparable but a bit less so. Pillars of Gold and Golden Wall are also potential bans, and are a bit unbalanced. Avoid Ice and Chrome heavily in PvZ.
|
|
Great work ~
To be honest Nightshade was a more comfortable map for me back then; so I kinda used the liquipedia stats to try to justify playing it ^^ I had a lot of success with 50~ drone roach attacks on that map; but yes the map layout made it almost impossible to play muta ling bane for example as you couldn't defend the 4th vs 3base 8rax push
In regards to Zen; even if it may seem zerg favoured at first glance, any map where the rush distance is short enough for protoss to block zerg's natural (even down to 15hatch) with a standard 14pylon>scout gives protoss a huge opening advantage, as zerg needs to choose between playing 14hatch before overlord or taking the 3rd as their natural every game (otherwise you can gamble on such openings as gas>pool or 12pool if you think the protoss doesnt know how to respond safely)..
Also I believe at the start of last season the natural was adjusted slightly so that both sides of the entrance wall could be powered by the same pylon, allowing the protoss to hide tech with 2nd pylon in main again
|
I really hope people read the methodology part as well. Thx for the great work and detailed analysis
|
As a statistician and data scientist, I really enjoyed what you did here!
|
This was an interesting read. Do you believe that there is a selection bias? Each tournament has their own map veto process, so players who are competing are going to veto their lowest % map if they can, which will bias our sample selection and results. I don't think that that's a good or a bad thing, but I am not sold on the data telling us about map balance or "relative map balance."
|
On July 27 2020 00:26 DuckS wrote: This was an interesting read. Do you believe that there is a selection bias? Each tournament has their own map veto process, so players who are competing are going to veto their lowest % map if they can, which will bias our sample selection and results. I don't think that that's a good or a bad thing, but I am not sold on the data telling us about map balance or "relative map balance."
If everyone vetoes their weakest/least favorable maps (and they obviously do in tournaments), then the variable should effectively be canceled out. If a map is vetoed a lot, it will be played less, which will reduce sample size, but it has no bearing on the quality of the data for a single result. Please correct me if I'm wrong.
Very nice thread, I love the data-driven analysis and I couldn't find any faults in your methodology.
|
On July 27 2020 01:42 fastr wrote: On July 27 2020 00:26 DuckS wrote: This was an interesting read. Do you believe that there is a selection bias? Each tournament has their own map veto process, so players who are competing are going to veto their lowest % map if they can, which will bias our sample selection and results. I don't think that that's a good or a bad thing, but I am not sold on the data telling us about map balance or "relative map balance." If everyone vetoes their weakest/least favorable maps (and they obviously do in tournaments), then the variable should effectively be canceled out. If a map is vetoed a lot, it will be played less, which will reduce sample size, but it has no bearing on the quality of the data for a single result. Please correct me if I'm wrong. Very nice thread, I love the data-driven analysis and I couldn't find any faults in your methodology.
No, this reply means you don't understand what selection bias is. If players of a particular race veto their worst maps, then we are omitting data that would have been included in our sample otherwise. Example: even if Zen was truly the worst PvZ map, the data we have here could not tell us because players who hate it veto it and players like Parting will robo cannon rush with like 95% accuracy. This would not tell us anything about map balance.
|
On July 27 2020 02:01 DuckS wrote: On July 27 2020 01:42 fastr wrote: On July 27 2020 00:26 DuckS wrote: This was an interesting read. Do you believe that there is a selection bias? Each tournament has their own map veto process, so players who are competing are going to veto their lowest % map if they can, which will bias our sample selection and results. I don't think that that's a good or a bad thing, but I am not sold on the data telling us about map balance or "relative map balance." If everyone vetoes their weakest/least favorable maps (and they obviously do in tournaments), then the variable should effectively be canceled out. If a map is vetoed a lot, it will be played less, which will reduce sample size, but it has no bearing on the quality of the data for a single result. Please correct me if I'm wrong. Very nice thread, I love the data-driven analysis and I couldn't find any faults in your methodology. No, this reply means you don't understand what selection bias is. If players of a particular race veto their worst maps, then we are omitting data that would have been included in our sample otherwise. Example: even if Zen was truly the worst PvZ map, the data we have here could not tell us because players who hate it veto it and players like Parting will robo cannon rush with like 95% accuracy. This would not tell us anything about map balance.
Yea, but the main effect of this would be to reduce the sample size, which in turn would increase the error bar on the win rates on the map.
Re the OP, as a fellow data nerd, nice work!
|
On July 26 2020 20:21 Russano wrote: Nice work mate. I love when people do this shit for fun.
Kinda feels like it needs some sort of big overarching conclusion or argument to tie everything together. (Sorry, my experience as a teacher and history research related work drills into me the "yeah, but what's your point, how are you breaking ground?" line of thought). I'd like to see some work done on exactly how certain elements of a map give advantages to different races, e.g. whether the presence of gold bases is an element which heavily impacts ZvP winrates.
I'm always interested to see how data changes based on its filter criteria.
Some random conclusions from the work - These maps look slightly Protoss favored, and they still have kinda sucked a bit. Looks like everyone but Terrans should ban Submarine constantly, and it probably just isn't a good map. Eternal Empire looks like the most balanced of the maps. Deathaura and Ever Dream are comparable but a bit less so. Pillars of Gold and Golden Wall are also potential bans, and are a bit unbalanced. Avoid Ice and Chrome heavily in PvZ.
I was trying to stick to the statistics and stay away from more 'opinion-based' things, which is difficult when analyzing map features. Trying to do regression analysis on different map features is a bit tough, since it's hard to isolate a feature, and there aren't enough maps to get statistically useful information. A gold base on Golden Wall plays very differently than one on Dreamcatcher or Prion Terraces. I did a little of that at one point, and did find that contrary to popular opinion terran performs better on larger maps (mostly due to the existence of very droppable large maps with long rush distances like Disco Bloodbath or Acid Plant), but it was very loosely correlated, so hard to draw conclusions from.
On July 26 2020 20:54 Scarlett` wrote: Great work ~
To be honest Nightshade was a more comfortable map for me back then; so I kinda used the liquipedia stats to try to justify playing it ^^ I had a lot of success with 50~ drone roach attacks on that map; but yes the map layout made it almost impossible to play muta ling bane for example as you couldn't defend the 4th vs 3base 8rax push
In regards to Zen; even if it may seem zerg favoured at first glance, any map where the rush distance is short enough for protoss to block zerg's natural (even down to 15hatch) with a standard 14pylon>scout gives protoss a huge opening advantage, as zerg needs to choose between playing 14hatch before overlord or taking the 3rd as their natural every game (otherwise you can gamble on such openings as gas>pool or 12pool if you think the protoss doesnt know how to respond safely)..
Also I believe at the start of last season the natural was adjusted slightly so that both sides of the entrance wall could be powered by the same pylon, allowing the protoss to hide tech with 2nd pylon in main again
That's interesting to hear. There's always an element of strategy or personal preference to vetoes that doesn't necessarily reflect the data.
On July 27 2020 00:26 DuckS wrote: This was an interesting read. Do you believe that there is a selection bias? Each tournament has their own map veto process, so players who are competing are going to veto their lowest % map if they can, which will bias our sample selection and results. I don't think that that's a good or a bad thing, but I am not sold on the data telling us about map balance or "relative map balance."
Yes, there are a few biases at work. The veto process especially affects heavily vetoed maps, where players only choose to play the really bad maps if they have something special prepared (e.g. Year Zero was heavily vetoed by Terran in TvP, but when they did play it they often went for something offbeat like mech). And of course it reduces the number of data points.
Additionally, there is some selection bias in what results get recorded in Liquipedia. For example, there's a Liquipedia editor who is a big fan of Heromarine, so Liquipedia has better data on his map results in the ESL Weekly cup than on those of other players. And sometimes some players play so many online events that their results are disproportionately represented in Liquipedia over other players of their race. At one point Zest and MaNa alone accounted for 1/8 of the PvZs played on Nightshade, and they were both so successful on that map that the two of them swung the winrate 5%.
These biases are certainly things to keep in mind that do influence the results somewhat, but they still do reflect the strength of the race on that map so it's not a problem per se. The bigger issue to my mind is still always the relatively small sample sizes for some of the maps, which means there's a lot of uncertainty.
|
I think you could do some basic things like win rate as a function of rush distance, or something more complex like main to third distance, or possibly even splitting maps in categories like in TLMC.
In order to do it correctly ideally you would need multiple map pools played on the same balance patch though. Alternatively, you could try to do so for a long period of time (e.g. all of LotV) and then also for sub-sets played on the same balance patch, to have a feel for how the patches changed how races approach maps.
Actually, that sounds fun. If you want to do that project PM me and I'd be happy to help
|
On July 27 2020 04:39 Teoita wrote:I think you could do some basic things like win rate as a function of rush distance, or something more complex like main to third distance, or possibly even splitting maps in categories like in TLMC. In order to do it correctly ideally you would need multiple map pools played on the same balance patch though. Alternatively, you could try to do so for a long period of time (e.g. all of LotV) and then also for sub-sets played on the same balance patch, to have a feel for how the patches changed how races approach maps. Actually, that sounds fun. If you want to do that project PM me and I'd be happy to help 
I found some older graphs I did for rush distance (just using the raw numbers from Liquipedia without any processing), and it wasn't very meaningful. The difficulty for more complex map features is data collection, since it would probably need to involve collecting the data manually, so I'm not planning on doing anything of that sort right now. But I'll be sure to PM you if I do.
![[image loading]](https://imgur.com/Z8E7PYA.jpg)
![[image loading]](https://imgur.com/WApF21d.jpg)
![[image loading]](https://imgur.com/wWiFRdP.jpg)
Incidentally if anyone wants the raw data or the python scripts that I used to gather and analyze the data just message me. I do update this spreadsheet with winrates at the end of seasons: LotV Map Winrates. Haven't updated it with rating adjusted winrates yet.
|
Lol, those are some astrophysics level "correlations". On a serious note, no correlation is still an interesting result though - it shows that rush distance does not impact win rate (contrary to popular belief that e.g. Zerg needs to be the reactive/defensive race only a-la WoL).
Anyway yeah, individual features are too messy which is why I suggested broad categories instead (e.g. "rush maps", "gold maps", "macro maps" etc). It still gives an incomplete picture, but it's the best one can do and it's an interesting step regardless.
|
Great work. Thanks for posting it.
|
A very good read. I hope we can see this kind of thread more.
I would like to see the confidence interval of the analysis, because you know, it is how I could tell if the data is statistically significant or not.
Also, it is interesting to see that for the 3 maps that have not been rotated out, and with a Terran buff and Zerg nerf patch, 2 out of 3 maps have an increase in Zerg win rate (although whether it is statistically significant is not known).
For reference, TvZ win rate on Golden Wall goes from 56 to 56.7, on Ever Dream from 58 to 55.4, and on Eternal Empire from 56.5 to 55.5.
|
On July 27 2020 05:10 Teoita wrote: Lol, those are some astrophysics level "correlations". On a serious note, no correlation is still an interesting result though - it shows that rush distance does not impact win rate (contrary to popular belief that e.g. Zerg needs to be the reactive/defensive race only a-la WoL).
Anyway yeah, individual features are too messy which is why I suggested broad categories instead (e.g. "rush maps", "gold maps", "macro maps" etc). It still gives an incomplete picture, but it's the best one can do and it's an interesting step regardless.
Well you're not usually reacting to a move out. You don't start building units to defend a 2-1-1 when you see 2 medivacs fly out of the base - you'd lose.
You scout the 2-1-1 way earlier at like 3:30 and start building lings at 4:30, and it doesn't move out until like 4:45. It's still reactive, it just has nothing to do with maps
|
On July 27 2020 20:59 InfCereal wrote: On July 27 2020 05:10 Teoita wrote: Lol, those are some astrophysics level "correlations". On a serious note, no correlation is still an interesting result though - it shows that rush distance does not impact win rate (contrary to popular belief that e.g. Zerg needs to be the reactive/defensive race only a-la WoL).
Anyway yeah, individual features are too messy which is why I suggested broad categories instead (e.g. "rush maps", "gold maps", "macro maps" etc). It still gives an incomplete picture, but it's the best one can do and it's an interesting step regardless. Well you're not usually reacting to a move out. You don't start building units to defend a 2-1-1 when you see 2 medivacs fly out of the base - you'd lose. You scout the 2-1-1 way earlier at like 3:30 and start building lings at 4:30, and it doesn't move out until like 4:45. It's still reactive, it just has nothing to do with maps
This is only true to some extent tbh. A shorter rush distance also affects how quickly you can get your units in position, building static defense if necessary, cutting off reinforcements, etc. There's a reason why holding a natural on Steppes of War was impossible.
Also in general, the pace of the game in LotV is such that often times you can only scout the moveout, depending on the matchup and build.
|