Two years ago we examined a data sample of 8 million Brood War replays from repmastered. If you're seeing this for the first time, I highly recommend reading the older post first.
The data sample has grown to 70 million replays, so it seemed like a good time to revisit. There are two parts to this. The first part is similar to what we looked at last time, i.e. how win rates for specific match-ups develop during the game, just with more data and split by map. This is now possible due to the much bigger data sample.
The second part is new: we'll take a look at how balanced spawn locations are, both for mirror and for non-mirror match-ups.
Did you ever wonder what the best spawn is as Terran in TvZ on Fighting Spirit? Did you ever feel like your timings are much crisper when you spawn at 12 o'clock as Zerg on Dominator? Is there a difference between vertical and horizontal entrances at the natural expansion for Protoss in PvZ? Could it even be that some specific spawn locations are so much better that an otherwise disadvantaged matchup becomes advantageous? Let's find out.
Data
The dataset comprises roughly 28 million 1v1 ladder games played since the start of 2018, but by far the biggest part came in during the last 2 years. This is perfect, since we should care much more about recent games on new ladder maps than about what happened long in the past.
The dataset doesn't contain the complete information from a replay, only some extracted data. Build order or income details are not available.
The dataset did include:
- Player races
- Game winner
- Game duration
- Spawn locations
- Player MMR
To refine the dataset, a few filters were applied:
- Game duration > 2 minutes
- Exclude draws
- Exclude games with afk players
- Exclude games on fastest maps and similar
- Exclude games on maps with fewer than 60,000 games (lowered from 100,000 to include new ladder maps such as Roaring Currents)
See below for a map frequency histogram. The map whose name shows up as rectangles is Fighting Spirit (FS1.3 and FS1.4), since it's the only map with a Korean name.
![[image loading]](https://i.ibb.co/cS60p7VN/mapshist.png)
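For anyone who wants to reproduce the selection, here is a minimal sketch of the filters in Python. The field names (`duration_sec`, `winner`, `has_afk_player`, `is_fastest_map`, `map`) are made up for illustration and are not the actual column names of the repmastered data. It also assumes the per-map game count is taken after the other filters, which may differ from what was actually done.

```python
from collections import Counter

MIN_DURATION_SEC = 120      # "Game duration > 2 minutes"
MIN_GAMES_PER_MAP = 60_000  # map threshold from the filter list


def filter_games(games, min_games_per_map=MIN_GAMES_PER_MAP):
    """Apply the listed filters to an iterable of game dicts.

    All dict keys used here are hypothetical field names.
    """
    kept = [
        g for g in games
        if g["duration_sec"] > MIN_DURATION_SEC
        and g["winner"] is not None       # assumption: None marks a draw
        and not g["has_afk_player"]
        and not g["is_fastest_map"]       # "fastest maps and similar"
    ]
    # Assumption: maps are counted after the per-game filters above.
    games_per_map = Counter(g["map"] for g in kept)
    return [g for g in kept if games_per_map[g["map"]] >= min_games_per_map]
```

The threshold is a parameter so the same function can be tried on a small sample, e.g. `filter_games(sample, min_games_per_map=2)`.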
Player population
Here we see how the ladder MMR is distributed between the players of different races. The vertical lines represent the mean of the distributions.
In the raw counts:
![[image loading]](https://i.ibb.co/c74M6BW/mmr-hist-races-counts.png)
We can also look at this after normalizing each of the distributions:
![[image loading]](https://i.ibb.co/svZygXXX/mmr-hist-races-normalized.png)
Part 1, Win rates vs. game time
These distributions were created in the same way as described here. I think the write-up and the discussion in the comments in the old thread on what we see in the plots was quite good, so I recommend reading that.
Just to have a short explanation: the colored data points with the colored line tell us how the win rate changes during the match; the relevant legend is on the left-hand side. Additionally we have a dotted, straight green line at exactly 50% win rate, which makes it easier for us to see where the "perfect balance" is.
In the background, we have blue bars superimposed, which tell us how many games ended in a given 1-minute interval. The relevant legend is on the right side. On top of the blue bars, we have percentage numbers. These tell us which fraction of all games ended before a given point in time. For example, in the PvT graph, we can see a number of 35% above the blue bar for the 10-11 minute interval. This means that ~35% of all PvT games end before 11 minutes. Note that these percentage numbers go towards 100% as the time increases, which makes sense, since all games have to end at some point, and almost all games end somewhere before 40 minutes.
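To make the three quantities in these plots concrete, here is a rough Python sketch. The exact method is described in the older post; this version simply assumes that the win rate in a bin is taken over the games that ended in that minute. The input format and the `max_minute` overflow bin are made up for illustration.

```python
from collections import defaultdict


def win_rate_by_minute(games, max_minute=40):
    """games: iterable of (duration_sec, first_race_won) for one match-up.

    Returns one row per minute bin:
    (minute, games_ended, win_rate or None, cumulative_fraction_ended).
    Games longer than max_minute are put into the last bin.
    """
    counts = defaultdict(int)
    wins = defaultdict(int)
    total = 0
    for duration_sec, first_race_won in games:
        minute = min(int(duration_sec // 60), max_minute)
        counts[minute] += 1
        wins[minute] += 1 if first_race_won else 0
        total += 1

    rows = []
    ended_so_far = 0
    for minute in range(max_minute + 1):
        n = counts[minute]
        ended_so_far += n                      # height of all blue bars so far
        rate = wins[minute] / n if n else None # colored data point
        fraction = ended_so_far / total if total else 0.0  # number on top of the bar
        rows.append((minute, n, rate, fraction))
    return rows
```

In this sketch the row for minute 10 covers the 10-11 minute interval, and its last entry is the share of games that ended before 11 minutes.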
PvT, overall
![[image loading]](https://i.ibb.co/SDbSRzj9/racial-win-rate-vs-time-pvt.png)
PvT by MMR brackets
![[image loading]](https://i.ibb.co/fsGBZJR/racial-win-rate-vs-time-pvt-selectioncomparison-MMR-Groups.png)
PvT on 4-player maps, cross- vs close spawn
![[image loading]](https://i.ibb.co/mVwv6dkM/racial-win-rate-vs-time-pvt-selectioncomparison-4p-Close-Cross.png)
PvZ, overall
![[image loading]](https://i.ibb.co/yFRZr7r7/racial-win-rate-vs-time-pvz.png)
PvZ by MMR brackets
![[image loading]](https://i.ibb.co/p6Z0c4sZ/racial-win-rate-vs-time-pvz-selectioncomparison-MMR-Groups.png)
PvZ on 4-player maps, cross- vs close spawn
![[image loading]](https://i.ibb.co/xqyXLKRb/racial-win-rate-vs-time-pvz-selectioncomparison-4p-Close-Cross.png)
TvZ, overall
![[image loading]](https://i.ibb.co/wjMsZmy/racial-win-rate-vs-time-tvz.png)
TvZ by MMR brackets
![[image loading]](https://i.ibb.co/67t96Lr1/racial-win-rate-vs-time-tvz-selectioncomparison-MMR-Groups.png)
TvZ on 4-player maps, cross- vs close spawn
![[image loading]](https://i.ibb.co/8Ds112VQ/racial-win-rate-vs-time-tvz-selectioncomparison-4p-Close-Cross.png)
You can find a lot more data on a per-map basis here.
When looking at these win rates vs time on a per-map basis, there's quite a bit of variation. While most maps tend to follow the very same trend as we see on the average for all maps, the deviations are actually quite significant. Especially when we compare two maps to each other, we can tell that specific builds or timings differ in their relative strength depending on the map. For example consider
TvZ on Eclipse and Neo Dark Origin, both 2-player-maps:
![[image loading]](https://i.ibb.co/21fckKmf/racial-win-rate-vs-time-tvz-Eclipse-1-2.png)
![[image loading]](https://i.ibb.co/B2hTrvMW/racial-win-rate-vs-time-tvz-Neo-Dark-Origin-2-0.png)
To me, the most striking difference is the mid-game strength of Terran. While we see that Terran dominates the match-up as usual at ~9-16 minutes, this is much more pronounced on Eclipse, in agreement with the consensus that Eclipse is a particularly hard map for Zergs to get into a late-game 4gas situation.
Now if we go into specifics and smaller details, we can consider that the rush distance on Eclipse is a bit longer compared to most other 2-player-maps, so things like 2-Rax sunken busts are a bit weaker.
We can see this at the peak around 5-6-7 minutes, where the win rate shoots up for Terran. Note the peak is both lower and later in the case of Eclipse: the sunken bust hits later and is less potent. On the other hand, when we look at what happens around 7-8 minutes, we see a dip caused by Mutalisk timings. We can instantly tell that, whatever the reason, Mutalisks hit much harder on Neo Dark Origin, compared to Eclipse.
I think generally this is the most important takeaway and how this data can be used. You can consult these plots to see whether a particular build is comparably stronger or weaker on a specific map.
Part 2, Win rates by spawn location
Now, to the new part. In the following, we'll be looking at win rates for a specific spawn, on a given map, in a given matchup.
So each of the plots will have exactly that information: map, matchup, and how the spawn locations differ between each other.
Consider the example of ZvZ on Dominator:
![[image loading]](https://i.ibb.co/W4QcFnr2/spawn-winrate-radial-3p-ALLplayers-zvz-Dominator-1-2.png)
We can see that 12 o'clock is by far the best spawn location. The red-green color gradient at the bottom tells us whether a spawn is at an advantage or at a disadvantage. For mirror match-ups this is the easiest case, since the expected win rate everywhere should be 50%. So a deviation below that will turn red, and anything above becomes green. At the very center, close to 50%, the color scheme becomes white, which means that the more color you see on these plots, the more overall imbalance there is. If you see a plot with very little color, this means that all spawn locations are very similar to each other.
Next, let's consider another example, TvP on Polypoid.
![[image loading]](https://i.ibb.co/VWBLfWf5/spawn-winrate-4p-MMR-above2100-tvp-Polypoid-1-75.png)
The only difference here is that we no longer use the 50% as benchmark for a "balanced" spawn. Instead, the benchmark is the overall win rate (for this matchup) for this map. For example, Terran has an overall win rate of 51.1% in TvP on this map, as given in the central text box. Thus to evaluate the spawn, we consider the difference to this 51.1% to figure out what the best/worst spawns are.
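The benchmark logic can be written down in a few lines. This is only a sketch under my own assumptions about the input (one `(spawn, won)` pair per game, from the perspective of the race in question, for one map and one matchup); the function name and the `mirror` flag are made up for illustration.

```python
from collections import defaultdict


def spawn_deviation(games, mirror=False):
    """games: iterable of (spawn, won) for one race, one map, one matchup.

    Returns (baseline, {spawn: (win_rate, win_rate - baseline)}).
    For mirror matchups the baseline is fixed at 50%; otherwise it is
    the overall win rate of this race on this map in this matchup.
    """
    counts = defaultdict(int)
    wins = defaultdict(int)
    for spawn, won in games:
        counts[spawn] += 1
        wins[spawn] += 1 if won else 0

    total = sum(counts.values())
    if total == 0:
        return (0.5 if mirror else None), {}

    baseline = 0.5 if mirror else sum(wins.values()) / total
    result = {}
    for spawn, n in counts.items():
        rate = wins[spawn] / n
        result[spawn] = (rate, rate - baseline)  # positive = green, negative = red
    return baseline, result
```

With a 51.1% baseline, a spawn where Terran wins 55% of the games would come out at roughly +3.9 percentage points, which is the number the color scale encodes.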
Another difference in this particular example is that an MMR bracket was selected, as shown in the info box. I'm using the same definition for this as last time.
The match MMR is determined by taking the average MMR of the two players (initial MMR values before the match was played). Several brackets of games are defined and compared. Additionally, a limit is imposed on the maximum MMR difference between the two players, so that games with too large a skill gap are left out.
I generally wanted to see how this imbalance for spawn locations behaves as we examine players of different strength. Currently, there are three brackets:
- all players
- match MMR > 1900
- match MMR > 2100
If deemed relevant, we can adjust them, but it seemed sufficient so far.
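As a sketch, the bracket assignment could look like this in Python. The value of `max_diff` is a placeholder, since the actual limit on the MMR difference is not stated here, and applying that limit to the "all players" bracket as well is an assumption on my side.

```python
def match_mmr(mmr_a, mmr_b):
    """Average of the two players' MMR before the match."""
    return (mmr_a + mmr_b) / 2


def brackets_for_game(mmr_a, mmr_b, max_diff=200):
    """Return the names of all brackets a game falls into.

    max_diff is a hypothetical value for the maximum allowed
    MMR difference between the two players.
    """
    if abs(mmr_a - mmr_b) > max_diff:
        return []  # skill gap too large, game is not used
    avg = match_mmr(mmr_a, mmr_b)
    names = ["all players"]
    if avg > 1900:
        names.append("match MMR > 1900")
    if avg > 2100:
        names.append("match MMR > 2100")
    return names
```

Note that the brackets are nested: a game with match MMR above 2100 also counts for the other two brackets.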
Lastly, note that while previously it was sufficient to e.g. look at a graph for PvT to understand both the perspective of the Terran and the perspective of the Protoss, for these map-, matchup- and spawn-specific plots we must consider the two perspectives independently. So we end up with 9 perspectives which need to be considered: zvz, tvt, pvp, pvz, pvt, tvz, tvp, zvp, zvt.
Now we can dive into the actual results - let me start by saying that I was very surprised by what I saw.
For a given map, more often than not, the spawn location actually seems to be the most decisive factor. I expected the differences to be maybe on the order of a few percent, but the actual differences are often above 5% and sometimes even at 10%. Isn't that crazy? Two years ago, we also examined how matchup win rates depend on cross-spawn vs close-spawn on 4-player maps, and there is a significant and measurable effect. However, pretty often it is completely dwarfed by the influence of the specific spawn.
If you think about how many of the recent balance discussions here on ZvP centered on matchup differences of like 48-52%, but it's actually more like
"yeah so for PvZ, as Protoss on Fighting Spirit, you just need to spawn top right and you are miles ahead"
and this usually persists throughout the MMR brackets or becomes even more severe. Just look at this:
![[image loading]](https://i.ibb.co/WNMNTt9w/spawn-winrate-4p-ALLplayers-pvz-1-3.png)
![[image loading]](https://i.ibb.co/39SKSrsP/spawn-winrate-4p-MMR-above1900-pvz-1-3.png)
![[image loading]](https://i.ibb.co/v9nZ6bP/spawn-winrate-4p-MMR-above2100-pvz-1-3.png)
What the heck? When I first saw some of these, my first thought was that something might be wrong with the data or my methods, so I started checking things to confirm everything works as expected and I didn't mess up. I've arrived at a few cases where it should be easy to tell what is supposed to happen, and it does happen every single time. Thus I'm inclined to believe what we see in the data is overall correct. Let me show you these cases.
case 1, FS1.3 -> FS1.4
Luckily, there are a few maps in the pool where just tiny changes were applied to specific spawns. For example, when we went from FS1.3 to FS1.4, the one relevant change was the layout of the minerals on the left-side spawns. It had become known that the right-side spawns were mining faster, so the left side was switched to "L"-shaped mineral layouts. This difference in mining rates is most relevant for Zerg, since they mine with the fewest drones.
FS1.3, ZvZ
![[image loading]](https://i.ibb.co/KxXSBgyg/spawn-winrate-4p-ALLplayers-zvz-1-3.png)
![[image loading]](https://i.ibb.co/bgbbTrmg/spawn-winrate-4p-MMR-above2100-zvz-1-3.png)
FS1.4, ZvZ
![[image loading]](https://i.ibb.co/R4pLMPBV/spawn-winrate-4p-ALLplayers-zvz-1-4.png)
![[image loading]](https://i.ibb.co/K1YDfMQ/spawn-winrate-4p-MMR-above2100-zvz-1-4.png)
As expected, in each of the brackets we see the advantage of the right-side spawns shrink when we compare FS1.4 to FS1.3. The same is true for the other Zerg match-ups when going from 1.3 to 1.4, where the left-side spawns become a bit better. There are some fluctuations, but the trend goes in the right direction.
case 2, FS top right spawn Protoss
I actually didn't know about this, so I was very glad when eon mentioned that top right spawn is known to be advantageous for protoss. These advantages in mining should be most relevant in the mirror matchups, and this is what we see.
PvP on FS1.3
![[image loading]](https://i.ibb.co/qM5n1dMy/spawn-winrate-4p-MMR-above2100-pvp-1-3.png)
PvZ on FS1.3
![[image loading]](https://i.ibb.co/v9nZ6bP/spawn-winrate-4p-MMR-above2100-pvz-1-3.png)
Protoss spawns top right on FS = big peepee.
It looks like the change to 1.4 partially improved this in PvP, where now top left is also a good spawn.
![[image loading]](https://i.ibb.co/NnsLSxrK/spawn-winrate-4p-MMR-above2100-pvp-1-4.png)
![[image loading]](https://i.ibb.co/CpP9hv0t/spawn-winrate-4p-MMR-above2100-pvt-1-3.png)
![[image loading]](https://i.ibb.co/4Z317F3y/spawn-winrate-4p-MMR-above2100-pvt-1-4.png)
So there is plenty of very match-up specific stuff that I don't quite understand, but also more variables at play. In the mirror match-up, everything seems as expected.
case 3, 12 o'clock spawn Zerg on Dominator
This spawn is known for very quick mining for Zergs, also confirmed by the data at hand.
What now?
First, here's the rest of the data on spawn-specific stats. It's grouped by
matchup -> map
and then there are 3 plots each, one per MMR bracket. Feel free to re-upload them somewhere else to be able to repost or reference them here.
There are hundreds of plots overall, have fun browsing them. I'd recommend you look for the match-ups you play yourself and maps that you like in particular, and try to see whether you find anything interesting or noteworthy.
There might be better ways to structure this and present it, I'm looking for input there as well. Also if you have any specific sanity tests or cross-checks that could be applied to make sure everything works as intended, please comment. I should add that the results did not pass all my vibe checks. For example, on FS, the bottom left spawn is supposed to be particularly bad in TvZ, which we do not see in the data. Now, whether this is because something is wrong with the methods; or, because every Terran knows that this spawn is supposed to be worse for them, and they looked up how to best place turrets; or, this spot never was bad at all; I don't know.
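On the topic of cross-checks, one generic option is to see whether a spawn's win rate can be told apart from its benchmark given the number of games behind it. Below is a minimal sketch using the Wilson score interval for a binomial proportion. This is a standard textbook formula and not something taken from the plots above; the default `z=1.96` corresponds to roughly 95% confidence.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a win rate of wins/games.

    Returns (lower, upper). z=1.96 is roughly a 95% interval.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1 + z2 / games
    center = (p + z2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z2 / (4 * games * games)) / denom
    return center - half_width, center + half_width
```

For example, a 55% win rate over 1,000 games gives an interval that stays above 50%, while the same 55% over only 100 games does not. If the benchmark lies inside the interval, the spawn's deviation could just be noise.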
Regarding what all of this means for the game, I'm not sure either. Most often, there seem to be several variables involved at the same time, with overlapping effects.
- Specific spawns seem optimal for one race, but might be not optimal at all for a different race
- Some maps have wildly imbalanced spawns, while others are pretty balanced
- Quite often, the variance between specific spots is higher than the difference in the win rates for the matchup as a whole
I didn't want to theorize too much on what the underlying causes could be, e.g. whether it is more of a left-side vs right-side spawn bias, or about top side vs bottom side, or something specific to horizontal vs vertical entrances that is only relevant in PvZ.
But it seems like a vertical entrance at the natural expansion is MUCH better for Protoss in PvZ compared to a horizontal entrance. It's a mostly consistent thing.
Anyway, thanks again to repmastered for the data, and everyone consider donating a dollar for the free services we get. The stuff you see here is based on ~28 million ladder replays (out of the 70 million replays on repmastered, which include team games, fastest games, etc).
