The data sample has grown to 70 million replays, so it seemed like a good time to revisit the topic. There are two parts to this. The first part is similar to what we looked at last time, i.e. how win rates for specific matchups develop over the course of a game, just with more data and split by map, which is now possible thanks to the much bigger data sample.
The second part is new: we'll take a look at how balanced spawn locations are, both for mirror and for non-mirror matchups.
Did you ever wonder what the best spawn is as Terran in TvZ on Fighting Spirit? Did you ever feel like your timings are much crisper when you spawn at 12 o'clock with Zerg on Dominator? Is there a difference between vertical and horizontal entrances at the natural expansion for Protoss in PvZ? Could it even be that some specific spawn locations are so much better that an otherwise disadvantaged matchup becomes advantageous? Let's find out.
Data
The dataset comprises roughly 70 million 1v1 games played since the start of 2018, though by far the biggest part came in during the last 2 years. This is perfect, since we should care much more about recent games on new ladder maps than about what happened long in the past.
tbh the "70 million" is a lie at this very moment, because I only used a data set of ~21 million replays (up to February 2025), but it should be updated to the full data set in the next few days. 70 million are available at repmastered, so this is what we're going with for the thread title.

The dataset doesn't contain the complete information from each replay, only some extracted data. Build order and income details are not available.
The dataset does include:
- Player races
- Game winner
- Game duration
- Spawn locations
- Player MMR
To refine the dataset, a few filters were applied:
- Game duration > 2 minutes
- Exclude draws
- Exclude games with AFK players
- Exclude games on fastest maps and similar
- Exclude games on maps with fewer than 100,000 games
See below for a map frequency histogram. The map whose name shows up as rectangles is FS1.3 and FS1.4, since it's the only map with a Korean name.
![[image loading]](https://files.catbox.moe/5yjhq5.png)
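For anyone who wants to reproduce the filtering, here is a minimal sketch of the filters listed above. The record layout (`map`, `duration_s`, `winner`, `afk`) and the `excluded_maps` set are made up for illustration, since the post doesn't give the dataset's real field names, and it is only an assumption that the per-map game count is taken after the other filters.

```python
from collections import Counter

MIN_DURATION_S = 120          # "Game duration > 2 minutes"
MIN_GAMES_PER_MAP = 100_000   # maps with fewer games are dropped

def apply_filters(games, min_games_per_map=MIN_GAMES_PER_MAP,
                  excluded_maps=frozenset()):
    """Apply the listed filters to a list of game records (hypothetical layout)."""
    kept = [
        g for g in games
        if g["duration_s"] > MIN_DURATION_S
        and g["winner"] is not None          # assumption: None marks a draw
        and not g["afk"]                     # assumption: precomputed AFK flag
        and g["map"] not in excluded_maps    # e.g. fastest maps and similar
    ]
    # Assumption: maps are counted after the per-game filters above.
    per_map = Counter(g["map"] for g in kept)
    return [g for g in kept if per_map[g["map"]] >= min_games_per_map]

# Tiny made-up sample; the threshold is lowered so the example is readable.
sample = [
    {"map": "FS", "duration_s": 600, "winner": 0, "afk": False},
    {"map": "FS", "duration_s": 90, "winner": 1, "afk": False},       # too short
    {"map": "FS", "duration_s": 700, "winner": None, "afk": False},   # draw
    {"map": "FS", "duration_s": 800, "winner": 1, "afk": True},       # AFK
    {"map": "FS", "duration_s": 900, "winner": 1, "afk": False},
    {"map": "Fastest", "duration_s": 900, "winner": 1, "afk": False}, # excluded map
    {"map": "Rare", "duration_s": 900, "winner": 0, "afk": False},    # too few games
]
filtered = apply_filters(sample, min_games_per_map=2, excluded_maps={"Fastest"})
```

With the lowered threshold, only the two regular games on the made-up "FS" map survive.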
Player population
Here we see how the ladder MMR is distributed among the players of each race. The vertical lines represent the means of the distributions.
In raw counts:
![[image loading]](https://files.catbox.moe/o946a8.png)
We can also look at this after normalizing each of the distributions:
![[image loading]](https://files.catbox.moe/f3y07q.png)
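In case it's unclear what the normalization does, here is a rough sketch. It assumes each race's histogram is simply scaled so that its bins sum to 1 (the post doesn't say which normalization was used), and the MMR values and the bin width of 100 are made up.

```python
from collections import Counter

def mmr_histogram(mmrs, bin_width=100):
    """Count players per MMR bin, keyed by the lower edge of the bin."""
    return Counter((m // bin_width) * bin_width for m in mmrs)

def normalize(hist):
    """Scale counts so they sum to 1, so races with different player counts compare directly."""
    total = sum(hist.values())
    return {edge: count / total for edge, count in sorted(hist.items())}

# Made-up MMR values, for illustration only.
mmr_by_race = {
    "P": [1450, 1520, 1580, 1710, 1890],
    "T": [1500, 1620, 1650],
    "Z": [1380, 1540, 1560, 2010],
}
normalized = {race: normalize(mmr_histogram(v)) for race, v in mmr_by_race.items()}
# The vertical lines in the plots correspond to a per-race mean like this one.
means = {race: sum(v) / len(v) for race, v in mmr_by_race.items()}
```

After normalizing, the shape of each race's distribution can be compared even if one race has far more players than another.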
Part 1, Win rates vs. game time
These distributions were created in the same way as described here.
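The linked description is the one to trust for the actual method. Purely as a generic illustration of the idea, the sketch below bins games by their duration and computes one race's win rate among the games that ended in each bin. The `(duration_s, winner_race)` tuple layout, the bin width and the sample games are all made up.

```python
from collections import defaultdict

def win_rate_by_duration(games, race, bin_minutes=2):
    """Win rate of `race` among games that ended inside each duration bin.

    `games` is an iterable of (duration_s, winner_race) tuples (hypothetical layout).
    Returns {bin start in minutes: win rate}.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for duration_s, winner_race in games:
        bin_start = int(duration_s // (bin_minutes * 60)) * bin_minutes
        totals[bin_start] += 1
        wins[bin_start] += int(winner_race == race)
    return {b: wins[b] / totals[b] for b in sorted(totals)}

# Made-up PvT games: (duration in seconds, race of the winner).
pvt_games = [(250, "P"), (300, "T"), (410, "P"), (700, "T"), (710, "T"), (900, "P")]
rates = win_rate_by_duration(pvt_games, "P", bin_minutes=5)
```

On real data you'd also want to keep the game count per bin around, since bins with very few games give noisy win rates.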
PvT, overall
![[image loading]](https://files.catbox.moe/txesgg.png )
PvT by MMR brackets
![[image loading]](https://files.catbox.moe/199rmg.png )
PvT on 4-player maps, cross- vs close spawn
![[image loading]](https://files.catbox.moe/sbi7ds.png )
PvZ, overall
![[image loading]](https://files.catbox.moe/564fwj.png )
PvZ by MMR brackets
![[image loading]](https://files.catbox.moe/lpgs34.png)
PvZ on 4-player maps, cross- vs close spawn
![[image loading]](https://files.catbox.moe/uykzdk.png )
TvZ, overall
![[image loading]](https://files.catbox.moe/chsm66.png )
TvZ by MMR brackets
![[image loading]](https://files.catbox.moe/b1gyed.png )
TvZ on 4-player maps, cross- vs close spawn
![[image loading]](https://files.catbox.moe/0jpnqf.png )
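For the cross- vs close-spawn split on 4-player maps, one possible way to classify a game is by clock position. This is only a sketch under the assumption that the four spawns sit on a symmetric layout (e.g. 1/5/7/11 or 12/3/6/9), so that cross spawns are exactly 6 hours apart; the post doesn't say how the split was actually computed, and `spawn_relation` is a made-up helper name.

```python
def spawn_relation(hour_a, hour_b):
    """Classify a 1v1 spawn pair on a 4-player map by clock positions (1-12).

    Assumption: cross spawns are exactly opposite on the clock (6 hours apart);
    every other pair counts as close.
    """
    if hour_a == hour_b:
        raise ValueError("both players cannot share a spawn location")
    diff = abs(hour_a - hour_b) % 12
    diff = min(diff, 12 - diff)  # shortest way around the clock
    return "cross" if diff == 6 else "close"

# Corner layout with spawns at 1, 5, 7 and 11 o'clock.
relations = {
    (1, 7): spawn_relation(1, 7),
    (1, 5): spawn_relation(1, 5),
    (1, 11): spawn_relation(1, 11),
    (5, 11): spawn_relation(5, 11),
}
```

Maps with asymmetric layouts would need a per-map lookup table instead of this rule.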
You can find a lot more data on a per-map basis here. Not sure what the best way to share this is, but Google Drive makes it easiest for me to update.
Part 2, Win rates by spawn location
tbc...
teaser:
![[image loading]](https://files.catbox.moe/zuudbn.png)
![[image loading]](https://files.catbox.moe/dgj924.png)