Statistics behind map balance

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:52 GMT

We've had significant discussions over the course of this year about how certain maps give some races distinct advantages over others, and statistics have been thrown around left and right to reinforce points on either side. Sometimes we have seen no changes; other times we have seen developers change considerable aspects of a map to compensate. My question is: have these changes worked?

Specifically I want to look at 5 maps that have been at the forefront of balance for significant periods of their lifespan in SC2: Bel'Shir Beach, Tal’Darim Altar, Antiga Shipyard, Lost/Shattered Temple, and Shakuras Plateau. Hopefully, having looked through the statistics linked to these maps, we can reach a reasonable conclusion about map fixes and their effects on the win-rates of races.

A few notes before we start though:

- All data written in this post is taken from TLPD. I know the site is run by a few dedicated guys who update it as much as they can, so some data may be missing at the time of writing this. However, to the best of my knowledge, this data is as accurate as possible at this moment. And by that, I mean at the end of GSL November’s Code S Ro8 Day 2 (specific, I know)
- Data on map updates is taken from Liquipedia. Again, a few dedicated guys update the data there, and it is accurate at the time of writing.
- For convenience, we are only looking at win-rates based on versions of each map. While there are outside factors, discussing game patches, player ability, location, fatigue etc. would make this outrageously complicated.
- A lot of these maps have had so few games played on them that it is difficult to make an accurate percentage comparison between changes. However, the fact that changes happen quite frequently means that one has no choice but to take the statistics as they are.
- Don't turn this into a race balance debate please.

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:52 GMT

First up,

Bel'Shir Beach

Original:

1.1

- Added a fourth base along the top and the bottom of the map
- Added Xel'Naga watchtowers in the middle
- Added rocks blocking the base of the gold expansion
- Removed pathway between the natural and third

1.2

- Removed high-ground at the choke between main and natural
- Removed smaller path into natural
- Narrowed path into natural (can now be blocked by three large buildings)
- Enlarged the main slightly

Winter

- Tileset changed to emulate an ice environment.
- Gold expansion changed to normal expansion with 8 mineral patches and 2 geysers. Rocks also removed.

Firstly, the initial results showed that:

TvZ: 6-10 (62.5% ± 15% to Zerg)
ZvP: 6-5 (54.55% ± 20% to Zerg)
PvT: 5-8 (61.54% ± 17% to Terran)

Eurgh…the statistical errors are disgusting for small samples. There isn’t a significant indicator of imbalance in the data since the sample size is so small. Swings this small can easily be caused by a favoured player being sent out. Please don’t forget that Bel’Shir Beach came into play with the GSTL, so it was regularly used as a snipe-map.
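As a rough illustration of why small samples give such wide error bars, here is a minimal sketch that computes a Wilson score interval for a win rate. This is not necessarily how the ± figures in this post were produced (the method and confidence level are not stated); the function name and the choice of z=1.96 (about 95%) are assumptions for the example.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Return (low, high) Wilson score interval for a win rate.

    z=1.96 corresponds to roughly 95% confidence. That level is an
    assumption; the post does not say which level its +/- values use.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    centre = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return centre - half, centre + half

# Zerg's record in TvZ on the original Bel'Shir Beach: 10 wins in 16 games.
low, high = wilson_interval(10, 16)
print(f"Zerg win rate: {10 / 16:.1%}, interval {low:.1%} to {high:.1%}")
```

With only 16 games the interval comfortably includes 50%, which is the point being made above: a 62.5% win rate on this sample cannot be told apart from an even match-up.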

The changes in 1.1 were made due to the nature of base trading on the original, as well as the fact that, due to the number of pathways and attack options, it was apparently Zerg favoured. The 1.1 implementation showed minor improvements in the match-up vs Terran, but damaged the match-up vs Protoss. PvT seemed to have improved slightly. However, not enough games were played before the next version was released:

TvZ: 21-31 (59.62% ± 8.8% to Zerg)
ZvP: 17-10 (62.96% ± 12% to Zerg)
PvT: 17-15 (53.13% ± 12% to Protoss)

1.2 saw more removals of pathways, reductions of the natural's ramp, plus the loss of the high ground at the choke into the main. Arguments stated that Nydus worms were too easy to place on the high ground uncontested, plus the large ramp at the front, coupled with the extra path, made it very difficult to defend. The win ratios, however, showed a different story:

TvZ: 4-5 (55.56% ± 22% to Zerg)
ZvP: 11-3 (78.57% ± 12% to Zerg)
PvT: 8-3 (72.73% ± 16% to Protoss)

So once again, we see a possible improvement for Terran, but a major blow to ZvP balance, as well as early indications of PvT imbalance brewing. However, before we could find out the answer, Bel’Shir Beach Winter was released.

The final version? Removal of the gold minerals was widely considered a Terran nerf, due to the ability to MULE high-yield bases for amazing returns. The effect?

TvZ: 5-2 (71.43% ± 20% to Terran)
ZvP: 2-4 (66.67% ± 24% to Protoss)
PvT: 3-5 (62.50% ± 22% to Terran)

TvX with gold patches: 42.86% ± 5.7% (57-76)
TvX without gold patches: 66.67% ± 15% (10-5)

So far, the exact opposite has happened. However, PvZ has totally reversed. But with only 6 games, it’s way too early to make a decent comparison, but alarm bells may be ringing.
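To put a number on how much weight the TvX comparison can carry, here is a hedged sketch of a two-proportion z-test applied to the Terran records quoted above (57-76 with gold bases, 10-5 without). The function name is made up for illustration, and the pooled normal approximation is only rough when one side has as few as 15 games.

```python
import math

def two_proportion_z(wins_a, games_a, wins_b, games_b):
    """Two-sided z-test for a difference between two win rates.

    Returns (z, p_value). Uses the pooled normal approximation, which is
    only approximate for very small samples.
    """
    p_a = wins_a / games_a
    p_b = wins_b / games_b
    pooled = (wins_a + wins_b) / (games_a + games_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / games_a + 1 / games_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value

# Terran records from the post: 57-76 with gold bases, 10-5 without.
z, p_value = two_proportion_z(57, 133, 10, 15)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumptions the p-value comes out above the usual 0.05 threshold, so the apparent Terran improvement after the gold bases were removed is suggestive at best.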

Conclusions for Bel’Shir Beach? It’s hard to draw any, because the number of games on each version has been small, so useful percentages are not easy to obtain. Early indications have me worried though. The map balancers made significant changes to the map with imperfect data (small samples, speculation on imbalance as opposed to actual proof) and we’re left with another version that seems to have the opposite effect to their intentions, if any. This is a case where I am hoping that the sample used is an error, and not a good representation of results to come.

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:52 GMT

Next,

Tal’Darim Altar


Original

1.0

Image: http://imgur.com/fbbaL

- Removed centre gold bases.
- Removed rocks blocking expansions.
- Removed rocks constricting the entrance to the main and made this choke unbuildable.
- Moved one of the natural geysers away from the cliff edge where it could be sieged from low ground.
- Narrowed the ramp at the third.

LE

- Widened the natural choke to prevent walling with e.g. two barracks and one depot.
- Made the entrance to the main buildable.
- Added an area of low ground between the main and third accessible only by cliff-walking, etc.
- The third now has a full 8×1500 mineral patches and 2×2500 gas geysers and is blocked by destructible rocks.

Please note: While there was a version 1.1, it was never used in competitive play, and was instead updated and used as the Ladder Edition (LE)

Oh om nom NOM sample size! Over 2000 games have been played on this monster of a tournament map! Very few games were played on the original compared to versions 1.0 and LE, due to it being basically a test map to try out the concept. Therefore we will ignore the original and focus on the 2 main versions used in tournaments (can you imagine a Terran with that many high-yield expansions to MULE? Q_Q)

TDA 1.0 had the following ratios:

TvZ: 101-111 (52.36% ± 4.7% to Zerg)
ZvP: 80-88 (52.38% ± 5.3% to Protoss)
PvT: 124-104 (54.39% ± 4.5% to Protoss)

Pretty balanced across the board, with Protoss taking a marginal lead. The theory behind it is that your first 3 bases are quite easily defended, so Protoss can “turtle” more easily and build a powerful force, in the hope of one massive attack. However, again this could easily be down to a few guys having a good day; that is to say, there’s not sufficient evidence to say that the map favours anyone.

Moving forward to the LE version, the third now has more minerals (8x1500 patches, up from 8x750) and gas that lasts longer (2500 per geyser, up from 1250). Adding a wider choke to the natural and the low ground behind the third changed the match-ups a fair bit, due to the higher risk of early expansions (wider chokes are harder to defend with buildings). The result?

TvZ: 244-236 (50.83% ± 3.2% to Terran)
ZvP: 205-220 (51.76% ± 3.3% to Protoss)
PvT: 228-255 (52.80% ± 3.1% to Terran)

Suddenly all three match-ups are slightly closer. TvZ is almost perfectly 50%, and both ZvP and PvT have come closer to the magic 50/50 mark. This looks like they’ve improved it, but the argument is whether it was necessary or not. It hasn’t hurt at least :D

Conclusions? This is an amazing example of minor changes making enough of a difference to bring the balance of the game closer to where it should be. However, one could argue that version 1.0 was close to that mark already. Terran seems to have taken an advantage in TvP, but not a significant enough advantage to declare that more changes are needed. However, once again we cannot forget that there could be statistical errors affecting the original data. People could have just had a good day, and that’s why the numbers were slightly on one side. However, we can conclude that, while it may not be a case of the changes making it better, they have not made it worse.

Now we see that the 2000 games and only one extra version seem totally justified!

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:52 GMT

Now, we have the new kid on the block,

Antiga Shipyard

Original

1.1

1.2

- Neutral supply depot added to prevent wall-ins at the foot of the main ramp
- Gold expansion changed to normal expansion with 8 mineral patches and 2 geysers.

TE

Seriously, I feel like I’m getting trolled when I talk about this map. It’s been, what, 3 months since it came out? And there are already 4 versions? What the….

What’s worse is that I can’t see any reason that 1.0 became 1.1. At all. Anyone want to help me out with a patch note? Because I can’t find any reason they’ve done it. At least 1.2 has the gold bases changed, and the Tournament Edition has the addition of neutral supply depots, as well as it being impossible to siege the main’s gas from the opposing third….but why is there another version?

I’ll save an area if someone can point out the difference between the original and 1.1, and unfortunately there have not been enough games in the Tournament Edition of the map, so that will be left out too. However, 1.2 has some interesting changes that we can observe.

First, let’s look at the stats before 1.2 was introduced (combined results of the original and 1.1 versions)

TvZ: 39-46 (54.12% ± 7.4% to Zerg)
ZvP: 44-30 (59.46% ± 7.4% to Zerg)
PvT: 30-38 (55.88% ± 8.1% to Terran)

As always, a few guys could have had good days. The neutral supply depot was introduced to prevent the 3-pylon block that Zerg players detest so much, plus Terran wall-ins with bunkers that have the same effect. On top of this, the high-yield base was removed: not only was it thought that a Planetary Fortress with decent defence would give a Terran too much control of the centre of the map, but it was also considered almost impossible to prevent a Zerg taking a fast gold base against an expanding Protoss. However, after the changes, we see that:

TvZ: 15-8 (65.22% ± 12% to Terran)
ZvP: 3-4 (57.14% ± 25% to Protoss)
PvT: 3-7 (70.00% ± 17% to Terran)

Firstly, the statistics say pretty much the exact opposite of what people thought in some cases. Despite the problems, there are no indications that Zerg were disfavoured in either match-up. However, the high-yield aspect seems to be reinforced by the stats. Zerg did have an advantage in ZvP, and Terran enjoyed a lead in both matchups.

Now, looking at the change…it almost seems like none of the issues have been addressed. Of course, we need to take these results with a pinch of salt, since the sample size is pretty small, but the initial view is that the changes have made it worse in most of the match-ups. Despite the lack of a gold base, Terran has managed to take a bigger lead in the percentages. ZvP may be fixed, but with such a small sample size, we can’t make a reasonable conclusion.

Conclusion? They really needed to play more games before changing the map like they have, because at this point there is nothing in these statistics to warrant a change. While 1.2 has only been implemented in Korea so far, the fact that only just over 200 non-mirror games had been played before 1.2 was implemented (and that’s with 1.0 and 1.1 merged) means that they should have considered waiting a while longer before implementing these changes.

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:53 GMT

Next, a classic map,

Shakuras Plateau

Original

1.1

2.0

- Backdoor into the main removed.
- 12 and 6 o’clock expansions now accessible by ramps (blocked by destructible rocks) leading down into the middle of the map.
- Inside expansions moved back away from the watch towers.
- Destructible rocks in the centre removed.

Another map with a massive sample size, but unlike Tal’Darim Altar, we’ve had Shakuras Plateau since the beginning of SC2, dating back to the beta. Like certain versions of Antiga Shipyard, there seems to be no difference between versions 1.0 and 1.1 (again, please correct me if I’m wrong) so we will focus on the difference that version 2.0 has made.

A brief look at the results before version 2.0 came out shows:

TvZ: 179-198 (52.52% ± 3.6% to Zerg)
ZvP: 180-186 (50.82% ± 3.7% to Protoss)
PvT: 179-222 (55.36% ± 3.3% to Terran)

While there is possible evidence that there’s probably a slight favour to Terran in the TvP matchup, in general the map seems quite balanced.

However, there were complaints. The backdoor rocks into the mains were widely considered an issue, especially in the TvZ matchup, because of the strength of the pushes they made possible, as well as the ability to defend the expansion in the same corridor. It also offered a relatively safe third in the case of cross spawns. Moving the inside expansions away from the watchtowers makes it easier to defend any workers mining at those expansions, and removing the rocks added a new path for Zerg to attack into, making Protoss and Terran “turtling” harder. The effect?

TvZ: 356-292 (54.94% ± 2.6% to Terran)
ZvP: 257-269 (51.14% ± 3.1% to Protoss)
PvT: 263-333 (55.87% ± 2.7% to Terran)

In an attempt to balance the map, they’ve managed to statistically make the map more imbalanced. The TvZ match-up has swung in a different direction. A slightly harder inside expansion to defend from Zerg counters has made the win-rates for Zerg fall in both match-ups.

The conclusion for this one seems pretty straightforward. This is a classic case of out of the frying pan and into the fire. In an attempt to make the map closer to a balanced meta-game, they’ve managed to make it worse. All 3 matchups are further from 50/50 than before. We can almost definitively make a conclusion as well, since we have over 1700 non-mirror match-ups played on version 2.0 and over 1100 games beforehand.
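With samples this large, one way to check whether a match-up is distinguishable from 50/50 is an exact binomial test. The sketch below is only an illustration using the version 2.0 records quoted above; the function name is made up, and it tests each match-up against an even split rather than comparing versions with each other.

```python
from math import comb

def binomial_p_value(wins, games):
    """Exact two-sided p-value for the hypothesis 'this match-up is 50/50'.

    Doubles the upper tail of a Binomial(games, 0.5) distribution measured
    from the larger side, which is valid because that distribution is symmetric.
    """
    k = max(wins, games - wins)  # wins of whichever side is ahead
    tail = sum(comb(games, i) for i in range(k, games + 1))
    return min(1.0, 2 * tail / 2 ** games)

# Shakuras Plateau 2.0 records from the post: (label, first race's wins, total games).
for label, wins, games in [("TvZ", 356, 648), ("ZvP", 257, 526), ("PvT", 263, 596)]:
    print(f"{label}: p = {binomial_p_value(wins, games):.4f}")
```

On these figures TvZ and PvT come out below 0.05 while ZvP does not, so the data support a Terran lean on 2.0 more strongly than they support any ZvP imbalance. This says nothing about whether the map change or a game patch caused it.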

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:53 GMT

We finally come to the one map I slightly dreaded to compare:

Lost Temple/Shattered Temple

Original:

Shattered Temple

- Island expansion is now connected to the mainland, with the pathways blocked by destructible rocks
- The high ground next to natural expansions has been removed
- The 2 Xel’Naga Towers have been replaced by a single Tower in the centre of the map. The high ground around the initial towers has also been removed, replaced with depressions that cannot be crossed.

Shattered Temple 1.1

- Close spawns removed

Combining total numbers of non-mirror matches, we have over 3000 samples to work with! Both sides have over 1400 results each, and even the different versions of Shattered Temple have enough games between them to gauge some credible results. This is a statistician’s dream sample for SC2, but is it a balance dream? Let’s have a look.

(One quick note; there are officially 2 versions of Lost Temple: the GOM version and the SC2 version. The differences are in the name only)

Initial results for Lost Temple were as follows:

TvZ: 218-155 (58.45% ± 3.3% to Terran)
ZvP: 186-196 (51.31% ± 3.6% to Protoss)
PvT: 302-347 (53.47% ± 2.7% to Terran)

Initial analysis shows a slightly alarming favour in the TvZ match-up. We must remember that this was when close spawns were allowed, plus the high ground near the natural provided an amazing area for siege tanks to hold, giving a near-impenetrable platform if allowed to set up. Holding the Xel’Naga towers gave a complete view of the main attack paths, and was easier to hold than most.

Because of these issues, Lost Temple was dropped as a map from the ladder, as well as most popular competitions. However, it was to be replaced by Shattered Temple, with the changes mentioned above. The result?

TvZ: 97-41 (70.29% ± 4.6% to Terran)
ZvP: 59-48 (55.14% ± 6.5% to Zerg)
PvT: 67-77 (53.47% ± 5.7% to Terran)

70%?!?!? Even with adjustment for the error that’s huge! How is it possible that the map was changed to help Zerg players in the ZvT match-up, and yet still caused such a massive shift the other way? At least ZvP moved, with Zerg winning more. However, the improvement is so big that Zerg are now favoured, and the match-up is even further from the magical 50/50 mark that is so desired. PvT? Exactly the same.

With such concerns, a change was definitely needed to balance the map more. Therefore 1.1 came out quite quickly. The changes? Close spawns are now disabled. And that’s about it. One change, but a significant one:

TvZ: 308-226 (57.68% ± 2.8% to Terran)
ZvP: 231-150 (60.63% ± 3.2% to Zerg)
PvT: 213-252 (54.19% ± 3.1% to Terran)

We see another massive shift in the TvZ match-up, but this time in the positive. After all of the changes, the final result is a 0.77 percentage point drop in Terran win-rates compared with the original Lost Temple (58.45% to 57.68%). However, ZvP has now shifted even further in favour of Zerg, with a steady increase of approximately 5% per change. The only consistent part is PvT, but even that match-up is slowly getting further from 50/50 as time goes on.

Conclusion? Maybe it’s time we retire the map. Blizzard has clearly tried many things to fix the balance of the map, but has been met with a Whack-A-Mole effect; hammering one problem has raised another one. Competitions in Korea have replaced Shattered Temple with the likes of Crossfire and Bel’Shir Beach. Maybe it’s time the international scene does the same.

Hassybaby

United Kingdom10823 Posts

November 24 2011 19:55 GMT

Overall Conclusion

I came into this discussion picking a group of maps that I thought would give a good representation of map changes, and their effects on the balance that may or may not have been on previous versions. While I wasn’t exactly sure what I was going to find, it is amazing how broad the results have been, in terms of the success that map changes have seen.

I think we can all agree that the hero in this story is Tal’Darim Altar. It came into the pool and instantly achieved critical acclaim (not the original test version, silly gold bases.) Small changes were implemented, and the map became even closer to the balance point that it was close to already. Does it need any more? I’d like to say no, because any small changes now may ruin the balance that it has so beautifully achieved. Did it need it before? Maybe, maybe not. However, the fact that small changes have made the map seem like it is down to the player is fantastic in my eyes. Well done guys.

On the opposite side though are the classic maps Shakuras Plateau and Lost/Shattered Temple. While LT/ST has demonstrated clear map imbalance, changes made to it have only seemed to make it worse. Shakuras seemed statistically close to balance, but changes have possibly made it worse. I think that these results point more to the fact that people’s perception of balance changes are different to what is actually the case at the moment. I cannot give a recommendation to solve this, since I don’t have the in-depth game knowledge that is needed. However, I do feel something has to be done to both maps still, if we want to continue using them. I hate to lose cornerstones of StarCraft 2 competitive play though, so I hope that they can be fixed, instead of simply disappearing.

And then we see the third, and in my opinion worst, map balancing concept out there. Both Antiga Shipyard and Bel’Shir Beach have suffered from multiple changes in short periods of time, without a chance to truly test the problems, or to objectively see the results. Removing high-yield minerals because they were apparently Terran favoured? Terran takes a lead in the match-ups. Prevent total wall-ins at the bottom of your main ramp, so Zerg don’t get pylon/bunker blocked? Zerg start losing more. For once, I hope that the results are just wrong, and we’re just looking at statistical errors for these maps. Maybe in a year, we’ll look back and see that the changes HAVE worked, but right now there are indications that they have not.

And that leads me to my main point. We just don’t have enough information. Adding Antiga Shipyard and Bel’Shir Beach together, there are fewer than 500 games between them, but there are 8 versions. Shattered Temple was changed before 400 games were played on it. How can there be a decent gauge of how the changes have…affected the map balance, if not enough games have been played to give an accurate conclusion?
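For a rough sense of how many games "enough" might be, the standard normal-approximation formula n = z² · p(1 − p) / margin² can be sketched as below. The function name, the 95% level (z=1.96) and the worst-case p=0.5 are assumptions for the example, not something taken from TLPD or Blizzard.

```python
import math

def games_needed(margin, z=1.96, p=0.5):
    """Games needed so a win rate's error bar is about +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    p=0.5 is the worst case; z=1.96 assumes roughly 95% confidence.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be a fraction between 0 and 1")
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

for margin in (0.10, 0.05, 0.03):
    print(f"+/-{margin:.0%}: about {games_needed(margin)} games per match-up")
```

Under those assumptions, pinning a single match-up down to within 5% takes a few hundred games on one version of a map, which is more than most of the versions discussed here ever got.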

I agree that there are a number of cases where the community has detected significant problems before the statistics have been able to reinforce their points, so you don’t see them represented as well in TLPD (the Terran push through the rocks back in original Shakuras is a good example), and in those cases I totally agree that there have to be changes. However, my concern is that, with the number of changes we see and the constant shifts in the metagame, changing maps as well makes proper analysis of the results very volatile. Maybe if we can stagger changes a bit more we can improve on our samples, but that is difficult considering how many changes happen. Also, when there’s a clear problem, we as a community are very quick to notice it. Khaydarin Amulet was spotted as a problem early, as was the Roach supply issue. It’s the smaller problems that I wish we wouldn’t instantly try to fix.

I ask you, map balancers of the world, please leave the current maps alone for a while. There may be problems, but there is not enough data to prove that there are right now. Let us as a community get enough results under our belts, and then we can give a reasoned argument for and against changes. And once the changes are in place, please give us enough time to show their true effects. Then, and only then, can we have a map pool that we can truly state is balanced.

And then we leave it to the players

Ty to Cascade and Asha’ for helping proof read what I wrote and pointing out my many mistakes!

Psychobabas

2531 Posts

November 24 2011 20:02 GMT

I am sorry but the sample is way too small for some maps, as you state.

Also, the most recent statistics include all the games from before, and are therefore not as accurate, since multiple game-changing nerfs and buffs have occurred over time.

What I think would be a better study, and with more meaning, would be to look at the balance on each patch. I don't mean to be annoying but I think there is little point to this, since those factors (patch changes) are far more important in my eyes than maps, which are certainly not as decisive (at least not as much as in Brood War).

Andreas

Norway214 Posts

November 24 2011 20:05 GMT

Are you sure that's the original Tal'darim with the gold bases? I was sure that was NASL's version.

shaldengeki

United States104 Posts

November 24 2011 20:05 GMT

#10

It's good that you ran the numbers on this. Not enough people are aware of the fact that the sample sizes for maps (and race balance issues in general) aren't large enough to make conclusive statements yet. Even fewer care enough to make reasoned statistical arguments about it.

A slight nitpick though: IIRC if your confidence intervals overlap at all, you have to say that you can't reasonably tell whether these numbers are actually different or not. There were a few instances in which you seemed to acknowledge the overlaps but then continued to say that the winrates were actually distinguishable and attributable to changes.
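The overlap check described above can be sketched in a few lines, using the Terran TvZ win rates and ± margins exactly as quoted in the OP. The function name is made up for illustration. Note that checking for overlap is a conservative rule of thumb: intervals that do not overlap indicate a real difference, but overlapping intervals do not by themselves prove there is none.

```python
def intervals_overlap(rate_a, margin_a, rate_b, margin_b):
    """True if rate_a +/- margin_a and rate_b +/- margin_b share any value."""
    low_a, high_a = rate_a - margin_a, rate_a + margin_a
    low_b, high_b = rate_b - margin_b, rate_b + margin_b
    return low_a <= high_b and low_b <= high_a

# Terran TvZ win rates (percent) and +/- margins as quoted in the OP.
lost_temple = (58.45, 3.3)
shattered = (70.29, 4.6)
shattered_1_1 = (57.68, 2.8)

# Lost Temple vs Shattered Temple
print(intervals_overlap(*lost_temple, *shattered))
# Lost Temple vs Shattered Temple 1.1
print(intervals_overlap(*lost_temple, *shattered_1_1))
```

On the OP's own margins, the jump to 70% on Shattered Temple stands clear of the Lost Temple figure, while the 1.1 figure cannot be distinguished from where Lost Temple started.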

stevarius

United States1394 Posts

November 24 2011 20:05 GMT

#11

On November 25 2011 05:02 Psychobabas wrote:
I am sorry but for all the maps apart from Shakuras, the sample is way too small.

Also, the most recent statistics include all the games from before, and are therefore not as accurate, since multiple game-changing nerfs and buffs have occurred over time.

I'm going to have to agree with this post.

It's not uncommon to take a sample and run tests to determine the facts stated within, but the sample is too small, not only because of the nature of the game, but because the numbers are extremely small. I commend your effort, but the best solution would be to somehow derive a much larger sample to test and draw a conclusion from. Maybe Blizzard has this data?

shaldengeki

United States104 Posts

November 24 2011 20:07 GMT

#12

I think the fact that the sample sizes are really small is actually part of the point that OP is trying to make; it's way too early for anyone to really talk meaningfully about map balance, and he does actually get at this in his conclusion.

Psychobabas

2531 Posts

November 24 2011 20:14 GMT

#13

On November 25 2011 05:07 shaldengeki wrote:
I think the fact that the sample sizes are really small is actually part of the point that OP is trying to make; it's way too early for anyone to really talk meaningfully about map balance, and he does actually get at this in his conclusion.

Yes he does.

But still, I think that map changes don't influence a matchup nearly as much as patch changes. Therefore, seeing a 70% winrate in a matchup with, say, 300 games means that this match-up has evolved through numerous patches to reach that result.

So, it could be that 250 of those were played during an early patch and the map possibly is not played as much as before, so the statistic cannot change as much.
I hope I am clear with what I am trying to say.

In essence 2 factors have to be looked at for a study like this:

1. Patch "periods", ie what was the balance on this specific map during patch 1.4.0 for example

2. Frequency of map played in later patches, since data from a year ago is not that relevant (infestor nerfs/buffs, ghost changes, tank changes and so much more have occurred since)
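Splitting results by patch period, as suggested in the two points above, could be sketched like this. The records below are made-up placeholders purely to show the grouping (they are NOT real TLPD results), and the record layout, patch labels and function name are all assumptions.

```python
from collections import defaultdict

# Hypothetical records purely for illustration: (map version, game patch, winner race).
# These are NOT real TLPD results; assume every record is a TvZ game.
games = [
    ("Shakuras 2.0", "1.3", "T"),
    ("Shakuras 2.0", "1.3", "Z"),
    ("Shakuras 2.0", "1.4", "T"),
    ("Shakuras 2.0", "1.4", "T"),
    ("Shakuras 2.0", "1.4", "Z"),
]

def win_rates_by_patch(records, race):
    """Return {(map_version, patch): (wins, games, win_rate)} for one race."""
    tally = defaultdict(lambda: [0, 0])  # key -> [wins, games]
    for map_version, patch, winner in records:
        entry = tally[(map_version, patch)]
        entry[1] += 1
        if winner == race:
            entry[0] += 1
    return {key: (w, n, w / n) for key, (w, n) in tally.items()}

for (map_version, patch), (w, n, rate) in sorted(win_rates_by_patch(games, "T").items()):
    print(f"{map_version} on patch {patch}: {w}/{n} = {rate:.0%}")
```

The catch is that every extra split makes each bucket smaller, so the sample size problem discussed in this thread gets worse, not better, when patches are separated out.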

stevarius

United States1394 Posts

November 24 2011 20:16 GMT

#14

The sample size is small because it's from TLPD. The best data to use for this kind of statistical analysis would be to use data that Blizzard uses to balance in-house.

shaldengeki

United States104 Posts

November 24 2011 20:18 GMT

#15

On November 25 2011 05:14 Psychobabas wrote:

[nested quote collapsed]

Yeah, patch changes over time undoubtedly influence win rates on maps. That's a really good point, and I feel like it strengthens the OP's point wrt sample sizes - you've got all these variables that are changing over time, so it's even more important that maps be held static for awhile to accurately measure map balance - see what works and what doesn't. Otherwise we're just flailing about in the dark.

Psychobabas

2531 Posts

November 24 2011 20:22 GMT

#16

On November 25 2011 05:18 shaldengeki wrote:

[nested quote collapsed]

Totally agree with that.

shaldengeki

United States104 Posts

November 24 2011 20:22 GMT

#17

On November 25 2011 05:16 stevarius wrote:
The sample size is small because it's from TLPD. The best data to use for this kind of statistical analysis would be to use data that Blizzard uses to balance in-house.

Well, as true as it is that optimally Blizzard would be actively conducting and publishing map balance results for all maps (including non-Blizzard ladder maps), I honestly don't see it happening with any frequency in the near future. The fact that Blizzard undoubtedly has more data on this doesn't necessarily imply that carrying out third-party studies is a totally useless endeavour, IMO.

Psychobabas

2531 Posts

November 24 2011 20:26 GMT

#18

On November 25 2011 05:22 shaldengeki wrote:

[nested quote collapsed]

This opens another can of worms actually. Then we would wonder: Is Blizzard balancing maps for the masses or just the higher players? I personally think it's for the masses. So I think a study like this has great potential, as a statistic that includes bronze play (no offence to the bronze people out there!) is meaningless to me.

shaldengeki

United States104 Posts

November 24 2011 20:28 GMT

#19

Yeah, that is a good point. I do remember seeing Blizzard presenting stratified map win rates by division at a recent event, so they're definitely keeping a pulse on map matchup balance at all levels.

Dragar

United Kingdom971 Posts

November 24 2011 20:29 GMT

#20

This is great, but you really need to be factoring in overall changes in the metagame. If the Lost/Shattered Temple changes coincided with the development of blue-flame hellion play that let Terran dominate Zerg for a brief period, then you'd see massive swings that have nothing to do with the map.

If you are careful, you should be able to do this using the win-rate statistics thread.

Sea_Food

Finland1612 Posts

November 24 2011 20:30 GMT

#21

Shakuras was made 10x less zerg unfavored in patch 2.0 but the statistics lie as when 1.1 was played, all the other maps were even more zerg unfavored.

Lumi

United States1616 Posts

November 24 2011 20:30 GMT

#22

I appreciate the effort here but this sample size is pretty useless \= And if it's from TLPD who knows what patch variations are included within this data.

Markwerf

Netherlands3728 Posts

November 24 2011 20:41 GMT

#23

if you provide a confidence interval please provide what confidence level you're using. Makes it easier to read not having to calculate back what confidence level it was. (i assume it's 95%)

Also it would be nice to note at what time certain map changes went into effect. I can imagine that most map changes were accompanied by a patch as well, so shifting match ratios could partly be the result of patches too.

Randomaccount#77123

United States5003 Posts

November 24 2011 20:54 GMT

#24

--- Nuked ---

Mista_Masta

Netherlands557 Posts

November 24 2011 20:59 GMT

#25

Very interesting statistical analysis. I do agree that for many maps there's not enough data to yield any meaningful conclusions. Also, like others have said, part of the changes in the match ups may be caused by patches. Still, good job! =)

Primadog

United States4411 Posts

November 24 2011 21:18 GMT

#26

Unbelievable that we ever used a map that was 70%-favored in a competitive environment. I look forward to the future when map theory becomes respected as a serious discipline. This was a great first step!

Dodgin

Canada39254 Posts

November 24 2011 21:21 GMT

#27

On November 25 2011 05:02 Psychobabas wrote:
I am sorry but the sample is way too small for some maps, as you state.

Also, the most recent statistics include all the games from before, and are therefore not as accurate, since multiple game-changing nerfs and buffs have occurred over time.

What I think would be a better study, and with more meaning, would be to look at the balance on each patch. I don't mean to be annoying but I think there is little point to this, since those factors (patch changes) are far more important in my eyes than maps, which are certainly not as decisive (at least not as much as in Brood War).

you're actually just completely wrong, sorry. maps are a big deal.

RoboBob

United States798 Posts

November 24 2011 21:24 GMT

#28

It was really interesting seeing the numbers on Shakuras, it definitely wasn't what I expected.

The Shattered Temple results weren't that surprising. I think one thing you didn't consider about the original Lost Temple was the influence of BitByBit. (and to a lesser extent, Rax before Depot) He taught us how to auto-kill Zergs on close-by-ground spawns. By the time Shattered Temple came out, all the changes to the cliff, island, and center didn't matter. They certainly made mid-lategame TvZ better for the Zerg, but because all the Terrans learned they could kill the Zerg in the early game, it just didn't matter.

I'm not sure why the current version of Shattered Temple TvZ is still Terran favored. I'm not a big believer of Gold expansions automatically making maps imbalanced for Terran, but maybe there is a case for that here.

I also thought it was a bit silly to look at Bel'Shir and Antiga when they haven't had enough play yet. Maybe that was the point you were trying to make, but I feel kinda bad that you went to all that effort for nothing =/

Kira__

Sweden2672 Posts

November 24 2011 21:25 GMT

#29

I think numbers are overrated. Better to look at games and how they play out.

Mirosuu

England283 Posts

November 24 2011 21:36 GMT

#30

I think one factor in all of this is that some of the maps were played in an era where zergs didn't know how to even spread creep and macro correctly. I think that is also a big factor of why some maps were deemed "imba". I think in a few years, if we played on these maps, they would then skew towards one race or another due to everyone being near the height of the skill in the game and these imba reasons come out of that. Emergent phenomena, at a player skill level, you could say.

We'll see. Hopefully they are still balanced in a few years, but I severely doubt that even tal'darim would be balanced across all three races in a few years once everyone has fleshed out a lot of their races and the matchups.

Madera

Sweden2672 Posts

November 24 2011 21:38 GMT

#31

Wow, thanks for the statistics!

Micket

United Kingdom2163 Posts

November 24 2011 21:39 GMT

#32

I don't think taking map statistics is always the best thing. Sc2 isn't BW, where everyone is sooo good that a small map imbalance will lead to a particular race ALWAYS winning. Currently in sc2, we should look at maps and say 'does the better player WIN?' On a map like Taldarim, answer is yes. Xelnaga Caverns? No. Metalopolis? I would say yes. That is all that matters imo. Shakuras looks pretty balanced but in actuality, Zerg has a really tough time there. The reason why the map looks balanced is because we don't have that situation like in BW where everyone is good. Stephano is still gonna trounce some lesser pro on that map. But once players get better, the map will have to leave soon. It is great ATM because the better player wins, but it won't last forever.

Hassybaby

United Kingdom10823 Posts

November 24 2011 21:52 GMT

#33

This is actually going to take me a while, because there are a fair few points of contention about what i am trying to portray here.

On November 25 2011 05:05 Andreas wrote:
Are you sure that's the original Tal'darim with the gold bases? I was sure that was NASL's version.

It's an old version that the NASL accidentally used. They fixed it pretty quickly, and subsequent matches were played on the right version

@ those talking about sample size. That's exactly my point. A number of map changes have been made with little data backing them up, but mostly general opinions. Antiga Shipyard hasn't even been out for 6 months, but there are 3 main versions of it. Bel'Shir Beach has only seen play in the competitive scene, and yet there are numerous changes to it. 40 non-mirror match-ups were played before they released version 1.1! How can there be any serious justification of this many variations when other factors have not been given a chance to have an effect on the win rates?

@ those mentioning Blizzard data: Blizzard balances maps at the higher levels. Their concern across the board comes more from units than maps, and even there they favour the higher ranks. It was the professional scene that pushed for changes to Lost Temple to remove that high ground. It was the pros that noted how truly brutal Terran pushes could be through Shakuras using the back rocks. While I agree that using the bigger sample would be useful, at the same time we would have to reject the same data because it's not relevant to what the motives were.

On November 25 2011 05:18 shaldengeki wrote:
Yeah, patch changes over time undoubtedly influence win rates on maps. That's a really good point, and I feel like it strengthens the OP's point wrt sample sizes - you've got all these variables that are changing over time, so it's even more important that maps be held static for awhile to accurately measure map balance - see what works and what doesn't. Otherwise we're just flailing about in the dark.

Pretty much what i was going for. The only map that we can safely say has been given a fair chance in terms of severity of changes is Tal'Darim Altar, and that was a tournament map to begin with.

On November 25 2011 05:26 Psychobabas wrote:
This opens another can of worms actually. Then we would wonder: Is Blizzard balancing maps for the masses or just the higher players? I personally think it's for the masses. So I think a study like this has great potential, as a statistic that includes bronze play (no offence to the bronze people out there!) is meaningless to me.

Can't comment, as much as I want to. If we take the numbers at face value, in some cases they are trying to balance at the high levels and failing. In other cases they are trying, succeeding in one area, and causing more problems in another. We can't say, because the balance outside of maps is constantly changing. That's why I want stability, as opposed to constant changes.

@ those thinking that I should have compared stats by patch and overall win-rates as well: Firstly...I'm actually going to do that later, for each individual map

Secondly, the fact that we would be dividing up the small sample even more, as there have been many patches, would cause statistical errors to be even greater than they are right now, and then there would be even more reason to disregard the data. Not enough time was given....

On November 25 2011 06:24 RoboBob wrote:

I also thought it was a bit silly to look at Bel'Shir and Antiga when they haven't had enough play yet. Maybe that was the point you were trying to make, but I feel kinda bad that you went to all that effort for nothing =/

It was, and I don't think it's a waste of time for me. Anyone who regularly goes into the GSL LR knows that I love writing out stats, and I enjoyed doing it

On November 25 2011 05:05 shaldengeki wrote:
A slight nitpick though: IIRC if your confidence intervals overlap at all, you have to say that you can't reasonably tell whether these numbers are actually different or not. There were a few instances in which you seemed to acknowledge the overlaps but then continued to say that the winrates were actually distinguishable and attributable to changes.

If you could point out where they are, I'd love to change it around to make the necessary changes.

Hope that cleared up some misconceptions

Zaphid

Czech Republic1860 Posts

November 24 2011 21:54 GMT

#34

I was doing similar research about a month back, including every single custom map introduced by GOM, and basically came to the conclusion that any cookie cutter macro map will lead to the same results given enough games. Exactly like TDA. For example Xel'Naga Fortress has pretty decent PvT stats - around 57% TvP, yet anyone who saw more than a few games on it will tell you it's a shitty map for that MU. It's 90% 1/1/1, VR allins or someone metagaming. Similar goes for Crossfire. Those maps could be fine on ladder though, because only high masters/GM can execute those builds well enough, so most of the audience wouldn't notice it.

You also completely disregard practice games, which are essentially untrackable, but you can bet your ass that especially for the GSL, they outnumber the tournament ones. When a pro tells you a map x sucks, it's usually because they figured out a way to exploit and they are having a hard time working around it.

Anyway, I think the time of big balance patches has passed, so now it should be perfect time to ditch most of the old ladder pool and introduce some great macro based maps so we can get the most out of the game.

Also Antiga has so many versions because of positional imbalances on the ladder version and you can hit main's gas from the third, at least originally you could. Playing a map with those "features" in tournaments would be a joke.

arioch

England403 Posts

November 24 2011 21:55 GMT

#35

Brilliant and interesting post thanks... gonna read in more detail later but saying thanks for your effort.

Nymbul

United Kingdom127 Posts

November 24 2011 21:55 GMT

#36

After looking at the original i'm only now remembering how bad Shakuras used to be. I knew there was backdoor rocks which is an evil that must be purged but I completely forgot about the middle rocks

The GSL has removed gold expansions but i'm wondering if there's going to be map changes to make them safer due to the lessening of Reward > Risk

ogion

New Zealand79 Posts

November 24 2011 21:59 GMT

#37

Correct me if I'm wrong, but I think the reason they released the original Shakuras Plateau is that there was a glitch where you could build a pylon in the opponent's base next to the rocks and it would be invisible. They removed the map for a while, then brought it back as 1.1, where the glitch was gone.

Antoine

United States7481 Posts

November 24 2011 22:05 GMT

#38

1.1 for antiga is no horizontal spawns. (same for shakuras)
also bottom of main ramp changed to prevent 2-bunker block iirc

Markwerf

Netherlands3728 Posts

November 24 2011 22:06 GMT

#39

On November 25 2011 06:55 Nymbul wrote:
After looking at the original i'm only now remembering how bad Shakuras used to be. I knew there was backdoor rocks which is an evil that must be purged but I completely forgot about the middle rocks

The GSL has removed gold expansions but i'm wondering if there's going to be map changes to make them safer due to the lessening of Reward > Risk

I think gold was a slight mistake in sc2. The idea of a high risk, high reward expansion, where the choice of where to expand starts to matter more, is nice, but it's very hard to balance racewise. Zerg have the least dependence on their bases being close together, because of fast units and because they don't rely on static defense so much, so they can probably use golds the best. Terran can use all mules on gold and take golds quite fast by means of a planetary as well. Protoss on the other hand can hardly use gold well at all; if anything they tend to have oversaturation already most of the time, so they are not that interested in a base with fewer patches.
Those principle differences between the races make gold very hard to balance I think.

SpiZe

Canada3640 Posts

November 24 2011 22:10 GMT

#40

On November 25 2011 07:05 Antoine wrote:
1.1 for antiga is no horizontal spawns. (same for shakuras)
also bottom of main ramp changed to prevent 2-bunker block iirc

Shakuras 1.1 also included the fix to the invisible buildings, or was it 2.0 ?

docvoc

United States5491 Posts

November 24 2011 22:13 GMT

#41

Taldarim has major issues. The zvp is full of blind 6pools and 7RR that are effective because of the lack of choke. The PvP on that map is all 4 gate w or w/o phoenix. So far blizz just needs to let the real map makers in the community make the map like they have promised and stop with the stubbornness. I understand that some maps are very good, like shakuras is, but there have been many issues. Also to quote Liquid` Sheth on a thread about metalopolis, some maps that are slightly imba are there because of that watching value and that it allows such maps to show who really has the great ability between two players. Heck if you hate a map so much, since blizz won't let the community make them, just veto up to three of them. Really map balance in question seems to lead to much in the way of complaints about maps and not really a true answer to how to fix said imbalances.
EDIT: I don't mean to sound like a total dick, you obviously put a lot of time and effort into this post and it is much appreciated, I just want to see somebody with greater map knowledge than I start proposing some really viable answers to map balancing.

Antoine

United States7481 Posts

November 24 2011 22:14 GMT

#42

2.0 fixed the invisible buildings from what i remember
1.0 was pulled from the ladder pool because of invis buildings and when it came back it was 2.0.

tzenes

Canada64 Posts

November 24 2011 22:18 GMT

#43

Ignoring the issue of sample size, I think it's important to control for overall balance.

For example, if TvZ is currently 60% (I realize it's not) and a map has a TvZ of 55%, although the map might look Terran favored the exact opposite would be true.

Ideally, you'd want to break out win rates by patch, and compare them to map win rates in the same time period using a Chi-squared test for independence.
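
A rough sketch of the comparison suggested above, in Python with only the standard library. The game counts are made up for illustration (they are not TLPD numbers) and the function name is hypothetical. It assumes a plain Pearson chi-squared test on a 2x2 table, with the map against all other maps in the same patch window, and wins against losses.

```python
import math

def chi2_independence_2x2(map_wins, map_losses, other_wins, other_losses):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) for whether the win rate depends on the map."""
    table = [[map_wins, map_losses], [other_wins, other_losses]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if map and result were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the chi-squared tail probability
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical numbers: Terran goes 55-45 vs Zerg on one map while
# going 600-400 on every other map during the same patch.
chi2, p = chi2_independence_2x2(55, 45, 600, 400)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these invented counts the statistic comes out well below 3.84, the usual 5% cutoff for one degree of freedom, so a 55% map result could not be told apart from a 60% baseline on 100 games.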

Nymbul

United Kingdom127 Posts

November 24 2011 22:20 GMT

#44

On November 25 2011 07:13 docvoc wrote:
Taldarim has major issues. The zvp is full of blind 6pools and 7RR that are effective because of the lack of choke. The PvP on that map is all 4 gate w or w/o phoenix. So far blizz just needs to let the real map makers in the community make the map like they have promised and stop with the stubbornness. I understand that some maps are very good, like shakuras is, but there have been many issues. Also to quote Liquid` Sheth on a thread about metalopolis, some maps that are slightly imba are there because of that watching value and that it allows such maps to show who really has the great ability between two players. Heck if you hate a map so much, since blizz won't let the community make them, just veto up to three of them. Really map balance in question seems to lead to much in the way of complaints about maps and not really a true answer to how to fix said imbalances.
EDIT: I don't mean to sound like a total dick, you obviously put a lot of time and effort into this post and it is much appreciated, I just want to see somebody with greater map knowledge than I start proposing some really viable answers to map balancing.

I agree with this. On ladder I pretty much have to double scout in ZvP cause otherwise the zerg can do a random 6 pool which can kill you if you scout the zerg 2nd or last and PvP is just the same game every time.

I'm still not a massive fan of Antiga either. Sure terran and protoss can go onto 3 bases easily but after that it seems almost impossible for them to secure a 4th against a good zerg.

JiYan

United States3668 Posts

November 24 2011 22:21 GMT

#45

i think its worth comparing map statistics along with overall statistics. Due to certain patches, protoss might have 80% winrates vs the other two races and obviously those wins would be on the current map editions. Your data is pretty expansive, but if you plan on being conclusive you would have to compare map specific statistics with overall statistics to see the differentiation.

If you dont get my point:
Say patch 1.6 makes marines cost 100 minerals and terrans start losing everywhere. During this patch period, the current maps in play (i.e. the latest antiga, the latest shakuras) would show bad terran win rates no matter how the maps themselves were built. Since maps aren't the only things that have been fluctuating, they can't be the only variable you look at.

Liquid`Jinro

Sweden33719 Posts

November 24 2011 22:27 GMT

#46

Fun to read but there's a lot of cases where it really isnt about what the map looked like imo...

Like, shakuras:

Conclusion for this one seems pretty straight forward. This is a classic case of out of the frying pan and into the fire. In an attempt to make the map closer to a balanced meta-game, they’ve managed to make it worse. All 3 matchups are worse off than before. We can almost definitively make a conclusion as well, since we have over 1700 non-mirror match-ups played in version 2.0 and over 1100 games beforehand.

Pretty sure if you gave us back the old version it would have more terran favoured stats now than it did then, because terran players are MUCH better at playing long macro games now, than 1 year ago.

Diamond

United States10796 Posts

November 24 2011 22:45 GMT

#47

Sort of similar to what Jinro said (but a little diff) is with maps that have LONG histories and HUGE sample sizes you have to wonder how much of that data is just due to metagame shifts. I really wish patches were separated for TLPD.

For example (my fav example) is Scrap Station which is as follows:

TvZ: 252-249 (50.3%) | ZvP: 171-164 (51%) | PvT: 163-190 (46.2%)

Wait a minute, you mean SS is one of the most balanced maps ever?

No. So many metagame shifts and balance changes happened during that time that I believe (but don't have proof, because I'm too lazy to add everything up by hand) those stats were evened out over the many metagame shifts. For example, for a long time SS was Zerg favored in TvZ/ZvT. However, eventually Terrans figured out how to play it and it became a Terran map in TvZ/ZvT. So basically these two major shifts caused it to balance out.

Does it make the map balanced? No, it just makes it used too long.
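
To make the averaging-out argument concrete, here is a small Python sketch. The overall 252-249 TvZ record is the one quoted above; the split into an early and a late period is invented purely for illustration (only the totals match), since nobody in the thread has broken the games down by patch.

```python
def win_rate(wins, losses):
    """Fraction of games won."""
    return wins / (wins + losses)

# Overall Scrap Station TvZ record quoted in the post.
overall = (252, 249)

# Hypothetical split into two periods (invented for illustration;
# only the totals match the quoted record).
early = (100, 150)   # Zerg-favoured period
late = (152, 99)     # Terran-favoured period

# Sanity check that the invented periods add up to the real total.
assert (early[0] + late[0], early[1] + late[1]) == overall

for label, (w, l) in [("early", early), ("late", late), ("overall", overall)]:
    print(f"{label:8s} TvZ {w}-{l} ({win_rate(w, l):.1%})")
```

Two lopsided periods pointing in opposite directions can pool to something that looks almost perfectly even, which is why the lifetime number alone says little about whether the map was ever balanced.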

Antoine

United States7481 Posts

November 24 2011 22:49 GMT

#48

Just fyi you can separate it out manually, go to a map page and filter by date. I like the idea of making it easier though.

Primadog

United States4411 Posts

November 24 2011 22:50 GMT

#49

Even though TLPD doesn't parse monthly data, the TLPD Winrate chart is a decent proxy of those values. Should consider using these for future analysis http://www.teamliquid.net/forum/viewmessage.php?topic_id=272754

Diamond

United States10796 Posts

November 24 2011 22:51 GMT

#50

On November 25 2011 07:49 Antoine wrote:
Just fyi you can separate it out manually, go to a map page and filter by date. I like the idea of making it easier though.

Well then I have to go into Liquipedia and pull patch dates and line it up and document it all, not that hard but too much for someone already working 100+ hours a week in E-Sports! But I am pretty positive SS would show MAJOR shifts in its balance over the life of the game. It just happened to get canned when it happened to be pretty even.

darkscream

Canada2310 Posts

November 24 2011 22:52 GMT

#51

ok, rather than attacking the actual race matchup statistics as flawed (since lets face it, all statistics are flawed in some way)..

I find this post pretty interesting, I think map balance was the real issue rather than race balance and a lot of weird changes were made along the way before blizzard cleaned up the map pool. Now I think the problem with the maps is that they are stale; but at least they are relatively balanced with the game as it is today for the most part.

Its hard to introduce a new map and have it be somewhat balanced tho, even antiga is somewhat problematic in early versions thus far

Iamyournoob

Germany595 Posts

November 24 2011 23:11 GMT

#52

Nicely done overall, but may I ask a couple of questions?

First of all: What is the distribution you used to do the calculation? Binomial distribution?

Second: What are +/- numbers behind the percentages? Are those the empirical standard deviations or are those confidence intervals? If it is the latter, what is the probability of the intervals?

(If you explained this somewhere and I missed it, then I am sorry)

Cascade

Australia5405 Posts

November 24 2011 23:28 GMT

#53

On November 25 2011 08:11 Iamyournoob wrote:
Nicely done overall, but may I ask a couple of questions?

First of all: What is the distribution you used to do the calculation? Binomial distribution?

Second: What are +/- numbers behind the percentages? Are those the empirical standard deviations or are those confidence intervals? If it is the latter, what is the probability of the intervals?

(If you explained this somewhere and I missed it, then I am sorry)

The error is the square root of the smaller of the number of wins and the number of losses, which is the standard deviation for this kind of counting measurable. In the limit of large numbers (as a rule of thumb it's fine above 20 samples) this becomes a normal distribution where one standard deviation corresponds to 68% confidence, two standard deviations correspond to 95% confidence, and 3 standard deviations is more than 99%. These confidence intervals become a bit shaky below 20 samples, but still give a pretty good idea about how reliable things are.
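
A minimal Python sketch of the rule described above, next to the textbook binomial standard error for comparison. The function names are made up, and dividing by the number of games to get a percentage is an assumption about how the rule is applied. For a 10-6 record the rule gives roughly 15%, which seems consistent with the 15% quoted later in the thread for that record.

```python
import math

def counting_error(wins, losses):
    """Rule of thumb described above: the uncertainty, in games, is the
    square root of the smaller of the two counts. Returned here as a
    fraction of the games played."""
    return math.sqrt(min(wins, losses)) / (wins + losses)

def binomial_standard_error(wins, losses):
    """Textbook standard error of a binomial proportion, for comparison."""
    n = wins + losses
    p = wins / n
    return math.sqrt(p * (1 - p) / n)

wins, losses = 10, 6   # a 10-6 record, 62.5% win rate
print(f"win rate          {wins / (wins + losses):.1%}")
print(f"sqrt(min) rule    +/- {counting_error(wins, losses):.1%}")
print(f"binomial SE       +/- {binomial_standard_error(wins, losses):.1%}")
```

The two estimates are not the same quantity, so they do not have to agree; on small samples like this one the difference is a few percentage points.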

Hassybaby

United Kingdom10823 Posts

November 25 2011 00:31 GMT

#54

On November 25 2011 07:27 Liquid`Jinro wrote:
Fun to read but there's a lot of cases where it really isnt about what the map looked like imo...

Like, shakuras:

Pretty sure if you gave us back the old version it would have more terran favoured stats now than it did then, because terran players are MUCH better at playing long macro games now, than 1 year ago.

That reinforces the point of "wait and see" though. Map balances are a lot more subtle than race balance imo, so any changes have to be really justified. Removing the high ground in LT made sense, as did the prevention of close spawns on Shakuras. After that though, can there be actual justification of map changes when race changes have such an effect that could easily neutralise the effects? Or worse, cause greater diversity in number?

Szubie

United Kingdom294 Posts

November 25 2011 01:16 GMT

#55

Very indepth analysis, it's interesting to see the shifting balances on these maps.

I wonder: perhaps it is because the map-makers are not sure exactly how to adjust balance effectively that maps have not played as large a role in balancing the game as in BW. Or maybe it's the other way around, that because we as a community look only to blizzard for most balance changes, and so the map-makers aren't given the opportunity to learn the nuances of balancing the matchups.

Acritter

Syria7637 Posts

November 25 2011 01:26 GMT

#56

This is a very impressive work, but overall entirely useless. We don't have any way to ensure that the changes in winrate were independent of balance changes and metagame shift. You'd need to do a series of tests with a bunch of professionals (or at least Masters or Grandmasters players) on the maps between each version. Good effort, but we can't actually use this to prove or disprove anything. There are just too many variables at work.

Probulous

Australia3894 Posts

November 25 2011 01:32 GMT

#57

Really interesting read.

This is also a good opportunity to link perhaps the greatest troll in the history of TL.

http://www.teamliquid.net/blogs/viewblog.php?id=70545

We love you Inc.

KevinIX

United States2472 Posts

November 25 2011 05:26 GMT

#58

You also have to remember there were major balance patches and metagame shifts which may be affecting win rates.

Hassybaby

United Kingdom10823 Posts

November 25 2011 10:50 GMT

#59

I think I'll point this out once more, because I feel a few people have still missed one of the points I'm trying to make by merging metagame and balance changes together into one overall picture of a map and only looking at the map changes. The small sample sizes, and the ever-changing aspects in the background which I haven't addressed fully, are a critical part of my point: Not Enough Information

We are constantly changing the state that the game is in. The meta-game shifts constantly, with innovation coming out in droves. Balance changes are quite constant, and their effects are wide and varied. General play-styles are different in different regions. Player ability differs. All of these, and more, factor into the overall results that we see. And all of them are constantly changing

So, with all of these changes, it is even more important that we have a base to compare with. We don't have that in any way:
- We can't use the meta-game because it is ever shifting, and there are no indicators that we can use to portray when the next shift will be
- We can't use balance, because new patches come quite often, and their results are varied, so there is no stability in that sense.
- We have to take into consideration the difference in ability and match-up strengths. DRG playing a weaker Terran who isn't that great in TvZ is going to be different to playing, say, MMA. But again, with the way that tournaments are played, there is no logical way that we can take that into consideration in a reasonable manner, because there is always the "bad-day" aspect as well, that adds to the errors.

In an ideal world, we would test the ability of a map by pitting 2 equally skilled players on a map across several periods of time, and compare their results across the board, across several balance changes and meta shifts. That is to say, ceteris paribus, the variables would be the meta game and the balance changes. However, right now that can't happen, as we have all aspects being variables, including the fact that both competitors are human!

But with all these changes, should we not have something that is a constant? The only thing that CAN be is the map, which is why I'm asking that we stop changing maps so much. They are adding another variable to something that could possibly have been fixed through other means, so why the constant changes?

I want to actually use Jinro's example of Shakuras, because its a perfect point that he made:

On November 25 2011 07:27 Liquid`Jinro wrote:
Fun to read but there's a lot of cases where it really isnt about what the map looked like imo...

Like, shakuras:

Pretty sure if you gave us back the old version it would have more terran favoured stats now than it did then, because terran players are MUCH better at playing long macro games now, than 1 year ago.

100% correct, I'm not going to argue with that. BUT the fact that there are these meta-game changes, as well as balance additions means that we can reasonably assume that would happen. But what was the result? We'll never know because the change has been put in. There may not have been an issue with the map in the first place, it was just the meta-game at the time. But we will never know

Same with Antiga Shipyard. Barely 200 games have been played at a competitive level, and then it was hit with the anti-gold base movement. Now we have another map that was essentially a good test for us, but we've lost all data on it with the changes. And we really shouldn't have.

If it has consistently had high win-rates for a race, or performs better than the win-rates at numerous points, THEN we can justify a change to it, because we can see that balance changes and meta-game shifts are not the problem on the map. But it has to get that far to be able to have that data.

A map should be untouched for a year, unless there are blatant indications of a problem with it (and I'm talking about high-ground LT, or revealed buildings on Shakuras levels here.) Then and only then will we have enough data to declare a map with balance problems.

Belha

Italy2850 Posts

November 25 2011 11:35 GMT

#60

Nice analysis. However the stats are flawed for a simple reason. Every match up win% in every map must be considered in the same balance patch. So old classics like shattered and shakuras have gone through different patches that favored different races.

Hassybaby

United Kingdom10823 Posts

November 25 2011 11:54 GMT

#61

On November 25 2011 20:35 Belha wrote:
Nice analysis. However the stats are flawed for a simple reason. Every match up win% in every map must be considered in the same balance patch. So old classics like shattered and shakuras have gone through different patches that favored different races.

Oh c'mon... how many times do I have to say that that's the point? And that's why I've been saying that the map changes were not necessary, since the meta-game and other balance changes could have sorted out the problems, but were never given the chance because the map was changed beforehand.

Markwerf

Netherlands3728 Posts

November 25 2011 14:02 GMT

#62

On November 25 2011 20:54 Hassybaby wrote:

that's a pretty flawed argument. You could easily reverse this to say map changes are necessary as that makes balancing the game easier. If maps are flawed you don't get to see the full spectrum of strategies and it gets more difficult to determine the general balance.
Also map makers and tournament hosts have more information than given in your thread, they also have the opinion of pros and perhaps even access to data like inhouse training sessions (directly or indirectly).

Also it's a completely subjective argument what counts as enough proof that a map is balanced or not.

Itsmedudeman

United States19229 Posts

November 25 2011 14:16 GMT

#63

20 games, let alone 10 is enough to give any indication of what race a map favors. Some of those statistics should not even be considered at all.

gruff

Sweden2276 Posts

November 25 2011 14:22 GMT

#64

I think you forgot a 'not' in there.

Itsmedudeman

United States19229 Posts

November 25 2011 14:28 GMT

#65

Yeah

Also, the gold base changes on antiga and dual sight obviously weren't going to change much. If you let a terran safely get a 4th on either of those bases then you're in a bad position regardless of whether it's gold or not. It's not like xel naga where it's an easy 3rd.

FeyFey

Germany10114 Posts

November 25 2011 14:46 GMT

#66

On November 25 2011 23:28 Itsmedudeman wrote:
Yeah

Also, the gold base changes on antiga and dual sight obviously weren't going to change much. If you let a terran safely get a 4th on either of those bases then you're in a bad position regardless of whether it's gold or not. It's not like xel naga where it's an easy 3rd.

the problem was the gold being taken as the first base, or the first expansion in certain situations. Basically happening on most maps lately, that's why tournaments removed the golds. Rocks would have had the same effect. But if you have won an engagement and are able to contain the opponent, you can take the gold and mess up a lot, and they still won't be able to break the contain. Thus golds prevent comebacks, that's why they got removed. (and because golds without rocks force a protoss to do one base play)

Tobberoth

Sweden6375 Posts

November 25 2011 14:52 GMT

#67

Doesn't make much sense to say we had shakuras since the beta, it was a party map back then and I doubt any of your sample games come from before the time it was released for the ladder.

ChaosTerran

Austria844 Posts

November 25 2011 15:03 GMT

#68

Really objective post.

54% win rate for Protoss - "seems really balanced"
55% win rate for Terran - "seems to favor terran"

let me guess, you are a protoss player?

Hassybaby

United Kingdom10823 Posts

November 25 2011 17:42 GMT

#69

On November 25 2011 23:02 Markwerf wrote:

You do get the full spectrum, because players start to try anything they can do to help them on a map that is possibly flawed. It's the same problem that 1-1-1 had. Players tried pretty much everything to stop it, but couldn't when it was well executed. At that point, we had balance changes to help the case.

On November 26 2011 00:03 doko100 wrote:
Really objective post.

54% win rate for Protoss - "seems really balanced"
55% win rate for Terran - "seems to favor terran"

let me guess, you are a protoss player?

Random actually

ChaosTerran

Austria844 Posts

November 25 2011 17:44 GMT

#70

On November 26 2011 02:42 Hassybaby wrote:

Random actually

so a 1% difference in win rate is enough for you to go from "pretty balanced" to "favors race x". can you explain the thought behind this because I don't understand it.

Hassybaby

United Kingdom10823 Posts

November 25 2011 18:42 GMT

#71

On November 26 2011 02:44 doko100 wrote:

so a 1% difference in win rate is enough for you to go from "pretty balanced" to "favors race x". can you explain the thought behind this because I don't understand it.

Lemme double check which exact numbers you're referring to. the 54% is for TDA 1.0 PvT, and the 55% is Shakuras v2.0 TvP, correct?

alpinefpOPP

United States134 Posts

November 25 2011 18:52 GMT

#72

I don't know about anyone else's opinion, but personally, since the last patch I've felt so, so much better about the maps. It's been such an improvement to me.

Warble

137 Posts

November 25 2011 23:09 GMT

#73

Besides the arguments already made, I decided to spot check some of your calculations.

For the original Belshir ZvT with 10-6:

μ = 62.5%, which agrees with you.
σ = 12.5%, which is smaller than your 15%.
95% confidence interval = 24.5%, which is larger than your 15%.

I thought perhaps you'd used the sample se instead of the hypothesis testing se for a binomial distribution, so I checked that too:

σ = 12.1%, which is smaller than your 15%.
95% confidence interval = 23.7%, which is larger than your 15%.

So I'm not sure how you got your standard errors since that's the only common mistake that comes to mind.

I suspect the same mistakes are made in the other stats threads that pop up. In your case, I think it's commendable that your presentation is transparent because that allows your results to be verified. I am troubled by the other stats threads that aren't verifiable because I suspect they may also have mistakes in their error calculations - after all, in those other threads, the errors were added as afterthoughts upon community request, which is not a good sign since errors are such a fundamental part of statistics.

For the other threads, this means that their results show significance when actually there is no significance. In your presentation, I decided to check some other results for significance:

Belshir Winter TvZ with 5-2:

μ = 71.4%, which agrees with you.
σ = 18.9%
95% confidence interval = 37.0%.

Your results show that TvZ is significant but in reality it is not. Similarly for PvZ and TvP.

However, in the case of Belshir, there's a bigger issue: you analysed your results even though your sample sizes are so small that the normal approximation is not valid. In these cases it is best not to draw any conclusions at all.

Shakuras 2.0 TvZ with 356-292:

μ = 54.9%, which agrees with you.
95% confidence interval = 3.85%, a larger interval than yours, but the result remains significant, which agrees with you.

For future reference, your standard errors for hypothesis testing should be calculated using σ = 0.5/sqrt(n), where n is your sample size.
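The spot checks above can be reproduced with a short sketch. It assumes a z-value of 1.96 for the 95% level and treats the "95% confidence interval" figures as half-widths (the ± part); the function names are illustrative and not from any post in this thread.

```python
import math

Z95 = 1.96  # assumed z-value for a two-sided 95% interval


def null_se(n):
    """Standard error of a win rate under the 50% null hypothesis: 0.5/sqrt(n)."""
    return 0.5 / math.sqrt(n)


def sample_se(wins, losses):
    """Standard error using the observed win rate: sqrt(p(1-p)/n)."""
    n = wins + losses
    p = wins / n
    return math.sqrt(p * (1 - p) / n)


# Win-loss records quoted in the post above
records = [
    ("Bel'Shir original ZvT", 10, 6),
    ("Bel'Shir Winter TvZ", 5, 2),
    ("Shakuras 2.0 TvZ", 356, 292),
]

for label, wins, losses in records:
    n = wins + losses
    print(
        f"{label}: win rate {wins / n:.1%}, "
        f"null se {null_se(n):.1%}, "
        f"sample se {sample_se(wins, losses):.1%}, "
        f"95% half-width {Z95 * null_se(n):.1%}"
    )
```

Both standard errors are normal approximations, so for the 16-game and 7-game samples they are only indicative.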

Hassybaby

United Kingdom10823 Posts

November 25 2011 23:43 GMT

#74

I actually used σ = sqrt(a)/n

Where a is the smallest number of the wins and the losses, and n is the sample size. Errors have never been my strong point, so I had help there. If you want, you can have a look at the data I was using.

https://rapidshare.com/files/1308210131/Map_stats_article.xlsx

Maybe it was a bad idea to draw conclusions on Bel'Shir, but it felt very weak to just give the results and then not conclude anything, so I gave a personal opinion. Not my best move in hindsight
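For anyone who wants to compare the two error formulas without opening the spreadsheet, here is a minimal sketch. The function names are made up for illustration; `min_count_se` is the sqrt(a)/n formula described in this post and `null_se` is the 0.5/sqrt(n) formula suggested earlier in the thread.

```python
import math


def min_count_se(wins, losses):
    """sqrt(a)/n, where a is the smaller of wins and losses and n is the total."""
    a = min(wins, losses)
    n = wins + losses
    return math.sqrt(a) / n


def null_se(wins, losses):
    """0.5/sqrt(n), the standard error under a 50% null hypothesis."""
    return 0.5 / math.sqrt(wins + losses)


# Records quoted in the thread: original Bel'Shir ZvT and Shakuras 2.0 TvZ
for wins, losses in [(10, 6), (356, 292)]:
    print(
        f"{wins}-{losses}: sqrt(a)/n = {min_count_se(wins, losses):.2%}, "
        f"0.5/sqrt(n) = {null_se(wins, losses):.2%}"
    )
```

Note that sqrt(a)/n reaches its maximum at an even split, where it equals sqrt(n/2)/n, about 1.41 times 0.5/sqrt(n), and falls to zero for a clean sweep, so the gap between the two formulas depends heavily on how lopsided the record is.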

Warble

137 Posts

November 26 2011 00:33 GMT

#75

All right, I'll recheck my equation. I suspect I may have confused the sample distribution with the sampling distribution.

mlspmatt

Canada404 Posts

November 26 2011 00:59 GMT

#76

Terran had a good run the last few months off the back of Blue Flame, 1-1-1, and heavy ghost usage. That's all been dealt with and the dust hasn't settled from the latest changes. Wait to see how the overall percentages pan out over the next couple of months, then revisit the maps.

Gryffes

United Kingdom763 Posts

November 26 2011 01:18 GMT

#77

Sample size in some cases is way too small, ~200 games played is probably a reasonable sample.
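One rough way to put a number on "reasonable sample" is to ask how wide the 95% margin is for a given number of games. The sketch below assumes the normal approximation, a true win rate near 50%, and z = 1.96; the function names are illustrative only.

```python
import math

Z95 = 1.96  # assumed z-value for a two-sided 95% interval


def half_width(n):
    """Approximate 95% margin (as a fraction) on a win rate from n games."""
    return Z95 * 0.5 / math.sqrt(n)


def games_needed(margin):
    """Smallest n whose approximate 95% margin is no wider than `margin`."""
    return math.ceil((Z95 * 0.5 / margin) ** 2)


for n in (16, 200, 648):
    print(f"{n} games -> margin of about {half_width(n):.1%}")

for margin in (0.10, 0.05, 0.03):
    print(f"margin of {margin:.0%} -> at least {games_needed(margin)} games")
```

Under these assumptions, about 200 games gives a margin of roughly 7 percentage points either way, so a 55% win rate on 200 games would still sit inside the noise.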

Snorkle

United States1648 Posts

November 26 2011 01:19 GMT

#78

Including Antiga and Bel-shir in this analysis was a mistake. Especially trying to draw conclusions about the effects of the map changes from one version to the next. You say there is no choice but to "take them as they are" but that is not true. The choice is to not include them because drawing conclusions from that small of a sample is idiotic. I just went to a "coin flipping" website flipped 10 coins and got 3 heads and 7 tails. Should I conclude that tails is extremely favored over heads or should I not conclude anything because the sample is far too small to mean anything?
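The coin example can be checked exactly. This is a small sketch, assuming a fair coin and a two-sided test that doubles the smaller tail; the function name is illustrative.

```python
from math import comb


def fair_two_sided_p(successes, n):
    """Exact two-sided p-value for `successes` out of n trials with p = 0.5.

    With p = 0.5 the binomial distribution is symmetric, so the two-sided
    value is twice the smaller tail, capped at 1.
    """
    lower = sum(comb(n, i) for i in range(0, successes + 1)) / 2**n
    upper = sum(comb(n, i) for i in range(successes, n + 1)) / 2**n
    return min(1.0, 2 * min(lower, upper))


# 3 heads in 10 flips
print(fair_two_sided_p(3, 10))  # 0.34375

# The 10-6 Bel'Shir record from the thread, for comparison
print(fair_two_sided_p(10, 16))
```

A fair coin produces a split at least as lopsided as 3-7 about a third of the time, which is why ten flips say nothing about the coin.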

emc

United States3088 Posts

November 26 2011 01:39 GMT

#79

If there isn't enough data, then you kind of wasted your time, because I look at Bel'Shir Beach, see how few results there are, and ignore that part of the post. Thanks for taking your time to do this. Some of this I already knew, but it should help some players (or hurt them by making them realize they can't ever win a game on a certain map, so they QQ and blame it on Blizzard instead of themselves).

Warble

137 Posts

November 26 2011 06:25 GMT

#80

What's the theory behind the formula you use for your standard errors? Does it have a name?

I checked your formula against mine for some dummy scenarios and your formula certainly overestimates the standard errors (a good thing), sometimes by a very wide margin (a bad thing). My formula tended to underestimate the standard errors, but they were much closer to the real value, especially when P = 50%.

However, considering the relatively small sample sizes, you don't actually need to use approximations (which are only reliable for massive sample sizes anyway). Your largest sample is only about 700 observations, so you can solve all of them directly using the binomial distribution. The downside is that you can't easily solve this using a basic calculator.

Here's a sample calculation comparing our methods:

Using your data for Shakuras Plateau 2.0 (from your post, not your raw data):

TvZ: 356-292 (54.94% ± 2.6% to Terran)

Using my formula, we get an se of 1.96% and a p-value of 0.012. This is significant at the 95% level.

Using your formula, we get an se of 2.64% and a p-value of 0.0614. This is not significant at the 95% level.

What is the real value? Solving using the binomial distribution directly, I obtained a 1-sided p-value of 0.006634, i.e. a 2-tailed p-value of 0.013, which is significant at the 95% level.

As you can see, my formula resulted in a much closer approximation, with a tendency for underestimation of the se. The outcome is that we now have sufficient evidence that Shakuras Plateau 2.0 is imbalanced in TvZ whereas using your formula we couldn't say that.
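For anyone who wants to rerun this comparison, here is a minimal sketch using only the Python standard library. It assumes a 50% null hypothesis and a two-sided normal test; the function names are illustrative, and small differences from the figures above can come from rounding or from how the tails are handled.

```python
from math import comb, erfc, sqrt


def normal_two_sided_p(wins, n, se):
    """Two-sided p-value from a normal approximation with the given se."""
    z = abs(wins / n - 0.5) / se
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


def exact_upper_tail(wins, n):
    """P(X >= wins) for X ~ Binomial(n, 0.5), summed with exact integers."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n


wins, losses = 356, 292  # Shakuras Plateau 2.0 TvZ, as quoted above
n = wins + losses

p_null = normal_two_sided_p(wins, n, 0.5 / sqrt(n))
p_min = normal_two_sided_p(wins, n, sqrt(min(wins, losses)) / n)
p_exact_one = exact_upper_tail(wins, n)

print(f"0.5/sqrt(n):     two-sided p = {p_null:.4f}")
print(f"sqrt(a)/n:       two-sided p = {p_min:.4f}")
print(f"exact binomial:  one-sided p = {p_exact_one:.6f}, "
      f"two-sided p = {2 * p_exact_one:.4f}")
```

The exact sum stays in integer arithmetic until the final division, so it avoids the floating-point underflow that multiplying 648 probabilities together would risk.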

I think you will find that the tighter se calculations will be useful for your cause. A quick glance through your results, using your numbers, shows that:

There is insufficient evidence of imbalance on any version of TDA, and thus it never needed updates.

There is insufficient evidence of imbalance on any version of Antiga, even if we ignore the small sample sizes, and thus it never needed any updates.

There is evidence that TvP is imbalanced on Shakuras Plateau 2.0 and none before, so the changes were actually detrimental.

There is evidence of imbalance on Lost Temple in TvZ, and on Shattered Temple 1.0 in TvZ, and on Shattered Temple 1.1 TvZ and ZvP.

So overall it looks like Shattered Temple 1.1 has been made more favourable for zerg, going from 70% in TvZ to just 58%, and from 55% in ZvP to 60%. So I agree with your assessment about the large effects of removing close spawns, and it certainly agrees with conventional wisdom that close spawns are bad for zerg. And we can see that close spawns are bad for zerg in both ZvP and ZvT.

And we ignore Belshir Beach due to the small sample size.

Imagine how much more you could say if you calculated your se's using the binomial distribution directly?

Just be careful that there are some tricky details involved. If your results don't exactly match mine, you should recheck your methodology.

I think these sorts of statistics are fun to look at. I just wish people didn't have such knee-jerk reactions to them as we've seen in many of the responses here, and I also wish people wouldn't use them as fuel for balance whines (cheese and whine seem to be SC2's primary industries). Overall I think it's a good effort and regardless of significance levels, it's interesting to see the ideas involved.

There are simple ways to account for skill levels, but the data preparation is tedious without cooperation from a source like the TLPD.

As for accounting for metagame changes, if it is possible to break down the data for each map into small chunks, we can get a better idea of how gameplay on a particular map has developed over time. For example, with Shakuras Plateau 2.0, you have about 600 observations for each matchup. If there was a way to separate the data into, say, 6 parts based on when they were played, each with 100 observations per matchup, it would be easier to get a clearer idea of how gameplay has evolved despite there being no changes to the map. I think the TLPD already has the facilities to do this, although it does require some work. Then it is simply a matter of choosing which time periods to break the data over. It can even be done to get an idea of the effects of balance changes. One interesting outcome to look out for is a sudden change in the matchup statistics in the later life of a map before it is updated. For example, a map that looks TvZ favoured overall could be TvZ favoured in the first 5 periods and then swing towards a slight ZvT favour in the final period before it is updated to a new version. This would indicate that the update occurred at an inopportune time.

And, yes, this would actually involve deliberately reducing the sample sizes in your analysis. Oh, the horror. :-P
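The time-period breakdown described above could be sketched as follows. This assumes the results for one matchup on one map version are already sorted by date and coded as 1 for a win and 0 for a loss; the function name is hypothetical and the sequence is a made-up placeholder, not TLPD data.

```python
def win_rates_by_period(results, periods):
    """Split chronologically ordered results into `periods` consecutive,
    nearly equal chunks and return (games, win rate) for each chunk."""
    n = len(results)
    summary = []
    for i in range(periods):
        start = i * n // periods
        end = (i + 1) * n // periods
        chunk = results[start:end]
        if chunk:  # skip empty chunks when there are more periods than games
            summary.append((len(chunk), sum(chunk) / len(chunk)))
    return summary


# Made-up placeholder results, oldest game first
games = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0]

for games_played, rate in win_rates_by_period(games, 3):
    print(f"{games_played} games, win rate {rate:.0%}")
```

Each chunk has a wider error margin than the full sample, so a swing between periods needs the same significance check as the overall figure.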
