The gist of the presentation is that they use a lot of methods; no single method is sufficient in a vacuum.
Main methods are:
Community Feedback - the community is passionate and analyses the game to death, and discovers things we never thought of. It's a great resource, but sometimes the loudest voice wins the debate, and bandwagoning can affect discussion.
Pro tourneys, replays - More concerned with gameplay than results at pro level because of skill level differences. They get feedback from top players; Kim talks about Maka highlighting void ray timings, which was one reason for the void ray change. The objectivity of pros is sometimes difficult to judge because it's hard to tell if wins are due to skill difference or race imbalance, and pros playing one race can be biased.
One interesting spot was about pro strats trickling down to the rest of bnet: they saw a massive explosion of 5 rax reaper in TvZ after Morrow beat Idra at IEM.
Make Combat - A kind of turbo-charged unit tester where they can also manipulate unit pathing, target acquisition priority etc. Good tool, but it doesn't always take the metagame into account, lends itself to theorycrafting, and doesn't always reflect how match ups unfold.
Battle.net stats - Their skill-adjusted race match up statistics, which were brought up in the developers corner post. They don't start to really panic about race imbalance until they see 60 - 40 numbers; 55 - 45 is a concern but within an acceptable range.
Here's their formula for their race match ups adjusted for skill:
They talk a little bit about how Korea leads the pack and how strats tend to trickle over to other servers, so they pay special attention to top level balance in Korea. They bring up how P > T everywhere except Diamond Korea, possibly because top players are better at stim kiting.
This leads into how their biggest concern right now is PvT, specifically Terran being OP early (stim timing) and Protoss being OP late game (upgraded HT). They specifically say that while they could technically get 50/50 win percentages with the match up like this, a MU where one race has to win in the first 12 minutes isn't good enough.
I've paraphrased a lot of the Q&A - don't take this as the be all and end all, because some answers sound slightly different out of context despite my best efforts to stay true to the spirit of what was said.
Overall the presentation and Q&A made me pretty confident that these guys know what they're doing, and have their finger on the pulse of what's going on. Apologies if this is infested with typos, was trying to type and watch at the same time.
Blizzard's recap - general overview of panel, doesn't really touch Q&A - link
Following the StarCraft II Art Panel, attendees were treated to an inside look at balancing StarCraft II’s multiplayer game with Game Director Dustin Browder, Senior Designer Josh Menke, and Associate Game Balance Designers David Kim and Matt Cooper.
The StarCraft II developers feel that it’s important to take a look at the various tools that are employed in defining balance. At first, each one of these tools looks like it could be the one answer you need -- but it becomes clear over time that no single tool provides the perfect solution to balance. Instead, it takes multiple tools and a complete understanding of what those tools tell the designers. So what tools do the developers use?
Player Feedback
Player feedback is perhaps the best tool available to the development team, as it allows for many voices to be heard across a variety of skill levels and experiences. This method also represents the largest pool of players. While data is a great tool, raw stats don’t qualify what players are experiencing from their perspectives. By reading the forums and getting feedback from the community team, the developers can gain insight into how the community is playing the game, what units they're using, and what difficulties or successes they're having.
There are drawbacks to utilizing player feedback exclusively. Sometimes the loudest of voices aren't portraying their experiences accurately, and the many can easily drown out a single voice that has different, yet important information the development team needs to make balancing decisions.
Pro Feedback
Pro players represent another important balancing tool to the development team. These players have a high skill level and understand the minute details of the game. They are also a great resource for critical feedback. On the downside, these players are generally very focused on one particular race and represent a very small subset of the community. When taking these players into account, it’s important to note that they may not know exactly why they lost a match -- whether it was due to their own error or an actual imbalance to the race, ability, or unit they are using.
Tournaments
Tournaments can be a great resource for observing games played at a very high skill level. When watching these matches, however, it’s important to look at the games individually and not just the end results. A talented player like Fruit Dealer may just be so good that he was going to win no matter what race he played. However, each game can give some insight into where the holes within the balance might lie. Players in these tournaments are generally very good at finding these holes and taking advantage of them, and it’s the development team’s job to keep an eye out and determine if something needs to be changed. The weakness in looking only at tournaments lies in knowing that there’s no way to be certain that matches are equal. All it really takes is a single poor performance to keep a top player from progressing.
Play the Games You Make
There’s no better way to see what players are experiencing firsthand than to play the game yourself. It’s a good way to get into the trenches, analyze gameplay, and find out what’s fun, what’s not fun, what tactics work and don’t work, and so on. However, while the development team consists of players of every skill level, the team is only so large -- and even with additional feedback from within the company, it can sometimes take time before the next new strategy gets to our team.
Spreadsheets
Spreadsheets are a great tool for looking at straight damage numbers, how fast or slow units are made, how often, what combinations of units are used, unit costs, and more. What spreadsheets don’t tell the developers is the how or why. While designers can take a look at the sizes of armies and make adjustments to building times (such as what was done during the beta with terrans), spreadsheets can’t really take into account pathing, unit size, random target acquisition, and other factors which only occur in a real game.
Make Combat
Make Combat is a great in-house simulation tool that allows the development team to run various scenarios with units to see how they stack up against each other, but running one simulation isn’t enough. Simulations need to be run multiple times before any sort of pattern begins to take shape -- if there’s even a pattern to be seen. Unlike a spreadsheet, Make Combat can take a look at unit pathing and can even allow micro to be employed if the developers want to drill down a little bit more. What the simulation doesn’t do well is take into account all the myriad combinations of units or terrain. While it’s a handy tool, it’s only one of many, and results can’t always be taken at face value.
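To illustrate the "run it many times" point, here is a toy sketch in Python. It is not Make Combat (that is an in-house tool and nothing about its internals was shown); `simulate_fight`, `estimate_win_rate`, and every unit number below are made up purely for illustration.

```python
import random


def simulate_fight(rng, hp_a=100.0, dmg_a=10.0, hp_b=100.0, dmg_b=10.0, spread=0.3):
    """One toy fight: both sides trade hits with random damage until one dies.

    Returns True if side A wins. All stats are hypothetical, not real unit values.
    """
    while True:
        # Each hit deals base damage scaled by a random factor in [1-spread, 1+spread]
        hp_b -= dmg_a * rng.uniform(1 - spread, 1 + spread)
        hp_a -= dmg_b * rng.uniform(1 - spread, 1 + spread)
        if hp_a <= 0 or hp_b <= 0:
            # If both drop in the same round, the side with more health left wins
            return hp_a > hp_b


def estimate_win_rate(trials, seed=0, **fight_kwargs):
    """Repeat the fight many times and return side A's share of wins."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    wins = sum(simulate_fight(rng, **fight_kwargs) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    for trials in (1, 10, 100, 5000):
        print(trials, estimate_win_rate(trials))
```

A single run can only ever report 0% or 100%, which says nothing about a mirror match; only with many runs does the estimate settle near the even split you would expect for two identical units.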
Battle.net Stats
Battle.net provides information on millions of games: who’s playing, what they’re playing, how people are progressing through the ladder, and more. It also allows development to look at the win/loss ratios between the races.
Matchmaking within the system, however, intentionally does not account for win/loss and looks purely at player skill -- and any existing race imbalance gets worked into that equation. Adjusted win percentage simultaneously considers both player skill and race balance. After each match, estimates of player skill and adjusted race win percentages are updated relative to the expected outcome of the match. In other words, if what happened was exactly what was expected, then nothing changes. If the system is surprised, then changes may be in order.
From there, the developers can see the win/loss ratio of the various races within each league. Generally these tend to be relatively even across the board, though there can be cause for concern if the percentage of win/loss between the races skews toward 60%/40%. When looking at these percentages, it’s important to note that they can shift very quickly -- in as short as 36 to 48 hours -- based on a change in the metagame.
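The "update relative to the expected outcome" idea can be sketched with an Elo-style curve. This is not Blizzard's actual formula, just a minimal illustration of the principle under my own assumptions: the logistic curve, the 400-point scale, the step sizes `k_skill` and `k_race`, and all function names are hypothetical.

```python
def expected_score(skill_a, skill_b, race_edge=0.0, scale=400.0):
    """Expected win probability for player A on an Elo-style logistic curve.

    race_edge is a hypothetical rating bonus for A's race in this match up.
    """
    return 1.0 / (1.0 + 10 ** (-(skill_a - skill_b + race_edge) / scale))


def update(skill_a, skill_b, race_edge, a_won, k_skill=16.0, k_race=1.0):
    """Nudge both skill estimates and the race edge by how surprising the result was."""
    expected = expected_score(skill_a, skill_b, race_edge)
    surprise = (1.0 if a_won else 0.0) - expected  # near 0 when the favourite wins
    return (
        skill_a + k_skill * surprise,
        skill_b - k_skill * surprise,
        race_edge + k_race * surprise,
    )


def adjusted_win_pct(race_edge):
    """Win percentage the race edge implies between two equally skilled players."""
    return 100.0 * expected_score(0.0, 0.0, race_edge)


if __name__ == "__main__":
    # Equal players, A wins: a moderate surprise, so moderate adjustments
    print(update(1500.0, 1500.0, 0.0, a_won=True))
    # Big underdog wins: a large surprise, so larger adjustments
    print(update(1300.0, 1700.0, 0.0, a_won=True))
    print(adjusted_win_pct(0.0))
```

Since a match is either won or lost, the surprise in this toy version is never exactly zero; it just gets very small when the heavy favourite wins, which is the "nothing changes" case in spirit. The skill-adjusted percentage is then what the race edge alone would give two equally skilled players.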
Korea
While percentages between the ladders may look fairly balanced in other regions, the team also looks to Korea as a global leader in developing new strategies and setting metagame trends.
Community
It’s important to take all of these various tools into account when looking at balance. For example: When talking with the community, a common perception is that marauders are too powerful and their Stimpacks need to be nerfed. When running scenarios in Make Combat, it appears that marauder Stim isn’t overpowered and the terrans end up nearly evenly matched with zerg. The developers can see that marine Stimpacks are very powerful; however, it may be that marauders are acting as shields for the marines behind them. So is it that marauder health is too high? Or are marine Stimpacks too powerful? We still don't know -- but we’re always looking for answers to questions like these.
Pros on PvT
When asking the pros about protoss vs. terran matchups, there are conflicting opinions and a split in whether these players think one or the other is the more powerful race. When the pros aren’t sure, then the development team needs to look deeper to see if perhaps there’s a more fundamental issue than game balance to deal with. In the developers’ personal experiences playing the game, the terran tend to be strong at the start of a match, but the protoss are more powerful toward the end, which could point to a design issue. However, as you’ve realized by now, it’s impossible to make such a determination based on these tools alone.
Future Balance
The designers employ all of these tools and more on a daily basis to determine future balance changes. Currently, the focus is on making terran vs. protoss matchups more fun and analyzing Stim vs. Psi Storm balance. But today’s problems will inevitably be solved, and others will invariably pop up -- and the development team is dedicated to investigating, analyzing, and balancing for the long haul.
Q: Offline play?
A: Maybe, people might hack achieves.
Q: Mutalisks are hard countered by thor and marine, give them +3 to light
A: They're mobile, we're happy with the muta
Q: Custom keybinds?
A: Didn't want to delay release for it, it's in testing in internal builds, it's really complex to do from a UI standpoint, no timeline but it's pretty far along
Q: Premium maps, marketplace?
A: Heart of the Swarm at the earliest
Q: 10mb map limits?
A: We're chewing up storage really quickly but it's a really hot topic at Blizz, we're looking at it
Q: Weird question about how the balance team spots and fixes cheese and rushes
A: Actually the easiest things to fix most of the time, more concerned about more complex dynamics. Want rushing to be viable
Q: Give me PvZ advice
A: Build Colossi
Q: More stats and data accessible from score screen?
A: We want to improve this stuff a lot, maybe in a patch but probably HotS
Q: (Kind of incoherent, I think the main theme is) "Are Zerg designed to need to out macro T and P?"
A: Yes, specifically not supposed to be strong at pushing at Tier 2, need to macro up and get to tier 3
Q: Is Sc2 going to be like WoW, with a balance patch that makes one race overpowered each Tuesday? Roaches run right past my 10 carriers and kill my nexus because they have so much hp now (lol?)
A: We are going to slow down patching, especially compared to beta. The goal is to stabilise and slow down as much as possible, but we're not going to be chicken about patching if we think there are issues. PvT is our biggest concern at the moment. Re: WoW comparison, WoW has issues Sc2 doesn't even begin to have; Sc2 is specifically on a smaller scale (3 races, 12 - 14 units) to keep balance tight
Current balance: We feel it's pretty tight. The roach range change has made people feel Z has gone from UP to OP, which shows how tight balance is and how small changes can make big impacts. We're going to be cautious with changes and try to only make small tweaks
Q: (ExcaliburZ, author of Ladder analysis thread, esports forum mvp) - What besides hidden MMR influences player ranking and match making?
A: What you've already posted is close enough to make pretty good guesses at what's going on. Bonus points don't have any role in MM; different leagues and skill of division come into play. Your analysis is pretty close and accurate but not perfect. (I think I butchered this transcription a bit, didn't quite understand the answer)
Masters League, Grand Masters League - Currently you can't necessarily trust comparisons of rankings across diamond league (ie different division rankings); in the new leagues you will be able to do this much more effectively
(I think what he's trying to say is that in ML / GML points will be much more accurate as an indication of relative rank, unlike currently where you see big discrepancies between Sc2Ranks and the Blizzard top 200)
Q: Was sitting behind you in Huk / Loner, when you watch games are you watching as the balance team or as fans?
A: Browder: I try to watch as a fan unless something is obviously really broken, then I go to developer mode
Cooper: We're always looking for stuff that's potentially broken, but I love watching as a fan
Kim: I watch a lot of pro games, pros send me replays, I'm not surprised by tactics, I'm watching to confirm what I already suspect
Q: Is the speed at which Toss get Warp Gates where you want it to be? Toss can force a macro game very early
A: We want every race to feel insanely overpowered (but still balanced). WGs have been nerfed a lot already, we think they're ok
Cooper: We think each race has broken mechanics (eg Zerg tech switching late game) but overall we think it's in a good spot
Kim: We're seeing fewer Warp Gate rushes at pro level - people aren't used to it from SCBW. Pros are beginning to deal with it, we're hoping it trickles down into lower levels
Q: Do you think HSM is underused?
A: Kim: It was causing a lot of trouble in the late game - Terrans turtle hard then mass ravens and trade mana for enemy resources, so we had to nerf it. This wasn't the gameplay we were looking for
Not every spell has to be amazing in every situation; as long as abilities have applications in specific situations where they're powerful, it's ok (eg HSM vs Mutalisk)
Browder: It's definitely underused, but we had to nerf it because of what Kim said. Not going to buff it in the immediate future. Going to wait and see what the community does with it; the community often surprises us by utilising underused abilities.
Q: Life cycle of a patch - how much leeway is given to pros to develop new strats after patching?
A: Plan to build a tourney server so that players / organisers can choose which patch they want to use - not ready yet, currently with GSL we tried to patch at times which would cause the least disruption eg right after prelims (wanted to get it out before prelims)
Makes the point that there's always major events on so the tourney server is a priority
Q: Roach / reaper changes killed off reaper vs Z. Do you think this was too hasty and didn't give players enough time to come up with counter strats? Fruit Dealer showed us Z were all doing it wrong
A: Main focus is on competitive level, but we do care about team games and other levels, Reaper Ling was completely and utterly broken in 2v2. On the flip side for 1v1 Terran had too many openers that came out before they could be scouted so we wanted to limit T openers, 2v2 balance was a good excuse to nerf it.
Browder: Saw a lot of reapers in the tourney today. Going to wait and see if Reapers become useless or not, we think it might still have some uses
Q: Thor vs Ultra - Thor is superior, (ultra) can't attack air, Thor beats ultra 1v1, do you think it's imba?
A: Thor is supposed to win, ultra can splash, Thor is best against clumped light air, main role for ultra is to kill large groups of small ground units, each unit is supposed to have a unique role, they're not supposed to match up and have equivalents across races, Zerg units are generally weaker due to macro mechanics in late game
Q: Boxer might have done better if he had spawned cross locations, do you take spawn locations in maps into account in balance stats.
A: Map balance is a big issue, don't think we take spawn location into account but it's a good point, we watch to see if race match ups become lopsided on specific maps, that's why we yanked DO / Kulas.
Location based stats might be drilling too deep but if we think there might be an issue we can access those stats