Introduction
This post is a follow-up to the original ladder analysis post and goes into further detail about the system. Please note that much of the content here is more speculative, and if a detail here is wrong it should not reflect poorly on the original analysis. I will be delving deeper into the mathematical underpinnings, though it should not be excessively complex and I will try to make it easy to follow.
Overview
To start with, we assumed that Blizzard used a system quite similar to their WoW Arena matchmaking system, albeit with refinements. The Arena system uses a Bayesian inference model to create its ladder and do its matchmaking. What this means in essence is that the rating used to represent your skill is easily updated after each match. For more details, see: http://en.wikipedia.org/wiki/Bayesian_analysis
In conjunction with this, the MMR is actually only one part of the skill probability distribution. Blizzard also uses an “uncertainty” factor. That is, when you first start in Arena there is a lot of uncertainty in your rating. As you play more games, that uncertainty decreases and the system is more “confident” in the rating it has assigned to you. I will be referring to this uncertainty factor as sigma, and it is the inverse of the system's confidence. Together, the MMR (the center) and sigma (the width) form a bell curve, also known as a Gaussian, or normal, distribution. For more details, see: http://en.wikipedia.org/wiki/Gaussian_distribution . The curve represents a couple of related ideas: the range in which your skill may truly fall, as well as the fact that you do not play at exactly the same skill level every game. A more consistent player would have a narrower curve, for example.
This class of ladder and matchmaking is not new. The first system using a similar method is the Glicko system, used to rank chess players. It is arguably better than the famous Elo system, which encourages some strange behavior (e.g. in many cases it is better to take a draw in Elo than to risk a loss). Another well-known system is Microsoft TrueSkill, used for matchmaking and ranking in Xbox 360 games on Xbox Live, as well as PC games such as Dawn of War 2.
The published data on TrueSkill gives a glimpse at the underpinnings of a modern Bayesian ranking system designed for videogames. Blizzard’s implementations are obviously different from TrueSkill, though we can infer much from what we know about TrueSkill, and what we know about the SC2 ladder.
For a layman’s primer on TrueSkill: http://research.microsoft.com/en-us/projects/trueskill/details.aspx
For an in-depth description of TrueSkill: http://research.microsoft.com/apps/pubs/default.aspx?id=67956
Matchmaking
The short version of what the links above show is that it is possible (and computationally efficient) to take the MMR and uncertainty factor (also known as sigma, or standard deviation) of both players and compute a win probability from them. The MMR and sigma form a bell curve per player. The two bell curves can be combined into a 3D probability distribution, forming a shape like this:
It may help to think of it as combining the two 2D curves perpendicularly and forming this 3D shape. This shape is centered on a point in the (x,y) plane, where x represents player 1’s skill, and y represents the skill of player 2. Intuitively, the best matches will be between ratings where x=y. Thus, Blizzard attempts to match players whose ratings are as close as possible. Looking at this same shape top-down (try to visualize it as a topographical map):
Run a line along x=y, and you will split the shape into 2 pieces. If you sum the volume under the shape on each side of this split and compare their relative sizes, you will get each player's probability of victory. If the shape is contained wholly within one side of the graph then clearly that player is overwhelmingly favored by the system (Note: this is NOT the same thing as the “Favored” display on the loading screen!). Also note that the shape does not need to be circular when viewed top-down. If players have different confidence values it will look like an ellipse.
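As a rough sketch of the split described above: if each player's skill is treated as an independent bell curve, the volume on one side of the x=y line has a simple closed form. The function name and the sample numbers are mine, chosen purely for illustration. This is the textbook result for two normal distributions, not a confirmed description of what Blizzard computes, and it leaves out the extra per-game performance noise that TrueSkill adds.

```python
import math

def win_probability(mmr1, sigma1, mmr2, sigma2):
    """P(skill1 > skill2) for two independent Gaussian skill beliefs."""
    # The difference of two independent normals is itself normal, with
    # mean mmr1 - mmr2 and variance sigma1**2 + sigma2**2.
    spread = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
    z = (mmr1 - mmr2) / spread
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Equal MMRs: the shape is centered on x=y, so it splits evenly,
# even though the two sigmas differ (the ellipse case).
even = win_probability(2500, 100, 2500, 50)

# A 100-point MMR edge shifts most of the volume to player 1's side.
favored = win_probability(2600, 100, 2500, 50)

print(even, favored)
```

Note that when the MMRs are equal the result is an even split regardless of the sigmas, which matches the picture of an ellipse centered on the x=y line.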
Note that this figure is taken from a TrueSkill presentation, and is copyright Microsoft. TrueSkill incorporates the possibility of a draw (a narrow margin around x=y within which the game counts as drawn). More intuitively, that margin can be thought of as the “matchmaking sweet spot”, and something similar is likely used by SC2’s ladder to provide the system some wiggle room in matchmaking.
After a match finishes, the system needs to update the MMR and sigma for both players. Displayed rating will be discussed later in this post. Whenever a match finishes the winner’s MMR increases and the loser’s decreases. More interesting is what happens to the sigmas. If the match finished as expected with the MMR favored player winning (and remember, the loading screen “favored” display is NOT this) then both players' sigmas will decrease. That is, the system gains confidence in the ratings it has assigned to the players. If the match finishes in an upset and both players' sigmas are small, then the sigmas for both players will increase as the system thinks it may have an incorrect rating assigned to both. The change in sigma scales based upon the difference in MMR and the difference in sigmas. That is, losing to someone close to your own rank will not change your sigma too much (though it will over the course of several games).
If a lower-MMR player wins, then what happens depends a lot more on the precise equations Blizzard is using. If a player's sigma is large in an upset (whether he's the winner or loser) it can decrease. That is because, given the right MMR and sigma values, it's possible in theory for the system to learn about that player's skill and rate him more accurately. If a player's sigma is small, however, it can become larger after an upset if that upset was truly unexpected.
To summarize: combining the MMR and uncertainty factor of a player creates a curve. Take two of these curves and form a 3D shape. This shape shows the probability of victory when split along x=y. Matchmaking tries to have x=y, but will expand the search if no match is found quickly.
Promotion
As initially theorized, promotion requires your MMR to be above a certain league threshold. However, because MMR changes greatly after each match and the opponent variation is so wide, often spanning multiple leagues, the system requires a particular degree of confidence before it allows promotion. Our initial theory assumed that sigma just needed to be small enough to allow promotion, but it's been confirmed that sigma never gets this small. Instead, the system gets that confidence from a moving average of your MMR. Here's an example:
MMR is erratic. A moving average seeks to smooth out the rapidly changing data points over time by evaluating your progress over X number of games. As we previously estimated, the system doesn't use your full match history because if it did, you would eventually get stuck in a league. Once your moving average crosses a particular league threshold, that's when you'll get promoted.
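To make the lag concrete, here is a minimal sketch of a moving average over the last X games. The window size, the threshold, and the MMR history are invented numbers, and the function name is my own. We do not know the real window length or whether Blizzard uses a simple average like this one.

```python
from collections import deque

def games_until_promotion(mmr_history, threshold, window):
    """Return the 1-based game number at which the moving average of the
    last `window` MMR values first reaches `threshold`, or None."""
    recent = deque(maxlen=window)  # old games fall out automatically
    for game_number, mmr in enumerate(mmr_history, start=1):
        recent.append(mmr)
        average = sum(recent) / len(recent)
        if average >= threshold:
            return game_number
    return None

# Hypothetical player on a win streak: MMR climbs quickly, then levels off.
history = [2000, 2100, 2200, 2300, 2400, 2400, 2400, 2400]

# Game at which the raw MMR first reaches the threshold.
first_above = next(i for i, m in enumerate(history, start=1) if m >= 2300)

# Game at which the 4-game moving average catches up.
promoted_at = games_until_promotion(history, threshold=2300, window=4)

print(first_above, promoted_at)  # 4 6
```

In this made-up history the raw MMR reaches the threshold at game 4, but the average does not get there until game 6.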
Players like CauthonLuck and Ret who had obscene win ratios had their MMR data points skyrocket. However, the moving average lags behind. In the cases of those players, it will take much longer for the moving average to reach that required threshold. This is why players like IdrA who were affected by this problem have decided to intentionally throw games in order to get promoted, because it allows the moving average to catch up more quickly.
Possibly related are the players who aren't getting promoted or demoted properly despite a high likelihood that their moving average has crossed the league threshold. Blizzard has said that this is indeed a bug and will be fixed by moving the affected players to new divisions.
Displayed Rating
Ok, how does all of this tie into displayed rating and the whole “favored” deal? If you remember back to WoW, ratings changed based on a direct comparison of your displayed rating to the other team’s MMR. So if your current rating was 500 and you were playing people with MMRs of 2000, your rating would jump significantly after every win because of the wide disparity. Now, we’ve identified that on the loading screen quite often players are seeing the other person as favored and the opponent (who is nominally “favored”) also sees his opponent as favored! How can this be? The theory put forth here is the system is again comparing your displayed rating to your opponent’s hidden MMR.
The reason for this is so that the system brings you toward your MMR more quickly. kzn explains:
On August 08 2010 14:30 kzn wrote:
How it works was like this: Say you've got a MMR of 2500, and you start a new team. It starts at 0 rating, but the matchmaking system will match you with other players of MMR 2500. If you lose a game, your team rating would not change at all. If you won, it would increase by 47 (a hard cap that was in place at least when I played). This was not explained as arising due to an interaction between the team rating and the opponent's MMR, however - it was explained as the system trying to get your team's rating as close as possible to your team's MMR rapidly.
Therefore, a corollary here is that when determining rating increase, the hidden threshold value for your league is added to your displayed rating, then compared to your opponent’s MMR, for purposes of computing the gain/loss to your displayed rating.
Example: ExcaliburZ and I play a game. His MMR: 2600, sigma: 100, displayed rating: 300. My MMR: 2500, sigma: 50, rating: 150. Diamond’s MMR threshold: 2300. Excal wins because he rules. What happens?
- His MMR will increase
- My MMR will decrease
- Both of our sigmas will decrease
- His rating will increase. How? By comparing my MMR (2500) against his rating + Diamond’s MMR threshold: 300 + 2300 = 2600. His gain is thus based on 2600 vs my MMR of 2500.
- My rating will decrease. In the same way: his MMR: 2600. My rating + threshold: 150 + 2300. Thus I lose points proportionally to 2450 vs 2600.
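The comparison in the example above can be written out as a small sketch. It only computes the two numbers being compared, because the actual point formula is unknown. The function name is mine, and the 2300 threshold is the illustrative value from the example, not a known constant.

```python
def rating_comparison(displayed_rating, league_threshold, opponent_mmr):
    """Return (effective rating, gap to the opponent's MMR) under the
    theory that displayed rating + league threshold is what gets compared."""
    effective = displayed_rating + league_threshold
    return effective, effective - opponent_mmr

DIAMOND_THRESHOLD = 2300  # illustrative value from the example only

# ExcaliburZ: rating 300, facing my MMR of 2500.
excal = rating_comparison(300, DIAMOND_THRESHOLD, opponent_mmr=2500)

# Me: rating 150, facing his MMR of 2600.
me = rating_comparison(150, DIAMOND_THRESHOLD, opponent_mmr=2600)

print(excal)  # (2600, 100)
print(me)     # (2450, -150)
```

Under this theory a negative gap means your displayed rating is still below where your opponents' MMRs sit, which is the case where you would expect to see the opponent shown as favored.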
Conclusions
SC2 uses a Bayesian inference system for its skill determination which forms an MMR and a confidence value for each player. These form a Gaussian distribution useful in determining win probability. Promotions/demotions occur when a player exceeds/drops below a threshold with sufficient confidence. Displayed rating changes according to a comparison between the rating itself, offset by the hidden league threshold, and the opponent's hidden MMR.
More clarifications from Vanick:
On August 08 2010 11:33 vanick wrote:
To be clear, the player's skill is never pinpointed. The sigma is never 0. All players vary in their performance from game to game and over time as their skill increases (or decreases!).
I left a point out in my writeup that I probably should have included. TrueSkill, and likely SC2's ladder, have a factor based off the time since your last game that increases the player's uncertainty level (sigma) by an amount related to that. Even if you're playing games back to back this factor will have a minimum value that will still increase sigma. This allows the system to adapt to a player whose skill increases over time.
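A minimal sketch of the kind of adjustment vanick describes, assuming the extra uncertainty is added the way independent variances normally add. The constants `min_tau` and `tau_per_day`, and the linear dependence on idle time, are made up for illustration. Nothing here is a known Blizzard or TrueSkill parameter value.

```python
import math

def inflate_sigma(sigma, days_idle, min_tau=5.0, tau_per_day=2.0):
    """Widen a player's uncertainty before their next game.

    min_tau and tau_per_day are hypothetical constants: tau grows with
    idle time but never drops below min_tau, even for back-to-back games.
    """
    tau = max(min_tau, tau_per_day * days_idle)
    # Independent variances add, so the new sigma is sqrt(sigma^2 + tau^2).
    return math.sqrt(sigma ** 2 + tau ** 2)

back_to_back = inflate_sigma(50.0, days_idle=0)
after_a_month = inflate_sigma(50.0, days_idle=30)

print(back_to_back, after_a_month)
```

Because the minimum always applies, sigma can shrink toward a floor as you play but never reaches 0, which is consistent with the point that skill is never pinpointed.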
Questions
Some of these have answers. Some are open questions. You can add on; I will answer them as best I can.
Q: So how do bonus points affect the display rating changes? If the displayed rating change is based upon the comparison of the opponent's MMR with the player's displayed rating + the player's league cutoff, then wouldn't bonus points inflate the displayed rating and cause problems?
A: I'm not sure how they account for this. One possibility is they keep track of bonus points that make up your displayed rating, and ignore them when performing the calculation in the back-end.
Excal: It seems more likely that the bonus pool is only used to increase the displayed rating for division ranking purposes and ignored in back-end calculation because the bonus pool increases at the same rate for all players. This introduces a constant that is easily discarded when assessing actual skill within the system. Furthermore, if bonus points were considered in the process of point calculation, it would present an unfair advantage for players who have not yet used up their bonus pool (because their rating is therefore inflated giving them more to lose).
Q: Would it take longer to get promoted if you've played lots of games? Assume someone has played a large number of games (say 100 with a 50% win/loss ratio). If he were to start winning 70% of his games, would it be harder for him to get promoted than someone with similar percentages but fewer games played?
A: It would take longer, yes. The moving average trails behind sharp increases in skill.
Antiquated or Incorrect Information for Archival Purposes
Promotion
At this point we have established how matchmaking works, and how the skill belief system is updated (through MMR and sigma). How does promotion work? Our current theory is that it uses a checkpoint system, in which after a certain number of games the player’s MMR is checked and if it is above or below a certain threshold it will promote/demote that player. That may or may not exist still, based upon evidence people have provided in the release version of SC2. In any case, this section will attempt to describe the thresholds further.
In prior descriptions, it has been said that in order to be promoted, your MMR must be above the threshold for that league. That is still true. What we propose here is that your MMR must be above that threshold with 99% confidence. What does this mean? To determine this with 99% confidence, your MMR – (3 standard deviations (sigma)) must be above the threshold. For example:
Your MMR: 2500 Your sigma: 100
Diamond Threshold: 2300
Since (2500 – 3*100) = 2200, this value is less than the threshold even though your MMR is higher. In order to be promoted you would need to increase your MMR, decrease your sigma, or both. This notably creates the situation where you can be highly ranked in Platinum, but still be better than players in Diamond. In addition, if you are a borderline player who is promoted to Diamond, you would need to lose a lot of MMR to be demoted again. That is, if your MMR is 2450 and your sigma is 50 and you are in Diamond, you would need to drop to an MMR of 2150 (assuming sigma remains constant) before demotion would occur.
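The older theory described here can be sketched as a pair of checks. The function names and the exact handling of the boundary (>= for promotion, < for demotion) are my own choices, and the threshold and ratings are the illustrative numbers from the text. Keep in mind this section is archived because the theory was later superseded.

```python
def can_promote(mmr, sigma, threshold, k=3):
    """Old theory: promote only if MMR minus k sigmas clears the threshold."""
    return mmr - k * sigma >= threshold

def should_demote(mmr, sigma, threshold, k=3):
    """Old theory, mirrored: demote only if MMR plus k sigmas is below it."""
    return mmr + k * sigma < threshold

DIAMOND_THRESHOLD = 2300  # illustrative value from the example only

# MMR 2500, sigma 100: 2500 - 300 = 2200, so no promotion yet.
print(can_promote(2500, 100, DIAMOND_THRESHOLD))    # False

# Borderline player: 2450 - 150 = 2300, just enough to be promoted.
print(can_promote(2450, 50, DIAMOND_THRESHOLD))     # True

# The same player stays in Diamond until MMR falls below 2150.
print(should_demote(2200, 50, DIAMOND_THRESHOLD))   # False
print(should_demote(2149, 50, DIAMOND_THRESHOLD))   # True
```

The gap between the two checks is what would let a player sit in Diamond with a lower MMR than someone still in Platinum.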
The above description is more theoretical than the matchmaking description, for which we have more direct evidence. However, it goes some way toward describing the behavior seen with promotion/demotion, and given such a system the checkpoint review system as originally conceived may be incorrect.
Edited as the system has changed since this post
Q: So what’s the deal with people stuck in Platinum who can’t get promoted to Diamond despite clearly belonging there?
A: Short answer? It’s a bug. Longer answer: a lot of people have suggested that the system requires you to lose in order to build its confidence factor. This is almost certainly incorrect. The system in theory learns enough about you from your wins to promote you. Intuitively, if your record is 60-5 against Diamond players, you ought to be in Diamond. The TrueSkill system can determine this, and I would bet dollars to donuts that Blizzard’s system can too, as designed anyway. Implementation may have introduced bugs that certain players hit under certain conditions. We don’t have enough evidence to flat out state that the system requires you to lose. Losing may be a workaround for the bug, however.
One possible explanation is that the moving average lags so far behind that more games are required in order to cross the promotion threshold. It's also possible that the bug prevents the moving average from changing.
EDIT 10/25/2010: Made crucial updates to several sections in light of new information acquired from Blizzcon 2010.
EDIT 8/11/2010: Made an important clarification to the Matchmaking section.
EDIT 8/10/2010: Added a third question related to promotion opportunity.
EDIT 8/9/2010: Added extra information to the first question about the circumstances under which sigma may increase or decrease. Also removed a misleading sentence regarding ideal matches.
EDIT 8/7/2010: Modified the second question to make it less vague, and removed incorrect information from the Displayed Ratings section.
_________
Thanks to myself for proofreading, editing, and analytical input (hehhh self-credit).