SC2 Ladder Analysis: Part 2

Excalibur_Z

United States, 12235 Posts

August 07 2010 23:59 GMT

Following my previous ladder analysis post (http://www.teamliquid.net/forum/viewmessage.php?topic_id=118212), Vanick has developed a more in-depth theory regarding the inner workings of the SC2 ladder system.

Introduction

This post is a follow-up to the original ladder analysis post and goes into further detail regarding the system. Please note that much of the content here is speculative; if a detail here is wrong, it should not reflect poorly on the original analysis. I will be delving deeper into the mathematical underpinnings, but it should not be excessively complex, and I will try to keep it easy to follow.

Overview

To start with, we assumed that Blizzard used a system quite similar to their WoW Arena matchmaking system, albeit with refinements. The Arena system uses a Bayesian inference model for its ladder and matchmaking. In essence, this means the rating that represents your skill is treated as a belief that is updated after each match based on the result. For more details, see: http://en.wikipedia.org/wiki/Bayesian_analysis

In addition, the MMR is only one part of the skill probability distribution: Blizzard also uses an “uncertainty” factor. When you first start in Arena, there is a lot of uncertainty in your rating; as you play more games, that uncertainty decreases and the system becomes more “confident” in the rating it has assigned to you. I will refer to this uncertainty factor as sigma; it is the inverse of the system's confidence. Together, the MMR (the center) and sigma (the width) form a bell curve, also known as a Gaussian or normal distribution. For more details, see: http://en.wikipedia.org/wiki/Gaussian_distribution . The curve represents a couple of related ideas: the range in which your true skill may fall, and the fact that you do not play at exactly the same skill level every game. A more consistent player, for example, would have a narrower curve.
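
To make the bell curve concrete, here is a minimal Python sketch that models a player's skill belief as a normal distribution centered on the MMR with standard deviation sigma, using the standard library's `statistics.NormalDist` (Python 3.8+). `SkillBelief` and the example numbers are hypothetical, not Blizzard internals; the point is only that a smaller sigma squeezes the same probability mass into a narrower range.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import NormalDist


@dataclass
class SkillBelief:
    """Hypothetical skill belief: a Gaussian with mean `mmr` and std dev `sigma`."""
    mmr: float
    sigma: float

    def distribution(self) -> NormalDist:
        return NormalDist(mu=self.mmr, sigma=self.sigma)

    def credible_interval(self, level: float = 0.95) -> tuple[float, float]:
        # Central range holding `level` of the probability mass.
        tail = (1.0 - level) / 2.0
        dist = self.distribution()
        return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


new_player = SkillBelief(mmr=2500, sigma=300)  # few games: wide curve
veteran = SkillBelief(mmr=2500, sigma=50)      # many games: narrow curve
for label, belief in (("new", new_player), ("veteran", veteran)):
    low, high = belief.credible_interval()
    print(f"{label}: 95% of belief mass in [{low:.0f}, {high:.0f}]")
```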

This class of ladder and matchmaking system is not new. One of the first systems to use a method like this is the Glicko system, used to rank chess players, which is arguably better than the famous Elo system; Elo encourages some strange behavior (e.g. in many cases it is better to draw than to risk a loss). Another well-known system is Microsoft TrueSkill, used for matchmaking and ranking in Xbox Live games on the Xbox 360, as well as in PC games such as Dawn of War 2.

The published data on TrueSkill gives a glimpse at the underpinnings of a modern Bayesian ranking system designed for videogames. Blizzard’s implementations are obviously different from TrueSkill, though we can infer much from what we know about TrueSkill and what we know about the SC2 ladder.
For a layman’s primer on TrueSkill: http://research.microsoft.com/en-us/projects/trueskill/details.aspx
For an in-depth description of TrueSkill: http://research.microsoft.com/apps/pubs/default.aspx?id=67956

Matchmaking

The short version of what the links above show is that it is possible (and computationally efficient) to take the MMR and uncertainty factor (sigma, i.e. the standard deviation) of both players and compute how likely each one is to win. The MMR and sigma form one bell curve per player, and the two curves can be combined into a single 3D probability distribution with a shape like this:

[Figure: 3D joint skill distribution of two players]

It may help to think of it as combining the two 2D curves perpendicularly to form this 3D shape. The shape is centered on a point in the (x,y) plane, where x represents player 1’s skill and y represents player 2’s skill. Intuitively, the best matches are between players whose ratings put that center on the line x=y, so Blizzard tries to pair players as close to that line as possible. Looking at the same shape top-down (try to visualize it as a topographical map):

[Figure: top-down (contour) view of the same distribution]

Draw a line along x=y and you split the shape into two pieces. If you sum the volume under the shape on each side of this line and compare the two, you get each player's probability of victory. If the shape lies almost entirely on one side of the line, that player is overwhelmingly favored by the system (Note: this is NOT the same thing as the “Favored” display on the loading screen!). Also note that the shape does not need to be circular when viewed top-down: if the players have different sigmas, it will look like an ellipse.
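
The "volume on each side of x=y" has a closed form. If player 1's performance is drawn from a normal distribution N(mmr1, sigma1²) and player 2's independently from N(mmr2, sigma2²), the difference is itself Gaussian, and the volume on player 1's side of the line is the probability that this difference is positive. A minimal sketch follows, with a Monte Carlo check that literally samples (x, y) points from the 3D shape and counts how many land on player 1's side. The `beta` parameter is an assumption borrowed from TrueSkill, which keeps per-game performance variance separate; it defaults to 0 here because the post folds game-to-game variation into sigma.

```python
import math
import random
from statistics import NormalDist


def win_probability(mmr1, sigma1, mmr2, sigma2, beta=0.0):
    """P(player 1 wins): share of the joint distribution on player 1's side of x=y."""
    spread = math.sqrt(sigma1**2 + sigma2**2 + 2 * beta**2)
    return NormalDist().cdf((mmr1 - mmr2) / spread)


def win_probability_monte_carlo(mmr1, sigma1, mmr2, sigma2, samples=200_000, seed=1):
    """Same quantity, estimated by sampling (x, y) points and counting those with x > y."""
    rng = random.Random(seed)
    wins = sum(rng.gauss(mmr1, sigma1) > rng.gauss(mmr2, sigma2) for _ in range(samples))
    return wins / samples


# MMR/sigma values taken from the displayed-rating example in this post.
print(win_probability(2600, 100, 2500, 50))
print(win_probability_monte_carlo(2600, 100, 2500, 50))
```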

Note that this figure is taken from a TrueSkill presentation and is copyright Microsoft. TrueSkill also models the possibility of a draw, which corresponds to a narrow band around the x=y line. More intuitively, that band can be thought of as the “matchmaking sweet spot”, and something similar is likely used by SC2’s ladder to give the system some wiggle room in matchmaking.
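
For the 1v1 case, the TrueSkill paper defines a "match quality" based on the probability of a draw: it is 1 for two players with identical, perfectly known skills and falls toward 0 as their MMRs drift apart or their sigmas grow (an uncertain pairing scores lower even when the MMRs match). Whether SC2 uses anything like it is speculation; the sketch implements the published 1v1 formula, with `BETA` (TrueSkill's per-game performance spread) set to a hypothetical value sized for SC2 MMR numbers.

```python
import math

BETA = 200.0  # hypothetical per-game performance spread on an SC2-like MMR scale


def match_quality(mmr1, sigma1, mmr2, sigma2, beta=BETA):
    """1v1 TrueSkill match quality: near 1 for an even, well-understood pairing."""
    c2 = 2 * beta**2 + sigma1**2 + sigma2**2
    return math.sqrt(2 * beta**2 / c2) * math.exp(-((mmr1 - mmr2) ** 2) / (2 * c2))


def best_opponent(player, candidates, beta=BETA):
    """Pick the candidate (name, mmr, sigma) with the highest match quality for `player`."""
    mmr, sigma = player
    return max(candidates, key=lambda c: match_quality(mmr, sigma, c[1], c[2], beta))


pool = [("close", 2520, 60), ("far", 2900, 60), ("unknown", 2500, 400)]
print(best_opponent((2500, 50), pool)[0])
```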

After a match finishes, the system needs to update the MMR and sigma for both players (displayed rating will be discussed later in this post). The winner’s MMR increases and the loser’s decreases. More interesting is what happens to the sigmas. If the match finishes as expected, with the MMR-favored player winning (and remember, the loading screen “favored” display is NOT this), then both players' sigmas will decrease; that is, the system gains confidence in the ratings it has assigned to the players. If the match finishes in an upset and both players' sigmas are small, then both sigmas will increase, as the system suspects it may have assigned incorrect ratings to both players. The change in sigma scales with the difference in MMR and the difference in sigmas. That is, losing to someone close to your own rank will not change your sigma much (though it will over the course of several games).

If the lower-MMR player wins, what happens depends much more on the precise equations Blizzard is using. If a player's sigma is large (whether he is the winner or the loser), it can still decrease after an upset, because given the right MMR and sigma values, the system can in theory learn about that player's skill and rate him more accurately. If a player's sigma is small, however, it can grow after an upset if that upset was truly unexpected.

To summarize: combining the MMR and uncertainty factor of a player creates a curve. Take two of these curves and form a 3D shape. Splitting this shape along x=y gives the probability of victory. Matchmaking tries to pair players as close to x=y as possible, but will expand the search if no match is found quickly.
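
The "expand the search" behaviour can be sketched as a window of acceptable MMR difference that widens the longer a player sits in the queue. The starting width, growth rate, cap, and opponent names below are hypothetical, and a real matchmaker might score candidates with a win-probability or match-quality calculation rather than a raw MMR gap.

```python
def search_window(wait_seconds, base=50.0, growth_per_second=25.0, cap=600.0):
    """Hypothetical: the allowed MMR gap starts narrow and widens with waiting time."""
    return min(base + growth_per_second * wait_seconds, cap)


def find_match(player_mmr, queue, wait_seconds):
    """Return the closest queued opponent (by MMR) inside the current window, or None."""
    window = search_window(wait_seconds)
    in_range = [(abs(mmr - player_mmr), name) for name, mmr in queue
                if abs(mmr - player_mmr) <= window]
    return min(in_range)[1] if in_range else None


queue = [("opponent_a", 2350), ("opponent_b", 2900)]
for wait in (0, 4, 20):
    print(wait, search_window(wait), find_match(2500, queue, wait))
```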

Promotion

As initially theorized, promotion requires your MMR to be above a certain league threshold. However, because MMR changes greatly after each match and opponents vary so widely (often spanning multiple leagues), the system requires a particular degree of confidence before it allows promotion. Our initial theory assumed that sigma simply needed to become small enough to allow promotion, but it has been confirmed that sigma never gets that small. Instead, the system gains this confidence through a moving average:

MMR is erratic. A moving average smooths out the rapidly changing data points by averaging your MMR over your last X games. As we previously estimated, the system doesn't use your full match history, because if it did, you would eventually get stuck in a league. Once your moving average crosses a league's threshold, you get promoted.
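
A minimal sketch of the moving-average idea, assuming a simple (unweighted) average over a hypothetical window of 10 games; the post does not say what X is or whether the average is weighted. The demo player gains 50 MMR per game, which shows the lag: raw MMR crosses the threshold several games before the moving average does.

```python
from collections import deque


class PromotionTracker:
    """Hypothetical promotion check: simple moving average of MMR over recent games."""

    def __init__(self, threshold, window=10):
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # older games fall out automatically

    def record_game(self, mmr_after_game):
        self.recent.append(mmr_after_game)

    def moving_average(self):
        return sum(self.recent) / len(self.recent)

    def eligible_for_promotion(self):
        # Require a full window so a handful of lucky games cannot trigger promotion.
        full = len(self.recent) == self.recent.maxlen
        return full and self.moving_average() >= self.threshold


tracker = PromotionTracker(threshold=2300)
first_raw_cross = first_promotion = None
for game in range(1, 31):
    mmr = 2000 + 50 * game  # a player climbing 50 MMR per game
    tracker.record_game(mmr)
    if first_raw_cross is None and mmr >= 2300:
        first_raw_cross = game
    if first_promotion is None and tracker.eligible_for_promotion():
        first_promotion = game
print(first_raw_cross, first_promotion)
```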

Players like CauthonLuck and Ret, who had obscene win ratios, saw their MMR data points skyrocket. However, the moving average lags behind, so for those players it takes much longer for the moving average to reach the required threshold. This is why players like IdrA who were affected by this problem have decided to intentionally throw games in order to get promoted, because it allows the moving average to catch up more quickly.

Possibly related are players who aren't getting promoted or demoted properly even though their moving average has very likely crossed the threshold. Blizzard has said that this is indeed a bug and will fix it by moving the affected players to new divisions.

Displayed Rating

OK, how does all of this tie into displayed rating and the whole “favored” deal? If you remember back to WoW, ratings changed based on a direct comparison of your displayed rating to the other team’s MMR. So if your current rating was 500 and you were playing people with MMRs of 2000, your rating would jump significantly after every win because of the wide disparity. Now, we’ve noticed that on the loading screen players quite often see their opponent as favored, while the opponent (who is nominally “favored”) also sees them as favored! How can this be? The theory put forth here is that the system is again comparing your displayed rating to your opponent’s hidden MMR.

The reason for this is to bring your displayed rating toward your MMR more quickly. kzn explains:

On August 08 2010 14:30 kzn wrote:
How it works was like this: Say you've got a MMR of 2500, and you start a new team. It starts at 0 rating, but the matchmaking system will match you with other players of MMR 2500. If you lose a game, your team rating would not change at all. If you won, it would increase by 47 (a hard cap that was in place at least when I played). This was not explained as arising due to an interaction between the team rating and the opponent's MMR, however - it was explained as the system trying to get your team's rating as close as possible to your team's MMR rapidly.

A corollary is that, when computing the gain or loss to your displayed rating, the system adds your league's hidden threshold value to your displayed rating and compares the result to your opponent’s MMR.

Example: ExcaliburZ and I play a game. His MMR: 2600, sigma: 100, displayed rating: 300. My MMR: 2500, sigma: 50, rating: 150. Diamond’s MMR threshold: 2300. Excal wins because he rules. What happens?
- His MMR will increase
- My MMR will decrease
- Both of our sigmas will decrease
- His rating will increase. How? By comparing my MMR (2500) against his rating + Diamond’s MMR threshold: 300 + 2300 = 2600, so his gain is based on 2600 vs. my MMR of 2500.
- My rating will decrease, in the same way: my rating + threshold is 150 + 2300 = 2450, so my loss is based on 2450 vs. his MMR of 2600.
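
The worked example can be reproduced with a small sketch. The post does not give the formula that turns "2600 vs 2500" into a point change, so this assumes an Elo-style logistic curve with a hypothetical K factor of 32 and scale of 400; only the "displayed rating + league threshold vs. opponent MMR" comparison comes from the theory above. The WoW-era cap kzn mentions is not modeled.

```python
K_FACTOR = 32.0  # hypothetical maximum points per game
SCALE = 400.0    # hypothetical Elo-style logistic scale


def expected_score(effective_rating, opponent_mmr):
    """Elo-style win expectation: (displayed rating + threshold) vs. hidden opponent MMR."""
    return 1.0 / (1.0 + 10 ** ((opponent_mmr - effective_rating) / SCALE))


def displayed_rating_change(displayed, league_threshold, opponent_mmr, won):
    effective = displayed + league_threshold  # e.g. 300 + 2300 = 2600
    expected = expected_score(effective, opponent_mmr)
    return K_FACTOR * (1.0 - expected) if won else -K_FACTOR * expected


DIAMOND_THRESHOLD = 2300
excal_gain = displayed_rating_change(300, DIAMOND_THRESHOLD, opponent_mmr=2500, won=True)
vanick_loss = displayed_rating_change(150, DIAMOND_THRESHOLD, opponent_mmr=2600, won=False)
print(round(excal_gain, 1), round(vanick_loss, 1))
```

Under these assumptions the lower effective rating (2450 vs. 2600) costs fewer points on a loss than the winner gains.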

Conclusions

SC2 uses a Bayesian inference system for skill determination that maintains an MMR and a confidence value (sigma) for each player. Together these form a Gaussian distribution that is useful for determining win probability. Promotions and demotions occur when a player's moving-average MMR rises above or falls below a league threshold. Displayed rating changes are computed from the displayed rating itself, the league's hidden MMR threshold, and the opponent's hidden MMR.

More clarifications from Vanick:

On August 08 2010 11:33 vanick wrote:
To be clear, the player's skill is never pinpointed. The sigma is never 0. All players vary in their performance from game to game and over time as their skill increases (or decreases!).

I left a point out in my writeup that I probably should have included. TrueSkill, and likely SC2's ladder, have a factor based off the time since your last game that increases the player's uncertainty level (sigma) by an amount related to that. Even if you're playing games back to back this factor will have a minimum value that will still increase sigma. This allows the system to adapt to a player whose skill increases over time.
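
TrueSkill handles this with a "dynamics" step that adds a small variance (usually written tau²) to each player's sigma² before a game. The sketch below adds a minimum amount even for back-to-back games plus extra variance per idle day, following the description above; the time scaling, parameter values, and cap are all hypothetical, chosen only to show the shape of the idea.

```python
import math


def inflate_sigma(sigma, days_since_last_game, tau_min=5.0,
                  drift_var_per_day=100.0, sigma_cap=300.0):
    """Hypothetical pre-game sigma increase (variance is added, not sigma itself).

    tau_min: minimum added uncertainty, applied even to back-to-back games.
    drift_var_per_day: extra variance per idle day, since skill may drift while away.
    sigma_cap: never exceed an assumed starting uncertainty for new players.
    """
    added_variance = tau_min**2 + drift_var_per_day * days_since_last_game
    return min(math.sqrt(sigma**2 + added_variance), sigma_cap)


for days in (0, 7, 30):
    print(days, round(inflate_sigma(50.0, days), 1))
```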

Questions

Some of these have answers. Some are open questions. You can add on; I will answer them as best I can.

Q: So how do bonus points affect the display rating changes? If the displayed rating change is based upon the comparison of the opponent's MMR with the player's displayed rating + the player's league cutoff, then wouldn't bonus points inflate the displayed rating and cause problems?
A: I'm not sure how they account for this. One possibility is they keep track of bonus points that make up your displayed rating, and ignore them when performing the calculation in the back-end.

Excal: It seems more likely that the bonus pool is only used to increase the displayed rating for division ranking purposes and is ignored in back-end calculations, because the bonus pool increases at the same rate for all players. This introduces a constant that is easily discarded when assessing actual skill within the system. Furthermore, if bonus points were considered when calculating point changes, players who have not yet spent their bonus pool would have an unfair advantage, since players who have spent theirs would carry an inflated rating and therefore have more to lose.
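
Excal's reading can be expressed as a tiny data model: displayed rating is earned points plus spent bonus points, while the back-end comparison uses earned points only. `LadderStanding` and its fields are hypothetical names for illustration, not a known Blizzard structure.

```python
from dataclasses import dataclass


@dataclass
class LadderStanding:
    """Hypothetical split of a displayed rating into earned and bonus points."""
    earned_points: float
    bonus_spent: float = 0.0

    @property
    def displayed_rating(self):
        # What the division ranking shows.
        return self.earned_points + self.bonus_spent

    def effective_rating(self, league_threshold):
        # Back-end comparison value: bonus points are ignored entirely.
        return self.earned_points + league_threshold


used_bonus = LadderStanding(earned_points=300, bonus_spent=120)
saved_bonus = LadderStanding(earned_points=300)
print(used_bonus.displayed_rating, saved_bonus.displayed_rating)
print(used_bonus.effective_rating(2300), saved_bonus.effective_rating(2300))
```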

Q: Would it take longer to get promoted if you've played lots of games? Suppose someone has played a large number of games (say 100 with a 50% win ratio). If he then starts winning 70% of his games, would it be harder for him to get promoted than someone with similar percentages but fewer games played?
A: It would take longer, yes. The moving average trails behind sharp increases in skill.

Antiquated or Incorrect Information for Archival Purposes

Promotion

At this point we have established how matchmaking works and how the skill belief is updated (through MMR and sigma). How does promotion work? Our current theory is that it uses a checkpoint system: after a certain number of games, the player’s MMR is checked, and if it is above or below a certain threshold, the player is promoted or demoted. Based on evidence people have provided from the release version of SC2, such checkpoints may or may not still exist. In any case, this section will attempt to describe the thresholds further.

In prior descriptions, it has been said that in order to be promoted, your MMR must be above the threshold for that league. That is still true. What we propose here is that your MMR must be above that threshold with roughly 99% confidence. What does this mean? In practice, your MMR minus 3 standard deviations (3 sigma) must be above the threshold (strictly, a 3-sigma margin corresponds to about 99.9% one-sided confidence). For example:

Your MMR: 2500
Your sigma: 100
Diamond threshold: 2300

Since 2500 – 3*100 = 2200, which is below the threshold, you would not be promoted even though your MMR itself is higher than the threshold. To be promoted, you would need to increase your MMR, decrease your sigma, or both. Notably, this creates a situation where you can be highly ranked in Platinum but still be better than players in Diamond. In addition, if you are a borderline player who is promoted to Diamond, you would need to lose a lot of MMR to be demoted again. That is, if your MMR is 2450, your sigma is 50, and you are in Diamond, you would need to drop below an MMR of 2150 (assuming sigma remains constant, since 2150 + 3*50 = 2300) before demotion would occur.
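
For reference, the archived 3-sigma rule is easy to state in code. This sketches the theory as described in this archived section, not the moving-average explanation that replaced it; the symmetric demotion rule (MMR plus 3 sigma falling below the threshold) is inferred from the 2150 example.

```python
from statistics import NormalDist

K_SIGMAS = 3  # the archived theory's "MMR minus 3 sigma" margin


def can_promote(mmr, sigma, threshold, k=K_SIGMAS):
    # Promote only if even the pessimistic estimate is above the threshold.
    return mmr - k * sigma > threshold


def should_demote(mmr, sigma, threshold, k=K_SIGMAS):
    # Demote only if even the optimistic estimate is below the threshold.
    return mmr + k * sigma < threshold


print(can_promote(2500, 100, 2300))    # 2500 - 300 = 2200, not above 2300
print(should_demote(2150, 50, 2300))   # 2150 + 150 = 2300, not below 2300
print(should_demote(2140, 50, 2300))   # 2140 + 150 = 2290, below 2300
print(NormalDist().cdf(K_SIGMAS))      # one-sided confidence of a 3-sigma margin
```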

The above description is more theoretical than the matchmaking description, for which we have more direct evidence. However, it does go some way toward explaining the behavior seen with promotion/demotion, and given such a system, the checkpoint review system as originally conceived may be incorrect.

Edited as the system has changed since this post
Q: So what’s the deal with people stuck in Platinum who can’t get promoted to Diamond despite clearly belonging there?
A: Short answer? It’s a bug. Longer answer: a lot of people have suggested that the system requires you to lose in order to build its confidence factor. This is almost certainly incorrect. The system in theory learns enough about you from your wins to promote you. Intuitively, if your record is 60-5 against Diamond players, you ought to be in Diamond. The TrueSkill system can determine this, and I would bet dollars to donuts that Blizzard’s system can too, at least as designed. Implementation may have introduced bugs that certain players hit under certain conditions. We don’t have enough evidence to flatly state that the system requires you to lose. It may be a workaround for the bug, however.

One possible explanation is that the moving average lags so far behind that more games are required in order to cross the promotion threshold. It's also possible that the bug prevents the moving average from changing.

EDIT 10/25/2010: Made crucial updates to several sections in light of new information acquired from Blizzcon 2010.

EDIT 8/11/2010: Made an important clarification to the Matchmaking section.

EDIT 8/10/2010: Added a third question related to promotion opportunity.

EDIT 8/9/2010: Added extra information to the first question about the circumstances under which sigma may increase or decrease. Also removed a misleading sentence regarding ideal matches.

EDIT 8/7/2010: Modified the second question to make it less vague, and removed incorrect information from the Displayed Ratings section.

_________
Thanks to myself for proofreading, editing, and analytical input (hehhh self-credit).

Heyoka

Katowice, 25012 Posts

August 08 2010 00:07 GMT

This is so awesome, thanks for taking the time to put it together.

Surrealz

United States, 449 Posts

August 08 2010 00:13 GMT

epic applied mathematics, thanks for this.

Dionyseus

United States, 2068 Posts

August 08 2010 00:14 GMT

Interesting read, thanks.

Kollapse

United States, 125 Posts

August 08 2010 00:14 GMT

very interesting read. thanks for taking the time

NuKedUFirst

Canada, 3139 Posts

August 08 2010 00:17 GMT

Wow! Very interesting read, thanks for putting this together

vanick

United States, 53 Posts

August 08 2010 00:17 GMT

Just to be clear since I am afraid I was inconsistent in my naming: the "confidence" value is referring to the uncertainty factor (sigma). It is often easier to think of it in terms of confidence, even though what is stored and used for the distribution is the uncertainty. High confidence merely refers to low uncertainty while low confidence would refer to high uncertainty.

gerundium

Netherlands, 786 Posts

August 08 2010 01:05 GMT

On August 08 2010 08:59 Excalibur_Z wrote:

Q: So what’s the deal with people stuck in Platinum who can’t get promoted to Diamond despite clearly belonging there?
A: Short answer? It’s a bug. Longer answer: a lot of people have suggested that the system requires you to lose in order to build its confidence factor. This is almost certainly incorrect. The system in theory learns enough about you from your wins to promote you. Intuitively, if your record is 60-5 against Diamond players, you ought to be in Diamond. The TrueSkill system can determine this, and I would bet dollars to donuts that Blizzard’s system can too, at least as designed. Implementation may have introduced bugs that certain players hit under certain conditions. We don’t have enough evidence to flatly state that the system requires you to lose. It may be a workaround for the bug, however.


This happened in Halo 3 as well (it uses a modified TrueSkill system fit for 4v4 matches, so it's not entirely the same). You'd have to look up which Bungie weekly update discusses it. In general, though, I believe it was a case of a few friends playing together and getting stuck due to the certainty factor; they ended up at level 26 or so (out of 50), where they proceeded to go 46-0 or something absurd like that without ranking up.

Edit: very well done btw. I was reading a lot about TrueSkill when Halo 3 came around and the ranking system was a hot topic. This really hit some points home for me; the 3D distribution especially is very enlightening.

jamesr12

United States, 1549 Posts

August 08 2010 01:07 GMT

Very nice write-up, well done. Math major?

Integra

Sweden, 5626 Posts

August 08 2010 01:51 GMT

#10

Someone who actually knows what he's talking about. Didn't think such people existed on TL

s.a.y

Croatia, 3840 Posts

August 08 2010 01:59 GMT

#11

Are you a rocket scientist?

Synwave

United States, 2803 Posts

August 08 2010 02:05 GMT

#12

Holy crap man, a lot of work. I will need to reread this a few times. Awesome v2.0 explanation though!
I read the heck out of your first version btw.

vanick

United States, 53 Posts

August 08 2010 02:07 GMT

#13

On August 08 2010 10:07 jamesr12 wrote:
Very nice write-up, well done. Math major?

Computer science.

And, gerundium, it's interesting to hear that about Halo 3. From my understanding, even though the system receives less information from people (or groups of people) who never lose, it does in theory get enough information to promote them. TrueSkill has its own functionality that is supposed to allow it to rank individuals who play in random teams (as does SC2; see 2v2 random, etc.). Perhaps there was a bug there?

theqat

United States, 2856 Posts

August 08 2010 02:09 GMT

#14

Cool thread! Thanks for the hard work.

virgozero

Canada, 412 Posts

August 08 2010 02:23 GMT

#15

More interesting is what happens to the sigmas. If the match finished as expected with the MMR favored player winning (and remember, the loading screen “favored” display is NOT this) then both players' sigmas will decrease. That is, the system gains confidence in the ratings it has assigned to the players. If the match finishes in an upset then the sigmas for both players will increase as the system thinks it may have an incorrect rating assigned to both. The change in sigma scales based upon the difference in MMR. That is, losing to someone close to your own rank will not change your sigma too much (though it will over the course of several games).

I think there is a huge problem with this system.
The system basically starts every player as an average joe, has them continually play games, and uses their MMR and sigma to determine their skill level.

However, it has already been said that for this to work, the player has to play a course of games. The system can rate a player as GOLD level, and then when he loses to a Silver player, the uncertainty increases. This will have to continue until consistency is reached. The problem lies in the fact that the player GETS BETTER over the course of the games necessary to pinpoint his skill level. By the time the system can safely assume a player's skill level, that skill level has already changed.

Meaning the first 3 games used to pinpoint a player's skill level are now negligible, because that player is no longer the same player he was 3 games ago.

Now, all this assumes the player is a fast learner. However, the rate at which players learn is completely random, so I wonder how they can use math to incorporate this into their system (which imo is impossible).

For this example I used 3 games, but I am sure that for the system to reach any sort of consistency it may take at least 20 games or so (which is possibly why most people get into Diamond league in 20 games or so). I don't know about you guys, but my 21st game and my 3rd game of SC2 are in fact different. After each game a person gets better, be it big or small. The difference is enough to change the expected W/L (considering the system puts you at some sort of equal setting).

vanick

United States, 53 Posts

August 08 2010 02:33 GMT

#16

To be clear, the player's skill is never pinpointed. The sigma is never 0. All players vary in their performance from game to game and over time as their skill increases (or decreases!).

I left a point out in my writeup that I probably should have included. TrueSkill, and likely SC2's ladder, have a factor based off the time since your last game that increases the player's uncertainty level (sigma) by an amount related to that. Even if you're playing games back to back this factor will have a minimum value that will still increase sigma. This allows the system to adapt to a player whose skill increases over time.

Excalibur_Z

United States, 12235 Posts

August 08 2010 02:42 GMT

#17

Updated the original post with that.

sYz-Adrenaline

United States, 1850 Posts

August 08 2010 02:52 GMT

#18

my brain hurts

virgozero

Canada, 412 Posts

August 08 2010 02:56 GMT

#19

On August 08 2010 11:33 vanick wrote:

I left a point out in my writeup that I probably should have included. TrueSkill, and likely SC2's ladder, have a factor based off the time since your last game that increases the player's uncertainty level (sigma) by an amount related to that.

Yes, but that's also very icky, because we have no idea how accurate that is or how it differs from person to person. I am assuming it is a constant, which would assume all players learn at the same rate, which they don't. Sure, you can get a general consensus that in one week a player should be X better, and therefore adjust the system in accordance with X by multiplying certain variables by Y or whatever, but it still won't be accurate or anything near accurate.

Even if you're playing games back to back this factor will have a minimum value that will still increase sigma. This allows the system to adapt to a player whose skill increases over time.

Again, I don't quite understand how this can be accurate, though. This minimum value? Can you explain a little more?

Rinrun

Canada, 3509 Posts

August 08 2010 03:08 GMT

#20

My goodness this was an intriguing post, due to the fact that I actually understand the stuff going on! Great write up, great read.
