|
On July 11 2012 16:42 lazyitachi wrote:
On July 11 2012 16:23 Cascade wrote:
On July 11 2012 16:20 lazyitachi wrote: If you show me that each person has less than 100-200 MMR movement then I would be more assured, but it seems the calculation of the MMR is so suspect that it generates MMR movement of 1000+++. Lol.. Math. GIGO
the standard deviation is over the distribution of all players, not for a single player. Each sample is one player.
So if I take 1000000000 faulty data points, then my data is now correct? Logic?
Warning: I speak physics language when it comes to these matters, so it may take some translation if the terminology among statisticians differs.
Assuming by "faulty" you mean "high uncertainty," then yes, many more samples will give you a much more accurate estimate of the value of the mean. For a normal distribution (maybe for all distributions? I don't know), the uncertainty scales with 1/sqrt(n), so if your one-standard-deviation uncertainty in a single MMR measurement were 1000, and you averaged 10000 of them, you'd get a one-standard-deviation uncertainty of 10 in the value of your average.
I gather, though, based on your prior posts, that you're probably familiar with this, and that by "faulty" you instead mean some kind of systematic bias. Pointing out that his data includes MMR values tracking up and down over a wide range doesn't demonstrate systematic bias, though. That's just what you'd expect with a high random variance in an individual's result, which is typical of this kind of system and doesn't invalidate the measurement. (If it did, this type of system wouldn't be used so widely for rating players.)
What he's done is start with the assumption that there is no systematic bias in his data collection that would cause one race to tend to be better than another across the board as a result of how he's collected the data. I think that's reasonable -- even though his players are self-selected, there's nothing to suggest that one race would self-select more or differently than another, let alone in a way that varies across skill level. Then, he's calculated the likelihood that the differences in MMR distribution he's seeing could occur randomly.
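For readers wondering what "the likelihood that the difference could occur randomly" looks like mechanically, here's a rough sketch of one standard way to compute it, a large-sample z-test on two means (Python; the player counts and MMR numbers are invented for illustration and this is not necessarily the OP's exact method):

```python
from math import sqrt
from statistics import NormalDist

def mean_diff_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value (large-sample z approximation) for H0: the population means are equal."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)   # standard error of the difference in means
    z = (mean_a - mean_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented example: 3000 players of race A averaging 1720 MMR vs 3000 of race B averaging 1690,
# both with a standard deviation of 400. A small p-value means the gap is unlikely to be pure chance.
print(round(mean_diff_p_value(1720, 400, 3000, 1690, 400, 3000), 4))
```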
If you're going to argue that there is systematic bias in his data collection that means his data collection prefers picking lower-skilled Terrans and higher-skilled Protoss, fair enough, but you haven't yet described how you think that might be the case.
So, since there's a low chance that the variation in race's average scores could occur randomly, that result is either (a) a result of some kind of race-specific systematic bias in his data collection, the nature of which has yet to be described, or (b) due to an outside factor.
I don't see where there could be race-specific bias in his data collection, unless one's going to make an argument about personalities of different race's players, their need for personal validation, and their likelihood to install his software. That'll be a fun argument, so I look forward to it, but let's put that aside for now.
Outside factors could include all kinds of things. I, personally, think that even given the data set's strong emphasis on higher leagues, there could be some residual impact to lower league players preferring Terran. The average difference in race's MMR in his analysis is not large at all, a few games' difference, and if there's a huge preference toward Terran among bronze and silver players, that might weight Terran ever so slightly toward the lower end even among a higher-league population, since some of those new players will no doubt improve quickly, wind up Diamond TL readers, and install his software.
Then there are all the potential game design reasons. In a broad sense, I'd suggest that Protoss as a race seems slightly simpler to play overall than Zerg or Terran, and Zerg's in the middle, with Terran most complex, so that might be all of it right there. (Note that I am NOT saying that these are large differences, and for the kind of numbers the OP is seeing, I don't think they'd have to be.) That kind of general observation seems to me far more likely to explain population-wide differences than fiddly meta-game arguments or this or that unit having been buffed or nerfed, since those kinds of things are most likely to have a big impact at the high end.
Anyway, I think the OP's clear that he's not making an observation about racial balance in the sense most people on here mean it, which is that two players with equal secondary indicators of skill (things like reaction time, click accuracy, APM, etc) would wind up at different MMRs. All he's saying is the distributions differ slightly. No amount of staring at his data, by itself, will show where this difference comes from, we can only guess at it.
Finally, I have to disagree with this guy:
Then Dustin B. just says in an interview that everything on every ladder and server is 50-50, and that win rates in every matchup, early game, late game, whatever, are still 50-50. Then he says they are monitoring a situation where last month there was a 0.5% imbalance. And all of this with zero facts.
I've followed a lot of Blizzard's commentary about racial balance and they've never said anything like that. What they have said is:
1) They have a way to estimate win rates that factors out skill-related biases. The Blizzard matchmaking designer talks a little bit about how this works (in general terms) in his Q&A session after his presentation at UCI that Excalibur_Z linked here:
http://www.teamliquid.net/forum/viewmessage.php?topic_id=195273&currentpage=62#1228
It's worth watching if someone has interest in the statistical techniques Blizzard uses internally.
2) They've said that they don't consider variances of up to 5% either direction in win-loss rates a big deal from a balance standpoint, because the numbers vary by that much from month to month on their own.
3) Just because Blizzard doesn't share their entire data set with you doesn't mean that they don't have knowledgeable statisticians (like the designer in that video Excalibur_Z linked) doing the work.
|
Many people forget that the people who give the interviews are not the same people who actually do the work. You should never take what Dustin says word for word. He just repeats the parts he can remember, or thinks he remembers, from the last meeting.
So, since there's a low chance that the variation in race's average scores could occur randomly, that result is either (a) a result of some kind of race-specific systematic bias in his data collection, the nature of which has yet to be described, or (b) due to an outside factor.
I don't see where there could be race-specific bias in his data collection, unless one's going to make an argument about personalities of different race's players, their need for personal validation, and their likelihood to install his software. That'll be a fun argument, so I look forward to it, but let's put that aside for now. This is the point I try to make all the time. And if you check the posts here, it's the main point people don't understand, and I've run out of ideas for how to explain it differently.
|
The samples have huge differentials in their MMR movement. Why is that so? Shouldn't it be stable if the method is calculating your true MMR? It's obvious then that the comparison does not take into account the fact that the MMR is not the true MMR, because it is still moving.
Why then take the average of the MMR? Should it not be the true representative MMR, i.e. the latest MMR, and not the average over the 100000 games submitted, given that the MMR calculation itself is suspect enough to show such big differences in MMR?
If the MMR calculation itself is suspect, then how can I even be sure that the small samples per person (i.e. 0 MMR movement, most likely), which make up a large portion of the data, are even calculating the correct MMR? Those differences can themselves contribute way more than the 50-60 points of MMR, thus invalidating this exercise as a whole.
Interesting look though, but I feel it only measures whether MMR for a certain race is higher, which tells nothing about balance. A more interesting observation is that people who don't even understand the scope just agree or disagree based on their own bias. Typical on TL anyway.
|
This data is completely useless for saying whether one race is better than another, for so many obvious reasons. Just to name a few:
- All races have a certain "potential" that, within a given patch, is constant. The variable is the human skill. So to find out a race's potential, you should minimize the error coming from the human factor: if you want to find out which race may be better, you should look at the average top players. This is pretty obvious, but for some reason, lower league players do not understand that the difference between top players and low leaguers is exponential.
- Better players use the ladder in different ways, almost always not playing at their best (like in a tourney); some days they may train an unpolished new build, some days they test some weak points in their game, and so on. Ask any top player. They never play their best on ladder; ladder is for testing, training and improving for real games (tourneys).
- A lot of top-level active players play only about 50% of their games on ladder (aka a lot of custom training with partners). So for these players (as I said, GM/top Masters, aka the relevant players for balance) their MMR is almost always not up to date.
Do these same statistics, but with every Ro32 of top tournaments, and there you will have some significant data for balance.
|
Edit: It's a mistake to call the OP's measure MMR, since it's not being used for matchmaking. It's a skill measure that's meant to work similarly to Blizzard's hidden, actual MMR based on what they've said about how that works.
On July 11 2012 18:09 lazyitachi wrote: The samples have huge differentials in their MMR movement. Why is that so? Shouldn't it be stable if the method is calculating your true MMR? It's obvious then that the comparison does not take into account the fact that the MMR is not the true MMR, because it is still moving.
Elo-like systems such as the OP's or Blizzard's MMR system use a Bayesian model to estimate skill based on the difference between predicted likelihood of a win/loss event before a game and its result. This leads to a few reasons that some players will have much more stable MMRs than others:
1) New players tend to have MMRs that move rapidly. This means their MMRs are very uncertain, but that's also accounted for in the system by taking note of their having played few games.
2) Some players play risky strategies that are more susceptible to minor differences in their opponents' play, so their results are less predictable from game to game, leading to a higher-uncertainty MMR.
3) In the most broken cases, you may have multiple players of different skill levels playing the same account, causing a permanently high MMR uncertainty.
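For anyone who hasn't seen the mechanics behind this, here's a minimal textbook Elo-style update (a sketch only; Blizzard's system and the OP's estimator differ in their details): the rating moves in proportion to the gap between the predicted win probability and the actual result, which is exactly why unpredictable results keep a rating moving.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 16.0):
    """One game: the winner gains exactly what the loser drops."""
    delta = k * ((1.0 if a_won else 0.0) - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# An evenly matched game moves each player by k/2:
print(elo_update(1500, 1500, a_won=True))   # (1508.0, 1492.0)
```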
Why then take the average of the MMR? Should it not be the true representative MMR, i.e. the latest MMR, and not the average over the 100000 games submitted, given that the MMR calculation itself is suspect enough to show such big differences in MMR?
It's definitely not valid to average MMR over multiple of one player's games. Only the latest MMR matters as that's a cumulative estimate based on their entire history. However, that's not what the OP is doing -- he's measuring an average across accounts, not across games.
Edit: I misspoke about this, and I think the OP should have a look at changing how he's looking at multiple data points for each player. See below in the thread.
If the MMR calculation itself is suspect
It's not -- a Bayesian estimate of skill is going to be well-behaved as long as it's applied to a normal distribution of skills.
then how can I even be sure that the small samples per person (i.e. 0 MMR movement, most likely), which make up a large portion of the data, are even calculating the correct MMR?
Remember that he's not sampling a particular player's MMR more than once, since MMR is a cumulative statistical estimate already. If a large majority of the players in the data set had only played one or two games, though, that might be a valid concern because Elo-like MMRs need several games to stabilize.
Edit: I misspoke about this too.
Interesting look though, but I feel it only measures whether MMR for a certain race is higher, which tells nothing about balance. A more interesting observation is that people who don't even understand the scope just agree or disagree based on their own bias. Typical on TL anyway.
You're right that using only one skill measure it's not possible to distinguish between race differences that come from balance and race differences that come from other causes (such as, say, too many low-level Terrans due to the campaign's emphasis on that race.)
Blizzard's approach to this (in general terms) is to look at race differences using multiple skill measures, not just MMR. The video in Excalibur_Z's post that I linked above talks about this a bit.
|
On July 11 2012 18:13 Belha wrote: This data is completely useless for saying whether one race is better than another, for so many obvious reasons.
Not exactly. What the data says is that the populations playing the different races have different likelihoods to win. The OP makes no statement about why this is the case.
- All races have a certain "potential" that, within a given patch, is constant. The variable is the human skill. So to find out a race's potential, you should minimize the error coming from the human factor: if you want to find out which race may be better, you should look at the average top players. This is pretty obvious, but for some reason, lower league players do not understand that the difference between top players and low leaguers is exponential.
While I agree it's interesting to ask whether the distributions are different among top players, I don't agree that this tells you anything about each race's "potential." Top players are as susceptible as anyone to fads in race selection or strategy choice that have nothing to do with game design. That's not to say that fad-driven choices are bad play -- they can be self-reinforcing and still yield stronger performance, because pros are highly aware of what other pros are doing and make decisions about their own play based on that -- but the weight of peer reinforcement can keep a weak approach popular for a while even in the face of evidence that what everyone is doing is bad.
- Better players use the ladder in different ways, almost always not playing at their best (like in a tourney); some days they may train an unpolished new build, some days they test some weak points in their game, and so on. Ask any top player. They never play their best on ladder; ladder is for testing, training and improving for real games (tourneys).
- A lot of top-level active players play only about 50% of their games on ladder (aka a lot of custom training with partners). So for these players (as I said, GM/top Masters, aka the relevant players for balance) their MMR is almost always not up to date.
These would be issues for a study that focused on the ladder distribution of very top players. This one does not.
Do these same statistics, but with every Ro32 of top tournaments, and there you will have some significant data for balance.
This has been done to death, and posted elsewhere on TL. However, the data sets are so small that the results are all over the place from month to month and don't say much about either the game or the players.
|
Why then take the average of the MMR? Should it not be the true representative MMR, i.e. the latest MMR, and not the average over the 100000 games submitted, given that the MMR calculation itself is suspect enough to show such big differences in MMR?
It's definitely not valid to average MMR over multiple of one player's games. Only the latest MMR matters as that's a cumulative estimate based on their entire history. However, that's not what the OP is doing -- he's measuring an average across accounts, not across games.
I do. For those users I use more than one game, not just the latest; that's like 5-10% of the data. What's wrong with that? The last value would work too, and would be more current.
PS: I admire your patience. I lost mine on page 2....
|
Look at his data file. It says AVG mmr, Min MMR, Max MMR. Each row is one player. There is no deviation for each row if there is only one single game. It is also not possible to have 0 MMR differential for a standard error estimation (I doubt so many people submit gazillion games).
Unless it is not what it says it is, it seems he is taking the average across multiple games and also showing that the MMR calculation is highly unstable thus any player with small sample will have inaccurate MMR (as shown by the high deviation of MMR for a single person).
Please correct me if the data means something else. The header cannot be so badly mislabelled???
|
Lysenko, the uncertainty value associated with SC2 accounts behaves much more simply than you may think. It likely starts high and drops to a minimum, which it reaches after a certain number of games or once it first dips down to that level.
Have a look at this graph:
![[image loading]](http://s14.postimage.org/8w8e4juct/astraflame.jpg) (red line is his MMR estimate, which rises by an average of 16 per win and drops by 16 per loss; bars are the MMR of his opponents and their league)
This is a guy who leveled a low Bronze account to mid Master. He went on something like 138-11 wins-losses streak. His uncertainty value didn't change from the minimum value despite the massive shift in skill. This means that the MMR number alone gives you all the data you need (and that exists) about a player (with the exception of new players). There's also a weighted moving average of his MMR over the last X number of games, but that's only used for league placement, and since leagues are entirely and utterly meaningless, we can safely ignore it.
|
On July 11 2012 18:35 lazyitachi wrote: Look at his data file. It says AVG mmr, Min MMR, Max MMR. Each row is one player. There is no deviation for each row if there is only one single game. It is also not possible to have 0 MMR differential for a standard error estimation (I doubt so many people submit gazillion games).
Unless it is not what it says it is, it seems he is taking the average across multiple games and also showing that the MMR calculation is highly unstable thus any player with small sample will have inaccurate MMR (as shown by the high deviation of MMR for a single person).
Please correct me if the data means something else. The header cannot be so badly mislabelled??? The derivation of the MMR does not matter at all for this calculation. How is the derivation of the MMR a race-dependent factor? It's 100% independent of everything else. So it evens out.
Even if it were dependent, which it's not, it's 100% not race-dependent. I feel like I'm explaining the same fact over and over again. If you want to say my data is wrong, you have to show a RACE-DEPENDENT mistake; any race-independent mistake doesn't matter at all! I could add random numbers to every MMR point and would come to the same result!
|
On July 11 2012 18:35 lazyitachi wrote: Look at his data file. It says AVG mmr, Min MMR, Max MMR. Each row is one player. There is no deviation for each row if there is only one single game. It is also not possible to have 0 MMR differential for a standard error estimation (I doubt so many people submit gazillion games).
Unless it is not what it says it is, it seems he is taking the average across multiple games and also showing that the MMR calculation is highly unstable thus any player with small sample will have inaccurate MMR (as shown by the high deviation of MMR for a single person).
Please correct me if the data means something else. The header cannot be so badly mislabelled???
No, you're right, and I misspoke. (I edited my posts to reflect this btw.)
Generally these systems provide for a measurement of both value and uncertainty for the MMR value. How that uncertainty value is calculated is beyond my knowledge of these kinds of systems. The short version is that I don't know whether looking at a standard deviation for variation of MMR over time is a valid estimate of that uncertainty. I'm inclined to say it's not that simple because a series of MMR values over time are not independent measurements, but I don't know how close it comes.
Because they aren't independent measurements, though, I'd say using only the latest MMR is the right way to treat the OP's data set. (As a simple case, if you include all the games for player X as their MMR travels from 0 to wherever it stabilizes and average all of them, you'll get about half the player's actual skill level, which is not a correct result.)
(Edit: Apologies to any statisticians who are reading, who will already know this, but normal distributions and the use of standard deviation to measure the likelihood of a measurement taking place all assume that each measurement is entirely independent of each other. A series of MMR measurements for one player are never independent, because if my current value is 1500 and I play a game, my new value is guaranteed to be close to 1500 regardless of my result. It's not going to be 250 or 750 or 3000.)
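Here's a quick toy illustration of that point (a simple Elo-style walk toward a fixed "true skill" of 2000, starting from 0; the numbers and update rule are invented, not the OP's): the final value ends up near the truth, while the average over the whole history is dragged down by the climb.

```python
import random

random.seed(1)
true_skill, rating, k = 2000.0, 0.0, 16.0
history = []

for _ in range(400):
    # Chance of beating an opponent assumed to sit at the player's current rating.
    p_win = 1.0 / (1.0 + 10 ** ((rating - true_skill) / 400.0))
    result = 1.0 if random.random() < p_win else 0.0
    rating += k * (result - 0.5)   # expected score vs an equal-rated opponent is 0.5
    history.append(rating)

print("final rating:", round(history[-1]))                                 # close to 2000 once converged
print("average over the whole run:", round(sum(history) / len(history)))  # well below the final value
```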
|
On July 11 2012 18:39 Not_That wrote: This is a guy who leveled a low Bronze account to mid Master. He went on something like 138-11 wins-losses streak. His uncertainty value didn't change from the minimum value despite the massive shift in skill. This means that the MMR number alone gives you all the data you need (and that exists) about a player (with the exception of new players). There's also a weighted moving average of his MMR over the last X number of games, but that's only used for league placement, and since leagues are entirely and utterly meaningless, we can safely ignore it.
Thanks for the graph. The cases I was talking about were looking at differences in MMR uncertainties for people with stable MMRs, though. Because a Bayesian system like this uses the error measure to push one's MMR around, the uncertainty should be stable (and high) until a player's MMR curve flattens out at their actual skill, which seems to not happen in the range of your graph.
|
On July 11 2012 16:23 graNite wrote: Good job, nice statistics.
Is it possible to determine whether a matchup is random or not (especially mirrors) by looking at winrates by MMR? What I mean: in ZvZ, how often does a player with (let's say 200 points) smaller MMR win? Would that be a good way to detect randomness?
It's theoretically possible, and I think this is one correct way of doing it. Collect the proportion of players who beat an opponent 200 MMR higher than them in a mirror match, and compare that to the proportion of players who beat an opponent 200 MMR higher than them across all matchups; the latter is your estimate for how often, on average, you'd expect the lower-rated player to win. You don't necessarily need the same sample size for both distributions, but assume that you do. Then run a proportion test with the null hypothesis that the mirror-match proportion is the same as the all-matchups proportion. If the mirror matchup's proportion is statistically significantly closer to 0.5 (from either direction), then you know that matchup is more "random": as a proportion moves away from 0.5 in either direction, its standard deviation goes down, and the "randomness" goes down with it.
TLDR:
1. Take a sample and obtain the proportion of players who beat an opponent 200 MMR above them, once for all matchups and once for the mirror.
2. Do a proportion test to compare the two proportions.
3. Because of the standard deviation formula for a single proportion (assuming the same sample size), if the mirror matchup's proportion is statistically significantly closer to 0.5 than the all-matchups proportion, its standard deviation is larger and the matchup is more "random" than the others.
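As a sketch of what that comparison could look like (Python; a plain two-proportion z-test with invented counts, nothing SC2-specific):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: both samples share the same underlying proportion."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    p_pool = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return 2 * (1 - NormalDist().cdf(abs((p_a - p_b) / se)))

# Invented counts: in 2000 mirror games the 200-MMR underdog won 560 times,
# versus 480 underdog wins in 2000 games across all matchups.
print(round(two_proportion_p_value(560, 2000, 480, 2000), 4))
# A small p-value plus a mirror proportion closer to 0.5 would support the "more random" reading.
```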
|
On July 11 2012 18:35 lazyitachi wrote: Look at his data file. It says AVG mmr, Min MMR, Max MMR. Each row is one player. There is no deviation for each row if there is only one single game. It is also not possible to have 0 MMR differential for a standard error estimation (I doubt so many people submit gazillion games).
Unless it is not what it says it is, it seems he is taking the average across multiple games and also showing that the MMR calculation is highly unstable thus any player with small sample will have inaccurate MMR (as shown by the high deviation of MMR for a single person).
Please correct me if the data means something else. The header cannot be so badly mislabelled???
MMR is not so unstable. On average it changes at 16 per game compared to ~12 for adjusted ladder points. Here's my MMR graph:
![[image loading]](http://s12.postimage.org/ka7706v9l/image.jpg)
That's reasonably stable considering the win and loss streaks I had.
A case can be made why the last MMR value should be used for the purpose of this thread, but in general if you want to describe a player's skill, for players who have been playing for a while, the average of their MMR is probably a more accurate description of their skill than the MMR value they happen to sit at in any given moment.
|
On July 11 2012 18:49 Lysenko wrote:
On July 11 2012 18:39 Not_That wrote: This is a guy who leveled a low Bronze account to mid Master. He went on something like 138-11 wins-losses streak. His uncertainty value didn't change from the minimum value despite the massive shift in skill. This means that the MMR number alone gives you all the data you need (and that exists) about a player (with the exception of new players). There's also a weighted moving average of his MMR over the last X number of games, but that's only used for league placement, and since leagues are entirely and utterly meaningless, we can safely ignore it.
Thanks for the graph. The cases I was talking about were looking at differences in MMR uncertainties for people with stable MMRs, though. Because a Bayesian system like this uses the error measure to push one's MMR around, the uncertainty should be stable (and high) until a player's MMR curve flattens out at their actual skill, which seems to not happen in the range of your graph.
But it's not high. I'm telling you his uncertainty value throughout the games is the minimum value possible for a player.
|
On July 11 2012 18:52 Not_That wrote: A case can be made why the last MMR value should be used for the purpose of this thread, but in general if you want to describe a player's skill, for players who have been playing for a while, the average of their MMR is probably a more accurate description of their skill than the MMR value they happen to sit at in any given moment.
Using mean and standard deviation to characterize a measure like MMR where each measurement is strictly dependent on the previous one is a fundamental statistical error. It might be close to valid but my guess is that it's absolutely enough to screw up measures of the likelihood of random results matching one's data set.
The reason is that the usual mean-and-standard-deviation machinery assumes independent samples from a (roughly normal) distribution, with well-defined behavior away from the mean. A series of MMR values for one player is instead more like a random walk: each value is anchored to the previous one, so the samples are strongly correlated, the spread of the series isn't a valid uncertainty estimate, and (critically) the sample mean need not sit anywhere near the value the player has converged to.
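A small toy demonstration of that dependence (a bare ±16 random walk, not any real MMR formula): consecutive values in such a series are almost perfectly correlated, which is exactly what the normal-distribution machinery assumes away.

```python
import random

random.seed(7)

# A stationary player's MMR modeled as a coin-flip random walk: +16 per win, -16 per loss.
mmr, series = 1500.0, []
for _ in range(500):
    mmr += 16 if random.random() < 0.5 else -16
    series.append(mmr)

# Lag-1 autocorrelation: correlation between each value and the next one.
mean = sum(series) / len(series)
num = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
den = sum((x - mean) ** 2 for x in series)
print("lag-1 autocorrelation:", round(num / den, 3))   # near 1; independent samples would give ~0
```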
|
On July 11 2012 18:52 Not_That wrote:
On July 11 2012 18:35 lazyitachi wrote: Look at his data file. It says AVG mmr, Min MMR, Max MMR. Each row is one player. There is no deviation for each row if there is only one single game. It is also not possible to have 0 MMR differential for a standard error estimation (I doubt so many people submit gazillion games).
Unless it is not what it says it is, it seems he is taking the average across multiple games and also showing that the MMR calculation is highly unstable thus any player with small sample will have inaccurate MMR (as shown by the high deviation of MMR for a single person).
Please correct me if the data means something else. The header cannot be so badly mislabelled??? MMR is not so unstable. On average it changes at 16 per game compared to ~12 for adjusted ladder points. Here's my MMR graph: ![[image loading]](http://s12.postimage.org/ka7706v9l/image.jpg) That's reasonably stable considering the win and loss streaks I had. A case can be made why the last MMR value should be used for the purpose of this thread, but in general if you want to describe a player's skill, for players who have been playing for a while, the average of their MMR is probably a more accurate description of their skill than the MMR value they happen to sit at in any given moment.
I don't think it's really going to matter as long as the sample size is large enough.
|
On July 11 2012 18:55 Not_That wrote: But it's not high. I'm telling you his uncertainty value throughout the games is the minimum value possible for a player.
How are you calculating the uncertainty?
Edit: I hope to God that I am not going to have to learn all the math related to uncertainty measures in Elo-like systems to resolve this discussion.... lol
|
On July 11 2012 19:01 Lysenko wrote:
On July 11 2012 18:55 Not_That wrote: But it's not high. I'm telling you his uncertainty value throughout the games is the minimum value possible for a player.
How are you calculating the uncertainty?
When we searched for it in the data, we found out it's just not there. After 40+ games it must be so close to 0 (or rather, to 1) that we cannot detect it.
There is no difference between a player who has 40 games played and loses 200 in a row and a guy who win-loss trades every game. Both gain/lose the same MMR relative to the MMR difference.
For a new player it's there, but we don't have much data on totally new accounts. So for the time being we can just ignore it.
I still don't understand why. "Uncertainty lowers with every game and quickly reaches a neutral factor" seems way too simple to be helpful. But it makes our MMR calculation a lot easier.
|
On July 11 2012 19:02 skeldark wrote: There is no difference between a player who has 40 games played and loses 200 in a row and a guy who win-loss trades every game. Both gain/lose the same MMR relative to the MMR difference.
That's not what uncertainty in a player's MMR is. Elo-like systems always deduct the same number of points from the loser that they award to the winner, because if they don't do this then there's score inflation or deflation. (Edit: Some systems like this may fiddle with these numbers in early games to try to get to a stable result faster, and if that's what yours is doing, that would explain why it looks like something is "converging to 0" over 40 or so games.)
Edit: Uncertainty in a player's skill score is a measure of how accurately the Elo-like system is predicting the results of a game. If, over a series of 3 games, the system predicts a 50% win rate and a player wins no games, that places a lower bound on the uncertainty of the player's score.
In Not_That's plot, that player's uncertainty is very high, because the MMR is greatly underestimating chances of a win. (In Blizzard's system, it will usually go for an opponent whose predicted win/loss is close to 50%, and the actual numbers through the whole run were 90%+ wins.)
That high uncertainty means that either the player's current MMR is far from the value matching their actual skill (which is the case there) or they have been extremely lucky.
Edit 2: These types of systems don't adjust how many points they do or don't award based on the players' uncertainties. However, players with higher uncertainties might get matched against a wider range of opponents, so that they're more likely to get even games now and then, or to get them to their optimal score faster. Some of the strategies for this in Blizzard's system are discussed in that UCI talk Excalibur_Z linked in the thread I linked above.
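As a toy version of that "lower bound" idea (illustrative only, not Blizzard's actual machinery): ask how probable the observed record would be if the rating's predictions were right; a vanishingly small probability is the sign that the rating, and whatever uncertainty it implies, is off.

```python
from math import comb

def prob_of_record(wins: int, games: int, p_win: float) -> float:
    """Binomial probability of exactly `wins` wins in `games` games at a predicted per-game win rate."""
    return comb(games, wins) * p_win ** wins * (1 - p_win) ** (games - wins)

# The simple case above: predicted 50% per game, zero wins in three games.
print(prob_of_record(0, 3, 0.5))        # 0.125

# Roughly the situation in Not_That's graph: ~138 wins in ~149 games while the system keeps predicting ~50%.
print(prob_of_record(138, 149, 0.5))    # astronomically small -> the rating badly lags the player's real skill
```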
|