Ladder-Balance-Data - Page 12

skeldark

Germany2223 Posts

July 11 2012 10:16 GMT

#221

On July 11 2012 19:07 Lysenko wrote:

That's not what uncertainty in a player's MMR is. Elo-like systems always deduct the same number of points from the loser that they award to the winner, because if they don't do this then there's score inflation or deflation.

Edit: Uncertainty in a player's skill score is a measure of how accurately the Elo-like system is predicting the results of a game. If, over a series of 3 games, the system predicts a 50% win rate and a player wins no games, that places a lower bound on the uncertainty of the player's score.

In Not_That's plot, that player's uncertainty is very high, because the MMR is greatly underestimating chances of a win. (In Blizzard's system, it will usually go for an opponent whose predicted win/loss is close to 50%, and the actual numbers through the whole run were 90%+ wins.)

That high uncertainty means that either the player's MMR is much different from their actual MMR (which is the case there) or they have been extremely lucky.

You misunderstood me.

When we talk about uncertainty, we mean the system's countermeasures against such a case. (We have been back-engineering for so long that we see things from the other side ^^)

And there are none! The player drops his MMR by losing, then he fights his way up again, and the system still thinks he will win 50%.
Obviously he will win far more. So a system should have a high uncertainty value for such a player to correct these cases.
Blizzard's system doesn't have it. That's why we see the long way to fight up again.

If it had such a value for "old" players, we would have noticed by now.
What I wanted to say: he wins/loses as many points as we expect him to. So there is no other function (the uncertainty correction).
That's what we mean by "we can ignore it": there is nothing to back-engineer.
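
To put a number on the quoted point about a predicted 50% win rate versus an observed 90%+ run, here is a minimal Python sketch. The 30-game sample and the 27 wins are made-up illustration values, not data from this thread, and the helper name `binomial_tail` is hypothetical.

```python
from math import comb


def binomial_tail(n: int, k_min: int, p: float) -> float:
    """Probability of at least k_min wins in n independent games with win chance p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical example: the matchmaker predicts 50%, the player wins 27 of 30.
tail = binomial_tail(30, 27, 0.5)
print(f"P(at least 27 wins in 30 games at p=0.5) = {tail:.2e}")
```

A probability this small means the 50% prediction is almost certainly wrong for that player, which is the situation both posters are describing, whatever one chooses to call the quantity that should react to it.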

NoobCrunch

79 Posts

July 11 2012 10:22 GMT

#222

On July 11 2012 18:58 Lysenko wrote:

Using mean and standard deviation to characterize a measure like MMR where each measurement is strictly dependent on the previous one is a fundamental statistical error. It might be close to valid but my guess is that it's absolutely enough to screw up measures of the likelihood of random results matching one's data set.

The reason is that independent measurements generally follow normal, or Gaussian, distributions, which peak at the mean and have well-defined behavior away from the mean based on standard deviation, while measurements that depend on previous ones follow other asymmetric distributions like Poisson that require different measures for uncertainty and for which (critically) the mean doesn't define the peak.

I'm pretty sure it doesn't really matter if MMR is strictly dependent on the previous MMR value. It's the same thing as height or age or something like that. Height is dependent on a previous height value (like you're 5' when you're 10 and you're 5'1" when you're 11). It doesn't really matter; it's just data.

I'm not even sure how to begin to frame this problem in a binomial or Poisson setting.

Lysenko

Iceland2128 Posts

July 11 2012 10:24 GMT

#223

On July 11 2012 19:16 skeldark wrote:
Blizzard's system doesn't have it. That's why we see the long way to fight up again.

Blizzard's system does maintain such measures internally, but it's tuned to be a lot less than optimal in stabilizing a player's scores because the optimal tuning results in lots of crushing losses early for new players, and they're interested in giving new players a more fun experience. This is all described in that UCI presentation.

Regardless -- averaging a series of a single player's MMRs is always a mistake because the distribution won't be centered on the mean for that player. Unless you go back and correct that in your analysis, it's really not possible to draw any conclusions from your numbers.

skeldark

Germany2223 Posts

July 11 2012 10:29 GMT

#224

On July 11 2012 19:24 Lysenko wrote:

I understand your point, but it doesn't affect the result at all because it's a race-independent value that is equal for all races.
I will remove it anyway because it costs a lot of calculation time and it's better to have actual data.

Lysenko

Iceland2128 Posts

July 11 2012 10:37 GMT

#225

On July 11 2012 19:22 NoobCrunch wrote:
I'm pretty sure it doesn't really matter if MMR is strictly dependent on the previous MMR value. It's the same thing as height or age or something like that. Height is dependent on a previous height value (like you're 5' when you're 10 and you're 5'1" when you're 11). It doesn't really matter; it's just data.

One individual's height doesn't follow a normal distribution either!

It's perfectly valid to compare multiple individuals' heights as independent measurements, but you can't take (for example) all the lifetime measures of a person's height, average them together, and then say that's a more accurate measure of their height than the latest single measurement. You can certainly get away with measuring a person's height five times in five minutes and averaging those, because the systematic error (due to growth or shrinkage) is likely to be very small in those five minutes.

However, in the case of Elo-like skill numbers, if you average, say, the last 50 values for a player, the differences between those 50 values are NOT random because they contain cumulative changes in skill over those 50 games. You have to take the latest measurement.

Anyway, I'd note a few things at this point. You don't have to figure all of this out for yourself. First, according to the Wikipedia page on Elo (references are included there, of course), the distribution of strict Elo ratings follows a logistic rather than a normal distribution across multiple players.

http://en.wikipedia.org/wiki/Elo_rating_system

Second is that the more advanced Glicko system, which is closer in design to the Starcraft 2 MMR, does explicitly define an uncertainty measure based on accuracy of its matchmaking predictions. This might be useful to at least see how the people doing this for a living think about these issues:

http://en.wikipedia.org/wiki/Glicko_rating_system

Finally, I believe that Starcraft's system is closer still to Microsoft's TrueSkill, though there are some differences. All the math is here, but I haven't read it closely enough to understand it:

http://research.microsoft.com/en-us/projects/trueskill/details.aspx
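
For readers who want the basic Elo rule from this discussion in concrete form, here is a minimal Python sketch. It assumes the conventional chess parameters (a 400-point logistic scale and K = 32); these are illustrative values, not anything known about Blizzard's MMR, and the function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted score of A against B under the logistic Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both new ratings; score_a is 1.0 for a win by A, 0.0 for a loss."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the sum of the two ratings is unchanged.
    return rating_a + delta, rating_b - delta


# Hypothetical upset: a 1500-rated player beats a 1700-rated player.
new_a, new_b = elo_update(1500.0, 1700.0, 1.0)
print(new_a, new_b, new_a + new_b)
```

The update is driven only by the gap between the predicted and the actual result, and in this plain form there is no per-player uncertainty term at all.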

Lysenko

Iceland2128 Posts

July 11 2012 10:41 GMT

#226

On July 11 2012 19:29 skeldark wrote:
I understand your point, but it doesn't affect the result at all because it's a race-independent value that is equal for all races.
I will remove it anyway because it costs a lot of calculation time and it's better to have actual data.

The problem is that the impact of such a mistake is cumulative and depends on the specifics of your data set. So, if you have more players in a race or more games per player for certain races, you'll get more or less inaccurate results based on that. If you're not using statistical methods that converge to an accurate result as you add more data, you are certain to get large systematic errors compared to what you'd expect from a random dataset.

Mendelfist

Sweden356 Posts

July 11 2012 10:41 GMT

#227

On July 11 2012 19:07 Lysenko wrote:
Edit 2: These types of systems don't adjust how many points they do or don't award based on the players' uncertainties.

Really? In this description of TrueSkill, it seems to me that they are doing exactly that.
http://research.microsoft.com/en-us/projects/trueskill/details.aspx

skeldark

Germany2223 Posts

July 11 2012 10:42 GMT

#228

Results with the last MMR value instead of the average, and with the newly uploaded accounts included:

Maxerror : 35.417392849208
ERRORCOUNT : 44.294444444444444% in 5 76.26333333333334% in 10 92.32222222222222% in 15 98.15333333333334% in 20 99.71111111111111% in 25 99.96555555555555% in 30
Race...
T: -26.988169227922526 P 23.766341174516356 Z -1.3573738479972235
Analyse DONE

Not much change.

@Lysenko btw, you can often find us on TL TeamSpeak.
It's easier to discuss in voice than in text.

Lysenko

Iceland2128 Posts

July 11 2012 10:44 GMT

#229

On July 11 2012 19:41 Mendelfist wrote:
Really? In this description of TrueSkill, it seems to me that they are doing exactly that.
http://research.microsoft.com/en-us/projects/trueskill/details.aspx

I was speaking of simpler systems like Elo. TrueSkill is a hell of a lot more complicated and they may well do that as a means of trying to get the number of games to an accurate estimate down (which was a primary design goal) but they have to be careful to do that in a way that ensures no inflation or deflation.
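
As a rough illustration of the point under discussion, here is a Python sketch of the two-player, no-draw TrueSkill update as I understand it from the page linked above. It leaves out the draw margin and the per-game dynamics term, uses the published default scale (mu = 25, sigma = 25/3, beta = 25/6), and the function names are hypothetical. It says nothing about how Blizzard's MMR actually works.

```python
from math import erf, exp, pi, sqrt


def pdf(x: float) -> float:
    """Standard normal density."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)


def cdf(x: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def trueskill_win_update(winner, loser, beta: float = 25.0 / 6.0):
    """Update (mu, sigma) pairs after a decisive game; draws are ignored."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = sqrt(2.0 * beta**2 + sigma_w**2 + sigma_l**2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)   # size of the mean shift
    w = v * (v + t)       # how much the variance shrinks
    # Each player's shift is scaled by that player's own variance.
    new_winner = (mu_w + sigma_w**2 / c * v, sigma_w * sqrt(1.0 - sigma_w**2 / c**2 * w))
    new_loser = (mu_l - sigma_l**2 / c * v, sigma_l * sqrt(1.0 - sigma_l**2 / c**2 * w))
    return new_winner, new_loser


# Hypothetical game: a fresh, uncertain account beats a long-settled one.
(new_mu_w, new_sigma_w), (new_mu_l, new_sigma_l) = trueskill_win_update(
    (25.0, 25.0 / 3.0), (25.0, 1.0)
)
print(new_mu_w - 25.0, 25.0 - new_mu_l)
```

In this sketch the uncertain player's mean moves by several points while the settled player's mean barely changes, so the two changes do not cancel the way they do in a plain Elo update.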

Lysenko

Iceland2128 Posts

July 11 2012 10:45 GMT

#230

On July 11 2012 19:42 skeldark wrote:
@Lysenko btw, you can often find us on TL TeamSpeak.
It's easier to discuss in voice than in text.

Thanks!! I may join you sometime, but I find it worthwhile to discuss here because having to go back and correct any earlier misstatements keeps me honest. :D

skeldark

Germany2223 Posts

July 11 2012 10:47 GMT

#231

On July 11 2012 19:45 Lysenko wrote:

Thanks!! I may join you sometime, but I find it worthwhile to discuss here because having to go back and correct any earlier misstatements keeps me honest. :D

We are talking about statistics. There is no room for being honest here.

Check the edit with new data above.

NoobCrunch

79 Posts

July 11 2012 11:22 GMT

#232

On July 11 2012 19:37 Lysenko wrote:
On July 11 2012 19:22 NoobCrunch wrote:

It's perfectly valid to compare multiple individual's heights as independent measurements, but you can't take (for example) all the lifetime measures of a person's height, average them together, and then say that's a more accurate measure of their height than the latest single measurement. You can certainly get away with measuring a person's height five times in five minutes and averaging those, because the systematic error (due to growth or shrinkage) is likely to be very small in those five minutes.

I'm pretty sure it doesn't matter if Lysenko only calculated the latest part of someone's MMR. If you think of someone's MMR as oscillating like a sine wave around some average point, then (for a large sample) when you happen to pick someone at the height of that wave you will also have picked someone who is at the bottom of that wave. In the long run, for a large sample, the net effect will be zero. In the case of someone rapidly climbing or declining on the ladder there will obviously be some error due to not using the average. However, there's no reason to believe that this happens more often for Zerg, Protoss, or Terran, so the comparative impact is unclear.

http://i50.tinypic.com/213m4pl.jpg - that's the distribution

I can kind of see the logarithmic dip at the end but I would say that using a normal probability model is perfectly fine.

paralleluniverse

4065 Posts

July 11 2012 11:31 GMT

#233

On July 11 2012 19:44 Lysenko wrote:

I think you're talking out of your ass. There is no inflation or deflation in TrueSkill. Why would there be?

Also, the above poster claiming that independence implies mean = peak and dependence implies mean != peak is also talking out of his ass.

Independence is a property that is possessed by sets of random variables. Any set of independent or dependent random variables can have any probability distribution you want, regardless of whether mean = peak or not. There is no connection between independence and mean = peak.

Finally, in TrueSkill, the skill (or MMR) is explicitly modeled by a normal distribution.
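
A quick way to check the claim that independence has nothing to do with mean = peak is to draw independent samples from a skewed distribution. The Python sketch below uses an exponential distribution purely as an illustration; the seed, sample size, and bin width are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# 100,000 independent draws from an exponential distribution with rate 1.
samples = [rng.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

# Crude histogram with bins of width 0.1; the fullest bin approximates the peak.
bin_width = 0.1
bins = Counter(int(x / bin_width) for x in samples)
peak_bin = max(bins, key=bins.get)
peak = (peak_bin + 0.5) * bin_width

print(f"mean = {mean:.3f}, approximate peak = {peak:.2f}")
```

The draws are fully independent, yet the mean sits near 1 while the peak is in the first bin next to 0, so independent data can have a mean far from its peak.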

paralleluniverse

4065 Posts

July 11 2012 11:36 GMT

#234

On July 11 2012 20:22 NoobCrunch wrote:
On July 11 2012 19:37 Lysenko wrote:
On July 11 2012 19:22 NoobCrunch wrote:

It's perfectly valid to compare multiple individual's heights as independent measurements, but you can't take (for example) all the lifetime measures of a person's height, average them together, and then say that's a more accurate measure of their height than the latest single measurement. You can certainly get away with measuring a person's height five times in five minutes and averaging those, because the systematic error (due to growth or shrinkage) is likely to be very small in those five minutes.

I'm pretty sure it doesn't matter if Lysenko only calculated the latest part of someone's MMR. If you think of someone's MMR as oscillating like a sine wave around some average point, then (for a large sample) when you happen to pick someone at the height of that wave you will also have picked someone who is at the bottom of that wave. In the long run, for a large sample, the net effect will be zero. In the case of someone rapidly climbing or declining on the ladder there will obviously be some error due to not using the average. However, there's no reason to believe that this happens more often for Zerg, Protoss, or Terran, so the comparative impact is unclear.

http://i50.tinypic.com/213m4pl.jpg - that's the distribution

I can kind of see the logarithmic dip at the end but I would say that using a normal probability model is perfectly fine.

MMR should not be sampled. A sample of MMR isn't a set of independent observations, like the heights of a random group of people, to which the usual techniques of statistical inference can be applied.

MMR is an updated belief. It is the prior belief of skill, updated by the evidence given by whether you win or lose a game. It's the "best Bayesian belief" about a player's skill.

Take the last recorded MMR. Do not sample MMR and average it.
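
Why averaging a rating history lags behind the latest value can be seen in a toy Python sketch. It assumes an Elo-style rating with K = 32, an opponent always rated equal to the player's current rating, and the expected (average) rating change per game instead of random results. All numbers and function names are hypothetical, and this is not Blizzard's actual MMR formula.

```python
def expected_win(own_skill: float, opponent_rating: float) -> float:
    """Win chance of a player with the given true skill against an opponent's rating."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - own_skill) / 400.0))


def rating_history(true_skill: float, start_rating: float, games: int, k: float = 32.0):
    """Rating after each game, using the expected result instead of a random one."""
    rating = start_rating
    history = []
    for _ in range(games):
        # Matchmaking picks an opponent rated equal to the player's current rating,
        # so the system predicts 50% while the real win chance is p_win.
        p_win = expected_win(true_skill, rating)
        rating += k * (p_win - 0.5)
        history.append(rating)
    return history


# Hypothetical player: true skill 2000, currently rated 1500, plays 50 games.
history = rating_history(2000.0, 1500.0, 50)
latest = history[-1]
average = sum(history) / len(history)
print(f"latest = {latest:.0f}, average of last 50 = {average:.0f}")
```

Because every entry in the history is an earlier, more outdated estimate, the average ends up further from the true skill than the latest value does.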

xelnaga_empire

627 Posts

July 11 2012 12:36 GMT

#235

Good work, skeldark. It's obvious Terran is severely underpowered now, given all the tournament results lately. However, your statistics confirm how weak Terran really is.

Niazger

Germany41 Posts

July 11 2012 12:50 GMT

#236

On July 11 2012 13:27 _Search_ wrote:

On July 11 2012 12:57 Niazger wrote:

On July 11 2012 11:37 _Search_ wrote:
I'm really not understanding how the OP draws his conclusions.

Is he comparing the win rates of races where players have different MMRs? As in, Zerg is overpowered because players with lower MMRs are beating players with higher MMRs? If so, the conclusions are laughably overreaching. Despite all the esteem given to MMR, it's a terrible indicator of skill because it's based on win rates and averaged across the race. To put it concisely: balance dictates win rates, which dictate MMR, which the OP is using to determine balance. It's totally circular.

Also, certain races are just plain easier to win with using lower skill. Some races rely more on luck. How many Protoss wins can be attributed to a lucky DT timing? How many TvZs have been won by getting one medivac in the right place at the right time? It's widely accepted that Protoss is the easiest race to play and Zerg is the hardest. How does that factor into the OP's findings? Naniwa, for one, has said that the immortal sentry PvZ all-in is far easier to execute than it is to stop (though I think this description could be applied to most Protoss attacks, and to attacking in general, which helps Protoss the most since they have the strongest attacks).

It's also easier to cheese with certain races, and, assuming that a cheese win is a non-skill based win, that would give Protosses another undeserved boost in win rates, since they are doubtless the biggest cheesers. The OP treats all wins as equally legitimate, when many are clearly bullshit. I play Terrans on the ladder all the time who refuse to guard against a 6 pool, saying they'd rather lose. They go for a super greedy opening that plain straight up loses to a potential counter build. Others refuse to guard against DT openings. How are those games legitimate? These players will never be able to win against the same opponent twice!

I also totally reject the notion that each race receives an equal degree of skilled and unskilled players. Heck, just comparing the Korean to the foreigner Terrans one can see a readily apparent skill gap, one that isn't there with Protoss and Zerg.

Even then, most newcomers gravitate to Terran or Protoss (because of the campaign/because of the instant easiness). I have more than one friend who has abandoned SC2 entirely because Zerg was just too difficult to play.

Last, Zerg recently received a fairly significant buff, which means that, if the buff did what it was supposed to do, Zergs SHOULD be winning over higher MMR opponents right now. That was the point of the buff! To move Zergs up the ladder and give them higher tournament representation! In other words, something would be wrong if Zergs WEREN'T winning more! Did the OP take this into account? Did he calculate the win rates before and after the patch separately?

These are the issues I have with the OP's method.

Edit: I would also love to see how this relates to the maps. Many of the maps in the pool have severe balance issues, which always affect Zerg most heavily. But those maps are being slowly weeded out and as more balanced maps enter the pool we see Zergs winning more. Most recently Korhal Compound and Metalopolis were removed (both of which were terrible for Zerg if they spawned close positions on Metalopolis). Every season the map changes have been a subtle buff to Zerg. How do the recent map changes affect the OP's findings?

I'm sorry, bro, but you could've saved a lot of time. You aren't even close to understanding how the OP came to his results, yet you post this wall of text.

Also, your rant about luck is pretty retarded tbh. If getting a medivac in the right position or DTs is luck, I guess we should all just roll the dice at the beginning of the game.

No u.

Rather than just saying, "you're wrong" why don't you say something that might show how I'm wrong?

And if you think there is no risk-to-reward skew in this game, then you've never faced a TvP where the Protoss hid a pylon in your base.

Well, in your initial post you make it pretty obvious that you don't understand the basis of the OP, so it's kind of pointless to argue with you.

The risk/reward might be skewed, but that's not luck in my opinion. The risk of losing a pylon in your opponent's base, which will win you the game if you get 4 gates up, is kind of skewed, but then again that has nothing to do with luck if you forget to check for pylons or don't watch the probe.

Also, MMR =/= skill. If MMR were equal to skill, all three races should have the exact same MMR unless somehow all the "skilled" players only played one race.

MockHamill

Sweden1798 Posts

July 11 2012 12:54 GMT

#237

Yes, it is great to have statistics to support what most players knew all along: Terran is UP, especially against Zerg.

But statistics are just part of the picture; just look at how much more Terran needs to do each game in order to win. Terran needs to attack constantly in order to avoid the late game at all costs. Meanwhile Protoss, and even more so Zerg, can just lean back, defend and macro up.

The difference in required micro is close to absurd. What Zerg and Protoss really would need are units that are worthless without micro but good with micro. That in itself would even out the game.

Aunvilgod

2653 Posts

July 11 2012 12:55 GMT

#238

Meh. Doesn't say a lot in my opinion. Nobody should use anything below Korean GM for statistics. Below Gold, Zerg is very weak because the players lack the mechanics. Protoss is quite easy, on the other hand, so lower-level players are much better with it. I play Protoss and Terran btw.

lazyitachi

1043 Posts

July 11 2012 12:59 GMT

#239

Reaffirms and confirms that blizzard is most qualified to gauge balance statistically.

Faust852

Luxembourg4004 Posts

July 11 2012 13:03 GMT

#240

On July 11 2012 21:55 Aunvilgod wrote:
Meh. Doesn't say a lot in my opinion. Nobody should use anything below Korean GM for statistics. Below Gold, Zerg is very weak because the players lack the mechanics. Protoss is quite easy, on the other hand, so lower-level players are much better with it. I play Protoss and Terran btw.

So only Korean progamers are allowed to play a balanced game? Sorry, but SC2 is also for gold Zergs. If you need to practice 14h a day to enjoy a balanced game, there is a huge problem.

I'm a mid-master Terran on EU, and even though I enjoy the pro scene a lot, I don't give a fuck about the Korean GM ladder. I play for myself and I wish I could play an even game against a Protoss or a Zerg.

Anyway, awesome thread, I quite enjoyed reading the whole thing.
