|
On July 13 2012 02:40 Junichi wrote: You say in your OP that you were able to calculate the MMR very accurately. Is the, so to speak, official MMR used by bnet somehow observable? I thought it was not. If it is not, how do you know that your results are very accurate?
Great thread. =)
I can observe it because I'm a fucking genius. ^^ Nah, it was a month of work to find a way, and I did not do it alone: http://www.teamliquid.net/forum/viewmessage.php?topic_id=334561
Good question. It's very hard to judge. Promotions and demotions are one way: I know before a promotion that a player should get promoted, and then he really does get promoted. I also know that the opponent's MMR should be close to the player's (that is what matchmaking does in the end), so I can check whether that holds on average. Not always in practice, because I sometimes judge the opponent based on the player, so I would only be observing my own mistake.
The main way to verify it is by analysing the game data. We were able to find many special rules, like MMR caps, that we would never have been able to see if we did not have accurate values.
I can also test single parts of the process, like my tier analyser. The moment someone plays a Master player, I know the exact tier (Master has only one). So I take a high Diamond player from my user base and predict the MMR of his next game. Then he plays a Master player and I calculate the MMR independently of his history. After that I check how close the prediction from the tier analyser is to reality. The last values I checked were not more than 5 MMR off.
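In code, that spot check amounts to something like the following sketch (made-up numbers and names, not the actual tier analyser): compare the MMR predicted from a player's history against the MMR calculated independently from a game against a Master opponent, whose tier is known exactly.

def max_prediction_error(checks):
    """checks: list of (predicted_mmr, independently_calculated_mmr) pairs."""
    return max(abs(predicted - independent) for predicted, independent in checks)

checks = [(2431.0, 2427.5), (2290.0, 2294.1)]   # hypothetical values
print("worst prediction error:", max_prediction_error(checks), "MMR")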
But it's not perfect. My data says incontroll is one of the best US ladder players, and that's obviously a bug ^^
|
This is quite a chunk of data you analyzed there! Thanks for your insights.
I have to disagree on one point, though:
2) Why mistakes in the MMR calculation don't affect the result.
First: the accuracy of my MMR calculation is very good, but I can be wrong on some points or for some users. However, nothing in the calculation takes race into account, so every mistake in the MMR calculation is independent of the race-average MMR result!
Well, imagine the following situation: a few months ago Terran was too strong, and then Terran got nerfed, so now all races are in perfect balance. Naturally, Terran players will now start to fall in their rankings a little. This is of course reflected in the MMR value. But what if you put too much weight on the soaring/sinking factor? Then your calculated MMR for all those sinking Terran players would be even lower than the "correct" MMR value. This would lead to a misjudgment in your data even though you don't specifically check for race.
I'm not saying any of the above is true. I'm just trying to say that the conclusion (I-don't-check-for-races => Wrong-MMRs-do-no-harm-or-good-to-a-specific-race) need not always hold.
Btw: I love the way you presented your data. Totally unbiased, and even pointing out how small the difference is.
|
On July 13 2012 03:05 AKnopf wrote: This is quite a chunk of data you analyzed there! Thanks for your insights. I have to disagree on one point, though:
2) Why mistakes in the MMR calculation don't affect the result.
Well, imagine the following situation: a few months ago Terran was too strong, and then Terran got nerfed, so now all races are in perfect balance. Naturally, Terran players will now start to fall in their rankings a little. This is of course reflected in the MMR value. But what if you put too much weight on the soaring/sinking factor? Then your calculated MMR for all those sinking Terran players would be even lower than the "correct" MMR value. This would lead to a misjudgment in your data even though you don't specifically check for race. I'm not saying any of the above is true. I'm just trying to say that the conclusion (I-don't-check-for-races => Wrong-MMRs-do-no-harm-or-good-to-a-specific-race) need not always hold. Btw: I love the way you presented your data. Totally unbiased, and even pointing out how small the difference is.
That is a good point. I wrote a long post about why this is theoretically not possible, but when I think about it, it is possible. I don't have a sinking factor, so this exact example can't happen: I calculate each game anew.
But I get the point that I could make a mistake that is race-biased without even knowing the race. I have to think about what that could be and whether any such factor affects my calculation. At the moment I don't see such a point, and if there were one we would have noticed it in the datasets already. But I realise that "we would notice it" and "there is none" is not a proof against your point.
--- Side note for the guys I discussed with about whether MMR cares about the result: this still holds. The statistically significant change in the result would still be significant. This would even prove that race is a factor that MMR depends on. Also, this kind of mistake would not show up in any statistical analysis.
A very good point, and the first valid criticism of my method I have seen.
|
On July 13 2012 02:40 Junichi wrote: You say in your OP that you were able to calculate the MMR very accurately. Is the, so to speak, official MMR used by bnet somehow observable? I thought it was not. If it is not, how do you know that your results are very accurate?
The answer is that it's "kind of" observable. You can figure out the relationship between the MMR of one player and the adjusted point score of another. If the second player has a fairly stable point score and has played a lot of games, then you can assume their MMR is stable and in equilibrium with their point score. Then you look at how many points another player gains or loses when playing against them. If the gain or loss is 12 points, then they have the same MMR in units of adjusted points.
What you can't really measure from a single game between two players is how the MMR probability function works for players with different MMRs. What I mean is that if two players gain or lose 12 points after a game, they'll have a 50/50 win/loss rate against each other; but if two players play a game where the stakes are +10/-14 for the better player, what's the likelihood of a win or a loss?
Having a large enough data set, like the one skeldark is collecting, can potentially answer that question. I haven't read what they've written closely enough to know if they've backed that out, but it should be possible (just by, for example, selecting all the games with a +10/-14 outcome among players with stable point values and looking at the percentage of wins).
There's also the possibility that +10/-14 games have a different win likelihood in Diamond than they do in Bronze. It might be possible to back that out from the data as well, but I'm guessing that everyone's assumed that's not the case. I don't have an opinion, just mentioning the possibility for completeness.
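In case it helps, here is a rough sketch of that kind of analysis with pandas, on a few made-up game records (the column names are hypothetical, not skeldark's actual schema): group games by the points at stake for the higher-rated player and measure the empirical win rate, optionally split by league.

import pandas as pd

games = pd.DataFrame({
    "win_gain":    [10, 10, 12, 14, 10, 12],   # points the favored player gains on a win
    "loss_cost":   [14, 14, 12, 10, 14, 12],   # points the favored player loses on a loss
    "league":      ["Diamond", "Bronze", "Gold", "Silver", "Diamond", "Gold"],
    "favored_won": [1, 0, 1, 1, 1, 0],
})

# empirical win likelihood per point-differential bucket
print(games.groupby(["win_gain", "loss_cost"])["favored_won"].agg(["mean", "count"]))

# the same split by league, to test whether +10/-14 behaves the same in Diamond and Bronze
print(games.groupby(["win_gain", "loss_cost", "league"])["favored_won"].agg(["mean", "count"]))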
Edit: Given skeldark's answer above, it looks like they haven't done the kind of analysis I described here, but combined with looking at promotions and demotions, this kind of analysis might help provide more useful information from this data set.
Note that I'm not criticizing the data collection or some of the aggregate info they've extracted from it, my concerns mostly focus on the monte carlo simulation and analysis in this particular OP. I think the rest of their work is very interesting stuff.
|
On July 13 2012 03:39 Lysenko wrote: The answer is that it's "kind of" observable. You can figure out the relationship between the MMR of one player and the adjusted point score of another. If the second player has a fairly stable point score and has played a lot of games, then you can assume their MMR is stable and in equilibrium with their point score. Then you look at how many points another player gains or loses when playing against them. If the gain or loss is 12 points, then they have the same MMR in units of adjusted points. What you can't really measure from a single game between two players is how the MMR probability function works for players with different MMRs. What I mean is that if two players gain or lose 12 points after a game, they'll have a 50/50 win/loss rate against each other; but if two players play a game where the stakes are +10/-14 for the better player, what's the likelihood of a win or a loss? Having a large enough data set, like the one skeldark is collecting, can potentially answer that question. I haven't read what they've written closely enough to know if they've backed that out, but it should be possible (just by, for example, selecting all the games with a +10/-14 outcome and looking at the percentage of wins). There's also the possibility that +10/-14 games have a different win likelihood in Diamond than they do in Bronze. It might be possible to back that out from the data as well, but I'm guessing that everyone's assumed that's not the case. I don't have an opinion, just mentioning the possibility for completeness.
I understand the points you are making. We checked all of this months ago. The f function not_that published in his thread is version 3. If there were something like you mention, we would have seen it a long, long time ago; we searched for it. +24/-1 games have a high deviation, but the f function still gives results that fit the picture from +12/-12 games. Any of these factors would show up as an unexpected MMR for a single game, which would be noticeable even for only one player, and would certainly be noticed in our 100k-game database. If you had been there two months ago, you could have helped us a lot in figuring all this out.
My main defence against criticism of the MMR method is: it has worked in practice without mistakes for a month now.
Besides that, the f function is only one part, not the whole MMR calculation; it only gives you the dMMR, and even that only if certain special rules are not active.
|
On July 13 2012 03:45 skeldark wrote: +24/-1 games have a high deviation, but the f function still gives results that fit the picture from +12/-12 games. Any of these factors would show up as an unexpected MMR for a single game, which would be noticeable even for only one player, and would certainly be noticed in our 100k-game database.
Can you explain this part again? I do not understand what you're saying.
Edit: Not sure what "high deviation" means here. Also, I don't understand what an "unexpected MMR" is.
|
The f function calculates the dMMR from the adjusted points and the change points: it sees how far the opponent's MMR is from someone's adjusted points, based on the change in points. You are looking at the skill function and how it behaves, but that's not the important part. We reverse-engineer: we look at what causes the point change, not at the difference between the players and how we would have to change them. Blizzard already did this! We don't need to calculate the MMR, we just have to READ it.
Not_that found a function that behaves like the one we can observe through its results (the point change is the output of this function).
We calculate back to the start value (dMMR) that the function used to arrive at this point change. The deviation of the start value is higher the further the point change is from +12/-12. That has nothing to do with the deviation of the skill function!
We cannot see exactly where the start value (dMMR) is (the function loses information because it calculates a small number out of a big number), but we can see the range it is in.
An unexpected MMR is a value that we know cannot be correct, e.g. you won a game and your MMR falls. Rising and falling after wins and losses was the main indicator used to find the f function! We calculate the MMR before a game, and again before the next game. If there were any mistake in the function at one of the data points, we would see a rise after a loss or a fall after a win. We don't have such a point in any game!
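As a rough sketch of this consistency check (hypothetical data and field names, not the actual implementation): the calculated MMR must never drop after a win or rise after a loss, so any such point flags a mistake in the back-calculated function.

def find_inconsistencies(history):
    """history: list of (result, mmr_after_game) tuples, result is 'win' or 'loss'."""
    problems = []
    for i in range(1, len(history)):
        prev_mmr = history[i - 1][1]
        result, mmr = history[i]
        if result == "win" and mmr < prev_mmr:
            problems.append((i, "MMR fell after a win"))
        elif result == "loss" and mmr > prev_mmr:
            problems.append((i, "MMR rose after a loss"))
    return problems

history = [("win", 2400), ("win", 2412), ("loss", 2405), ("win", 2401)]  # last point is inconsistent
print(find_inconsistencies(history))   # -> [(3, 'MMR fell after a win')]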
All of this is explained in not_that's thread. You should post your question there; NT is also better at explaining what he did than I am.
|
On July 13 2012 01:06 skeldark wrote: On July 13 2012 00:23 lolcanoe wrote: On July 12 2012 07:48 lolcanoe wrote:
1. Run an Anderson–Darling test on the data. This can be done with three clicks in Minitab, which will automatically give you a p-value for whether or not the data is normal. If you cannot run this test, or it tells you that normality is problematic, note in the OP that your test assumes normality but was not verified to be normal.
2. The specific question here is whether or not one race has a significantly higher MMR average than another. What your current test is actually testing (although somewhat incorrectly) is whether or not the sample average varies significantly from the population mean. If executed correctly, this test also has some application to understanding balance, but it doesn't answer the specific question. The specific question should be tested with a very simple two-sample t-test (google it), run three times: TvZ, TvP, and PvZ. This is a much better fit for the question and lets you avoid the further confusion of taking another average.
3. In these calculations, independence between populations is a fair concern, and should likewise be noted.
4. Finally, be very clear about your conclusion. Your data allows you to conclude that the average skill rating of a certain race is potentially different from the skill rating of another. It is yet another jump to equate this difference with a problem in balance, due to a potential cause/correlation problem (i.e. does Terran make players bad, or do bad players pick Terran?). Unfortunately, there's no way to resolve this concern with the data you possess, so you'll have to note this caveat as well. At least do the easy part and fix 1 and 2, and note very carefully what test was run (which standard deviations did you use?) to calculate statistical significance.
1) I don't assume normality. I show that 99.99% of random values are in a range +-x, and my value is outside of range x. So it's very unlikely that my value is random! THAT'S ALL. You call yourselves statistics freaks but fail to understand this simple method!
I fail to understand because you failed to explain.
What you did:
You take a large pool of data. Find average.
You take a subset of that data. Find average.
Take the difference. Then what? What do you mean when you say you "showed" that 99.99% of the random values are within +-x? Within +-x of what, the average? By "random values", do you mean data points?
If my understanding is correct, you are implying that 99+% of the data lies within +-25 of the mean. That's NOT what the graph here shows at all: http://postimage.org/image/n60jmstyz/ You need to be more thorough in your explanation and your calculations.
And once again, it's not a more sophisticated test, but you should not be comparing dependent sample averages. You should be comparing the average MMR of T to the average of Z, T to P, and P to Z. So you want three tests to see whether the averages differ from each other, not a test of whether a single race varies significantly from the average of all races.
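For concreteness, here is a rough sketch of the tests being suggested (Anderson–Darling normality check plus pairwise two-sample t-tests), run on placeholder per-player MMR arrays rather than the real data set; it only illustrates the mechanics, not what the actual data would show.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mmr = {
    "T": rng.normal(2000, 400, 5000),   # placeholder per-player MMR samples,
    "Z": rng.normal(2030, 400, 5000),   # not the real data set
    "P": rng.normal(2010, 400, 5000),
}

# 1) Anderson-Darling normality check per race
for race, values in mmr.items():
    result = stats.anderson(values, dist="norm")
    print(race, "A-D statistic:", round(result.statistic, 3), "critical values:", result.critical_values)

# 2) Pairwise two-sample t-tests: TvZ, TvP, PvZ
for a, b in [("T", "Z"), ("T", "P"), ("P", "Z")]:
    t, p = stats.ttest_ind(mmr[a], mmr[b], equal_var=False)
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}")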
|
On July 13 2012 04:06 lolcanoe wrote: I fail to understand because you failed to explain.
What you did:
You take a large pool of data. Find average.
You take a subset of that data. Find average.
Take the difference. Then what? What do you mean when you say you "showed" that 99.99% of the random values are within +-x? Within +-x of what, the average? By "random values", do you mean data points?
If my understanding is correct, you are implying that 99+% of the data lies within +-25 of the mean. That's NOT what the graph here shows at all: http://postimage.org/image/n60jmstyz/ You need to be more thorough in your explanation and your calculations.
And once again, it's not a more sophisticated test, but you should not be comparing dependent sample averages. You should be comparing the average MMR of T to the average of Z, T to P, and P to Z. So you want three tests to see whether the averages differ from each other, not a test of whether a single race varies significantly from the average of all races.
No, none of this is true. Let me try to explain.
Given: data A, which was created without any knowledge of property P, and property P, which was collected without any knowledge of A.
90,000 randomly sorted data groups of A produced values between -25 and +25 in 99.55% of cases.
The P-sorted data groups of A produced P1: -53.68, P2: 11.87, P3: 31.52.
The P-sorted data groups of B (a subgroup of A) produced P1: -27.70, P2: 17.49, P3: 3.82.
The P-sorted data groups of C (a subgroup of A) produced P1: -43.51, P2: 0.37, P3: 34.93.
Data A is obviously significantly biased with respect to P!
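A rough sketch of this randomization check, on made-up numbers rather than the real database (this is not the actual program): sort the same data into random groups many times, record how far each random group's average deviates from the overall average, and compare the observed race-group deviations against that range.

import numpy as np

rng = np.random.default_rng(1)
mmr = rng.normal(2000, 400, 30000)                    # placeholder per-player values
labels = rng.choice(["T", "Z", "P"], size=mmr.size)   # placeholder race labels

def group_offsets(values, labels):
    """Deviation of each group's mean from the overall mean."""
    overall = values.mean()
    return {g: values[labels == g].mean() - overall for g in np.unique(labels)}

observed = group_offsets(mmr, labels)

# re-sort the same data into random groups many times (skeldark used 90,000 runs)
max_random_offset = []
for _ in range(1000):
    shuffled = rng.permutation(labels)
    offsets = group_offsets(mmr, shuffled)
    max_random_offset.append(max(abs(v) for v in offsets.values()))

bound = np.quantile(max_random_offset, 0.9955)
print("observed group offsets:", {k: round(v, 2) for k, v in observed.items()})
print("99.55% of random sortings kept every group offset within +-", round(bound, 2))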
Btw, you find all of this in the OP...
And once again, it's not a more sophisticated test, but you should not be comparing dependent sample averages. You should be comparing the average MMR of T to the average of Z, T to P, and P to Z. So you want three tests to see whether the averages differ from each other, not a test of whether a single race varies significantly from the average of all races.
Adding or subtracting the same number from two numbers does not change their difference: if A - C = X and B - C = Y, then X - Y = A - B.
|
|
On July 13 2012 04:14 monkybone wrote: Why does uneven skill distribution not affect average MMR in an ideal situation with perfect balance?
Because a situation with perfect balance = an even skill distribution.
When I talk about balance, I mean an even skill distribution across the races. That is the balance of the property (race) with respect to the data (account skill).
This does NOT have to be game-design balance. The latter is a social term and cannot be calculated, because it is not even clearly defined. If all Terran pro players are ill and cannot play, is the game still balanced? I say no; you could say yes. It's not a mathematical question.
|
On July 13 2012 01:30 skeldark wrote: If you read the text carefully, I think you will agree that this is not perfect, but a far better method than tldp win-ratios or random tournament results.
This is something I can completely agree with, that the method used, regardless of the many faults I find with it, is much more significant than the tldp win-ratios.
You also said it's not a university paper, and I think that's what most people are looking for: a much more detailed, broken-down explanation. I think in general I get what you did with your actual data; my main issue is with the model you used to determine what an acceptable SD would be under a balanced model. But as you used your own program, I don't think there's any point in going further down that road.
With that said: cool idea, but I disagree with your final analysis and explanation of the results you found from the data, and thus disagree that it proves anything.
I do want to say that, over the course of its collection, your large data set does show that Terran tends to have slightly lower MMR than the other two. This could be for many reasons, including that Terran was too strong at the start of your collection and then was readjusted lower after you started collecting data; it could also fit within a proper standard deviation under a different (better) model of b.net, or many other things.
Thank you for posting your data as well. Final question: when did you start collecting data? Was it in fact 1970 (lol, jk), or was it on May 13th, 2012 (the best date I can find from what you've posted)?
|
On July 13 2012 04:10 skeldark wrote: No, none of this is true. Let me try to explain.
Given: data A, which was created without any knowledge of property P, and property P, which was collected without any knowledge of A.
90,000 randomly sorted data groups of A produced values between -25 and +25 in 99.55% of cases.
The P-sorted data groups of A produced P1: -53.68, P2: 11.87, P3: 31.52.
The P-sorted data groups of B (a subgroup of A) produced P1: -27.70, P2: 17.49, P3: 3.82.
The P-sorted data groups of C (a subgroup of A) produced P1: -43.51, P2: 0.37, P3: 34.93.
Data A is obviously significantly biased with respect to P!
Btw, you find all of this in the OP...
Adding or subtracting the same number from two numbers does not change their difference: if A - C = X and B - C = Y, then X - Y = A - B.
What do you mean by data group - how large is each of these data groups you are talking about? And why are we dealing with data groups instead of lumping all the data into one sum?
As for the A-C = X mumbo jumbo, technically you're right in the sense that testing t vs p and t vs z will utilize all the data, but you're missing the point I'm trying to make.
You're currently using A - avg(a,b,c), when you should be testing a against c directly to avoid confusion.
|
On July 13 2012 04:21 1st_Panzer_Div. wrote: I do want to say that, over the course of its collection, your large data set does show that Terran tends to have slightly lower MMR than the other two.
That is all I ever said!
Thank you for posting your data as well. Final question: when did you start collecting data? Was it in fact 1970 (lol, jk), or was it on May 13th, 2012 (the best date I can find from what you've posted)?
Mid Season 7, but with races only since the last patch of my program, so two weeks ago.
|
|
On July 13 2012 04:32 monkybone wrote: Of course the game could be balanced even in absurd situations where all Terran players were complete scrubs, or with any other skill distribution.
Now you are talking about game balance. So what is game balance? Define it...
I say the fact that all Terrans are scrubs = imbalance. It's imbalanced because all Terrans are scrubs. My definition of balance = not all Terrans are scrubs!
You say it's balanced because the reason is the players, not the game.
The reason can be anything. That is not a mathematical value!
You are free to use statistical methods or social analyses to find what causes the imbalance in the data. That's not what I did. I look at WHETHER there is imbalance in the data, not at what causes it.
|
You seem to be ignoring the more important question of defining what you mean by a "data group" and skipping to what you like rehashing 100 times.
|
On July 13 2012 04:46 lolcanoe wrote: You seem to be ignoring the more important question of defining what you mean by a "data group" and skipping to what you like rehashing 100 times.
A data group is a group from the data pool: a subgroup.
I cannot mix everything together because I have three different races. So to find the average of one race I have to take all players of only that race.
Data group Terran = all Terran players in the data. Data group random = a random subgroup: I give every player a random number and then take the group where this random number is 1.
I don't know how I can simplify it any further.
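As a tiny illustration of what a data group and a random data group mean here (made-up records, not the real user base):

import random

random.seed(0)
players = [{"race": random.choice("TZP"), "mmr": random.gauss(2000, 400)} for _ in range(30000)]
for p in players:
    p["group"] = random.randint(1, 3)   # random label, assigned with no knowledge of race

def average(selected):
    return sum(p["mmr"] for p in selected) / len(selected)

terran_group = [p for p in players if p["race"] == "T"]   # data group "Terran"
random_group = [p for p in players if p["group"] == 1]    # a random data group
print("Terran average:", round(average(terran_group), 1))
print("random group 1 average:", round(average(random_group), 1))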
|
|
On July 13 2012 04:50 skeldark wrote: A data group is a group from the data pool: a subgroup. I cannot mix everything together because I have three different races. So to find the average of one race I have to take all players of only that race. This is the data group, e.g. Terran: all Terran players in the data. A random data group is when I give every player a random number and then take the group where this random number is 1.
Yes, but what's the statistical value of creating random data groups?
And what do you mean you didn't mix them together? Aren't your data values calculated as average(t) - average(t,z,p)? And you did use a player-weighted average, right?
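For reference, this is the distinction being asked about, on made-up records (not the real data): a player-weighted average uses one value per player, while a per-game average over-weights players with many games.

games = [
    {"player": "A", "race": "T", "mmr": 2100},
    {"player": "A", "race": "T", "mmr": 2110},   # player A appears twice
    {"player": "B", "race": "T", "mmr": 1800},
]

per_game_average = sum(g["mmr"] for g in games) / len(games)          # 2003.3
per_player = {}
for g in games:
    per_player.setdefault(g["player"], []).append(g["mmr"])
player_weighted_average = sum(sum(v) / len(v) for v in per_player.values()) / len(per_player)   # 1952.5
print(per_game_average, player_weighted_average)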
|
|
|
|