|
On July 12 2012 20:46 skeldark wrote: But this is off topic and has NOTHING to do with what I did here. I don't know how else to explain it to you.
It's not off topic at all. All it's about is whether your Monte Carlo simulation accurately estimates the likelihood of those differences occurring randomly. Your simulation doesn't take everything we know about the system into account. That's a huge part of your thread here.
Edit: To estimate this accurately, your simulation would have to take into account the ACTUAL variability of Blizzard's ACTUAL MMR system, and then track that MMR with points the way Blizzard's system does. You're not doing any of this, and as far as I can tell, it's not possible to do it either.
|
On July 12 2012 20:42 Lysenko wrote: What he's reverse engineered are MMR values mapped back into adjusted points and then mapped from there into an Elo-like point system. The problem is that the mapping between MMR and adjusted points may not behave well for the case where a player's not in equilibrium.
I'm not sure what you are getting at. There are a lot of special cases where the mapping cannot be done for various reasons, but he is aware of this. I don't think there is a problem with unstable players. In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player.
|
On July 12 2012 20:51 Mendelfist wrote: In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player.
This can only be true if your MMR is in equilibrium. If your MMR is changing rapidly, meaning that the current MMR estimate for you is very different from the actual value, one game won't get you there.
|
On July 12 2012 20:53 Lysenko wrote: On July 12 2012 20:51 Mendelfist wrote: In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player. This can only be true if your MMR is in equilibrium. If your MMR is changing rapidly, meaning that the current MMR estimate for you is very different from the actual value, one game won't get you there.
My data prove you wrong. See PM. I don't think we are making any progress in this discussion. You assume many wrong things about our MMR calculation and bring up problems we solved a long time ago. And all this has nothing to do with the data in this thread.
Update: I have 10,000 data points with race by now! Will update the OP when the calculation is done.
|
On July 12 2012 20:53 Lysenko wrote: On July 12 2012 20:51 Mendelfist wrote: In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player. This can only be true if your MMR is in equilibrium. If your MMR is changing rapidly, meaning that the current MMR estimate for you is very different from the actual value, one game won't get you there.
Ok, maybe we are talking about different things. I'm only talking about the MMR estimate that the matchmaking system uses, not some actual skill value which the MMR theoretically should converge to. I don't even think this actual value should be called MMR. That's very confusing.
I probably also should let skeldark speak for himself. He is already covered in tar and feathers, while I'm not, yet. :-)
|
Listen, we're getting into details that are beyond what I can reasonably talk about without going back and reviewing the entire thing from end to end. Let's put this discussion off until the weekend and I'll go through all the work with fresh eyes and continue the discussion then.
|
On July 12 2012 20:56 skeldark wrote: On July 12 2012 20:53 Lysenko wrote: On July 12 2012 20:51 Mendelfist wrote: In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player. This can only be true if your MMR is in equilibrium. If your MMR is changing rapidly, meaning that the current MMR estimate for you is very different from the actual value, one game won't get you there. My data prove you wrong.
Mendelfist was right, I was speaking about the "actual" skill value that MMR is trying to estimate, so our wires are crossed here. It's 5 a.m. in California, so a bad time for me to be posting about this.
|
On July 12 2012 21:12 Lysenko wrote: On July 12 2012 20:56 skeldark wrote: On July 12 2012 20:53 Lysenko wrote: On July 12 2012 20:51 Mendelfist wrote: In principle I think your MMR can be calculated quite accurately from only one game, even if you are a new player. This can only be true if your MMR is in equilibrium. If your MMR is changing rapidly, meaning that the current MMR estimate for you is very different from the actual value, one game won't get you there. My data prove you wrong. Mendelfist was right, I was speaking about the "actual" skill value that MMR is trying to estimate, so our wires are crossed here. It's 5 a.m. in California, so a bad time for me to be posting about this.
I call the theoretical value that the system tries to find "real skill". We have come up with a lot of word definitions by now to avoid this kind of confusion in discussions. ^^ Sometimes I use them and forget that others don't know my special definitions.
E.g.:
Dmmr = division MMR (not yet cleaned of ladder offsets)
Ammr = analysed MMR (my end result)
Cmmr = capped MMR
MMR = the end result of Blizzard's function
ommr = the MMR of the opponent
pmmr = the MMR of the player
|
I see three potential problems that could confound the data presented and I'm sure some, if not all, have been mentioned before. Please correct me if any of my assumptions are incorrect.
The data measured is for all leagues and all users of your tool, which, depending on your view of how the game should be balanced, may or may not be relevant. Personally, I believe that the game should be balanced around the higher levels of play, but that's up to Blizzard ultimately.
Also, non-game balance/design factors can affect this measure and would be really hard to actually account for. I'm not saying they exist, but if they do, they would be hard to identify and account for. Maybe there are more Terran players in lower leagues due to it being the race in the campaign. (these players are most likely not using your tool, but it's an example of the kind of bias that wouldn't be accounted for by this measure) There are several other of these outside influences that could affect the data.
The other thing, which could be seen as both a positive and negative, is the fluidity of this data. You mention this in the OP, but I wanted to talk about it a little bit. Because the averages are so close and MMR changes so rapidly, this data becomes a snapshot of a period in time. The validity of the data disappears in a very short time after it is published. It literally may already have changed. If you combine that with the fact that the Terran metagame is in flux for one of their matchups, you may get data that suggests that game design changes are necessary, when in reality it will correct itself over time.
All this being said, I very much appreciate the work that you have done with the MMR tool and putting together this data. I hope that good things can come of it and hope that the community can be a little more respectful of people who put time and effort into making it better.
|
On July 12 2012 22:01 TrippSC2 wrote: I see three potential problems that could confound the data presented and I'm sure some, if not all, have been mentioned before. Please correct me if any of my assumptions are incorrect.
The data measured is for all leagues and all users of your tool, which, depending on your view of how the game should be balanced, may or may not be relevant. Personally, I believe that the game should be balanced around the higher levels of play, but that's up to Blizzard ultimately.
Also, non-game balance/design factors can affect this measure and would be really hard to actually account for. I'm not saying they exist, but if they do, they would be hard to identify and account for. Maybe there are more Terran players in lower leagues due to it being the race in the campaign. (these players are most likely not using your tool, but it's an example of the kind of bias that wouldn't be accounted for by this measure) There are several other of these outside influences that could affect the data.
The other thing, which could be seen as both a positive and negative, is the fluidity of this data. You mention this in the OP, but I wanted to talk about it a little bit. Because the averages are so close and MMR changes so rapidly, this data becomes a snapshot of a period in time. The validity of the data disappears in a very short time after it is published. It literally may already have changed. If you combine that with the fact that the Terran metagame is in flux for one of their matchups, you may get data that suggests that game design changes are necessary, when in reality it will correct itself over time.
All this being said, I very much appreciate the work that you have done with the MMR tool and putting together this data. I hope that good things can come of it and hope that the community can be a little more respectful of people who put time and effort into making it better.
1) It's mostly the opponents. I collect data of my users and everyone they played.
2) Data balance doesn't have to be design balance. But this point is valid for every method. You can't tell the reason for the imbalance, only that the data is out of balance.
3) Agree. I can't tell yet how this will turn out. I will publish monthly or two-week snapshots depending on how much data I get in. We will see.
4) Thank you. I don't have time to make all the stats from my data. But I collected game length too, so someone can make statistics about game length versus win ratio.
|
On July 11 2012 01:50 1st_Panzer_Div. wrote: Whoah, rechecked that, you have 149,000 games of data. And you are claiming 4% of that is you as well?
So you have 5900 games of your own in this?
And why did you run the random deviation tests with only 1,000 games, and not at least as many as the 149,000? (You actually should run random Monte Carlo simulations for whatever the estimated current userbase is to get some mock battle.net ladders from a perfectly balanced game.) I could easily pick 1,000 games out of your current data and show significant imbalance towards any of the three races.
Also what are the dates your data is from?
This is a cool idea... the deviation bit is just not nearly enough random games to be accurate. What program did you use to run these tests, and was it length of the test that prevented you from doing a few hundred thousand?
Someone is crying.
User was temp banned for this post.
|
On July 12 2012 07:48 lolcanoe wrote: 1. Run an Anderson–Darling test on the data. This can be done with 3 clicks through Minitab which will automatically give you a P-value for whether or not the data is normal. If you cannot run this test or it tells you that your normality is problematic - note in the OP that your test assumes normality but was not verified to be normal.
2. The specific question here is whether or not one race has a significantly higher MMR average than another. What your current test is actually testing for (although somewhat incorrectly) is whether or not the sample average varies significantly from the population mean. If executed correctly, this test also has application to understanding balance, but it doesn't answer the specific question. The specific question should be tested for under a very simple 2 sample t test (google it) and be tested 3 times - tz, pz, and tp. This is a much better test to fit the question and allows you to ignore the further confusion of taking another average.
3. In these calculations, independence between populations is a fair concern - and should likewise be noted.
4. Finally, be very clear about your conclusion. Your data allows you to conclude that the average skill rating of a certain race is potentially different than the skill rating of another. It is yet another jump to equate this difference to a problem in balance, due to a potential cause-correlation problem (i.e.: does Terran make players bad, or do bad players pick Terran?). Unfortunately, there's no way to resolve this concern with the data that you possess, so you'll have to make note of this caveat as well.
At least do the easy part and fix 1 and 2, and note very carefully what test was run (which STDs did you use?) to calculate statistical significance.
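For point 2, a two-sample comparison can be sketched in plain Python. This is only a minimal illustration, assuming the unequal-variance (Welch) form of the two-sample t test; the function name `welch_t` and the sample values are hypothetical and are not data from this thread.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's two-sample t test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.fmean(sample_a), statistics.fmean(sample_b)
    # Sample variances (n - 1 in the denominator), each divided by its sample size.
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical MMR samples, for illustration only (not the thread's data).
terran = [1500.0, 1540.0, 1480.0, 1600.0, 1520.0]
zerg = [1580.0, 1620.0, 1610.0, 1550.0, 1640.0]

t_stat, dof = welch_t(terran, zerg)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The statistic would then be compared against a t distribution with the returned degrees of freedom, once per race pairing.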
|
On July 12 2012 20:48 Lysenko wrote: On July 12 2012 20:46 skeldark wrote: But this is off topic and has NOTHING to do with what I did here. I don't know how else to explain it to you.
It's not off topic at all. All it's about is whether your Monte Carlo simulation accurately estimates the likelihood of those differences occurring randomly. Your simulation doesn't take everything we know about the system into account. That's a huge part of your thread here. Edit: To estimate this accurately, your simulation would have to take into account the ACTUAL variability of Blizzard's ACTUAL MMR system, and then track that MMR with points the way Blizzard's system does. You're not doing any of this, and as far as I can tell, it's not possible to do it either.
I asked him about this on the 1st page. His simulation is not a Monte Carlo according to him, and he claims to have designed his program and model himself. To me, as soon as he said that he designed it himself, I just gave up and realized there was no further point arguing about it.
|
On July 13 2012 00:23 lolcanoe wrote: On July 12 2012 07:48 lolcanoe wrote: 1. Run an Anderson–Darling test on the data. This can be done with 3 clicks through Minitab which will automatically give you a P-value for whether or not the data is normal. If you cannot run this test or it tells you that your normality is problematic - note in the OP that your test assumes normality but was not verified to be normal.
2. The specific question here is whether or not one race has a significantly higher MMR average than another. What your current test is actually testing for (although somewhat incorrectly) is whether or not the sample average varies significantly from the population mean. If executed correctly, this test also has application to understanding balance, but it doesn't answer the specific question. The specific question should be tested for under a very simple 2 sample t test (google it) and be tested 3 times - tz, pz, and tp. This is a much better test to fit the question and allows you to ignore the further confusion of taking another average.
3. In these calculations, independence between populations is a fair concern - and should likewise be noted.
4. Finally, be very clear about your conclusion. Your data allows you to conclude that the average skill rating of a certain race is potentially different than the skill rating of another. It is yet another jump to equate this difference to a problem in balance, due to a potential cause-correlation problem (i.e.: does Terran make players bad, or do bad players pick Terran?). Unfortunately, there's no way to resolve this concern with the data that you possess, so you'll have to make note of this caveat as well. At least do the easy part and fix 1 and 2, and note very carefully what test was run (which STDs did you use?) to calculate statistical significance.
1) I don't assume normality.
I show that 99.99% of random values are in a range +-x and my value is outside of range x. So it's very unlikely that my value is random! THAT'S ALL. You call yourselves statistics freaks but fail to understand this simple method!
If you want to do a more complex test with the data, feel free to do so! I won't stop you! It's not on me to prove anything. I publish data. If you want to prove something, prove it. Why do so many people think I have to do something? Do you pay me?
Sorry if I'm harsh, but this thread is full of people who don't understand anything but act like they know what they are talking about. So it's hard for me to filter every time who has a good point and who just wants to look smart.
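As an illustration of the range check described in this post, here is a minimal Python sketch. It is an assumption about how such a test could be set up, not the actual routine used for this thread: it draws random groups from a synthetic MMR pool, measures how far each group's average lands from the pool average, and reports the range x that covers a chosen share of those random differences. The function name `random_range`, the pool, the group size, and the observed value are all hypothetical.

```python
import random
import statistics


def random_range(pool, group_size, coverage=0.99, trials=2000, seed=0):
    """Estimate x so that about `coverage` of random groups have |group mean - pool mean| <= x."""
    rng = random.Random(seed)
    pool_mean = statistics.fmean(pool)
    # Absolute difference between each random group's mean and the pool mean, sorted ascending.
    diffs = sorted(
        abs(statistics.fmean(rng.sample(pool, group_size)) - pool_mean)
        for _ in range(trials)
    )
    index = min(trials - 1, int(coverage * trials))
    return diffs[index]


# Synthetic pool of MMR-like values, for illustration only (not the thread's data).
pool_rng = random.Random(1)
pool = [pool_rng.gauss(1600.0, 400.0) for _ in range(3000)]

x = random_range(pool, group_size=1000)
observed_diff = -50.0  # hypothetical difference of one race's average from the pool average
verdict = "outside" if abs(observed_diff) > x else "inside"
print(f"about 99% of random groups fall within +-{x:.1f}; {observed_diff} is {verdict} that range")
```

Resolving a 99.99% range this way needs far more than a few thousand random groups, since only about 1 in 10,000 groups falls outside it; the sketch uses 99% to keep the run short.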
|
I love observing stats. Interesting that Terran's highest moments are in early MMR, Zerg's is near the middle, Protoss near the high middle, and then it flattens out decently.
|
Update:
Result
TIME Filter: only between 1 Jan 1970 00:00:00 GMT - 12 Jul 2012 16:52:47 GMT
Datasize: 10063
Average MMR: 1593.1
Min difference to be significant: 90%: +-16, 99%: +-24, 99.99%: +-36
Difference to average MMR per race: T: -53.08, P: 11.18, Z: 32.05
TIME Filter: only between 1 Jan 1970 00:00:00 GMT - 12 Jul 2012 16:52:47 GMT
MMR Filter: Only Master+
Datasize: 2278
Average MMR: 2278.03
Min difference to be significant: 90%: +-15, 99%: +-23, 99.99%: +-35
Difference to average MMR per race: T: -24.42, P: 14.98, Z: 3.69
The deviation shows that the differences of the race values are too big to be explained by random error.
So we come to the conclusion:
1) Terran has a significantly lower average MMR compared to the total data pool. We cannot tell if this imbalance comes from design or another reason.
2) The imbalance is small. An average win on ladder is +16 MMR.
3) The data is biased towards EU/US and towards higher skill ratings.
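The comparison behind these conclusions can be re-checked directly from the published numbers. The short Python sketch below copies the thresholds and per-race differences from the two result blocks above and reports, for each race, the strictest level whose +- threshold the difference exceeds. The dictionary layout and the helper name `highest_level_exceeded` are illustrative assumptions, not part of the original tool.

```python
# Thresholds and per-race differences copied from the two result blocks in this post.
results = {
    "all leagues": {
        "thresholds": {"90%": 16, "99%": 24, "99.99%": 36},
        "diffs": {"T": -53.08, "P": 11.18, "Z": 32.05},
    },
    "Master+": {
        "thresholds": {"90%": 15, "99%": 23, "99.99%": 35},
        "diffs": {"T": -24.42, "P": 14.98, "Z": 3.69},
    },
}


def highest_level_exceeded(diff, thresholds):
    """Return the strictest level whose +- threshold the difference exceeds, or None."""
    exceeded = None
    for label, limit in thresholds.items():  # listed from loosest to strictest
        if abs(diff) > limit:
            exceeded = label
    return exceeded


summary = {
    name: {
        race: highest_level_exceeded(diff, block["thresholds"])
        for race, diff in block["diffs"].items()
    }
    for name, block in results.items()
}

for name, races in summary.items():
    print(name, races)
```

Run on the published numbers, this reports that the all-league Terran difference exceeds the 99.99% threshold and the Zerg difference exceeds the 99% threshold, while in the Master+ subset only the Terran difference exceeds a threshold (99%). The Protoss difference stays below the 90% threshold in both blocks.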
README before writing a long post about why you think this is not scientific statistical proof:
This is not a university paper about SC2 balance. I don't get money for this. I don't personally care which race is OP or not. I publish the data I collected with my own program, which I wrote to back-calculate MMR. I found a very interesting anomaly in the race data. I programmed a quick test routine to show this anomaly and that it is very unlikely to come from a random source. I show that 99.99% of random values are in a range +-x and my value is outside of range x. So it's very unlikely that my value is random! If you want to do a more complex test with the data, feel free to do so!
Source Data
If you read the text carefully, I think you will agree that this is not perfect, but a way better method than TLPD win ratios or random tournament results.
It says a lot about this community that over 30 people tell me what I should do and 0 people do something with the data.
|
|
TL BANHAMMER Quote from lazyitachi removed! Which post? Looks like my ignore list is well made. He was already on the list before he wrote that
|
You say in your OP that you were able to calculate the MMR very accurately. Is the official MMR, so to speak, the one used by bnet, somehow observable? I thought it was not. If it is not, how do you know that your results are very accurate?
Great thread. =)
|
Sorry, I didn't read all 18 pages, but who is the Protoss with the highest MMR?
Crazy Crawling btw, really nice to see what you can actually do with some time
|
|
|
|