|
On May 04 2011 01:02 Pelican wrote: On May 04 2011 00:36 Dommk wrote: 1.) The data is a sample of a bigger set. You have collected data on all games played, but not all possible games played -- all games played is the sample distribution, and all possible games would be the population. 2.) The data comes from a dichotomous variable, but what you're plotting is ultimately a proportion of the outcomes of that variable, not the variable itself. 3.) It doesn't matter if the sample size is 10,000 if the standard error is something like 1700.
The data used is deemed worthy of being collected by TLPD. He is just graphing the results available there, so there is no need for confidence intervals. The data is not dichotomous given that he is using ALL data from the database he said he was using and not just what he himself deems worthy of being used. He says he uses TLPD as his only source. If you have problems with the data he is using, then your qualms are with him using only TLPD. If he was only using specific data from TLPD then there would be some room to argue, but given he is showing no bias and graphing everything available there it doesn't need any further analysis. It is just pure recorded historical data. All he has done is taken TLPD and graphed it. There is nothing else needed to be added.
First, see my post: I misread what was actually being graphed, so that post can be safely ignored. Second: I wasn't criticizing the data or the sources, I was saying that the TLPD data isn't the [full] population distribution, but is instead a sample distribution of the data -- but this point is moot, since I misread the graph anyway. However, if anyone is trying to make statements about balance based off of the graph, significance tests need to be run at each point to see if the data are actually statistically different. We can't just look at the graph and say 'oh yeah, terran is owning!' There may not even be a statistically significant difference -- and this was the point of error bars being suggested (by someone else originally, not me). However, due to the nature of the data, it doesn't look like error bars would be appropriate, but I'm not sure since I don't have the spreadsheet in front of me.
Exactly right here. The fact of the matter is that you have to create a distinction between the population and the sample data. If you consider that the population is all Starcraft II gamers as a whole, and these tournaments as a sample, then you can do sample statistics. There seems to be a lot of criticism, saying that the tournament data is not a sample, but a population, but this thought needs to be re-examined.
If you take all sc2 players as the population, you can isolate these tournaments in order to try and control for variable bias. This removes a random element to the study, which can be essential, but accounts for skill of the players (can be argued, but not without offense), motive/goals, and the bias of mirror matches. You could assume the mean win percentage of 50%, and run regressions to see if the percentage difference is significant. Correct me if I'm wrong here, but I don't see anything wrong with sample statistics in this case.
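A rough sketch of what such a check could look like, assuming every game is an independent trial and the null hypothesis is a 50% win rate. Note that this is a plain one-sample z-test for a proportion rather than a regression, and the win/game counts are made up for illustration; they are not taken from the TLPD data.

```python
import math

def z_test_vs_even(wins, games):
    """Two-sided z-test of an observed win rate against a 50% null.

    Assumes independent games; `wins` and `games` are hypothetical inputs.
    Returns (z, p_value).
    """
    p_hat = wins / games
    se = math.sqrt(0.5 * 0.5 / games)  # standard error under the 50% null
    z = (p_hat - 0.5) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Hypothetical month: 60 wins out of 100 games in one matchup.
z, p = z_test_vs_even(60, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value only says the observed rate is unlikely under a true 50% rate with independent games; it says nothing about why the rate differs (player skill, maps, patches).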
|
Ok, so we'll just conclude that in that 6-month period, with balance patches, better maps, better players, and a big sample, the fact that all winrates converge to 50% in tournaments (so without matchmaking to compensate) is just a coincidence.
|
On May 03 2011 23:53 Ctuchik wrote: On May 03 2011 21:18 Primadog wrote: On May 03 2011 20:59 Ctuchik wrote: On May 03 2011 19:41 Elean wrote: On May 03 2011 18:27 Primadog wrote: Any chance of outputting graphs with confidence intervals? I am concerned that some of the "trends" we see in the graph are simply random fluctuations due to some months having smaller sets of data. Assuming all the matches are independent events, the standard deviation for a given matchup is: sqrt(p(1-p)/N) where p≈0.5 is the win rate of a race, and N the number of played games. Let's take the PvZ matchup as an example. There are 2244 games over 6 months; 374 games/month gives a standard deviation of about 2.5%. Considering an error of twice the standard deviation, your confidence interval is +/- 5%. Conclusion: the fluctuations we observe for the PvZ matchup can very well be due to the sample size. For ZvT however the sample size is large enough to say the matchup was unbalanced. I would love to do this, trying to figure out how now. =P Ideally you can make a graph something like the second graph on a stock screener: ![[image loading]](http://imgur.com/qLi1l.gif) A 95% confidence interval (2 standard deviations) is the standard that most people use. So the range you will use is the mean +/- 2 standard deviations (μ +/- 2σ), where σ=(P * (1-P)/n)^.5. Note that n will be the number of data points you have per month, not the total data points overall. Well, reading up a bit on it, it doesn't seem like this would be all that useful. Please correct me if I'm wrong. *The data behind this is not a sample of a bigger set. It's all matches played during this time period, and I'm not measuring it against a larger population. *The actual data behind it is binary, i.e., a match is either a 1 or a 0, win or loss. *The sample size for each month is over 1000 games. See ![[image loading]](http://imgur.com/k49GV.png) . Please correct me if I'm wrong here!
Any chance of sharing your data then? Simply the mean and number of games per month will suffice. You can put it on http://docs.google.com/.
There's a difference in objective here. If your sole objective is to plot the actual changes in high level play racial win-rate, then the current graph is sufficient. However, if you want to address the question: Can we be confident that there was an imbalance between Race-A vs Race-B, some kind of statistical test will be needed.
Think about this comparison: Suppose we're flipping a coin. For one hour we flip it 10 times, next we flip it 1000 times, and the third hour we flip it 20 more times. I can make a graph that plots the percentage of times the coin landed heads, but how can we say with confidence that the coin is fair or not just by looking at the graph? What if for the first ten times, we get only 1 head and 9 tails? The entire 1030 times I flipped is the whole population, yet we can't really just point to the results of the third hour and say "this coin is balanced."
You need to use some statistics to test this, and one way is to calculate a standard error to account for the number of samples you have. The easiest one to use is a 95% confidence interval (μ +/- 2σ). For hour 1, we expect the percentage to be between 18% and 82% (σ=.16); for hour 2, we expect the percentage to be between 47% and 53% (σ=.015); and for hour 3, we expect the percentage to be between 28% and 72% (σ=.11). It's a matter of expecting a fair coin to regress back to the mean if the sample in the particular hour is big enough.
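The coin-flip numbers above can be reproduced with a few lines. This is only a sketch of the σ = sqrt(p(1-p)/n) formula quoted in this thread, assuming a fair coin (p = 0.5) and independent flips; the function name is made up for the example.

```python
import math

def win_rate_interval(p, n, z=2.0):
    """Return (sigma, low, high) for a proportion p over n independent trials.

    z=2.0 is the rough "two standard deviations" rule for ~95% coverage.
    """
    sigma = math.sqrt(p * (1 - p) / n)
    return sigma, p - z * sigma, p + z * sigma

# The three "hours" from the coin example: 10, 1000 and 20 flips.
for hour, flips in enumerate([10, 1000, 20], start=1):
    sigma, low, high = win_rate_interval(0.5, flips)
    print(f"hour {hour}: n={flips}, sigma={sigma:.3f}, range {low:.1%} to {high:.1%}")
```

The same function applies to a monthly matchup win rate by passing the observed rate and that month's game count, which is why the per-month n matters more than the six-month total.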
|
I just looked at the charts quickly; it's interesting that the wider results are converging while the Korean results are diverging. My understanding is that the level of play on the KR server is much higher and more inflexible (less varied). Perhaps they are exploiting timing windows/small points of imbalance, and communicating these ideas to each other more quickly, so that small racial imbalances are more obvious on the KR server.
|
I like how zerg is blue and terran red
|
These graphs are extremely misleading. All tournaments use an elimination system. If someone has a win/loss ratio <50%, that means he never won anything (with equal participation). You have to adjust the ordinate scale to a (w/l ratio)*(quantity of matches) unit to make some sense.
|
On May 04 2011 01:42 Fr0d0 wrote: These graphs are extremely misleading. All tournaments use an elimination system. If someone has a win/loss ratio <50%, that means he never won anything (with equal participation). You have to adjust the ordinate scale to a (w/l ratio)*(quantity of matches) unit to make some sense.
It's not tracking players, it's tracking games. So in effect it already does that.
|
On May 04 2011 01:34 Primadog wrote: On May 03 2011 23:53 Ctuchik wrote: On May 03 2011 21:18 Primadog wrote: On May 03 2011 20:59 Ctuchik wrote: On May 03 2011 19:41 Elean wrote: On May 03 2011 18:27 Primadog wrote: Any chance of outputting graphs with confidence intervals? I am concerned that some of the "trends" we see in the graph are simply random fluctuations due to some months having smaller sets of data. Assuming all the matches are independent events, the standard deviation for a given matchup is: sqrt(p(1-p)/N) where p≈0.5 is the win rate of a race, and N the number of played games. Let's take the PvZ matchup as an example. There are 2244 games over 6 months; 374 games/month gives a standard deviation of about 2.5%. Considering an error of twice the standard deviation, your confidence interval is +/- 5%. Conclusion: the fluctuations we observe for the PvZ matchup can very well be due to the sample size. For ZvT however the sample size is large enough to say the matchup was unbalanced. I would love to do this, trying to figure out how now. =P Ideally you can make a graph something like the second graph on a stock screener: ![[image loading]](http://imgur.com/qLi1l.gif) A 95% confidence interval (2 standard deviations) is the standard that most people use. So the range you will use is the mean +/- 2 standard deviations (μ +/- 2σ), where σ=(P * (1-P)/n)^.5. Note that n will be the number of data points you have per month, not the total data points overall. Well, reading up a bit on it, it doesn't seem like this would be all that useful. Please correct me if I'm wrong. *The data behind this is not a sample of a bigger set. It's all matches played during this time period, and I'm not measuring it against a larger population. *The actual data behind it is binary, i.e., a match is either a 1 or a 0, win or loss. *The sample size for each month is over 1000 games. See ![[image loading]](http://imgur.com/k49GV.png) . Please correct me if I'm wrong here!
Any chance of sharing your data then? Simply the mean and number of games per month will suffice. You can put it on http://docs.google.com/. There's a difference in objective here. If your sole objective is to plot the actual changes in high level play racial win-rate, then the current graph is sufficient. However, if you want to address the question: Can we be confident that there was an imbalance between Race-A vs Race-B, some kind of statistical test will be needed. Think about this comparison: Suppose we're flipping a coin. For one hour we flip it 10 times, next we flip it 1000 times, and the third hour we flip it 20 more times. I can make a graph that plots the percentage of times the coin landed heads, but how can we say with confidence that the coin is fair or not just by looking at the graph? What if for the first ten times, we get only 1 head and 9 tails? The entire 1030 times I flipped is the whole population, yet we can't really just point to the results of the third hour and say "this coin is balanced." You need to use some statistics to test this, and one way is to calculate a standard error to account for the number of samples you have. The easiest one to use is a 95% confidence interval (μ +/- 2σ). For hour 1, we expect the percentage to be between 18% and 82% (σ=.16); for hour 2, we expect the percentage to be between 47% and 53% (σ=.015); and for hour 3, we expect the percentage to be between 28% and 72% (σ=.11). It's a matter of expecting a fair coin to regress back to the mean if the sample in the particular hour is big enough.
Thanks, that is a great explanation. I posted the data being used in the graph here (without restriction on month):
https://spreadsheets.google.com/ccc?key=0AgaW81yNlT2UdGJNWktQR1JYZWtYZk91dHEyMUhrUUE&hl=en&authkey=CI-arP8O
I also have the full dataset loaded, so if there is anything else needed I can easily put it up there as well.
|
You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race).
|
On May 04 2011 01:59 Fr0d0 wrote: You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race). Can you explain it more? I do not get that.
|
I'd love to see demarcation of patch notes or at the very least "major" changes within the last 6 months. reaper changes, roach range, depot before barracks, amulet, infestor, (major map pool changes) just to name a few.
I couldn't really see anything written about it, but I feel as though it's hard to overlook a few variables. The first is actual player skill: one player, regardless of race, may simply be better than the other. In tennis, player A may have a "better" racquet or shoes, but player B's fitness is far greater and he is simply the better athlete.
I'm not a numbers guy by any means, but I would also think that the way tournaments are done (short groups, then playoff elimination) is limiting in a sense. What about groups that have 3 protoss in them, and only 1 makes it out? Or a situation where Protoss A advances to the semis through a PvP, but severely lacks a TvP or ZvP game, as opposed to Protoss B who may have the better ZvP or TvP record? Is there any truth to this? I know that mirrors aren't counted, but I think the results of mirror matches could skew the results, as now player B, who had a better chance in the rest of the bracket, was eliminated by player A or one poor decision.
I like the graphs, their oscillations are really interesting which is why I'd like to see demarcation of major changes. I still take it with a grain of salt, not because the sample size is too small but because tournament formats are too limited, not to mention I would think frequency of race would also play a role in it?
|
On May 04 2011 02:23 trancey_ wrote: On May 04 2011 01:59 Fr0d0 wrote: You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race). Can you explain it more? I do not get that. A W/L ratio <50% means you have lost more games than you won. How can you advance to the final if you got eliminated because you lost more games than you won?
|
On May 04 2011 02:34 Fr0d0 wrote: On May 04 2011 02:23 trancey_ wrote: On May 04 2011 01:59 Fr0d0 wrote: You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race). Can you explain it more? I do not get that. A W/L ratio <50% means you have lost more games than you won. How can you advance to the final if you got eliminated because you lost more games than you won? Isn't it possible that the win rate was 60% in one tournament and <50% in another tournament in the same month?
|
On May 04 2011 02:23 trancey_ wrote: On May 04 2011 01:59 Fr0d0 wrote: You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race). Can you explain it more? I do not get that. It's because it makes no sense at all.
Sv1, the examples you give have absolutely no effect on the data. They just change the sample size but have no effect on the actual graph, because the graph doesn't care about groups or anything else; it just takes X vs Y and who wins and who loses. If, of the 10000 games, 9999 were TvZ and 1 were TvP, it would just mean that the TvP graph is meaningless and no conclusion can be drawn from it. But it wouldn't mean that the result of the 9999 other games is meaningless.
|
So the PvZ matchup is currently more swung in the favour of Zerg than it ever has been in favour of Protoss? Interesting.
|
On May 04 2011 02:38 trancey_ wrote: On May 04 2011 02:34 Fr0d0 wrote: On May 04 2011 02:23 trancey_ wrote: On May 04 2011 01:59 Fr0d0 wrote: You are tracking match-ups actually. A W/L ratio <50% in an elimination system means zerg never won anything (with an equal number of qualified players of each race). Can you explain it more? I do not get that. A W/L ratio <50% means you have lost more games than you won. How can you advance to the final if you got eliminated because you lost more games than you won? Isn't it possible that the win rate was 60% in one tournament and <50% in another tournament in the same month?
Yeah, but it should be <40% in the other tournament (not <50%) if in the first tournament it was 60%. Then the average zerg performance would be <50%. But <40% means zerg was totally eliminated in the other tournament in the first round (the BO3 advance rate is >66% and the BO5 advance rate is >60%). I'm not really familiar with all SC2 tournament systems, but the graphs don't have any values in the <40% range, so maybe it's not even possible.
|
Interesting data, but most of the interpretations here are standing on very shaky ground. I really don't like how people are desperately trying to find proof for their theories here.
|
Just graph the total number of wins for T, P and Z (including mirrors!). The W/L ratio in unequally qualified groups with different tournament systems tells nothing. A difference of 0.0001% can mean that in a particular matchup one race always beats another.
|
On May 04 2011 02:52 branflakes14 wrote: So the PvZ matchup is currently more swung in the favour of Zerg than it ever has been in favour of Protoss? Interesting. No, the 2nd graph means nothing because each month doesn't even have 50 games. So a particular player going to the finals will make the graph skyrocket, for instance. The sole Losira vs Alicia win gives zerg about +4% winrate, for instance.
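To put a rough number on how much a single result matters in a small month, here is a tiny sketch. It assumes the win rate is simply wins divided by games and asks how far the rate moves if one loss is turned into a win. The game counts are illustrative only, not the actual monthly counts behind the 2nd graph.

```python
def swing_from_one_result(games):
    """Percentage-point change in win rate when one loss becomes a win."""
    return 100.0 / games

# Illustrative monthly game counts (hypothetical, not from the spreadsheet).
for games in (25, 50, 1000):
    print(f"{games} games: one flipped result moves the win rate "
          f"by {swing_from_one_result(games):.1f} points")
```

With around 25 games in a month, one game is worth about 4 points, which is in line with the +4% mentioned above; with 1000 games it is a tenth of a point.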
|
i love stats so much ♥ thnx
|