ETisME wrote:I am sorry but you got the stats wrong.
The over 20 requirement is for cluster analysis, something that you aren't doing because you are not trying to make any clusters out from the data set.
The over 20 thing you talked about is just for normal hypothesis testing, which you aren't doing.
You need to calculate the minimum sample size based on your confidence interval, etc.
In short, you need a sample size that truly represents the population. Merely 50 games out of his entire ZvT history does not make sense.
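As a rough illustration of what "calculate the sample size from your confidence interval" can look like, here is a minimal Python sketch of the textbook formula for estimating a proportion, n = z^2 * p * (1 - p) / E^2. The function name, the 5% margin of error and the worst-case p = 0.5 are assumptions picked for the example, not numbers from this thread.

```python
from math import ceil
from statistics import NormalDist


def min_sample_size(margin, confidence, p=0.5):
    """Smallest n so that a win-rate estimate has the given margin of error.

    margin     -- half-width of the interval, e.g. 0.05 for +/- 5 points
    confidence -- two-sided confidence level, e.g. 0.95
    p          -- guessed true win rate; 0.5 is the most conservative choice
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z * z * p * (1 - p) / (margin * margin))


if __name__ == "__main__":
    # Hypothetical targets: +/- 5 and +/- 10 points at 95% confidence.
    print(min_sample_size(0.05, 0.95))
    print(min_sample_size(0.10, 0.95))
```

Under these assumptions the required number of games is in the hundreds for a 5-point margin, which is the scale of the problem with a 50-game sample.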
HyperionDreamer wrote:Yep. The study cited in the OP pertains to a specific type of stats testing, called cluster analysis.
Maybe read up on it a bit before you cite it as valid, OP. You're talking about simple testing for type 1/2 statistical errors, so you would need a much larger sample size. I did a post a while ago doing rigid scientific statistical analysis on Korean matchup percentages, and I think even a sample size of ~200 games rendered a ~7% difference statistically irrelevant.
http://en.wikipedia.org/wiki/Cluster_analysis
Edit: It was a sample size of 130, and an ~8% statistical difference. This was rendered statistically insignificant using standard p-level analysis. Here's the link to my analysis.
http://www.teamliquid.net/forum/viewmessage.php?topic_id=317114&currentpage=12#226
Heyoka wrote:Yes, 20 is so-called "statistically relevant"...when there are a set number of variables. Those rules of thumb apply for very specific kinds of tests when you're sampling populations in very controlled ways. Looking at MMA's history of 26 games is nowhere near enough to be relevant because there are too many variables within a game; you need to control for opponent, map, and style of play at the very least. If you had 26 games of MMA playing DRG on Shakuras with the same openers then yes, maybe this would start to apply.
You can't just say you're testing for winrate or balance or something because you're abstracting it in multiple ways, you're several levels above what you're trying to look at.
lazyitachi wrote:Dear god....
Random sampling = how representative the sample is of the demographic studied. Since you are not studying the demographic's average win rate, this is irrelevant. You are looking at the TOP PLAYERS' WIN RATE ACCORDING TO ELO, so the representativeness of a random sample of the demographic does not apply.
Hypothesis testing:
Given DRG's ZvT win probability is 2/3 over a sample of 36 games.
Given MMA's TvZ win probability is 52/69 over a sample of 69 games.
Null hypothesis: MMA wr > DRG wr at 5% probability of Type I error
(p1 − p2) ± z * sqrt ((p1 q1)/n1 + (p2 q2)/n2 )
where n = no of games, p = win, q = loss, 1 = MMA, 2 = DRG, z = Standard score
therefore substituting in
( 75.36% - 66.67%) ± z * sqrt ( 75.36% * 24.64% / 69 + 66.67% * 33.33% / 36)
At 90% confidence level, the probability of making type I error is 20.8%
At 95% confidence level, the probability of making type I error is 24%
Hence your comparison is not statistically significant if you only tolerate 5% error
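For anyone who wants to check the arithmetic above, here is a minimal Python sketch of the same two-proportion formula. The function names are made up for illustration, and DRG's 24 wins is simply 2/3 of 36. It uses the standard one-sided normal p-value, which may not be how the 20.8% and 24% figures above were derived, so the printed numbers need not match them.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(wins1, n1, wins2, n2):
    """Difference in win rates, its standard error, and the z statistic."""
    p1, p2 = wins1 / n1, wins2 / n2
    # Unpooled standard error: sqrt(p1*q1/n1 + p2*q2/n2), as in the post.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff / se


def diff_interval(diff, se, confidence):
    """Two-sided confidence interval (p1 - p2) +/- z * se."""
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z_crit * se, diff + z_crit * se


# 1 = MMA (52 wins in 69 games), 2 = DRG (24 wins in 36 games, i.e. 2/3).
diff, se, z = two_proportion_z(52, 69, 24, 36)
# Chance of seeing a gap at least this large if the true rates were equal.
one_sided_p = 1 - NormalDist().cdf(z)
low, high = diff_interval(diff, se, 0.95)

if __name__ == "__main__":
    print(f"difference = {diff:.4f}, standard error = {se:.4f}, z = {z:.3f}")
    print(f"one-sided p-value = {one_sided_p:.3f}")
    print(f"95% interval for the difference = ({low:.3f}, {high:.3f})")
```

The 95% interval for the difference includes zero, which is consistent with the conclusion that the comparison is not significant at a 5% tolerance.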
You can group the data for top tier Ts and Zs for comparison. I doubt you have enough data for any statistical significance at individual player level.
Credibility theory states the probability of each individual win rate being correct is
= 2 * z ( k * sqrt (n)) - 1
where z = Standard Score, k = probability Type I, n = number of games
We assume 10% Type I error i.e. k = 5% (divide by 2 because two-tailed test)
DRG:
= 2 * z( 5% * sqrt(36)) - 1
= 23.6%
MMA:
= 32.2%
This means that the probability of DRG and MMA's win rate being the expected win rate is only 24% and 32% assuming 10% Type I error i.e. NOT ENOUGH DATA.
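The credibility figures above can be reproduced with a short Python sketch. It reads "z(...)" in the formula as the standard normal cumulative distribution function, which is an interpretation on my part but is the one that gives back the 23.6% and 32.2% quoted; the function name is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def credibility(n_games, k):
    """Evaluate 2 * Phi(k * sqrt(n)) - 1 from the post.

    n_games -- number of games in the sample
    k       -- tolerance used in the post (5% for a 10% two-tailed Type I error)
    """
    # Phi is the standard normal CDF (assumed meaning of "z(...)" above).
    return 2 * NormalDist().cdf(k * sqrt(n_games)) - 1


drg = credibility(36, 0.05)
mma = credibility(69, 0.05)

if __name__ == "__main__":
    print(f"DRG: {drg:.1%}")
    print(f"MMA: {mma:.1%}")
```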
Chytilova wrote:I don't think you realize how many variables go into these statistics that you pulled together. Statistics by themselves mean nothing. You need to get a handle on all the variables. That is why studies are done to control variables and isolate the ones you want to interpret. I don't care how large your sample size is if you ignore most of the variables. We can literally determine nothing with these statistics. Nothing at all.
Danglars wrote:Find it hard to believe that a guy dealing with a sample size of 50 does not calculate confidence intervals to support his results. It's easy to wave your hands at sample sizes above 20, just as it's easy for the other guy to cite small micro mistakes trending towards deviations in that % over a large number of games. We want to be sure we have a grasp on how representative these games are of his true ZvT winrate before we start spouting the % difference between him and next highest (It *could* be as high as an X difference or as low as a Y difference.)
(See NesTea's 91 games compared to DRG's 50. Comparing MMA's 26 games against someone with 50. And we're talking across patches, metagame shifts... the free advantages that one race gets as the others figure out what works against them, and vice versa on disadvantages. The more games, the less the wave of effects from individual patches, and the sometimes-corresponding metagame shifts afterwards, will matter.)
And I'm not disagreeing with the proposition that DRG is a VERY good player EVEN in his weakest matchup.
Fubi wrote:Way to ignore the MAIN point in that post; here let me spell it out for you:
Why are you saying that Statistics is very dependent on standard deviations when you didn't even include ANY standard deviation calculations in your analysis? Cool, it's more than 5% cuz you rounded it down, so how do you know that this number isn't within margin of error?