Top Tier Korean ZvT and TvZ TLPD statistics - Page 10

Fubi

2228 Posts

March 16 2012 10:04 GMT

#181

On March 16 2012 18:52 zefreak wrote:
Why is everyone ignoring the many posts critiquing OP's use of statistical data and instead attempting to find reasons that explain his results while assuming they are true?

OP's strategy is to ignore valid posts that prove him wrong, while randomly commenting on the stuff that he can, so that he can continue to believe he made a valid statistical point.

neoghaleon55

United States7435 Posts

March 16 2012 10:08 GMT

#182

On March 16 2012 18:56 aebriol wrote:
Just one question: for how long back does the sample size go?

It would be relevant to look maybe 2-3 months back, but ... patches etc, will really mess with the statistics. ZvT was damn near impossible for Z for a while here and there - not really the case right now.

Ok I need to address this question because people want an answer.
The statistics are indeed spanning the entire career of a pro-gamer.
In DRG's case it spans all the way back to 2011 GSTL S1.

A counterpoint to the whole "wow so it totally doesn't apply anymore" is that we are taking the CURRENT top 20 or so koreans by ELO. Obviously, if someone hasn't done well recently, their ELO will drop. You don't see fruitdealer or jinro being talked about in this group (even though they were top of the line in their time), because their ELO has fallen off and their data are no longer relevant. People with very high ELO tend to have performed well in all matchups recently. Yes, the treatment of these statistics isn't perfect, and I agree that we need better data. But no better data is available at this point for the tippiest top of koreans (most of whom don't even ladder, or keep their ladder ID hidden). TLPD is well respected and it's there... might as well use it.

Also none of us have any idea how much the ghost patch will affect the matchup at the highest level of play.
You can speculate all you want, but there's no data to say anything.
We'll just have to wait and see.

IgnE

United States7681 Posts

March 16 2012 10:09 GMT

#183

On March 16 2012 18:54 ThomasHobbes wrote:

On March 16 2012 18:21 IgnE wrote:

On March 16 2012 17:20 Danglars wrote:

The optimal minimum sample size is 20. Above 20, the n value does not relevantly contribute over all (n-1) to the statistics.
I'm sure you remember from your AP stats class and college.
The statistics presented in the OP are greater than 20 sample sizes and thus are relevant.

Edit 2: Maybe presenting that article wasn't such a good idea as it only confuses people more.
Let me try to explain this in easier terms to understand.
So how about this...http://en.wikipedia.org/wiki/Standard_deviation
lol

Statistics is very dependent on standard deviations which accounts for your confidence interval.
Standard deviation (SI) uses an (N-1) factor, which contributes less and less as N gets larger.
at 20 or above, N-1 is seen as negligible in mathematics terms. I actually use 20 or greater in my research and published works as well... it's quite well known.

Find it hard to believe that a guy dealing with a sample size of 50 does not calculate confidence intervals to support his results. It's easy to wave your hands at sample sizes above 20, just as it's easy for the other guy to cite small micro mistakes trending towards deviations in that % over a large number of games. We want to be sure we have a grasp on how representative these games are of his true ZvT winrate before we start spouting the % difference between him and next highest (It *could* be as high as an X difference or as low as a Y difference.)

(See NesTea's 91 games compared to DRG's 50, or comparing MMA's 26 games against someone with 50. And we're talking across patches and metagame shifts: the free advantages that one race gets as the others figure out what works against them, and vice versa on disadvantages. The more games, the less any individual patch's wave of effects, and the sometimes-corresponding metagame shift afterwards, skews the numbers.)

And I'm not disagreeing with the proposition that DRG is a VERY good player EVEN in his weakest matchup.

How is the OP going to say that his small sample sizes are fine, citing some basic statistics math, and then NOT calculate the confidence intervals for all of these samples, WHILE basing his sweeping conclusions on differences in win percentage as small as 67% compared to 64%? You are telling me that the confidence interval on a 20 game sample doesn't matter? All these bro-stats threads just fuel pointless discussions that go on for pages without actually arriving at any useful, empirically-backed conclusions.

Not to mention that most of these statistics include a majority of games from old patches where ghost play by the likes of mvp and others was not really figured out by zergs.

My conclusion is that there just aren't that many zergs in korea who are good enough to consistently beat the best terran players in korea to have a great win percentage in zvt, disregarding race. It's hard to have a good win percentage when you aren't even good enough to break into Code A. But there are a lot of Koreans who are really good that also happen to have been playing terran at the highest levels since the game's release.

Looking at actual Code A matches, this doesn't seem to be the case.

Terran have early game aggression, cheese, and allins which are significantly stronger than comparable zerg examples.

These continue to win games at a Code A and Code S level (two proxy 2-raxes as of game 3 of Code A tonight), and that's the major difference between the two races.

Zerg is fragile, Terran is safe, when Zerg makes mistakes they lose, when Terrans make mistakes they can recover.

If we look at the history of the game, Zerg have consistently been split between Code A regulars and Code S superstars. The race is unforgiving, most Zergs cannot compete on a regular basis with the more varied and safe Terran. The few who can play consistently without making any mistakes regularly win the titles because of their extraordinary personal skill, but even then, as we see, the best Zerg in the ZvT match-up is still behind the top 6 Terrans.

You do know that July won that proxy rax game right? Proxy raxes are actually pretty easy to stop unless you get complacent and aren't ready to react.

zefreak

United States2731 Posts

March 16 2012 10:12 GMT

#184

ETisME wrote:
I am sorry but you got the stats wrong.
The over 20 requirement is for cluster analysis, something that you aren't doing because you are not trying to make any clusters out from the data set.
The over 20 thing you talked about is just for normal hypothesis testing, which you aren't doing.
You need to calculate out the optimal minimal sample size based upon your confidence interval etc

in short, you need to calculate out a sample size that truly represents the population. Merely 50 games out of his entire ZvT history does not make sense

HyperionDreamer wrote:
Yep. The study cited in the OP pertains to a specific type of stats testing, called cluster analysis.

Maybe read up on it a bit before you cite it as valid, OP. You're talking about simple testing for type 1/2 statistical errors, so you would need a much larger sample size. I did a post a while ago doing rigid scientific statistical analysis on korean matchup percentages, and I think even a sample size of ~200 games rendered a ~7% difference statistically irrelevant.

http://en.wikipedia.org/wiki/Cluster_analysis

Edit: It was a sample size of 130, and an ~8% statistical difference. This was rendered statistically insignificant using standard p-level analysis. Here's the link to my analysis.

http://www.teamliquid.net/forum/viewmessage.php?topic_id=317114&currentpage=12#226

Heyoka wrote:
Yes, 20 is so-called "statistically relevant"...when there are a set number of variables. Those rules of thumb apply for very specific kinds of tests when you're sampling populations in very controlled ways. Looking at MMA's history of 26 games is nowhere near enough to be relevant because there are too many variables within a game; you need to control for opponent, map, and style of play at the very least. If you had 26 games of MMA playing DRG on Shakuras with the same openers then yes, maybe this would start to apply.

You can't just say you're testing for winrate or balance or something because you're abstracting it in multiple ways, you're several levels above what you're trying to look at.

lazyitachi wrote:
Dear god....
Random sampling = how representative the sample is of the population studied. Since you are not studying the population's average win rate, this is irrelevant. You are looking at the TOP PLAYERS' WIN RATES ACCORDING TO ELO, so the representativeness of a random sample of the population is not the issue.

Hypothesis testing:
Given the probability of DRG's ZvT is (2/3) with 36 samples.
Given the probability of MMA's TvZ is (52/69) with 69 samples.

Null hypothesis: MMA wr > DRG wr at 5% probability of Type I error

(p1 − p2) ± z * sqrt ((p1 q1)/n1 + (p2 q2)/n2 )
where n = no of games, p = win, q = loss, 1 = MMA, 2 = DRG, z = Standard score

therefore substituting in
( 75.36% - 66.67%) ± z * sqrt ( 75.36% * 24.64% / 69 + 66.67% * 33.33% / 36)

At 90% confidence level, the probability of making type I error is 20.8%
At 95% confidence level, the probability of making type I error is 24%
Hence your comparison is not statistically significant if you only tolerate 5% error

You can group the data for top tier Ts and Zs for comparison. I doubt you have enough data for any statistical significance at individual player level.

Credibility theory states the probability of each individual win rate being correct is
= 2 * z ( k * sqrt (n)) - 1
where z = Standard Score, k = probability Type I, n = number of games
We assume 10% Type I error i.e. k = 5% (divide by 2 because two-tailed test)

DRG:
= 2 * z( 5% * sqrt(36)) - 1
= 23.6%

MMA:
= 32.2%

This means that the probability of DRG and MMA's win rate being the expected win rate is only 24% and 32% assuming 10% Type I error i.e. NOT ENOUGH DATA.
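
For anyone who wants to check this kind of comparison themselves, here is a minimal Python sketch of a two-proportion z-test using the 52/69 and 24/36 figures quoted above. The helper names `normal_cdf` and `two_proportion_z` are made up for illustration, and the sketch assumes the unpooled standard error from the formula above plus a normal approximation, so treat it as a rough check rather than a definitive analysis.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_proportion_z(wins1, n1, wins2, n2):
    """Return (difference, z, two-sided p-value) for two win rates.

    Assumption: unpooled standard error sqrt(p1*q1/n1 + p2*q2/n2),
    matching the formula quoted in the post.
    """
    p1, p2 = wins1 / n1, wins2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    z = diff / se
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return diff, z, p_value

# MMA: 52 wins in 69 games; DRG: 24 wins in 36 games (2/3 of 36)
diff, z, p_value = two_proportion_z(52, 69, 24, 36)
print(f"difference = {diff:.3f}, z = {z:.2f}, two-sided p = {p_value:.2f}")
```

With these inputs the z score comes out a bit under 1, well short of the roughly 1.96 needed at the 5% level, so under this test the gap between the two win rates is not statistically significant.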

Chytilova wrote:
I don't think you realize how many variables go into these statistics that you pulled together. Statistics by themselves mean nothing. You need to get a handle on all the variables. That is why studies are done to control variables and isolate the ones you want to interpret. I don't care how large your sample size is if you ignore most of the variables. We can literally determine nothing with these statistics. Nothing at all.

Danglars wrote:
Find it hard to believe that a guy dealing with a sample size of 50 does not calculate confidence intervals to support his results. It's easy to wave your hands at sample sizes above 20, just as it's easy for the other guy to cite small micro mistakes trending towards deviations in that % over a large number of games. We want to be sure we have a grasp on how representative these games are of his true ZvT winrate before we start spouting the % difference between him and next highest (It *could* be as high as an X difference or as low as a Y difference.)

(See NesTea's 91 games compared to DRG's 50, or comparing MMA's 26 games against someone with 50. And we're talking across patches and metagame shifts: the free advantages that one race gets as the others figure out what works against them, and vice versa on disadvantages. The more games, the less any individual patch's wave of effects, and the sometimes-corresponding metagame shift afterwards, skews the numbers.)

And I'm not disagreeing with the proposition that DRG is a VERY good player EVEN in his weakest matchup.

Fubi wrote:
Way to ignore the MAIN point in that post; here let me spell it out for you:

Why are you saying that Statistics is very dependent on standard deviations when you didn't even include ANY standard deviation calculations in your analysis? Cool, it's more than 5% cuz you rounded it down, so how do you know that this number isn't within margin of error?

Neoghaleon, please respond to these critiques of your statistical methods, or at least post the above comments in your OP. The 'top comment highlights' are a bit of an echo chamber and people might actually buy the nonsense you are selling.

The way you are approaching this thread is highly disingenuous.

Fubi

2228 Posts

March 16 2012 10:13 GMT

#185

Hey OP, check out this math:

You flip a coin 20 times, then I flip the same coin 20 times. Since n = 20, by your argument, it makes my stats valid.

- I got 9 heads, 11 tails: chance of heads = 9/20 = 45%
- You got 11 heads, 9 tails: chance of heads = 11/20 = 55%
- There is a 10% difference
- Therefore, I proved that you are better at flipping heads than me.

See the problem with the math here, using exactly your method?

edit* you probably don't, so I should spell it out for you:

The difference between our flips is that, by chance, two more of your coins landed heads than mine, but because the sample size is so low (20), it led to what seems to be a big difference (10%). Until you do some calculation on the variance and confidence interval, you can't tell whether the difference is simply due to chance or due to your skill at flipping heads.
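
To put a number on the coin example, here is a small Python sketch that computes exactly how often two people flipping the same fair coin 20 times each end up at least two heads apart. The function name `prob_gap_at_least` is invented for illustration, and the sketch assumes a perfectly fair coin and independent flips.

```python
from math import comb

def prob_gap_at_least(n_flips, min_gap):
    """P(|X - Y| >= min_gap) for two independent runs of n_flips fair coin flips.

    X and Y are the head counts of the two runs; every pair of outcomes
    is enumerated and weighted by its binomial count.
    """
    total = 2 ** (2 * n_flips)  # number of equally likely joint outcomes
    favourable = 0
    for x in range(n_flips + 1):
        for y in range(n_flips + 1):
            if abs(x - y) >= min_gap:
                favourable += comb(n_flips, x) * comb(n_flips, y)
    return favourable / total

p = prob_gap_at_least(20, 2)
print(f"P(head counts differ by 2 or more) = {p:.3f}")
```

The result is a little under 64%, so a 10 percentage point gap over 20 flips each is what you should expect more often than not from chance alone.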

TurboMaN

Germany925 Posts

March 16 2012 10:19 GMT

#186

Samplesize yeah.
Also I would say that MMA has the best multitasking, so his winrate should be higher than others (subjective opinion).

Jarree

Finland1004 Posts

March 16 2012 10:19 GMT

#187

Pseudomath to prove imbalance. Great highlights also on OP.

arbitrageur

Australia1202 Posts

March 16 2012 10:24 GMT

#188

What a horrible article (I'm a statistician)

VidyaYuropa

87 Posts

March 16 2012 10:31 GMT

#189

sounds like a silent balance whine to me...

On March 16 2012 19:04 Fubi wrote:
OP's strategy is to ignore valid posts that prove him wrong, while randomly commenting on the stuff that he can, so that he can continue to believe he made a valid statistical point.

It just seems like the thread got divided in two: the ones discussing with the OP, and the ones ACTUALLY discussing the validity of the OP.

Big J

Austria16289 Posts

March 16 2012 10:32 GMT

#190

On March 16 2012 19:13 Fubi wrote:
Hey OP, check out this math:

You flip a coin 20 times, then I flip the same coin 20 times. Since n = 20, by your argument, it makes my stats valid.

- I got 9 heads, 11 tails: chance of heads = 9/20 = 45%
- You got 11 heads, 9 tails: chance of heads = 11/20 = 55%
- There is a 10% difference
- Therefore, I proved that you are better at flipping heads than me.

See the problem with the math here, using exactly your method?

edit* you probably don't, so I should spell it out for you:

The difference between our flips is that, by chance, two more of your coins landed heads than mine, but because the sample size is so low (20), it led to what seems to be a big difference (10%). Until you do some calculation on the variance and confidence interval, you can't tell whether the difference is simply due to chance or due to your skill at flipping heads.

you are right, but the sample he took matches very well to the overall TvZ winrates in Korea:
http://imgur.com/a/1aAfu

but I do agree that it is not useful to argue any kind of balance based on the OP's stats.

Lennon

United Kingdom2275 Posts

March 16 2012 10:39 GMT

#191

Yet another balance thread based on useless statistics.

ThomasHobbes

United States197 Posts

March 16 2012 10:40 GMT

#192

On March 16 2012 19:09 IgnE wrote:

On March 16 2012 18:54 ThomasHobbes wrote:

On March 16 2012 18:21 IgnE wrote:

On March 16 2012 17:20 Danglars wrote:

You do know that July won that proxy rax game right? Proxy raxes are actually pretty easy to stop unless you get complacent and aren't ready to react.

I'm watching them, so yes.

July reacted correctly, did not make a mistake, and Happy micro'd poorly and allowed his marines to be completely surrounded.

The issue isn't, though, any single allin, it's that Zerg face an inexhaustible supply of allins and early timings from Terran. It's quite easy to lose to any of these, especially if your overlord wasn't in a position to get a good scout / was sniped / is denied by marines in base.

Zerg fares well in the late-game, and the mid-game, while difficult, is pretty balanced in context of what's to come (Zerg holding off repeated pushes / drops in order to secure a late-game advantage).

It's the prevalence of cheese, even at the highest level (would Happy have gone for a proxy 2-rax if he thought it couldn't work?), that seems to be affecting the ZvT winrates. It's far too easy to get behind as Zerg, whereas Terran is just a safer race.

neoghaleon55

United States7435 Posts

March 16 2012 10:43 GMT

#193

Ugh, why do I feel like every time I bring up stats on TeamLiquid, I need to teach a whole course in statistics to satisfy the whiners? This is the reason why I didn't want to spend time explaining earlier... but here goes.

Ok here's the breakdown:

The argument: The sample size is not large enough...

We have to understand why this is a problem in the first place.
This is related to the coin flip test...which is a comparison between True theoretical probability and actual probability.
Everyone knows that in a truly balanced coin (yes I know tails land more because head is heavier, but let's assume that the coin is fully balanced) the chances of heads or tails is 50/50.

However, if you flip the coin 8 times, you might get 5 heads and 3 tails, or even 7 heads and 1 tails. The reason this happens is that the actual outcome does not approach the theoretical outcome until a very high number of samples is gathered. This is related to the question at hand:

Is the number of games played by these top koreans high enough for their theoretical skill level to show?

I answer yes: 20 coinflips or greater tends to be the magical number at which the standard deviation improves significantly enough for the gaussian distribution to be acceptable. Thus 20 games or greater is enough to probe how skilled a pro-gamer is at a single matchup, as the chances of random deviation should decrease significantly when we attain 20 games or more. All the statistics presented in the opening post have more than 20 games. We are pretty safe to say that they match up well with the players' capabilities.

So here are some pictures

Number of Tails

8 coinflips
[image: distribution plot, not preserved]

16 coinflips
[image: distribution plot, not preserved]

32 coinflips
[image: distribution plot, not preserved]

As you can see, the gaussian distribution gets "slimmer" the more coinflips there are.
This slimming down of the curve can be numerically expressed by the standard deviation. The bigger the sample size, the slimmer the standard deviation, which means the closer the actual probability approaches the theoretical probability.

The calculation for standard deviation is dependent on the inverse of sample size N...
The greater the N samples, the fewer fluctuations one is likely to see (meaning the standard deviation is smaller...which is what we want)

So why 20?
Because anything greater than 20 is great, but the total impact of N itself on the statistics decreases significantly over 20.
Call it diminishing returns.
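
As a rough illustration of how the spread depends on N, here is a short Python sketch that prints the approximate 95% margin of error for an observed 50% rate at several sample sizes. The function name `margin_of_error` is hypothetical, and it assumes the usual normal approximation for a proportion, with standard error sqrt(p(1-p)/N).

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a proportion p observed over n trials.

    Assumption: normal approximation; z=1.96 corresponds to roughly 95%.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (8, 16, 20, 32, 50, 100, 400):
    moe = margin_of_error(0.5, n)
    print(f"N = {n:3d}: +/- {100 * moe:.1f} percentage points")
```

Under these assumptions the margin at N = 20 is still about 22 percentage points either way, and it halves each time N is quadrupled.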

Those of you who are telling me to go calculate the confidence interval have no idea what you're talking about.

KAmaKAsa

Finland210 Posts

March 16 2012 10:44 GMT

#194

First off, this is player vs player, and having a good record in a tournament doesn't mean that x race is imba etc. That player might have a very unusual style that works well against the current metagame (e.g. Stephano) and have a good winrate because of that.

Some of those wins might be all-ins or just the other player playing worse, and when it comes to MMA, ofc that winrate is going to drop drastically when he actually gets to play some zergs in the GSL.

The 60 %+ will stay around that for the very best players

EmilA

Denmark4618 Posts

March 16 2012 10:46 GMT

#195

When 60 damage siege tanks on steppes of war games are put in the same data as 35 damage siege tanks(+15 considerable nerfs and several considerable buffs to zerg) on maps 4x the size.

kAelle_sc

287 Posts

March 16 2012 10:46 GMT

#196

Wait till Stephano returns to Korea (he said he'll return around April) and turns this around with his 92% win rate when he goes live on the Korean ladder. Just wait, haters. Wait. And see. And be convinced. And believe.

how

United States538 Posts

March 16 2012 10:49 GMT

#197

On March 16 2012 11:20 Liquid`Jinro wrote:
How the fuck is 64% not good?

Jinro ftw! ♥

taitanik

Latvia231 Posts

March 16 2012 10:49 GMT

#198

MVP dominated everyone and he has 67% vs Z, like DRG vs T; only DRG didn't dominate for so long

Fubi

2228 Posts

March 16 2012 10:52 GMT

#199

On March 16 2012 19:43 neoghaleon55 wrote:
Ugh, why do I feel like every time I bring up stats on TeamLiquid, I need to teach a whole course in statistics to satisfy the whiners? This is the reason why I didn't want to spend time explaining earlier... but here goes.

Ok here's the breakdown:

The argument: The sample size is not large enough...

We have to understand why this is a problem in the first place.
This is related to the coin flip test...which is a comparison between True theoretical probability and actual probability.
Everyone knows that in a truly balanced coin (yes I know tails land more because head is heavier, but let's assume that the coin is fully balanced) the chances of heads or tails is 50/50.

However, if you flip the coin 8 times, you might get 5 heads and 3 tails, or even 7 heads and 1 tails. The reason this happens is that the actual outcome does not approach the theoretical outcome until a very high number of samples is gathered. This is related to the question at hand:

Is the number of games played by these top koreans high enough for their theoretical skill level to show?

I answer yes: 20 coinflips or greater tends to be the magical number at which the standard deviation improves significantly enough for the gaussian distribution to be acceptable.

So here are some pictures

8 coinflips
[image: distribution plot, not preserved]

16 coinflips
[image: distribution plot, not preserved]

32 coinflips
[image: distribution plot, not preserved]

As you can see, the gaussian distribution gets "slimmer" the more coinflips there are.
This slimming down of the curve can be numerically expressed by the standard deviation. The bigger the sample size, the slimmer the standard deviation, which means the closer the actual probability approaches the theoretical probability.

The calculation for standard deviation is dependent on the inverse of sample size N...
The greater the N samples, the fewer fluctuations one is likely to see (meaning the standard deviation is smaller...which is what we want)

Those of you who are telling me to go calculate the confidence interval have no idea what you're talking about.

Ok, you need to go back and read the section of the book on confidence intervals and variance.

I'm NOT saying your sample size is NOT enough.

I'm saying this: REGARDLESS of how BIG your sample size is, even if you measure 90% of the total games played in the entire SC2 history and you find that there is a difference (say even as big as 20%), you STILL have to prove, with math/stats, that this difference isn't simply due to random chance, because no matter how many samples you take, there will STILL be a chance that the difference is purely due to randomness.

You can NOT simply say "I FEEL like x% is large enough to show that there is a difference".

I'm not trying to be a dick or anything, but you're making yourself look REALLY bad to the people that actually understand your stats. Thank god they don't let people work in any serious jobs simply from one year of college education.
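
For what a confidence interval on a single win rate looks like in practice, here is a minimal Python sketch using the Wilson score interval. The function name `wilson_interval` is made up for illustration, the 24 wins in 36 games are the 2/3-over-36 figure quoted earlier in the thread, and the Wilson method is just one common choice, not necessarily the one anyone here had in mind.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a win rate; z=1.96 gives roughly 95% coverage."""
    p = wins / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# Example input: 24 wins in 36 games (a 66.7% observed win rate)
low, high = wilson_interval(24, 36)
print(f"24/36 wins: 95% interval roughly {low:.2f} to {high:.2f}")
```

For 24 wins in 36 games the interval runs from about 50% to about 80%, which is far wider than the few percentage points separating the players being compared.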

neoghaleon55

United States7435 Posts

March 16 2012 11:01 GMT

#200

On March 16 2012 19:52 Fubi wrote:
You STILL have to prove, with math/stats, that this difference isn't simply due to random chance, because no matter how many samples you take, there will STILL be a chance that the difference is purely due to randomness.

You can NOT simply say "I FEEL like x% is large enough to show that there is a difference".

I'm not trying to be a dick or anything, but you're making yourself look REALLY bad to the people that actually understand your stats. Thank god they don't let people work in any serious jobs simply from one year of college education.

I'm not trying to be a dick or anything

You're going to have to try harder than that.

No, I don't need to prove anything about "difference due to random chance"

Are there more things contributing to a win/loss other than balance and skill?
absolutely!
There's also luck, but it's not easy to quantify luck.
And if you're so caught up with confidence, why don't you do it?
