|
The bottom line here is that, despite what Day[9] said in his video analysis of Bisu's recent ZvP loss to Shine, the recent higher rate of Zerg wins in the matchup is not a coincidence. It cannot be attributed to random factors. What this means is that there's something going on that's changing the "natural" ZvP win rate.
I don't have the kind of expert qualifications necessary to undertake match analysis of recent games to try to figure out what the problem is, but I would point out one thing: I don't think it's the maps.
Consider: if we were to expect that particular maps are making a difference, we would probably see that the newer maps are where the zergs are piling up wins, and that the older maps have lower Zerg win rates. That's just not what we observe. Destination is a relatively old map - it dates back prior to the surge in Zerg wins. We would expect a lowish Zerg win rate - instead, we see Zerg winning at a 60% clip. Heartbreak Ridge is a new map, we should expect a high zerg win rate - instead we see only 58%, lower than the mean. God's Garden is a new map - only a 56% win rate. Outsider is 60% for Zerg, but it's the exception, not the rule. We can see the same thing happening in maps with new versions. Medusa has a higher Zerg win rate than Neo Medusa over the period we're studying. Byzantium 2 has a higher Zerg win rate than Byzantium 3. It seems to me that you could make a strong case that the win rates on the new maps aren't all that much different from the win rates on the old maps.
|
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
you didnt bold newb!
btw i took stats like a year ago and i'm terrible at math but i'm pretty sure you're supposed to have a control group in order to prove anything and that's pretty much impossible because it's impossible to control / keep external factors constant so all this math seems kind of useless.
|
On October 21 2009 05:46 Matrijs wrote: The bottom line here is that, despite what Day[9] said in his video analysis of Bisu's recent ZvP loss to Shine, the recent higher rate of Zerg wins in the matchup is not a coincidence. It cannot be attributed to random factors. What this means is that there's something going on that's changing the "natural" ZvP win rate.
I don't have the kind of expert qualifications necessary to undertake match analysis of recent games to try to figure out what the problem is, but I would point out one thing: I don't think it's the maps.
Consider: if we were to expect that particular maps are making a difference, we would probably see that the newer maps are where the zergs are piling up wins, and that the older maps have lower Zerg win rates. That's just not what we observe. Destination is a relatively old map - it dates back prior to the surge in Zerg wins. We would expect a lowish Zerg win rate - instead, we see Zerg winning at a 60% clip. Heartbreak Ridge is a new map, we should expect a high zerg win rate - instead we see only 58%, lower than the mean. God's Garden is a new map - only a 56% win rate. Outsider is 60% for Zerg, but it's the exception, not the rule. We can see the same thing happening in maps with new versions. Medusa has a higher Zerg win rate than Neo Medusa over the period we're studying. Byzantium 2 has a higher Zerg win rate than Byzantium 3. It seems to me that you could make a strong case that the win rates on the new maps aren't all that much different from the win rates on the old maps.
u forget one thing: that it might take time until a race figures out how to abuse the maps to win. maybe the maps had what it needs to be imba in zvp, but it was so subtle that the zergs needed several months to figure it out. for example hbr: first it was good for protoss. then came lurker contain and it was relatively balanced. then came the abuse of the excess gas for muta snipes which make mass hydra roll any protoss army. hbr turned into a protoss graveyard. the map hasnt changed, it had the potential to be a protoss graveyard since the very beginning, zergs just didnt know.
on the other hand, maybe the maps do allow for a good protoss counter to 5hatch hydra with muta snipe, and the tosses just dont know yet.
|
Holy cow so much talk and math to prove something as obvious as Z>P? Not really needed imo xd
|
motbob
United States12546 Posts
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
Take it up with the programmers of Excel, not me. As you can see below, I'm asking Excel to give me the standard deviation of the dataset, and it's giving me ~0.49
|
On October 21 2009 06:12 Black Gun wrote:
On October 21 2009 05:46 Matrijs wrote: The bottom line here is that, despite what Day[9] said in his video analysis of Bisu's recent ZvP loss to Shine, the recent higher rate of Zerg wins in the matchup is not a coincidence. It cannot be attributed to random factors. What this means is that there's something going on that's changing the "natural" ZvP win rate.
I don't have the kind of expert qualifications necessary to undertake match analysis of recent games to try to figure out what the problem is, but I would point out one thing: I don't think it's the maps.
Consider: if we were to expect that particular maps are making a difference, we would probably see that the newer maps are where the zergs are piling up wins, and that the older maps have lower Zerg win rates. That's just not what we observe. Destination is a relatively old map - it dates back prior to the surge in Zerg wins. We would expect a lowish Zerg win rate - instead, we see Zerg winning at a 60% clip. Heartbreak Ridge is a new map, we should expect a high zerg win rate - instead we see only 58%, lower than the mean. God's Garden is a new map - only a 56% win rate. Outsider is 60% for Zerg, but it's the exception, not the rule. We can see the same thing happening in maps with new versions. Medusa has a higher Zerg win rate than Neo Medusa over the period we're studying. Byzantium 2 has a higher Zerg win rate than Byzantium 3. It seems to me that you could make a strong case that the win rates on the new maps aren't all that much different from the win rates on the old maps.
u forget one thing: that it might take time until a race figures out how to abuse the maps to win. maybe the maps had what it needs to be imba in zvp, but it was so subtle that the zergs needed several months to figure it out. for example hbr: first it was good for protoss. then came lurker contain and it was relatively balanced. then came the abuse of the excess gas for muta snipes which make mass hydra roll any protoss army. hbr turned into a protoss graveyard. the map hasnt changed, it had the potential to be a protoss graveyard since the very beginning, zergs just didnt know.
on the other hand, maybe the maps do allow for a good protoss counter to 5hatch hydra with muta snipe, and the tosses just dont know yet.
My argument still holds. If the current higher rate is attributable to maps, it has to be the result of new maps influencing the overall win rate, which just doesn't seem to be happening. The win rate has increased on old maps, too, which implicates some other factor.
|
On October 21 2009 06:13 motbob wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
Take it up with the programmers of Excel, not me. As you can see below, I'm asking Excel to give me the standard deviation of the dataset, and it's giving me ~0.49
use a small dataset: 10 data points, 8 times a "1", 2 times a "0". the percentage is 0.8. look what excel tells u about the sd.
|
motbob
On October 21 2009 06:09 mahnini wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
you didnt bold newb!
btw i took stats like a year ago and i'm terrible at math but i'm pretty sure you're supposed to have a control group in order to prove anything and that's pretty much impossible because it's impossible to control / keep external factors constant so all this math seems kind of useless.
Nah, you're thinking of controlled experiments. This is just data analysis.
|
motbob
On October 21 2009 06:14 Black Gun wrote:
On October 21 2009 06:13 motbob wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
Take it up with the programmers of Excel, not me. As you can see below, I'm asking Excel to give me the standard deviation of the dataset, and it's giving me ~0.49
use a small dataset: 10 data points, 8 times a "1", 2 times a "0". the percentage is 0.8. look what excel tells u about the sd.
It gives an SD of 0.421637021... but I don't see why that's relevant.
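For anyone who wants to reproduce the 10-point example without Excel, here is a minimal Python sketch. It assumes the Excel function in question was the sample standard deviation (the n - 1 version, as in Excel's STDEV), which is consistent with the 0.4216... figure quoted above. The variable names are illustrative only.

```python
import math
import statistics

# The 10-point dataset from the post above: eight wins (1) and two losses (0).
data = [1] * 8 + [0] * 2

p = statistics.mean(data)                 # observed win fraction
sample_sd = statistics.stdev(data)        # divides by n - 1 (sample SD)
population_sd = statistics.pstdev(data)   # divides by n (population SD)
bernoulli_sd = math.sqrt(p * (1 - p))     # closed form for 0/1 data

print(p, sample_sd, population_sd, bernoulli_sd)
```

The point of the example seems to be that the SD of 0/1 data is tied to the win fraction itself: the population SD and sqrt(p(1-p)) coincide, and the sample SD differs only by the n - 1 correction. It describes the spread of a single game, not the uncertainty in the overall win rate.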
|
On October 21 2009 06:14 motbob wrote:
On October 21 2009 06:09 mahnini wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
you didnt bold newb!
btw i took stats like a year ago and i'm terrible at math but i'm pretty sure you're supposed to have a control group in order to prove anything and that's pretty much impossible because it's impossible to control / keep external factors constant so all this math seems kind of useless.
Nah, you're thinking of controlled experiments. This is just data analysis.
yeah but all of your data doesnt really prove a point because it's not taking into account factors other than win/lose so there's really no point.
|
motbob
On October 21 2009 06:18 mahnini wrote:
On October 21 2009 06:14 motbob wrote:
On October 21 2009 06:09 mahnini wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
you didnt bold newb!
btw i took stats like a year ago and i'm terrible at math but i'm pretty sure you're supposed to have a control group in order to prove anything and that's pretty much impossible because it's impossible to control / keep external factors constant so all this math seems kind of useless.
Nah, you're thinking of controlled experiments. This is just data analysis.
yeah but all of your data doesnt really prove a point because it's not taking into account factors other than win/lose so there's really no point.
I'm just trying to show that it's not a coincidence that zergs have been winning. It's not random chance. There's an "external factor," as you put it.
|
On October 21 2009 06:21 motbob wrote:
On October 21 2009 06:18 mahnini wrote:
On October 21 2009 06:14 motbob wrote:
On October 21 2009 06:09 mahnini wrote:
On October 21 2009 05:46 heyoka wrote: YOUR STANDARD DEVIATION CANNOT BE 49%
THAT IS NOT YOUR VARIATION
YOU ARE LOOKING AT ONE DATA POINT
THERE IS NO OTHER WAY I CAN SAY THIS SIMPLY, UNLESS I DO IT IN CAPS PLUS BOLD
you didnt bold newb!
btw i took stats like a year ago and i'm terrible at math but i'm pretty sure you're supposed to have a control group in order to prove anything and that's pretty much impossible because it's impossible to control / keep external factors constant so all this math seems kind of useless.
Nah, you're thinking of controlled experiments. This is just data analysis.
yeah but all of your data doesnt really prove a point because it's not taking into account factors other than win/lose so there's really no point.
I'm just trying to show that it's not a coincidence that zergs have been winning. It's not random chance. There's an "external factor," as you put it.
a large part of that is probably maps so why is everyone flopping their math-dicks around.
also the largest external factor SKILL LOL SO EZ GTFO PROTOSS NEWBS
|
When protoss was doing good against zerg i had a good win rate against toss, probably my best match up easy.
Now zerg is "dominating" yet im terrible in that matchup now even with the newer builds, ffs i made c with a 29% win rate against toss FML
|
Okay guys, the calculation that motbob did and the one Black Gun did are the same, and correct.
59% over 885 games vs an expected 50% (or an expected 55%) is statistically significant. That should be common sense. Think about flipping a coin 885 times and getting heads almost 6/10 times.
motbob, you just did a bad job in explaining/justifying your process. .49 is the standard deviation of ONE zvp game; .49/sqrt(885) is the standard deviation of 885 zvp games. That is what jwd/heyoka were trying to say.
Also, pointing to a big column of excel data for your evidence is somewhat unnecessary, and kind of undermined your credibility as someone with a good grasp of stat. The standard deviation of a single bernoulli event is sqrt(p(1-p)), which is the same as what that excel calculation was doing.
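The calculation described in this post can be sketched in a few lines of Python. This is only an illustration: it treats the observed win rate as exactly 0.59 (the thread gives roughly 59% without an exact win count), and the helper name `z_score` is made up for this example.

```python
import math

n = 885          # ZvP games in the sample, from the thread
p_hat = 0.59     # observed Zerg win rate (assumed exact; the thread only gives ~59%)

def z_score(p_hat, p0, n):
    """z-statistic for an observed proportion against a null win rate p0."""
    sd_one_game = math.sqrt(p0 * (1 - p0))      # SD of a single game under the null
    sd_of_ratio = sd_one_game / math.sqrt(n)    # SD of the win ratio over n games
    return (p_hat - p0) / sd_of_ratio

for p0 in (0.50, 0.55):
    z = z_score(p_hat, p0, n)
    # one-sided p-value from the standard normal upper tail
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    print(f"null {p0:.2f}: z = {z:.2f}, one-sided p = {p_value:.4g}")
```

Under these assumptions both nulls give a z-score above the usual 1.96 cutoff, which matches the conclusion stated above. The 50% null is rejected far more strongly than the 55% one.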
|
motbob
On October 21 2009 06:58 Gustav_Wind wrote: The standard deviation for a single event that has 55% probability is in fact 0.49. That is obtained by the simple calculation sqrt(p(1-p)). To get the standard deviation that we want to use in calculating z-score, divide that value by the square root of the sample size. so .49/sqrt(885).
It's more correct to call this the standard error...
http://en.wikipedia.org/wiki/Standard_error_(statistics)
http://en.wikipedia.org/wiki/Standard_deviation
There's a subtle but important difference. Calling both of these things the standard deviation would be really confusing. So most statisticians call the SD of the sampling distribution (which is SD_pop/sqrt(n)) the "standard error" in order to reduce that confusion.
Also, pointing to a big column of excel data for your evidence is somewhat unnecessary, and kind of undermined your credibility as someone with a good grasp of stat. The standard deviation of a single bernoulli event is sqrt(p(1-p)), which is the same as what that excel calculation was doing.
*shrug* I'm used to working w/ excel spreadsheets w/ data sets that aren't just filled with binary data. So it's second nature for me to just draw up a data set and use the Excel command. It only took 30 seconds to create the data set of 1's and 0's. Note that I didn't just use Excel to get the stdev... Excel also makes it really easy to take all the variables and do the z-test itself.
|
On October 21 2009 07:07 motbob wrote:
On October 21 2009 06:58 Gustav_Wind wrote: The standard deviation for a single event that has 55% probability is in fact 0.49. That is obtained by the simple calculation sqrt(p(1-p)). To get the standard deviation that we want to use in calculating z-score, divide that value by the square root of the sample size. so .49/sqrt(885).
It's more correct to call this the standard error...
http://en.wikipedia.org/wiki/Standard_error_(statistics)
http://en.wikipedia.org/wiki/Standard_deviation
There's a subtle but important difference. Calling both of these things the standard deviation would be really confusing. So most statisticians call the SD of the sampling distribution (which is SD_pop/sqrt(n)) the "standard error" in order to reduce that confusion.
Also, pointing to a big column of excel data for your evidence is somewhat unnecessary, and kind of undermined your credibility as someone with a good grasp of stat. The standard deviation of a single bernoulli event is sqrt(p(1-p)), which is the same as what that excel calculation was doing.
*shrug* I'm used to working w/ excel spreadsheets w/ data sets that aren't just filled with binary data. So it's second nature for me to just draw up a data set and use the Excel command. It only took 30 seconds to create the data set of 1's and 0's. Note that I didn't just use Excel to get the stdev... Excel also makes it really easy to take all the variables and do the z-test itself.
As I understand it, standard error is used as an estimate of the standard deviation of the true population, right?
But we are assuming that p = .55 in our null hypothesis test, aren't we? So isn't it fine to use the term standard deviation since we can derive that from our assumption?
And .49 and .49/sqrt(885) are both standard deviations. .49 is the standard deviation of the variable (one zvp game), whereas .49/sqrt(885) is the standard deviation of the variable (number of zvp wins in 885 games/885), or in other words, the ratio of zvp wins in 885 games.
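One way to see the difference between the two figures is a quick simulation. The sketch below assumes the 55% null from the thread and treats games as independent coin flips, which is a simplification. The number of simulated samples and the function name `win_ratio` are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)   # fixed seed so the run is repeatable

p = 0.55         # assumed "natural" Zerg win rate (the null hypothesis in the thread)
n = 885          # games per simulated sample
trials = 2000    # number of simulated 885-game samples (arbitrary choice)

def win_ratio(n, p):
    """Simulate n independent games and return the fraction Zerg wins."""
    wins = sum(1 for _ in range(n) if random.random() < p)
    return wins / n

ratios = [win_ratio(n, p) for _ in range(trials)]

sd_one_game = math.sqrt(p * (1 - p))             # the ~.49 figure from the thread (0.4975)
sd_ratio_theory = sd_one_game / math.sqrt(n)     # SD of the win ratio over n games
sd_ratio_simulated = statistics.pstdev(ratios)   # spread actually seen across samples

print(sd_one_game, sd_ratio_theory, sd_ratio_simulated)
```

If the independence assumption is reasonable, the simulated spread of the win ratio should land close to .49/sqrt(885) and nowhere near .49, which is the distinction being drawn here.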
|
On October 21 2009 06:58 Gustav_Wind wrote: Okay guys, the calculation that motbob did and the one Black Gun did are the same, and correct.
59% over 885 games vs an expected 50% (or an expected 55%) is statistically significant. That should be common sense. Think about flipping a coin 885 times and getting heads almost 6/10 times.
motbob, you just did a bad job in explaining/justifying your process. .49 is the standard deviation of ONE zvp game; .49/sqrt(885) is the standard deviation of 885 zvp games. That is what jwd/heyoka were trying to say.
Also, pointing to a big column of excel data for your evidence is somewhat unnecessary, and kind of undermined your credibility as someone with a good grasp of stat. The standard deviation of a single bernoulli event is sqrt(p(1-p)), which is the same as what that excel calculation was doing.
the figures are very close to each other and our tests came to the same conclusion, but still they were not the same. in particular, the correct test in our case here does NOT require standard errors, ie does not involve estimated standard deviations. the base distribution is bernoulli/binomial/scaled binomial, whatever, but it is not normal. in the distributions we are using here, the parameter of interest (the success probability) also determines the sd of the null-distribution, therefore it does not have to be estimated in order to compute our test statistic. we do not need standard errors here. 
(when the distribution of the data itself is normal, the sd is a nuisance parameter which is independent from the parameter of interest. in particular, this means that a null-hypothesis about the mean, the parameter of interest, does not give info about the sd, so if the sd of the null-distribution is not known beforehand we must plug in the standard error, ie the estimated sd. this increases the uncertainty and this increased uncertainty must be addressed by using the t- instead of the normal-distribution.)
but lets finish the stat discussions and continue with whining about how hard pvz is. *gg*
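For reference, a test that works directly with the binomial distribution, as described in this post, might look like the sketch below. The win count of 522 is an assumption (59% of 885 rounded to whole games; the thread does not give the exact count), and `log_binom_pmf` is a helper written for this example. The tail sum is computed in floating point.

```python
import math

n = 885
p0 = 0.55
wins = 522   # assumption: 59% of 885 rounded to a whole number of games

def log_binom_pmf(k, n, p):
    """log of P(X = k) for X ~ Binomial(n, p), via lgamma to avoid overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)

# One-sided binomial p-value: chance of at least `wins` Zerg wins under the null.
p_exact = sum(math.exp(log_binom_pmf(k, n, p0)) for k in range(wins, n + 1))

# Normal approximation for comparison, using the SD implied by the null.
sd_ratio = math.sqrt(p0 * (1 - p0) / n)
z = (wins / n - p0) / sd_ratio
p_normal = 0.5 * math.erfc(z / math.sqrt(2))

print(p_exact, p_normal)
```

No estimated standard deviation appears anywhere in the binomial version: the null value p0 fixes the whole distribution. With a sample this large the two p-values should come out close, which fits the remark that the figures are very close but not identical.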
|
motbob
On October 21 2009 07:55 Gustav_Wind wrote:
On October 21 2009 07:07 motbob wrote:
On October 21 2009 06:58 Gustav_Wind wrote: The standard deviation for a single event that has 55% probability is in fact 0.49. That is obtained by the simple calculation sqrt(p(1-p)). To get the standard deviation that we want to use in calculating z-score, divide that value by the square root of the sample size. so .49/sqrt(885).
It's more correct to call this the standard error...
http://en.wikipedia.org/wiki/Standard_error_(statistics)
http://en.wikipedia.org/wiki/Standard_deviation
There's a subtle but important difference. Calling both of these things the standard deviation would be really confusing. So most statisticians call the SD of the sampling distribution (which is SD_pop/sqrt(n)) the "standard error" in order to reduce that confusion.
Also, pointing to a big column of excel data for your evidence is somewhat unnecessary, and kind of undermined your credibility as someone with a good grasp of stat. The standard deviation of a single bernoulli event is sqrt(p(1-p)), which is the same as what that excel calculation was doing.
*shrug* I'm used to working w/ excel spreadsheets w/ data sets that aren't just filled with binary data. So it's second nature for me to just draw up a data set and use the Excel command. It only took 30 seconds to create the data set of 1's and 0's. Note that I didn't just use Excel to get the stdev... Excel also makes it really easy to take all the variables and do the z-test itself.
As I understand it, standard error is used as an estimate of the standard deviation of the true population, right?
But we are assuming that p = .55 in our null hypothesis test, aren't we? So isn't it fine to use the term standard deviation since we can derive that from our assumption?
And .49 and .49/sqrt(885) are both standard deviations. .49 is the standard deviation of the variable (one zvp game), whereas .49/sqrt(885) is the standard deviation of the variable (number of zvp wins in 885 games/885), or in other words, the ratio of zvp wins in 885 games.
For your second point, yes, they're both standard deviations. But it's less confusing if we call the SD/sqrt(n) figure the standard error.
I'm less sure about your first point. I was always taught to use bootstrapping from the existing data to get the SD, not to get the SD from the null hypothesis. I'll try to figure out which method is correct when I'm free in 2 hours.
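To see how much the choice matters in this case, the two options can be put side by side. This is a rough sketch, not a resolution of the question: the "from the data" version uses the plug-in estimate sqrt(p_hat(1-p_hat)/n) as a stand-in and does no actual bootstrap resampling, and the observed rate is assumed to be exactly 0.59.

```python
import math

n = 885
p0 = 0.55       # null-hypothesis win rate
p_hat = 0.59    # observed win rate (assumed exact)

se_null = math.sqrt(p0 * (1 - p0) / n)          # SD taken from the null hypothesis
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)  # SD estimated from the observed data

z_null = (p_hat - p0) / se_null
z_sample = (p_hat - p0) / se_sample

print(se_null, se_sample, z_null, z_sample)
```

With these numbers the two standard errors differ only slightly, so either choice leads to the same verdict here. The difference would matter more for win rates far from 50% or for much smaller samples.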
|
One thing i dont understand is why this is such a huge deal. It seems every race *at some point* goes through this. I think a large part of certain race dominance is the players. Ok, sure maps will definitely tilt the favor even more. But saying the race is better, by itself, is not correct. All yer math stuff hurts my simple brain.
|
I think this thread needs its own FAQ by now.