|
On May 02 2011 23:59 Silent331 wrote: When I saw the first graph I thought "good job Blizzard, you may be on track", then I saw the Korea graph and tried to see why it was that way, and this is what I came up with.
Korean protoss generally stick to their guns, in the sense that they will continue to do a BO until it starts losing, and once it starts losing they will start doing a BO that counters the counter. That explains the constant back and forth of the protoss charts: as you can see, after the drop in protoss wins they come back to a >50% win rate very soon after. This is simply development in the game.
Now in both charts Z has never had a good ratio vs T. Oddly though, the 2 charts are inverses of each other: when the international zergs do worse, the Korean zergs do better. Personally I believe that zerg generally has a low win rate vs T because T has a lot of things they can do which are not even close to all-ins but can result in an insta-loss for the zerg. Also, terran is simply much more forgiving in the endgame than zerg; if zerg loses 1 engagement late game they just lose the game.
my thoughts, take what you wish
Really, Terran vs Zerg lategame is so much easier for zerg than it is for terran. Infestor, broodlord + lings (or any other unit really) with a tech switch into ultralisks is pretty much unstoppable for a terran lategame.
Against broodlords you lose your tanks, against infestors you lose your vikings, and if you add ghosts to your army to EMP the infestors you will then die to the ultralisk tech switch.
I have yet to see a game where a terran beats this late game strat of zerg, and right now it really seems impossible.
Saying something like T>Z lategame is simply wrong. It might be true early game, but late game it's the exact opposite.
|
On May 03 2011 18:27 Primadog wrote: Any chance of outputting graphs with confidence intervals? I am concerned that some of the "trends" we see in the graph are simply random fluctuations due to some months having smaller sets of data.
On May 03 2011 19:41 Elean wrote: Assuming all the matches are independent events, the standard deviation for a given matchup is sqrt(p(1-p)/N), where p≈0.5 is the win rate of a race and N the number of games played. Let's take the PvZ matchup as an example. There are 2244 games over 6 months; 374 games/month gives a standard deviation of 2.5%. Taking the error as twice the standard deviation, your confidence interval is +/- 5%. Conclusion: the fluctuations we observe for the PvZ matchup can very well be due to the sample size. For ZvT, however, the sample size is large enough to say the matchup was unbalanced.
On May 03 2011 20:59 Ctuchik wrote: I would love to do this, trying to figure out how now. =P
On May 03 2011 21:18 Primadog wrote: Ideally you can make a graph something like the second graph on a stock screener: ![[image loading]](http://imgur.com/qLi1l.gif) A 95% confidence interval (2 standard deviations) is the standard that most use. So the range you will use is the mean +/- 2 * standard deviation (μ +/- 2σ), where σ = (P * (1-P)/n)^.5. Note that n will be the number of data points you have per month, not the total data points overall.
On May 03 2011 23:53 Ctuchik wrote: Well, reading up a bit on it, it doesn't seem like this would be all that useful. Please correct me if I'm wrong.
*The data behind this is not a sample of a bigger set. It's all matches played during this time period, and I'm not measuring it against a larger population.
*The actual data behind it is binary, i.e., a match is either a 1 or a 0, win or loss.
*The sample size for each month is over 1000 games. See ![[image loading]](http://imgur.com/k49GV.png) .
Please correct me if I'm wrong here!
1.) The data is a sample of a bigger set. You have collected data on all games played, but not all possible games played -- all games played is the sample distribution, and all possible games would be the population.
2.) The data comes from a dichotomous variable, but what you're plotting is ultimately a proportion of the outcomes of that variable, not the variable itself.
3.) It doesn't matter if the sample size is 10,000 if the standard error is something like 1700.
I'm not sure if it would be more appropriate to have error bars of standard error (stdev/SQRT(n)) or standard deviation, since the error is what we are really worried about here (I'm not talking about mistakes, but statistical error).
Edit: nevermind, I misread.
|
On May 04 2011 00:19 Pelican wrote: 1.) The data is a sample of a bigger set. You have collected data on all games played, but not all possible games played -- all games played is the sample distribution, and all possible games would be the population. 2.) The data comes from a dichotomous variable, but what you're plotting is ultimately a proportion of the outcomes of that variable, not the variable itself. 3.) It doesn't matter if the sample size is 10,000 if the standard error is something like 1700. Edit: I'm not sure if it would be more appropriate to have error bars of standard error (stdev/SQRT(n)) or standard deviation, since the error is what we are really worried about here (I'm not talking about mistakes, but statistical error).
Could you give me an example of this (for one data point for example)? What numbers would you need? How would the expression look?
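For one data point, the expression quoted earlier in the thread can be sketched in a few lines of Python. This is only an illustration of the normal-approximation formula sqrt(p(1-p)/N) with a range of two standard deviations; the function name `win_rate_interval` and the 187-187 record are assumptions chosen to match Elean's "374 games/month at about 50%" example, not numbers taken from the spreadsheet.

```python
import math

def win_rate_interval(wins, games, z=2.0):
    """Return (rate, standard_error, low, high) for an observed win rate.

    Uses the normal approximation quoted in the thread:
    sigma = sqrt(p * (1 - p) / n), interval = p +/- z * sigma.
    """
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    se = math.sqrt(p * (1 - p) / games)
    # Clamp to [0, 1] because a win rate cannot leave that range.
    return p, se, max(0.0, p - z * se), min(1.0, p + z * se)

# Hypothetical month: 374 PvZ games split evenly (assumed record).
p, se, low, high = win_rate_interval(wins=187, games=374)
print(f"rate={p:.3f} se={se:.3f} interval=({low:.3f}, {high:.3f})")
# se comes out at about 0.026, so the interval is roughly 50% +/- 5 points
```

The only numbers needed per data point are the wins and the total games for that matchup in that month.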
|
On May 03 2011 23:53 Ctuchik wrote: Well, reading up a bit on it it doesn't seem like this would be all that useful. Please correct me if I'm wrong. *The data behind this is not a sample of a bigger set. It's all matches played during this time period, and I'm not measuring it against a larger population. *The actual data behind it is binary, ie, a match is either a 1 or a 0, win or loss. *The sample size for each month is over 1000 games. See ![[image loading]](http://imgur.com/k49GV.png) . Please correct me if I'm wrong here!
You are in the right here.
This is the same as taking all the win/loss records of your favorite sporting team over the last decade and plotting them, or plotting the performance of your company's stock over the last year. In that case, why would you ever use something like confidence intervals?
You aren't making predictions, there is no need for any further analysis.
1.) The data is a sample of a bigger set. You have collected data on all games played, but not all possible games played -- all games played is the sample distribution, and all possible games would be the population. 2.) The data comes from a dichotomous variable, but what you're plotting is ultimately a proportion of the outcomes of that variable, not the variable itself. 3.) It doesn't matter if the sample size is 10,000 if the standard error is something like 1700.
The data used is deemed worthy of being collected by TLPD. He is just graphing the results available there; there is no need for confidence intervals. The data is not dichotomous given that he is using ALL data from the database he said he was using and not just what he himself deems worthy of being used.
He says he uses TLPD as his only source. If you have problems with the data he is using, then your qualms are with him using only TLPD. If he were only using specific data from TLPD then there would be some room to argue, but given that he is showing no bias and graphing everything available there, it doesn't need any further analysis. It is just pure recorded historical data.
All he has done is take TLPD and graph it. Nothing else needs to be added.
|
After getting flamed every game by Zergs, I can only say "LOL" watching these graphs. Btw I posted the same statistics in the SotG thread, but some Zergs there insisted these statistics mean nothing.
|
On May 04 2011 00:35 Ctuchik wrote: Could you give me an example of this (for one data point for example)? What numbers would you need? How would the expression look?
I actually completely misread the graph. You're right -- error bars aren't really appropriate for this kind of data. If anyone is skeptical about the significance of the differences at each point, it's possible to do a significance test for each month, but there isn't really a way to get error bars in the graph.
|
I feel like Zerg players whine too much. Look at the results in Korea. Protoss players should be crying ... =(
|
On May 03 2011 00:26 fant0m wrote: The issue with Korean Toss is that they got complacent. So many lose games early due to over-teching or over-expanding. The balance is too close for you to do that when your opponent is about to hit with a timing attack and still win.
That skews the results so much because there are so few Korean Toss. So 2 of them do badly for 4 games total (San, MC)... and the whole representation gets thrown off massively.
Toss just isn't strong enough any more to ignore what the other race is doing and still win.
On May 03 2011 00:30 garlicface wrote: I don't even understand why they would get complacent in the first place. PvT was never that strong for them, was it? Sure, it looked like they were dominating everything with MC leading the charge, but I can't believe how they all just seem so unprepared now. Almost all of the Ps in GSL have been knocked out in PvT. I just can't wrap my head around it.
On May 03 2011 01:25 Jayrod wrote: It's simple really. Terrans went back to what was working for them a few months ago against protoss: winning by unit efficiency and eliminating expansions. I think the HT nerf is often understated with regard to the matchup, and sometimes it's hard to notice the effect that it has. What we are seeing now is protoss defending drops by warping in chargelots and stuff, which is great... it gets the units off the ground, BUT it doesn't get the medivac out of the air, so they can keep you pinned all game. If you leave the units there to defend further drops, your mid-size army at the front is too weak to take on a mid-size bio force. I'm not saying it's imba or anything, I'm just saying the HT nerf had a bigger impact than people think, since HTs aren't seen all that often. The impact is in mid-late game drop defense. Protoss has the worst drop defense of all the races, not because the units can't defend the drop, but because of the impact that breaking off chunks of your main army has on it. Cannons are actually terrible against terran drops, and though some people might argue that cannons are very strong, protoss doesn't have abilities that give them mineral surges, like the MULE for instance. MULEing makes getting the money for turrets a lot less painful. Basically, cannons carry a high opportunity cost for the protoss in an evenly matched game. Enough of my digression. Basically, protoss' very weak base defense is being taken advantage of by these races in recent games.
I think protoss users will need to come up with some creative uses of air (not deathball air, I mean map control style air) in the near future.
Agreed with those bold points especially. The Amulet nerf was actually huge for macro games. It also prevents Protoss from having strong harass-styled play.
If patrolling phoenixes could take out dropships fast enough, maybe that'd be something.
|
On May 04 2011 00:36 Dommk wrote: The data used is deemed worthy of being collected by TLPD. He is just graphing the results available there; there is no need for confidence intervals. The data is not dichotomous given that he is using ALL data from the database he said he was using and not just what he himself deems worthy of being used. He says he uses TLPD as his only source. If you have problems with the data he is using, then your qualms are with him using only TLPD. If he were only using specific data from TLPD then there would be some room to argue, but given that he is showing no bias and graphing everything available there, it doesn't need any further analysis. It is just pure recorded historical data. All he has done is take TLPD and graph it. Nothing else needs to be added.
First, see my post: I misread what was actually being graphed, so that post can be safely ignored.
Second: I wasn't criticizing the data or the sources, I was saying that the TLPD data isn't the [full] population distribution, but is instead a sample distribution of the data -- but this point is moot, since I misread the graph anyway.
However, if anyone is trying to make statements about balance based off of the graph, significance tests need to be run at each point to see if the data are actually statistically different. We can't just look at the graph and say 'oh yeah, terran is owning!' There may not even be a statistically significant difference -- and this was the point of error bars being suggested (by someone else originally, not me). However, due to the nature of the data, it doesn't look like error bars would be appropriate, but I'm not sure since I don't have the spreadsheet in front of me.
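A per-month significance test of the kind mentioned here could look roughly like the sketch below. It is an exact two-sided binomial test against a 50% win rate, written with the standard library only; the function name `binomial_two_sided_p` and the example records are hypothetical, and the test assumes games are independent, which tournament series may not be.

```python
from math import comb

def binomial_two_sided_p(wins, games, p0=0.5):
    """Exact two-sided binomial test p-value against win probability p0.

    Sums the probability of every outcome that is no more likely than
    the observed one. Fine for a few hundred games per month; very large
    counts would overflow the float conversion of comb().
    """
    if not 0 <= wins <= games:
        raise ValueError("wins must be between 0 and games")
    probs = [comb(games, k) * p0**k * (1 - p0)**(games - k)
             for k in range(games + 1)]
    observed = probs[wins]
    # Small tolerance so outcomes tied with the observed one are included.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))

# Hypothetical months, each with 374 games in one matchup (assumed records).
for wins in (190, 210, 230):
    p_value = binomial_two_sided_p(wins, 374)
    print(f"{wins}-{374 - wins}: p-value = {p_value:.4f}")
```

A small p-value for a month would suggest that month's win rate is unlikely to be a coin flip; a large one means the graph alone cannot distinguish it from 50%.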
|
On May 03 2011 22:10 MrCon wrote: NASL stats after 3 weeks (row race vs. column race, [wins-losses] | win rate)
          Protoss          Terran           Zerg
Protoss   [17-17] | 50%    [21-17] | 55%    [15-20] | 43%
Terran    [17-21] | 45%    [18-18] | 50%    [25-27] | 48%
Zerg      [20-15] | 57%    [27-25] | 52%    [13-13] | 50%
Not 70% winrate in NASL for ZvP :o
On May 03 2011 23:27 nihlon wrote: If I'm understanding your post correctly it's 57% ZvP. That may not be 70% but it's still really high if we assume those numbers would be a fair representation of ZvP generally in high level play (It's not). So I'm not really sure what your point is.
No point, 2 times in this thread people talked about zergs having a 70% winrate in NASL, so I just checked and posted the facts because I don't like misinformation spreading, that's all.
That's exactly the same as people here ignoring the significant graph and pointing to the 200 games sample graph to strengthen their "point" (which has the word imbalance in it obviously). The Korean graph means absolutely nothing; I feel OP shouldn't even have posted it because it's just a justification for trolls now.
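To put the small-sample point in numbers, here is a rough sketch that computes a 95% Wilson score interval for three of the NASL records quoted above. The Wilson interval is swapped in here because it behaves better than the plain p +/- 2σ range at a few dozen games; nobody in the thread used it, and the function name `wilson_interval` is made up for this example.

```python
import math

def wilson_interval(wins, games, z=1.96):
    """Return the (low, high) Wilson score interval for a win rate."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games
                         + z * z / (4 * games * games)) / denom
    return center - half, center + half

# [wins, losses] taken from the NASL table quoted in this thread.
records = {"ZvP": (20, 15), "PvT": (21, 17), "ZvT": (27, 25)}
for matchup, (wins, losses) in records.items():
    low, high = wilson_interval(wins, wins + losses)
    print(f"{matchup}: {wins}-{losses}, 95% interval {low:.2f} to {high:.2f}")
```

With only 35 to 52 games per matchup, each of these intervals is wide enough to include 50%, so the table by itself supports neither a 57% nor a 70% imbalance claim.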
|
Is this based only on master league and higher, or on the entire StarCraft 2 ladder?
If this is based on the entire ladder it might be rather skewed, as in diamond and below there are a lot of mechanical problems that people might have, and just straight-up build order losses. I think that checking whether it's balanced across all leagues is the incorrect way of going about it. We should only be looking at the upper tiers of play, where people are playing at the best quality there currently is.
edit: I misread the OP my bad >,>
|
On May 04 2011 01:11 DrBoo wrote: Is this based only on master league and higher, or on the entire StarCraft 2 ladder?
If this is based on the entire ladder it might be rather skewed, as in diamond and below there are a lot of mechanical problems that people might have, and just straight-up build order losses. I think that checking whether it's balanced across all leagues is the incorrect way of going about it. We should only be looking at the upper tiers of play, where people are playing at the best quality there currently is.
no ladder, only tournaments TT
|
On May 04 2011 01:11 DrBoo wrote: Is this based only on master league and higher, or on the entire StarCraft 2 ladder?
If this is based on the entire ladder it might be rather skewed, as in diamond and below there are a lot of mechanical problems that people might have, and just straight-up build order losses. I think that checking whether it's balanced across all leagues is the incorrect way of going about it. We should only be looking at the upper tiers of play, where people are playing at the best quality there currently is.
Read the OP, it's not based on ladder, only tournaments and leagues.
|
i wish people wouldn't make graphs with such small sample sizes...
even if a sample size warning accompanies the graphs, the majority of people don't actually understand statistics well enough to be able to disregard these graphs as nothing more than pretty pictures, and so arguments happen and misinformation is spread
|
On May 04 2011 00:17 drox22 wrote: Really, Terran vs Zerg lategame is so much easier for zerg than it is for terran. Infestor, broodlord + lings (or any other unit really) with a tech switch into ultralisks is pretty much unstoppable for a terran lategame. Against broodlords you lose your tanks, against infestors you lose your vikings, and if you add ghosts to your army to EMP the infestors you will then die to the ultralisk tech switch. I have yet to see a game where a terran beats this late game strat of zerg, and right now it really seems impossible. Saying something like T>Z lategame is simply wrong. It might be true early game, but late game it's the exact opposite.
100% accurate. Every Terran I have spoken to and asked what they're doing in TvZ right now, or how much they win... or vice versa, if they ask me what I've been doing in TvZ... it ends up with us both shrugging our shoulders going "we've been losing... wtf are we supposed to do in TvZ right now, 2 base all-in every game?"
I've been off-racing as Z on ladder, and whenever I get a T I just sorta smile and think how much less effort I have to exert compared to playing a TvT.
Overall, I think the graphs are nice to have as a reference for balance, but these things never really tell the full story, and regardless of how well balanced things appear, there are a lot of problems (especially late game) that are still really not balanced.
edit: oh, and supposedly new metagame trends crop up on the Korean ladder more often than they do on the NA/EU ladders, so that may be why the stats look a lot different for the Korean graphs than the international graphs.
|
Take Idra's wins vs Socke, kiwikaki and mana. You can't see the state of ZvP with just winrates. He did a series of all-ins vs both Socke and kiwikaki, and he was just way better than mana. This doesn't prove zerg is overpowered or anything. And no, I don't think Zerg is UP, I am just pointing out that win rates prove nothing.
Very sick of the attitude that aggressive play wins don't count. Are you kidding me? In that case, you can't count any MC wins ever. Right? Because he almost always puts tons of pressure that the layperson on teamliquid will call an all-in. Annoys the crap out of me... the attitudes that abound.
Pro Zergs are getting better at aggression. Even Idra, Idra the outspoken "macro only" guy is playing aggressive. It wasn't long ago that he admitted he was winning more when he "felt like I was losing". It makes sense because Zerg units are better early game per cost. Why not attack early and get ahead? When the other race has to defend and try to slowly build macro, you have to check them by playing aggressive, or they'll under-defend and win with a killer high economy.
SC2 is not the same as Broodwar... especially this early in the development. Aggression is paying off... so use it.
|
On May 04 2011 00:57 Blacklizard wrote: If patrolling phoenixes could take out dropships fast enough, maybe that'd be something.
Even just a couple of phoenixes to take out escaping dropships would be nice to see. Force every drop into a suicide mission. Makes me really sad seeing how many dropships run away successfully.
|
I feel like a point that has been overlooked here is that how much each race wins in each matchup does not in any way take into account how the wins occur. If zerg only ever 6 pooled because they had absolutely no chance against terran/toss in any kind of long game (exaggerated ofc), but terran/toss early defense was so shit that the graphs ended up at 50-50, one could look at the game as balanced. However, I think balance isn't just in what the percentages are, but also in how the game is balanced at its different stages.
To close out my point, I would just like to say that just because zerg is doing well against toss or vice versa in the win percentages, you really can't conclude that the early/mid/late game is balanced. I would much rather have a game where it wasn't just about "oh, I can never let toss get his deathball, I have to kill him before the X minute mark" and toss feeling the opposite.
|
On May 04 2011 01:16 jfourz wrote: i wish people wouldn't make graphs with such small sample sizes...
even if a sample size warning accompanies the graphs, the majority of people don't actually understand statistics well enough to be able to disregard these graphs as nothing more than pretty pictures, and so arguments happen and misinformation is spread
Over 8000 games is not a small sample size for StarCraft 2 imo; StarCraft 2 doesn't have as high a variance as poker, for example. Some people can keep 70-90% winrates for extended periods of time. Besides, aren't high-caliber tournament games the exact basis for our judgement of whether something is imba or not? People have always thought zerg was UP because they thought it was winning less. Now we see that it's not the case, and the lack of zergs in tournaments is probably simply because the race is less popular.
|
On May 04 2011 01:16 jfourz wrote: i wish people wouldn't make graphs with such small sample sizes...
even if a sample size warning accompanies the graphs, the majority of people don't actually understand statistics well enough to be able to disregard these graphs as nothing more than pretty pictures, and so arguments happen and misinformation is spread
Well, just for the record, these same people are making statements on "balance in korea" anyway, only without any data to back it up at all.
The graph may have a low sample size, but it does show perfectly accurately the race win rate per month in Korea.
Sure it may not be relevant in a general balance discussion, but it does show very clearly what I set out to show with it.
|
|
|
|