On September 09 2013 05:24 SlixSC wrote:
On September 08 2013 18:08 Orek wrote:
I have found some errors in the data. Can someone contact ChaosTerran, the original poster on Reddit, and let him know about this post so that the community can have correct data from September on? I don't have a Reddit account and I don't know if he has a TL account.
http://www.reddit.com/r/starcraft/comments/1ljux8/winrates_august_source_liquipedia/

Sample size is actually bigger than what the graph at the bottom right says. While checking the data, I found some errors in ChaosTerran's original work at https://docs.google.com/spreadsheet/ccc?key=0At0PE4rdhsI9dDE0cEprWkwwMGxQdTczTTlLNW1qX1E&usp=sharing

If you go to the August one, 24 tournaments are used for winrates, but "games played per matchup" is calculated from only 15 tournaments. Therefore:

        Wrong   Corrected
TvZ:    283     313
TvP:    335     385
ZvP:    356     434

The cells need to be fixed as follows to prevent similar situations in the future:

M2 cell: =SUM(A2:B16) -> =SUM(A:B)
M3 cell: =SUM(C2:D16) -> =SUM(C:D)
M4 cell: =SUM(E2:F16) -> =SUM(E:F)

April had 15 tournaments, that is, 16 rows were used, 14 tournaments in May, and 16+ since June. Therefore, sample sizes in the June, July and August data are all wrong. It's just a copy-paste problem that is easy to fix.

Otherwise, please keep up the good work, ChaosTerran and Wingblade!! Also, it would be nice if the links to both the International and Korean Google Docs spreadsheets (view only, of course) were attached to the original Reddit posts from September on, so that those who are interested, like me, can check the original data themselves. (I want to check the Korean-only version if available.)

[Sidenote 1]
Also, cells 1~6 of the "I" /ai/ column should be fixed in a similar manner in case there are more than 35 tournaments per month in the future.
I6 cell: =SUM(E2:E36) -> =SUM(E:E)
etc.
[Sidenote 2]
Maybe it is intentional, but for example, 60% in TvP and 70% in TvZ don't necessarily mean that Terran has a 65% winrate. In fact, when the sample sizes of the two matchups are different, the (60+70)/2 formula can't be used.

Example:
TvP: 5 games played, T wins 1 game, P wins 4 games
TvZ: 9 games played, T wins 2 games, Z wins 7 games
ZvP: 11 games played, Z wins 3 games, P wins 8 games
T winrate: (1+2)/(5+9) = 3/14, NOT {(1/5)+(2/9)}/2 = 19/90
P winrate: (4+8)/(5+11) = 3/4, NOT {(4/5)+(8/11)}/2 = 42/55
Z winrate: (7+3)/(9+11) = 1/2, NOT {(7/9)+(3/11)}/2 = 52/99
Distribution of win percentages
T: (1+2)/(5+9+11) = 3/25, NOT (19/90)/{(19/90)+(42/55)+(52/99)} = 418/2970
P: (4+8)/(5+9+11) = 12/25, NOT (42/55)/{(19/90)+(42/55)+(52/99)} = 1512/2970
Z: (7+3)/(5+9+11) = 10/25, NOT (52/99)/{(19/90)+(42/55)+(52/99)} = 1040/2970
That said, when sample sizes of 3 matchups are very different, race winrates and distribution get skewed in this way. Maybe that's why ChaosTerran took "average" of two winrates, though it doesn't mean much IMHO. Personally, I feel that cell I7~9 and J7~9 need adjustments. At the same time, race winrates and distribution of win percentages are difficult to interpret when sample sizes are different anyways.
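For anyone who wants to check the toy numbers in the example, here is a minimal Python sketch. The dictionary layout and the function names are my own illustration, not anything taken from the spreadsheet. It computes each race's winrate both ways: pooled over all games, and as a plain average of the two matchup winrates.

```python
from fractions import Fraction

# Toy results from the example: matchup -> {race: wins}
games = {
    "TvP": {"T": 1, "P": 4},
    "TvZ": {"T": 2, "Z": 7},
    "ZvP": {"Z": 3, "P": 8},
}

def pooled_winrate(race):
    """Total wins divided by total games in the matchups this race plays."""
    wins = sum(r[race] for r in games.values() if race in r)
    played = sum(sum(r.values()) for r in games.values() if race in r)
    return Fraction(wins, played)

def averaged_winrate(race):
    """Unweighted mean of the race's per-matchup winrates."""
    rates = [Fraction(r[race], sum(r.values()))
             for r in games.values() if race in r]
    return sum(rates) / len(rates)

def pooled_share(race):
    """The race's wins as a share of all games played in all matchups."""
    wins = sum(r[race] for r in games.values() if race in r)
    total = sum(sum(r.values()) for r in games.values())
    return Fraction(wins, total)

def averaged_share(race):
    """The race's averaged winrate, normalised so the three races sum to 1."""
    return averaged_winrate(race) / sum(averaged_winrate(x) for x in "TPZ")

for race in "TPZ":
    print(race, pooled_winrate(race), averaged_winrate(race),
          pooled_share(race), averaged_share(race))
```

Fractions are used instead of floats so the results can be compared directly with the hand-calculated ones (Python prints them in lowest terms).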
The numbers in the graphs (TvZ, TvP, PvZ) are actually accurate. I think the "games played per matchup" graph has a different algorithm. It's a mistake, but it's just a display error if you will, and not connected with the results per matchup.

edit: The last graph (games per matchup) has M2:M4 as the set value, and M2 - M4 have B2:16, C2:16 and D2:16 as their respective values, which is why the number of matches played isn't displayed properly. However, the winrates work with a different algorithm: they include all of x:y. So all matches were included in the win rates.

edit: But yeah, nice find, I'm sure it'll be fixed for September.

edit:
"Maybe it is intentional, but for example, 60% in TvP and 70% in TvZ don't necessarily mean that Terran has 65% winrate. In fact, when sample sizes of two matchups are different, (60+70)/2 formula can't be used."
Where did you get this from? When the sample sizes are different you actually have to use this formula, because you are calculating an average between two different sample sizes, not the total of both samples. Take an abstract (and rather extreme) case: Zerg wins 500 out of 800 games in PvZ (62.5%), but only wins 20 out of 50 games in TvZ (40%). Does Zerg really have a 61%+ win rate across both matchups? It's a meaningless number and very misleading, because while they are losing to Terran, by virtue of there being more PvZ matches and wins, their win rate is 60%+ across both matchups. So PvZ is weighted heavier than TvZ by a factor of 16. It just doesn't work that way.

Thank you for going to the trouble of checking the original spreadsheet data. As for winrates, as I noted, race winrates and distribution of win percentages are difficult to interpret whichever way they are calculated. Not that many people care, but I believe people interpret these winrate numbers differently without knowing how these numbers are calculated.
I for one think calling (60+70)/2=65% the "winrate of X race" is problematic, but I'm not a mathematician or anything, so meh. As long as those "games played per matchup" numbers get fixed, I'm a happy man.
Edit: Just checked. # of games cells have already been fixed. Quick work.
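
For reference, the Zerg example from the quote (500 of 800 in PvZ, 20 of 50 in TvZ) can be put into a few lines of Python to show what the two formulas actually do. This is only a sketch, and the function names are made up for illustration. The pooled number is the same thing as an average weighted by games played, which is where the factor of 16 comes from.

```python
def pooled(results):
    """results: list of (wins, games) per matchup. Total wins / total games."""
    wins = sum(w for w, _ in results)
    games = sum(g for _, g in results)
    return wins / games

def unweighted(results):
    """Plain mean of the per-matchup winrates, ignoring sample sizes."""
    return sum(w / g for w, g in results) / len(results)

# (wins, games) for Zerg in PvZ and TvZ, taken from the quoted example
zerg = [(500, 800), (20, 50)]

print(f"pooled:     {pooled(zerg):.2%}")      # 520 wins out of 850 games
print(f"unweighted: {unweighted(zerg):.2%}")  # mean of 62.5% and 40%
print("PvZ weight relative to TvZ:", 800 / 50)
```

Which of the two deserves to be called "the winrate" is exactly the interpretation question; the code only shows that they answer different questions.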