Racial Distribution in Patch 1.0 - Diamond Ladder - Page 22
ZaaaaaM
Netherlands1828 Posts
| ||
Sentient
United States437 Posts
On September 04 2010 05:47 Scarmath wrote: To take, as an example, the chi-square test, it relies on never expecting a population of zero, and usually expecting a population more than 5. If I set my bin size to 10 instead of 50, in many cases the expected number of random players is less than one. The rankings "are what they are," but the number of people affects how confidently we can say anything about them.

This is where I have to disagree. The chi-square test tells you exactly how confident you can be, regardless of the bin size. That's the whole point of using the p-values. The smaller bin sizes are accounted for by the t-table, so it doesn't matter how big or small they are. Smaller bins will use a larger uncertainty multiplier.

Scarmath wrote: A bin size of 20 is too small (even a bin size of 25 is pushing it), but I did try this with a sliding window of 50.

What do you base this assertion on? There are tests to establish these probabilities. Everyone keeps saying that 20 is too few, yet no one has actually done that calculation.

Scarmath wrote: and here are the corresponding p-values: The graph actually says pretty much the same thing. I went back to static, separate bins, though, because I felt it was easier to visually parse.

This says a lot actually. With only random chance, we expect 1 in 20 of the data points to be p < 0.05. That's about what you have. However, there is a large cluster at the top 100 players where p < 0.05, and this stays true regardless of which 20 of the top 100 you pick. This is exactly what you would expect if racial distribution was not due to chance.

Therefore, we can say with very good confidence (~95%) that the racial distribution is not caused by random chance. It doesn't tell us anything about the reasons for that, but it does give us a hard confidence number of that fact.

If you're still concerned about the bins, could you post your spreadsheet somewhere? I'd like to play around with it. It seems like you have it set up in a very convenient way to manipulate. | ||
Scarmath
United States60 Posts
Sentient wrote: This is where I have to disagree. The chi-square test tells you exactly how confident you can be, regardless of the bin size. That's the whole point of using the p-values. The smaller bin sizes are accounted for by the t-table, so it doesn't matter how big or small they are. Smaller bins will use a larger uncertainty multiplier.

and

Sentient wrote: What do you base this assertion on? There are tests to establish these probabilities. Everyone keeps saying that 20 is too few, yet no one has actually done that calculation.

Yes, I've done calculations with bin sizes of 10 and 25, and neither provided clear results (at least to my satisfaction). The problem is that with so few random players, the presence or absence of merely one or two random players will create a false positive. Here, I was trying to avoid this, but let's look at the formula I was using. (Math-o-phobes should shield their eyes).

$\chi^2 = \sum_{i \in G} \frac{(O_i - E_i)^2}{E_i}$

where i iterates over the categorical groups G (race preferences, in this case), O is the observed count, and E is the expected count. The first thing to note is that if E is zero, we get a divide by zero error. Furthermore, for small values of E, relatively small perturbations in O can result in overly large chi-square scores, which in turn can lead to false positives.

Why is this a problem? Because random players make up around 5% of the top 1000. With a bin size of 20, that gives us an expected value of 1 per bin, which, in turn, gives the presence of random players a larger influence on the final p-value. A bin size of 50 gives an expected value of 2.85, which is still very low, but at least lessens the impact of Random players. (My statistics text suggests the majority of your expected values be above 5, but I have no idea where they got this number).

Also...

Sentient wrote: This says a lot actually. With only random chance, we expect 1 in 20 of the data points to be p < 0.05. That's about what you have. However, there is a large cluster at the top 100 players where p < 0.05, and this stays true regardless of which 20 of the top 100 you pick. This is exactly what you would expect if racial distribution was not due to chance. Therefore, we can say with very good confidence (~95%) that the racial distribution is not caused by random chance. It doesn't tell us anything about the reasons for that, but it does give us a hard confidence number of that fact.

Yeah, but that's also exactly what my other graph said. (That is, the top 100 players were statistically significantly different from the overall distribution of the top 1000). The issue is determining if that is because of random players being underrepresented, or zerg players being underrepresented.

If you want my (unsubstantiated) opinion, I think that it is because zerg players are underrepresented. However, without more refined methods, we just can't jump to that conclusion. (At the very least, we can't use numbers as proof). | ||
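The small-expected-count problem described in the post above can be checked by hand. The following is a minimal sketch, not the spreadsheet's actual calculation: it evaluates only the Random category's term of the chi-square sum, assuming a Random share of exactly 5% (the post says "around 5%") and the standard 0.05 critical value of 7.815 for three degrees of freedom. The function names and constants are illustrative.

```python
# Sketch: how much the Random category alone contributes to the chi-square
# statistic when its expected count is small. The 5% share is an assumed
# round number, not a figure taken from the spreadsheet.

RANDOM_SHARE = 0.05        # assumed share of Random players
CRITICAL_DF3_05 = 7.815    # chi-square critical value, 3 degrees of freedom, alpha = 0.05


def chi_square_term(observed, expected):
    """Return one category's contribution, (O - E)^2 / E."""
    if expected <= 0:
        raise ValueError("expected count must be positive (E = 0 divides by zero)")
    return (observed - expected) ** 2 / expected


def random_terms(bin_size, max_observed=4):
    """Random category's contribution for 0..max_observed Random players in one bin."""
    expected = RANDOM_SHARE * bin_size
    return {o: chi_square_term(o, expected) for o in range(max_observed + 1)}


if __name__ == "__main__":
    for bin_size in (20, 50):
        print(f"bin size {bin_size}, expected Random players = {RANDOM_SHARE * bin_size:.2f}")
        for observed, term in random_terms(bin_size).items():
            flag = "  <- exceeds the critical value on its own" if term > CRITICAL_DF3_05 else ""
            print(f"  observed {observed}: term = {term:.2f}{flag}")
```

Under these assumptions, four Random players in a bin of 20 (expected count 1) contribute 9 to the statistic, which is past the threshold before the other three races are even counted. The same four players in a bin of 50 (expected count 2.5) contribute only 0.9.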
Sentient
United States437 Posts
When I came up with my 95% figure earlier, I tested the more specific hypothesis that Terran is overrepresented. This is simpler and more to the point than looking at the distribution of all four options. Since Terran make up 30-40% of the population (depending on which subset you take), 20 is more than large enough for a bin size. | ||
Gunman_csz
United Arab Emirates492 Posts
Edit: Post -> Most obviously. | ||
abominare
United States1216 Posts
On September 04 2010 11:27 Gunman_csz wrote: Sentient and Scarmath can you guys post your findings in laymen terms . It is really difficult for me (and I am sure post of the viewers) to decode and understand stuff from your charts.

I mostly skimmed, but the gist of it is that they're trying to account for the effect random players have on the distributions. There are also some attempts to adjust for the fact that the population is not equally distributed across all racial choices (T/P/Z/R). There's a lot of great math in here, and as an actuary I'm really surprised and impressed that this is out on a video game board. Unfortunately, there's too much missing information to make any of this useful.

Taking a step back from the math, look at the system you're working with, specifically the matchmaking system. How does it work? In the visible system, you're matched with an opponent and you win or lose points based on their HIDDEN Elo rating. Does a player ever get matched with someone from a different league or with a large visible point discrepancy? Yes, all the time; in fact, there's always whining on the Blizzard forums about this. Are points awarded only for skill? No, only for wins and losses.

For example, can we agree that IdrA is one of the best SC2 players right now? He's favored to win the GSL, yet he has a paltry ~1300 score (rounded up). Why is that? IdrA doesn't ladder a lot; in fact, the whole system as we see it rewards those who play a lot. How many times have you played a terrible opponent, only to look him up afterwards and see an incredible number of points with a terrible record? Furthermore, how can IdrA clearly not be in the top 100 based on visible points but still be on Blizzard's top 100? Blizzard internally uses a completely different system to check players' Elo for balance purposes, one not skewed by point inflation from mass gaming.

I could go on, but the main point is this: the graph and the subsequent math adventures are just a graph of the number of people at xyz visible points, which are in no way correlated with any sort of balancing information, or at least not any useful information. It's very much a representation of who has the most cookies in an imaginary jar (assuming there actually is a jar of cookies, divided by a completely different system than the one shown).

Finally: this is not Blizzard's metric for balancing, FYI, nor should it ever be; the game would be terrible. | ||
Sentient
United States437 Posts
On September 04 2010 11:27 Gunman_csz wrote: Sentient and Scarmath can you guys post your findings in laymen terms . It is really difficult for me (and I am sure post of the viewers) to decode and understand stuff from your charts.

TL;DR version: The racial distribution in the top 100 players is not expected by random chance. The two probable reasons are (1) the races are not equally powerful, and/or (2) player skill is not equally distributed among the races.

Scarmath did all of the graph work. I mostly plugged some numbers into a binomial probability calculator (http://stattrek.com/Tables/Binomial.aspx). He did a lot more work and should get the credit.

Imagine you want to know if a die is weighted. You roll it 10 times and get 5 sixes. Knowing that the probability of rolling a six is 1 in 6, the odds of rolling a six 5 or more times out of 10 are about 1.5%. With this information, we can conclude with roughly 98.5% certainty that the die is weighted.

The point of contention is that 10 isn't a very large dataset. Say we are interested in how many times we roll a 1, which happens to be zero. There is a 16% chance of this occurring by chance. One could argue with 84% certainty that this is not due to chance. Since there are six possible outcomes in a die roll, chances are that at least one of the six numbers will come up 0 times. This means that you have to be very specific in the question you are asking about the die, since it's easy to cherry-pick the statistics in your favor. I could choose any of the six numbers with an above- or below-average abundance to argue my point, but this would be dishonest, because I would be looking for data to support my preconceived notion that the die is weighted.

We can use the same statistical means to test the racial distribution in the top 100 players. Knowing the distribution of players in Diamond league (e.g., 30.6% Terran, etc.), we can calculate the probability of the top x players' races occurring purely by random chance. For example, we would expect 30.6% of the top 100 players to be Terran. In reality, Terran are more abundant than 30.6% in the top 100.
- Is this caused by random chance?
- Is it caused by only looking at 100 players?

The answers to both of these are no. Using either of our analyses (mine using a binomial calculator, Scarmath's using a chi-square test), we both find that there is a 95% chance that the distribution of players in the top 100 cannot be explained by random chance. This is an all-encompassing number, independent of the number of players involved. Whether we looked at 5 or 20 or 1000 players is irrelevant, because that 95% figure incorporates the number of players.

As with all analyses, this has to be combined with what we already know about the game. All it tells us is that random chance has not produced a high ratio of Terrans in the top 100. Other factors must come into play. For example,
- Random players are less abundant at the top of the ladder, likely because it becomes harder and harder to compete as random. Random players must learn 9 matchups instead of 3.
- The "Terran's OP" meme probably shifts the player distribution in favor of Terran.

The 95% figure doesn't tell us that Terran is OP, that Zerg is UP, or that Mars is really blue. All it tells us is that something other than chance (or "sample sizes" for that matter) must be used to explain the proportion of Terran players in the top 100. This is true regardless of your stance on the Terran issue. What to do with this information is another story, and it will only shift the flamewar back to whether or not Terran is balanced.

Lies, damned lies, and statistics indeed. | ||
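The binomial reasoning in the post above does not need an online calculator. The sketch below uses only the Python standard library to compute the two dice probabilities and then applies the same tail calculation to the ladder. The 30.6% Terran share comes from the post; the observed count of 40 Terran players in the top 100 is a made-up placeholder, since the thread does not state the actual count here.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


# Dice example from the post: 10 rolls of a fair six-sided die.
p_five_or_more_sixes = binom_tail_at_least(5, 10, 1 / 6)
p_no_ones = binom_pmf(0, 10, 1 / 6)

# Ladder example: 30.6% Terran in the wider population (from the post).
# HYPOTHETICAL_TERRAN_IN_TOP_100 is a placeholder, not a measured value.
TERRAN_SHARE = 0.306
HYPOTHETICAL_TERRAN_IN_TOP_100 = 40
p_terran_tail = binom_tail_at_least(HYPOTHETICAL_TERRAN_IN_TOP_100, 100, TERRAN_SHARE)

if __name__ == "__main__":
    print(f"P(5 or more sixes in 10 rolls) = {p_five_or_more_sixes:.4%}")
    print(f"P(no ones in 10 rolls)         = {p_no_ones:.4%}")
    print(f"P(at least {HYPOTHETICAL_TERRAN_IN_TOP_100} Terran in 100) = {p_terran_tail:.4%}")
```

Swapping in the real Terran count from SC2Ranks gives the one-sided probability that the post refers to. Note that the binomial model treats each of the 100 slots as an independent draw, which is itself an approximation.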
Shorack
Belgium111 Posts
I had a look at pages 19 and 20 and apparently there is no improvement, so I'm afraid it's time to enter this topic.

As several people noted, the data contains the whole population of diamond 600+ players. Since you know the entire population, there is no need at all for statistics of significance. Statistics are what you use when you only know part of the population. In that case, there is uncertainty, and statistics help to make statements about the whole population, based on the observed sample, with a specific level of certainty.

Every difference you see is a real, existing difference. Any equality you see is a real, existing equality. The question is not whether Terran is played more at higher skill levels; we know that, since we know that in the total population of the top-scoring players, more than 1/3 play Terran.

The questions left to answer are:
1. Are the differences so big that we need to act?
2. What are the causes of these differences?

Any test of significance in this topic is used inappropriately and is worthless, unless you add the finite population correction. But that would only prove my earlier point that any difference is a real difference, since the FPC part in this case would always be 0, meaning your confidence interval will always have a width of 0, so anything that has even the slightest difference would be significant.

That being said, please stop posting all the worthless pseudo-statistical crap. | ||
ReplayArk
Germany23 Posts
| ||
Shorack
Belgium111 Posts
On September 04 2010 23:55 ReplayArk wrote: @Shorack you may just read the first post and the edits, it will lead you to some discussion from Scarmath and Sentient and what they have done is not worthless pseudo-statistical crap. You may read it if you want to discuss anything further, but it is not mannered to implicitly tell the other they are dumb, or else why should one write worthless pseudo-statistical crap?

First of all, apologies for the use of the word crap. I got quite nervous from seeing people throw around statistics where they aren't appropriate. It's even painful to see that some people put so much effort into it, since it's inappropriate use and hence wasted effort.

Second, I went through the edits:
- blacktoss is right, there is no confidence to consider. Too bad that he ignores that himself in the same post, since he refers to a chi-square test, which implies he wants to see if something falls within a confidence interval or not.
- cotonou is 100% right. Note that he refrains from statistical tests (well, except for the use of the term null hypothesis).
- toxigen's post is again an interesting one. Note again that he doesn't try to use statistical stuff. (I hope you start to see the pattern in what I like and what not?)
- On scarmath then: he did put quite some effort into it, but it's not because you put effort into something that it's correct. Since we're observing the whole population, the differences are always significant. So using a chi-square test has no point; it's always significant unless the expected and observed are exactly equal. (It's not the case in scarmath's second graph because he didn't correct for finite population.)

I'll try to repeat myself in an attempt to make it clearer compared to my first attempt: there is no question of significance, every difference is significant. There is only a question of relevance (are the differences we see big enough to act upon?) and, for the relevant ones, what's causing it.

Significance is about: is the difference we see really there? And we have to ask ourselves that when only seeing part of the population. If you see the whole population, you see the differences that are really there and so no longer need to wonder about that; you know it for certain.

We still need to ask ourselves the relevance question: is the difference we see big enough to worry about? (Say we find there are 2% more Terran players than expected; is it really worth the effort to work that away?) And finally, the why-question: what is the cause of the difference? (There is no point in changing game balance if that's not causing the difference.)

I hope this makes my statement a bit clearer to you? (I fear my English is falling a bit short right now.) | ||
Scarmath
United States60 Posts
Shorack wrote: I'll try to repeat myself in an attempt to make it clearer compared to my first attempt: there is no question of significance, every difference is significant. There is only a question of relevance (are the differences we see big enough to act upon?) and for the relevant ones: what's causing it.

I think, perhaps, you don't understand exactly. Let me state all this explicitly.

It is clear that the proportions of the top 100 players are not equal to the proportions of the top 1000. The question is, how likely is that to be the result of random variation? We're not measuring sampling error. We don't need to measure sampling error. We do need to measure confidence. That is the point of both the binomial calculation and the chi-square test. We already know there is a difference, but we need to test how statistically significant that difference is.

In both cases, we can say with 95% certainty that the proportions of the top 100 players are not different due to random chance. That is, the deviation is statistically significant. These are steps we need to take even when we have the entire population. | ||
Scarmath
United States60 Posts
http://www.mediafire.com/?uta56tbvr5w8ect

The easiest way to play with it is by replacing the Raw table with other information copied and pasted from SC2Ranks.com. (Copy and paste it from Firefox. Chrome doesn't copy the alt text I use to identify race, and I haven't tested it in Internet Explorer). The rest of the sheets should update automatically. Other things can be changed as well, but may require more extensive fiddling to work. Still working on this a few hours at a time. | ||
rackdude
United States882 Posts
On September 04 2010 23:24 Shorack wrote: Since you know the entire population, there is no need at all for statistics of significance.

Absolutely not true. What could be happening in the population could be random chance. If you created a population randomly with balanced races, you won't get 33% 33% 33% (forget random to make it easier), you get something slightly off. Statistical significance tells you "if the races were balanced, there would be a P percent chance of this happening". When you get a number like .01%, you go "wow, there is almost no chance this randomly occurred".

However, if you look at the data, you can make an inference from something that is random. For example, if you flipped a coin 3 times, and you saw all heads, if you looked at the graph you'd go "wow, this is definitely heads biased." Statistics would tell you though "hey dude, chill. There is a 12.5% chance of that happening randomly, so I wouldn't jump to conclusions just yet". That is why we are using statistics. | ||
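The idea of a randomly generated "balanced" population can be tried directly. This is a toy simulation under assumed conditions (three races, each chosen with equal probability, 99 players so that exact thirds are at least possible); it is not a model of the actual ladder, and the function name is illustrative. It also reproduces the 12.5% coin-flip figure from the post.

```python
import random
from collections import Counter

RACES = ("Terran", "Protoss", "Zerg")


def simulate_balanced_population(n_players, rng=None):
    """Draw n_players races uniformly at random and return the count per race."""
    if rng is None:
        rng = random.Random()
    draws = Counter(rng.choice(RACES) for _ in range(n_players))
    # Make sure every race appears as a key, even if it was never drawn.
    return {race: draws.get(race, 0) for race in RACES}


# Chance of three heads in three flips of a fair coin.
P_THREE_HEADS = 0.5 ** 3

if __name__ == "__main__":
    print(f"P(3 heads in 3 flips) = {P_THREE_HEADS:.1%}")
    rng = random.Random(2010)  # fixed seed so the run is repeatable
    for trial in range(5):
        print(trial, simulate_balanced_population(99, rng))
```

Each simulated population is balanced by construction, yet the printed counts will typically wander a few players away from 33/33/33. That spread is the "random chance" a significance test is measured against.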
blacktoss
United States121 Posts
| ||
leve15
United States301 Posts
On September 05 2010 04:52 rackdude wrote: Absolutely not true. What could be happening in the population could be random chance. If you created a population randomly with balanced races, you won't get 33% 33% 33% (forget random to make it easier), you get something slightly off.

In the long run you would. After thousands of games, would you call the differences in that graph slight? Use your noggin. | ||
Knutzi
Norway664 Posts
If you have two people, and let's say their skill is 1000, and one of them plays Terran and the other plays Zerg, it's fair to say the Terran player would end up having a higher rating than the Zerg player even though the skill of the two is actually the same. | ||
rackdude
United States882 Posts
On September 05 2010 05:14 leve15 wrote: In the long run you would. After thousands of games, would you call the differences in that graph slight? Use your noggin.

No, I wouldn't call those slight. That's why the statistics on those graphs came out significant. Use your noggin and please read before you make snarky comments.

(I like how you cut my quote off before it got to statistical significance. Please don't misquote people and please read their whole post. Thank you.) | ||
Shorack
Belgium111 Posts
On September 05 2010 03:52 Scarmath wrote: I think, perhaps, you don't understand exactly. Let me state all this explicitly. It is clear that the proportions of the top 100 players are not equal to the proportions of the top 1000. The question is, how likely is that to be the result of random variation? We're not measuring sampling error. We don't need to measure sampling error. We do need to measure confidence. That is the point of both the binomial calculation and the chi-square test. We already know there is a difference, but we need to test how statistically significant that difference is. In both cases, we can say with 95% certainty that the proportions of the top 100 players are not different due to random chance. That is, the deviation is statistically significant. These are steps we need to take even when we have the entire population.

I see your point, and I understand that we want to know whether it's some odd occurrence that just happens when dealing with something as complex and odd as human behavior. However, I don't agree with your method. I disagree on two levels:
1. You are leaving out information we have. (This reduces the power and, as a result, prevents the rejection of H0 where it would be rejected using full information.)
2. I believe you might be forgetting which random factor insignificance actually refers to here.

1. I can only keep repeating this until either you agree or you give an explanation for not doing it: finite population correction*. The exact proportions for the population are known. We can say with 100% certainty that the proportions aren't 30/30/30/10, but ab/cd/ef/gh (whatever they are, probably mentioned somewhere in the thread).

2. In case of insignificance, it means that if the population fits H0, we'd still get the result found too often if we performed the same test on a new sample from the same population. So it refers to the randomness involved in selecting subjects for the sample. But here, the subjects observed equal the whole population. So there is zero chance of having such random effects, because we're not taking only part of the population and leaving a part out. (That is how, just due to random chance, we could get a sample that is too unrepresentative of the whole population and hence has different values from the population.) And it's exactly that which you're testing for when performing something like a binomial test: you're using techniques that are meant to cope with the limitations of a sample, and you're using them on a full population.

Again, I totally agree that it's important to find out whether the differences are due to a factor of importance and whether they're large enough to act upon, but statistics is not meant for this**. (Well, there are some tests for relevancy, but the one or two I've encountered did nothing but give you another number upon which you had to decide if it was relevant or not :p)

*Just to make sure, do you know what I mean when I write $\sqrt{(N-n)/(N-1)}$, with N = population size and n = sample size?

**Maybe it becomes clear by giving you the totally opposite situation of the one you refer to: say you have a huge sample out of an infinite population. Even very small differences tend to become significant then. Say happiness of men vs. women at the workplace. You can find on a Stapel scale that women score 7.3 and men 7.18, with that difference being significant. Are you going to report that to management? I hope not; the difference is so small that it's not worth any effort. It's not because I say that all those differences are significant that I mean they're all relevant, nor am I claiming that they can't be the result of the complexities ('randomness') of human behavior.

On September 05 2010 04:52 rackdude wrote: Absolutely not true. What could be happening in the population could be random chance. If you created a population randomly with balanced races, you won't get 33% 33% 33% (forget random to make it easier), you get something slightly off. Statistical significance tells you "if the races were balanced, there would be a P percent chance of this happening". When you get a number like .01%, you go "wow, there is almost no chance this randomly occurred". However, if you look at the data, you can make an inference from something that is random. For example, if you flipped a coin 3 times, and you saw all heads, if you looked at the graph you'd go "wow, this is definitely heads biased." Statistics would tell you though "hey dude, chill. There is a 12.5% chance of that happening randomly, so I wouldn't jump to conclusions just yet". That is why we are using statistics.

You can't create random populations. The population needs to conform to the research question. You can create random samples though. As your post is now, I disagree (assuming we can achieve perfect balance in the broad sense (appeal), which is of course not possible, so just as a thought experiment). Replace population with sample in your post and I'll completely agree. | ||
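For readers unfamiliar with the correction factor in the footnote above, here is a minimal sketch of how it behaves. It implements the square-root factor with N as population size and n as sample size, and applies it to the usual standard error of a proportion. The function names and the example figures (100 out of 1000, a 35% share) are illustrative assumptions, not values taken from the spreadsheet.

```python
from math import sqrt


def finite_population_correction(population_size, sample_size):
    """Return sqrt((N - n) / (N - 1)) for population size N and sample size n."""
    if population_size < 2:
        raise ValueError("population_size must be at least 2")
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must be between 1 and population_size")
    return sqrt((population_size - sample_size) / (population_size - 1))


def corrected_standard_error(proportion, population_size, sample_size):
    """Standard error of a sample proportion, scaled by the correction factor."""
    uncorrected = sqrt(proportion * (1 - proportion) / sample_size)
    return uncorrected * finite_population_correction(population_size, sample_size)


if __name__ == "__main__":
    # Sampling the whole population: the factor, and so the standard error, is 0.
    print(finite_population_correction(1000, 1000))
    # Treating the top 100 as a sample from 1000 (an assumed framing).
    print(finite_population_correction(1000, 100))
    print(corrected_standard_error(0.35, 1000, 100))
```

When the sample is the whole population the factor is exactly zero, which is the basis for the claim that the confidence interval collapses to zero width. When 100 players are treated as a draw from 1000, the factor is about 0.95, so the correction changes little. Which framing applies is exactly what the two posters disagree about.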
rackdude
United States882 Posts
On September 05 2010 08:03 Shorack wrote: You can't create random populations. The population needs to conform to the research question. You can create random samples though. As your post is now, I disagree (assuming we can achieve perfect balance in the broad sense (appeal), which is of course not possible, so just as a thought experiment). Replace population with sample in your post and I'll completely agree.

You are right, but it actually depends completely upon where your model starts. For most models, experiments, scientific papers, etc., you are completely correct. The population is what is and the sample is what is measured. But that's because the ideas dealing with "random populations" are already dealt with in the mathematics.

An example is like this. Participants enter a room where there is Card A and Card B. Assume there is no preference for either card. Participants pick a card and are now designated as group A or B. From this you create a field of theoretically possible populations from the different combinations of card picking that are possible. From this theoretical model, you can ask the question, "if I were to randomly pick a population, what is the chance I pick one that matches the population that I measured?".

This is what I mean by "create a random population": it's like theoretically picking a card from your hand of possibilities. I probably should have said "take an arbitrary element p of the set of possible populations", and I probably shouldn't have said you won't get 33% 33% 33%, because there could exist at least one population with that distribution. But I think you get the point.

Good call, because you cannot take a random population in any empirical science, since the population is defined as what exists. But I was speaking from a mathematical standpoint that wasn't measuring what exists, but rather the probability of such a population existing given the model we have created (which is what the simplified formulas in non-upper-division statistics classes give you). I guess it's a slip we make these days, since with computers we actually do "create" random populations for models, though we should be saying we are taking a possible random population. | ||
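The Card A / Card B thought experiment above can be made concrete for a small room. The sketch below enumerates every possible assignment of cards, treats each as equally likely (the post's "no preference" assumption), and reports the chance of landing on a population with a given number of A-pickers. The function name and the choice of four participants are illustrative.

```python
from itertools import product
from math import comb


def population_probabilities(n_participants):
    """Chance of each possible number of A-pickers across all equally likely populations.

    Index k of the returned list holds the probability that exactly k
    participants picked Card A.
    """
    counts = [0] * (n_participants + 1)
    # Every tuple is one theoretically possible population, e.g. ("A", "B", "B", "A").
    for population in product("AB", repeat=n_participants):
        counts[population.count("A")] += 1
    total = 2 ** n_participants
    return [count / total for count in counts]


if __name__ == "__main__":
    n = 4
    for k, probability in enumerate(population_probabilities(n)):
        # The brute-force count agrees with the binomial coefficient C(n, k).
        assert probability == comb(n, k) / 2 ** n
        print(f"{k} of {n} picked A: {probability:.4f}")
```

Brute-force enumeration only works for tiny rooms, since the number of possible populations doubles with each participant. For realistic sizes the same probabilities come from the binomial formula, which is the "simplified formula" the post alludes to.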
Scarmath
United States60 Posts
Shorack wrote: I disagree on two levels: 1. You are leaving out information we have. (This reduces the power and, as a result, prevents the rejection of H0 where it would be rejected using full information.) 2. I believe you might be forgetting which random factor insignificance actually refers to here. 1. I can only keep repeating this until either you agree or you give an explanation for not doing it: finite population correction*. The exact proportions for the population are known. We can say with 100% certainty that the proportions aren't 30/30/30/10, but ab/cd/ef/gh (whatever they are, probably mentioned somewhere in the thread). 2. In case of insignificance, it means that if the population fits H0, we'd still get the result found too often if we performed the same test on a new sample from the same population. So it refers to the randomness involved in selecting subjects for the sample. But here, the subjects observed equal the whole population. So there is zero chance of having such random effects, because we're not taking only part of the population and leaving a part out. (That is how, just due to random chance, we could get a sample that is too unrepresentative of the whole population and hence has different values from the population.) And it's exactly that which you're testing for when performing something like a binomial test: you're using techniques that are meant to cope with the limitations of a sample, and you're using them on a full population.

1) 30P/30T/30Z/10R is not the expected distribution. Racial preference is not dependent on balance, so there is no reason to expect that a perfectly balanced game would have an even distribution of race preference. The actual proportions for the entire top 1000 are ~36P/35T/23Z/6R. What I am testing is whether each individual bin of 50 matches those proportions. That is, is the racial distribution independent of rank? The null hypothesis of my test is that the bin being tested matches that distribution. In the top 2 bins, in all the tests I've run, this null hypothesis has always been rejected.

2) What is happening in the population COULD be the result of random chance. Say a player has his power go out, or suffers from lag. These are not "balance" related issues, but they could result in undeserved losses. How can we judge if these factors shaped the proportions of the top tier players? We can use statistical methods to measure how unusual these proportions are. The chi-square test I used is specifically intended to measure how closely observations (that is, the actual preference of players) match an expected proportion (that is, the proportions of the entire population under consideration).

I hope this clears up some of your concerns. | ||
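The per-bin test described in the post above can be sketched without a statistics package. The code below is an approximation of the idea, not the spreadsheet itself: it uses the rounded ~36P/35T/23Z/6R proportions from the post as the expected distribution, a made-up bin of 50 players as the observation, and the closed-form chi-square tail probability for three degrees of freedom (four categories minus one).

```python
from math import erfc, exp, pi, sqrt

# Rounded top-1000 proportions quoted in the post.
POPULATION_SHARE = {"P": 0.36, "T": 0.35, "Z": 0.23, "R": 0.06}


def chi_square_stat(observed_counts):
    """Pearson chi-square statistic of one bin against POPULATION_SHARE."""
    bin_size = sum(observed_counts.values())
    stat = 0.0
    for race, share in POPULATION_SHARE.items():
        expected = share * bin_size
        stat += (observed_counts.get(race, 0) - expected) ** 2 / expected
    return stat


def p_value_df3(stat):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2)


# HYPOTHETICAL bin of 50 players, invented for illustration only.
hypothetical_bin = {"P": 18, "T": 25, "Z": 6, "R": 1}

if __name__ == "__main__":
    stat = chi_square_stat(hypothetical_bin)
    print(f"chi-square = {stat:.3f}, p = {p_value_df3(stat):.4f}")
```

The null hypothesis is that the bin's races were drawn according to the top-1000 proportions; a p-value below 0.05 would reject it. Note that the expected count for Random in a bin of 50 is only 3, so the caveat about small expected values raised earlier in the thread applies to this sketch as well.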