|
On August 17 2010 14:03 c.Deadly wrote: On August 17 2010 12:10 Milkis wrote: Taking differences of percents is not a scientific study, nor is it statistically sound.
Please don't just look at percents and try to frame it as statistics. This is far from how you would approach it. You don't even have a theory of how this actually works, nor do you consider an actual model; you just subtract differences and hope it works out.
Holy crap, what people try to pass off as statistics on the internet is absolutely appalling.
This is the truth: there's no measure of certainty or variance in these statistics, and you'd have to assume the game is actually balanced to set it all to a bell curve. What about unknown variables? What if players new to SC (and RTS games) are more attracted to Terran because of familiarity through the campaign, leading to Terrans having a much lower win% in Bronze league?
I was going to leave this alone, but the post you're quoting definitely isn't true. A Chi-square analysis is a foundational test in statistics. It has many uses for testing distributional assumptions, and it's presented here in its simplest form. It's absolutely the most appropriate test to use given the data that were available. The reason there's no variance in the statistics I used is that the Chi-square test doesn't use variance to generate its test statistic. As far as certainty goes, I have it; I just didn't report it because I thought the editorial standards of a video game website would permit me to publish a data analysis without reporting p-values, test statistics, and sample sizes. If you're curious, the Chi-squared statistics using the full 48.6 million game sample are 58.35 for Diamond (p < 0.001), 76.44 for Platinum (p < 0.001), 40.79 for Gold (p < 0.001), 5.15 for Silver (p = 0.161), and 6582.10 for Bronze (p < 0.001).
I would love to account for variables like the ones you mentioned, but I can't measure them. You can't include variables in a statistical analysis if you can't measure them. Instead, the only hypothesis that I could test was one about the distribution of win probability among the races. I demonstrated that the win probability is essentially evenly distributed. For reasons discussed earlier, this suggests that the balance state is good for a vast majority of players, although further investigation would be warranted, were this actually a real study.
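For readers who want to check the numbers above, here is a minimal Python sketch of a Pearson chi-square goodness-of-fit statistic and its upper-tail p-value, using only the standard library. The function names are hypothetical, and the degrees of freedom are an assumption: df = 3 (for example, four race categories) is what makes the reported Silver statistic of 5.15 line up with p = 0.161, but the post does not state it.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf(x, df):
    """Upper-tail p-value P(X >= x) for a chi-square variable with integer df.

    Uses the closed forms that exist for integer degrees of freedom,
    so no SciPy is needed.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)  # first correction term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


print(chi_square_statistic([10, 20, 30], [20, 20, 20]))  # 10.0
# Assumption: df = 3. Reproduces the reported Silver p-value from its statistic.
print(round(chi2_sf(5.15, 3), 3))                         # 0.161
```

With tens of millions of games, even tiny deviations produce very large statistics, which is why four of the five leagues come out at p < 0.001.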
|
On August 17 2010 07:23 Mindcrime wrote: Scientific proof that the matchmaking system is working
That is all that this is. No conclusions about balance can be drawn from looking at win%, on ladder, when the matchmaking system is specifically designed so that you win about 50% of your games. :|
Exactly. When you deal with everyday stuff a percent here or there doesn't look that big, but when you're dealing with millions of games with a system that is supposed to be evenly matching everyone up, I think the OP's results show a lot. Still, I'm not all about the numbers, even though I know Blizzard is. I put more weight on how people are doing in tournaments, since you have to adjust to your opponent and such. I almost view it as a shortcut to what the ladder will or should do in the future, although I'm not sure if that's correct or not.
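The point that ladder win rates drift toward 50% can be illustrated with a toy Elo model. This is a sketch under an explicit assumption: the ladder is modeled as a plain Elo system, which this thread does not establish about Blizzard's matchmaker, and the 55% race edge is hypothetical. If a fixed race advantage exists, ratings settle at an offset where the Elo formula already predicts that advantage, so rating-matched games come out close to 50/50.

```python
import math


def elo_expected_score(rating_diff):
    """Elo expected score for a player rated rating_diff points above the opponent."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def equilibrium_rating_offset(race_win_prob):
    """Rating gap at which a toy Elo ladder absorbs a fixed race edge.

    Assumption: two equally skilled players meet and race A wins with
    probability race_win_prob. Ratings drift until the Elo formula predicts
    that same probability, i.e. elo_expected_score(offset) == race_win_prob.
    """
    return 400.0 * math.log10(race_win_prob / (1.0 - race_win_prob))


# Hypothetical 55% race edge for equally skilled players.
offset = equilibrium_rating_offset(0.55)
print(round(offset, 1))           # 34.9
# A rating-based matchmaker pairs players with equal *ratings*, so the
# expected score of every matched game is back to 50%.
print(elo_expected_score(0.0))    # 0.5
```

In this model a 55% edge is absorbed by roughly a 35-point rating offset, and matched games still come out 50/50 even though the edge is real, which is why per-race ladder win rates alone say little about balance.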
|
Lol OP, you didn't really spend all that time analyzing an AMM system? You perfectly proved that the matchmaking is working, nothing more and nothing less. Is that concept really that hard to grasp, even for math nuts?
Wait a couple more weeks and then gather data from static competition like tournaments, non-AMM leagues, etc. That will be data you can use.
On August 17 2010 14:57 guitarizt wrote: On August 17 2010 07:23 Mindcrime wrote: Scientific proof that the matchmaking system is working
That is all that this is. No conclusions about balance can be drawn from looking at win%, on ladder, when the matchmaking system is specifically designed so that you win about 50% of your games. :| Exactly. When you deal with everyday stuff a percent here or there doesn't look that big, but when you're dealing with millions of games with a system that is supposed to be evenly matching everyone up, I think the OP's results show a lot. Still, I'm not all about the numbers, even though I know Blizzard is. I put more weight on how people are doing in tournaments, since you have to adjust to your opponent and such. I almost view it as a shortcut to what the ladder will or should do in the future, although I'm not sure if that's correct or not.
No, that is just variance in the matchmaking, which is most likely not accurate until every player has played 1000+ games (it is most likely a guessing algorithm).
|
Edit: and stop assuming things
|
Taking leagues lower than Diamond into consideration when talking about balance is bad. Why? Because when we discuss balance we want to talk about what's possible in the game, not about what inexperienced players are doing. Even in Diamond, players who are not in the top 25 of their league should not really be considered.
|
Well this was a waste of time.
User was warned for this post
|
I'm not gonna bitch about how statistical tests I don't completely understand work, so correct me if I'm wrong.
But one criticism of your methods. Hypothetically, if:
Zerg won 75% against Protoss
Protoss won 75% against Terran
Terran won 75% against Zerg
wouldn't the way you analyzed it conclude that each race is perfectly balanced?
Edit: wtf, why are people criticizing you for saying it's imba? I'm pretty sure you showed that, for all practical purposes, it is balanced (assuming your methods were right).
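The hypothetical in the post above can be checked with a few lines of Python. Assuming each non-mirror matchup is played equally often (an assumption for illustration only), the 75% rock-paper-scissors table averages out to exactly 50% per race, so a test on per-race totals alone would not flag it; the imbalance only shows up at the matchup level.

```python
# Hypothetical matchup table from the post: P(first race beats second race).
# Mirror matchups are excluded, and every non-mirror matchup is assumed to be
# played equally often (an illustrative assumption, not a ladder guarantee).
MATCHUP_WIN = {
    ("Z", "P"): 0.75,
    ("P", "T"): 0.75,
    ("T", "Z"): 0.75,
}


def win_prob(a, b):
    """P(race a beats race b), filling in the reverse matchup as 1 - p."""
    if (a, b) in MATCHUP_WIN:
        return MATCHUP_WIN[(a, b)]
    return 1.0 - MATCHUP_WIN[(b, a)]


def overall_win_rates(races=("T", "Z", "P")):
    """Average each race's win probability over its non-mirror matchups."""
    rates = {}
    for race in races:
        opponents = [other for other in races if other != race]
        rates[race] = sum(win_prob(race, o) for o in opponents) / len(opponents)
    return rates


print(overall_win_rates())  # {'T': 0.5, 'Z': 0.5, 'P': 0.5}
print(win_prob("P", "Z"))   # 0.25, the matchup-level imbalance is still there
```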
|
I really don't think these stats are enough to determine imbalance, and I don't think stats that can determine imbalance will be available for at least another few years.
With that aside, I think the win/loss ratios for every race will smooth themselves out over time. Stop worrying. =)
|
On August 17 2010 15:08 Milkis wrote:
I will admit that I missed the part about you using chi square test to compare distributions (specifically because you didn't report it and your entire analysis was based off your ridiculous chart).
Not sure why. I say (specifically report?) it in line 4 of the methods.
Run the Chi Square again, and rather than not weighting the "random" distribution you're testing against, weight that according to the distribution of players. The site you have listed as your method has that available. This is because the base player distribution is not random. I'm guessing what you did was assume they should all get the same number of wins given the number of games played at that level.
No need to put "random" in quotes, unless other "words" need to be in quotes as well. And of course I did the analysis in the way you described, as would any competent person doing a chi squared test (which I describe elsewhere in this thread). Your guess was wrong. Why would you assume I did it wrong? Do I really need to lay it all out in a post in a Starcraft forum? If you were wondering couldn't you have asked nicely?
If you used percentages for your chi squared, then there really isn't much I can say, since there are too many issues arising with that, because it's probably not even normalized properly -____-
Then don't say much, because I didn't use percentages. I used the raw data for 58 million games. I also describe simulating the data for fewer games elsewhere in the post.
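The weighting question above comes down to how the expected counts are built. One common way, sketched here under simplifying assumptions and not necessarily how the OP set up the analysis, is a race-by-outcome contingency table tested for independence: each cell's expected count is row total × column total / grand total, so a race that plays more games is expected to have proportionally more wins. The counts below are made up, and treating each race-game as an independent observation ignores that every 1v1 produces exactly one win and one loss.

```python
def chi_square_independence(table):
    """Pearson chi-square test of independence for a table of counts.

    Expected counts are row_total * column_total / grand_total, so each
    race's expected wins and losses scale with how many games it played.
    Returns (statistic, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Made-up counts: rows are races, columns are [wins, losses].
print(chi_square_independence([[30, 20], [20, 30]]))  # (4.0, 1)
# Unequal numbers of games but identical win rates give a statistic of 0.
print(chi_square_independence([[50, 50], [100, 100], [150, 150], [200, 200]]))  # (0.0, 3)
```

The resulting statistic can then be compared to a chi-square distribution with the returned degrees of freedom.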
Also, never do a power test after you run a statistical test again, because that is an absolutely and utterly meaningless number. You run power tests only when you're designing the test, not after you run the tests. You seem to have used power to see how many games you need to detect imbalances... well... what kind of imbalances? 0.1% differences? Or the differences you ran and found on the chi squared? If it's the latter, it's an utterly worthless figure. The former? Then you need to be arguing about what causes that instead of just pointing at some numbers and saying "oh look it's imbalanced", which is what you did and what caused most of the anger in my post. Just posting numbers and saying "look at what the numbers say" doesn't mean jack if you don't have a theory you're actually testing.
Don't give me a lecture on how to run statistical analyses. Almost everything you've said has either been patently false or based on false assumptions about my intelligence. The only reason I did the power test was because I thought it would be fun to see how many games you would have to sample before you could tell a difference in the imbalance. I thought it was neat that it took about a million games before the sample size was large enough to detect differences. I guess I should've shot myself instead, for being stupid enough to do a post hoc power test. Let me remind you that you know NOTHING about how much statistical knowledge I have and that it may not be safe to assume that you know more than I do.
Next time you run a test, decide beforehand what you're actually testing. "Okay, I think 1% is imbalanced, let's test if there's a 1% difference". All you did was literally just provide some summary statistics, since you did an entirely after-the-fact analysis rather than actually testing something.
All I did was try to make a post I thought would be interesting to some members of the community, showing that there's no obvious statistical reason to suspect that the races are imbalanced, based on a limited data set that I discovered last night. It's not like this is my dissertation project. Now I'm pretty much through with this thread. Most people have been civil and I've tried to appreciate the discussion, but there's been a disturbing number of really hostile posters with openly bad attitudes. Is this forum always like this?
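Milkis's suggestion to fix the effect size before testing is an a priori power analysis. A minimal sketch, assuming a two-sided one-sample proportion z-test with the normal approximation (a choice of test not named in the thread) and the conventional alpha = 0.05 and 80% power: the hypothetical function below returns roughly how many games a race would need to distinguish a given win rate from 50%.

```python
import math
from statistics import NormalDist


def games_needed(p_alt, p_null=0.5, alpha=0.05, power=0.80):
    """Approximate number of games to detect win rate p_alt against p_null.

    Normal-approximation sample size for a two-sided one-sample proportion
    z-test. The effect size (p_alt - p_null) is chosen before any data are
    collected, which is what makes this an a priori power analysis.
    """
    if p_alt == p_null:
        raise ValueError("p_alt must differ from p_null")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, about 1.96
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * math.sqrt(p_null * (1 - p_null))
                 + z_beta * math.sqrt(p_alt * (1 - p_alt)))
    return math.ceil((numerator / (p_alt - p_null)) ** 2)


# Hypothetical target fixed in advance: a 1 percentage point edge over 50%.
print(games_needed(0.51))
```

With these defaults, a 1 percentage point edge (51% vs. 50%) needs on the order of 20,000 games, and the required sample grows roughly with the inverse square of the edge, so a 0.1 point edge needs about a hundred times more.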
|
On August 17 2010 07:38 dcberkeley wrote: Scientific != science
No, but scientific refers to a specific methodology that is not followed in the OP, which seems more aimed at being a statistical study (although many have already pointed out that it still doesn't really use statistics but rather a layman's look at numbers).
|
Where's baller when we need charts?
|
On August 17 2010 16:59 LlamaNamedOsama wrote: On August 17 2010 07:38 dcberkeley wrote: On August 17 2010 07:35 neobowman wrote: Isn't this math and not science? Scientific != science No, but scientific refers to a specific methodology that is not followed in the OP, which seems more aimed at being a statistical study (although many have already pointed out that it still doesn't really use statistics but rather a layman's look at numbers).
Science is a process by which you formulate a hypothesis based on a theory, a prior hypothesis, or an observation, and then devise a method to objectively test that hypothesis.
I heard a hypothesis that the races were imbalanced, observed a dataset that suggested that win rates were race-independent, and formulated a hypothesis that win rates were race-independent. I then found support for my hypothesis by analyzing a data set of 58 million cases using a time-honored statistical technique. Where does that deviate from the definition of science?
Also, I'm not a layman, and despite what you may have heard, a Chi-square analysis is in fact a statistical technique. In fact, it's possibly the most widely used technique in scientific literature, particularly in advanced statistical modeling where it's used for model verification.
|
Sorry folks,
as a statistical consultant (yeah, kill me please), I find this statistical discussion rather nonsensical (at least the OP and many comments on the first pages).
It doesn't even compare race-X vs. race-Y W/L distributions. It's like saying my overall body temperature is fine while my head is roasting in an oven and my feet are in cold water... Can't discuss it any longer; it is raining now at my office, I need to go buy some sunglasses to get the sunshine back.
Hint: it would have been more interesting to dig into random players' games, even if the conclusion may prove nothing in the end.
If you want an indisputably balanced game, Go is for you (or chess, but nowadays even PC software beats masters). Of course, it is black & white 2D... no flashy battles :/
|
I think you didn't mention the simple fact that Battle.net actively messes up the win rate by trying to make everyone 50-50, i.e., Battle.net is manipulating (not a bad thing) the win rates to go to 50-50. Have you considered that this could alter your conclusions and make this analysis not that valuable? If that didn't happen this would be great, but all this proves is that Battle.net efficiently matches people with others of their skill.
Since this is the case, it's better to look at tournaments. There's no win-rate manipulation there, only raw data. You should do a recent tournament race analysis if you have the free time, since you seem to have some taste for it, and that would be more valuable, I believe.
|
Balance is only relevant for the top 1% of players; any results from players below that are inconsequential.
|
tbh, when you're earning as much money as Blizzard, I'm pretty sure they hire enough mathematicians/statisticians to do GIANT SPREADSHEETS for balancing and tuning of their different mechanics.
For god's sake, they have GIANT SPREADSHEETS for World of Warcraft, a game which I'd say has a HUGE SET OF COMPLICATIONS/BALANCING ISSUES.
|
There are imbalances in every game made. Some games have bigger imbalances than others, but that doesn't make the game completely terrible. They balanced this enough to resemble maybe WoW but kept the dumb above their parents blockbuster MW2. The idea was to get as many people as possible to buy it. It's still the #1 seller, 3 weeks in a row. I don't see the point in another one of these list statistics threads to show the matchmaking system is working as intended.
|
On August 17 2010 15:08 Milkis wrote: On August 17 2010 14:21 GagnarTheUnruly wrote: I was going to leave this alone, but the post you're quoting definitely isn't true. A Chi-square analysis is a foundational test in statistics. It has many uses for testing distributional assumptions, and it's presented here in its simplest form. It's absolutely the most appropriate test to use given the data that were available. The reason there's no variance in the statistics I used is that the Chi-square test doesn't use variance to generate its test statistic. As far as certainty goes, I have it; I just didn't report it because I thought the editorial standards of a video game website would permit me to publish a data analysis without reporting p-values, test statistics, and sample sizes. If you're curious, the Chi-squared statistics using the full 48.6 million game sample are 58.35 for Diamond (p < 0.001), 76.44 for Platinum (p < 0.001), 40.79 for Gold (p < 0.001), 5.15 for Silver (p = 0.161), and 6582.10 for Bronze (p < 0.001).
I will admit that I missed the part about you using chi square test to compare distributions (specifically because you didn't report it and your entire analysis was based off your ridiculous chart). However, there are still too many assumptions going in there, not even going into the matchmaking issue. Run the Chi Square again, and rather than not weighting the "random" distribution you're testing against, weight that according to the distribution of players. The site you have listed as your method has that available. This is because the base player distribution is not random. I'm guessing what you did was assume they should all get the same number of wins given the number of games played at that level. If you used percentages for your chi squared, then there really isn't much I can say, since there are too many issues arising with that, because it's probably not even normalized properly -____- Also, never do a power test after you run a statistical test again, because that is an absolutely and utterly meaningless number. You run power tests only when you're designing the test, not after you run the tests. You seem to have used power to see how many games you need to detect imbalances... well... what kind of imbalances? 0.1% differences? Or the differences you ran and found on the chi squared? If it's the latter, it's an utterly worthless figure. The former? Then you need to be arguing about what causes that instead of just pointing at some numbers and saying "oh look it's imbalanced", which is what you did and what caused most of the anger in my post. Just posting numbers and saying "look at what the numbers say" doesn't mean jack if you don't have a theory you're actually testing. Next time you run a test, decide beforehand what you're actually testing. "Okay, I think 1% is imbalanced, let's test if there's a 1% difference".
All you did was literally just provide some summary statistics, since you did an entirely after-the-fact analysis rather than actually testing something.
pretty reprehensible post. i applaud the OP for being so levelheaded in dealing with responses of this kind
no, the original analysis isn't flawless. but i don't think that makes it meaningless. take from it what you will
|
Everything in this statistic is wrong because these stats are not enough for such an analysis... The fake balance comes from there being fewer Zerg players, and from the fact that there are a lot fewer Zergs in Diamond than Terrans. So if only 20 out of 100 players in a division are Zerg, becoming a Diamond Zerg takes a lot more skill than becoming a Diamond Terran. So in Diamond there are very skilled Terrans and not-so-skilled Terran players, while Diamond Zergs should be a lot better than some of the Terrans. So the ladder system will advance Zergs to Diamond at a higher skill level than Terrans. If you are an average-skill player and abusing Terran gets you to Diamond, where you face a lot more skilled Zerg players, you bring down the whole Diamond Terran win%. So from this statistic you see balance, but in fact it comes from skill, not from in-game balance. How many good players are in Diamond, and how many casual ones?
User was warned for this post
|
On August 17 2010 17:59 stochastic wrote: i applaud the OP for being so levelheaded in dealing with responses of this kind
no, the original analysis isn't flawless. but i don't think that makes it meaningless. take from it what you will
Thanks, I'm trying, but I did lose my head a little bit. I've edited the OP in a way that hopefully will cause people to react less aggressively towards it and take it in the spirit in which it was originally intended.
I also don't want to spend much space in the post discussing the stats, because they're a little wonky and I had to fudge the numbers slightly due to the nature of the data set that I had access to. Choosing what data to use and how to set up the analysis was a little tricky given the nature of the data. Nothing I did should impact the overall findings, however. If people are genuinely curious, then I can elaborate in the reply thread.
|