tl;dr: In double-elimination brackets, a 1-0 advantage in the Grand Finals for the team coming from the Winner's Bracket increases the chance of the best team winning the tournament.
In Dota, double-elimination brackets are almost always used, and the grand finals are almost always Bo5. Tournaments have not agreed, however, on whether to give teams from the Winner's Bracket an advantage in Grand Finals. For example, when Alliance and Na`Vi played in the TI3 finals, the series started 0-0, but when those two teams played in Starladder Season 8, Alliance started up 1-0 because they came from the Winner's Bracket.
I'm under the impression that spectators don't like the 1-0 start, but some tournaments (D2CL and Starladder most notably) employ it nonetheless.
Being a massive nerd, I have these various brackets simulated in Excel, so I decided to run some tests to see how, in theory, a Winner's Bracket advantage affects the tournament outcome.
The best team doesn't always win a tournament. Dota is a game with a lot of variance involved, and it only takes a glance at Dota2lounge bet odds to see that. There is a 100% chance that Secret is a better team than M5, but the odds of Secret winning against M5 are not 100%. Nor is the chance of Secret winning a tournament against 7 other scrub teams 100%.
I think that an implicit goal of tournament organizers is to create a format where the best team has a good chance to win. Spectators generally want this. An uproar would surely result if a tournament advanced the second place team to bracket, rather than the first place team, or made the Grand Finals a Bo1. A caveat: spectators want to see good teams earn the win, which is probably why 1-0 advantages leave a bad taste in their mouths.
So if tournament organizers want to create a tournament format where the best team wins most often, spectators be damned, they should create a simulated bracket with teams assigned Elo values (representing "true" skill), run the simulation 10,000 times, and see how many times the best team won with (1) a 1-0 advantage in Grand Finals and (2) no advantage! Or let me do it.
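For what it's worth, the core of such a simulation is small. Here is a minimal Python sketch under my own assumptions (the standard logistic Elo curve, a toy 4-team bracket, Bo3 everywhere except a Bo5 GF); this is a sketch of the idea, not motbob's actual Excel workbook:

```python
import random

def p_win(elo_a, elo_b):
    """Standard logistic Elo expectation: P(team A beats team B in one game)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))

def best_of(n, elo_a, elo_b):
    """Simulate a best-of-n series; True if team A takes it."""
    need = n // 2 + 1
    wins_a = wins_b = 0
    while wins_a < need and wins_b < need:
        if random.random() < p_win(elo_a, elo_b):
            wins_a += 1
        else:
            wins_b += 1
    return wins_a == need

def grand_finals(elo_w, elo_l, advantage):
    """Bo5 grand finals; the WB team starts up 1-0 when advantage=True."""
    wins_w, wins_l = (1, 0) if advantage else (0, 0)
    while wins_w < 3 and wins_l < 3:
        if random.random() < p_win(elo_w, elo_l):
            wins_w += 1
        else:
            wins_l += 1
    return wins_w == 3

def run_tournament(elos, advantage):
    """Toy 4-team double elimination; returns the index of the champion."""
    a, b, c, d = 0, 1, 2, 3
    w1 = a if best_of(3, elos[a], elos[d]) else d        # WB semi 1
    w2 = b if best_of(3, elos[b], elos[c]) else c        # WB semi 2
    l1 = d if w1 == a else a                             # semi losers drop to LB
    l2 = c if w2 == b else b
    wb = w1 if best_of(3, elos[w1], elos[w2]) else w2    # WB final
    wb_loser = w2 if wb == w1 else w1
    lb1 = l1 if best_of(3, elos[l1], elos[l2]) else l2   # LB round 1
    lb = lb1 if best_of(3, elos[lb1], elos[wb_loser]) else wb_loser  # LB final
    return wb if grand_finals(elos[wb], elos[lb], advantage) else lb

if __name__ == "__main__":
    elos = [1500, 1480, 1300, 1300]   # two good teams and two scrubs
    runs = 20_000
    for adv in (False, True):
        wins = sum(run_tournament(elos, adv) == 0 for _ in range(runs))
        print(f"1-0 advantage={adv}: best team wins {wins / runs:.1%}")
```

The bracket wiring here is the smallest double elim that still has a WB final, an LB run, and a GF; an 8-team version like the ones in the post just adds one more round per bracket.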
First, I simulated a bracket with two good teams and a bunch of scrubs (1500 Elo, 1480, and a bunch of 1300s). The best team won 51.6% of the time without a Grand Finals advantage, and 52.2% with a 1-0 advantage. That's a 0.6% increase. (Note that the only number we really care about is the increase.)
Second, I simulated an Elo distribution that resembled TI4, meaning that there were a few teams clustered near the top and some semi-competitive teams just afterwards. Here we saw an increase of 1.7% in the best team's win chance from no advantage to 1-0 advantage.
Third, I simulated a very steady drop in Elo (1500, 1490, 1480, 1470...). With this distribution, the best team saw a 1.4% chance increase in winning.
To clarify: one thing to note about the above simulations is that I'm simulating the whole tournament, not the grand finals. In some runs of the simulation where the best team ended up winning, the team lost in Winner's and won GF coming from Loser's. In other runs, the team won Winner's and then won GF.
So with these different distributions of Elo, creating a 1-0 advantage increased the chance of the best team winning the whole tournament. I can't say for sure that that would be true for any combination of teams, but I think that's what these results imply. If y'all want me to test unusual Elo distributions or weird tournament formats (e.g. Bo5 WF instead of Bo3), ask in the comments.
The conclusion I derive from these results is this: if tournament organizers are concerned solely with creating a format where the best team wins, they should have GF with a 1-0 advantage. But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).
My thinking for starting 0-0 is:
- Winner team deserves it
- Loser team deserves it anyway, because they fell down yet showed great psychological strength and managed to reach the finals anyway.

As you said, from a spectator point of view, starting 1-0 is meh :/
IMHO all e-sports should do what FGC already does and give the player coming from the winners a full match advantage. Although, scheduling and time issues would be a big problem. So I guess stick to 1 game advantage. I think allowing 1 team to drop a series and another not is unfair.
On March 14 2015 19:52 SoSexy wrote: My thinking for starting 0-0 is:
- Winner team deserves it
- Loser team deserves it anyway, because they fell down yet showed great psychological strength and managed to reach the finals anyway.

As you said, from a spectator point of view, starting 1-0 is meh :/
The winning team does not deserve zero advantage. The losing team doesn't deserve it because they already got their second chance.
The whole point is that you have a DOUBLE elimination bracket in these tournaments... right up to the final game where you suddenly decide that it's single elimination. That means that all the hard work done by one team to not lose a single series is for nothing, as basically everything resets. The winner team should have an advantage because they've earned it by not losing.
The "other" way is to have two BoXs, where the losing team has to win both, the winning team only has to win one. That's the true double elimination right up to the end of the competition. What's so hard about just using that method? What impact does that also have on your calculations for differences, since that's the REAL way to complete a double elim tournament?
I'm more curious about how often the team from the Upper (Winner's) Bracket won the GF with a 1-0 lead compared to 0-0 - to me that's more important. (Why reward the team that's slightly better "on paper" than a team that has possibly already beaten them?)
If it's just about "setting up for the best team to win" isn't a seeded single elimination bracket best?
The "other" way is to have two BoXs, where the losing team has to win both, the winning team only has to win one. That's the true double elimination right up to the end of the competition. What's so hard about just using that method? What impact does that also have on your calculations for differences, since that's the REAL way to complete a double elim tournament?
This used to be done in some foreign BW tournaments. While I agree that this method would be the most fair to the team coming from the WB finals, it has a few glaring faults, which is generally why tournaments opt not to employ it and instead recompense the team with a 1-0 advantage. First of all, it takes a long, long time, potentially forcing the teams to play 8 games (assuming a Bo3 and a Bo5), which for Dota would mean a grand finals that could easily span the better part of 11-12 hours. To be honest, now that I think about it, knowing Dota tournaments it would probably take 2 days at least. Taking that into consideration, one can extrapolate that it would probably also have a negative impact on the event's viewership/ad-revenue-to-cost ratio.
While it's desirable from a purely competitive standpoint, the logistical problems it'd pose to play that many additional games usually just make it so that tournament organisers shy away from it.
motbob can you also run simulations with single elimination? It would be interesting to see how the two double-elimination formats above compare to single elimination in odds of the best team winning the tournament.
Kupon and I had a nice discussion on LiquidDota about these simulations. He pointed out that, if teams have very different adaptation capabilities during a tournament, my definition of "best team" becomes questionable. Is the best team the team which started out with the best value, or the team that adapted to the "tourney meta" (especially important at TI/DAC) and performed the best at the end?
Kupon recommended that I change the simulation to reflect this possibility. It turns out that with a dramatic adaptation variable (teams have a 50% chance of being either "good" or "bad" adapters, gaining a constant 20 or 5 points per round, respectively, over 5 rounds), a 1-0 advantage system does hurt the best team's chance of winning if the best team is defined as the team with the highest initial Elo and also the 20-point adaptation. A lower adaptation variable (2.5/1) resulted in the "best team," similarly defined, benefiting from the 1-0 advantage.
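To make the adaptation tweak concrete, here's how I'd sketch it (the 20/5 points-per-round values and the 50% split come from the post above; the helper name and team list are my own):

```python
import random

def adapted_elos(base_elos, rounds_played, gains):
    """Effective Elo after some rounds: base Elo plus per-round adaptation gain.
    gains[i] is 20 for a "good" adapter and 5 for a "bad" one."""
    return [e + g * rounds_played for e, g in zip(base_elos, gains)]

# each team independently has a 50% chance of being a good or bad adapter
base = [1500, 1480, 1300, 1300]
gains = [random.choice((20, 5)) for _ in base]

# after 5 rounds, a good adapter has gained 100 Elo, a bad one only 25
print(adapted_elos(base, 5, gains))
```

Feeding these adjusted Elos into each round's win-probability calculation (instead of the fixed starting Elos) is all the change amounts to.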
On March 14 2015 23:19 micronesia wrote: motbob can you also run simulations with single elimination? It would be interesting to see how the two double-elimination formats above compare to single elimination in odds of the best team winning the tournament.
With 8 teams spaced 20 Elo apart, it's a 2-3% difference between single and double elim.
Double elimination has quite a few problems and this is one of them, almost all real sports use a combination of round robin and single elimination and the only exception I can think of is college baseball.
There's also the problem of the 4-player group where player A beat player B, player B went 1-1 with player C but held a winning record in matches over him, player C beat player D, and player A beat player D.
A > B > C > D and A and C advance.
Also, in large brackets the player coming from the loser's bracket can end up playing twice as many games as the winner's bracket player, which creates a huge disparity in player fatigue.
On March 14 2015 23:44 motbob wrote: Kupon and I had a nice discussion on LiquidDota about these simulations. He pointed out that, if teams have very different adaptation capabilities during a tournament, my definition of "best team" becomes questionable. Is the best team the team which started out with the best value, or the team that adapted to the "tourney meta" (especially important at TI/DAC) and performed the best at the end?
Kupon recommended that I change the simulation to reflect this possibility. It turns out that with a dramatic adaptation variable (teams have a 50% chance of being either "good" or "bad" adapters, gaining a constant 20 or 5 points per round, respectively, over 5 rounds), a 1-0 advantage system does hurt the best team's chance of winning if the best team is defined as the team with the highest initial Elo and also the 20-point adaptation. A lower adaptation variable (2.5/1) resulted in the "best team," similarly defined, benefiting from the 1-0 advantage.
How about if the best team is defined as the one with the highest ELO after?
Yeah, I also don't understand why double elim brackets end with a Bo5 instead of two Bo3s. It only changes scheduling in the worst case, and gives consistency across the entire bracket.
It changes the schedule from 3 to 5 games to 2 to 6 games. That's a lot. Plus people don't like it for the same reason they dislike the 1 game advantage.
I'm not entirely sure I understood your point; there are a few things I really can't grasp. Anyhow, I don't mean to sound smug here - my statistics knowledge is more than just limited, and I'm not that great when it comes to mathematics.
First off, I don't really get the question behind it. Imo it doesn't matter what kind of mode you use for a tournament, the assumption that there is a "best" team will tell you that this best team will win more often than any other team, as long as the circumstances are even for all teams. That's like trivial. It should also be somewhat obvious that longer distances, in theory, support the better team.
Now you take Elo as a measurement of skill, which in itself sounds kind of overcomplicated. Why not just assign values from 0 (worst team in the tour) to 1 (best team)? Basically, that's the idea, no? Might be my mathematics being strange. However, related to that point, I don't think the changes in the outcome of what you tried to calculate have any meaning to them. The distances in skill are arbitrary. I'm not even sure anyone could tell you what a difference of 10 points on the Elo scale would mean - for your tournament, for the entire player/team base, or anything. You can only loosely relate gaps in such a ranking. That being said, a change in the outcome of win% per mode in the range of 0.x-2% seems... I don't know. Not much? Especially without a t-test behind it.
Let's put aside the fact that Elo's chess assumptions aren't reliable at all in a sample size as small as a one-off tournament, in games that are not chess, and go with this: you note an increase of 0.6%.
That doesn't sound statistically significant, even with your highest stated increase. You do no testing to show whether it is. We have you rejecting the null hypothesis here without actually giving a good reason why.
On March 15 2015 07:14 itsjustatank wrote: ...you note an increase of .6% That doesn't sound statistically significant, even with your highest stated increase.
This was exactly my thought as I finished reading the OP. However, I strongly believe that this topic warrants further testing and discussion, because there is obviously dissension about whether the 1-0 advantage is necessary. The real question is: "What is the real motivation behind the 1-0 advantage? Is it really to help the better team win or, as the OP suggested, is it actually beneficial because of the way the brackets and numbers work out?" Hopefully motbob can hammer away and help us plebs figure out what's what.
I don't think there's any dissension here. If you read anything in the post, you should have read the conclusion: if I were a tournament organizer, I would stick with no advantage.
Your generated Elo predictions based on arbitrary distribution choices resulted in differences that do not seem statistically significant. You do no test to prove that they are statistically significant, you just give the differences in observed percentages.
Null hypothesis: there is no difference in outcome between starting a double-elimination finals 1-0 versus 0-0.
Alternate hypothesis: there is a difference in outcome between starting a double-elimination finals 1-0 versus 0-0.
You have not proven whether or not what you got is noise and whether or not there really is a difference between a 1-0 start and a 0-0 start. You just want one of the two, clearly, and think this is enough to want to make a change.
Your argument is completely non-falsifiable right now. Sure, it may work for the internet, but unless you do that extra work you are pissing in the wind with a cloak of statistics making your advocacy look smart to people who do not know what they are reading.
I still don't get it or why you needed math to make a point.
Like, you start out with something like this:
- You have a team which is better than any other team participating.
- This team therefore wins with a higher likelihood against any other team.
- If the gap between the "true skill" values is not that large, the distances/modes a tournament uses become important.

That's somewhere in the blog already, as far as I understood. What's left out is:
- The longer the distances are (Bo3 vs. Bo9 etc.), the more certain it is that the better team will win within one tournament.
- If every team plays exactly the same modes, the better team, under the assumption that skill won't ever change, will win more tournaments if you look at enough samples.
Now something happens in your trail of thought. E.g. you want to ensure the best team wins, for whatever reasons possible. You entirely miss however, that as long as you don't drastically introduce one sided changes, any mode will support the best team already.
Like, it should be kind of obvious with a 1-0 advantage that:
- the best team will advance through the WB to the Grand Finals more often, and therefore more often starts with a 1-0 lead
- even in cases where they need their second chance via the LB route to the Grand Finals, the better team has a somewhat larger chance to win despite a 0-1 disadvantage
Granted, it'd probably be interesting, from a very theorycrafting point of view, how much influence this 1-0 has. However, you will never know, even if you test your results (the differences you list). You already explained why:
- You cannot possibly measure skill.
- No indicator of skill tells you how much better a team is, even indirectly via Elo. There's always a large margin of error involved; those estimators operate with it. Hence, statements like "twice as good" are just your very subjective view on the matter.

Hence, it's not really surprising that your results mostly tell you that the better team wins more often. That's all I could learn from what you wrote.
Disregard all that; it probably comes down to other points. People have already pointed out that a DE format is designed to give a second chance. Therefore the only logical follow-up is to set up the Grand Finals as 0-0 and a double Best of X. If the LB team wins, they have to endure a second Grand Finals Best of X - because the WB team never got a second chance.
Since this takes much time - as pointed out - the 1-0 lead is in place, depending on the game. Setting it entirely to 0-0 is - tournament design wise - just silly.
Btw, if you're interested in the topic itself, try to google interviews with Barry Hearn about the PTC Snooker series. He changed tons of professional billiards tournaments to shorter distances (iirc Bo9-Bo17 down to Bo7 only). He tries to explain why - without any math - and just summarizes it as: "it's the only way to get all games done in a short time frame".
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."
It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.
You're not taking into account the massive psychological deficit of being down 0-2 in a Bo5 compared to being down 0-1. Turning around a 0-2 is almost impossible, while coming back from 0-1 is very possible.
On March 15 2015 08:57 itsjustatank wrote: Your generated Elo predictions based on arbitrary distribution choices resulted in differences that do not seem statistically significant. You do no test to prove that they are statistically significant, you just give the differences in observed percentages.
Null hypothesis: there is no difference in outcome between starting a double-elimination finals 1-0 versus 0-0.
Alternate hypothesis: there is a difference in outcome between starting a double-elimination finals 1-0 versus 0-0.
You have not proven whether or not what you got is noise and whether or not there really is a difference between a 1-0 start and a 0-0 start. You just want one of the two, clearly, and think this is enough to want to make a change.
Your argument is completely non-falsifiable right now. Sure, it may work for the internet, but unless you do that extra work you are pissing in the wind with a cloak of statistics making your advocacy look smart to people who do not know what they are reading.
What is the point of worrying about null/alternate hypotheses, usually? The normal case is this: we sat on the side of the curb all day and observed 200 people passing by. 120 of those people were male. Assuming (liberally) that this has been a completely typical day in terms of the composition of people passing by, can we take our 120/200 number and say that people who walk past the curb are more likely to be male than not? Or was what we saw dictated by random chance? We have to use statistical tests to get a P-value and thereby answer that question and see if we can reject the null.
In Excel, those considerations don't really make any sense because we can just increase the sample size to some absurd number. Imagine I simulate my exercise: I generate a random number and create a cell that returns 1 (for male) 51% of the time and 0 (female) 49% of the time. I then run the test 200 times. The test gives me 54.5%; a test with a 1000 "sample size" gave 52.8%; 10000, 51.5%; 50000, 51.036%. As the sample size gets larger and larger, the value observed converges to the "true" value of 51%.
So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.
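The convergence exercise described above is easy to reproduce outside Excel (the seed and run counts are mine; 0.51 is the hypothetical "true" proportion from the post):

```python
import random

random.seed(42)   # fixed seed so the run is reproducible
TRUE_P = 0.51     # the hypothetical "true" proportion being estimated

for n in (200, 1_000, 10_000, 1_000_000):
    # estimate TRUE_P by simulating n Bernoulli trials and taking the mean
    observed = sum(random.random() < TRUE_P for _ in range(n)) / n
    print(f"n = {n:>9,}: observed {observed:.3%}")
```

The estimate does drift toward 51% as n grows, though only at a 1/sqrt(n) rate, so how many runs count as "enough" depends on the size of the effect being measured.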
I get the impression that a Bo5 finals between two teams should be about how good they are against one another in that Bo5, and NOT about how good they are when one team has a one-game advantage for playing better during the earlier stages of the tourney.
On March 15 2015 09:27 motbob wrote: So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.
You do realize you're not flipping a simulated coin, but you're using estimators with assumptions, right?
From my perspective a tournament is just a series of specifically weighted coin flips.
Yeah, but you use ELO to determine the skill, which uses rather strong assumptions, which makes stuff complicated. It's not really a fair coin toss or a fair dice throw that way. At least from my point of view. But w/e it's getting late.
you are computing and comparing multiple conditional probabilities based on arbitrary Elo distributions. there are a number of problems with this:
you don't just have an Elo arbitrarily, you maintain one through long-term play within a given population of players playing games that are similar to each other. Elo is not an absolute determination, it is an inference based on prior performance. your probability to win and lose and draw is dependent on that prior performance, and the make-up of the population.
Elo is supposed to be distributed normally because that is the fundamental assumption of player skill in that ratings system. this is compounded by the fact that you do not say how many teams are in the simulations, whether they are a sample from a population or whether they are the population. you also never say how many games they play in each stage. you just say they have a given distribution
the real world does not have infinite sample size or pre-arranged and cherrypicked Elo distributions. in the real world skill also isn't accurately determined by Elo. it is a best-guess estimator and it is pretty shitty in all implementations in ESPORTS right now.
im also fairly certain that you cannot draw in dota, and you cannot draw in most games other than starcraft and fighting games.
given this, we are not denying that there is an observed difference between the two. we are talking about whether that observation is significant. this is very important in the grand scheme of things.
at the point where you even admit this in your OP, there isn't much else to say.
On March 14 2015 17:17 motbob wrote:But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).
at best you win that in your perfect little infinite computing boxes of imaginary players, it is perhaps a tiny bit better to have a 1-0 start in the finals of a double-elimination tournament for the winners bracket player.
if it were significant though, then you would be doing more than just cloaking uncertainties with claims of certainties. you'd have a solid basis to go to every tournament designer and have them unfuck their systems. as it is, you don't.
On March 15 2015 09:13 Cheren wrote: Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."
It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.
In the absence of perfect seeding, double elim has obvious advantages if you care about more teams than just the winner. People sometimes talk about the "real finals" in tournaments like the GSL; sometimes the two best players land on one side of the bracket. If that's a problem, double elim fixes it.
On March 15 2015 09:27 motbob wrote: So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.
Umm, yeah, you kinda have to do some kind of statistical test, or at least convince us in some way that your numbers are accurate enough that we feel confident the differences you quote are more than random noise. We can never get the true value by simulation (infinite-accuracy computer simulations with infinite computing time have some practical issues, unfortunately - especially in Excel), but we can often get close enough with enough computing time. It is incredibly important that you make sure you actually are putting in enough computing time to get sufficiently accurate numbers out. Did you?
For example, take your first example of 51.6% vs 52.2% from 10k runs. This is close enough to flipping a coin, which has an error of around 1/sqrt(N) - for 10k runs, about 1% relative uncertainty, which is about the size of the difference you are seeing. So I think I need some convincing that the differences you are quoting are more than just numerical noise. Let me know if you need help.
Nonetheless, the idea of the simulation is great! I love the approach.
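The back-of-envelope error bar above can be written out explicitly. A quick sketch (the 51.6%/52.2% figures are from the OP; everything else is just the standard binomial error formula):

```python
import math

n = 10_000              # simulation runs per condition
p1, p2 = 0.516, 0.522   # best-team win rate without / with the 1-0 advantage

# standard error of each estimated proportion: sqrt(p * (1 - p) / n)
se1 = math.sqrt(p1 * (1 - p1) / n)
se2 = math.sqrt(p2 * (1 - p2) / n)

# standard error of the difference of two independent proportions
se_diff = math.sqrt(se1**2 + se2**2)

print(f"SE per estimate ~ {se1:.3%}, SE of difference ~ {se_diff:.3%}")
print(f"observed difference {p2 - p1:.3%} is {(p2 - p1) / se_diff:.1f} sigma")
```

At 10k runs per condition the standard error of each estimate is about 0.5% and of their difference about 0.7%, so the observed 0.6% gap sits under one standard error - which is the point being made here: either report error bars or raise N until the bar is much smaller than the effect.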
On March 14 2015 17:17 motbob wrote: The conclusion I derive from these results is this: if tournament organizers are concerned solely with creating a format where the best team wins, they should have GF with a 1-0 advantage. But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).
Nope, team from winners' side should need to win one bo3, team from losers' side two. It's not called "double elimination" for nothing.
On March 15 2015 09:27 motbob wrote: ...We can get the true value, or approach it, just by cranking up the number of simulation runs.
I actually made exactly the same remark on the LiquidDota version of this blog. Errors and standard deviations are important, regardless of how many toys you run, at the very least so we can see how significant the result is.
I'd also be interested in seeing the correlation between, say, the Elo difference between the top two teams and the top team's win rate. You'd definitely expect some correlation, but if it's too strongly correlated (or the reverse, I guess), then I'd say there's a bias there that you'd have to take into account when dealing with the significance of the results. Or do some reweighting in your Monte Carlo. I mean, maybe it's a small thing, but it'd be nice to see.
Edit: my knowledge of statistics comes from particle physics, where we do some weird stuff that isn't necessarily rigorously mathematically correct. And our Monte Carlo samples are often >500k events, and we still worry about statistical uncertainties (not to mention systematics, which might come into play here as part of your Elo definitions). Still want to see the errors, though.
On March 15 2015 09:13 Cheren wrote: Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."
It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.
I'm sorry, I actually completely agree that double elimination shouldn't be used for serious competition. But when I started reading about Instant Runoff, it immediately struck me as a pretty sweet voting system. Why does it suck?
Edit: my knowledge of statistics comes from particle physics, where we do some weird stuff that isn't necessarily, rigorously mathematically correct. And our monte carlo samples are often >500k events, and we still worry about statistical uncertainties (not to mention systematics, which might come into play here as part of your ELO definitions). Still want to see the errors, though
Ahaha, I'm an (ex) particle physicist myself. :D I wrote a minimum bias event generator; QCD phenomenology, essentially.
Good to see the particle physics kind of thinking around. What exactly are you doing? (Or did do?) Your location is Switzerland, so I guess LHC?
I'm sorry, I actually completely agree that double elimination shouldn't be used for serious competition. But when I started reading about Instant Runoff, it immediately struck me as a pretty sweet voting system. Why does it suck?
IRV does not pick the Condorcet winner. Here's an example from Wikipedia:
IRV uses a process of elimination to assign each voter's ballot to their first choice among a dwindling list of remaining candidates until one candidate receives an outright majority of ballots. It does not comply with the Condorcet criterion. Consider, for example, the following vote count of preferences with three candidates {A,B,C}:
35: A > B > C
34: C > B > A
31: B > C > A
In this case, B is preferred to A by 65 votes to 35, and B is preferred to C by 66 to 34, hence B is strongly preferred to both A and C. B must then win according to the Condorcet criterion. Using the rules of IRV, B is ranked first by the fewest voters and is eliminated, and then C wins with the transferred votes from B.
In cases where there is a Condorcet Winner, and where IRV does not choose it, a majority would by definition prefer the Condorcet Winner to the IRV winner.
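The Wikipedia example above is small enough to verify mechanically. Here's a sketch in Python (both helper functions are my own, written for this three-candidate case, not a production-grade election counter):

```python
from collections import Counter

# Ballot counts from the Wikipedia example quoted above
ballots = {("A", "B", "C"): 35, ("C", "B", "A"): 34, ("B", "C", "A"): 31}

def irv_winner(ballots):
    """Eliminate the candidate with the fewest first-choice votes until a majority."""
    remaining = {c for prefs in ballots for c in prefs}
    while True:
        tally = Counter()
        for prefs, n in ballots.items():
            top = next(c for c in prefs if c in remaining)
            tally[top] += n
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader
        remaining.discard(min(tally, key=tally.get))

def condorcet_winner(ballots):
    """Return the candidate who beats every other head-to-head, if one exists."""
    cands = {c for prefs in ballots for c in prefs}
    total = sum(ballots.values())
    for c in cands:
        if all(
            sum(n for prefs, n in ballots.items()
                if prefs.index(c) < prefs.index(d)) * 2 > total
            for d in cands if d != c
        ):
            return c
    return None

print(irv_winner(ballots))        # C: B is eliminated first, its votes transfer to C
print(condorcet_winner(ballots))  # B: beats A 65-35 and C 66-34
```

So IRV elects C while the Condorcet winner is B, exactly as described in the quoted passage.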
Just an additional remark about the statistics in the final set for a double elimination tournament:
Assume we are running a double elimination bracket where all sets are best-of-threes. In the final, the winner of the winners' bracket and the winner of the losers' bracket meet. As pointed out earlier, the consistent choice of format would be a Bo3 and, if the participant from the winners' bracket loses it, another Bo3. A more common choice is a Bo5 with a 1:0 advantage for the participant from the winners' bracket.
Assume further that between the two competitors the chance of one of them winning each game is constant (e.g. team A has a 60% chance of winning against B in every game); then we can calculate the probabilities for the total sets.
The following graph shows the chance of winning the whole set for the team from the winners bracket dependent on their chance of winning the individual matches against the team from the losers bracket. The different curves show a standard BO3 and BO5, as well as the double elimination BO3 and the BO5 with winners bracket advantage.
The first observation is that the Bo5-with-1:0-advantage curve is similar to the double elimination Bo3 curve, which makes it a viable choice for the final set in terms of consistency. The second observation is the huge advantage of the team from the winners' bracket: even with only a 40% chance of winning each individual match against the team from the losers' bracket, its overall chance of winning is still >50%.
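The series probabilities behind those curves can be reproduced exactly. Here's a sketch in Python (the function names are mine); it confirms the claim above that the winners' bracket team is still favored overall with only a 40% per-game win chance:

```python
from math import comb

def race_win_prob(p: float, need_a: int, need_b: int) -> float:
    """P that side A (per-game win prob p) takes need_a games before B takes need_b.

    Equivalent to playing out all need_a + need_b - 1 games and asking
    whether A wins at least need_a of them.
    """
    n = need_a + need_b - 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need_a, n + 1))

def bo3(p): return race_win_prob(p, 2, 2)
def bo5(p): return race_win_prob(p, 3, 3)

def bo5_advantage(p):
    # Bo5 where the WB team starts 1-0: it needs 2 wins, the opponent needs 3
    return race_win_prob(p, 2, 3)

def double_elim_bo3(p):
    # The WB team must lose two Bo3 sets to be eliminated
    s = bo3(p)
    return 1 - (1 - s) ** 2

for p in (0.4, 0.5, 0.6):
    print(p, round(bo5_advantage(p), 3), round(double_elim_bo3(p), 3))
# at p = 0.40 the WB team still wins ~0.525 of Bo5s with a 1:0 start,
# and ~0.580 under the double-elimination double Bo3
```

So both winners' bracket formats keep the WB team above 50% even when it is the per-game underdog at 40%, matching the graph description.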
Assuming that the chance of winning in a game like Dota is constant is a very big assumption, and not one that can be made safely unless we are talking about a match that is fixed and intentionally thrown, or a card game like blackjack in which the strength of a hand can be seen and the next cards can be predicted fairly safely.
A team may be more likely to win, but predicting human action is not currently reducible to numbers, as much as we would love it to be and keep trying. To pretend that we can is the height of arrogance, and to tell others we can is to lie with statistics.
We can talk about likelihoods, but we must qualify them with a lot of uncertainty. If that uncertainty is not stated, it is lying.
While it's true that chance of winning is variable with time and depends on a variety of factors, it is unrealistic to try to model those variations. An example is calculating the odds of getting a 300 if you know your odds of getting a strike in bowling. When you get to frames 8, 9, 10, you most likely will get nervous (which can be exacerbated depending how the people around you react), affecting how you bowl. Of course, you are also getting more physically tired as the game progresses, and the conditions of the lane (oil) are slowly changing. The surface of your bowling ball(s) is also changing over time. On a given throw, any of those effects can have a positive or negative effect on your likelihood of throwing a strike.
You can use a simplified model and say the odds of getting a 300 are about 1% if you throw strikes with a consistent success rate of about 68 percent. If you argue that the model does not fully account for the other variables described above, you are correct, but then the only alternative is to say there's no point in doing any calculation at all. Instead, we perform the calculation anyway and simply acknowledge what was and was not modeled. It is still interesting to learn that you need roughly a 68% strike rate to roll a 300 one game in a hundred.
edit: tank, the edit you made to your post while I was typing seems to already address what I was getting at
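Under the stated simplification (a constant, independent strike probability), the 68%-for-1% figure above checks out in two lines of Python:

```python
# A 300 game requires 12 consecutive strikes, so under a constant strike
# probability p the chance of a perfect game is simply p**12.
p_strike = 0.68
p_300 = p_strike ** 12
print(f"P(300) at a 68% strike rate: {p_300:.4f}")  # ~0.0098, about 1 in 100

# Inverting: the strike rate needed for a 1-in-100 perfect game
needed = 0.01 ** (1 / 12)
print(f"strike rate for a 1% chance of 300: {needed:.3f}")  # ~0.681
```

Of course this inherits every assumption the posts above argue about; it only shows the simplified model is internally consistent.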
I didn't want to reduce the outcome of a match purely to statistics, but statistics is what we can calculate. That is why I was pointing out all the assumptions made for this evaluation.
But we cannot deny that statistics plays a role. Why is the Bo3 format preferred to a single match? Because it gives the better team a higher success rate.
Edit: We also know from experience that the winners' bracket team has a big advantage. Many have pointed out that this destroys the pleasure of watching a grand final.
On March 16 2015 07:05 micronesia wrote: While it's true that chance of winning is variable with time and depends on a variety of factors, it is unrealistic to try to model those variations. An example is calculating the odds of getting a 300 if you know your odds of getting a strike in bowling. When you get to frames 8, 9, 10, you most likely will get nervous (which can be exacerbated depending how the people around you react), affecting how you bowl. Of course, you are also getting more physically tired as the game progresses, and the conditions of the lane (oil) are slowly changing. The surface of your bowling ball(s) is also changing over time. On a given throw, any of those effects can have a positive or negative effect on your likelihood of throwing a strike.
Yes, while I lean towards saying no one should predict, prediction is fine as long as you are being honest with people about what you are actually doing and what its shortcomings are.
EDIT: The obvious counterargument is that the layman cannot understand the methodology and therefore cannot make a reasoned judgment as to whether to accept the outcome. However, they can read the thread, in which you have called me an incompetent liar. So laymen do have the opportunity to make a reasoned judgment on something like this since they can observe your arguments.
The burden is on you when you present statistics as part of advocating a position to also present fully your methodology and the limitations of your design and the constraints in the applicability of your results.
On March 16 2015 10:38 itsjustatank wrote: The burden is on you when you present statistics as part of advocating a position to also present fully your methodology and the limitations of your design and the constraints in the applicability of your results.
I agree in principle, but you have to consider the medium he publishes in, and how important the factors he leaves out are.
As this is a gamers' forum, I feel we can forgive him for not going into detail about the possible inaccuracies of the Elo system, for example. His main point is likely not affected by that.
However, presenting very small differences between numbers without mentioning the uncertainty of those numbers is a big deal, as it can significantly change his point (for example, to "I've been measuring noise").
The OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair. It's far worse to have an unfair format than to hurt the viewers' feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best? They're the best anyway...
On March 16 2015 16:47 ZenithM wrote: The OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair. It's far worse to have an unfair format than to hurt the viewers' feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best? They're the best anyway...
Something you and the OP both failed to do was differentiate between the "best team" (going in) and the "winner's bracket finalist". Granted, that "should" be the best team, but for the sake of statistics the "best team" doesn't always make the winners' side...
I'm still curious to see how often the highest elo team ends up winning if the tournament is single elimination or true double elimination (compared to the bo5 at 0-0 and 1-0).
On March 16 2015 18:46 ZenithM wrote: Yeah, "best" in my post meant "winner's bracket finalist", the only "best" that matters in respect to fairness of the competition.
If you think the only meaningful sense of "best team" is the winner's bracket finalist, then imho you've missed the entire point of the OP. The point is that the best team, as in the team with a greater than 50% probability of beating any other team (and THAT'S a useful definition of best), can end up in the losers' bracket, and should then be given a chance to prove that they are indeed better than the winner's bracket finalist.
If you claim that the winner's bracket finalist is always the best team, then the optimal way to have the best team win is to just give the tournament to the winner's bracket finalist, i.e. single elimination.
I am a bit confused, I have to say; maybe I just misunderstand you...
Good to see the particle physics kind of thinking around. What exactly are you doing? (Or did do?) Your location is Switzerland, so I guess LHC?
Oh cool! My masters project was writing an event generator for black hole events at the LHC, which was fun. Now I do experimental stuff, which is far less fun. Yep, working on ATLAS, a little over halfway through my PhD. Looking for SUSY, though I don't hold much hope for getting a positive result. xD
I am bit confused I have to say, maybe I just misunderstand you...
I think I understand what the OP wants to say. My point is that if you remove the 1-0 advantage, the "best team" (in the sense of "the one that didn't lose") is not rewarded at all for being the best that day, because its opponent has already had the opportunity to lose once, and that same opportunity is denied to the "best". As I understand it, the OP claims that the statistical difference in the chances that the best team (best gameplay-wise this time) wins is negligible compared to how badly the 1-0 advantage is perceived by viewers. I'm just saying that if you remove it, the tournament becomes unfair, and certainly doesn't deserve to be called "double elimination". And back in the day it wasn't even a 1-0 advantage, it was a full Bo5 advantage (back in early MLGs). Now that shit was sad to watch ;D
Interesting idea. I wasn't sure the conclusion would hold, so I did a few calculations myself. I calculated the chances of a team reaching the final assuming it has a chance P to win any single game (and thus 1-P to lose). To determine whether the 1-0 advantage is good or bad for the best team, what matters is the relative probability of reaching the final via the winners' or the losers' bracket, so I calculated it for tournaments of size 8, 16 and 32. The first graph shows this, where the red lines are for tournament size 8, the bumpy one being the chance of reaching the final via the losers' bracket and the other via the winners' bracket. Likewise blue for 16 teams and green for 32 (all matches being Bo3). Basically, the more rounds there are, or the worse the team is, the relatively larger the chance of reaching the final via the losers' bracket. This means that in a bigger tournament, even a dominant team (60% to win a single game against any other team) is still more likely to enter the final via the losers' bracket than the winners' bracket.
This also means that unless a team is very dominant, the format of the final doesn't matter much. Only in small tournaments (where double elimination is somewhat silly anyway), or for a very dominant team, is the 1-0 start in the final really a disadvantage. The reason is simply that even the best team often enters the final via the losers' bracket. The second graph shows the chance of winning the whole tournament in an 8-team double elimination bracket if your team has chance P (shown on the x-axis) to win a single game. The red line is with a 0-0 starting final, the blue one with a 1-0 final for the WB team.
Basically, for determining the 'fair' winner it hardly matters, and only a little if the team is very dominant to begin with.
As for the discussion about the format, I think double elimination with a 1-0 lead in the final is fine for most tournaments. It's important for a tournament to be interesting and reasonably fair. Too much luck, as in single elimination, lets random or weak teams get too far too often, but a fairer system like round robin can last too long. Double elimination gives the best teams good chances to come out on top while still having the thrill of elimination. Round robin has dead matches, match throwing and all sorts of other problems; double elimination has exciting matches while still having good chances of a great final. The 1-0 lead in the final is a decent way to give the WB team a bit of an advantage without it being too big. A double Bo3 gives a slightly bigger advantage but feels a bit sillier to me. For determining the fairest winner it doesn't matter much how the final is done, and it's not even in a tournament's interest per se to do that: they want viewership and excitement, and having the favourite roll over people stinks. You could argue tennis is doing much poorer and soccer is so popular because they are, respectively, too predictable and excitingly unpredictable.
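For anyone who wants to poke at the OP's experiment without the spreadsheet, here is a rough Monte Carlo sketch in Python. Everything in it is my own assumption rather than the OP's exact setup: the Elo values, the standard logistic Elo win model, the fixed seeding, and the particular 8-team double-elimination layout.

```python
import random

random.seed(42)

# Hypothetical ratings: "A" is the true best team, everyone else is equal
ELO = {"A": 1700, "B": 1600, "C": 1600, "D": 1600,
       "E": 1600, "F": 1600, "G": 1600, "H": 1600}

def p_game(a, b):
    """Standard Elo win expectancy for a single game of a vs b."""
    return 1 / (1 + 10 ** ((ELO[b] - ELO[a]) / 400))

def play_series(a, b, wins_a=0, wins_b=0, target=2):
    """First to `target` wins takes the series; returns (winner, loser)."""
    p = p_game(a, b)
    while wins_a < target and wins_b < target:
        if random.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return (a, b) if wins_a == target else (b, a)

def run_bracket(teams, wb_advantage):
    # Winners' bracket, Bo3 throughout
    w1 = [play_series(teams[i], teams[i + 1]) for i in range(0, 8, 2)]
    w2 = [play_series(w1[0][0], w1[1][0]), play_series(w1[2][0], w1[3][0])]
    wf = play_series(w2[0][0], w2[1][0])
    # Losers' bracket: QF losers play in, then meet the SF losers cross-paired
    l1 = [play_series(w1[0][1], w1[1][1]), play_series(w1[2][1], w1[3][1])]
    l2 = [play_series(l1[0][0], w2[1][1]), play_series(l1[1][0], w2[0][1])]
    l3 = play_series(l2[0][0], l2[1][0])
    lf = play_series(l3[0], wf[1])
    # Grand final: Bo5, optionally starting 1-0 for the WB finalist
    gf = play_series(wf[0], lf[0], wins_a=1 if wb_advantage else 0, target=3)
    return gf[0]

def best_team_rate(wb_advantage, runs=20_000):
    teams = list(ELO)
    wins = sum(run_bracket(teams, wb_advantage) == "A" for _ in range(runs))
    return wins / runs

rate_even = best_team_rate(False)
rate_adv = best_team_rate(True)
print(rate_even, rate_adv)
```

Consistent with the calculations above, the two grand-final formats end up within noise of each other for this Elo gap; resolving a sub-percent difference would take far more than 20k runs per format, which is the whole error-bar discussion earlier in the thread.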
You can always tell who has never taken a stats course past the high school level by who talks the loudest with the most strongly held opinions. It's a notoriously difficult field with conclusions drawn from studies usually being quite nuanced and qualified, but conclusions that are significant nonetheless.
Otherwise you get drawn into arguments with people who don't think N=25 is a sufficient sample size because it "feels low".
If the model only looked at who the best team was going in (statistically, the one with the highest win rate) and who actually wins, then which team ends up in the winners' final is irrelevant. That gets into conditional probability that has nothing to do with the original hypothesis. It would be like flipping a coin 100 times to see if it's fair and treating "heads came up 10 times in a row at some point" as evidence against the hypothesis, when the only relevant metric is the final number of heads and tails.
On March 17 2015 04:33 hariooo wrote: You can always tell who has never taken a stats course past the high school level by who talks the loudest with the most strongly held opinions. It's a notoriously difficult field with conclusions drawn from studies usually being quite nuanced and qualified, but conclusions that are significant nonetheless.
If the model only looked at who was the best team going in (statistically the highest WR) and who actually wins, the team that ends up in WF's is irrelevant. It gets into conditional probability that has nothing to do with the original hypothesis. It would be like flipping a coin 100 times to see if it's a fair coin and looking at if heads was ever flipped 10 times in a row as that would be evidence against the hypothesis even though the only relevant metric is the end number of heads and tails.
Correct, which team was the WF is irrelevant in the OP's calculations. That's my point. The entire issue with double elimination revolves around how the WF faces elimination in the finals, not "does it give the best team the highest chance to win".
although I am curious about THAT statistic in various formats
People could easily misunderstand the OP and think that "best team" is synonymous with WF and incorrectly conclude that 0-0 vs 1-0 starts are statistically fair.
Btw, if you're interested in the topic itself, try googling interviews with Barry Hearn about the PTC snooker series. He changed tons of professional billiards tournaments to shorter distances (iirc from Bo9-Bo17 down to Bo7 only). He tries to explain why, without any math, and just summarizes it as: "it's the only way to get all games done in a short time frame".
Hearn (who's ruining snooker, by the way) has a constraint that doesn't exist in esports: they're limited by logistics, with only so many tables available and so much time. With esports you can play as many games concurrently as you like, at least until you get to an offline stage; of course, if you run an online stage correctly, you can still allow the offline games to be of a decent length.
edit:
again, it is rarely used in normal sports because of logistical reasons as detailed above
Note that double elimination is much more reasonable for games where fortunate or unfortunate bracket matchups in terms of characters or races are a bigger source of variance, especially fighting games or games like SC. Dota can quite reasonably run single elimination brackets because the draft system gives you tools to alleviate that.