Winner's Advantage in Grand Finals

motbob
Profile Blog Joined July 2008
United States12546 Posts
March 14 2015 08:17 GMT
#1
tl;dr: In double elimination brackets, a 1-0 advantage in Grand Finals for the team coming from Winner's increases the chance of the better team winning the tournament.

In Dota, double-elimination brackets are almost always used, and the grand finals are almost always Bo5. Tournaments have not agreed, however, on whether to give teams from the Winner's Bracket an advantage in Grand Finals. For example, when Alliance and Na`Vi played in the TI3 finals, the series started 0-0, but when those two teams played in Starladder Season 8, Alliance started up 1-0 because they came from the Winner's Bracket.

I'm under the impression that spectators don't like the 1-0 start, but some tournaments (D2CL and Starladder most notably) employ it nonetheless.

Being a massive nerd, I have these various brackets simulated in Excel, so I decided to test how, in theory, a winner's bracket advantage affects the tournament outcome.

The best team doesn't always win a tournament. Dota is a game with a lot of variance involved, and it only takes a glance at Dota2lounge bet odds to see that. There is a 100% chance that Secret is a better team than M5, but the odds of Secret winning against M5 are not 100%. Nor is the chance of Secret winning a tournament against 7 other scrub teams 100%.

I think that an implicit goal of tournament organizers is to create a format where the best team has a good chance to win. Spectators generally want this. An uproar would surely result if a tournament advanced the second place team to bracket, rather than the first place team, or made the Grand Finals a Bo1. A caveat: spectators want to see good teams earn the win, which is probably why 1-0 advantages leave a bad taste in their mouths.

So if tournament organizers want to create a tournament format where the best team wins most often, spectators be damned, they should create a simulated bracket with teams assigned Elo values (representing "true" skill), run the simulation 10,000 times, and see how many times the best team won with (1) a 1-0 advantage in Grand Finals and (2) no advantage! Or let me do it.
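
For anyone who wants to try this outside a spreadsheet, here is a minimal sketch of the two building blocks such a simulation needs: turning an Elo gap into a per-game win chance, and turning that into a series win chance. It assumes the standard 400-point logistic Elo curve and independent games; the post does not say which formula the Excel sheet uses, and the function names are made up for illustration.

```python
from math import comb


def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Per-game chance that A beats B under the standard 400-point Elo curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def series_win_prob(p: float, wins_needed: int) -> float:
    """Chance of taking a first-to-`wins_needed` series, given per-game chance p.

    Assumes independent games: the series is won by collecting wins_needed wins
    while dropping k games, for k = 0 .. wins_needed - 1.
    """
    return sum(
        comb(wins_needed - 1 + k, k) * p ** wins_needed * (1.0 - p) ** k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    p = elo_win_prob(1500, 1480)  # a 20-point gap, as between two closely rated teams
    print(f"one game: {p:.3f}")
    print(f"Bo3:      {series_win_prob(p, 2):.3f}")
    print(f"Bo5:      {series_win_prob(p, 3):.3f}")
```

Under this curve a 20-point gap is only about a 53% per-game edge, which is why the better team is far from guaranteed to win a short series.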

First, I simulated a bracket with two good teams and a bunch of scrubs (1500 Elo, 1480, and a bunch of 1300s). The best team won 51.6% of the time without a Grand Finals advantage, and 52.2% with a 1-0 advantage. That's a 0.6% increase. (Note that the only number we really care about is the increase.)

Second, I simulated an Elo distribution that resembled TI4, meaning that there were a few teams clustered near the top and some semi-competitive teams just afterwards. Here we saw an increase of 1.7% in the best team's win chance from no advantage to 1-0 advantage.

Third, I simulated a very steady drop in Elo (1500, 1490, 1480, 1470...). With this distribution, the best team saw a 1.4% chance increase in winning.

To clarify: one thing to note about the above simulations is that I'm simulating the whole tournament, not the grand finals. In some runs of the simulation where the best team ended up winning, the team lost in Winner's and won GF coming from Loser's. In other runs, the team won Winner's and then won GF.
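
A whole-tournament simulation of this kind might look roughly like the sketch below. Everything structural here is an assumption, not the actual spreadsheet: 8 teams, standard 1v8/4v5/2v7/3v6 seeding by list position, Bo3 everywhere except a Bo5 Grand Finals, and the 400-point Elo curve. The names `run_tournament` and `best_team_win_rate` are hypothetical.

```python
import random


def game_prob(elo_a, elo_b):
    """Per-game win chance for A (standard 400-point Elo curve, an assumption)."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def play_series(rng, a, b, elo, wins_needed, lead_a=0):
    """Play a first-to-wins_needed series; return (winner, loser)."""
    p = game_prob(elo[a], elo[b])
    wins_a, wins_b = lead_a, 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return (a, b) if wins_a >= wins_needed else (b, a)


def run_tournament(rng, elo, gf_lead):
    """One 8-team double-elimination bracket; returns the champion's index."""
    def bo3(a, b):
        return play_series(rng, a, b, elo, 2)

    # Winner's bracket: list positions 0..7 are seeds, paired 1v8, 4v5, 2v7, 3v6.
    w1 = [bo3(0, 7), bo3(3, 4), bo3(1, 6), bo3(2, 5)]
    w2 = [bo3(w1[0][0], w1[1][0]), bo3(w1[2][0], w1[3][0])]
    wf = bo3(w2[0][0], w2[1][0])
    # Loser's bracket: round-1 losers meet, then face the WB round-2 losers.
    l1 = [bo3(w1[0][1], w1[1][1]), bo3(w1[2][1], w1[3][1])]
    l2 = [bo3(l1[0][0], w2[1][1]), bo3(l1[1][0], w2[0][1])]
    l3 = bo3(l2[0][0], l2[1][0])
    lf = bo3(l3[0], wf[1])
    # Bo5 Grand Finals; the Winner's Bracket team may start gf_lead games up.
    champion, _ = play_series(rng, wf[0], lf[0], elo, 3, lead_a=gf_lead)
    return champion


def best_team_win_rate(elo, gf_lead, runs=10_000, seed=0):
    """Fraction of simulated tournaments won by the highest-Elo team."""
    rng = random.Random(seed)
    best = max(range(len(elo)), key=lambda i: elo[i])
    wins = sum(run_tournament(rng, elo, gf_lead) == best for _ in range(runs))
    return wins / runs


if __name__ == "__main__":
    elo = [1500, 1480] + [1300] * 6  # first scenario from the post
    for lead in (0, 1):
        print(f"GF lead {lead}: best team wins {best_team_win_rate(elo, lead):.3f}")
```

Because the sketch counts every path to the title, it includes runs where the best team drops to Loser's and has to overcome the 1-0 deficit itself. Its numbers will not match the ones above exactly, since the seeding and series lengths are guesses.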

So with these different distributions of Elo, creating a 1-0 advantage increased the chance of the best team winning the whole tournament. I can't say for sure that that would be true for any combination of teams, but I think that's what these results imply. If y'all want me to test unusual Elo distributions or weird tournament formats (e.g. Bo5 WF instead of Bo3), ask in the comments.

The conclusion I derive from these results is this: if tournament organizers are concerned solely with creating a format where the best team wins, they should have GF with a 1-0 advantage. But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).

ModeratorGood content always wins.
SoSexy
Profile Blog Joined February 2011
Italy3725 Posts
March 14 2015 10:52 GMT
#2
My thinking for starting 0-0 is:

-Winner team deserves it
-Loser team deserves it anyway because they fell down yet showed great psychological strength and managed to reach the finals anyways.

As you said, from a spectator point of view starting 1-0 is meh :/
Dating thread on TL LUL
Ej_
Profile Blog Joined January 2013
47656 Posts
Last Edited: 2015-03-14 11:42:23
March 14 2015 11:25 GMT
#3
IMHO all e-sports should do what FGC already does and give the player coming from the winners a full match advantage. Although, scheduling and time issues would be a big problem. So I guess stick to 1 game advantage. I think allowing 1 team to drop a series and another not is unfair.
"Technically the dictionary has zero authority on the meaning or words" - Rodya
Lonyo
Profile Blog Joined December 2009
United Kingdom3884 Posts
Last Edited: 2015-03-14 11:39:16
March 14 2015 11:38 GMT
#4
On March 14 2015 19:52 SoSexy wrote:
My thinking for starting 0-0 is:

-Winner team deserves it
-Loser team deserves it anyway because they fell down yet showed great psychological strength and managed to reach the finals anyways.

As you said, from a spectator point of view starting 1-0 is meh :/

The winning team does not deserve zero advantage.
The losing team doesn't deserve it because they already got their second chance.

The whole point is that you have a DOUBLE elimination bracket in these tournaments... right up to the final game where you suddenly decide that it's single elimination.
That means that all the hard work done by one team to not lose a single series is for nothing, as basically everything resets.
The winner team should have an advantage because they've earned it by not losing.

The "other" way is to have two BoXs, where the losing team has to win both, the winning team only has to win one. That's the true double elimination right up to the end of the competition. What's so hard about just using that method?
What impact does that also have on your calculations for differences, since that's the REAL way to complete a double elim tournament?
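
A rough way to compare the Grand Finals formats in isolation is sketched below. It assumes independent games with a fixed per-game win chance `p` for the Winner's Bracket team and takes "two BoXs" to mean two Bo3s; both are assumptions, and this is not the whole-tournament figure from the opening post. The function names are hypothetical.

```python
from math import comb


def series_win(p, wins_needed, lead=0):
    """Chance of taking a first-to-wins_needed series when starting `lead` games up.

    p is the per-game win chance; games are assumed independent.
    """
    need = wins_needed - lead  # wins this side still has to collect
    return sum(
        comb(need - 1 + k, k) * p ** need * (1.0 - p) ** k
        for k in range(wins_needed)  # k = games dropped along the way
    )


def grand_final_formats(p):
    """Title chance of the Winner's Bracket team under three Grand Finals formats."""
    bo3 = series_win(p, 2)
    return {
        "Bo5 from 0-0": series_win(p, 3),
        "Bo5 with 1-0 lead": series_win(p, 3, lead=1),
        "two Bo3s, LB team must win both": 1.0 - (1.0 - bo3) ** 2,
    }


if __name__ == "__main__":
    for p in (0.50, 0.53, 0.60):
        print(p, {k: round(v, 3) for k, v in grand_final_formats(p).items()})
```

With evenly matched teams (`p = 0.5`) the Winner's Bracket team takes the title 50% of the time from 0-0, 68.75% with a 1-0 lead, and 75% when the Loser's Bracket team has to win two Bo3s.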
HOLY CHECK!
y0su
Profile Blog Joined September 2011
Finland7871 Posts
March 14 2015 12:19 GMT
#5
I'm more curious about how often the team from the Upper (winner's) bracket won the GF with a 1-0 lead compared to 0-0; to me that's more important. (Why reward the team that's slightly better "on paper" rather than the team that has possibly already beaten them?)

If it's just about "setting up for the best team to win" isn't a seeded single elimination bracket best?
dismiss
Profile Blog Joined March 2009
United Kingdom3341 Posts
March 14 2015 13:27 GMT
#6
On March 14 2015 20:38 Lonyo wrote:
The winning team does not deserve zero advantage.
The losing team doesn't deserve it because they already got their second chance.

The whole point is that you have a DOUBLE elimination bracket in these tournaments... right up to the final game where you suddenly decide that it's single elimination.
That means that all the hard work done by one team to not lose a single series is for nothing, as basically everything resets.
The winner team should have an advantage because they've earned it by not losing.

The "other" way is to have two BoXs, where the losing team has to win both, the winning team only has to win one. That's the true double elimination right up to the end of the competition. What's so hard about just using that method?
What impact does that also have on your calculations for differences, since that's the REAL way to complete a double elim tournament?

This used to be done in some foreign BW tournaments. While I agree that this method would be the most fair for the team coming from the WB finals, it has a few glaring faults, which is generally why tournaments opt to not employ it and instead recompense the team with a 1-0 advantage.
First of all, it takes a long, long time, potentially forcing the teams to play 8 games (assuming a bo3 and a bo5), which for Dota would mean a grand finals that could easily span the better part of 11-12 hours. To be honest, now that I think about it, knowing Dota tournaments it would probably take 2 days at least.
Taking that into consideration, one can infer that it would probably also hurt the event's viewership and ad revenue relative to its cost.

While it's desirable from a purely competitive standpoint, the logistical problems it'd pose to play that many additional games usually just make it so that tournament organisers shy away from it.
Failure to improve posting standards will result in a lengthy ban. I <crms_> !dumb <GeoffAnderson> crmsdota <crms_> damnit
micronesia
Profile Blog Joined July 2006
United States24698 Posts
March 14 2015 14:19 GMT
#7
motbob can you also run simulations with single elimination? It would be interesting to see how the two double-elimination formats above compare to single elimination in odds of the best team winning the tournament.
ModeratorThere are animal crackers for people and there are people crackers for animals.
motbob
Profile Blog Joined July 2008
United States12546 Posts
Last Edited: 2015-03-14 14:45:28
March 14 2015 14:44 GMT
#8
Kupon and I had a nice discussion on LiquidDota about these simulations. He pointed out that, if teams have very different adaptation capabilities during a tournament, my definition of "best team" becomes questionable. Is the best team the team which started out with the best value, or the team that adapted to the "tourney meta" (especially important at TI/DAC) and performed the best at the end?

Kupon recommended that I change the simulation to reflect this possibility. It turns out that with a dramatic adaptation variable (teams have a 50% chance of being either "good" or "bad" adapters, gaining a constant 20 or 5 points per round, respectively, with 5 rounds), a 1-0 advantage system does hurt the best team's chance of winning if the best team is defined as the team with the highest initial Elo and also the 20 point adaptation. A lower adaptation variable (2.5/1) resulted in the "best team," similarly defined, benefiting from the 1-0 advantage.
ModeratorGood content always wins.
motbob
Profile Blog Joined July 2008
United States12546 Posts
March 14 2015 14:55 GMT
#9
On March 14 2015 23:19 micronesia wrote:
motbob can you also run simulations with single elimination? It would be interesting to see how the two double-elimination formats above compare to single elimination in odds of the best team winning the tournament.

With 8 teams spaced 20 Elo apart each, it's a 2-3% difference between single and double elim.
ModeratorGood content always wins.
Yorbon
Profile Joined December 2011
Netherlands4272 Posts
March 14 2015 15:45 GMT
#10
Are these changes significant? And how did you test?
Cheren
Profile Blog Joined September 2013
United States2911 Posts
Last Edited: 2015-03-14 18:06:38
March 14 2015 18:00 GMT
#11
Double elimination has quite a few problems, and this is one of them. Almost all real sports use a combination of round robin and single elimination; the only exception I can think of is college baseball.

There's also the problem of the 4-player group where player A beat player B, player B went 1-1 with a winning record in matches over player C, player C beat player D, and player A beat player D.

A > B > C > D and A and C advance.

Also, in a large bracket the player coming from the loser's bracket can end up playing twice as many games as the winner's bracket player, which creates a huge disparity in player fatigue.
y0su
Profile Blog Joined September 2011
Finland7871 Posts
March 14 2015 18:22 GMT
#12
On March 14 2015 23:44 motbob wrote:
Kupon and I had a nice discussion on LiquidDota about these simulations. He pointed out that, if teams have very different adaptation capabilities during a tournament, my definition of "best team" becomes questionable. Is the best team the team which started out with the best value, or the team that adapted to the "tourney meta" (especially important at TI/DAC) and performed the best at the end?

Kupon recommended that I change the simulation to reflect this possibility. It turns out that with a dramatic adaptation variable (teams have a 50% chance of being either "good" or "bad" adapters, gaining a constant 20 or 5 points per round, respectively, with 5 rounds), a 1-0 advantage system does hurt the best team's chance of winning if the best team is defined as the team with the highest initial Elo and also the 20 point adaptation. A lower adaptation variable (2.5/1) resulted in the "best team," similarly defined, benefiting from the 1-0 advantage.


How about if the best team is defined as the one with the highest ELO after?
Tephus
Profile Joined May 2011
Cascadia1753 Posts
March 14 2015 18:53 GMT
#13
Yea, I also don't understand why double elim brackets end with a bo5 instead of two bo3s. It only changes scheduling in the worst case, and gives consistency across the entire bracket.

Mind running a simulation for that?
AdministratorDirector of Esports
SKC
Profile Joined October 2010
Brazil18828 Posts
March 14 2015 20:39 GMT
#14
It changes the schedule from 3 to 5 games to 2 to 6 games. That's a lot. Plus people don't like it for the same reason they dislike the 1 game advantage.
GeckoXp
Profile Blog Joined June 2013
Germany2016 Posts
March 14 2015 20:57 GMT
#15
I'm not entirely sure I understood your point. There are a few things I really can't grasp. Anyhow, I'm not trying to sound smug here; my statistics knowledge is more than just limited and I'm not that great when it comes to mathematics.

First off, I don't really get the question behind it. Imo it doesn't matter what kind of mode you use for a tournament, the assumption that there is a "best" team will tell you that this best team will win more often than any other team, as long as the circumstances are even for all teams. That's like trivial. It should also be somewhat obvious that longer distances, in theory, support the better team.

Now you take ELO as measurement of skill, which in itself sounds kind of overcomplicated. Why not just align values from 0 (worst team in the tour) to 1 (best team). Basically, that's the idea, no?
Might be my mathematics being strange.
However, related to that point, I don't think the changes in the outcome of what you tried to calculate have any meaning to them. The distances in skill are arbitrary. I'm not even sure anyone could tell you what a difference of 10 points on the ELO scale would mean - for your tournament, for the entire player/team base or anything. You can only loosely relate gaps in such a ranking. That being said, a change in the outcome of win% per mode in the range of 0.x - 2% seems... I don't know. Not much? Especially without a t-test behind it.
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-14 22:27:24
March 14 2015 22:14 GMT
#16
Let's put aside the fact that Elo assumptions built for chess are not reliable when applied to a sample as small as a one-off tournament in games that are not chess, and go with this: you note an increase of .6%

That doesn't sound statistically significant, even with your highest stated increase. You do no testing to show whether it is. We have you rejecting the null hypothesis here without actually giving a good reason why.

Edit: sniped by gecko. hi gecko.
Photographer"nosotros estamos backamos" - setsuko
FalconHoof
Profile Joined December 2012
Canada183 Posts
Last Edited: 2015-03-14 22:32:30
March 14 2015 22:32 GMT
#17
On March 15 2015 07:14 itsjustatank wrote:
...you note an increase of .6% That doesn't sound statistically significant, even with your highest stated increase.


This was exactly my thought as I finished reading the OP. However, I strongly believe that this topic warrants further testing and discussion because there is obviously dissension about whether the 1-0 advantage is necessary. The real question is "What is the real motivation behind the 1-0 advantage? Is it really to help the better team win or, as the OP suggested, is it actually beneficial because of the way the brackets and numbers work out?" Hopefully Motbob can hammer away and help us plebs figure out what's what.
Masturbation this good deserves it's own foreplay.
motbob
Profile Blog Joined July 2008
United States12546 Posts
March 14 2015 23:16 GMT
#18
I don't think there's any dissension here. If you read anything in the post, you should have read the conclusion: if I were a tournament organizer, I would stick with no advantage.
ModeratorGood content always wins.
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-14 23:59:52
March 14 2015 23:57 GMT
#19
Your generated Elo predictions based on arbitrary distribution choices resulted in differences that do not seem statistically significant. You do no test to prove that they are statistically significant, you just give the differences in observed percentages.

Null hypothesis: there is no statistically significant difference between starting a double-elimination finals 1-0 versus 0-0
Alternate hypothesis: there is a statistically significant difference between starting a double-elimination finals 1-0 versus 0-0

You have not proven whether or not what you got is noise and whether or not there really is a difference between a 1-0 start and a 0-0 start. You just want one of the two, clearly, and think this is enough to want to make a change.

Your argument is completely non-falsifiable right now. Sure, it may work for the internet, but unless you do that extra work you are pissing in the wind with a cloak of statistics making your advocacy look smart to people who do not know what they are reading.
Photographer"nosotros estamos backamos" - setsuko
GeckoXp
Profile Blog Joined June 2013
Germany2016 Posts
March 15 2015 00:02 GMT
#20
I still don't get it or why you needed math to make a point.

Like you start out with something like this:

  • You have a team, which is better than any other team participating
  • This team therefore wins with a higher likelihood against any other team
  • If the gap in between the "true skill" is not that large, the distances / modes a tournament uses gets important


That's somewhere in the blog already as far as I understood. What's left out is:

  • The longer distances (Bo3 vs. Bo9 etc) are, the more certain (? sry, English) it is the better team will win within one tournament
  • If every team plays exactly the same modes, the better team, under the assumption skill won't ever change, will win in more tournaments if you look at enough samples


Now something happens in your train of thought. E.g. you want to ensure the best team wins, for whatever reason. You entirely miss, however, that as long as you don't introduce drastic one-sided changes, any mode will support the best team already.

Like, it should be kind of obvious with a 1-0 advantage, that:

  • the best team will advance through the WB to the Grand Finals more often and therefore more often starts with a 1-0 lead
  • even in cases they need their second chance via the LB route to Grand Finals the better team has a somewhat larger chance to win with a 0-1 disadvantage


Granted, it'd probably be interesting, from a very theorycrafting point of view, to see how much influence this 1-0 has. However, you will never know, even if you test your results (the differences you list). You already explained why:

  • You cannot possibly measure skill
  • All indicators for skill do not tell you how much better a team is, even indirectly via ELO. There's always a large margin of error involved, those estimators operate with them. Hence, the statements like "twice as good" are just your very subjective view on that matter


Hence, it's not really surprising that your results mostly tell you that the better team is more likely to win. That's all I could learn from what you wrote.

Disregarding all that, it probably comes down to other points. People already pointed out that a DE format is designed to give a second chance. Therefore the only logical follow-up is to set up the Grand Finals as 0-0 and double Best of X. If the LB Team wins, they have to endure a second Grand Final Best of X - because the WB Team never got a second chance.

Since this takes much time - as pointed out - the 1-0 lead is in place, depending on the game. Setting it entirely to 0-0 is - tournament design wise - just silly.


Btw, if you're interested in the topic itself, try to google for interviews of Barry Hearn and the PTC Snooker series. He changed tons of professional billiards tournaments to shorter distances (iirc Bo9-Bo17 to Bo7 only). He tries to explain why that is - without any math - and just summarizes it as: "it's the only way to get all games done in a short time frame".
Cheren
Profile Blog Joined September 2013
United States2911 Posts
March 15 2015 00:13 GMT
#21
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."

It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.
sertas
Profile Joined April 2012
Sweden887 Posts
March 15 2015 00:16 GMT
#22
you're not taking into account the massive psychological deficit of being down 0-2 in a bo5 compared to being down 0-1 in a bo5 with and without the extra game advantage. Turning a 0-2 is almost impossible while 0-1 is very possible.
motbob
Profile Blog Joined July 2008
United States12546 Posts
Last Edited: 2015-03-15 00:28:22
March 15 2015 00:27 GMT
#23
On March 15 2015 08:57 itsjustatank wrote:
Your generated Elo predictions based on arbitrary distribution choices resulted in differences that do not seem statistically significant. You do no test to prove that they are statistically significant, you just give the differences in observed percentages.

Null hypothesis: there is no statistically significant difference between starting a double-elimination finals 1-0 versus 0-0
Alternate hypothesis: there is a statistically significant difference between starting a double-elimination finals 1-0 versus 0-0

You have not proven whether or not what you got is noise and whether or not there really is a difference between a 1-0 start and a 0-0 start. You just want one of the two, clearly, and think this is enough to want to make a change.

Your argument is completely non-falsifiable right now. Sure, it may work for the internet, but unless you do that extra work you are pissing in the wind with a cloak of statistics making your advocacy look smart to people who do not know what they are reading.

What is the point of worrying about null/alternate hypotheses, usually? The normal case is this: we sat on the side of the curb all day and observed 200 people passing by. 120 of those people were male. Assuming (liberally) that this has been a completely typical day in terms of the composition of people passing by, can we take our 120/200 number and say that people who walk past the curb are more likely to be male than not? Or was what we saw dictated by random chance? We have to use statistical tests to get a P-value and thereby answer that question and see if we can reject the null.

In Excel, those considerations don't really make any sense because we can just increase the sample size to some absurd number. Imagine I simulate my exercise: I generate a random number and create a cell that returns 1 (for male) 51% of the time and 0 (female) 49% of the time. I then run the test 200 times. The test gives me 54.5%; a test with a 1000 "sample size" gave 52.8%; 10000, 51.5%; 50000, 51.036%. As the sample size gets larger and larger, the value observed converges to the "true" value of 51%.

So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.
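
How fast that convergence happens can be put in numbers. A rough sketch, assuming independent simulation runs: the standard error of an observed rate is sqrt(p(1-p)/n), which is about 0.5 percentage points at 10,000 runs for a rate near 50%. The difference between two such rates has a standard error of about 0.7 points, so a gap of 0.6 points would need considerably more runs per format to separate from noise. The helper names below are hypothetical.

```python
import math
import random


def simulate_rate(true_p, runs, seed):
    """Observed success rate over `runs` weighted coin flips."""
    rng = random.Random(seed)
    return sum(rng.random() < true_p for _ in range(runs)) / runs


def standard_error(p, runs):
    """Standard error of an observed rate, assuming independent runs."""
    return math.sqrt(p * (1.0 - p) / runs)


def runs_needed(p, diff, z=1.96):
    """Runs per format so a gap `diff` between two independent rates near p
    is at least z standard errors of their difference."""
    return math.ceil(2.0 * p * (1.0 - p) * (z / diff) ** 2)


if __name__ == "__main__":
    for n in (200, 1000, 10000, 50000):
        est = simulate_rate(0.51, n, seed=n)
        print(f"{n:>6} runs: observed {est:.4f}, std. error about {standard_error(0.51, n):.4f}")
    print("runs per format to resolve a 0.6-point gap near 52%:", runs_needed(0.52, 0.006))
```

So cranking up the run count does work, but the count has to be chosen with the size of the effect in mind.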
ModeratorGood content always wins.
FFGenerations
Profile Blog Joined April 2011
7088 Posts
Last Edited: 2015-03-15 00:41:34
March 15 2015 00:41 GMT
#24
i get the impression that a bo5 finals between two teams should be about how good they are against one another in a bo5 finals and NOT about how good they are in a bo5 finals where one team has a 1 game advantage for playing better during the earlier stages of the tourney
Cool BW Music Vid - youtube.com/watch?v=W54nlqJ-Nx8 ~~~~~ ᕤ OYSTERS ᕤ CLAMS ᕤ AND ᕤ CUCKOLDS ᕤ ~~~~~~ ༼ ᕤ◕◡◕ ༽ᕤ PUNCH HIM ༼ ᕤ◕◡◕ ༽ᕤ
GeckoXp
Profile Blog Joined June 2013
Germany2016 Posts
March 15 2015 00:44 GMT
#25
On March 15 2015 09:27 motbob wrote:
What is the point of worrying about null/alternate hypotheses, usually? The normal case is this: we sat on the side of the curb all day and observed 200 people passing by. 120 of those people were male. Assuming (liberally) that this has been a completely typical day in terms of the composition of people passing by, can we take our 120/200 number and say that people who walk past the curb are more likely to be male than not? Or was what we saw dictated by random chance? We have to use statistical tests to get a P-value and thereby answer that question and see if we can reject the null.

In Excel, those considerations don't really make any sense because we can just increase the sample size to some absurd number. Imagine I simulate my exercise: I generate a random number and create a cell that returns 1 (for male) 51% of the time and 0 (female) 49% of the time. I then run the test 200 times. The test gives me 54.5%; a test with a 1000 "sample size" gave 52.8%; 10000, 51.5%; 50000, 51.036%. As the sample size gets larger and larger, the value observed converges to the "true" value of 51%.

So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.


You do realize you're not flipping a simulated coin, but you're using estimators with assumptions, right?
motbob
Profile Blog Joined July 2008
United States12546 Posts
March 15 2015 00:50 GMT
#26
On March 15 2015 09:44 GeckoXp wrote:
You do realize you're not flipping a simulated coin, but you're using estimators with assumptions, right?

From my perspective a tournament is just a series of specifically weighted coin flips.
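As a rough illustration of the coin-flip picture (a minimal Python sketch rather than the actual Excel sheet; the function name `observed_rate` and the seeds are made up for this example), the observed share drifts toward the 51% that was put in as the number of flips grows:

```python
import random


def observed_rate(p_true, n_flips, seed=0):
    """Share of 1s among n_flips draws of a coin that lands 1 with probability p_true."""
    rng = random.Random(seed)  # fixed seed so a rerun gives the same numbers
    hits = sum(1 for _ in range(n_flips) if rng.random() < p_true)
    return hits / n_flips


if __name__ == "__main__":
    # Same sample sizes as in the post; the exact values depend on the seed.
    for n in (200, 1_000, 10_000, 50_000):
        print(n, round(observed_rate(0.51, n, seed=n), 4))
```

The individual numbers will not match the ones quoted from Excel, since they depend on the random draws; only the tightening around 51% is the point.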
ModeratorGood content always wins.
GeckoXp
Profile Blog Joined June 2013
Germany2016 Posts
Last Edited: 2015-03-15 00:54:25
March 15 2015 00:52 GMT
#27
Yeah, but you use Elo to determine the skill, which rests on rather strong assumptions, and that makes stuff complicated. It's not really a fair coin toss or a fair die throw that way. At least from my point of view. But w/e, it's getting late.

edit, the a in toss / throw means single. 8[
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-15 01:36:12
March 15 2015 01:31 GMT
#28
you are computing and comparing multiple conditional probabilities based on arbitrary Elo distributions. there are a number of problems with this:
  1. you don't just have an Elo arbitrarily, you maintain one through long-term play within a given population of players playing games that are similar to each other. Elo is not an absolute determination, it is an inference based on prior performance. your probability to win and lose and draw is dependent on that prior performance, and the make-up of the population.

  2. Elo is supposed to be distributed normally because that is the fundamental assumption of player skill in that ratings system. this is compounded by the fact that you do not say how many teams are in the simulations, whether they are a sample from a population or whether they are the population. you also never say how many games they play in each stage. you just say they have a given distribution

  3. the real world does not have infinite sample size or pre-arranged and cherrypicked Elo distributions. in the real world skill also isn't accurately determined by Elo. it is a best-guess estimator and it is pretty shitty in all implementations in ESPORTS right now.

  4. i'm also fairly certain that you cannot draw in dota, and you cannot draw in most games other than starcraft and fighting games.

given this, we are not denying that there is an observed difference between the two. we are talking about whether that observation is significant. this is very important in the grand scheme of things.

at the point where you even admit this in your OP, there isn't much else to say.

On March 14 2015 17:17 motbob wrote:
But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).


at best you win that in your perfect little infinite computing boxes of imaginary players, it is perhaps a tiny bit better to have a 1-0 start in the finals of a double-elimination tournament for the winners bracket player.

if it were significant though, then you would be doing more than just cloaking uncertainties with claims of certainties. you'd have a solid basis to go to every tournament designer and have them unfuck their systems. as it is, you don't.
Photographer"nosotros estamos backamos" - setsuko
motbob
Profile Blog Joined July 2008
United States12546 Posts
March 15 2015 01:57 GMT
#29
Pretty harsh! Good thing we agree on the real-world conclusions to be drawn from this.
ModeratorGood content always wins.
motbob
Profile Blog Joined July 2008
United States12546 Posts
March 15 2015 02:45 GMT
#30
On March 15 2015 09:13 Cheren wrote:
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."

It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.

In the absence of perfect seeding, double elim has obvious advantages if you care about more teams than just the winner. People sometimes talk about the "real finals" in tournaments like the GSL; sometimes the two best players land on one side of the bracket. If that's a problem, double elim fixes it.
ModeratorGood content always wins.
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
Last Edited: 2015-03-15 08:49:07
March 15 2015 08:39 GMT
#31
On March 15 2015 09:27 motbob wrote:
So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.

Umm, yeah, you kinda have to do some kind of statistical test, or at least convince us in some way that your numbers are accurate enough that we feel confident the differences you quote are more than random noise. We can never get the true value by simulation (infinite-accuracy computer simulations with infinite computing time have some practical issues, unfortunately, especially in Excel), but we can often get close enough with enough computing time. It is incredibly important that you make sure you are actually putting in enough computing time to get sufficiently accurate numbers out. Did you?

For example, take your first example of 51.6% vs 52.2% from 10k runs. This seems close enough to flipping a coin, which has an error of around 1/sqrt(N); for 10k runs that is about 1% relative uncertainty, which is about the size of the difference you are seeing. So I think I need some convincing that the differences you are quoting are more than just numerical noise. Let me know if you need help.

Nonetheless, the idea of the simulation is great! I love the approach.
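To put a rough number on that, here is a small Python sketch. It assumes each run is an independent win/lose outcome (so the binomial standard error applies) and that the two formats were simulated independently of each other; the helper names are made up for the example.

```python
import math


def proportion_std_error(p, n):
    """Binomial standard error of an observed win rate p from n independent runs."""
    return math.sqrt(p * (1.0 - p) / n)


def z_score_of_difference(p1, p2, n1, n2):
    """Gap between two independent win rates, measured in combined standard errors."""
    combined = math.sqrt(proportion_std_error(p1, n1) ** 2
                         + proportion_std_error(p2, n2) ** 2)
    return (p2 - p1) / combined


# The 51.6% vs 52.2% figures from 10k runs each.
se = proportion_std_error(0.516, 10_000)
z = z_score_of_difference(0.516, 0.522, 10_000, 10_000)
print(f"standard error of one rate: {se:.4f}")
print(f"difference in standard errors: {z:.2f}")
```

Under those assumptions each rate carries an absolute error of about half a percentage point, and the 0.6 point gap is less than one combined standard error, which is why it cannot be told apart from noise at 10k runs.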
MysteryMeat1
Profile Blog Joined June 2011
United States3292 Posts
March 15 2015 10:07 GMT
#32
In competitive sports the advantage is that you have to play fewer games and have an easier path to the finals.
"Cause ya know, Style before victory." -The greatest mafia player alive
Lucumo
Profile Joined January 2010
6850 Posts
March 15 2015 11:40 GMT
#33
On March 14 2015 17:17 motbob wrote:
The conclusion I derive from these results is this: if tournament organizers are concerned solely with creating a format where the best team wins, they should have GF with a 1-0 advantage. But the difference between formats seems small enough that, if I were an organizer, I would just keep doing what spectators want (no advantage).

Nope, team from winners' side should need to win one bo3, team from losers' side two. It's not called "double elimination" for nothing.
eonrulz
Profile Blog Joined March 2013
United Kingdom225 Posts
Last Edited: 2015-03-15 13:45:03
March 15 2015 13:34 GMT
#34
On March 15 2015 17:39 Cascade wrote:
On March 15 2015 09:27 motbob wrote:
So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.

Umm, yeah, you kinda have to do some kind of statistical test, or at least convince us in some way that your numbers are accurate enough so that we feel confident that the differences you quote are more than random noise. We can never get the true value by simulation (infinite accuracy computer simulations with infinite computing time have some practical issues unfortunately. Especially in excel. ), but we can often get close enough with enough computing time. it is incredibly important that you make sure that you actually are putting in enough computing time to get sufficiently accurate numbers out. Did you?

For example, in your first example of 51.6% vs 52.2% from 10k runs. This seems to be close enough to flipping a coin, which will have an error of around 1/sqrt(N), which for 10k runs is 1% relative uncertainty, which is exactly the difference you are seeing. So I think I need some convincing that the differences you are quoting are more than just numerical noise. Let me know if you need help.

Nonetheless, the idea of the simulation is great! I love the approach.


I actually made exactly the same remark on the LiquidDota version of this blog. Errors and standard deviation are important, regardless of how many toys you run, at the very least so we can see how significant it is.

I'd also be interested in seeing the correlation between, say, the Elo difference between the top two teams and the top team's win rate. You'd definitely expect some correlation, but if it's too strongly correlated (or the reverse, I guess), then I'd say that there's a bias there that you'd have to take into account when dealing with the significance of the results. Or do some reweighting in your Monte Carlo. I mean, maybe it's a small thing, but it'd be nice to see.

Edit: my knowledge of statistics comes from particle physics, where we do some weird stuff that isn't necessarily rigorously mathematically correct. And our Monte Carlo samples are often >500k events, and we still worry about statistical uncertainties (not to mention systematics, which might come into play here as part of your Elo definitions). Still want to see the errors, though.
Boop!
Liquid`Drone
Profile Joined September 2002
Norway28677 Posts
March 15 2015 18:04 GMT
#35
On March 15 2015 09:13 Cheren wrote:
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."

It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.


I'm sorry, I actually completely agree that double elimination shouldn't be used for serious competition. But when I started reading about Instant Runoff, it immediately struck me as a pretty sweet voting system. Why does it suck?
Moderator
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
March 15 2015 21:29 GMT
#36
On March 15 2015 22:34 eonrulz wrote:
On March 15 2015 17:39 Cascade wrote:
On March 15 2015 09:27 motbob wrote:
So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.

Umm, yeah, you kinda have to do some kind of statistical test, or at least convince us in some way that your numbers are accurate enough so that we feel confident that the differences you quote are more than random noise. We can never get the true value by simulation (infinite accuracy computer simulations with infinite computing time have some practical issues unfortunately. Especially in excel. ), but we can often get close enough with enough computing time. it is incredibly important that you make sure that you actually are putting in enough computing time to get sufficiently accurate numbers out. Did you?

For example, in your first example of 51.6% vs 52.2% from 10k runs. This seems to be close enough to flipping a coin, which will have an error of around 1/sqrt(N), which for 10k runs is 1% relative uncertainty, which is exactly the difference you are seeing. So I think I need some convincing that the differences you are quoting are more than just numerical noise. Let me know if you need help.

Nonetheless, the idea of the simulation is great! I love the approach.


I actually made exactly the same remark on the LiquidDota version of this blog . Errors and standard deviation are important, regardless of how many toys you run, at the very least so we can see how significant it is.

I'd also be interested in seeing the correlation between say, ELO difference between the top two teams and the top team win rate. You'd definitely expect some correlation, but if its too strongly correlated (or the reverse, I guess), then I'd say that there's a bias there, that you'd have to take into account when dealing with the significance of the results. Or do some reweighting in your monte carlo. I mean, maybe its a small thing, but it'd be nice to see.

Edit: my knowledge of statistics comes from particle physics, where we do some weird stuff that isn't necessarily, rigorously mathematically correct. And our monte carlo samples are often >500k events, and we still worry about statistical uncertainties (not to mention systematics, which might come into play here as part of your ELO definitions). Still want to see the errors, though

Ahaha, I'm an (ex) particle physicist myself. :D Wrote a minimum bias event generator. QCD phenomenology, essentially.

Good to see the particle physics kind of thinking around. Exactly what are you doing? (Did do?) Your location is Switzerland, so I guess LHC?
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-15 21:33:40
March 15 2015 21:32 GMT
#37
On March 16 2015 03:04 Liquid`Drone wrote:
On March 15 2015 09:13 Cheren wrote:
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."

It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.


I'm sorry, I actually completely agree that double elimination shouldn't be used for serious competition. But when I started reading about Instant Runoff, it immediately struck me as a pretty sweet voting system. Why does it suck?


IRV does not pick the Condorcet winner. Here, an example from Wikipedia:

IRV uses a process of elimination to assign each voter's ballot to their first choice among a dwindling list of remaining candidates until one candidate receives an outright majority of ballots. It does not comply with the Condorcet criterion. Consider, for example, the following vote count of preferences with three candidates {A,B,C}:

      35 A>B>C
      34 C>B>A
      31 B>C>A

In this case, B is preferred to A by 65 votes to 35, and B is preferred to C by 66 to 34, hence B is strongly preferred to both A and C. B must then win according to the Condorcet criterion. Using the rules of IRV, B is ranked first by the fewest voters and is eliminated, and then C wins with the transferred votes from B.

In cases where there is a Condorcet Winner, and where IRV does not choose it, a majority would by definition prefer the Condorcet Winner to the IRV winner.


STV (single-transferable vote) does a better job.
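The Wikipedia example above can be checked mechanically. This is a minimal Python sketch, not a full election implementation: it assumes every ballot ranks all candidates, and it breaks last-place ties alphabetically, which is an arbitrary choice made only for the example. The function names are made up here.

```python
from collections import Counter

# (number of voters, ranking from most to least preferred)
ballots = [
    (35, ("A", "B", "C")),
    (34, ("C", "B", "A")),
    (31, ("B", "C", "A")),
]


def irv_winner(ballots):
    """Eliminate the candidate with the fewest first choices until someone has a majority."""
    remaining = {c for _, ranking in ballots for c in ranking}
    while True:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in ballots:
            top = next(c for c in ranking if c in remaining)
            tally[top] += count
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):
            return leader
        # Assumption: ties for last place are broken alphabetically.
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)


def condorcet_winner(ballots):
    """Return the candidate who beats every other head to head, or None if there is none."""
    candidates = sorted({c for _, ranking in ballots for c in ranking})
    total = sum(count for count, _ in ballots)
    for x in candidates:
        beats_all = all(
            2 * sum(n for n, r in ballots if r.index(x) < r.index(y)) > total
            for y in candidates if y != x
        )
        if beats_all:
            return x
    return None


print("IRV winner:", irv_winner(ballots))
print("Condorcet winner:", condorcet_winner(ballots))
```

With these ballots B is eliminated first (31 first choices), B's votes transfer to C, and C wins 65 to 35, even though B beats both A and C head to head.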
Photographer"nosotros estamos backamos" - setsuko
deliberate
Profile Joined November 2009
Germany5 Posts
Last Edited: 2015-03-15 21:50:52
March 15 2015 21:48 GMT
#38
Just an additional remark about the statistics in the final set for a double elimination tournament:

Assume we are running a double elimination bracket where all sets are best-of-threes. In the final set, the winner of the winners bracket and the winner of the losers bracket meet. As pointed out earlier, the consistent choice of format would be a BO3 and, if the participant from the winners bracket loses it, another BO3. A more common choice is a BO5 with a 1:0 advantage for the participant from the winners bracket.

Assuming further that the chance of one competitor winning a game against the other is constant (say team A has a 60% chance of winning every game against B), we can calculate the probabilities for the total sets.

The following graph shows the chance of the team from the winners bracket winning the whole set as a function of its chance of winning an individual game against the team from the losers bracket. The different curves show a standard BO3 and BO5, as well as the double elimination BO3 and the BO5 with winners bracket advantage.

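[Graph: chance of the winners bracket team winning the final set vs. its per-game win chance, for BO3, BO5, double elimination BO3, and BO5 with 1:0 advantage]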

The first observation is that the probability curve of the BO5 with a 1:0 advantage is similar to the double elimination BO3 curve, which makes it a viable choice for the final set in terms of consistency.
The second observation is the huge advantage of the team from the winners bracket. Even with a 40% chance of winning an individual game against the team from the losers bracket, its overall chance of winning is still >50%.
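For anyone who wants to reproduce the curves, here is a short Python sketch of the calculation under the same assumption of a constant per-game win chance p for the winners bracket team. The function names `best_of` and `double_bo3` are made up for this example.

```python
from math import comb


def best_of(n_games, p, wins_a=0, wins_b=0):
    """Chance that team A wins a best-of-n_games set, optionally from a head start.

    p is A's (assumed constant) chance of winning a single game.
    """
    target = n_games // 2 + 1
    need_a = target - wins_a
    need_b = target - wins_b
    # A wins the set iff it takes at least need_a of the next need_a + need_b - 1 games.
    total = need_a + need_b - 1
    return sum(comb(total, k) * p**k * (1 - p) ** (total - k)
               for k in range(need_a, total + 1))


def double_bo3(p):
    """Winners bracket team has to win one of two BO3s; the other team must win both."""
    bo3 = best_of(3, p)
    return 1 - (1 - bo3) ** 2


if __name__ == "__main__":
    print(" p     BO3    BO5    2xBO3  BO5 1:0")
    for pct in range(30, 71, 10):
        p = pct / 100
        print(f"{p:.2f}  {best_of(3, p):.3f}  {best_of(5, p):.3f}  "
              f"{double_bo3(p):.3f}  {best_of(5, p, wins_a=1):.3f}")
```

At p = 0.4 the plain BO3 gives the winners bracket team 35.2%, while the BO5 with a 1:0 start gives about 52.5% and the double BO3 about 58.0%, which is the ">50%" effect described above.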
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-15 22:03:01
March 15 2015 21:58 GMT
#39
Assuming that the chance of winning in a game like Dota is a constant is a very big assumption and one that cannot be made safely unless we are talking about a game that is about to be fixed and intentionally thrown or a card game like blackjack in which strength of hand can be seen and the next cards can be pretty safely predicted.

A team may be more likely to win, but predicting human action is not currently reducible to numbers, much as we would love it to be and much as we try. To pretend that we can is the height of arrogance, and to tell others we can is to lie with statistics.

We can talk about likelihoods, but we must qualify that with a lot of uncertainty. If it is not qualified, it is lying.
Photographer"nosotros estamos backamos" - setsuko
micronesia
Profile Blog Joined July 2006
United States24698 Posts
Last Edited: 2015-03-15 22:05:46
March 15 2015 22:05 GMT
#40
While it's true that chance of winning is variable with time and depends on a variety of factors, it is unrealistic to try to model those variations. An example is calculating the odds of getting a 300 if you know your odds of getting a strike in bowling. When you get to frames 8, 9, 10, you most likely will get nervous (which can be exacerbated depending how the people around you react), affecting how you bowl. Of course, you are also getting more physically tired as the game progresses, and the conditions of the lane (oil) are slowly changing. The surface of your bowling ball(s) is also changing over time. On a given throw, any of those effects can have a positive or negative effect on your likelihood of throwing a strike.

You can use a simplified model and say the odds of getting a 300 are 1% if you throw strikes with a consistent success rate of about 68 percent. If you try to argue that the model does not account fully for the other variables described above, you are correct, but then the only alternative is to say there's no point in doing any calculation. Instead, we perform the calculation anyway and just acknowledge what was and was not modeled. It is still interesting to determine that you need a 68% chance of getting a strike to roll a 300 one game in 100.
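The simplified model is short enough to write out. This Python sketch assumes a perfect game takes 12 consecutive strikes and that every throw is an independent strike with the same probability; the function names are made up for the example.

```python
def perfect_game_chance(strike_rate, strikes_needed=12):
    """Chance of a 300 if every throw is an independent strike with probability strike_rate."""
    return strike_rate ** strikes_needed


def strike_rate_for(target_chance, strikes_needed=12):
    """Constant strike rate needed to bowl a 300 with probability target_chance."""
    return target_chance ** (1.0 / strikes_needed)


print(f"chance of a 300 at a 68% strike rate: {perfect_game_chance(0.68):.4f}")
print(f"strike rate needed for a 1% chance:   {strike_rate_for(0.01):.3f}")
```

Nerves, fatigue, and lane conditions are exactly the things this leaves out, which is the point: the number is still informative as long as that is said up front.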

edit: tank, the edit you made to your post while I was typing seems to already address what I was getting at
ModeratorThere are animal crackers for people and there are people crackers for animals.
deliberate
Profile Joined November 2009
Germany5 Posts
Last Edited: 2015-03-15 22:11:56
March 15 2015 22:07 GMT
#41
I didn't want to reduce the outcome of a match to pure statistics, but that is the part we can calculate. That is why I pointed out all the assumptions made for this evaluation.

But we cannot deny that statistics plays a role. Why is the BO3 format preferred to a single match? Because the better team has a higher success rate.

Edit: We also know from experience that the winners bracket team has a big advantage. Many pointed out that this destroys the pleasure of watching a grand final.
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
Last Edited: 2015-03-15 22:09:00
March 15 2015 22:08 GMT
#42
On March 16 2015 07:05 micronesia wrote:
While it's true that chance of winning is variable with time and depends on a variety of factors, it is unrealistic to try to model those variations. An example is calculating the odds of getting a 300 if you know your odds of getting a strike in bowling. When you get to frames 8, 9, 10, you most likely will get nervous (which can be exacerbated depending how the people around you react), affecting how you bowl. Of course, you are also getting more physically tired as the game progresses, and the conditions of the lane (oil) are slowly changing. The surface of your bowling ball(s) is also changing over time. On a given throw, any of those effects can have a positive or negative effect on your likelihood of throwing a strike.

You can use a simplified model and say the odds of getting a 300 are 1% if you throw strikes with a consistent success rate of about 68 percent. If you try to argue that the model does not account fully for the other variables described above, you are correct, but the only reasonable thing you can do is say there's not point in doing any calculation, then. Instead, we perform the calculation anyway and just acknowledge what was and was not modeled. It is still interesting to determine that you need a 68% chance of getting a strike to roll a 300 one game in 100.

edit: tank, the edit you made to your post while I was typing seems to already address what I was getting at


Yes, while I lean towards saying no one should predict, prediction is fine as long as you are being honest with people about what you are actually doing and what its shortcomings are.
Photographer"nosotros estamos backamos" - setsuko
motbob
Profile Blog Joined July 2008
United States12546 Posts
Last Edited: 2015-03-15 22:46:02
March 15 2015 22:21 GMT
#43
It is not as if the methodology is a black box.

EDIT: The obvious counterargument is that the layman cannot understand the methodology and therefore cannot make a reasoned judgment as to whether to accept the outcome. However, they can read the thread, in which you have called me an incompetent liar. So laymen do have the opportunity to make a reasoned judgment on something like this since they can observe your arguments.
ModeratorGood content always wins.
itsjustatank
Profile Blog Joined November 2010
Hong Kong9154 Posts
March 16 2015 01:38 GMT
#44
The burden is on you when you present statistics as part of advocating a position to also present fully your methodology and the limitations of your design and the constraints in the applicability of your results.
Photographer"nosotros estamos backamos" - setsuko
Orcasgt24
Profile Joined August 2011
Canada3238 Posts
Last Edited: 2015-03-16 03:42:45
March 16 2015 03:39 GMT
#45
Nevermind. I found the answer in the thread.
In Hearthstone we pray to RNGesus. When Yogg-Saron hits the field, RNGod gets to work
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
March 16 2015 03:51 GMT
#46
On March 16 2015 10:38 itsjustatank wrote:
The burden is on you when you present statistics as part of advocating a position to also present fully your methodology and the limitations of your design and the constraints in the applicability of your results.

I agree in principle, but you have to consider the medium he publishes in, and how important the factors he leaves out are.

As this is a gamers' forum, I feel we can forgive him for not going into detail on the possible inaccuracies of the Elo system, for example. His main point is likely not affected by that.

However, presenting very small differences between numbers without mentioning the uncertainty of the numbers is a big deal, as it can significantly change his point (for example to "I've been measuring noise").
ZenithM
Profile Joined February 2011
France15952 Posts
Last Edited: 2015-03-16 07:52:04
March 16 2015 07:47 GMT
#47
This OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair.
Way worse to have an unfair format rather than hurt the viewer's feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best, they're the best anyway...
y0su
Profile Blog Joined September 2011
Finland7871 Posts
March 16 2015 09:18 GMT
#48
On March 16 2015 16:47 ZenithM wrote:
This OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair.
Way worse to have an unfair format rather than hurt the viewer's feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best, they're the best anyway...

Something both you and the OP did was not differentiate between "best team" (going in) and "winner's bracket finalist". Granted, that "should" be the best team, but for the sake of statistics the "best team" doesn't always make the winner's side...

I'm still curious to see how often the highest-Elo team ends up winning if the tournament is single elimination or true double elimination (compared to the bo5 at 0-0 and 1-0).
ZenithM
Profile Joined February 2011
France15952 Posts
Last Edited: 2015-03-16 09:46:55
March 16 2015 09:46 GMT
#49
Yeah, "best" in my post meant "winner's bracket finalist", the only "best" that matters in respect to fairness of the competition.
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
March 16 2015 10:14 GMT
#50
On March 16 2015 18:46 ZenithM wrote:
Yeah, "best" in my post meant "winner's bracket finalist", the only "best" that matters in respect to fairness of the competition.

If you think the only meaningful use of the term "best player" is the winner's bracket finalist, then imho you missed the entire point of the OP. The point is that the best team, as in the team having a larger than 50% probability to beat any other team (and THAT'S a useful definition of best), can end up in the loser's bracket, and should then be given a chance to prove that they are indeed better than the winner's bracket finalist.

If you claim that the winner's bracket finalist is always the best team, then the optimal way to have the best team win is to just give the tournament to the winner's bracket finalist, i.e. single elimination.

I am a bit confused, I have to say. Maybe I just misunderstand you...
y0su
Profile Blog Joined September 2011
Finland7871 Posts
March 16 2015 11:13 GMT
#51
On March 16 2015 19:14 Cascade wrote:
On March 16 2015 18:46 ZenithM wrote:
Yeah, "best" in my post meant "winner's bracket finalist", the only "best" that matters in respect to fairness of the competition.

If you think that's the only meaningful use of the term "best player" is the winner brackets finalist, imho, I think you missed the entire point of the OP. Point is that the best team, as in the team having a larger than 50% probability to beat any other team (and THAT'S a useful definition of best), can end up in the losers bracket, and should then be given a chance to prove that they are indeed better than the winner brackets finalist.

If you claim that the winner brackets finalist is always the best team, then optimal way to have the best team win is to just give the tournament to the winner brackets finalist, ie single elimination.

I am bit confused I have to say, maybe I just misunderstand you...

He was responding to me bringing up that point.
eonrulz
Profile Blog Joined March 2013
United Kingdom225 Posts
March 16 2015 11:38 GMT
#52
On March 16 2015 06:29 Cascade wrote:
On March 15 2015 22:34 eonrulz wrote:
On March 15 2015 17:39 Cascade wrote:
On March 15 2015 09:27 motbob wrote:
So in this context, an appropriate objection isn't "you didn't do a proper statistical test" because we don't care about inferences and P-values here. We can get the true value, or approach it, just by cranking up the number of simulation runs.

Umm, yeah, you kinda have to do some kind of statistical test, or at least convince us in some way that your numbers are accurate enough so that we feel confident that the differences you quote are more than random noise. We can never get the true value by simulation (infinite accuracy computer simulations with infinite computing time have some practical issues unfortunately. Especially in excel. ), but we can often get close enough with enough computing time. it is incredibly important that you make sure that you actually are putting in enough computing time to get sufficiently accurate numbers out. Did you?

For example, in your first example of 51.6% vs 52.2% from 10k runs. This seems to be close enough to flipping a coin, which will have an error of around 1/sqrt(N), which for 10k runs is 1% relative uncertainty, which is exactly the difference you are seeing. So I think I need some convincing that the differences you are quoting are more than just numerical noise. Let me know if you need help.

Nonetheless, the idea of the simulation is great! I love the approach.


I actually made exactly the same remark on the LiquidDota version of this blog . Errors and standard deviation are important, regardless of how many toys you run, at the very least so we can see how significant it is.

I'd also be interested in seeing the correlation between say, ELO difference between the top two teams and the top team win rate. You'd definitely expect some correlation, but if its too strongly correlated (or the reverse, I guess), then I'd say that there's a bias there, that you'd have to take into account when dealing with the significance of the results. Or do some reweighting in your monte carlo. I mean, maybe its a small thing, but it'd be nice to see.

Edit: my knowledge of statistics comes from particle physics, where we do some weird stuff that isn't necessarily, rigorously mathematically correct. And our monte carlo samples are often >500k events, and we still worry about statistical uncertainties (not to mention systematics, which might come into play here as part of your ELO definitions). Still want to see the errors, though

Ahaha, I'm an (ex) particle physicist myself. :D wrote a minimum bias event generator. Qcd phenomenology essentially.

good to see the particle physics kind of thinking around. exactly what are you doing? (Did do?) You location is Switzerland, so I guess LHC?



Oh cool! My master's project was writing an event generator for black hole events at the LHC, which was fun. Now I do experimental stuff, which is far less fun. Yep, working on ATLAS, a little over halfway through my PhD. Looking for SUSY - I don't hold much hope for getting a positive result. xD
Boop!
ZenithM
Profile Joined February 2011
France15952 Posts
March 16 2015 13:19 GMT
#53
On March 16 2015 19:14 Cascade wrote:
On March 16 2015 18:46 ZenithM wrote:
Yeah, "best" in my post meant "winner's bracket finalist", the only "best" that matters in respect to fairness of the competition.

If you think the only meaningful use of the term "best player" is the winner's bracket finalist then, imho, you missed the entire point of the OP. The point is that the best team, as in the team having a larger than 50% probability of beating any other team (and THAT'S a useful definition of best), can end up in the loser's bracket, and should then be given a chance to prove that they are indeed better than the winner's bracket finalist.

If you claim that the winner's bracket finalist is always the best team, then the optimal way to have the best team win is to just give the tournament to the winner's bracket finalist, i.e. single elimination.

I am a bit confused, I have to say; maybe I just misunderstand you...

I think I understand what the OP wants to say. I'm just saying that if you remove the 1-0 advantage, the "best team" (in the sense of "the one who didn't lose") is not rewarded at all for being the best that day, because its opponent has already had the opportunity to lose once, and this same opportunity is denied to the "best".
As I understand it, the OP claims that the statistical difference in the chances that the best team wins (the best gameplay-wise this time) is negligible compared to how badly the 1-0 advantage is perceived by the viewers.
I'm just saying that if you remove that, the tournament becomes unfair, and certainly doesn't deserve to be called "double elimination".
And back in the day, it wasn't even a 1-0 advantage, it was a full Bo5 advantage (back in early MLGs). Now that shit was sad to watch ;D
Markwerf
Joined March 2010
Netherlands, 3728 Posts
March 16 2015 16:10 GMT
#54
Interesting idea. I wasn't sure the conclusion would hold, so I did a few calculations myself.
I calculated the chances of reaching the final for a team, assuming they have a chance P to win a single game vs anyone (and thus 1-P to lose). To determine whether the 1-0 advantage is good or not for the best team, what matters is the relative probabilities of reaching the final by the winner's or the loser's bracket. So I calculated them for tournaments of size 8, 16 and 32 (all matches being bo3). In the first graph the red lines are for tournament size 8, the bumpy one being reaching the final by the loser's bracket and the other one by the winner's bracket. Likewise blue is for 16 teams and green for 32.
Basically, the more rounds you have or the worse the team is, the bigger the relative chance of reaching the final by the loser's bracket. This means that for a bigger tournament, even if you are a dominant team (60% to win a single game against any other team), you are still more likely to enter the final by the loser's bracket than by the winner's bracket.
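
A minimal sketch of the winner's-bracket half of this kind of calculation, assuming independent games with a fixed per-game win chance p, bo3 matches, and a bracket of 2^k teams where reaching the final by the winner's bracket means winning k matches in a row. The function names are illustrative, and the loser's-bracket path is not modelled here because it depends on the exact bracket layout, so this is not a reproduction of the graphs.

```python
def bo3_win_prob(p: float) -> float:
    """Chance of winning a best-of-3 given per-game win chance p."""
    # Win 2-0 (p*p), or 2-1 (two orderings of the first two games, then a win).
    return p * p + 2.0 * p * p * (1.0 - p)

def reach_final_via_wb(p: float, n_teams: int) -> float:
    """Chance of winning every winner's-bracket match (n_teams a power of two)."""
    rounds = n_teams.bit_length() - 1  # log2 for powers of two: 8 -> 3, 16 -> 4, 32 -> 5
    return bo3_win_prob(p) ** rounds

for n_teams in (8, 16, 32):
    print(n_teams, round(reach_final_via_wb(0.6, n_teams), 3))
```

Each extra winner's-bracket round multiplies the chance by the bo3 win probability, so the unbeaten route to the final gets less likely as the bracket grows.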

This also means that unless you are very dominant as a team, the format of the final doesn't matter much. Only for small tournaments (where double elimination is somewhat silly anyway) or if you are very dominant is it really a disadvantage to have the final start 1-0 up. The reason is simply that even the best team often enters the final by the loser's bracket.
The second graph shows the chance to win the whole tournament for an 8-team double elimination if your team has chance P (shown on the X-axis) to win a single game. The red line is with the final starting at 0-0, the blue one with the final starting 1-0 for the WB team.

Basically, for determining the 'fair' winner it hardly matters, only a little if the team is very dominant to begin with.

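[Graph 1: chance of reaching the final by winner's bracket vs loser's bracket, for 8 (red), 16 (blue) and 32 (green) teams]

[Graph 2: chance of winning an 8-team double elimination vs P, with the final starting at 0-0 (red) and at 1-0 for the WB team (blue)]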

As for the discussion about the format, I think double elimination with 1-0 up in the final is fine for most tournaments. It's important for the tournament to be interesting and to be reasonably fair. Too much luck, as in single elimination, can cause random or weak teams to get too far too often, but a fairer system like round robin can last too long. Double elimination gives good chances for the best teams to come out on top, while still having the thrill of elimination. Round robin has useless matches, match throwing and all sorts of other problems; double elimination has exciting matches while still having good chances for a great final.
The 1-0 up in the final is a decent method to give the WB team a bit of an advantage without it being too big. A double bo3 gives a slightly bigger advantage but feels a bit sillier to me. For determining the fairest winner it doesn't matter much how the final is done, and it's not even in a tournament's interest per se to do that. They want viewership and excitement; having the favourite roll over people stinks. You could argue tennis is doing much poorer and soccer is so popular because they are, respectively, too predictable and excitingly unpredictable.
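
To put a rough number on how big the 1-0 head start is within the final itself, here is a small sketch. It assumes independent games, a bo5 final, and that p is the WB team's chance of winning a single game against the LB finalist. The function name is made up for illustration; it only looks at the final, not the whole tournament.

```python
from math import comb

def series_win_prob(p: float, need: int, opp_need: int) -> float:
    """Chance of getting `need` more wins before the opponent gets `opp_need`."""
    q = 1.0 - p
    # The series ends on our last win; before that the opponent has won j games, j < opp_need.
    return sum(comb(need - 1 + j, j) * p**need * q**j for j in range(opp_need))

for p in (0.4, 0.5, 0.6):
    even = series_win_prob(p, 3, 3)   # bo5 starting at 0-0
    ahead = series_win_prob(p, 2, 3)  # bo5 with the WB team starting 1-0 up
    print(f"p={p:.1f}: 0-0 start {even:.3f}, 1-0 start {ahead:.3f}")
```

The same function covers a double bo3 or a full extra series if you chain it, which is one way to compare how large each kind of WB advantage is.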
hariooo
Joined October 2013
Canada, 2830 Posts
March 16 2015 19:33 GMT
#55
You can always tell who has never taken a stats course past the high school level by who talks the loudest with the most strongly held opinions. It's a notoriously difficult field with conclusions drawn from studies usually being quite nuanced and qualified, but conclusions that are significant nonetheless.

Otherwise you get drawn into arguments with people who don't think N=25 is a significant sample size because it "feels low".

On March 16 2015 18:18 y0su wrote:
On March 16 2015 16:47 ZenithM wrote:
This OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair.
Way worse to have an unfair format rather than hurt the viewer's feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best, they're the best anyway...

Something you and the OP did was not differentiate between "best team" (going in) and "winner's bracket finalist". Granted, that "should" be the best team, but for the sake of statistics the "best team" doesn't always make the winner's side...

I'm still curious to see how often the highest-Elo team ends up winning if the tournament is single elimination or true double elimination (compared to the bo5 at 0-0 and 1-0).


If the model only looked at who was the best team going in (statistically the highest WR) and who actually wins, then the team that ends up as the WF is irrelevant. It gets into conditional probability that has nothing to do with the original hypothesis. It would be like flipping a coin 100 times to see if it's a fair coin and checking whether heads was ever flipped 10 times in a row, as if that would be evidence against the hypothesis, even though the only relevant metric is the final number of heads and tails.
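
The coin analogy can be checked directly. Below is a minimal sketch, assuming a fair coin and independent flips, that computes the exact chance of seeing at least one streak of heads of a given length somewhere in a sequence of flips. The function name is illustrative.

```python
def prob_heads_run(n_flips: int, run_len: int) -> float:
    """Chance that a fair coin shows at least one run of run_len heads in n_flips."""
    # state[k] = probability of currently sitting on k heads in a row, with no full run so far.
    state = [1.0] + [0.0] * (run_len - 1)
    hit = 0.0
    for _ in range(n_flips):
        new_state = [0.0] * run_len
        new_state[0] = 0.5 * sum(state)        # tails resets the streak
        for k in range(run_len - 1):
            new_state[k + 1] = 0.5 * state[k]  # heads extends the streak
        hit += 0.5 * state[run_len - 1]        # heads completes the run
        state = new_state
    return hit

print(f"{prob_heads_run(100, 10):.3f}")
```

A streak like that shows up with a fair coin some of the time, so on its own it says little about fairness; the total count of heads is the statistic that actually tests the hypothesis.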
y0su
Joined September 2011
Finland, 7871 Posts
March 17 2015 10:49 GMT
#56
On March 17 2015 04:33 hariooo wrote:
You can always tell who has never taken a stats course past the high school level by who talks the loudest with the most strongly held opinions. It's a notoriously difficult field with conclusions drawn from studies usually being quite nuanced and qualified, but conclusions that are significant nonetheless.

Otherwise you get drawn into arguments with people who don't think N=25 is a significant sample size because it "feels low".

On March 16 2015 18:18 y0su wrote:
On March 16 2015 16:47 ZenithM wrote:
This OP seems to forget that if you start at 0-0 in the finals, your tournament is "double elimination" for all but the best team, which is unfair.
Way worse to have an unfair format rather than hurt the viewer's feelings a little. I understand that you may want to sacrifice fairness for quality of show, especially when it doesn't disadvantage anyone but the best teams, and who gives a fuck about fairness for the best, they're the best anyway...

Something you and the OP did was not differentiate between "best team" (going in) and "winner's bracket finalist". Granted, that "should" be the best team, but for the sake of statistics the "best team" doesn't always make the winner's side...

I'm still curious to see how often the highest-Elo team ends up winning if the tournament is single elimination or true double elimination (compared to the bo5 at 0-0 and 1-0).


If the model only looked at who was the best team going in (statistically the highest WR) and who actually wins, then the team that ends up as the WF is irrelevant. It gets into conditional probability that has nothing to do with the original hypothesis. It would be like flipping a coin 100 times to see if it's a fair coin and checking whether heads was ever flipped 10 times in a row, as if that would be evidence against the hypothesis, even though the only relevant metric is the final number of heads and tails.

Correct, the team that was the WF is irrelevant in the OP's calculations. That's my point. The entire issue with double elimination revolves around how the WF faces elimination in the finals, not "does it give the best team the highest chance to win" (although I am curious about THAT statistic in various formats).

People could easily misunderstand the OP and think that "best team" is synonymous with WF and incorrectly conclude that 0-0 vs 1-0 starts are statistically fair.
sixfour
Joined December 2009
England, 11061 Posts
Last Edited: 2015-03-20 22:02:24
March 20 2015 22:00 GMT
#57
On March 15 2015 09:02 GeckoXp wrote:

Btw, if you're interested in the topic itself, try to google for interviews of Barry Hearn and the PTC Snooker series. He changed tons of professional billiards tournaments to shorter distances (iirc Bo9-Bo17 to Bo7 only). He tries to explain why that is - without any math - and just summarizes it as: "it's the only way to get all games done in a short time frame".


Hearn (who's ruining snooker, by the way) has an issue that doesn't exist in esports: they're limited by logistics. They only have so many tables available and so much time. With esports you can play as many games concurrently as you like, at least until you get to an offline stage. Of course, if you do the online stage correctly, you can allow for the offline games to be of a decent length.

edit:

On March 15 2015 09:13 Cheren wrote:
Every defense of double elimination I've read is tautological. "Double elimination works because teams that lose twice are eliminated." "Double elimination works because everyone gets a second chance."

It's a system with horrible flaws that isn't used in real sports and needs to get out of esports. It is to tournament formats what Instant Runoff is to voting.


Again, it is rarely used in normal sports because of logistical reasons, as detailed above.
p: stats, horang2, free, jangbi z: soulkey, zero, shine, hydra t: leta, hiya, sea
hariooo
Joined October 2013
Canada, 2830 Posts
March 24 2015 20:01 GMT
#58
Note that double elimination is much more reasonable for games where fortunate/unfortunate matchups in terms of characters/races in the bracket are a bigger source of variation, especially fighters or games like SC. DOTA can quite reasonably start doing single elimination brackets because you're given tools to alleviate that through the draft system.
Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.