|
ABSTRACT
On the most recent State of the Game podcast, there was discussion of MLG's extended series rule in their double elimination tournament. This post explores the effects of the extended series rule on tournament outcomes, using a simplified model of players and tournaments. Several tournament formats are explored: round robin, single elimination, double elimination, and double elimination with extended series. Performance is measured by averaging over many simulations, using several distance metrics from the 'ideal ranking' of players. Results show a small but measurable improvement in performance when using the extended series rule; with 64 players in a best-of-three format, the 'best player' wins 1% more often (25% compared to 24%) using the extended series rule than with simple double elimination. However, the improvement from the extended series rule is marginal compared to the overall tournament format; in single-elimination, the best player wins 19% of the time, and in round-robin the best player wins 47% of the time.
1. INTRODUCTION
Skip this section if you are familiar with the debate about MLG's extended series rule.
MLG is the largest Starcraft II tournament in North America, and consequently its tournament format has a large impact on the competitive scene. MLG employs a fairly standard double-elimination tournament format, with each round determined by a best-of-three series. However, MLG has an additional wrinkle called 'extended series', which many people find counter-intuitive. To explain these complexities, let's start with an overview of different tournament formats.
A single elimination tournament is the simplest format that most people are familiar with. Play proceeds in rounds, with all players starting in the same round. Players are then paired in each round and play a series. The winner proceeds to the next round, and the loser is eliminated from the tournament. This format has the advantage of determining a champion in very few rounds (O(log(# of players))), but the disadvantage that bad luck can knock out good players at an early stage.
To help with this problem, double elimination tournaments ensure that any player must lose twice in order to be knocked out of the tournament. This is done by having two brackets: 'winners' and 'losers'. All players begin in the winners' bracket, and after losing once are sent to the losers' bracket. Players in the losers' bracket play each other, as well as all players who join the losers' bracket from the winners' bracket. Therefore, players in the losers' bracket play twice as many series as those in the winners' bracket.
MLG has extended the double elimination format with an 'extended series' rule that is invoked when players meet twice in a single tournament. If players meet in the winners' bracket, and later again in the losers' bracket, then instead of playing a new best-of-three series, their series from the winners' bracket is resumed as a best-of-seven series. Example: If Alice beats Bob 2-1 in the winners' bracket, and they meet again in the losers' bracket, then they will play a best-of-seven series to determine the winner with a starting score 2-1 in favor of Alice. Alice has to win two games to proceed, and Bob has to win three.
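The carry-over arithmetic in the example can be sketched in a few lines of Go. This is only an illustration of the rule as described above; the function name winsNeeded is made up for this sketch and is not taken from MLG's rules or from the simulation source.

```go
package main

import "fmt"

// winsNeeded returns how many more games each player must win in an
// extended series, given the game score of their earlier winners' bracket
// series. The extended series is a best-of-seven, so the target is four wins.
// (Hypothetical helper, named here only for illustration.)
func winsNeeded(aliceWins, bobWins int) (alice, bob int) {
	const target = 4 // first to four wins takes a best-of-seven
	return target - aliceWins, target - bobWins
}

func main() {
	// Alice beat Bob 2-1 in the winners' bracket.
	a, b := winsNeeded(2, 1)
	fmt.Printf("Alice needs %d more wins, Bob needs %d\n", a, b)
}
```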
This rule is intended to avoid some paradoxical outcomes, as well as statistically increase the likelihood that the 'better player' continues in the tournament. It is possible in standard double elimination for Alice to defeat Bob 2-0 in the winners', and Bob to defeat Alice in the losers' 2-1. The "overall series" between Alice and Bob is 3-2 in Alice's favor, but Bob continues and Alice does not.
Similarly, another argument is that double elimination exists in order to give better players a 'second chance' to continue in the tournament when defeated by inferior players, but this logic does not apply when the same players meet again. In this case, it makes more sense (so the argument goes) to extend the series to determine the 'better player'.
Despite these arguments, the extended series has generated controversy because in many instances the tournament setting is very different when the series resumes, and many people find it unentertaining and counter-intuitive.
In particular, the extended series between Liquid`Tyler and PainUser at MLG Dallas demonstrates some of the problems. In their series in the winners' bracket, Liquid`Tyler fell victim to a mistake of the tournament organizers, and was forced to restart a game in which he had a clear advantage. Liquid`Tyler subsequently lost the series 2-0, which some have argued was due to the psychological effect of the game restart. When they later met in the losers' bracket, Liquid`Tyler was at a significant disadvantage, and lost the extended series 2-4, but would have won a best-of-three.
This post is organized into several sections. Section 2 describes how these results were gathered, and the various models used. Section 3 describes the experimental setup. Section 4 presents the results. Section 5 concludes, and Section 6 shows where to follow up on this if you are interested.
1.1 SCOPE
This post is an in-depth analysis of the statistical performance of different tournament formats. It is not concerned with many other important questions, for example:
* What is the purpose of tournaments, beyond determining skill of players?
* Is the extended series rule entertaining?
* Is the extended series rule morally justified?
* Players aren't strictly 'better' or 'worse' than each other -- or, at least, this relationship isn't transitive between players.
* The tournament setting can change when an extended series resumes.
These questions have been, and will continue to be, addressed elsewhere.
2. DESCRIPTION
This post explores the accuracy of several tournament formats, focusing on the impact of the extended series rule. This is done using simulation, running through many thousands of tournaments and comparing the average results. This section describes the player model, tournament model, and accuracy metrics used in the results.
2.1 PLAYER MODEL
Players are modeled using a simple randomized model. The goal is to have players of greater or lesser skill, but have each player vary somewhat in their performance. Players therefore consist of two numbers: mean performance and deviation. Performance for a single player is randomly generated each game, and lies in the range [mean - dev, mean + dev].
The mean performance lies between 0 and 2, and the deviation is always 1. This ensures that the worst player can always beat the best player; however, at the extremes this is unlikely.
A player's performance is calculated as follows:
performance = mean + dev * rand^2 * plusminus
Where rand is a uniformly-distributed number in [0,1] and plusminus is selected from {-1,1} with even probability. This formula concentrates the mass of the probability distribution around the mean, making the better players win more often.
To generate a set of players for a tournament, each player's mean is selected uniformly from [0,2]. This is probably inaccurate -- players' mean performance is likely distributed on a normal curve. The player model is probably the biggest weakness in this study; however, I still believe the first-order effects are well captured in the analysis.
2.2 TOURNAMENT MODEL
The rules for each tournament are faithfully replicated in the simulation; however, there are some modelling choices here as well. The most significant is the seeding of players in each tournament. I have chosen to use the "ideal seeding", as determined by players' mean performance, as the initial seeding for players. This removes a source of inaccuracy from elimination tournaments, and so the results should be taken as an upper bound for their performance.
Four tournament types are considered: single elimination, double elimination, double elimination with extended series, and round robin. The focus of this post is on the effect of extended series, but single elimination and round robin are included in order to give some context for these results.
A round robin tournament is one where every player plays every other. Players are then ranked according to their number of wins. This tournament produces a complete ranking, first through last, and because everyone plays everyone, it is very accurate. The downside is that it requires a lot of games (O((# of players)^2) series in total) and is less exciting than other tournament formats. However, because it is so accurate, it can be used to calibrate the accuracy of elimination tournaments by showing a "speed of light" for tournament efficacy.
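To put numbers on the cost of each format, here is a small Go sketch that counts series for n players. The function names are made up for illustration. The double elimination count assumes the maximum case, where the player coming from the losers' bracket wins the first final and forces a deciding series; that assumption matches the 127 matches quoted for 64 players later in this post.

```go
package main

import "fmt"

// singleElimSeries: every series eliminates exactly one player,
// and everyone but the champion is eliminated.
func singleElimSeries(n int) int { return n - 1 }

// doubleElimSeries: everyone but the champion loses twice, and the
// champion may lose once (assumed here, giving the maximum count).
func doubleElimSeries(n int) int { return 2*n - 1 }

// roundRobinSeries: every pair of players meets once.
func roundRobinSeries(n int) int { return n * (n - 1) / 2 }

func main() {
	for _, n := range []int{64, 128} {
		fmt.Printf("%3d players: single %d, double %d, round robin %d\n",
			n, singleElimSeries(n), doubleElimSeries(n), roundRobinSeries(n))
	}
}
```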
Similarly, single elimination tournaments show the other end of the spectrum. They are very fickle in their results, and show relatively how much of an improvement the extended series rule makes over standard double elimination.
2.3 MEASURING ACCURACY
One of the principal challenges is determining how to measure performance of a tournament -- how can we say that one tournament is "better" than another? The approach taken is to have each tournament produce a ranking of players, first through last, and compare this ranking to the ideal ranking, as determined by players' mean performance.
This produces its own challenges, as elimination tournaments do not strictly produce a ranking. However, taking seeding into account, an elimination tournament does sort players into categories based on how far they made it through the tournament. The ranking of players is determined as players are eliminated from the tournament -- first eliminated places last, and so on.
Three metrics are used to measure performance: winner, depth, and 2^depth.
* The 'winner' metric determines performance based on a very simple, intuitive rule: Did the best player win? This metric is simple, but unfortunately not very useful, because for even moderately-sized tournaments, the best player rarely wins.
* The 'depth' metric determines performance based on how deep each player made it in the tournament. Specifically, the player ranking is divided into groups according to a single-elimination bracket (first, second, top four, top eight, top sixteen, etc.). Then each player's expected placement is calculated based on which group they fall into within the ideal ranking -- the fifteenth-best player should place into the top sixteen. These results are compared against the actual placement from simulation, and the difference from depth for all players is added to produce the final "distance from ideal".
* The '2^depth' metric is similar to the depth metric, however before adding up all of the depth-differences, we first calculate 2^(delta)-1. This is done because, intuitively, it is more significant if the first player is eliminated in the round of 64 than if the 33rd, 34th, 35th, and 36th players make it to the round of 32, but the 'depth' metric calculates these as being equally bad. Essentially, this metric says that big differences in depth are more important than many small differences.
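The three metrics can be sketched in Go as follows. This is one reading of the descriptions above, not the repository's code: the names depthGroup and distances are invented, and taking the absolute value of each depth difference is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math/bits"
)

// depthGroup maps a 1-based placement to its single-elimination group:
// 1 -> 0 (first), 2 -> 1 (second), 3-4 -> 2 (top four), 5-8 -> 3, ...
func depthGroup(place int) int {
	return bits.Len(uint(place - 1))
}

// distances computes the three metrics for one tournament.
// actual[i] is the 1-based placement of the player whose ideal placement
// is i+1. Assumption: depth differences are taken as absolute values.
func distances(actual []int) (winner, depth, expDepth float64) {
	if actual[0] != 1 {
		winner = 1 // the best player did not win
	}
	for i, place := range actual {
		delta := depthGroup(i+1) - depthGroup(place)
		if delta < 0 {
			delta = -delta
		}
		depth += float64(delta)
		expDepth += float64(uint64(1)<<uint(delta)) - 1 // 2^delta - 1
	}
	return winner, depth, expDepth
}

func main() {
	// The best player places third, the second-best wins,
	// the third-best places second, the fourth-best places fourth.
	w, d, e := distances([]int{3, 1, 2, 4})
	fmt.Println(w, d, e)
}
```

Averaging these three numbers over many simulated tournaments gives distances of the kind reported in the results, with lower values meaning a ranking closer to ideal.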
3. METHODOLOGY
Results are gathered by running simulations of one million tournaments and averaging the results for each tournament. It is generally found that the trends in each metric are reflected in the others, except for the 'winner' metric, which is very sensitive to random factors and sometimes fluctuates independently.
4. RESULTS
4.1 OVERVIEW
Because this discussion was inspired by MLG Dallas, the first result to consider is the overall performance of each tournament format in a 128-player, best-of-three tournament:
Format         | Winner | Depth | 2^Depth
---------------+--------+-------+--------
Single         | 0.91   | 52.09 | 110.07
Double         | 0.88   | 48.31 |  89.83
DoubleExtended | 0.88   | 46.01 |  87.42
RoundRobin     | 0.72   | 22.29 |  28.85
Note that these are distance metrics, so lower is always better. For the 'winner' metric, this number indicates the fraction of the time that the best player did not win. So, 1 - 'winner' is the chance of the best player winning the entire tournament.
A slight improvement can be seen from using the extended series in the depth metrics; however, it is marginal compared to the large difference between single elimination and round robin tournaments. These results also indicate that double elimination does perform significantly better than single elimination; however, neither comes close to the performance of a round robin tournament.
4.2 VARYING NUMBER OF GAMES
We can also explore the effect on tournament outcomes when the number of games in each series is varied. (In this case, the extended series is also varied.) These results are graphed below.
These results all show pretty much what one would expect -- using more games in each series improves the accuracy of the tournament format. However, they also show visually that the elimination tournaments all perform similarly, and none approach the accuracy of a round robin tournament. The ordering of performance is very consistent, however: round robin is best, followed by double elimination with extended series, double elimination, and single elimination.
The depth metric doesn't show much separation between the different elimination tournament formats, but the winner and 2^depth metrics both show significant separation between single and double elimination formats. This indicates that the single elimination format produces more big differences in outcome than the double elimination tournament. That is, more often the best player does not win, and more often good players don't make it as far as they should. In this respect, the extended series seems to make very little difference.
4.3 VARYING NUMBER OF PLAYERS
In this section, we compare the effect on accuracy when changing the number of players in the tournament. I have to break methodology here a bit, because I don't have the time to wait for a million simulations of a 512-player round robin tournament to finish. So instead, I ran fifty thousand simulations. Consequently, there is a little more noise in these results.
These graphs don't show anything particularly revealing compared with the last section, but they do confirm that the trends hold over a variety of tournament sizes. Single elimination does worse than double elimination formats, and round robin is much better than the elimination formats. This is particularly true with large numbers of players -- but in this range, it is an unfair comparison, because round robin plays many more games. Most relevant to this post, extended series seems to have minimal effect on results for large numbers of players, particularly when considering 2^depth.
4.4 EFFECT OF EXTENDED SERIES
We now consider the effect of the extended series in isolation. Specifically, how often is the extended series used, and how often does it "correct an injustice" from the winners' bracket?
In this case, we consider a 64-player tournament in double elimination with extended series format. In a standard double-elimination format, 127 matches will be played.
Simulation shows that, on average, 18.8 extended series will be played in a 64-player tournament. This means that 15% of matches, on average, will be rematches of players.
Similarly, of these 18.8 matches, 3.03 of them will result in "corrections". A correction is when the better player loses in the winners' bracket and wins the extended series to continue in the tournament. In 2.17 of the matches, the worse player won in the winners' bracket and won the extended series, meaning the extended series failed to "correct" the result from the winners' bracket.
The worst possible outcome is when the better player wins in the winners' bracket and loses the extended series. The extended series does well here, only introducing 0.55 such results per tournament, or 4% of the extended series.
Considering the disadvantage that the better player has when entering the extended series, it does surprisingly well at correcting these results, succeeding 58% of the time. At the same time, it only introduces bad results 4% of the time.
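The percentages in this section can be rechecked from the per-tournament averages with a few lines of Go. The 58% is corrections divided by all extended series in which the worse player won the first meeting. For the 4%, this sketch assumes the base is the extended series in which the better player won the first meeting; that reading reproduces the figure, whereas dividing 0.55 by all 18.8 extended series gives about 3%. The function name rates is made up for this sketch.

```go
package main

import "fmt"

// Per-tournament averages quoted above (64 players, extended series).
const (
	totalMatches = 127.0
	extended     = 18.8 // extended series played
	corrected    = 3.03 // better player lost first, won the extended series
	notCorrected = 2.17 // worse player won both times
	injustice    = 0.55 // better player won first, lost the extended series
)

// rates returns the share of matches that are rematches, the correction
// rate, and the injustice rate.
// Assumption: the injustice rate is taken over extended series in which
// the better player won the first meeting.
func rates() (rematch, correction, injusticeRate float64) {
	rematch = extended / totalMatches
	correction = corrected / (corrected + notCorrected)
	betterWonFirst := extended - corrected - notCorrected
	injusticeRate = injustice / betterWonFirst
	return rematch, correction, injusticeRate
}

func main() {
	r, c, i := rates()
	fmt.Printf("rematches %.1f%%, corrections %.1f%%, injustices %.1f%%\n",
		100*r, 100*c, 100*i)
}
```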
I am tempted to conclude that extended series is successful at letting the better player continue in the tournament; however, data is missing to compare against a standard double elimination tournament. A good area of extension for this study would be measuring the outcome if a regular best-of-three were done, and comparing its correction/injustice rate to the extended series. The ratio from the extended series (58%/4%) seems pretty hard to beat -- I would expect a best-of-three to allow the better player to proceed more often, but have a much higher injustice rate.
5. CONCLUSION
When considering individual matches, the extended series appears to do well at making sure the better player continues in the tournament. In this sense, it fulfills its purpose.
But when looking at the larger picture, it appears that the extended series has little effect on the outcome. While the extended series rule does slightly improve outcomes, these differences are not particularly significant compared to the overall double elimination format.
What is clear from these results is that both elimination formats leave much to be desired when compared to a round robin tournament. Although round-robin is impractical due to its large number of games, other tournament formats such as swiss-style or those with rounds play deserve further consideration.
Another future area of work is considering the performance of a points-based system of several double elimination tournaments, like MLG employs for its full Starcraft II season.
6. SEE ALSO
Wikipedia on tournament formats:
http://en.wikipedia.org/wiki/Single-elimination_tournament
http://en.wikipedia.org/wiki/Swiss_style_tournament
6.1 SOURCE CODE
The source code is available via git at:
git://github.com/nathanbeckmann/Tournament.git
It is written in Go. Have fun!
EDIT 1: Corrected problem with injustice rate. It is 4%, not 3%.
EDIT 2: Fix example in intro (corrected by Cyber_Cheese).
|
I think IdrA summed it up quite well in the State of the Game. Statistics aside, it goes like this hypothetical they used:
IdrA makes a stupid mistake and gets knocked out by NoNy in an early round. 3 rounds later, NoNy makes a silly mistake that idrA wouldn't have made. They meet in the losers bracket, they've both made silly mistakes that the other one wouldn't have made. Why should IdrA be penalized?
|
On November 12 2010 10:00 Durn wrote: I think IdrA summed it up quite well in the State of the Game. Statistics aside, it goes like this hypothetical they used:
IdrA makes a stupid mistake and gets knocked out by NoNy in an early round. 3 rounds later, NoNy makes a silly mistake that idrA wouldn't have made. They meet in the losers bracket, they've both made silly mistakes that the other one wouldn't have made. Why should IdrA be penalized?
I agree. Even more interesting, let's say (hypothetically) that...
IdrA > Tyler, Tyler > SeleCT, SeleCT > IdrA
There is no "best player" in this group, and now their seeding basically determines who faces who first, and therefore which of them has an advantage in the extended series.
I'd call this one of those things that falls outside the scope of my post.
|
awesome awesome study. Thanks for the hard work. Good to know that extended series does have some value... although minimal.
|
On November 12 2010 10:00 Durn wrote: I think IdrA summed it up quite well in the State of the Game. Statistics aside, it goes like this hypothetical they used:
IdrA makes a stupid mistake and gets knocked out by NoNy in an early round. 3 rounds later, NoNy makes a silly mistake that idrA wouldn't have made. They meet in the losers bracket, they've both made silly mistakes that the other one wouldn't have made. Why should IdrA be penalized?
IdrA's argument is one that has been explicitly excluded from the scope of this analysis (that the "better" player might not be transitive).
|
I just took a closer look at all your work, and that's actually really awesome. The statistics do make sense when put out in such an organized manner.
I appreciate your hard work, I hope this will get some eyes from MLG haters. I still disagree with it at the core of its concept, but in terms of your statistics, the math points in the right direction.
|
In a higher level arena the better player isn't always transitive. That is because there are too many variables that must be taken into consideration, such as race matchups, maps, player conditioning, etc.
|
Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
|
On November 12 2010 10:00 Durn wrote: I think IdrA summed it up quite well in the State of the Game. Statistics aside, it goes like this hypothetical they used:
IdrA makes a stupid mistake and gets knocked out by NoNy in an early round. 3 rounds later, NoNy makes a silly mistake that idrA wouldn't have made. They meet in the losers bracket, they've both made silly mistakes that the other one wouldn't have made. Why should IdrA be penalized?
IdrA's argument is irrelevant to the actual statistics or logic in the argument.
Extended series exists to make the contest between two players fairer; how these guys play against a third player has no effect.
Also, in his argument, how does he know he wouldn't have made a stupid mistake if he were to advance over NoNy?
|
On November 12 2010 10:07 Shakes wrote: IdrA's argument is one that has been explicitly excluded from the scope of this analysis (that the "better" player might not be transitive).
Not really.
The nontransitivity is taken into account since performance was measured using a mean +/- a random number. And that allows for the possibility that player A will beat player B, player B beats player C, and player C beats player A.
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
This is definitely possible, you would need some kind of relation for each player to every other. The problem with this is you would end up with a lot of choices in terms of modeling -- because the relationship, while not perfectly transitive, is pretty close. (That is, although the cream of the crop might be extremely intransitive, they are definitely better than most of the other players). Therefore the relation you come up with shouldn't be completely random. This kind of data would probably have to be pulled from actual player statistics, which would actually be a huge improvement to the study overall.
But until that happens, I think keeping it simple is better because you avoid a lot of complexities that don't necessarily improve the results.
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
|
I think your study would only be meaningful if people actually assumed a bo7 series does not determine the best player as well as a bo3 series.
|
On November 12 2010 10:23 paralleluniverse wrote: Not really. The nontransitivity is taken into account since performance was measured using a mean +/- a random number. And that allows for the possibility that player A will beat player B, player B beats player C, and player C beats player A.
In this sense, the intransitivity is a random fluctuation, and if you played a long enough series you would expect it to go away.
But in reality, there probably are cases of "true intransitivity", where people's play styles match up in weird ways so that A > B, B > C, and C > A.
|
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Although reality isn't exactly transitive, it is pretty close.
That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
|
On November 12 2010 10:26 zulu_nation8 wrote: I think your study would only be meaningful if people actually assumed a bo7 series does not determine the best player as well as a bo3 series.
I'm not really sure what you are responding to ...
The point of this is to determine exactly how much of an effect extended series has, both for individual matches and for an entire tournament. I'm pretty sure I haven't seen anyone talk about this with real numbers to back up what they are saying
|
On November 12 2010 10:27 nzb wrote: In this sense, the intransitivity is a random fluctuation, and if you played a long enough series you would expect it to go away. But in reality, there probably are cases of "true intransitivity", where people's play styles match up in weird ways so that A > B, B > C, and C > A.
But these *are* random fluctuations in real life. If A > B > C, we would expect that A will beat B will beat C most of the time, and on some few random occasions for this not to hold. I think your model captures this fact well.
Although I wonder why you used such an archaic setup to simulate player performance instead of just simulating from a normal distribution, which can be done in 1 line in any statistical package, and would probably be more correct.
|
On November 12 2010 10:30 nzb wrote: Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
Well non-transitivity can occur especially if you are comparing between a non-team mate and 2 team mates.
Incontrol might be better than machine because he knows his teammate well, but machine might be better than Painuser but Painuser is better than Incontrol. (random names)
So it's not necessarily clear in reality. =/
|
On November 12 2010 10:32 paralleluniverse wrote: But these *are* random fluctuations in real life. If A > B > C, we would expect that A will beat B will beat C most of the time, and on some random occasions for this not to hold. I think your model captures this fact well. Although I wonder why you used such an archaic setup to simulate player performance instead of just simulating from a normal distribution, which can be done in 1 line in any statistical package, and would probably be more correct.
Haha, touche. The reason is that I did this in order to have something fun to code in Go, which I've wanted to learn for a while, so doing it in Mathematica or R or something would have defeated my purpose.
|
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
On November 12 2010 10:30 nzb wrote: Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
On November 12 2010 10:34 scion wrote: Well non-transitivity can occur especially if you are comparing between a non-team mate and 2 team mates. Incontrol might be better than machine because he knows his teammate well, but machine might be better than Painuser but Painuser is better than Incontrol. (random names) So it's not necessarily clear in reality. =/
I agree completely, but I think these are second order effects, and a simple model will capture the main effects that one is interested in. That is, I suspect non-transitivity is "lost in the noise".
|
Why can't they just use double elimination into the Ro8 then do round robin? That would keep the number of games played low but make placement among the top 8 players more accurate.
|
I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
|
There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
|
On November 12 2010 10:37 Zzoram wrote: Why can't they just use double elimination into the Ro8 then do round robin? That would keep the number of games played low but make placement among the top 8 players more accurate.
People usually tend to do the opposite, funnily enough...
Most professional sports have a round-robin regular season, followed by a single-elimination tournament for the playoffs.
FIFA world cup has round-robin-based group play, followed by single elimination.
The main reason why people prefer elimination tournaments for getting a champion is that it's more exciting and tense, because you could be eliminated for good at any point.
Also, round robin tournaments have the HUGE flaw that someone can secure #1 spot with several games left in the tournament -- this is usually unacceptable for determining a champion because then the final games of the season are literally irrelevant. Swiss-style tourneys have the same issue. Wikipedia has good info on this.
|
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
I agree, but I honestly have no idea how to model this, so I just threw it out from the beginning. Take it or leave it. :/
That's what the "Scope" section was there for.
|
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
Oh, and I would have done swiss, but I was getting bored. It'd be awesome if you did it.
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
On November 12 2010 10:30 nzb wrote: Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
I disagree that reality is close. For instance pre-tourny idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well what he said was that Sen wasn't that good at ZvT I believe, but great at other matchups, but because it was inevitable for him to run into decent Ts he had a very low chance of winning. So for instance we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary and people shouldn't take offense if they feel I got the inequality sign the wrong way, they are just for demonstration).
If Sen could beat any protoss or zerg 7-0 95% of the time, but had 50% chance against Ts, would it then be a failure or success for him to win? If you allow race-specific skill-levels (or even player-player specific skill-levels) then we suddenly can't define what a good outcome for the tournament is since we can't say whether hypothetical Sen should rank higher than hypothetical Idra who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
|
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
By the same logic, I can say that if it were a clean Bo3, the guy coming from winners would be under huge pressure because he beat him once but faces possible elimination by a guy he already beat.
There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
You can never have the perfect stats model nor should you even attempt that. The best/most useful stat model is the simplest model. I think the OP did a great job in showing extended series is only slightly more accurate/fair than just simple double elimination, questioning its need for existence considering all other factors.
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
On November 12 2010 10:30 nzb wrote: Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
On November 12 2010 10:49 rasnj wrote: I disagree that reality is close. For instance pre-tourny idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well what he said was that Sen wasn't that good at ZvT I believe, but great at other matchups, but because it was inevitable for him to run into decent Ts he had a very low chance of winning. So for instance we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary and people shouldn't take offense if they feel I got the inequality sign the wrong way, they are just for demonstration).
If Sen could beat any protoss or zerg 7-0 95% of the time, but had 50% chance against Ts, would it then be a failure or success for him to win? If you allow race-specific skill-levels (or even player-player specific skill-levels) then we suddenly can't define what a good outcome for the tournament is since we can't say whether hypothetical Sen should rank higher than hypothetical Idra who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
My point is: do you think Sen (or any player) has 50% against ALL terrans? No, he has 50% against T's above a certain caliber. I don't think there is any player that is 95% against good protoss and 50% against terrible terrans.
So, you will see this effect at the top of the tournament. But for very large tournaments, this effect won't be particularly significant. And my results include 512-player tournaments, which show the same trends, so I think the results can be considered although there are problems with the models.
EDIT: I can't speak English.
|
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
On November 12 2010 10:43 nzb wrote: I agree, but I honestly have no idea how to model this, so I just threw it out from the beginning. Take it or leave it. :/ That's what the "Scope" section was there for.
Well, in the tournament do the following. Give each player 3 ratings and deviations and assign each player a random race.
So Player
- race = 0 (0, 1 or 2)
- mean = [1500, 1400, 1600] (mean against each race)
- dev = [100, 100, 100] (deviation against each race)
Then make some maps. On each map do the following:
Map1
- bonus = [0, 0, 100] (bonus mean for each race)
Add that value to the race's mean if they are playing on that map. (It doesn't have to be overly complex, but this works as a generalization.) And then during the match either have a static map pool, or have the loser or lower seed choose the map that gives them the most extra points against their opponent.
It is highly likely that the end result will deviate more based on the choice of maps (check results when giving a race a heavily favored map pool) compared to dynamic maps or more balanced maps.
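A rough sketch of the player and map model described above, in Python rather than Go. It is only one possible reading of the proposal: the class and function names are hypothetical, performance is assumed to be a normal draw, the first map of a series is assumed to be fixed, and "most extra points against their opponent" is interpreted as the largest bonus difference between the two races.

```python
import random
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    race: int    # 0, 1 or 2
    mean: list   # mean rating against each opposing race
    dev: list    # deviation against each opposing race


@dataclass
class GameMap:
    name: str
    bonus: list  # bonus added to the mean of each race on this map


def performance(player, opponent, game_map, rng):
    """Draw one game's performance for `player` against `opponent`."""
    mu = player.mean[opponent.race] + game_map.bonus[player.race]
    return rng.gauss(mu, player.dev[opponent.race])  # assumption: normal draw


def pick_map(chooser, opponent, maps):
    """Map with the largest bonus edge for `chooser` over `opponent`."""
    return max(maps, key=lambda m: m.bonus[chooser.race] - m.bonus[opponent.race])


def play_series(a, b, maps, best_of, rng):
    """Best-of-N series; the loser of each game picks the next map."""
    needed = best_of // 2 + 1
    wins = {a.name: 0, b.name: 0}
    game_map = maps[0]  # assumption: the first map is fixed
    while max(wins.values()) < needed:
        a_won = performance(a, b, game_map, rng) >= performance(b, a, game_map, rng)
        winner, loser = (a, b) if a_won else (b, a)
        wins[winner.name] += 1
        game_map = pick_map(loser, winner, maps)
    return a if wins[a.name] == needed else b


if __name__ == "__main__":
    rng = random.Random(0)
    p1 = Player("P1", 0, [1500, 1400, 1600], [100, 100, 100])
    p2 = Player("P2", 2, [1500, 1500, 1500], [100, 100, 100])
    maps = [GameMap("Map1", [0, 0, 100]), GameMap("Map2", [0, 0, 0])]
    print(play_series(p1, p2, maps, 3, rng).name)
```

Swapping `pick_map` for a fixed rotation would give the "static map pool" variant, so the two could be compared over many simulated series.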
|
On November 12 2010 10:52 scion wrote:
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
By the same logic, I can say that if it were a clean Bo3, the guy coming from winners would be under huge pressure because he beat him once but faces possible elimination by a guy he already beat.
There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game, however, i would argue that each player has a different performance value on a different set of maps and matchups. Idra might be 75% on Metalopolis in zvp, but i bet he's like 20% on kulas ravine in the matchup. This is why map choice plays a significantly bigger factor than what is going on here and why each series should be considered a single event. Other than what you showed through your simulation (which is correct) tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or in chess, compared to a game with a high deviation (such as sc2 or football). This is why in sports like football, predictions about who will win highly depends on who has home field advantage. For fun you should throw in a swiss tournament style to see the results. If you want i can write up some code, although i don't know the go language i'm sure i can learn it in an hour or two.
You can never have the perfect stats model nor should you even attempt that. The best/most useful stat model is the simplest model. I think the OP did a great job in showing extended series is only slightly more accurate/fair than just simple double elimination, questioning its need for existence considering all other factors.
Very true, i agree that his thread shows that it doesn't help too much, but there is another issue that is unchecked, which is that the map pool is more likely to influence results than the choice of tournament format.
Which also leads me to believe that if MLG truly wants to punish a player, then if they lose, they forfeit the right to remove a map, and if they need punishment again, let the opponent choose another map to eliminate. That is a sure way to punish players.
|
I just wanted to say thank you for doing all of this leg work. I feel like this is what I've had going through my head since listening to the most recent State of the Game, but I had neither the means, talent, nor time to put it all together in such a great post. It makes me just as happy that the shortcomings (race matchup, psychology, maps) are readily explained as such. It makes for a much more civil discussion. Anyway, this post makes me happy, so thank you.
|
On November 12 2010 10:23 paralleluniverse wrote: The nontransitivity is taken into account since performance was measured using a mean +/- a random number. And that allows for the possibility that player A will beat player B, player B beats player C, and player C beats player A.
As the OP noted, the statistics do not take that case into account. The reason is that the way you model a random process statistically is inherently different than how you treat a causal process. The math's not the same.
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
On November 12 2010 10:30 nzb wrote: Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
On November 12 2010 10:49 rasnj wrote: I disagree that reality is close. For instance pre-tourny idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well what he said was that Sen wasn't that good at ZvT I believe, but great at other matchups, but because it was inevitable for him to run into decent Ts he had a very low chance of winning. So for instance we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary and people shouldn't take offense if they feel I got the inequality sign the wrong way, they are just for demonstration).
If Sen could beat any protoss or zerg 7-0 95% of the time, but had 50% chance against Ts, would it then be a failure or success for him to win? If you allow race-specific skill-levels (or even player-player specific skill-levels) then we suddenly can't define what a good outcome for the tournament is since we can't say whether hypothetical Sen should rank higher than hypothetical Idra who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
On November 12 2010 10:58 nzb wrote: My point is: do you think Sen (or any player) has 50% against ALL terrans? No, he has 50% against T's above a certain caliber. I don't think there is any player that is 95% against good protoss and 50% against terrible terrans.
So, you will see this effect at the top of the tournament. But for very large tournaments, this effect won't be particularly significant. And my results include 512-player tournaments, which show the same trends, so I think the results can be considered although there are problems with the models.
EDIT: I can't speak English.
No, I don't believe that. That was an extreme example to illustrate an issue with the model. But I do believe there are players that are considerably better at some matchups than others. I do believe there exist at least some triples A, B, C of players such that A is favored against B, B is favored against C, and C is favored against A.
For example Huk's PvP is 70%, but his PvZ is 50%. Kiwikaki's PvP is 54.17%, but his PvZ is 63.64%.
In Huk vs Kiwikaki, Huk is probably favored (their current score is 6-2 in favor of Huk), but against a good zerg Kiwikaki has a considerably higher chance of passing him. [Statistics from TLPD, so only high-profile matches included, but the general trend of their results is correct I believe]
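To make the "triples A, B, C" idea concrete, here is a small Python sketch that searches a table of head-to-head win probabilities for such cycles. The table, the player labels and the helper names are all hypothetical; none of the numbers come from TLPD or from the OP's simulation.

```python
from itertools import permutations


def favored(p, a, b):
    """True if player `a` wins more than half the time against `b`."""
    return p[a][b] > 0.5


def intransitive_triples(p):
    """All cycles A > B > C > A, each reported once in a canonical rotation."""
    found = set()
    for a, b, c in permutations(sorted(p), 3):
        if favored(p, a, b) and favored(p, b, c) and favored(p, c, a):
            found.add(min((a, b, c), (b, c, a), (c, a, b)))
    return sorted(found)


if __name__ == "__main__":
    # Made-up head-to-head win probabilities: p[x][y] is the chance x beats y.
    p = {
        "A": {"B": 0.6, "C": 0.4},
        "B": {"A": 0.4, "C": 0.6},
        "C": {"A": 0.6, "B": 0.4},
    }
    print(intransitive_triples(p))
```

A model with a single rating per player can never produce such a table, because there the favorite in every pairing is simply the player with the higher mean. Running this check over real head-to-head records would be one way to see how common "true" intransitivity actually is.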
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 rasnj wrote: What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Well, that is precisely what I am wondering. Is it possible that rather than specifying a 'goal' for the tournament, you instead want to find the tournament framework that minimises the impact of the order in which the players face each other (i.e. ideally it shouldn't matter if Idra plays Huk in the first game or HDStarcraft) when determining the winner (or rankings, or whatever goal you want to pick for the tournament).
The point of going to these lengths is to address exactly the difficulty you talk about - it's not realistic to impose a total order on the skill levels of the players (though it's obviously not terribly far out; we just want to know if it will affect the analysis or not).
|
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
On November 12 2010 10:26 nzb wrote: This is definitely possible, you would need some kind of relation for each player to every other. The problem with this is you would end up with a lot of choices in terms of modeling -- because the relationship, while not perfectly transitive, is pretty close. (That is, although the cream of the crop might be extremely intransitive, they are definitely better than most of the other players). Therefore the relation you come up with shouldn't be completely random. This kind of data would probably have to be pulled from actual player statistics, which would actually be a huge improvement to the study overall. But until that happens, I think keeping it simple is better because you avoid a lot of complexities that don't necessarily improve the results.
All agreed, but I'm trying to think if there's some simple way of making a strong argument that the current analysis is unaffected by these complications, given our new rephrasing of the point of interest.
|
First and foremost, this is an excellent post, and thanks for putting it out there in such a formal manner. I definitely dig it.
What I wonder is this: instead of the extended series in its current format, where the rematch goes on to a best of 7 with the previous results carrying over, what if it just became a clean best of 7? This addresses the problem of each match being a separate event, but also takes into account that the better player should win.
I don't think anyone doubts that the winner of a best of 7 is a fairly good indicator of who is the better player. This way, too, you're not stuck in a situation where you may be one or two games behind and have to overcome a huge psychological road block as well as a clear disadvantage in terms of having to win more games than the other person. I think this would help mitigate the issues of getting unlucky when it comes to the map pools as well. If you're 2 games behind and the next two maps are your worst maps, that is going to be a tough break.
|
The OP is very interesting albeit somewhat distracting to those who don't understand the point of the extended series. Because every tournament has a 1st, 2nd (...) and last place, every tournament employs some kind of sorting algorithm to determine the final standings.
Your analysis shows what I gave up trying to argue in the extended series thread: the extended series (however marginally) improves the confidence in the final result. Amusingly in the SotG cast the only top player to demonstrate a modicum of understanding was Tyler. I was really surprised Idra, Day9, and JP were completely clueless with regards to the internal logic behind the tournament format. Especially a player like Idra since the extended series would benefit him the most.
Also serious lol @ Day9 for arguing against a specific tournament format based on his "feelings." I love Day9 but sometimes he is wrong.
EDIT: Great OP btw nzb, I appreciate your work.
|
I do a lot of tabletop gaming, and we use swiss pairing for our tournaments.
I am not a big math guy, but I'm under the impression that swiss pairing would not work well for Starcraft, because it operates best when you can gain a varying number of points for a victory. This can make it so in a four round tournament I've literally won the tournament already at the end of round 3, as you've pointed out, but it also means that if I'm on the top table in round 4, and I win but only slightly, a person on the second table who wins big can still place higher than me.
I'm not sure how you would set up a points system for Starcraft games that people would be able to agree on or that would really be fair.
On a different note, what I don't really like about the extended series rule, which is not part of your study, is that it makes the final games of the tournament less exciting.
Let's say for example Jinro beats TT1 in the winners semifinals 2-0, but then TT1 comes back to play him in the finals. We now have an extended series for the finals, but Jinro is already up 2-0. This makes the statistical chance of him winning the tournament from that point on much higher, and no matter how exciting the games are, the fact that I am aware of this makes it less fun for me.
Further, I'd imagine that the chances of any given game in the tournament being an extended series becomes much higher the longer the tournament goes on and the more people are eliminated. It'd be nice to see some numbers on what the chances of the finals being an extended series compared to the lower rounds would be. I'd imagine they're much higher.
These two problems together make for a tournament end that's very anticlimactic, in my opinion.
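To put a rough number on the 2-0 head start mentioned above, here is a small Python sketch. It assumes every game is won independently with the same probability `p`, which is a simplification and not necessarily how the OP's simulation works; the function name is made up for the example.

```python
from functools import lru_cache


def series_win_prob(p, need_a, need_b):
    """Chance that A gets `need_a` more wins before B gets `need_b` more,
    if A wins each game independently with probability `p`."""

    @lru_cache(maxsize=None)
    def go(a, b):
        if a == 0:
            return 1.0  # A has clinched the series
        if b == 0:
            return 0.0  # B has clinched the series
        return p * go(a - 1, b) + (1 - p) * go(a, b - 1)

    return go(need_a, need_b)


if __name__ == "__main__":
    # Fresh best of 3 between equal players: both need 2 wins.
    print(series_win_prob(0.5, 2, 2))
    # Extended series: best of 7 with a 2-0 lead, so A needs 2 more wins and B needs 4.
    print(series_win_prob(0.5, 2, 4))
```

Between two exactly equal players the fresh best of 3 is a coin flip, while the player who is already up 2-0 in the best of 7 wins about 81% of the time (26 out of 32 equally likely ways the remaining games can go).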
|
nzb,
Bravo! This is awesome! I was listening to the state of the game yesterday and after hearing Liquid'Tyler's explanation I was sure this could easily be proved statistically. I was going to go off and write a program to do exactly this. I'm glad you beat me to it as it saves a great amount of time. Great presentation too!
The discussion on the state of the game podcast got me totally interested in tournament theory. For anyone else interested I ran into another great discussion about tournament theory here:
http://www.vrbones.com/2009/07/designing-tournament-part-1.html
I agree with Liquid'Tyler that the primary goal of a tournament should be to find the best player. Or to quote from the above site:
The primary goal of a tournament is to provide an objective method for finding the competitor with the highest true skill.
I think it's interesting that people dislike the extended series rule because it's not used in other tournaments. A better question might be why isn't the extended series rule used more often in other tournament formats? It clearly does a better job at determining more accurate results.
One thing that I'm really curious about is what the optimum tournament format would be. For example, say you wanted the most accurate results after playing a total of 'N' games: what's the best way to do that? What about tweaking the format so that the first round of games is a Bo1 and making it a triple elimination or quadruple elimination tournament? Would it be better to have fewer games played across a wider variety of players? Also, how efficient is the pool play system that the GSL tournament uses? It seems like a lot of very good players failed to qualify out of the pool play.
Anyways, great post nzb!
|
On November 12 2010 11:13 Dragar wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Well, that is precisely what I am wondering. Is it possible that rather than specifying a 'goal' for the tournament, you instead want to find the tournament framework that minimises the impact of the order in which the players face each other (i.e. ideally it shouldn't matter if Idra plays Huk in the first game or HDStarcraft) when determining the winner (or rankings, or whatever goal you want to pick for the tournament). The point of going to these lengths is to address exactly the difficulty you talk about - it's not realistic to impose a total order on the skill levels of the players (though it's obviously not terribly far out; we just want to know if it will affect the analysis or not).
This is a good idea. I think I will try to see if I can work out anything from this point of view tomorrow (3:30am here). Does anyone know where we can find the official mlg rules for map sets, rankings, brackets, map elimination etc.? Would like to analyze it using the mlg format.
|
My general thought is this. At this skill level, in best of series, talent shakes out, and the more games 2 players play, the better sample is generated to truly determine the better player. Looking at an extended series as a bo7 (ie, removing the time between games element), it does a better job of getting a winner, and avoids ambiguous results and prevents "incorrect" results.
For example, there is simply no way Idra should lose to HD in a best of 7 if he is much more talented. Hence, if they play an early set and HD wins it 2-1, the reasonable expectation is that, if Idra is truly more talented, the second match-up should see a 2-0 or 2-1 result for Idra, making the combined score 3-2 or 3-3. If we assume that the vastly better player wins the best of 7, then even if they had to play an extra game, Idra should win and move on. That means Idra should win in either an extended series or a second bo3; there is no difference between the two. The extended series also avoids the problem of Idra winning the first set and losing the second, which would allow the worse player to move on even if Idra led the combined score 3-2. Again, this is working under the assumption that the vastly more talented player should win every best of 7.
However, in even games, the extended series does a better job of truly showing who the better player is by increasing the number of games. For example, if Idra and Nony play, most people would say the skill difference is negligible: whoever wins a bo7 would be the one who happened to be playing better that day, had a special build, or something like that. Say Idra wins the first series and they play again. Take the normal, non-extended format first. If Idra wins again, he was the better player that day, no problems; he won 4 of a possible 7 games. However, should Nony win with the same score as the first series, say both were 2-0, we have no way of telling who the better player was that day, just that Nony happened to win later on. Overall, with a 2-2 score, there is no actual result or closure; ambiguity remains. But with an extended series, playing out the rest, someone has to definitively prove they were playing better that day. There is no ambiguous result: someone wins the bo7.
Now let's assume 2 players of slightly different skill levels, say Idra and Machine. Idra should win a series the majority of the time. In this case, the more games the 2 play, the more the match shifts in Idra's favor: as the better player, given infinite matches, he should win the majority. So increasing the number of matches they play increases the chance the better player moves on.
From a purely analytical standpoint, extending the series and playing more games simply increases the better player's chance to win the series.
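To put rough numbers on "more games favor the better player", here is a minimal sketch. It assumes every game is an independent coin flip with a fixed per-game win chance `p` (a simplification, and the value 0.6 is purely illustrative, not taken from any real matchup). The function name `series_win_prob` is made up for this example.

```python
from math import comb

def series_win_prob(p: float, best_of: int) -> float:
    """Chance that a player with per-game win chance p wins a best-of-N series.

    Assumes independent games with a constant p (hypothetical model).
    """
    need = best_of // 2 + 1  # wins required to take the series
    # The series is won on the game that brings the `need`-th win,
    # after the opponent has taken k games (k < need).
    return sum(
        comb(need - 1 + k, k) * p**need * (1 - p) ** k
        for k in range(need)
    )

# Slightly better player (60% per game, an assumed value) in longer series.
for n in (1, 3, 5, 7):
    print(f"bo{n}: {series_win_prob(0.6, n):.3f}")
```

Under these assumptions the better player's chance grows with series length, and two evenly matched players stay at 50% no matter how long the series is.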
|
On November 12 2010 10:32 nzb wrote:
On November 12 2010 10:26 zulu_nation8 wrote: I think your study would only be meaningful if people actually assumed a bo7 series does not determine the best player as well as a bo3 series.
I'm not really sure what you are responding to ... The point of this is to determine exactly how much of an effect extended series has, both for individual matches and for an entire tournament. I'm pretty sure I haven't seen anyone talk about this with real numbers to back up what they are saying
I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know? Moreover, what can you interpret from the data you collected? What does the 1% increase mean? Is it significant? What does it reveal about extended series other than that it slightly increases the chance that the "best" player wins, which everyone should already understand since extended series is a bo7 compared to the normal bo3? I also don't agree with your performance model; skill level is extremely difficult to quantify.
|
Your model is incomplete, as it ignores the fact that the extended series suddenly changes the rules. While it seems insignificant for sports based on (physical) skill, because players always play 'straight up', it heavily messes up a strategy game like SC2.
In a bo3 both players play 'honest', because one loss puts you on the verge of losing the match. And the best player should win, especially because he can increase his chances by picking the final map, but sometimes the weaker player has a good day and takes the win.
When they meet again and the series resumes as bo7, that completely messes up the rules. P1 starts with 4 losses to elimination, which enables him to use very risky tactics. He can all-in/cheese 3 times until it gets serious and he gets to pick 2 maps, which is very convenient because he only needs 2 wins.
And that's the problem: it heavily increases his chances to win, way more than it should, especially if it's a non-mirror matchup. That's something unique to SC2 (or strategy games in general): these all-in tactics allow a player to reduce the decision making to a minimum, which means the better player can't take advantage of his higher skill level, and let's be honest, even the best players won't be able to scout every cheesy tactic in every game.
So what should happen in your simulation is a lot more false positives, where an inferior player advances one round just because of that rule. It will happen several times per tournament (depending on the size) and it will cause other superior players to rank lower than expected, because they dropped out since they didn't get an extended series in their favor and got matched against another high-class player instead.
That's why I don't buy this 58% success rate of the extended series, but the problem is you can't model the effect of map picking/risky tactics that the player from the winners bracket gets, because there are no statistics to get these % from.
Now, what you could do is replace the extended series with a fresh bo5, because in the worst case they had to play 4-5 games anyway and that should increase the success of the better player, because they start on equal ground and he actually benefits from his higher skill regardless if he won or lost the first bo3.
Of course I can't prove my point that the extended series is reaaally bad, because only mlg uses it for SC2 and 3 tournaments are not enough to get results from statistics. (it would be hard anyway because the real skill level of each player remains unknown)
But why even risk using a flawed system like that (flawed for strategy games) when a more reasonable solution like a fresh bo5 is available? Especially when your simulation seems to prove that the effect on the final ranking is minimal, even in a perfect world that assumes that players never play cheesy.
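For comparison, the rematch options discussed above (extended series, a fresh bo3, a fresh bo5) can be worked out exactly under the simplest possible model. This is only a sketch: it assumes every game is an independent coin flip with a constant win chance, which is exactly the assumption being disputed here, so it shows the baseline and says nothing about map picks or all-in play. The helper `race_win_prob` and the value `p = 0.6` are hypothetical.

```python
from math import comb

def race_win_prob(p: float, a_needs: int, b_needs: int) -> float:
    """Chance that player A collects `a_needs` wins before B collects `b_needs`.

    p is A's per-game win chance; games are assumed independent.
    """
    return sum(
        comb(a_needs - 1 + k, k) * p**a_needs * (1 - p) ** k
        for k in range(b_needs)
    )

p = 0.6  # better player's per-game win chance (illustrative assumption)

rematch = {
    # Fresh series ignore the first bo3 entirely.
    "fresh bo3": race_win_prob(p, 2, 2),
    "fresh bo5": race_win_prob(p, 3, 3),
    # Extended series resume a bo7 (first to 4) from the first bo3's score.
    "extended, better player lost first bo3 1-2": race_win_prob(p, 3, 2),
    "extended, better player won first bo3 2-1": race_win_prob(p, 2, 3),
}

for name, chance in rematch.items():
    print(f"{name}: better player advances {chance:.3f}")
```

These are per-rematch figures conditional on the first result. How often each first result occurs, and what it does to the whole bracket, is what the full tournament simulation in the original post measures.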
|
To me the point of contention is clear:
If you believe that you are 'playing the field' in the tournament, then you should count them as separate series.
If you believe that that tournament is a series of head-to-head match-ups, which seems to appeal more strongly to a sense of personal fairness, then the best of 7 provides a better picture of relative skill.
Personally, I prefer the best of 7 because I prefer to at least keep the head to head matches fair. The separate series format seems to go back into the teeth of the flaws of the system with the randomness of seeding, etc.
I also do not like when we start discussing 'what a player deserves'. Seems like a bad route to travel.
|
On November 12 2010 11:51 Nienordir wrote: Your model is incomplete, as it ignores the fact that the extended series suddenly changes the rules. While it seems insignificant for sports based on (physical) skill, because players always play 'straight up', it heavily messes up a strategy game like SC2.
In a bo3 both players play 'honest', because one loss puts you on the verge of losing the match. And the best player should win, especially because he can increase his chances by picking the final map, but sometimes the weaker player has a good day and takes the win.
When they meet again and the series resumes as bo7, that completely messes up the rules. P1 starts with 4 losses to elimination, which enables him to use very risky tactics. He can all-in/cheese 3 times until it gets serious and he gets to pick 2 maps, which is very convenient because he only needs 2 wins.
And that's the problem: it heavily increases his chances to win, way more than it should, especially if it's a non-mirror matchup. That's something unique to SC2 (or strategy games in general): these all-in tactics allow a player to reduce the decision making to a minimum, which means the better player can't take advantage of his higher skill level, and let's be honest, even the best players won't be able to scout every cheesy tactic in every game.
So what should happen in your simulation is a lot more false positives, where an inferior player advances one round just because of that rule. It will happen several times per tournament (depending on the size) and it will cause other superior players to rank lower than expected, because they dropped out since they didn't get an extended series in their favor and got matched against another high-class player instead.
That's why I don't buy this 58% success rate of the extended series, but the problem is you can't model the effect of map picking/risky tactics that the player from the winners bracket gets, because there are no statistics to get these % from.
Now, what you could do is replace the extended series with a fresh bo5, because in the worst case they had to play 4-5 games anyway and that should increase the success of the better player, because they start on equal ground and he actually benefits from his higher skill regardless if he won or lost the first bo3.
Of course I can't prove my point that the extended series is reaaally bad, because only mlg uses it for SC2 and 3 tournaments are not enough to get results from statistics. (it would be hard anyway because the real skill level of each player remains unknown)
But why even risk using a flawed system like that (flawed for strategy games) when a more reasonable solution like a fresh bo5 is available? Especially when your simulation seems to prove that the effect on the final ranking is minimal, even in a perfect world that assumes that players never play cheesy.
You are right that there are all kinds of effects that aren't being captured in the model. That's why I said the player model was definitely the weakest part of the analysis. What you bring up is interesting, because unlike many other objections, it is a systematic error that would favor the winner of the winners' round game. However, statistically speaking, this person is likely to be the 'better' player, so it probably doesn't actually change things that much. It would decrease the 58%, but it would also decrease the 4%.
I guess in a larger sense, you can't take any of the exact numbers from the original post literally. This is a model, and it is a simplified one. I absolutely, 100% guarantee that every individual number in the original post is wrong. That wasn't the point, though. The point was the overall trends, and I still think they are correct.
Notice that the conclusion doesn't reference a single number from the body of the post. Instead, it draws lessons from the numbers and states those. I think they are still, by and large, correct:
- Extended series will increase the likelihood that the better player advances. (Pending your objection, probably by less than the analysis shows.)
- However, it won't have much impact on overall tournament settings.
- If we want to improve tournament outcomes, we should modify the tournament format.
I didn't talk about this in the main post, because it's just my opinion and wasn't backed by any numbers, but I think a good format would be:
- Play swiss-style tournament to determine the top 8-16 players.
- Play single elimination to get champion.
This would be a very reliable way to determine the top 8 or 16, and then would switch into overdrive to determine the champ. It would be very exciting, similar to how the NCAA does March Madness. I would love it if we could get someone to do some special event using this format just to try it out.
|
I feel like in the numbery statisticy way of thinking, yes, the extended series makes sense. But you have to think of the tourney scenario. Later in the tournament, there is more pressure or money at stake, so the players may play differently. This alone should make the new best of three, as Incontrol says, an isolated event. It should be completely separate from the first best of three. There is also the momentum aspect. If player A wins the first game of the first BO3, he has a psychological advantage, so he has momentum going into the second game. Since the second best of three doesn't have that same momentum, because the games are played at different times, I don't think they should be considered part of the same series.
|
Lol what's with people saying "THIS MODEL IS INCOMPLETE"
Well, of course it is. You can NEVER have a perfect model, and if you have taken any kind of stats course you realize that having the simplest model is the best.
The numbers are but a tool for determining the meaning of a series of data. They're not supposed to be the end-all description of everything.
This is a great model for determining the value of extended series statistically. It's simple and straightforward, and for the most part it describes the different kinds of tournament formats within reason. Besides, even if you try to add all these "effects", given that he did a million tries, the data will probably not shift in any meaningful way.
I think people should stop discussing the validity of it.
|
On November 12 2010 13:12 italiangymnast wrote: I feel like in the numbery statisticy way of thinking, yes, the extended series makes sense. But you have to think of the tourney scenario. Later in the tournament, there is more pressure or money at stake, so the players may play differently. This alone should make the new best of three, as Incontrol says, an isolated event. It should be completely separate from the first best of three. There is also the momentum aspect. If player A wins the first game of the first BO3, he has a psychological advantage, so he has momentum going into the second game. Since the second best of three doesn't have that same momentum, because the games are played at different times, I don't think they should be considered part of the same series.
I think one of the points of my post that has been lost on most people is that extended series, although it seems to not negatively affect outcomes, doesn't really seem to help much either in the "macro" sense. Therefore, other considerations become more important.
I think considerations such as entertainment value, counter-intuitiveness, different tournament settings, etc. are all very important, and this post makes it clear that the statistics do not support the extended series as a must-have for the double-elimination tournament format.
My conclusion from all of this is that the extended series rule is really a judgement call based on other, subjective qualities. From my reading of public opinion on TL.net, it seems that most people do not like it, and therefore maybe it should be reconsidered.
I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
|
nzb thanks for doing this. It seems clear that the doubleExtended is slightly more accurate, which is to be expected as a single bo7 would yield better results than 2 x bo3 if all done stand alone.
The unfactorable variation on this is the situation in which the games have been played. The classic example is Liquid`Tyler vs. PainUser, where external issues affected the game in an unknown way and thus resulted in a (highly) possibly modified outcome. I suppose one could argue this both ways, since two separate bo3s might also favour one player due to completely uncontrollable circumstances.
Another simple situation comes to mind with standard doubleElimination: Player A beats Player B, and Player B is now in the loser bracket. Player A then loses to Player C and gets knocked down to the loser bracket. Player A and Player B have their 2nd bo3 in this tournament, and this time, Player B wins. Player A is now knocked out of the tournament.
So, while both players have won 1 bo3, because of the order, Player A has been knocked out, and Player B continues.
I have a question about your data: In your abstract, you state that single elim yields a 19% champion rate for the best player, while double elim gives 24%, double elim+ gives 25% and round robin shows 47%
how did you come to that conclusion from this?:
Format         | Winner | Depth | 2^Depth
---------------+--------+-------+--------
Single         | 0.91   | 52.09 | 110.07
Double         | 0.88   | 48.31 | 89.83
DoubleExtended | 0.88   | 46.01 | 87.42
RoundRobin     | 0.72   | 22.29 | 28.85
I understand the results are 1 - winner% , but wouldn't that mean that this shows single elim having a winner of 9% (where 1 - winner% = 0.91)
|
On November 12 2010 13:24 nzb wrote:
I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
^This is probably what most people are missing when they simply get caught up about tournament format.
But it's still valid to say Tyler got eliminated because he lost 2 Bo3 series where as Painuser only lost 1 Bo3 in the tournament.
Just different way of looking at things.
|
On November 12 2010 13:33 voss wrote: nzb thanks for doing this. It seems clear that the doubleExtended is slightly more accurate, which is to be expected as a single bo7 would yield better results than 2 x bo3 if all done stand alone.
The unfactorable variation on this is the situation in which the games have been played. The classic example is Liquid`Tyler vs. PainUser, where external issues affected the game in an unknown way and thus resulted in a (highly) possibly modified outcome. I suppose one could argue this both ways, since two separate bo3s might also favour one player due to completely uncontrollable circumstances.
As with all of these objections to the model, you have to determine if they systematically favor the winner of the previous series or not. It doesn't seem like random outside factors, such as this, would have any systematic effect. Example: What if Painuser had been the one with a re-game, except in the losers' bracket match after defeating Tyler 2-0 in the winners' bracket? It seems like that would benefit Tyler.
Another simple situation comes to mind with standard doubleElimination: Player A beats Player B, and Player B is now in the loser bracket. Player A then loses to Player C and gets knocked down to the loser bracket. Player A and Player B have their 2nd bo3 in this tournament, and this time, Player B wins. Player A is now knocked out of the tournament.
So, while both players have won 1 bo3, because of the order, Player A has been knocked out, and Player B continues.
Exactly. This is something I tried to highlight in the intro, but I bet most people didn't read that.
I have a question about your data: In your abstract, you state that single elim yields a 19% champion rate for the best player, while double elim gives 24%, double elim+ gives 25% and round robin shows 47%
how did you come to that conclusion from this?:
Format         | Winner | Depth | 2^Depth
---------------+--------+-------+--------
Single         | 0.91   | 52.09 | 110.07
Double         | 0.88   | 48.31 | 89.83
DoubleExtended | 0.88   | 46.01 | 87.42
RoundRobin     | 0.72   | 22.29 | 28.85
I understand the results are 1 - winner% , but wouldn't that mean that this shows single elim having a winner of 9% (where 1 - winner% = 0.91)
This is something that I realized was confusing after posting -- the numbers in the abstract are from a tournament with 64 players, and the numbers in Section 4.1 are from a 128-player tournament. I generated the numbers for 64 players to make the graphs for varying #'s of games, and then I realized since I was talking about MLG Dallas, it would be a good idea to use 128-players for my main results. I forgot to change the ones in the abstract. :/ Oh well.
In some sense, it's better this way because the 128-player numbers don't show the (slight) benefit of the extended series in the 'winner' metric at the second decimal point. I would need to include more precision.
|
On November 12 2010 13:35 scion wrote:
On November 12 2010 13:24 nzb wrote:
I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
^This is probably what most people are missing when they simply get caught up about tournament format.
But it's still valid to say Tyler got eliminated because he lost 2 Bo3 series where as Painuser only lost 1 Bo3 in the tournament.
Just different way of looking at things.
I think, ultimately, you are screwed either way:
Option A: Use the extended series rule -- it's awkward and people don't like it.
Option B: Keep things with BO3, and deal with the weird paradoxes like getting knocked out even though you "beat" the other player, but at least it is consistent.
Because the statistics aren't conclusive, you are left with a judgement call.
In my opinion what this really says is that double elimination has problems, and maybe we should use a different format that does a better job of ranking most of the players, and concludes with an exciting tourney to determine the champ. (See previous post about swiss/single elim hybrid.)
|
Very interesting. The only thing missing would be some measurement of tournament efficiency, that is, which model produces the most accurate result in the lowest number of games.
|
On November 12 2010 13:47 fenixauriga wrote: Very interesting. The only thing missing would be some measurement of tournament efficiency, that is, which model produces the most accurate result in the lowest number of games.
If you read the Wikipedia pages on tournaments (at bottom of OP), they have a good discussion of this. Swiss seems to have good results, but it has other issues....
|
Thank you for doing this. I had to cringe so hard hearing Nony explain the point behind extended series and then Day9 going "Gee, I didn't think of that, wow."
Then later on both him and Idra claimed the purpose of a tournament format is not to have the best player win because eventually someone not the best player will win. Yeah, it will. Unless you play an infinite round robin. But you can't play so many games.
So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim.
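The arithmetic behind "once is likely, several times is not" is short enough to write down. This sketch assumes the upsets are independent and that the underdog faces the same 30% odds every time, which is a simplification; `repeated_upset_prob` is a made-up helper name.

```python
def repeated_upset_prob(p_upset: float, times: int) -> float:
    """Chance of pulling off the same upset `times` times in a row.

    Assumes independent series with a constant upset chance.
    """
    return p_upset**times

# A 30% underdog needing 1..6 upsets in a row (6 rounds is what a
# 64-player single elimination bracket would take).
for rounds in range(1, 7):
    print(rounds, f"{repeated_upset_prob(0.3, rounds):.6f}")
```

A structure that forces the weaker player to get lucky repeatedly therefore does change the odds, even though it can never remove the luck entirely.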
People just hate to accept that what they did for years wasn't that good. It's hard for people to accept that in the past people went out of tournaments, eliminated by people they had a winning record against. They have to accept that that was somehow just.
I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know?
Didn't you hear what Day9 said on that podcast? And, there's still people here that dispute the result.
I remember when people didn't understand why the person coming into the finals from the loser bracket had to win twice. Some people aren't very good at this stuff.
|
On November 12 2010 13:58 nzb wrote:
On November 12 2010 13:47 fenixauriga wrote: Very interesting. The only thing missing would be some measurement of tournament efficiency, that is, which model produces the most accurate result in the lowest number of games.
If you read the Wikipedia pages on tournaments (at bottom of OP), they have a good discussion of this. Swiss seems to have good results, but it has other issues....
Personally I like the idea of elimination swiss. Basically you use the swiss pairing system and then eliminate players once they have lost 3 rounds. Eventually, when there are 4 or fewer players left, they do a playoff for the top spot.
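A rough sketch of how that could be simulated, under loud assumptions: the pairing here is a simplified stand-in for real swiss pairing (sort by record and pair neighbours, with no rematch avoidance), a leftover player just sits the round out, each round is a single game, and the win model `skill_a / (skill_a + skill_b)` is invented for illustration. None of the names come from the original post.

```python
import random

def elimination_swiss(skills, max_losses=3, playoff_size=4, rng=None):
    """Return the indices of the players who would reach the playoff.

    skills: list of positive numbers; player a beats player b with
    probability skills[a] / (skills[a] + skills[b]) (assumed model).
    """
    rng = rng or random.Random()
    wins = {i: 0 for i in range(len(skills))}
    losses = {i: 0 for i in range(len(skills))}
    alive = list(wins)
    while len(alive) > playoff_size:
        # Simplified swiss pairing: shuffle to break ties randomly,
        # then sort by record so similar records meet each other.
        rng.shuffle(alive)
        alive.sort(key=lambda i: wins[i] - losses[i], reverse=True)
        # With an odd count, zip drops the last player (a bye).
        for a, b in zip(alive[0::2], alive[1::2]):
            p_a = skills[a] / (skills[a] + skills[b])
            winner, loser = (a, b) if rng.random() < p_a else (b, a)
            wins[winner] += 1
            losses[loser] += 1
        # Players are out once they reach the loss limit.
        alive = [i for i in alive if losses[i] < max_losses]
    return alive

# 32 players with slowly increasing skill; a fixed seed keeps runs repeatable.
finalists = elimination_swiss([1.0 + 0.1 * i for i in range(32)],
                              rng=random.Random(1))
print(sorted(finalists))
```

Running it many times and counting how often the strongest player reaches the playoff would give a figure comparable to the 'winner' metric in the original post, though only for this toy model.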
|
On November 12 2010 14:06 Almeisan wrote: So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim.
Hahahah... This is exactly why I was inspired to do this. I was like... "C'mon Day[9]! You are representing all science grad students in the universe on this show, and you come up with this crap?" I'd like to think that Nony was just chillin' and didn't want to fight about it anymore. Anyway, I had to run the numbers myself and show the (admittedly small) advantage of the extended series. I thought it might make a bigger difference than it did, but facts are facts.
|
I didn't read all of the statistics stuff, but really interesting analysis. I think the only major tournament that uses group stages followed by single elim Ro16 is WCG, and it seems like the best players always win there (Korea won every single BW WCG since forever). It'd be cool if more tournaments picked up that kind of format if it's indeed more accurate in fewer games.
|
On November 12 2010 14:06 Almeisan wrote:
Thank you for doing this. I had to cringe so hard hearing Nony explain the point behind extended series and then Day9 going "Gee, I didn't think of that, wow."
Then later on both him and Idra claimed the purpose of a tournament format is not to have the best player win because eventually someone not the best player will win. Yeah, it will. Unless you play an infinite round robin. But you can't play so many games.
So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim.
People just hate to accept that what they did for years wasn't that good. It's hard for people to accept that in the past people went out of tournaments, eliminated by people they had a winning record against. They have to accept that that was somehow just.
I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know?
Didn't you hear what Day9 said on that podcast? And, there's still people here that dispute the result.
I remember when people didn't understand why the person coming into the finals from the loser bracket had to win twice. Some people aren't very good at this stuff.
If I'm interpreting what Day9 said correctly (which it's possible I'm not), his point was that you would have to play an infinite number of games to see who is the best player if each player was a weighted coin flip (e.g. 70% win, 30% lose rate). A player who had a 99% chance of winning each game could still lose 1000 games in a row. He is correct in saying that a tournament doesn't provide an absolute rank of player skill; it provides a poor estimation of player skill. That is assuming you believe that win probability is the same thing as skill, too.
The main issue of contention appears to be the purpose of double elimination. Some people believe that the purpose is to help determine the best player; other people believe it is to give players a second chance if they simply screw up in one Bo3.
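To put rough numbers on the coin-flip argument, here is a minimal sketch, assuming every game is an independent flip with a fixed win probability (the simplification used throughout this thread). The function names are made up for illustration. It shows how fast the chance of repeating an unlikely result shrinks, and why a 99% player losing 1000 games in a row is possible in principle but irrelevant in practice.

```python
import math


def streak_probability(p_single: float, n: int) -> float:
    """Probability that an event with chance p_single happens n times in a row,
    assuming the games are independent."""
    return p_single ** n


def log10_streak_probability(p_single: float, n: int) -> float:
    """Same quantity as a base-10 logarithm, which avoids floating-point
    underflow for very long streaks."""
    return n * math.log10(p_single)


if __name__ == "__main__":
    # A 30% underdog winning several games in a row.
    for n in range(1, 6):
        print(n, streak_probability(0.30, n))
    # A 99% favourite losing 1000 straight games (1% chance each time).
    print("log10 of probability:", log10_streak_probability(0.01, 1000))
```

The point is only about orders of magnitude: a single upset is common, a chain of upsets is not, which is why the number of games a format forces the underdog to win matters.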
|
On November 12 2010 14:15 nzb wrote: Hahahah... This is exactly why I was inspired to do this. I was like... "C'mon Day[9]! You are representing all science grad students in the universe on this show, and you come up with this crap?"
Yeah, it was like Incontrol and Idra hammering down the dogma of the bracket: "That's not how it works so it's artificial and it shouldn't matter."
Then Nony says: "Extended series gives you a more accurate ranking. And if you guys don't care about how accurate a ranking a tournament produces, I am confused. Day9 please help me out."
Then Day9 comes in and says: "I am going to step in and be like super fuking mathematical on everybody. A tournament does not determine a ranking of players, period." Then he does come up in his head with the idea that you can actually simulate this stuff and find out. But then he goes "I personally feel that regular old double elim clicks a little bit better in my head."
Then Nony responds, repeating his point.
Day9 again "That again is a series of arbitrary judgments."
That made me facepalm so hard...
|
this experiment is awesome and it was a fun read, thanks OP
|
On November 12 2010 14:27 Gudeldar wrote: On November 12 2010 14:06 Almeisan wrote:Thank you for doing this. I had to cringe so hard hearing Nony explain the point behind extended series and then Day9 going "Gee, I didn't think of that, wow." Then later on both him and Idra claimed the purpose of a tournament format is not to have the best player win because eventually someone not the best player will win. Yeah, it will. Unless you play an infinite round robin. But you can't play so many games. So then Day9 went into this line that as a math graduate he knew that it doesn't matter what structure you use since the variance of the coins will never be evened out by a tournament structure. This is obviously very wrong as it is very likely for a person to win once when he is only 30% likely to win. But if you need to get that 30% odds several times then that's not going to be likely. The argument that it was fundamentally wrong to even think that a tournament structure would have any effect was so obviously wrong, I was sad Nony got a little intimidated. Yeah, it would be hard for him to make the actual argument as he couldn't run these simulations on that spot in his head and use that as evidence. Anyway he would have to do some handwaving and it wouldn't have looked strong to many viewers, who were already in favour of normal double elim. People just hate to accept that what they did for years wasn't that good. It's hard for people to accept that in the past people went out of tournaments, eliminated by people they had a wining record against. They have to accept that that was somehow just. I don't understand why numbers are needed to argue that a format with lots of games played has less variance than a format with few games. What does your conclusion show that we don't already know? Didn't you hear what Day9 said on that podcast? And, there's still people here that dispute the result.
I remember when people didn't understand why the person in coming to the finals of the loser bracket had to win twice. Some people aren't very good at this stuff. If I'm interpreting what Day9 said correctly (which its possible I'm not) his point that was that you would have to play an infinite number of games to see who is the best player if each player was a weighted coin flip (e.g 70% win, 30% lose rate). A player who had a 99% chance of winning each game could still lose 1000 games in a row. He is correct in saying that a tournament doesn't provide an absolute rank of player skill, it provides a poor estimation of player skill. That is assuming you believe that win probability is the same thing as skill too. The main issue of contention appears to the purpose of double elimination. Some people believe that the purpose is to help determine the best player, other people believe it is to give players a second chance if they simply screw up in one Bo3.
The purpose of a tournament is to determine a "winner" by sorting the players based off their performance in the tournament. People will naturally try and correlate the rankings produced by the tournament to their own estimations of player skill. Maybe b/c I take a computer science approach to the discussion I can't help but view the tournament itself as just another sorting algorithm.
Philosophically skill isn't quantifiable in any abstract sense and characterizing it in terms of probability creates some issues.
|
On November 12 2010 14:33 Almeisan wrote: On November 12 2010 14:15 nzb wrote: Hahahah... This is exactly why I was inspired to do this. I was like... "C'mon Day[9]! You are representing all science grad students in the universe on this show, and you come up with this crap?"
Yeah, it was like Incontrol and Idra hammering down the dogma of the bracket: "That's not how it works so it's artificial and it shouldn't matter." Then Nony says: "Extended series gives you are more accurate ranking. And if you guys don't care about how accurate a ranking a tournament produces, I am confused. Day9 please help me out." Then Day9 comes in and says: "I am going to step in and be like super fuking mathematical on everybody. A tournament does not determine a ranking of players, period." Then he does come up in his head with the idea that you can actually simulate this stuff and find out. But then he does "I personally feel that regular old double elim clicks a little bit better in my head." Then Nony respond, repeating his point. Day9 again "That again is a series of arbitrary judgments." That made me facepalm so hard...
What really made me facepalm was Day9 saying that regardless of what Tyler said it doesn't "feel" right. Thankfully we were able to prove the world was round, if we went by our feelings we'd still think it was flat.
|
I hate the extended series model, primarily because it adds such a huge element of luck to any one player's individual performance. If you lose to a player in the WB and he subsequently loses his next round, the future of your tournament now depends on you randomly not getting paired with that one player. Likewise, his performance will get a boost from that random pairing. Besides, he already beat you in a Bo3 series, so he is more likely to have the advantage anyway. Why exaggerate it?
|
I want to ask why extended series would let the better player win more often in your model if the worse player won the first series and starts out with a lead in the extended series. Wouldn't the extended series format in this case make it harder for the better player to win?
If the better player wins the first series, then the extended series format merely protects the player's skill advantage so that upsets are less likely to occur compared to a normal double elim. format.
If the better player loses the first series, then wouldn't the extended series format make it less likely for the overall result to correct itself, meaning that the better player wins the second series?
I assume the slight edge extended series has over double elimination only exists because the better player wins the first series most of the time, so the extended series ends up protecting the better player's lead most of the time. However, in reality, where skill level is much more difficult to quantify than with a few variables, the extended series format merely adds extra protection for whoever wins the first series, rather than for whoever is truly more skilled.
In this case the debate between the two formats becomes only an ethical consideration of whether the player who wins the first series deserves an advantage despite having the same status as his opponent, that is, both players are in the loser's bracket.
I'm not very good at stats so you'll have to explain exactly where the % difference between double elim and extended series comes from. Again I disagree with your player model. If skill level needs to be protected then a seed format would be more than enough. I agree with a standard bo7.
|
Maybe Day9 can be 'excused' because he wasn't listening to the earlier discussion; in the context he said it in, it basically sounded like he meant that any tournament is an inaccurate ranking, so the accuracy of a tournament system doesn't matter. Incontrol and Idra had already said earlier that tournaments are never perfectly accurate. That is of course an obvious point, to which Nony responded that what matters is how much accuracy you can get with a certain number of games. It really sounded to me like he said that because there are no 100% win coins, tournament results are never accurate, so the effect of a tournament structure is irrelevant.
I think tournament accuracy is a huge problem for SC2. And I think it will get worse in the future. In SC BW really strong players can consistently get into the top4 of starleagues. And in SC BW there's a really strong player field. All those B teamers are really strong and there isn't that much skill difference between them and say Flash or Jaedong. But in SC BW the more skilled player wins most of the games. SC2 is a lot more like WC3 where there's just a lot of luck. We already see this in MLG and GSL results. The top 8 has been quite different every time even in such a short time period. And when players get better and skill margins get even smaller we may have 500 people or more who are really all no worse than 45-55 vs each other. And especially without prize money this will mean that when the initial hype of SC2 as a new game is gone no one will really be able to be professional.
When in more than 50% of the cases the top 8 best players all won't be in the top8 of this 512 player tournament, that is a problem.
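As a rough back-of-the-envelope check on how small edges play out over a big bracket, here is a sketch under loud assumptions: single elimination, best-of-three every round, and the best player holding the same per-game edge (here 55%) against every opponent. It looks at a simpler quantity than the top 8 claim above, namely the chance that the best player wins the whole thing, and the function names are hypothetical.

```python
def bo3_win_probability(p: float) -> float:
    """Chance of winning a best-of-three when each game is won with probability p.
    Win 2-0 (p*p) or 2-1 (two orderings, 2*p*p*(1-p)); simplifies to p^2 * (3 - 2p)."""
    return p * p * (3 - 2 * p)


def single_elim_title_probability(p: float, players: int) -> float:
    """Chance of winning every round of a single-elimination bracket.
    Assumes `players` is a power of two and the same edge p in every round."""
    rounds = players.bit_length() - 1  # log2(players) for powers of two
    return bo3_win_probability(p) ** rounds


if __name__ == "__main__":
    for p in (0.55, 0.65, 0.75):
        print(p, single_elim_title_probability(p, 512))
```

With 512 players the best player has to survive nine rounds, so a 55% per-game edge leaves him winning the event well under 1% of the time in this toy setup. Double elimination or longer series would raise that figure, which is the whole argument of the thread.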
|
On November 12 2010 14:56 Almeisan wrote: Maybe Day9 can be 'excused' because he wasn't listening to the earlier discussion because in the context he said it in it basically sounded like he meant that any tournament is an inaccurate ranking so the accurateness of a tournament system doesn't matter. Incontrol and Idra earlier had already said that tournaments are never perfectly accurate. Which of course is an obvious point in which Nony responded that it matters how much accuracy you can get with a certain number of games. It really sounded to me that he said that because there's no 100% win coins, tournament results are never accurate so the effect of a tournament structure is irrelevant.
I think tournament accuracy is a huge problem for SC2. And I think it will get worse in the future. In SC BW really strong players can consistently get into the top4 of starleagues. And in SC BW there's a really strong player field. All those B teamers are really strong and there isn't that much skill difference between them and say Flash or Jaedong. But in SC BW the more skilled player wins most of the games. SC2 is a lot more like WC3 where there's just a lot of luck. We already see this in MLG and GSL results. The top 8 has been quite different every time even in such a short time period. And when players get better and skill margins get even smaller we may have 500 people or more who are really all no worse than 45-55 vs each other. And especially without prize money this will mean that when the initial hype of SC2 as a new game is gone no one will really be able to be professional.
When in more than 50% of the cases the top 8 best players all won't be in the top8 of this 512 player tournament, that is a problem.
You are incorrect in thinking there isn't much of a skill disparity between some random B-teamer and Flash or Jaedong. The skill disparity in BW is significantly greater than SC2.
EDIT: I understand you're trying to say the playing field is generally strong in BW which is true and you're right about SC2 having more luck involved, I don't want to detract from the point you're trying to make ^^
|
On November 12 2010 14:59 space_yes wrote: On November 12 2010 14:56 Almeisan wrote: Maybe Day9 can be 'excused' because he wasn't listening to the earlier discussion because in the context he said it in it basically sounded like he meant that any tournament is an inaccurate ranking so the accurateness of a tournament system doesn't matter. Incontrol and Idra earlier had already said that tournaments are never perfectly accurate. Which of course is an obvious point in which Nony responded that it matters how much accuracy you can get with a certain number of games. It really sounded to me that he said that because there's no 100% win coins, tournament results are never accurate so the effect of a tournament structure is irrelevant.
I think tournament accuracy is a huge problem for SC2. And I think it will get worse in the future. In SC BW really strong players can consistently get into the top4 of starleagues. And in SC BW there's a really strong player field. All those B teamers are really strong and there isn't that much skill difference between them and say Flash or Jaedong. But in SC BW the more skilled player wins most of the games. SC2 is a lot more like WC3 where there's just a lot of luck. We already see this in MLG and GSL results. The top 8 has been quite different every time even in such a short time period. And when players get better and skill margins get even smaller we may have 500 people or more who are really all no worse than 45-55 vs each other. And especially without prize money this will mean that when the initial hype of SC2 as a new game is gone no one will really be able to be professional.
When in more than 50% of the cases the top 8 best players all won't be in the top8 of this 512 player tournament, that is a problem. You are incorrect in thinking there isn't much of a skill disparity between some random B-teamer and Flash or Jaedong. The skill disparity in BW is significantly greater than SC2. EDIT: I understand you're trying to say the playing field is generally strong in BW which is true and you're right about SC2 having more luck involved
Yes, and in rare cases where top level players have to go through a large player pool, in offline prelims for example, the more accomplished ones are given byes. I don't follow the SC2 scene, but the normal double elim. format has worked for years and years in competitive gaming, so I would love to see why this extended series format was introduced. If top players are constantly getting cheesed, then make the series longer. If the player pool is too big, then start assigning seeds. I see no purpose for extended series.
|
On November 12 2010 14:48 space_yes wrote: On November 12 2010 14:33 Almeisan wrote:On November 12 2010 14:15 nzb wrote: Hahahah... This is exactly why I was inspired to do this. I was like... "C'mon Day[9]! You are representing all science grad students in the universe on this show, and you come up with this crap?"
Yeah, it was like Incontrol and Idra hammering down the dogma of the bracket: "That's not how it works so it's artificial and it shouldn't matter." Then Nony says: "Extended series gives you are more accurate ranking. And if you guys don't care about how accurate a ranking a tournament produces, I am confused. Day9 please help me out." Then Day9 comes in and says: "I am going to step in and be like super fuking mathematical on everybody. A tournament does not determine a ranking of players, period." Then he does come up in his head with the idea that you can actually simulate this stuff and find out. But then he does "I personally feel that regular old double elim clicks a little bit better in my head." Then Nony respond, repeating his point. Day9 again "That again is a series of arbitrary judgments." That made me facepalm so hard... What really made me facepalm was Day9 saying that regardless of what Tyler said it doesn't "feel" right. Thankfully we were able to prove the world was round, if we went by our feelings we'd still think it was flat.
Don't be so dramatic, we are talking about a tournament not the objective nature of reality. In a way what "feels right" or what the viewing public wants is the only factor that should be important to MLG. From their perspective the only reason they even put on the tournament is so people will watch it so they can make money from advertising and HD pass purchases.
|
On November 12 2010 14:51 zulu_nation8 wrote: I want to ask why extended series would let the better player win more often in your model if the worse player won the first series and starts out with a lead in the extended series. Wouldn't the extended series format in this case make it harder for the better player to win?
First of all the stats show it's true. I assume you accept that. So let me give an example of what could happen.
Ok, I know you know the BW scene so let me make a BW player analogy as it's hard to judge absolute ranking in SC2.
Say we have a bo3 double elim tournament with Flash, Idra, G5 and a bunch of irrelevant amateurs. If we would do an infinite round robin Flash would clearly be 1, Idra would clearly be 2, G5 would clearly be 3 and then the rest. That's the expected result.
Say Idra runs into G5 early. Idra is the better player and expected to win. He wins 2-0. Next round Idra plays vs Flash. Idra loses, as he would most of the time. In a single elim he would be out and he wouldn't even reach top 16. And he wouldn't miss the top 16 because he played below his standard. He was expected to lose vs Flash and expected to beat anyone else, and he did exactly that. He gave the performance that should give him no.2 but he doesn't get that. Double elim gives players in this case a chance to show their no.2 performance.
Now the tournament is double elim. Flash makes it all the way to the finals and wins that too. But what happens to Idra and G5? They meet again in the loser bracket. Now G5 has a nice abusive style Idra is weak against. G5 beats Idra 2-1. Idra is out. He is eliminated twice. Once by Flash and once by G5.
But paradoxically the stats of the games show Idra performed as he should vs G5. He won most of his games vs G5. He gave his no.2 performance. Yet he is out and G5 is the player Flash beats in the finals.
So it's not that Idra performed worse than G5. He was expected to perform better and he did. He went 3-2 vs G5 which translates to winning a Bo5. Yet the double elim judges G5 to be the better player.
The point of the double elim is that you need to be eliminated twice. You need to lose vs 2 players that are better than you, to even out no1 and no2 meeting early. Flash eliminated Idra once. That's clear. G5 also got eliminated once, vs Flash. And then Idra got eliminated by a player that performed worse vs him, namely G5. That's odd and against the idea of double elim in the first place. That's why they have extended series.
Now you can argue for whatever reason that the first bo3 is rightly thrown out. But that's beside the point. It is information that was available. It is information that can be used to judge more accurately what the playing strength of each player actually is. We have this coin flip that is maybe 65% in favour of Idra. When you flip it 5 times you get more info than when you flip it 3 times. If you have already thrown it twice and you discard those results, you are just going to have more incomplete info compared to using the info of the first bo3 as well. Now the difference may not be so big. But if you have a certain coin that has anywhere from 1/99 to 99/1 probability and you throw it 1000 times, you know a lot about that coin. You can estimate what the probability of that coin is. But if you throw it 1000 times and then only look at the last 3 flips, you are throwing out valuable information and you are going to be way less accurate in your estimate. Normal double elim does the same thing, just with a smaller error margin.
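The "more flips, more information" point can be made concrete with the textbook standard error of a proportion, sqrt(p(1-p)/n). This is a minimal sketch, assuming independent games with a fixed win probability. The 65% figure is the one from the Idra example, and the function name is just for illustration.

```python
import math


def standard_error(p: float, flips: int) -> float:
    """Standard error of the observed win rate after `flips` independent games,
    when the true per-game win probability is p."""
    return math.sqrt(p * (1 - p) / flips)


if __name__ == "__main__":
    # Compare keeping only 3 games, using 5 games, and using 1000 games.
    for n in (3, 5, 1000):
        print(n, round(standard_error(0.65, n), 4))
```

The error shrinks with the square root of the number of games, so going from 3 to 5 games is a modest gain and not a dramatic one. That fits the small improvement from extended series reported in the OP.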
|
Almeisan, hilarious you picked Idra and G5. Your post reminded me of their game on Destination lolololol (if you've seen it).
|
On November 12 2010 14:51 zulu_nation8 wrote: I want to ask why extended series would let the better player win more often in your model if the worse player won the first series and starts out with a lead in the extended series. Wouldn't the extended series format in this case make it harder for the better player to win?
If the better player wins the first series, then the extended series format merely protects the player's skill advantage so that upsets are less likely to occur compared to a normal double elim. format.
If the better player loses the first series, then wouldn't the extended series format make it less likely for the overall result to adjust itself meaning for the better player wins the second series?
I assume the slight edge extended series has over double elimination only signifies because the better player wins the first series most of the time, the extended series thus ends up protecting a lead of the better player most of the time. However in reality where skill level is much more difficult to quantify than with a few variables, the extended series format merely adds extra protection for whoever wins the first series, rather than for whoever is truly more skilled.
In this case the debate between the two format becomes only an ethical consideration of whether the player who wins the first series deserves an advantage despite having the same status as his opponent, that is both players are in the loser's bracket.
I'm not very good at stats so you'll have to explain exactly where the % difference between double elim and extended series comes from. Again I disagree with your player model. If skill level needs to be protected then a seed format would be more than enough. I agree with a standard bo7.
I think this is where the 58%/4% comes in. As someone else has already stated, these numbers can't be taken literally, but it gives some frame of reference for what can happen.
Basically, if the better player loses initially, then he has a 58% chance in my model to come back and win.
However, if the better player wins initially, then he has only a 4% chance of losing in the extended series.
So it is heavily skewed towards protecting the better player, regardless of the outcome of the first BO3.
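For anyone who wants to play with these conditional probabilities, here is a sketch of the calculation for a single rematch. It assumes independent games with a fixed per-game win probability p, and it treats the extended series as a best-of-seven that starts from the score of the earlier Bo3. The value p = 0.65 is borrowed from the Idra example and is not the model from the OP, so the output will not reproduce the 58%/4% figures. All function names are hypothetical.

```python
from functools import lru_cache


def race_win_probability(p: float, need: int, opp_need: int) -> float:
    """Probability of winning `need` more games before the opponent wins
    `opp_need` more, with each game won independently with probability p."""

    @lru_cache(maxsize=None)
    def go(mine: int, theirs: int) -> float:
        if mine == 0:
            return 1.0  # we already have the wins we need
        if theirs == 0:
            return 0.0  # the opponent got there first
        return p * go(mine - 1, theirs) + (1 - p) * go(mine, theirs - 1)

    return go(need, opp_need)


def rematch_win_probability(p: float, first_score: tuple, extended: bool) -> float:
    """Chance of winning the rematch. `first_score` is (our wins, their wins)
    in the earlier Bo3. With `extended`, that score carries into a Bo7
    (first to 4); otherwise the rematch is a fresh Bo3 (first to 2)."""
    if extended:
        ours, theirs = first_score
        return race_win_probability(p, 4 - ours, 4 - theirs)
    return race_win_probability(p, 2, 2)


if __name__ == "__main__":
    p = 0.65
    for score in ((2, 0), (2, 1), (1, 2), (0, 2)):
        print(score,
              "fresh Bo3:", round(rematch_win_probability(p, score, False), 3),
              "extended:", round(rematch_win_probability(p, score, True), 3))
```

In this toy version the extended series helps the better player when he won the first Bo3 and hurts him when he lost it. Whether the net effect is positive depends on how often each case occurs, which is what the full simulation in the OP averages over.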
|
On November 12 2010 15:09 Almeisan wrote: On November 12 2010 14:51 zulu_nation8 wrote: I want to ask why extended series would let the better player win more often in your model if the worse player won the first series and starts out with a lead in the extended series. Wouldn't the extended series format in this case make it harder for the better player to win?
First of all the stats show it's true. I assume you accept that. So let me give an example of what could happen. Ok, I know you know the BW scene so let me make a BW player analogy as it's hard to judge absolute ranking in SC2. Say we have a bo3 double elim tournament with Flash, Idra, G5 and a bunch of irrelevant amateurs. If we would do an infinite round robin Flash would clearly be 1, Idra would clearly be 2, G5 would clearly be 3 and then the rest. That's the expected result. Say Idra runs into G5 early. Idra is the better player and expected to win. He wins 2-0. Next round Idra plays vs Flash. Idra loses as he would most of the time. In a single elim he would be out and he wouldn't even reach top 16. And he wouldn't not reach top 16 because he played below his standard. He was expected to lose vs Flash and expected to beat anyone else and he did exactly that. He game the performance that should give him no.2 but he doesn't get that. Now the tournament is double elim. Flash makes it all the way to the finals and wins that too. But what happens to Idra and G5. They meet again in the loser bracket. Now G5 has a nice abusive style Idra is weak against. G5 beats Idra 2-1. Idra is out. He is eliminated twice. Once by Flash and once by G5. But paradoxally the stats of the the games show Idra performed as he should vs G5. He won most of his games vs G5. He gave his no.2 performance. Yet he is out and G5 is the player Flash beats in the finals. So it's not that Idra performed worse than G5. He was expected to perform better and he did. He went 3-2 vs G5 which translates to winning a Bo5. Yet the double elim judges G5 to be the better player. The point of the double elim is that you need to be eliminated twice. You need to lose vs 2 players that are better than you, to even out no1 and no2 meeting early. Flash eliminated Idra once. That's clear. G5 also got eliminated once, vs Idra. And then Idra got eliminated by a player that performed worse vs him, namely G5. 
That's odd and against the idea of double elim in the first place. That's why they have extended series.
Thanks for the explanation, I see the point now. However, as I'm sure others have mentioned, meeting in the loser's bracket in a single elimination format is different from meeting under double elimination, so it should not be obvious that the Idra in your scenario should be given an advantage upon meeting G5 again.
I don't necessarily think both players should start out even when they meet again, but if an advantage is to be given to the player who won the first series, I think the better format is to have a best of three series on top of the loser bracket series, with Idra starting out 1-0 so that G5 would have to win two bo3 or bo5 series to advance, similar to the format of the grand final in a normal double elimination format. In the multiple BO system the player who lost the first series would be allowed to lose more games than in an extended series bo7.
Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
Edit: Actually I still disagree with any kind of advantage given to the initial winner. My opinion is only influenced by BW however. I think winning the first series already gives enough of a psychological advantage that no further changes in the format is necessary to protect it. If two players are close enough in skill that one can come back to win a BO3 series after having already lost a series earlier, the format should have no responsibility to give extra advantage to either player. The most important aspect is that the players play a BO series, and not single games. The BO series itself should do enough to even out variance. In your example, Idra might win 3 out of 4 games he plays vs G5 in normal circumstances, but because of various factors he loses the single elim series 1-2, the result isn't really that farfetched, certainly not enough so that a change in the tournament format should be instated so that Idra will always win at least 2 out of 3 no matter what.
|
Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily.
While I'm sure that my model is missing things, objections need to have a systematic effect; that is, in the long run they favor the better players or the worse ones, or the winner of the winners' bracket game, etc. Otherwise, you would expect them to balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway.
|
Double Extended is definitely a more accurate measure than plain double and is efficient.
Arguing that each round is isolated and shouldn't cause an extended series goes against the whole principle of an isolated tournament where it's the battle to have the best winning streak. You can't have it both ways.
Either you have only single elimination or double extended. Plain double is never valid in any circumstance. How this isn't understood by tournament veterans is odd.
Over multiple single eliminations the players' ranks will become more accurate, but the double & double extended speed up the process.
The Idra argument that his zvt is better than another person's matchup is completely irrelevant. Race doesn't matter, as you either play to win in a tournament setting, beating everyone, or you want a league where you can lose but overall your average skill level will be shown.
Round robins are the most accurate most fair but no one has that kind of time.
|
In my opinion, the purpose of a tournament is not to "find the best player". It is to provide entertainment for both players and fans. If "finding the best player" were the only criterion, then we would just use a round-robin format containing many games.
Thus, since the extended series is deemed confusing and is disliked by many fans, it should not be used.
|
On November 12 2010 15:56 Adeeler wrote: Double Extended is definitely more accurate a measure then plain double and is efficient.
Arguing that each round is isolated and shouldn't cause an extended series goes against the whole principal of an isolated tournament where its the battle to have the best winning streak. You can't have it both ways.
Either you have only single elimination or double extended. Plain double is never valid in any circumstance. How this isn't understood by tournament veterans is odd.
Over multiple single eliminations the players rank will become more accurate but the double & double extended speed up the process.
The Idra arguement of his zvt is better the anothers persons matchup is completely irrelevent. Race doesn't matter as you either play to win in a tournament setting beating everyone or you want a league where you can lose but overall your average skill level will be shown.
Round robins are the most accurate most fair but no one has that kind of time.
In the context of Almeisan's example, you can look at it as G5 obtaining an advantage for having gone on a win streak in the loser's bracket before meeting Idra, where the advantage is for their series to start out 0-0. In the grand final the winner's bracket winner starts out with an advantage because he went undefeated.
|
Zulu, I added a bit later about how normal double elim ignores available information. Without that part it is too much an argument about why it's better rather than about why it performs worse in simulations.
There is also the problem that G5 only got eliminated once by Idra. That is a flaw, imo. But it's a flaw of double elim in general and the same flaw single elim has. When the two players meet again in the loser bracket, one of them is going to be eliminated from the tournament. There's no way around that. And in some cases it's just impossible not to have the same match in the loser bracket that you had in the winner bracket.
It's possible Idra meets Flash early on and that G5 cheeses vs Flash's 12 CC and wins 2-0. Then Idra loses to Flash in an extended series in the loser bracket and is eliminated technically only once, and by Flash. That's a flaw that is in both systems that you can only fix by making it a round robin. But it's different from discarding info.
[edit]
The dogma of a single elim is that if you are eliminated you are out. The dogma of the double elim is that you need to be eliminated twice.
That going through the loser bracket to make it to the finals is harder and takes more games is not accounted for. The reason there are two finals is that the person from the winner bracket has to be eliminated twice too.
Also, I think that the simulations show that you can expect the better player to be the one in the winner bracket and not the one in the loser bracket. Someone has to come out of the loser bracket, no matter how much harder you make it. And that person is going to have lost to someone in the winner bracket.
Also, I don't understand the argument "If you want accuracy, why not use round robin, so let's use normal double elim". If you don't care about accuracy, why use double elim in the first place? Double elim is a compromise. Extended series is a minor fix. It doesn't take many more games, and it adds a bit of accuracy as well as avoiding the strange case where you win more games but are deemed to be the inferior player.
|
On November 12 2010 16:05 zulu_nation8 wrote: On November 12 2010 15:56 Adeeler wrote: Double Extended is definitely more accurate a measure than plain double and is efficient.
Arguing that each round is isolated and shouldn't cause an extended series goes against the whole principle of an isolated tournament, where it's a battle to have the best winning streak. You can't have it both ways.
Either you have only single elimination or double extended. Plain double is never valid in any circumstance. How this isn't understood by tournament veterans is odd.
Over multiple single eliminations the players' ranks will become more accurate, but double and double extended speed up the process.
The Idra argument that his ZvT is better than another person's matchup is completely irrelevant. Race doesn't matter: either you play to win in a tournament setting by beating everyone, or you want a league where you can lose but your average skill level will be shown overall.
Round robins are the most accurate most fair but no one has that kind of time. In the context of Almeisan's example, you can look at it as G5 obtaining an advantage for having gone on a win streak in the loser's bracket before meeting Idra, where the advantage is for their series to start out 0-0. In the grand final the winner's bracket winner starts out with an advantage because he went undefeated.
If you stayed in the winners' bracket by beating someone and knocking them down, you proved you are better in single elim fashion; double elim isn't about giving losers a second chance but about placing final standings more accurately.
So the winner between players that have already met should always have the advantage; otherwise you are looking to only have a single elim.
The very late stages should give much less advantage, to maintain entertainment value: semis, maybe quarters onwards.
|
On November 12 2010 15:51 nzb wrote: Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily. While I'm sure that my model is missing things, objections need to have systematic effect -- that is, in the long run they favor the better players or worse ones, or the winner of the winners' bracket game, etc.. Otherwise, you would expect that they would balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway.
My concern is that, because skill is so hard to quantify and subject to so many variables, the extended series format would actually be more detrimental to two players close in skill level than it helps a better player to advance. I'm not sure how valid this concern is, but say for example Bisu is a 3.0 and Jaedong is a 2.9; in your model Bisu will forever be the better player no matter what. But in practice, if Bisu won the first series 2-1, it's entirely possible that when facing elimination JD will pull 0.1 points ahead in skill, so that over the long run JD will perform better than Bisu when facing elimination but not in regular competition. However, because of the enormous advantage provided by the extended series format, Bisu will advance an unfair number of times despite being the worse player. In this case I see no other possible format than to have both players start out 0-0.
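The size of the carried-over lead can be put in numbers. The sketch below is not taken from the OP's simulation: it assumes every game is independent with a fixed, made-up per-game win probability `p` for the player who won the first series, and the helper name `race_win_prob` is hypothetical. It compares that player's chance of advancing from a fresh Bo3 with the chance when a 2-1 or 2-0 lead is carried into a Bo7.

```python
from math import comb


def race_win_prob(p: float, need_a: int, need_b: int) -> float:
    """Chance that A collects need_a more wins before B collects need_b.

    Assumes independent games with a fixed win probability p for A
    (an illustrative simplification, not the OP's player model).
    """
    q = 1.0 - p
    # A clinches on the game of A's need_a-th win, having lost k < need_b games.
    return sum(comb(need_a - 1 + k, k) * p**need_a * q**k for k in range(need_b))


if __name__ == "__main__":
    # p is the first-series winner's per-game win chance in the rematch.
    # 0.45 stands for the case where the first-series loser is slightly better.
    for p in (0.50, 0.45):
        fresh_bo3 = race_win_prob(p, 2, 2)  # both players start at 0-0
        carry_2_1 = race_win_prob(p, 2, 3)  # Bo7 resumed at 2-1
        carry_2_0 = race_win_prob(p, 2, 4)  # Bo7 resumed at 2-0
        print(f"p={p:.2f}  fresh Bo3={fresh_bo3:.3f}  "
              f"from 2-1={carry_2_1:.3f}  from 2-0={carry_2_0:.3f}")
```

Under these assumptions, two evenly matched players (p = 0.5) give the first-series winner a 50% chance from a fresh Bo3, 68.75% from 2-1 and 81.25% from 2-0. Whether that is "too much" is still a judgment call; the sketch only shows how large the head start is.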
|
On November 12 2010 16:15 Adeeler wrote: On November 12 2010 16:05 zulu_nation8 wrote: On November 12 2010 15:56 Adeeler wrote: Double Extended is definitely more accurate a measure than plain double and is efficient.
Arguing that each round is isolated and shouldn't cause an extended series goes against the whole principle of an isolated tournament, where it's a battle to have the best winning streak. You can't have it both ways.
Either you have only single elimination or double extended. Plain double is never valid in any circumstance. How this isn't understood by tournament veterans is odd.
Over multiple single eliminations the players' ranks will become more accurate, but double and double extended speed up the process.
The Idra argument that his ZvT is better than another person's matchup is completely irrelevant. Race doesn't matter: either you play to win in a tournament setting by beating everyone, or you want a league where you can lose but your average skill level will be shown overall.
Round robins are the most accurate most fair but no one has that kind of time. In the context of Almeisan's example, you can look at it as G5 obtaining an advantage for having gone on a win streak in the loser's bracket before meeting Idra, where the advantage is for their series to start out 0-0. In the grand final the winner's bracket winner starts out with an advantage because he went undefeated. If you stayed in the Winners by knocking someone down beating them you proved you are better in a single elim fashion, the double isn't about giving losers a second chance but placing final standings more accurately. So the winner between players that have already met should always have the advantage; otherwise you are looking to only have a single elim.
Fair point
On November 12 2010 16:11 Almeisan wrote: Zulu, I added a bit later about how normal double elim ignores available information. Without that part it is too much an argument about why it's better rather than about why it performs worse in simulations.
There is also the problem that G5 got only eliminated once by Idra. That is a flaw, imo. But it's a flaw of double elim in general, and the same flaw single elim has. When the two players meet again in the loser bracket, one of them is going to be eliminated from the tournament. There's no way around that. And in some cases it's just impossible not to have the same match in the loser bracket that you had in the winner bracket.
It's possible Idra meets Flash early on and that G5 cheeses vs Flash's 12 CC and wins 2-0. Then Idra loses to Flash in an extended series in the loser bracket and is eliminated technically only once, and by Flash. That's a flaw that is in both systems and that you can only fix by making it a round robin. But it's different from discarding info.
You're right, I see why the winner of the first series deserves an advantage. I still feel like a 2-0 lead is too much though for the reasons I mentioned in the earlier post.
|
Playing more games always gives you more accuracy. Never less. More info is more info, not less. This doesn't become false the harder skill becomes to quantify.
And if skill is so hard to quantify and the tournament doesn't aim to have the best player win then why not play a tournament, never mind the structure, and then at the end just randomly draw a lot to determine the 'winner'? I mean, where do you draw the line? You wouldn't want to call the first player out the winner, would you? You really do want to know who is the best in that specific tournament.
|
On November 12 2010 16:16 zulu_nation8 wrote: On November 12 2010 15:51 nzb wrote: Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily. While I'm sure that my model is missing things, objections need to have systematic effect -- that is, in the long run they favor the better players or worse ones, or the winner of the winners' bracket game, etc.. Otherwise, you would expect that they would balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway. My concern is that because skill is so hard to quantify and subject to so many variables, that the extended series format would actually be more detrimental to two players close in skill level than it helps a better player to advance. I'm not sure how valid this concern is, but say for example Bisu is a 3.0 and Jaedong is a 2.9, in your model Bisu will forever be the better player no matter what. But in practice, if Bisu won the first series 2-1, it's entirely possible when facing elimination JD will be pull 0.1 points ahead in skill, so that over the long run JD will perform better than Bisu when facing elimination but not in regular competition. However because of the enormous advantage provided by the extended series format, Bisu will advance an unfair number of times despite being the worse player. In this case I see no other possible format than to have both players start out 0-0.
You can't count in decimals when your base counting measure is 1, as a game is either won (1) or lost (0) in terms of rounds.
Your previous skill in tournaments overall (Jaedong 2.9) can never affect your next game; otherwise lottery balls that fell one week would physically affect the next week's balls. The chance is still between winning and losing; it isn't possible to have a partial 0.9 win at the end, only a full win or loss.
|
On November 12 2010 16:23 Almeisan wrote: Playing more games always gives you more accuracy. Never less. More info is more info, not less. This doesn't become false the harder skill becomes to quantify.
And if skill is so hard to quantify and the tournament doesn't aim to have the best player win then why not play a tournament, never mind the structure, and then at the end just randomly draw a lot to determine the 'winner'? I mean, where do you draw the line? You wouldn't want to call the first player out the winner, would you? You really do want to know who is the best in that specific tournament.
Skill is hard to quantify, thus a format should try to do as little of it as possible. I understand the advantage provided by extended series is necessary, but trying to determine how much advantage should be given requires some kind of measurement of skill.
|
Why don't they just do a bo5 with nobody at an advantage? Best sides of both spectrum.
There. Problem solved.
|
On November 12 2010 16:28 Adeeler wrote: On November 12 2010 16:16 zulu_nation8 wrote: On November 12 2010 15:51 nzb wrote: Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily. While I'm sure that my model is missing things, objections need to have systematic effect -- that is, in the long run they favor the better players or worse ones, or the winner of the winners' bracket game, etc.. Otherwise, you would expect that they would balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway. My concern is that because skill is so hard to quantify and subject to so many variables, that the extended series format would actually be more detrimental to two players close in skill level than it helps a better player to advance. I'm not sure how valid this concern is, but say for example Bisu is a 3.0 and Jaedong is a 2.9, in your model Bisu will forever be the better player no matter what. But in practice, if Bisu won the first series 2-1, it's entirely possible when facing elimination JD will be pull 0.1 points ahead in skill, so that over the long run JD will perform better than Bisu when facing elimination but not in regular competition. However because of the enormous advantage provided by the extended series format, Bisu will advance an unfair number of times despite being the worse player. In this case I see no other possible format than to have both players start out 0-0. You can't count in decimals when your base counting measure is 1 as a game is either won(1) or lost (0) in terms of rounds. Your previous skill in touraments overall (Jaedong 2.9) can never effect your games in you next game, otherwise lottery balls that fell one week would physically effect the next weeks balls. 
The chance is still between winning and losing there isn't possible to be a partial 0.9 win at the end only full win or loss.
I just made up random numbers, but why can't it be a probability, as in JD wins 55% of all single elimination games vs Bisu and Bisu wins 45%?
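A probability like this is compatible with every game being a plain win or loss. As a rough illustration (the 55% figure is the made-up number from this post, the function names are hypothetical, and games are assumed independent), the sketch below plays many simulated Bo3 series where each game is decided by a single random draw, then compares the observed series win rate with the closed form p^2 (3 - 2p).

```python
import random


def play_series(p_game: float, wins_needed: int, rng: random.Random) -> int:
    """Return 1 if player A wins the series, else 0. Each game is a full win or loss."""
    a_wins = b_wins = 0
    while a_wins < wins_needed and b_wins < wins_needed:
        if rng.random() < p_game:  # A wins this game with probability p_game
            a_wins += 1
        else:
            b_wins += 1
    return 1 if a_wins == wins_needed else 0


def estimate_series_win_rate(p_game: float, wins_needed: int,
                             trials: int = 200_000, seed: int = 0) -> float:
    """Fraction of simulated series won by A (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(play_series(p_game, wins_needed, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    p = 0.55  # hypothetical per-game win chance, taken from the post above
    simulated = estimate_series_win_rate(p, wins_needed=2)
    exact = p * p * (3 - 2 * p)  # P(win Bo3) = p^2 + 2 * p^2 * (1 - p)
    print(f"simulated Bo3 win rate: {simulated:.4f}, closed form: {exact:.4f}")
```

No individual series ever ends in a "0.55 win"; the 55% only shows up as a long-run frequency, which for a Bo3 works out to roughly 57.5% of series.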
|
On November 12 2010 16:31 ghostsquall wrote: Why don't they just do a bo5 with nobody at an advantage? Best sides of both spectrum.
There. Problem solved.
You still give the loser the undeserved advantage of resetting their losses, so your suggestion changes nothing.
|
On November 12 2010 16:31 ghostsquall wrote: Why don't they just do a bo5 with nobody at an advantage? Best sides of both spectrum.
There. Problem solved.
Can people stop posting "why not just do another BoX?" It really shows you haven't read the thread..
EDIT: doing another BoX reproduces all of the problems inherent in non-extended series double elim i.e. the player with the worse record advancing, or there is a tie in the net record making the order the series were played in more important than the individual results
|
Your so-called statistical analysis is filled with biased side notes and totally neglects a number of the real reasons behind using the extended series. Notably, it provides a better ranking for all players, not simply better chances of just the best player winning; it prevents things like the 2nd best player going out in the first round, which your analysis doesn't take into account.
Your "scope" contains "questions" that are not questions, not even rhetorical, but statements that haven't been proven within your analysis or even supported by other statements that are proven.
Your math is solid, but it's isolated and has been applied to only a certain range of circumstances, without really taking into account the varying factors that have to be considered. For example, where does seeding fit into this? It is used, and without that being factored into the math it is essentially worthless.
That's all mostly negative criticism but you don't need anyone to tell you what you got right, you already know that.
|
On November 12 2010 16:33 zulu_nation8 wrote: On November 12 2010 16:28 Adeeler wrote: On November 12 2010 16:16 zulu_nation8 wrote: On November 12 2010 15:51 nzb wrote: Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily. While I'm sure that my model is missing things, objections need to have systematic effect -- that is, in the long run they favor the better players or worse ones, or the winner of the winners' bracket game, etc.. Otherwise, you would expect that they would balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway. My concern is that because skill is so hard to quantify and subject to so many variables, that the extended series format would actually be more detrimental to two players close in skill level than it helps a better player to advance. I'm not sure how valid this concern is, but say for example Bisu is a 3.0 and Jaedong is a 2.9, in your model Bisu will forever be the better player no matter what. But in practice, if Bisu won the first series 2-1, it's entirely possible when facing elimination JD will be pull 0.1 points ahead in skill, so that over the long run JD will perform better than Bisu when facing elimination but not in regular competition. However because of the enormous advantage provided by the extended series format, Bisu will advance an unfair number of times despite being the worse player. In this case I see no other possible format than to have both players start out 0-0. You can't count in decimals when your base counting measure is 1 as a game is either won(1) or lost (0) in terms of rounds. Your previous skill in touraments overall (Jaedong 2.9) can never effect your games in you next game, otherwise lottery balls that fell one week would physically effect the next weeks balls. 
The chance is still between winning and losing there isn't possible to be a partial 0.9 win at the end only full win or loss. I just made up random numbers but why can't it be probability as in JD wins 55% of all single elimination games vs Bisu, Bisu 45%.
Because a game can only result in a win or loss. I.e. you can't get a win by killing only 55% of the enemy's base, only by killing 100%, a full win.
Draws are counted as win for both or neither.
|
On November 12 2010 16:39 Adeeler wrote: On November 12 2010 16:33 zulu_nation8 wrote: On November 12 2010 16:28 Adeeler wrote: On November 12 2010 16:16 zulu_nation8 wrote: On November 12 2010 15:51 nzb wrote: Atm I'm not sure which format I agree with more. But in the study done in the OP, the player model has way too few variables. Maybe Idra performs worse than G5 when facing elimination, or maybe G5 performs better when warmed up after having played a few BO series. In any case if two players are close enough in skill level that they can go 1-1 in two BO series, I can't imagine the regular double elim system to be so unfair that the advantage given by extended series is required to correct an "injustice" within the format. Having a 2-0 lead in a bo7 is too much of an advantage imo.
You are, obviously, right in some sense. But the purpose of the study is to find the "big picture" statistical behavior, and to capture the effects that influence this most heavily. While I'm sure that my model is missing things, objections need to have systematic effect -- that is, in the long run they favor the better players or worse ones, or the winner of the winners' bracket game, etc.. Otherwise, you would expect that they would balance out after enough simulation (and I ran it a million times). Effects that just increase the randomness will change the results somewhat, but they probably will not change the trends, which is what we care about anyway. My concern is that because skill is so hard to quantify and subject to so many variables, that the extended series format would actually be more detrimental to two players close in skill level than it helps a better player to advance. I'm not sure how valid this concern is, but say for example Bisu is a 3.0 and Jaedong is a 2.9, in your model Bisu will forever be the better player no matter what. But in practice, if Bisu won the first series 2-1, it's entirely possible when facing elimination JD will be pull 0.1 points ahead in skill, so that over the long run JD will perform better than Bisu when facing elimination but not in regular competition. However because of the enormous advantage provided by the extended series format, Bisu will advance an unfair number of times despite being the worse player. In this case I see no other possible format than to have both players start out 0-0. You can't count in decimals when your base counting measure is 1 as a game is either won(1) or lost (0) in terms of rounds. Your previous skill in touraments overall (Jaedong 2.9) can never effect your games in you next game, otherwise lottery balls that fell one week would physically effect the next weeks balls. 
The chance is still between winning and losing there isn't possible to be a partial 0.9 win at the end only full win or loss. I just made up random numbers but why can't it be probability as in JD wins 55% of all single elimination games vs Bisu, Bisu 45%. Because a game can only result in a win or loss. I.e. you can't cause a win by only killing 55% of the enemies base only by killing 100% a full win. Draws are counted as win for both or neither.
Are you trolling or are you just honestly clueless about what he is talking about?
|
On November 12 2010 16:30 zulu_nation8 wrote: Skill is hard to quantify, thus a format should try to do as little of it as possible.
The harder skill is to quantify, the harder you ought to try. Otherwise the tournament becomes meaningless. Do you really want a game and a tournament setup where skilled players don't win more than unskilled players? Do you really want to know who is lucky on a given day rather than something else?
I understand the advantage provided by extended series is necessary, but trying to determine how much advantage should be given requires some kind of measurement of skill.
Where is an advantage given? You either count all games or you count only those in the loser bracket. Counting all games doesn't make the first games count for more. It's just that with a bo3 in the loser bracket while ignoring the previous games you get more deviation from the expected results. It favours the less skilled player over the more skilled one because if there were no deviation we would have 100% chance for the most skilled player to win.
|
On November 12 2010 16:38 Kazang wrote: Your so-called statistical analysis is filled with biased side notes and totally neglects a number of the real reasons behind using the extended series. Notably, it provides a better ranking for all players, not simply better chances of just the best player winning; it prevents things like the 2nd best player going out in the first round, which your analysis doesn't take into account.
Your "scope" contains "questions" that are not questions, not even rhetorical, but statements that haven't been proven within your analysis or even supported by other statements that are proven.
Your math is solid, but it's isolated and has been applied to only a certain range of circumstances, without really taking into account the varying factors that have to be considered. For example, where does seeding fit into this? It is used, and without that being factored into the math it is essentially worthless.
That's all mostly negative criticism but you don't need anyone to tell you what you got right, you already know that.
Wow, people are still posting to point out flaws in the model? Read the thread before posting. YES, there are flaws in the model; we can't have a perfect model. Again, the simplest model is the best model to make decisions with, because we can't just add these non-quantifiable factors to the model and expect it to work. Sounds like you just wanted to point out flaws.
Where is the bias in his analysis or scope? Read the scope, which specifically says it's NOT concerned with the questions listed.
Can we actually talk about his conclusion, that is, this simple model tells us that while extended series contributes a little bit to making the tournament outcome "fairer", in reality, it does not have big enough effect to be absolutely certain, considering all other factors.
So whether they should continue with what they are doing, or just do regular double elimination since statistically its not SO different, or come up with a whole new tournament method.
|
On November 12 2010 17:30 scion wrote: On November 12 2010 16:38 Kazang wrote: Your so-called statistical analysis is filled with biased side notes and totally neglects a number of the real reasons behind using the extended series. Notably, it provides a better ranking for all players, not simply better chances of just the best player winning; it prevents things like the 2nd best player going out in the first round, which your analysis doesn't take into account.
Your "scope" contains "questions" that are not questions, not even rhetorical, but statements that haven't been proven within your analysis or even supported by other statements that are proven.
Your math is solid, but it's isolated and has been applied to only a certain range of circumstances, without really taking into account the varying factors that have to be considered. For example, where does seeding fit into this? It is used, and without that being factored into the math it is essentially worthless.
That's all mostly negative criticism but you don't need anyone to tell you what you got right, you already know that. Wow people still posting to point out flaws in model? Read the thread before posting, YES there are flaws in the model, we can't have a perfect model. Again, simplest model is best model to make decisions with, because we can't just add in these non-quantifiable factors in the model and expect it to work. Sounds like you just wanted to pint out flaws. Where is the bias in his analysis or scope? Read the scope, which specifically says its NOT concerned with the questions listed. Can we actually talk about his conclusion, that is, this simple model tells us that while extended series contributes a little bit to making the tournament outcome "fairer", in reality, it does not have big enough effect to be absolutely certain, considering all other factors. So whether they should continue with what they are doing, or just do regular double elimination since statistically its not SO different, or come up with a whole new tournament method.
Yeah no wonder people are pointing out flaws in the model...... It is kind of important if you are going to use it as evidence or basis for a decision.
You don't need a mathematical model to point out the logical benefits of an extended series. So why apply a flawed mathematical model at all if you are then going to factor in other external factors? If the model is flawed, which it most certainly is, how can you logically use it as an argument for anything?
The scale of the model is also quite ridiculous: round robin for a 128 man tournament like MLG Dallas is 8128 series, so how the hell can you compare that to a single elimination of 127 series in the same graph? The scale is insane; the accuracy difference is far bigger than those little jpgs show, since the number of games is the biggest factor at work in a live tournament. The comparison is more misleading than anything to someone who hasn't already thought about this.
I also read the scope clearly. I'm pointing out a mistake in the writing: it says "here are questions" then lists statements. Of course I want to point out the flaws, duh..... You cannot base an argument or discussion on a flawed premise; if you do, it's just pointless. As it is, the model shows nothing of value, other than the fact that the extended series is better even when not taking into account the full range of benefits the extended series offers. So then what is the point of it?
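For reference, the series counts behind this scale comparison follow from simple counting. The sketch below is only an illustration with hypothetical function names; it counts series (matchups), not individual games, so a best-of-three format would multiply each figure by somewhere between 2 and 3.

```python
def round_robin_series(n: int) -> int:
    """Every pair of players meets exactly once."""
    return n * (n - 1) // 2


def single_elim_series(n: int) -> int:
    """Each series eliminates one player; everyone except the champion is eliminated."""
    return n - 1


def double_elim_series(n: int):
    """(min, max): every non-champion loses twice; the champion loses zero times or once."""
    return 2 * n - 2, 2 * n - 1


if __name__ == "__main__":
    n = 128
    low, high = double_elim_series(n)
    print(f"{n} players:")
    print(f"  round robin        : {round_robin_series(n)} series")
    print(f"  single elimination : {single_elim_series(n)} series")
    print(f"  double elimination : {low} to {high} series")
```

The extended series rule does not change the number of series in a double elimination bracket; it only changes how many games a rematch can take.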
|
I'd be interested to know how much of the extra accuracy in the extended series comes from the extra games played, and how much from it specifically being an extended series. We already know that playing more games will result in greater accuracy in results.
If you compare it to a normal double elimination where people meeting each other again play a Bo5 instead, would it eliminate the accuracy difference between normal and extended double elimination? That system would have a small chance of having an extra game compared to the extended series, so if the effect was simply from extra games, it should be more accurate than the extended series format.
Sorry if you answered this already, and I missed it somewhere.
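One way to get a feel for this question is to look at a single rematch in isolation. The sketch below is a toy calculation, not the OP's bracket simulation: it assumes independent games with a fixed per-game win chance `p` for the better player, all names are hypothetical, and it ignores everything else that happens in the bracket. It computes the chance that the better player survives the rematch under a fresh Bo3, a fresh Bo5, and an extended series that carries the first Bo3 score into a Bo7.

```python
from functools import lru_cache


def race(p: float, need_a: int, need_b: int) -> float:
    """Chance that A gets need_a more wins before B gets need_b (fixed per-game chance p)."""

    @lru_cache(maxsize=None)
    def go(a: int, b: int) -> float:
        if a == 0:
            return 1.0  # A has all the wins it needed
        if b == 0:
            return 0.0  # B got there first
        return p * go(a - 1, b) + (1 - p) * go(a, b - 1)

    return go(need_a, need_b)


def rematch_accuracy(p: float) -> dict:
    """Chance the better player (A) survives the rematch under each rule."""
    q = 1 - p
    # (probability of this first-series result, A's wins, B's wins)
    first_series = [(p * p, 2, 0), (2 * p * p * q, 2, 1),
                    (2 * p * q * q, 1, 2), (q * q, 0, 2)]
    # Extended series: first to 4 wins, counting the first-series games.
    extended = sum(prob * race(p, 4 - a, 4 - b) for prob, a, b in first_series)
    return {
        "fresh Bo3": race(p, 2, 2),
        "fresh Bo5": race(p, 3, 3),
        "extended Bo7": extended,
    }


if __name__ == "__main__":
    for rule, chance in rematch_accuracy(0.6).items():
        print(f"{rule:>12}: {chance:.4f}")
```

In this toy setting the extended series is exactly a Bo7 spread over both meetings, so for p above 0.5 it favours the better player more than a fresh Bo5 does, even though the fresh Bo5 can take one more game in total. That is because the fresh Bo5 decides elimination without using the first series. Whether the same ordering holds across a full bracket is something only the full simulation could confirm.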
|
Surely the important thing is that extended series SUCK for the viewer; at the end of the day that's all that should matter: making an entertaining tournament.
|
Wow, a lot of haters in this topic that don't understand the point of modelling. Models are not meant to be a 100% perfect representation of reality but are used to draw out insights into the behaviour of a system. Every model has limiting assumptions, and these should be considered when interpreting the results and drawing conclusions, just as nzb has done. Also, just because some assumptions are limiting does not mean they will necessarily have a large effect on the results; this should be reasoned through. Anyways, good job nzb.
On November 12 2010 13:07 nzb wrote: I didn't talk about this in the main post, because its just my opinion and wasn't backed by any numbers, but I think a good format would be:
- Play swiss-style tournament to determine the top 8-16 players.
- Play single elimination to get champion.
This would be a very reliable way to determine the top 8 or 16, and then would switch into overdrive to determine the champ. It would be very exciting, similar to how the NCAA does March Madness. I would love it if we could get someone to do some special event using this format just to try it out.
For interest, the 'Magic: the Gathering' Pro Tour has used that format for the last ~10 years: http://www.wizards.com/Magic/Magazine/Events.aspx The format is exciting with the single elimination Top 8, however it can be ruthless to some competitors who dominate the swiss rounds and then get knocked out early in the single elim. One of the issues is that the later stages of the swiss rounds can be pretty boring, as most matches are drawn between the top players who have already secured a Top 8 spot, and luck-based, when a number of players can draw into the Top 8 but one gets matched against a lower player and has to play it out (or against a higher player who refuses to draw to help a friend advance in the rankings) while the others draw amongst themselves.
|
This really isn't a question of mathematics imo, it's a question of preference. I agree with both camps, but I'd say extended series aren't what I prefer. You can't be statistical when it comes to this; I'd say the best thing would be to have a poll with the pro gamers to determine how they feel about it and decide that way.
|
On November 12 2010 16:03 Azzur wrote: In my opinion, the purpose of a tournament is not to "find the best player". It is to provide entertainment for both players and fans. If "finding the best player" were the only criteria, then we would just use a round-robin format containing many games.
Thus, since the extended series is deemed as confusing and is disliked by many fans, then it should not be used.
I'd argue that making sure good players get into the quarter/semi/finals is pretty important in providing entertainment for both fans and players. If a tournament was extremely random and you end up with two extremely mediocre players in the finals or a lopsided matchup, that wouldn't be very fun to watch would it? As the OP pointed out, the extended series doesn't do too much really either way, so it's really a judgment call.
|
I just love teamliquid! :D
This is noteworthy for sure
|
On November 12 2010 20:18 teamsolid wrote: On November 12 2010 16:03 Azzur wrote: In my opinion, the purpose of a tournament is not to "find the best player". It is to provide entertainment for both players and fans. If "finding the best player" were the only criteria, then we would just use a round-robin format containing many games.
Thus, since the extended series is deemed as confusing and is disliked by many fans, then it should not be used.
I'd argue that making sure good players get into the quarter/semi/finals is pretty important in providing entertainment for both fans and players. If a tournament was extremely random and you end up with two extremely mediocre players in the finals or a lopsided matchup, that wouldn't be very fun to watch would it? As the OP pointed out, the extended series doesn't do too much really either way, so it's really a judgment call.
Of course, finding good players in the latter stages of the tournament is part of the entertainment itself. That's why single elimination bo1 may be exciting but doesn't fare well on the entertainment criteria.
In my mind, a double elimination bo3 does serve this purpose (finding good players, resulting in entertainment) well. Hence, for me, there's no reason of using the extended series (since it's confusing and disliked).
|
You know what would sort this issue out.. a league lol. There's a reason most competitions start as a round robin or a league, and then either use that to determine rank or then go into single elimination. We saw with machine getting unlucky 2 tournaments in a row.. which left him unseeded. In a league format this would not have been an issue, but in a knockout tournament even though he is better than other players who advanced further his rank and therefore his seeding in the next tournament didn't show this.
It's unreasonable to run a full league system such as is employed in football. But the most successful spectator sports all involve tournaments based upon round robin into single elimination: NFL, World Cup football, European Champions League. Admittedly this may be easier if MLG were to run a team comp, but I believe they could run league systems on the Friday and Saturday: 8 people per group, round robin, so everyone plays 7 games over 2 days. The top 2 advance from each group, decided by games won if there's a draw (same as the World Cup), into a single elimination, either pre-decided like the World Cup, seeded by position like American sports (i.e. the NBA's playoff system), or a straight-up draw out of the hat.
This is a world-recognised format: easy to follow, allows a lot of exciting matchups and upsets, and still should allow the best players to at least get through to the single elims. None of this looking at the brackets completely lost for 10 minutes figuring out who's playing who.
I mean to add.. best of 3's in round robin.. best of 5's for KO stages until final BO7. Which makes it at least similar to how GSL run too, making the standard SC2 format easy.
|
On November 12 2010 18:06 Kazang wrote: On November 12 2010 17:30 scion wrote: On November 12 2010 16:38 Kazang wrote: Your so-called statistical analysis is filled with biased side notes and totally neglects a number of the real reasons behind using the extended series. Notably, it provides a better ranking for all players, not simply better chances of just the best player winning; it prevents things like the 2nd best player going out in the first round, which your analysis doesn't take into account.
Your "scope" contains "questions" that are not questions, not even rhetorical, but statements that haven't been proven within your analysis or even supported by other statements that are proven.
Your math is solid but it's isolated and has been applied to only certain circumstances, without really taking into account the varying factors that have to be considered. Where does seeding fit into this, for example? It is used, and without it being factored into the math the analysis is essentially worthless.
That's all mostly negative criticism but you don't need anyone to tell you what you got right, you already know that.
Wow people still posting to point out flaws in model? Read the thread before posting, YES there are flaws in the model, we can't have a perfect model. Again, simplest model is best model to make decisions with, because we can't just add in these non-quantifiable factors in the model and expect it to work. Sounds like you just wanted to point out flaws. Where is the bias in his analysis or scope? Read the scope, which specifically says it's NOT concerned with the questions listed. Can we actually talk about his conclusion, that is, this simple model tells us that while extended series contributes a little bit to making the tournament outcome "fairer", in reality, it does not have big enough effect to be absolutely certain, considering all other factors. So whether they should continue with what they are doing, or just do regular double elimination since statistically it's not SO different, or come up with a whole new tournament method.
Yeah no wonder people are pointing out flaws in the model...... It is kind of important if you are going to use it as evidence or basis for a decision. You don't need a mathematical model to point out the logical benefits of an extended series. So why apply a flawed mathematical model at all if you are going to then factor in other external factors? If the model is flawed, which it most certainly is, how can you logically use it as an argument for anything? The scale of the model is also quite ridiculous, round robin for a 128 man tournament like MLG Dallas is 8128 games, how the hell can you compare that to single elimination of 127 games in the same graph? The scale is insane, the accuracy difference is far bigger than those little jpgs show since the number of games is the biggest factor at work in a live tournament. The comparison is more misleading than anything to someone who hasn't already thought about this.
I also read the scope clearly, I'm pointing out a mistake in the writing; it says "here are questions" then lists statements. Of course I want to point out the flaws, duh..... You cannot base an argument or discussion on a flawed premise, if you do it's just pointless. As it is the model shows nothing of value, other than the fact that the extended series is better even when not taking into account the full range of benefits the extended series offers. So then what is the point of it?
NOT concerned with. No you didn't read it.
This post is an in-depth analysis of the statistical performance of different tournament formats. It is not concerned with many other important questions, for example:
He is NOT using this model for an argument. Whether you think it is flawed or not, it does a good job of simplifying how a league works. And he's not using math to show the benefit of extended series, he's showing how much benefit it provides.
Also he put round robin and double elimination in the same graph so that we can just compare between the two. Hardly a suggestion that MLG should adopt a round robin system. Sigh, if you don't get stats, don't post about it.
Wow, a lot of haters in this topic that don't understand the point of modelling. Models are not meant to be a 100% perfect representation of reality but are used to draw out insights into the behaviour of a system. Every model has limiting assumptions and these should be considered when interpreting the results and drawing conclusions, just as nzb has done. Also, just because some assumptions are limiting does not mean they will necessarily have a large effect on the results - this should be reasoned through. Anyways good job nzb.
this guy gets it.
Also, if people are going to point out flaws, it has to be an effect that favors one side. As in, if you can't prove that in an extended Bo7 the winners' bracket guy has a clear psychological advantage over the losers' bracket guy, it's not going to affect the stats in any meaningful way.
I feel like we've already concluded this thread on page 3 and now people are just adding meaningless objections to the model.
|
Nice work on the analysis. It proves what most probably expected: that double elimination with extended series is just a minor tweak to the normal double elimination system and doesn't dramatically change the results. I feel that the main point that people want to discuss was left unaddressed - there was no comparison between the extended series and a normal bo3 in letting the more skilled player advance. Anyway, people seem to emphasize these hypothetical "skill scores" that they assign players in their heads a lot. Tournaments will always be about whoever wins the matches on that day, and the best we can do is give them fair chances to win the tournament if they just win games. If you want to know who is the best player, no system will allow you to determine it from just 1 tournament. What you are looking for are point/Elo systems that take the results of multiple tournaments.
I've probably followed nearly 100 double elimination tournaments in various games since the year 2000 and this is the first time I've seen anybody complain about unfairness in a standard bracket and start counting single maps from previous series to try to point out some kind of injustice. Before MLG I never saw anybody use a system that would count the scores of the previous match in a later match. I see it a bit like starting a soccer match in the finals from the score you finished a group stage match at, or (from my post in the previous thread):
It's OSL, and Flash and Jaedong are in the same group in the group stage. Stork is in another group, and all 3 advance with Flash beating Jaedong 2-0. If Flash and Jaedong meet in the finals, should Flash start 2-0 up in a bo7? If he meets Stork, should it start from 0-0? What you are arguing is that Jaedong needs to prove he's the better player by resuming the previous series and winning 4 games out of 5. Also you could imagine a double-elimination bracket played as bo1's to eliminate this argument of losing 0:2 then winning 2:1 to eliminate someone. In this case a score of 1:1 is the only one that eliminates the player that won the first match. Looking at their other results, the winner of the first match lost another game while the loser didn't, so that breaks the tie. The reason we choose to go to bo3's from this system is to allow players to overcome an unfavorable map or build order loss, not to write map wins down and sometimes use them later depending on who the players play against.
|
On November 12 2010 13:07 nzb wrote: I didn't talk about this in the main post, because it's just my opinion and wasn't backed by any numbers, but I think a good format would be:
- Play swiss-style tournament to determine the top 8-16 players. - Play single elimination to get champion.
This would be a very reliable way to determine the top 8 or 16, and then would switch into overdrive to determine the champ. It would be very exciting, similar to how the NCAA does March Madness. I would love it if we could get someone to do some special event using this format just to try it out.
Let's explore this idea, because I can see there being problems with using a swiss pairing system for Starcraft.
For those who are unfamiliar with a swiss pairing system, it works by having a scheduled number of rounds, with every player playing every round.
For example, we have IdrA, Nony, Ret and Nazgul (names off the top of my head)
First round IdrA plays Nony, Ret plays Nazgul. IdrA and Ret win. Second round IdrA plays Ret because they won, Nony plays Nazgul because they lost. IdrA and Nony win. The third round Ret would play Nony, because they each have 1 win, and so on.
Over a predetermined number of rounds, this system will sort the players from best to worst theoretically, and requires fewer games than a round robin system.
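The four-player example above can be sketched in a few lines. This is a simplified greedy pairing, not an official Swiss algorithm: the helper names `swiss_pairings` and `record_round` are made up for illustration, the player count is assumed to be even, and ties within a score group are broken by the initial seeding order.

```python
def swiss_pairings(players, wins, played):
    """Pair players with similar records, avoiding rematches where possible.

    Simplified greedy sketch: assumes an even number of players and breaks
    ties inside a score group by the order of `players` (the seeding).
    """
    # sorted() is stable, so equal win counts keep their seeding order.
    queue = sorted(players, key=lambda name: -wins[name])
    pairs = []
    while queue:
        a = queue.pop(0)
        # Take the highest-ranked remaining player that `a` has not met yet;
        # fall back to the next in line if only rematches are left.
        idx = next((i for i, b in enumerate(queue) if b not in played[a]), 0)
        pairs.append((a, queue.pop(idx)))
    return pairs

def record_round(pairs, winners, wins, played):
    """Store who met whom and credit one win per pairing."""
    for a, b in pairs:
        played[a].add(b)
        played[b].add(a)
        wins[a if a in winners else b] += 1

players = ["IdrA", "Nony", "Ret", "Nazgul"]   # seeding order from the example
wins = {name: 0 for name in players}
played = {name: set() for name in players}

round1 = swiss_pairings(players, wins, played)
record_round(round1, {"IdrA", "Ret"}, wins, played)    # IdrA and Ret win
round2 = swiss_pairings(players, wins, played)
record_round(round2, {"IdrA", "Nony"}, wins, played)   # IdrA and Nony win
round3 = swiss_pairings(players, wins, played)

for number, pairs in enumerate([round1, round2, round3], start=1):
    print(f"Round {number}: {pairs}")
```

Round 3 pairs Nony with Ret as in the walkthrough. A greedy pass like this can get stuck with forced rematches in bigger fields, which real Swiss pairing rules handle with more care.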
The first and easiest to solve problem would be doing the pairings for the first round. You could do something like use people's sc2ranks scores, with the caveat that teammates won't play each other. Pretty simple.
After that, let's say we have a 64 player tournament. Swiss pairing requires 6 rounds to sort 64 players, or 5 rounds if you use accelerated pairing.
You'd then have to figure out a points system for wins and losses. Probably the easiest way is to give 2 points for a win, 1 point for a draw (however rare) and 0 points for a loss.
The problem then becomes that the point numbers are too low to realistically sort players. Assuming each pairing plays 3 games, you will after the first round have 32 winners who have either 4 or 6 points. The system is supposed to pair those players together, but it can't because there will be too many tie scores.
Sorting ties by countback would be impossible for round 2, because the opponent of everyone with 6 points would have 0 points, and the opponent of everyone with 4 points would have 2 points. You can't sort the ties by their sc2ranks stats because those were just used to sort them for the first round, and there won't be enough tie games to balance the system because ties in Starcraft almost never happen at high level play.
You could increase the number of games played to increase variance in points totals, but this puts you closer and closer into the territory of having far too many games played to organize a tournament. As is, a single elimination BO3 series tournament requires between 96 and 144 games to get from the round of 64 to round of 16 like you want, but a swiss pairing tournament of this size requires 480 games using accelerated pairing, or 576 without. Increasing the number of games per pairing to say 5 to mitigate the issue of ties in round 2 jumps you to 800 games needed to sort your players.
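For anyone who wants to check those totals, here is the arithmetic as a small sketch. It assumes every Swiss pairing plays out all of its games (as in the points scheme above), and that the 800 figure means five games per pairing over five rounds; `swiss_games` is just an illustrative helper name.

```python
PLAYERS = 64

def swiss_games(players, rounds, games_per_pairing):
    """Total games when every player plays every round and each pairing
    plays a fixed number of games (no stopping early once a series is decided)."""
    pairings_per_round = players // 2
    return pairings_per_round * rounds * games_per_pairing

# Single elimination from 64 players down to 16: the rounds of 64 and of 32.
single_elim_series = 64 // 2 + 32 // 2        # 32 + 16 = 48 series
single_elim_min = single_elim_series * 2      # every Bo3 ends 2-0
single_elim_max = single_elim_series * 3      # every Bo3 ends 2-1

swiss_5_rounds = swiss_games(PLAYERS, rounds=5, games_per_pairing=3)
swiss_6_rounds = swiss_games(PLAYERS, rounds=6, games_per_pairing=3)
swiss_5_games_each = swiss_games(PLAYERS, rounds=5, games_per_pairing=5)

print(single_elim_min, single_elim_max)   # 96 144
print(swiss_5_rounds, swiss_6_rounds)     # 480 576
print(swiss_5_games_each)                 # 800
```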
Another possibility would be to have people compete in the swiss pairing bracket by team. In a 64 player tournament, if each team were allowed to send 8 players, you could have 8 teams in the tournament, with teams playing other teams, and the score could be a combined team score. Then the top 2 teams could compete single elimination style in the final brackets. This system would definitely work with swiss pairing, but only teams that could send the required number of people would be able to participate.
The other possibility would be to create some system that can give people more or fewer points for a win, and losing players some points for losing depending on how well they did. This would be nebulous territory, because there'd be a lot of arguing over it within the community, and any system imposed would change the way the game is played. Count points by number of expansions? Well that favors Zerg. Count points by how long the game goes on, assuming that a longer game is a closer game? That disadvantages (or possibly advantages depending on the round in the tournament) early rush strategies.
In short, I think a swiss pairing starcraft tournament would be very difficult to do for single players. For teams I think it would work quite well.
|
Swiss rounds kind of fails in SC2 because you need to wait for the longest match in a round to finish before you know the matchups for the next round, and you can't exactly set a time limit for a match. You could do it in a long-term tournament like the GSL but then you cut into the practice time of the players since you'll announce matchups later.
|
On November 13 2010 01:47 Teddyman wrote: Swiss rounds kind of fails in SC2 because you need to wait for the longest match in a round to finish before you know the matchups for the next round, and you can't exactly set a time limit for a match. You could do it in a long-term tournament like the GSL but then you cut into the practice time of the players since you'll announce matchups later.
I <3 Teddyman, he seems to always have my back.
This is an additional issue with Swiss Pairing, it would be extremely difficult to play the entire thing out in one day. It would have to be a multi day tournament.
You could speed it up by having all 32 pairs play at the same time, but then you'd only be able to cast 3% of the games, which would be unacceptable to the average viewer. Unacceptable to me at least.
|
In short, I think a swiss pairing starcraft tournament would be very difficult to do for single players. For teams I think it would work quite well.
Another thing to consider is that for a swiss-style tournament, you could drop the BO3's and just accelerate the pairing between players based on individual games (not series).
This would let you have 3x the games to sort people with, and rely on the swiss format to correct the additional randomness of not using BO3's. I think it might be able to work.
|
On November 13 2010 01:52 Ketara wrote: On November 13 2010 01:47 Teddyman wrote: Swiss rounds kind of fails in SC2 because you need to wait for the longest match in a round to finish before you know the matchups for the next round, and you can't exactly set a time limit for a match. You could do it in a long-term tournament like the GSL but then you cut into the practice time of the players since you'll announce matchups later.
I <3 Teddyman, he seems to always have my back.
This is an additional issue with Swiss Pairing, it would be extremely difficult to play the entire thing out in one day. It would have to be a multi day tournament.
You could speed it up by having all 32 pairs play at the same time, but then you'd only be able to cast 3% of the games, which would be unacceptable to the average viewer. Unacceptable to me at least.
Well, I'd agree with you except that this seems to be what MLG does anyway.
|
That would not solve the problem of having 32 players with the same score in round 2.
I've used swiss pairing to organize tournaments for different games before, and I do not see how you'd be able to sort that.
Edit: I gotta go but we should continue this discussion!
|
On November 13 2010 01:58 Ketara wrote: That would not solve the problem of having 32 players with the same score in round 2.
I've used swiss pairing to organize tournaments for different games before, and I do not see how you'd be able to sort that.
Edit: I gotta go but we should continue this discussion!
I imagine you would just use the initial seeds to sort between people with equivalent scores, and having 3x the number of rounds would quickly break people into categories.
I know that Google used this exact format (swiss->single elim) for their internal tournament at the end of the beta.
|
On November 12 2010 13:24 nzb wrote: I would caution, however, what would people's opinions be if instead Liquid`Tyler had beat Painuser 2-0 in the winners' bracket, and then lost 1-2 to him in the losers'. He would have gone 3-2 against him in the tournament, but been knocked out. Where would TL.net stand on the issue then?
It doesn't matter if the effective result is 3:2 or 4:0, as individual games are not important.
At the start of the tournament both players have a record of 0:0. After the first match, first round player A can be considered the better player, because he's ahead with 1:0. But when they meet again later both players have played n matches, have n-1 wins and 1 loss. They both have beat the same amount of players, have won the same amount of matches and technically no one should be favored, because they're back to equal. Every other matchup in the tournament has exactly these initial conditions n-1:1 (or n:0 in the winners bracket).
If one of them wins he's the better player no matter what, because his record will remain n-1:1 while the other will be eliminated with n-2:2. How he lost at that point or would have performed in a different tournament mode isn't important. But, but..just no.
Take a look at other players: one can win every match 2:0 and lose one 2:1, but in the losers' bracket the players are treated as equals even if his opponent lost one match 0:2 and barely clawed back with 2:1 in every other match. Yet the 'better' player won't get favored through extended series unless they've met before, and that makes it complete BS. You either favor every player that is considered better or none at all; anything else isn't fair, period.
The only one who benefits from ES is the lucky player who gets this huge advantage, even though the match record clearly shows that he isn't better and he wouldn't be favored if he's unlucky and meets a different player. And because of the way SC2 works, a weaker player can put a better player into serious trouble if he's favored by ES, because he can utilize dirty tactics to force wins and reduce the decision making process.
A tournament needs to be fair to everyone and give them the same conditions to win, but the ES fails to do that and artificially favors a single player for absolutely no reason. By that logic the ES must be enforced on every match where a player has a better individual game win/loss ratio, because he has to be the better player, but that would be really stupid. Because like I said, only the match record is a consistent measurement of skill in a DE tournament.
The reason why ES probably works well for Halo is that a game takes 15-30m and contains thousands of individual decisions, where the better team probably makes more good decisions and takes the win. But SC2 doesn't work that way: a cheese/all-in reduces it to exactly one decision, you either scout it in time and win or you lose. The ability to force or deny a map pick is more subtle, but all these things heavily favor the winner of the first match without any justification to claim he's the better player.
If they played 2 bo3 in a row, yes it would be dumb, but they don't. They meet again several rounds later in the tournament with exactly the same match record, which resets their individual win/loss back to 0:0, because that's the rule for everyone in the tournament and there can't be exceptions or else it won't be fair to all players.
On November 12 2010 22:17 scion wrote: He is NOT using this model for an argument. Whether you think it is flawed or not, it does a good job of simplifying how a league works. And he's not using math to show the benefit of extended series, he's showing how much benefit it provides.
If you use math to prove a point, then people need to be allowed to question and criticize the method. You can't just pull numbers out of a flawed simulation and say "Look, that's the benefit it will have" and ignore everything else, because that isn't more scientific than claiming evolution doesn't exist because the bible says so.^^
So, the simulation either gets adjusted for these issues or the conclusion needs to be changed to something that makes clear that this simulation only works for generic tournaments and can't be applied to SC2, because of these issues.
Or else people will take these numbers as truth and assume that the extended series works fine, which it doesn't.
The only good thing about ES is that most players are probably not aware of their own skill level and don't recognize that they were just lucky in the first match. And since only MLG has this rule, they probably haven't thought about how powerful an instant 2:0 lead in a bo7 is.
But of course that doesn't make ES any better, because the claimed minimal increase in result accuracy doesn't make up for the risk of abuse/false-positives. It just isn't worth it and messes up the fairness of the tournament for everyone that wasn't favored in an ES. And if a tournament isn't fair then there is no point to have one at all.
I'd rather have the best player in the world kicked out once in a while instead of frequently getting weak players finishing better than they should. The initial seeding already causes the bracket to be a bit more unfair to players at times, why introduce even more luck into the tournament to make it even worse? A clean bo5 is good if you want higher accuracy between 2 players, but an extended series that resumes the first bo3 is just bad in SC2.
|
On November 13 2010 02:07 Nienordir wrote: But of course that doesn't make ES any better, because the claimed minimal increase in result accuracy doesn't make up for the risk of abuse/false-positives. It just isn't worth it and messes up the fairness of the tournament for everyone that wasn't favored in an ES. And if a tournament isn't fair then there is no point to have one at all.
I'd rather have the best player in the world kicked out once in a while instead of frequently getting weak players finishing better than they should. The initial seeding already causes the bracket to be a bit more unfair to players at times, why introduce even more luck into the tournament to make it even worse? A clean bo5 is good if you want higher accuracy between 2 players, but an extended series that resumes the first bo3 is just bad in SC2.
The beauty of this post is then, from your perspective, that ES doesn't help much anyway, so we should get rid of it.
|
idra and incontrol were arguing against extended series, but they actually didn't realize that their arguments contradicted each other while they verbally agreed with each other.
idra explained how because if player A screwed up against player B, and player B screws up against player C, why should player A be penalized when playing player B in the losers' bracket? But then incontrol said that each of these Bo3's are isolated events. Any previous result should not affect the current Bo3. When you look at these two arguments, they clearly contradict each other as idra uses events elsewhere in the tournament as an argument, while incontrol states that each Bo3 is isolated.
Sorry, I just wanted to throw that out there lol
|
On November 13 2010 02:20 cHaNg-sTa wrote: idra and incontrol were arguing against extended series, but they actually didn't realize that their arguments contradicted each other while they verbally agreed with each other.
idra explained how because if player A screwed up against player B, and player B screws up against player C, why should player A be penalized when playing player B in the losers' bracket? But then incontrol said that each of these Bo3's are isolated events. Any previous result should not affect the current Bo3. When you look at these two arguments, they clearly contradict each other as idra uses events elsewhere in the tournament as an argument, while incontrol states that each Bo3 is isolated.
Sorry, I just wanted to throw that out there lol
You're an idiot. These statements don't contradict each other at all...
|
On November 13 2010 02:13 nzb wrote: On November 13 2010 02:07 Nienordir wrote: But of course that doesn't make ES any better, because the claimed minimal increase in result accuracy doesn't make up for the risk of abuse/false-positives. It just isn't worth it and messes up the fairness of the tournament for everyone that wasn't favored in an ES. And if a tournament isn't fair then there is no point to have one at all.
I'd rather have the best player in the world kicked out once in a while instead of frequently getting weak players finishing better than they should. The initial seeding already causes the bracket to be a bit more unfair to players at times, why introduce even more luck into the tournament to make it even worse? A clean bo5 is good if you want higher accuracy between 2 players, but an extended series that resumes the first bo3 is just bad in SC2.
The beauty of this post is then, from your perspective, that ES doesn't help much anyway, so we should get rid of it.
Well, yeah it's a fancy way to say it, but I think my post explains the flaws of the extended series if you apply it to a game like SC2. =)
But I don't mind the idea that's behind the extended series. It's in everyone's interest to have the best players go far into a tournament. If 2 players meet again, then increase the amount of games played to bo5 or bo7 if the tournament schedule allows it and we'll get a result with higher accuracy.
At the same time I would propose a motion to change the grand finals too and make it a straight up bo5/7, because both players earned their way through the tournament and the extended series there is even worse than inside the rest of the tournament. There is no reason why the player from the winners' bracket should have to lose two matches just because it's the grand final. The losers' bracket ends there and there isn't a losers' grand final that the player could drop down to. That just doesn't make sense, and you don't want to get the finals messed up by the flaws that ES inherits.^^
|
Fucking excellent OP. I'm sorry for the profanity, but seriously... phenomenal. Wish more people were top-notch analysts, but I'm sure with examples like this some will get better.
|
The extended series makes the finals of the tournament feel a lot less epic. To be honest, I don't care what's fair to the players involved because the ONLY reason there is money involved is the viewership. No fans = No esports = no sponsors = no money for tournaments = tournaments would only be based on entrance fees etc.
If they want to make esports bigger, faster, they will understand that you want your finals to feel epic and last more than like 2 games. It's a really big letdown to watch people battle it out to get to the end of this big tournament and then have one guy playing with a handicap. I guess that's what happens with a double elimination style though... the extended series sucks.
|
On November 13 2010 02:32 Risen wrote: On November 13 2010 02:20 cHaNg-sTa wrote: idra and incontrol were arguing against extended series, but they actually didn't realize that their arguments contradicted each other while they verbally agreed with each other.
idra explained how because if player A screwed up against player B, and player B screws up against player C, why should player A be penalized when playing player B in the losers' bracket? But then incontrol said that each of these Bo3's are isolated events. Any previous result should not affect the current Bo3. When you look at these two arguments, they clearly contradict each other as idra uses events elsewhere in the tournament as an argument, while incontrol states that each Bo3 is isolated.
Sorry, I just wanted to throw that out there lol
You're an idiot. These statements don't contradict each other at all...
They contradict each other because idrA and INControl said they wanted each series to be treated as an isolated event and then brought in the overall tournament performance of the players as an argument. I'm not saying it kills their argument completely, but it's odd that they switched paradigms.
Another seeming contradiction is that they all agreed that the tournament doesn't do a particularly good job of ranking the players and then decided that it was better to let the system do the ranking. You can gauge relative skill between two players better than you can 128, yet the argument against extended series seems to play to the deficiencies of the system.
|
On November 13 2010 02:32 Risen wrote: On November 13 2010 02:20 cHaNg-sTa wrote: idra and incontrol were arguing against extended series, but they actually didn't realize that their arguments contradicted each other while they verbally agreed with each other.
idra explained how because if player A screwed up against player B, and player B screws up against player C, why should player A be penalized when playing player B in the losers' bracket? But then incontrol said that each of these Bo3's are isolated events. Any previous result should not affect the current Bo3. When you look at these two arguments, they clearly contradict each other as idra uses events elsewhere in the tournament as an argument, while incontrol states that each Bo3 is isolated.
Sorry, I just wanted to throw that out there lol
You're an idiot. These statements don't contradict each other at all...
Uh, how so? Idra asked why one player should be penalized over the other when they both lost once in the losers bracket. But then incontrol stated that each Bo3 is an isolated event and should be treated as a single entity. Nothing that happened previously in the tournament should have any effect on the current Bo3. Yet idra is pulling the example that player B screwed up against someone else in the tournament, so he shouldn't have an extended series with player A because of an event outside of the "isolated Bo3".
So basically incontrol said that the isolated Bo3 is an entirely new event that has absolutely no reflection on any of the previous results in the tournament. But idra is declaring that one shouldn't be penalized over the other because of a previous result in the tournament outside of the matchup between player A and B. Sounds like a contradiction to me.
|
Was just re-listening to all the arguments made in the State of the Game podcast and thought of something that I think is actually a pretty good compromise.
Not sure if this idea has yet been brought up, but what do people think of this?
Instead of an extended series, a new series is played, but as a best of 7 (or 5) instead of a new best of 3.
The logic being that Tyler is correct in saying the purpose of double elimination really is to give a better player who slips a chance to get a higher place than he otherwise would have had. I don't really see how you can dispute this. So if two players who have already played each other have to face each other again, it's probably best (read: most fair) to be as accurate as possible in determining who moves on.
It's true that you could still have a case where one person could overall be 4-3 or 5-4 and be the "loser" but would it at least feel better to people to go home after losing a fresh best of 5 or 7?
|
On November 13 2010 02:06 nzb wrote: On November 13 2010 01:58 Ketara wrote: That would not solve the problem of having 32 players with the same score in round 2.
I've used swiss pairing to organize tournaments for different games before, and I do not see how you'd be able to sort that.
Edit: I gotta go but we should continue this discussion!
I imagine you would just use the initial seeds to sort between people with equivalent scores, and having 3x the number of rounds would quickly break people into categories. I know that Google used this exact format (swiss->single elim) for their internal tournament at the end of the beta.
The problem here is that sorting ties when using swiss pairing becomes astronomically more difficult the more ties you have.
Let's say we have 64 players, and we're using their sc2ranks scores to sort ties. The first round is sorted that way, which is fine. The nature of the system is such that the first round has to be somewhat arbitrary.
After the first round, you have 32 players with 1 win and 32 players with 0 wins. The 32 with 1 win haven't played each other yet, so their sc2ranks stats can be used to sort them, and the same for those with 0 points.
However, the problem then occurs in round 3, where we have 16 people with 2 points who haven't played each other, 16 with 0 points who haven't played each other, but 32 with 1 point who may have played the person with the closest rank in the first round. In fact, if the people with the higher ranks are consistently beating the people with the lower ranks, it is very likely that you will have a large portion of these 32 whose closest ranked opponent is the person they played in the first round.
Since by the rules of the system people cannot play each other twice (you could change that rule I suppose!) you are then looking for the second closest ranked person for every single one of your problem matchups. It doesn't take a math whiz to see that at this point, every matchup that is altered in turn alters every other matchup, and it becomes very ungainly to organize it if you have a large number of problem matchups. The pairing is supposed to be done by the system and the organizer is supposed to be unable to influence it, but at this point you have to be influencing it just in order to settle the pairings.
Doing accelerated pairings and having multiple games in a series, allowing for people to have a variance in their number of points per round, would mitigate this issue to a degree, but without seeing some math I'm not convinced that it would prevent it.
Further, the concept of using someone's ranked ladder stats to sort teams is going to be full of problems, since not everybody ladders, the best players in tournaments are often not the best players on ladder, and Blizzard has stated that sc2ranks is not 100% accurate at sorting player skill level because we don't have the full equation on how the hidden MMR rating works.
It would probably be fine for the first round, because that has to be somewhat arbitrary, but by using it repeatedly round after round you're undermining the accuracy of your sorting method.
Obviously if you could get a tournament system going and seed players by earlier tournament scores this would stop being an issue, but there would have to be an initial tournament at some point, and it would be nice to create a system that could be used for one-off tournaments and not require them to be part of some sort of league.
|
On November 13 2010 03:06 Jayrod wrote: The extended series makes the finals of the tournament feel a lot less epic. To be honest, I don't care what's fair to the players involved because the ONLY reason there is money involved is the viewership. No fans = no esports = no sponsors = no money for tournaments = tournaments would only be based on entrance fees etc.
If they want to make esports bigger, faster, they will understand that you want your finals to feel epic and last more than like 2 games. It's a really big letdown to watch people battle it out to get to the end of this big tournament then have one guy playing with a handicap. I guess that's what happens with a double elimination style though... the extended series sucks.
The finals would have been 2 games if it was standard double elimination. The anticlimactic finals is more down to using bo3 the whole way through, when switching to bo5 near the end would be more exciting.
|
Early on in the article, when explaining the series types, there's a typo:
"This rule is intended to avoid some paradoxical outcomes, as well as statistically increase the likelihood that the 'better player' continues in the tournament. It is possible in standard double elimination for Alice to defeat Bob 2-1 in the winners', and Bob to defeat Alice in the losers' 2-0. The "overall series" between Alice and Bob is 3-2 in Alice's favor, but Bob continues and Alice does not."
those should be 2-0, 2-1 and 3-2 in order to make sense
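A quick way to check that arithmetic, as a minimal Python sketch. The helper name `combined_score` is made up for illustration; each series is written as (Alice's games, Bob's games).

```python
def combined_score(series):
    """Sum game wins over a list of (alice_games, bob_games) series."""
    alice = sum(a for a, _ in series)
    bob = sum(b for _, b in series)
    return alice, bob

# As written in the article: Alice wins 2-1, then Bob wins 2-0.
as_written = combined_score([(2, 1), (0, 2)])
# With the correction: Alice wins 2-0, then Bob wins 2-1.
corrected = combined_score([(2, 0), (1, 2)])

print("as written:", as_written)   # Bob is ahead overall
print("corrected:", corrected)     # Alice is ahead 3-2 overall
```

Only the corrected scores give the "3-2 in Alice's favor" overall series that the paragraph is trying to describe.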
|
On November 13 2010 04:20 Cyber_Cheese wrote: early on in the article when explaining the series types a typo
"This rule is intended to avoid some paradoxical outcomes, as well as statistically increase the likelihood that the 'better player' continues in the tournament. It is possible in standard double elimination for Alice to defeat Bob 2-1 in the winners', and Bob to defeat Alice in the losers' 2-0. The "overall series" between Alice and Bob is 3-2 in Alice's favor, but Bob continues and Alice does not."
those should be 2-0, 2-1 and 3-2 in order to make sense
You are correct, sir.
|
On November 13 2010 04:11 Ketara wrote: On November 13 2010 02:06 nzb wrote: On November 13 2010 01:58 Ketara wrote: That would not solve the problem of having 32 players with the same score in round 2.
I've used swiss pairing to organize tournaments for different games before, and I do not see how you'd be able to sort that.
Edit: I gotta go but we should continue this discussion!
I imagine you would just use the initial seeds to sort between people with equivalent scores, and having 3x the number of rounds would quickly break people into categories. I know that Google used this exact format (swiss->single elim) for their internal tournament at the end of the beta.
The problem here is that sorting ties when using swiss pairing becomes astronomically more difficult the more ties you have.
Let's say we have 64 players, and we're using their sc2ranks scores to sort ties. The first round is sorted that way, which is fine. The nature of the system is such that the first round has to be somewhat arbitrary.
After the first round, you have 32 players with 1 win and 32 players with 0 wins. The 32 with 1 win haven't played each other yet, so their sc2ranks stats can be used to sort them, and the same for those with 0 points.
However, the problem then occurs in round 3, where we have 16 people with 2 points who haven't played each other, 16 with 0 points who haven't played each other, but 32 with 1 point who may have played the person with the closest rank in the first round. In fact, if the people with the higher ranks are consistently beating the people with the lower ranks, it is very likely that you will have a large portion of these 32 whose closest ranked opponent is the person they played in the first round.
Since by the rules of the system people cannot play each other twice (you could change that rule I suppose!) you are then looking for the second closest ranked person for every single one of your problem matchups. It doesn't take a math whiz to see that at this point, every matchup that is altered in turn alters every other matchup, and it becomes very ungainly to organize it if you have a large number of problem matchups.
The pairing is supposed to be done by the system and the organizer is supposed to be unable to influence it, but at this point you have to be influencing it just in order to settle the pairings. Doing accelerated pairings and having multiple games in a series, allowing for people to have a variance in their number of points per round, would mitigate this issue to a degree, but without seeing some math I'm not convinced that it would prevent it.
I haven't worked this all out myself, so bear with me... As the number of rounds proceeds, you end up with this spread:
Round | # of players at each win count (starting from 0 wins, increasing)
1 | 64
2 | 32 32
3 | 16 32 16
4 | 8 24 24 8
5 | 4 16 24 16 4
6 | 2 10 20 20 10 2
7 | 1 6 15 20 15 6 1
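For anyone who wants to check that spread, here is a minimal Python sketch. It assumes there are no draws and that exactly half of every score group wins each round, so the group sizes follow binomial coefficients. The function name `score_groups` is made up for illustration.

```python
from math import comb

def score_groups(players, round_number):
    """Players per win count (0 wins first) at the start of a round.

    Assumes no draws and that exactly half of each score group wins
    every round. The integer division is exact only while `players`
    is divisible by 2 ** (rounds already played), e.g. 64 players
    through the start of round 7.
    """
    rounds_played = round_number - 1
    return [players * comb(rounds_played, wins) // 2 ** rounds_played
            for wins in range(rounds_played + 1)]

for r in range(1, 8):
    print(r, "|", *score_groups(64, r))
```

The big middle groups are where the tie-sorting problem lives: by round 4 there are still two groups of 24 players with identical scores.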
I think you could adopt this algorithm: starting from the top ranked player, choose the next best player who he hasn't played, and continue down from the top selecting players until you have everyone paired.
Dammit, now you have me interested in writing this all up and simulating it again.
Further, the concept of using someone's ranked ladder stats to sort teams is going to be full of problems, since not everybody ladders, the best players in tournaments are often not the best players on ladder, and Blizzard has stated that sc2ranks is not 100% accurate at sorting player skill level because we don't have the full equation on how the hidden MMR rating works.
It would probably be fine for the first round, because that has to be somewhat arbitrary, but by using it repeatedly round after round you're undermining the accuracy of your sorting method.
Obviously if you could get a tournament system going and seed players by earlier tournament scores this would stop being an issue, but there would have to be an initial tournament at some point, and it would be nice to create a system that could be used for one-off tournaments and not require them to be part of some sort of league.
I don't think this is much of an issue, because you can use season ranks (presuming that you are a league and have multiple tournaments in a season). And presumably these would be reasonably accurate.
|
For our round 3 where the 32 players come up, let's see how sorting the next best player works. I'm going to use an 8 person example because that way I don't have to type as much.
We have 8 players, all with 1 point in the tournament so far:
Player A: 225 MMR, played E in round 1
Player B: 230 MMR, played H in round 1
Player C: 250 MMR, played a player who now has 2 points in round 1
Player D: 200 MMR, played F in round 1
Player E: 220 MMR, played A in round 1
Player F: 180 MMR, played D in round 1
Player G: 240 MMR, played a player who now has 2 points in round 1
Player H: 235 MMR, played B in round 1
Let's sort them from the top down.
C (250 MMR) plays G (240), which is straightforward.
H (235) should play B (230) but can't, so he goes to the next player down, A (225).
B (230) plays the next available player, E (220).
This leaves D (200) and F (180), but they cannot play each other.
So now we have to sort it from the bottom up instead.
When you sort it that way it works out, but results in C and G not playing each other which was your only obvious matchup, and gives you F at 180 MMR playing E at 220 MMR, which is not entirely fair, because at that skill differential it is likely a free point for player E that was caused by ties in the system.
And that is only with 8 people. I dunno. It has the potential to work, it just seems difficult to me. I am betting that games that only count wins as either a win (1) or a loss (0) generally do not have 64 players in their swiss tournaments. The game that we use it for, Field of Glory, has a 25 point scoring system and a very reliable ELO ranking for players, and only one league for every tournament. Plus, our tournaments rarely break 20-25 people.
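The walkthrough above can be reproduced with a small Python sketch of greedy pairing. This is only an illustration under stated assumptions: the function name `greedy_pairs` is made up, a player always takes the first opponent in the ordering that he hasn't already played, and the round 1 opponents of C and G are left out because they are no longer in this score group.

```python
def greedy_pairs(order, played):
    """Pair players greedily in `order`, skipping rematches.

    Returns (pairs, leftover), where leftover holds players for whom
    no opponent they haven't already played was still available.
    """
    unpaired = list(order)
    pairs, leftover = [], []
    while unpaired:
        p = unpaired.pop(0)
        partner = next(
            (q for q in unpaired if frozenset((p, q)) not in played), None)
        if partner is None:
            leftover.append(p)
        else:
            unpaired.remove(partner)
            pairs.append((p, partner))
    return pairs, leftover

# The eight players from the example and their round 1 matchups.
mmr = {"A": 225, "B": 230, "C": 250, "D": 200,
       "E": 220, "F": 180, "G": 240, "H": 235}
played = {frozenset("AE"), frozenset("BH"), frozenset("DF")}

top_down = sorted(mmr, key=mmr.get, reverse=True)
bottom_up = sorted(mmr, key=mmr.get)

top_pairs, top_left = greedy_pairs(top_down, played)
bottom_pairs, bottom_left = greedy_pairs(bottom_up, played)
print("top down: ", top_pairs, "unpaired:", top_left)
print("bottom up:", bottom_pairs, "unpaired:", bottom_left)
```

Top down strands D and F, exactly as in the example. Bottom up pairs everyone, but at the cost of splitting C and G and matching F (180) against E (220).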
|
On November 13 2010 06:05 Ketara wrote: For our round 3 where the 32 players come up, let's see how sorting the next best player works. I'm going to use an 8 person example because that way I don't have to type as much.
We have 8 players, all with 1 point in the tournament so far:
Player A: 225 MMR, played E in round 1
Player B: 230 MMR, played H in round 1
Player C: 250 MMR, played a player who now has 2 points in round 1
Player D: 200 MMR, played F in round 1
Player E: 220 MMR, played A in round 1
Player F: 180 MMR, played D in round 1
Player G: 240 MMR, played a player who now has 2 points in round 1
Player H: 235 MMR, played B in round 1
Let's sort them from the top down.
C (250 MMR) plays G (240), which is straightforward.
H (235) should play B (230) but can't, so he goes to the next player down, A (225).
B (230) plays the next available player, E (220).
This leaves D (200) and F (180), but they cannot play each other.
So now we have to sort it from the bottom up instead.
When you sort it that way it works out, but results in C and G not playing each other which was your only obvious matchup, and gives you F at 180 MMR playing E at 220 MMR, which is not entirely fair, because at that skill differential it is likely a free point for player E that was caused by ties in the system.
And that is only with 8 people. I dunno. It has the potential to work, it just seems difficult to me. I am betting that games that only count wins as either a win (1) or a loss (0) generally do not have 64 players in their swiss tournaments. The game that we use it for, Field of Glory, has a 25 point scoring system and a very reliable ELO ranking for players, and only one league for every tournament. Plus, our tournaments rarely break 20-25 people.
Well, I went ahead and implemented the swiss style I was talking about and I'm running a million iterations right now. Basically, if there isn't a "valid match", then you drop the requirement that players can't play each other, and there is a re-match. This doesn't happen very often, though -- about 2% of the time. Also, this only impacts people in the bottom of the ranking (because the best get priority), so it's probably not too much of a concern.
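The actual simulation code isn't shown in this thread, so the following is only a rough Python sketch of the rematch fallback as described: pair from the top, prefer an opponent not yet played, and allow a re-match when none is left. The function name `pair_round` and the choice of the highest remaining player as the rematch opponent are assumptions; the players are the eight from Ketara's example.

```python
def pair_round(order, played):
    """Pair from the top of `order`; fall back to a rematch if needed.

    Returns (pairs, rematches). A player takes the best remaining
    opponent he hasn't played; if there is none, the no-rematch rule
    is dropped and he plays the best remaining opponent anyway.
    """
    unpaired = list(order)
    pairs, rematches = [], 0
    while len(unpaired) >= 2:
        p = unpaired.pop(0)
        fresh = [q for q in unpaired if frozenset((p, q)) not in played]
        if fresh:
            partner = fresh[0]
        else:
            partner = unpaired[0]   # assumed fallback: allow the rematch
            rematches += 1
        unpaired.remove(partner)
        pairs.append((p, partner))
    return pairs, rematches

# Ketara's 1-point group, ordered from highest to lowest MMR.
order = ["C", "G", "H", "B", "A", "E", "D", "F"]
played = {frozenset("AE"), frozenset("BH"), frozenset("DF")}

pairs, rematches = pair_round(order, played)
print(pairs, "rematches:", rematches)
```

On this example the only rematch is D against F at the bottom of the group, which fits the point that the fallback mostly affects the lowest ranked players.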
The bad news is that initial results from running 50k iterations didn't look very good. I think that after running the tournament with a lot of rounds, you end up having the top players play bad players in the final rounds because they have already played all the other good players ... I'll have to follow up on this to see exactly what's going on.
|
Swiss ranking has a system for how many rounds you are supposed to have based on how many people have entered the tournament in order to achieve the best sorting, which as I understand it is because of that issue.
I think this is better than the wikipedia article: http://vtchess.info/Results/Swiss_Pairing_System.htm
"The rule of thumb is that it can handle 2^n players, where n is the number of rounds. Therefore, 8 players needs 3 rounds, 16 players needs 4 rounds, 32 players needs 5 rounds, and so forth. (These numbers are approximations - due to draws and other variables, sometimes it works with more players than expected.)"
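As a quick sketch of that rule of thumb in Python: the number of rounds is the smallest n with 2^n at least the number of players. The function name `swiss_rounds` and the `accelerated` flag are made up for illustration; the "one fewer round" adjustment for accelerated pairings is taken from this post, not from any official rulebook.

```python
def swiss_rounds(players, accelerated=False):
    """Smallest n such that 2 ** n >= players (rule of thumb only)."""
    rounds = max(players - 1, 0).bit_length()
    if accelerated:
        # Assumption from the thread: accelerated pairings save one round.
        rounds = max(rounds - 1, 0)
    return rounds

for n in (8, 16, 20, 32, 64):
    print(n, "players ->", swiss_rounds(n), "rounds")
```

A 64 player event would need about 6 rounds by this rule, so running many more rounds than that means the top players start running out of strong opponents they haven't already played.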
Using accelerated pairings allows you to have a tournament with 1 fewer rounds than what is necessary, but requires you to have a skill approximation of your players, and requires that you sort your initial round by that approximation.
Another note about swiss ranking is that there are actual literal ways to game the system. If you have access to the rankings and know everybody's score and who played whom and can do some quick math, you can at times figure out that if you lose a game on purpose, your next two opponents will be ones that you know cannot defeat you.
We call it "submarine-ing" and it's not a very honorable thing to do in our competitions but people do do it.
It is also sometimes possible to cheat the system by arranging a draw on purpose, but I imagine in Starcraft that would not be possible since draws are so difficult to create.
|
On November 13 2010 07:49 Ketara wrote: Swiss ranking has a system for how many rounds you are supposed to have based on how many people have entered the tournament in order to achieve the best sorting, which as I understand it is because of that issue.
I think this is better than the wikipedia article: http://vtchess.info/Results/Swiss_Pairing_System.htm
"The rule of thumb is that it can handle 2^n players, where n is the number of rounds. Therefore, 8 players needs 3 rounds, 16 players needs 4 rounds, 32 players needs 5 rounds, and so forth. (These numbers are approximations - due to draws and other variables, sometimes it works with more players than expected.)"
Using accelerated pairings allows you to have a tournament with 1 fewer rounds than what is necessary, but requires you to have a skill approximation of your players, and requires that you sort your initial round by that approximation.
Another note about swiss ranking is that there are actual literal ways to game the system. If you have access to the rankings and know everybody's score and who played whom and can do some quick math, you can at times figure out that if you lose a game on purpose, your next two opponents will be ones that you know cannot defeat you.
We call it "submarine-ing" and it's not a very honorable thing to do in our competitions but people do do it.
It is also sometimes possible to cheat the system by arranging a draw on purpose, but I imagine in Starcraft that would not be possible since draws are so difficult to create.
Yeah, after hearing from all these people that have actually played in tournaments that use swiss style, I'm not sure it is actually preferable to simple double elimination. I still think it is a cool idea though. I'll try running my simulation again using normal best of three once it finishes, instead of increasing the # of games.
It seems to me like tournaments like the GSL, who have literally thousands of people enter the qualifier, need a better system than single elimination in order to determine who qualifies. Since they take the top 64 anyway, it seems like swiss might be useful there ... But it really isn't acceptable that Tester, OGSTop, July, even Jinro can't qualify because they hit good players in the randomly-seeded qualifiers.
|
I think Swiss ranking is an excellent system if you A - Have a small number of people competing, B - Have a scoring system that makes identical scores rare, and C - Have to finish the tournament in a timely fashion such that round robin is impossible.
I too am shocked at the way GSL does its qualifiers, but I am under the impression that they are only doing these 3 with this system in order to create seeds, and the system they use next year is going to be markedly different and (presumably) better. Any time you're creating a league the first event is bumpy.
I do think that for a team vs. team competition swiss pairing would work great though. It'd be fun to know how the SC BW team leagues work. I'm sure there's a Liquipedia article on it and I just don't care enough to read it.
Say you've got 8 teams with 4 players each. The pairings for the first round could be random, with 4 Liquids playing 4 EG say, and then count the number of wins as the score, with however many rounds. This would necessitate that the team do well as a whole in order to win the tournament, because the team's best player's win is only worth as many points as their worst player's win.
Round robin would probably work well for a team league too, however, since the number of participating teams would not be huge, and it would likely be more accurate.
|
So curious results for the swiss implementation -- with 64 players it does:
Winner - 0.67
Depth - 33.38
2^Depth - 43.73
To put it in perspective, both the depth metrics are slightly worse than the single-elimination tournament format (which does 24.13/40.13), but the winner metric is better than double with extended series and almost as good as round robin (which get .79 and .55, respectively). So that's certainly unexpected -- so far the trends in every metric had been pretty consistent.
I bet it's doing poorly because there are too many games, I'll try running it again.
|