|
On November 12 2010 10:34 scion wrote:
On November 12 2010 10:30 nzb wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
Well, non-transitivity can occur, especially if you are comparing a non-teammate with two teammates. Incontrol might be better than Machine because he knows his teammate well, while Machine might be better than Painuser and Painuser better than Incontrol (random names). So it's not necessarily clear in reality. =/
I agree completely, but I think these are second order effects, and a simple model will capture the main effects that one is interested in. That is, I suspect non-transitivity is "lost in the noise".
|
Why can't they just use double elimination into the Ro8 then do round robin? That would keep the number of games played low but make placement among the top 8 players more accurate.
|
I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
|
There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
|
On November 12 2010 10:37 Zzoram wrote: Why can't they just use double elimination into the Ro8 then do round robin? That would keep the number of games played low but make placement among the top 8 players more accurate.
People usually tend to do the opposite, funnily enough...
Most professional sports have a round-robin regular season, followed by a single-elimination tournament for the playoffs.
The FIFA World Cup has round-robin group play, followed by single elimination.
The main reason people prefer elimination tournaments for deciding a champion is that it's more exciting and tense, because you could be eliminated for good at any point.
Also, round-robin tournaments have the HUGE flaw that someone can secure the #1 spot with several games left in the tournament. This is usually unacceptable for determining a champion because the final games of the season are then literally irrelevant. Swiss-style tourneys have the same issue. Wikipedia has good info on this.
|
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
I agree, but I honestly have no idea how to model this, so I just threw it out from the beginning. Take it or leave it. :/
That's what the "Scope" section was there for.
|
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
Oh, and I would have done swiss, but I was getting bored. It'd be awesome if you did it.
|
On November 12 2010 10:30 nzb wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
I disagree that reality is close. For instance, before the tournament Idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well, what he actually said, I believe, was that Sen wasn't that good at ZvT but great at other matchups, and because it was inevitable that he would run into decent Ts, he had a very low chance of winning. So, for instance, we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen, and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary, and people shouldn't take offense if they feel I got an inequality sign the wrong way; they are just for demonstration).
If Sen could beat any Protoss or Zerg 7-0 95% of the time, but had a 50% chance against Ts, would it then be a failure or a success for him to win? If you allow race-specific skill levels (or even player-vs-player-specific skill levels), then we suddenly can't define what a good outcome for the tournament is, since we can't say whether hypothetical Sen should rank higher than hypothetical Idra, who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
|
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
By the same logic, I can say that if it were a clean Bo3, the guy coming from the winners' bracket would be under huge pressure, because he beat his opponent once but now faces possible elimination by a guy he has already beaten.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
You can never have the perfect stats model, nor should you even attempt that. The best/most useful stat model is the simplest model. I think the OP did a great job in showing that the extended series is only slightly more accurate/fair than simple double elimination, which calls into question whether it is needed, considering all other factors.
|
On November 12 2010 10:49 rasnj wrote:
On November 12 2010 10:30 nzb wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
I disagree that reality is close. For instance, before the tournament Idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well, what he actually said, I believe, was that Sen wasn't that good at ZvT but great at other matchups, and because it was inevitable that he would run into decent Ts, he had a very low chance of winning. So, for instance, we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen, and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary, and people shouldn't take offense if they feel I got an inequality sign the wrong way; they are just for demonstration).
If Sen could beat any Protoss or Zerg 7-0 95% of the time, but had a 50% chance against Ts, would it then be a failure or a success for him to win? If you allow race-specific skill levels (or even player-vs-player-specific skill levels), then we suddenly can't define what a good outcome for the tournament is, since we can't say whether hypothetical Sen should rank higher than hypothetical Idra, who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
My point is: do you think Sen (or any player) has 50% against ALL Terrans? No, he has 50% against Ts above a certain caliber. I don't think there is any player that is 95% against good Protoss and 50% against terrible Terrans.
So, you will see this effect at the top of the tournament. But for very large tournaments, this effect won't be particularly significant. And my results include 512-player tournaments, which show the same trends, so I think the results are still worth considering even though the model has problems.
EDIT: I can't speak English.
|
On November 12 2010 10:43 nzb wrote:
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
I agree, but I honestly have no idea how to model this, so I just threw it out from the beginning. Take it or leave it. :/
That's what the "Scope" section was there for.
Well, in the tournament do the following. Give each player 3 ratings and deviations, and assign each player a random race.
So:
Player
- race = 0 (0, 1 or 2)
- mean = [1500, 1400, 1600] (mean against each race)
- dev = [100, 100, 100] (deviation against each race)
Then make some maps. For each map, do the following:
Map1
- bonus = [0, 0, 100] (bonus mean for each race)
Add that value to the race's mean if they are playing on that map. (It doesn't have to be overly complex, but this works as a generalization.) Then, during the match, either have a static map pool, or have the loser or lower seed choose the map that gives them the most extra points against their opponent.
It is highly likely that the end result will deviate more based on the choice of maps (check results when giving a race a heavily favored map pool) compared to dynamic maps or more balanced maps.
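The model proposed above can be sketched roughly as follows. This is a minimal Python sketch, not the OP's Go code: the names `Player`, `GameMap`, `performance`, `play_game`, `loser_pick` and `win_rate` are made up for illustration, and drawing each game's performance from a normal distribution around the adjusted mean is an assumption (the post only gives a mean and a deviation per race).

```python
import random
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    race: int    # 0, 1 or 2, as in the post
    mean: list   # mean rating against each opposing race
    dev: list    # deviation against each opposing race


@dataclass
class GameMap:
    name: str
    bonus: list  # bonus added to the mean of each race on this map


def performance(player, opponent, game_map, rng):
    """One game's performance: matchup mean plus map bonus, plus noise (assumed normal)."""
    mu = player.mean[opponent.race] + game_map.bonus[player.race]
    return rng.gauss(mu, player.dev[opponent.race])


def play_game(a, b, game_map, rng):
    """Return whichever player had the higher performance in this game."""
    return a if performance(a, b, game_map, rng) > performance(b, a, game_map, rng) else b


def loser_pick(picker, opponent, map_pool):
    """Map that gives the picker the most extra points relative to the opponent."""
    return max(map_pool, key=lambda m: m.bonus[picker.race] - m.bonus[opponent.race])


def win_rate(a, b, game_map, games, rng):
    """Fraction of simulated games on game_map that player a wins."""
    return sum(play_game(a, b, game_map, rng) is a for _ in range(games)) / games


rng = random.Random(0)
a = Player("A", race=0, mean=[1500, 1400, 1600], dev=[100, 100, 100])
b = Player("B", race=2, mean=[1500, 1500, 1500], dev=[100, 100, 100])
neutral = GameMap("Neutral", bonus=[0, 0, 0])
map1 = GameMap("Map1", bonus=[0, 0, 100])

print("B would pick:", loser_pick(b, a, [neutral, map1]).name)
print("A's win rate on Neutral:", win_rate(a, b, neutral, 10000, rng))
print("A's win rate on Map1:   ", win_rate(a, b, map1, 10000, rng))
```

Comparing the two printed win rates gives a feel for how much a single favorable map can shift a matchup under these made-up numbers; plugging `play_game` into a bracket simulation would be the next step.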
|
On November 12 2010 10:52 scion wrote:
On November 12 2010 10:40 KevinIX wrote: I think your analysis is missing the biggest problem.
In an extended series, you have to consider the psychological impact. You're playing against a guy who has already shown he can beat you, and you are down two games. This is a terrible way to start a match. It puts a lot of pressure on the loser that wouldn't be there if it were simply another best of 3.
+++
By the same logic, I can say that if it were a clean Bo3, the guy coming from the winners' bracket would be under huge pressure, because he beat his opponent once but now faces possible elimination by a guy he has already beaten.
On November 12 2010 10:40 darmousseh wrote: There is a fairly huge problem with doing this analysis. You assume that performance is the same in every game; however, I would argue that each player has a different performance value on different maps and in different matchups. Idra might be 75% on Metalopolis in ZvP, but I bet he's like 20% on Kulas Ravine in the same matchup. This is why map choice is a significantly bigger factor than what is being analyzed here, and why each series should be considered a single event. Other than what you showed through your simulation (which is correct), tournaments are only good at ordering when there is a low deviation in performance, such as in basketball or chess, compared to a game with a high deviation (such as SC2 or football). This is why, in sports like football, predictions about who will win depend heavily on who has home-field advantage. For fun, you should throw in a Swiss-style tournament to see the results. If you want, I can write up some code; although I don't know the Go language, I'm sure I can learn it in an hour or two.
You can never have the perfect stats model, nor should you even attempt that. The best/most useful stat model is the simplest model. I think the OP did a great job in showing that the extended series is only slightly more accurate/fair than simple double elimination, which calls into question whether it is needed, considering all other factors.
Very true. I agree that his thread shows that it doesn't help much, but there is another issue left unchecked: the map pool is more likely to influence results than the choice of tournament format.
This also leads me to believe that if MLG truly wants to punish a player for losing, then they should forfeit the right to remove a map, and if they need punishing again, let the opponent choose another map to eliminate. That is a sure way to punish players.
|
I just wanted to say thank you for doing all of this legwork. I feel like this is what I've had going through my head since listening to the most recent State of the Game, but I had neither the means, the talent, nor the time to put it all together in such a great post. It makes me just as happy that the shortcomings (race matchup, psychology, maps) are readily explained as such. It makes for a much more civil discussion. Anyway, this post makes me happy, so thank you.
|
On November 12 2010 10:23 paralleluniverse wrote: The nontransitivity is taken into account since performance was measured using a mean +/- a random number. And that allows for the possibility that player A will beat player B, player B beats player C, and player C beats player A.
As the OP noted, the statistics do not take that case into account. The reason is that the way you model a random process statistically is inherently different than how you treat a causal process. The math's not the same.
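One way to see the distinction, as a rough sketch: if each player's performance in a game is an independent normal draw around a fixed mean (an assumption; the quoted post only says "a mean +/- a random number"), then A is favored over B exactly when A's mean is higher. Single upsets still happen, but the "favored" relation can never form an A > B > C > A cycle. The function names and example numbers below are made up for illustration.

```python
import math
from itertools import permutations


def win_prob(mu_a, sd_a, mu_b, sd_b):
    """P(A outperforms B) when both performances are independent normal draws."""
    z = (mu_a - mu_b) / math.hypot(sd_a, sd_b)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def has_favored_cycle(players):
    """players maps name -> (mean, sd). True if A is favored over B, B over C and C over A."""
    for a, b, c in permutations(players, 3):
        if (win_prob(*players[a], *players[b]) > 0.5
                and win_prob(*players[b], *players[c]) > 0.5
                and win_prob(*players[c], *players[a]) > 0.5):
            return True
    return False


# Hypothetical ratings, not real player data.
players = {"A": (1600, 100), "B": (1500, 150), "C": (1400, 50)}

print("C upsets A with probability", round(win_prob(*players["C"], *players["A"]), 3))
print("favored cycle exists:", has_favored_cycle(players))
```

So a noise-only model reproduces individual A-beats-B, B-beats-C, C-beats-A results by chance, but not a systematic cycle; that would need matchup-specific or player-specific parameters.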
|
On November 12 2010 10:58 nzb wrote:
On November 12 2010 10:49 rasnj wrote:
On November 12 2010 10:30 nzb wrote:
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Although reality isn't exactly transitive, it is pretty close. That is, you can be pretty confident saying that IdrA > Gretorp > HDstarcraft (random names, don't take offense). So although there are players near each player's skill that confuse the issue slightly, the large-scale picture is still pretty clear because there is actually some order.
I disagree that reality is close. For instance, before the tournament Idra said in an interview that Sen was unlikely to win because he did not obey the law of transitivity. Well, what he actually said, I believe, was that Sen wasn't that good at ZvT but great at other matchups, and because it was inevitable that he would run into decent Ts, he had a very low chance of winning. So, for instance, we might have Sen > Machine, Machine > Drewbie, Drewbie > Sen, and this completely ruins any kind of argument that assumes transitivity (these examples are to some extent arbitrary, and people shouldn't take offense if they feel I got an inequality sign the wrong way; they are just for demonstration).
If Sen could beat any Protoss or Zerg 7-0 95% of the time, but had a 50% chance against Ts, would it then be a failure or a success for him to win? If you allow race-specific skill levels (or even player-vs-player-specific skill levels), then we suddenly can't define what a good outcome for the tournament is, since we can't say whether hypothetical Sen should rank higher than hypothetical Idra, who wins 75% against anyone except Sen.
EDIT: I realize you purposefully excluded this since it is hard to model, but I feel that any model that doesn't even take race-specific stats into account is bound to be flawed, and even if it ends up agreeing with a more thorough analysis you had no sound reason to believe that to be the case.
My point is: do you think Sen (or any player) has 50% against ALL Terrans? No, he has 50% against Ts above a certain caliber. I don't think there is any player that is 95% against good Protoss and 50% against terrible Terrans.
So, you will see this effect at the top of the tournament. But for very large tournaments, this effect won't be particularly significant. And my results include 512-player tournaments, which show the same trends, so I think the results are still worth considering even though the model has problems.
EDIT: I can't speak English.
No, I don't believe that. That was an extreme example to illustrate an issue with the model. But I do believe there are players that are considerably better at some matchups than others. I do believe there exist at least some triples A, B, C of players such that A is favored against B, B is favored against C, and C is favored against A.
For example, Huk's PvP is 70%, but his PvZ is 50%. Kiwikaki's PvP is 54.17%, but his PvZ is 63.64%.
In Huk vs Kiwikaki, Huk is probably favored (their current score is 6-2 in favor of Huk), but against a good Zerg, Kiwikaki has a considerably higher chance of getting past him. [Statistics from TLPD, so only high-profile matches are included, but the general trend of their results is correct, I believe.]
|
On November 12 2010 10:26 rasnj wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
What exactly would be the goal then? I thought about doing this kind of analysis myself, but decided that I couldn't formulate exactly what I wanted the tournament system to accomplish without imposing a total order on the skill levels of the players, and I considered this too far from reality to bother. If you can clearly express the goal of your tournament and a way to determine how far a given ranking is from that goal, then we can probably do some analysis.
Well, that is precisely what I am wondering. Is it possible that, rather than specifying a 'goal' for the tournament, you instead want to find the tournament framework that minimises the impact of the order in which the players face each other (i.e. ideally it shouldn't matter whether Idra plays Huk or HDStarcraft in the first game) when determining the winner (or rankings, or whatever goal you want to pick for the tournament)?
The point of going to these lengths is to address exactly the difficulty you talk about - it's not realistic to impose a total order on the skill levels of the players (though it's obviously not terribly far out; we just want to know if it will affect the analysis or not).
|
On November 12 2010 10:26 nzb wrote:
On November 12 2010 10:19 Dragar wrote: Is it possible to rephrase the question to not assume that the better player is transitive? So that the goal is not to determine the 'best' player, but rather to minimise the effect of matchup ordering, etc?
This is definitely possible; you would need some kind of relation from each player to every other. The problem with this is that you would end up with a lot of choices in terms of modeling, because the relationship, while not perfectly transitive, is pretty close. (That is, although the cream of the crop might be extremely intransitive, they are definitely better than most of the other players.) Therefore the relation you come up with shouldn't be completely random. This kind of data would probably have to be pulled from actual player statistics, which would actually be a huge improvement to the study overall. But until that happens, I think keeping it simple is better because you avoid a lot of complexities that don't necessarily improve the results.
All agreed, but I'm trying to think if there's some simple way of making a strong argument that the current analysis is unaffected by these complications, given our new rephrasing of the point of interest.
|
First and foremost, this is an excellent post, and thanks for putting it out there in such a formal manner. I definitely dig it.
What I wonder is: instead of the extended series in its current format, where the match goes to a best of 7 with the previous results carrying over, what if it just became a clean best of 7? This addresses the problem of each match being a separate event, but also takes into account that the better player should win.
I don't think anyone doubts that the winner of a best of 7 is a fairly good indicator of who is the better player. This way, too, you're not stuck in a situation where you may be one or two games behind and have to overcome a huge psychological road block as well as a clear disadvantage in terms of having to win more games than the other person. I think this would help mitigate the issues of getting unlucky when it comes to the map pools as well. If you're 2 games behind and the next two maps are your worst maps, that is going to be a tough break.
|
The OP is very interesting, albeit somewhat distracting to those who don't understand the point of the extended series. Because every tournament has a 1st, 2nd (...) and last place, every tournament employs some kind of sorting algorithm to determine the final standings.
Your analysis shows what I gave up trying to argue in the extended series thread: the extended series (however marginally) improves the confidence in the final result. Amusingly, in the SotG cast the only top player to demonstrate a modicum of understanding was Tyler. I was really surprised that Idra, Day9, and JP were completely clueless with regard to the internal logic behind the tournament format, especially a player like Idra, since the extended series would benefit him the most.
Also serious lol @ Day9 for arguing against a specific tournament format based on his "feelings." I love Day9 but sometimes he is wrong.
EDIT: Great OP btw nzb, I appreciate your work.
|
I do a lot of tabletop gaming, and we use swiss pairing for our tournaments.
I am not a big math guy, but I'm under the impression that swiss pairing would not work well for Starcraft, because it operates best when you can gain a varying number of points for a victory. This can make it so in a four round tournament I've literally won the tournament already at the end of round 3, as you've pointed out, but it also means that if I'm on the top table in round 4, and I win but only slightly, a person on the second table who wins big can still place higher than me.
I'm not sure how you would set up a points system for Starcraft games that people would be able to agree on or that would really be fair.
On a different note, what I don't really like about the extended series rule, which is not part of your study, is that it makes the final games of the tournament less exciting.
Let's say, for example, Jinro beats TT1 2-0 in the winners' semifinals, but then TT1 comes back to play him in the finals. We now have an extended series for the finals, but Jinro is already up 2-0. This makes the statistical chance of him winning the tournament from that point on much higher, and no matter how exciting the games are, the fact that I am aware of this makes it less fun for me.
Further, I'd imagine that the chances of any given game in the tournament being an extended series becomes much higher the longer the tournament goes on and the more people are eliminated. It'd be nice to see some numbers on what the chances of the finals being an extended series compared to the lower rounds would be. I'd imagine they're much higher.
These two problems together make for a tournament end that's very anticlimactic, in my opinion.
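To put a rough number on the Jinro/TT1 example: a minimal sketch, assuming every game is independent with a fixed per-game win probability `p` for the player who is ahead (a simplification that ignores maps, matchups and psychology). In an extended series that carries a 2-0 result into a best of 7, the leader needs 2 more wins and the trailing player needs 4. The function name `series_win_prob` is made up for illustration.

```python
from functools import lru_cache


def series_win_prob(p, need_a, need_b):
    """Probability that A wins need_a games before B wins need_b games.

    Each game is assumed independent, with A winning any single game with probability p.
    """
    @lru_cache(maxsize=None)
    def go(a, b):
        if a == 0:
            return 1.0  # A has collected all the wins needed
        if b == 0:
            return 0.0  # B got there first
        return p * go(a - 1, b) + (1.0 - p) * go(a, b - 1)

    return go(need_a, need_b)


# Evenly matched players (p = 0.5):
print(series_win_prob(0.5, 2, 2))  # fresh best of 3 -> 0.5
print(series_win_prob(0.5, 4, 4))  # clean best of 7 -> 0.5
print(series_win_prob(0.5, 2, 4))  # extended series, leader already up 2-0 -> 0.8125
```

Under these assumptions, two evenly matched players go from a coin flip in a fresh series to roughly 81/19 in favor of the player carrying the 2-0 lead, which is the anticlimax described above expressed as a number.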
|