With a new OSL coming up, I decided to see if I can make some statistical analysis of who is likely to win (Bisu?), and perhaps other predictions, based on simulation. (I got the idea from Lightwip's recent thread).
First off, let me describe the process, which is in fact rather simple.
The Process The general idea and most basic assumption is that if you pit any two progamers A and B against each other, the chance that A will win depends on how A's winrate vs the race of B compares to B's winrate vs the race of A. The algorithm is as follows:
1. Denote by p1 player 1's winrate vs player 2's race, and by p2 player 2's winrate vs player 1's race.
2. Generate two uniformly distributed random numbers u1 and u2.
3. If p1*u1 > p2*u2, then player 1 wins. Otherwise player 2 wins.
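The three steps above can be sketched in Python roughly as follows. This is only an illustration, not the original Matlab code: the function names are made up, the winrates 0.70 and 0.60 are hypothetical inputs, and u1, u2 are assumed to be uniform on [0, 1). Under that assumption the rule also has a simple closed form, which the sketch uses as a sanity check on the simulation.

```python
import random

def simulate_match(p1, p2, rng=random):
    """Return True if player 1 wins under the rule p1*u1 > p2*u2."""
    u1 = rng.random()  # uniform on [0, 1)
    u2 = rng.random()
    return p1 * u1 > p2 * u2

def win_probability(p1, p2):
    """Closed-form chance that player 1 wins under the same rule.

    Assumes both winrates are strictly positive.
    """
    if p1 >= p2:
        return 1.0 - p2 / (2.0 * p1)
    return p1 / (2.0 * p2)

def estimate_win_probability(p1, p2, n=100_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = sum(simulate_match(p1, p2, rng) for _ in range(n))
    return wins / n

if __name__ == "__main__":
    # Hypothetical winrates, chosen only for illustration.
    print(win_probability(0.70, 0.60))
    print(estimate_win_probability(0.70, 0.60))
```

Note that only the ratio of the two winrates matters under this rule: two players with equal winrates are always 50/50, however high or low those winrates are.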
There are naturally many other plausible ways of simulating the outcome of a match. Which one is the most realistic is hard to say, but it is certainly possible to think of schemes that put more or less emphasis on the winrate differences.
Now I can take individual winrates for a large number of progamers and simulate the outcomes of matches between them. I have specifically tried to replicate the proceedings of the (rather complicated) OSL format in order to simulate who will win the OSL. By using a modular approach to the implementation (which is done in Matlab for my own convenience), I can now simulate a number of interesting scenarios, including how the rules for seeding affect the likelihood of certain players to win.
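To give an idea of how such a simulation is put together, here is a heavily simplified Python sketch. It is not the OSL format (no seeds, no groups, no Bo3/Bo5), just a randomly drawn single-elimination bracket, and the four players and their winrates are invented for illustration. The match rule is the p1*u1 > p2*u2 rule described above.

```python
import random
from collections import Counter

# Hypothetical players: name -> (race, winrate vs each opponent race).
PLAYERS = {
    "A": ("T", {"T": 0.75, "Z": 0.75, "P": 0.75}),
    "B": ("Z", {"T": 0.60, "Z": 0.60, "P": 0.60}),
    "C": ("P", {"T": 0.55, "Z": 0.55, "P": 0.55}),
    "D": ("T", {"T": 0.50, "Z": 0.50, "P": 0.50}),
}

def play(a, b, rng):
    """Play one game and return the winner's name."""
    race_a, rates_a = PLAYERS[a]
    race_b, rates_b = PLAYERS[b]
    p_a = rates_a[race_b]  # a's winrate vs b's race
    p_b = rates_b[race_a]  # b's winrate vs a's race
    return a if p_a * rng.random() > p_b * rng.random() else b

def run_bracket(entrants, rng):
    """Single elimination; the number of entrants must be a power of two."""
    field = list(entrants)
    rng.shuffle(field)  # random draw instead of seeding
    while len(field) > 1:
        field = [play(field[i], field[i + 1], rng)
                 for i in range(0, len(field), 2)]
    return field[0]

def gold_distribution(entrants, n_sims=10_000, seed=0):
    """Share of simulated tournaments won by each entrant."""
    rng = random.Random(seed)
    golds = Counter(run_bracket(entrants, rng) for _ in range(n_sims))
    return {name: golds[name] / n_sims for name in entrants}

if __name__ == "__main__":
    for name, share in gold_distribution(list(PLAYERS)).items():
        print(f"{name} {share:.2%}")
```

Keeping the match rule, the bracket and the counting in separate functions is what makes it easy to swap in a different seeding rule or match model later.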
The group selection process for OSL round 2 is perhaps the most dubious here, because it's hard to know how the players will choose. I went for simplicity though, letting the seeds always choose the lowest ranked player available from the pool (current ELO rank). Other plausible (but slightly harder to implement) strategies would be to choose players based on winrates.
The players I have included in my implementation were, for simplicity, the 74 currently highest ELO ranked players according to TLPD. I also added HiyA, who is actually a seeded player for the upcoming OSL, but he didn't appear in the list because he doesn't have a team anymore.
I could add more players, but the problem with adding for example all the ones who play in the offline preliminaries is that many of them don't have any statistics that can be used to guess how well they will perform.
There are a couple of obvious limitations/assumptions/simplifications with this model, but I'll skip the boring part for later and jump straight to the results.
Results If we make 10000 OSL simulations, where each one is using the same seed as the upcoming OSL (Jangbi, Fantasy, Hydra and n.Die_soO seeded into the Ro16, etc.), we arrive at the following distribution of OSL golds:
Looks like a fairly good chance of a TvT finals this time! These numbers are of course greatly affected by the seeds. Interestingly enough, Bisu seems to be a pretty bad bet for the gold. Let's see what happens if we start with this seed, and then let the new seeds be decided by the results of the previous ones...
Flash 15.42%
Jaedong 9.27%
Bisu 6.43%
Fantasy 5.31%
Sea 4.09%
Effort 3.86%
Neo.G_SoulKey 3.31%
Stork 3.29%
Leta 3.04%
Best 2.85%
Stats 2.53%
Hydra 2.41%
Pretty much regardless of where you start in terms of seeded players, the above distribution will manifest itself. Removing the effect of seeded players completely by selecting them randomly each time gives us the following (for 10000 simulations):
Flash 8.81%
Jaedong 6.68%
Bisu 5.17%
Fantasy 4.29%
Effort 3.68%
Sea 3.63%
Leta 3.32%
Neo.G_SoulKey 3.09%
Stork 2.90%
Hydra 2.35%
Stats 2.26%
Best 2.24%
As we can see, the seeds amplify the probability for the strong players to win, with a lower chance of upsets. But even without the help of seeds (as in the last simulation above), the overall pattern doesn't change - Flash is still God, Jaedong a few leaps behind, then Bisu, Fantasy, and after that the individual skills quickly start to drown in the statistical noise. In fact, the likelihood of winning an OSL if the seeds are randomized (a measure of individual skill as good as any) declines in an almost exponential fashion:
Probability of winning for the top 75 players, on a semilog scale. One interpretation of this is that the skill of individual players is fairly evenly distributed, with a few statistical abominations like Flash and Jaedong.
Limitations 1. The player statistics are a somewhat flawed measure of how well an individual player will perform in a tournament. For old players, having been in a slump in the past may reflect poorly on the current form, and vice versa - a player with historically impressive stats may be in bad form at the moment. New players on the other hand may have played too few games to give a reasonable idea of their abilities - some players may have played only one game in a match up, and regardless of whether it was a win or a loss, that is a poor statistic on which to guess the outcome of future games.
I can't really think of any better measure that can be used in practice, however.
2. I haven't added stats for all currently active progamers, partly because most of the ones that I didn't include don't have any recorded games at all. In any case, this means that some of the people who are actually in the offline preliminaries right now are not included in my analysis.
3. I haven't simulated who will advance from the PSL, but rather chosen them at random from the remaining (non-seeded) players. Although it shouldn't be too hard to include, I doubt it would make a very big difference for the general outcome. I guess it's on the to do list for this project, though.
4. Things like player experience, coaching, ambition, ability to handle stress, current form and so on are of course important factors in any individual league, but they are also hard to quantify. They are things to keep in mind that add to the uncertainty of this type of simulation, but there's not much else to be done about them in terms of extensions to the model.
5. It's not so hard to add things like (approximate) confidence intervals for the winrates. The only reason I haven't done it is because it's getting late and I wanted to finish this post before I go to bed. Edit: In the following simulations I have added approximate 95% confidence intervals based on the central limit theorem.
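For reference, an approximate 95% interval of this kind is usually the normal approximation for a proportion, p +/- 1.96*sqrt(p*(1-p)/n). The Python sketch below assumes that this is the formula meant by "based on the central limit theorem"; the function name is made up. The same formula works for a winrate (wins out of games played) or for a share of simulated OSL golds (golds out of simulations).

```python
import math

def proportion_confidence_interval(successes, trials, z=1.96):
    """Approximate confidence interval for a proportion (normal approximation).

    z = 1.96 gives roughly 95% coverage. The approximation is poor when
    trials is small or the proportion is very close to 0 or 1.
    """
    p = successes / trials
    half_width = z * math.sqrt(p * (1.0 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)

if __name__ == "__main__":
    # Flash's 15.42% share of golds in 10000 simulations, i.e. 1542 golds.
    low, high = proportion_confidence_interval(1542, 10_000)
    print(f"{low:.4f} to {high:.4f}")
```

For players with only a handful of recorded games this interval becomes very wide, which is the problem described in limitation 1.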
Update After some painstaking copy and paste work from TLPD, I have collected the players' individual match results. This allows me to put weights on the wins and defeats based on how long ago the game was played. I chose the following weight function:
w(t) = 0.5 - (1/pi)*arctan((t-100)/100), where t is the number of days since the game was played.
Basically what this does is it puts a lot of emphasis on recent games, and much less on games played in the past. For gamers like Flash and Jaedong, this puts over half of the weight on games played within the last year, and games from the beginning of their career count for virtually nothing.
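Here is the weight function as a small Python sketch. It assumes the arctan term is scaled by 1/pi, so that the weight stays between 0 and 1; that scaling reproduces the weights of roughly 0.75 (yesterday) and 0.11 (a year ago) used in the worked example later in this thread. The parameter names midpoint and scale are made up for illustration.

```python
import math

def weight(t, midpoint=100.0, scale=100.0):
    """Weight of a game played t days ago.

    Assumes w(t) = 0.5 - arctan((t - midpoint) / scale) / pi, which
    decreases from about 0.75 at t = 0 towards 0 for very old games.
    """
    return 0.5 - math.atan((t - midpoint) / scale) / math.pi

if __name__ == "__main__":
    for days in (1, 30, 100, 365, 1000):
        print(days, round(weight(days), 3))
```

The weight is exactly 0.5 at 100 days, and that is also where it drops the fastest.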
Naturally this changes the winrates for all the players, most for the better, some for worse. Here's a list of the players who had an overall benefit from the weights:
Flash Bisu Fantasy Neo.G_Soulkey Stork Leta Stats n.Die_soO Movie Horang2 Baby Best Calm Killer Action Brave Crazy-Hydra HoeJJa TurN firebathero Hyuk Shine Jaehoon Modesty Last Reality Grape Perfective sHy By.Sun s2 hyvaa sKyHigh BarrackS Canata Flying hero PianO Ssak Peace herO[jOin] Chavi PerfectMan Orion BByong Where Juni Sacsri hOn_sin Alone Sharp HiyA
Of course there is a virtually unlimited number of possible weight functions. I think the one I chose should capture the players' current likelihood to win better than the unweighted winrates, however. For example, Flash went from 75% vT, 71% vZ, 70% vP to the incredible 81% vT, 79% vZ and 68% vP. Jaedong on the other hand had his winrates reduced by about 2 percentage points, except for vP, which increased by 1 percentage point.
Anyway, for some more results!
This time I increased the number of simulations to 100,000. For the static seed (current OSL seed):
(The confidence interval is approximate 95%, based on the central limit theorem)
I'm sure you're as surprised as I am about this. Fantasy, wtf? Maybe I did something wrong, but I don't think so, I triple-checked the code. Of course Fantasy's stats are worse than Flash's, even though they were improved a lot by the weights. Somehow the configuration of winrates swung greatly in favour of Fantasy, who in this model has a statistically significant edge to win over God himself! And who would have guessed n.Die_soO to be so far ahead of Hydra, Stork and Jangbi?
Let's see what happens if we randomize the seeds. Once more for 100,000 simulations:
I don't think overall winrate really means much, since a lot of players like sea, effort, and hydra are sucking hard right now. I am also confused about your inclusion of + Show Spoiler +
But seeds don't matter beyond the Ro16 anyway (actually, do they now randomize Ro16 too? You used to be able to pick them A picks B who picks C who picks D. All A's were previous semifinalists from last season). Quarterfinals are randomly assembled, so at that point, there's no advantage to having a higher seed, as you will not be guaranteed to play a much lower seed, or even a lower seed at all.
I like that you're having fun with statistics, and OSL hype is good, but I don't feel like a statistical analysis offers much insight when compared to our own empiricism. Your simulations show that Flash, Jaedong, and fantasy are favorites. Well, none of us had to run any simulations to tell you the same thing.
Fun bit of research and interesting to read. Thanks for taking the time to do it and share it with us!
I was surprised to see Fantasy rank above Jaedong in the first version of the chart. Then I saw the other two versions with your seeding and with randomized seeding and order was restored.
One other thing that made me smile: I know you're from Sweden, so the occasional odd choice of English word is certainly forgivable, but I really liked
a few statistical abominations like Flash and Jaedong.
FWIW, the usual term is "statistical anomalies", but I like your phrasing here much better!
Lol, I thought the exact same thing. They're not just anomalies at this point, they're abominations.
I enjoyed the analysis. Anything that gives Jaedong as much as a 6-10% chance of winning gets me hyped. Although, in fairness, he is going to win for sure
i think for any Jaedong fan, anything but a jaedong vs flash finals will be a disappointment.
Honestly if finals doesn't feature a combination of flash/fantasy/stork/jaedong, I would rather it be 2 players that haven't made the OSL final before.
On April 01 2012 11:49 Ideas wrote: i think for any Jaedong fan, anything but a jaedong vs flash finals will be a disappointment.
Honestly if finals doesn't feature a combination of flash/fantasy/stork/jaedong, I would rather it be 2 players that haven't made the OSL final before.
Exactly. I either want a JvF (Flash or even a Fantasy rematch) or maybe JvStork. I do not want a Flash v [irrelevant]. If we can't have JvF, may as well have 2 out of the wild, like Alone v Grape. Epic.
If you're a Jaedong fan, why wouldn't you want Jaedong vs. anyone if you can't have Jaedong vs Flash/Fantasy/Stork?
Actually, as a JD fan myself, I'd rather see Jaedong vs anyone but Flash, since I think JD's got to be favored against anyone else.
because if you don't beat flash in the final it doesnt count!
one last battle of FvJ would be OH so epic. So would be fantasy vs jaedong. And then my heart tells me it is stork who should win this. Im also a Soulkey fanboy tbh. Damn this is hard, im sooo pumped
I'd like to see Hyuk vs Flash (those games are always full of lulz), Stork vs Fantasy, Jaedong vs Flash or maybe... Baby vs Killer? Hahaha Seriously, I hope Stork takes this. I also hope Hiya makes it far into the tournament, if he's supposed to be playing..?
because if you don't beat flash in the final it doesnt count!
Hahaha ... actually, all the title-holders who have kind of become average after winning seem to have been in finals against non-TBLS - I still have to remind myself that Calm has a title.
yet everyone forgets that mind beat bisu in the msl 5 years ago
The biggest problem with the simulation is the rapid change in skill level even in a month to month basis. Even though a large sample of games was used for the win rates, it doesn't correlate extremely strongly with current win rate. (e.g. I'd say Flash all match ups right now would be above 70%, maybe a whole 10% higher)
How can anyone root for flash to win? Its like rooting for Apple to make more money. They're probably going to do it anyway, so there's no surprise or excitement...
I'm rooting for JD all the way! And Zero too I <3 him so much Anyone have any sort of confirmation on whether Hiya is really going to play? I thought he retired
But seeds don't matter beyond the Ro16 anyway (actually, do they now randomize Ro16 too? You used to be able to pick them A picks B who picks C who picks D. All A's were previous semifinalists from last season). Quarterfinals are randomly assembled, so at that point, there's no advantage to having a higher seed, as you will not be guaranteed to play a much lower seed, or even a lower seed at all.
I like that you're having fun with statistics, and OSL hype is good, but I don't feel like a statistical analysis offers much insight when compared to our own empiricism. Your simulations show that Flash, Jaedong, and fantasy are favorites. Well, none of us had to run any simulations to tell you the same thing.
For the Ro16 group selection I followed the description in the ultimate OSL FAQ. In my implementation the selection criterion is current ELO rank: the player who chooses a person for his group always picks the one with the worst ELO ranking of the ones who are available. It's not perfect, but it should give a decent spread I think.
I'm not really claiming that this will offer a lot of insight - the reason I'm doing it is because it's fun. I do believe this type of simulation to be rather powerful however, if you remember the limitations. Even if no model will ever be a perfect replica of the real world, this type of simulation can take in a lot more parameters than analytical calculations. Also it's easy to see how certain parameters affect things like chances of winning an OSL, within a particular model. You just have to keep in mind that there is no reasonable way (in this case) to know how close different models really are to reality. As much as we'd all love to, it's not like we can replay this coming OSL a thousand times and see what happens.
FWIW, the usual term is "statistical anomalies", but I like your phrasing here much better!
Lol, anomaly was definitely the word I was going for. I agree that abomination is strangely fitting in this context anyway though
The biggest problem with the simulation is the rapid change in skill level even in a month to month basis. Even though a large sample of games was used for the win rates, it doesn't correlate extremely strongly with current win rate. (e.g. I'd say Flash all match ups right now would be above 70%, maybe a whole 10% higher)
I agree. That's why, after a lot of copy-pasting from TLPD, I have now weighted the win rate statistics for each player according to a (in all honesty fairly arbitrary) function that puts a lot more emphasis on recent games. See the updated OP for further results!
I also apologize for having made a mistake in the description of the winning process. The simulations were done according to the following algorithm:
1. Denote by p1 player 1's winrate vs player 2's race, and by p2 player 2's winrate vs player 1's race.
2. Generate two uniformly distributed random numbers u1 and u2.
3. If p1*u1 > p2*u2, then player 1 wins. Otherwise player 2 wins.
This puts a bit more emphasis on winrate when compared to the other algorithm I described before, where player 1 wins if u1 < p1/(p1+p2). Which method is more realistic? No idea. It actually does make a really big difference for the results though.
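To put numbers on the difference, both rules can be written in closed form, assuming u1 and u2 are uniform on [0, 1). The Python sketch below compares them; the function names are made up, and the example inputs 0.79 and 0.62 are the weighted Flash vZ and Jaedong vT winrates quoted in this thread.

```python
def ratio_rule(p1, p2):
    """Chance that player 1 wins when he wins if u1 < p1 / (p1 + p2)."""
    return p1 / (p1 + p2)

def product_rule(p1, p2):
    """Chance that player 1 wins when he wins if p1*u1 > p2*u2.

    Closed form for independent uniform u1, u2 and positive winrates.
    """
    if p1 >= p2:
        return 1.0 - p2 / (2.0 * p1)
    return p1 / (2.0 * p2)

if __name__ == "__main__":
    p1, p2 = 0.79, 0.62
    print(f"ratio rule:   {ratio_rule(p1, p2):.3f}")
    print(f"product rule: {product_rule(p1, p2):.3f}")
```

Both rules depend only on the ratio of the two winrates, but the product rule always gives the player with the higher winrate a bigger edge, and that edge compounds over a whole tournament.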
Whoever makes it to the finals, I am looking forward to watching this OSL. Not just the finals, but the entire tournament. IT IS GOING TO BE AMAZING. AAAAAH, OSL, FUCK YEAH.
On April 02 2012 03:12 1Eris1 wrote: Well...with bisu and best out I'm pinning my hopes on reach and Hyuk. Could totally see fantasy 3-0ing both flash and jd to take it though
Looking forward to this. The OSL, and the pro scene in general will end in a whimper as KT and Flash win. In the end, boredom, cheese, and terran prevail. An undignified, yet at the same time fitting, end to the game.
On April 02 2012 03:12 1Eris1 wrote: Well...with bisu and best out I'm pinning my hopes on reach and Hyuk. Could totally see fantasy 3-0ing both flash and jd to take it though
Reach is retiring though :T
But the OSL started before proleague finals fhurdkfijh...wow this is lame
On April 02 2012 05:02 Lightwip wrote: Looking forward to this. The OSL, and the pro scene in general will end in a whimper as KT and Flash win. In the end, boredom, cheese, and terran prevail. An undignified, yet at the same time fitting, end to the game.
No, it cannot be so. Jaedong will win this OSL. If this is to be the last one, it will be the final ascension of the Tyrant, asserting the final dominance of the Swarm.
It's nice to dream. But you'll only be that much more sad when the worst comes to pass.
It looks like you used the winrates based on the careers of the players, instead of the win rates in the last year or two. Jaedong, for example, isn't even on the most recent TL power ranking and has been playing pretty "meh" as of late, but you have him as the second or third most likely person to win in each of your simulations because you used the career winrates instead of the recent ones. As much as I love Jaedong, I think that is overestimating his chances.
flash is always a favorite no matter what. bo3, bo5, bo7, hes unbeatable. you'll get close, and force the final set, but in the end, hes not gonna lose. bo1, yes maybe. you've seen him lose those games to the toss players. however, hes like a tiger woods, his mental game is absurd and he wont budge from it. put him vs dear boX (besides 1) FlaSh would crush dear.
Names don't matter as long as it's a good series. Everyone can look down on GGplay and Iris and say they are not STATISTICAL monsters, but they produced one of the best OSL finals ever, if not the best.
On April 02 2012 07:39 FlaShFTW wrote: flash is always a favorite no matter what. bo3, bo5, bo7, hes unbeatable. you'll get close, and force the final set, but in the end, hes not gonna lose. bo1, yes maybe. you've seen him lose those games to the toss players. however, hes like a tiger woods, his mental game is absurd and he wont budge from it. put him vs dear boX (besides 1) FlaSh would crush dear.
Too bad tiger couldn't even resist his sexual feelings. Listen up, bribe Flash with hookers!
On April 02 2012 07:29 Jonas wrote: It looks like you used the winrates based on the careers of the players, instead of the win rates in the last year or two. Jaedong, for example, isn't even on the most recent TL power ranking and has been playing pretty "meh" as of late, but you have him as the second or third most likely person to win in each of your simulations because you used the career winrates instead of the recent ones. As much as I love Jaedong, I think that is overestimating his chances.
Yeah, in the first set of simulations, the career win rates were used. In the second set of simulations, as I tried to explain, I multiplied all wins and losses by weights based on how long ago the games were played. The weights become smaller the longer ago the game was played, according to the following formula:
w(t) = 0.5 - (1/pi)*arctan((t-100)/100)
The weights for a few values of t, which is measured in days:
So to be perfectly clear, if you lost 7 games last year and won one game yesterday, your win rate with these weights would be roughly 50%:
1*0.75/(1*0.75+7*0.11) ~= 0.5
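The same calculation as a Python sketch, for anyone who wants to try other records. It assumes the weight function is normalised as w(t) = 0.5 - arctan((t-100)/100)/pi, which gives the 0.75 and 0.11 weights used above, and it treats all 7 losses as played exactly 365 days ago, which is a simplification. The function names are made up for illustration.

```python
import math

def weight(days_ago):
    """Assumed weight of a game played days_ago days ago."""
    return 0.5 - math.atan((days_ago - 100.0) / 100.0) / math.pi

def weighted_winrate(games):
    """games is a list of (days_ago, won) pairs, with won a bool."""
    weighted_wins = sum(weight(days) for days, won in games if won)
    weighted_total = sum(weight(days) for days, _ in games)
    return weighted_wins / weighted_total

if __name__ == "__main__":
    # One win yesterday, seven losses a year ago.
    games = [(1, True)] + [(365, False)] * 7
    print(f"weighted winrate:   {weighted_winrate(games):.2f}")
    print(f"unweighted winrate: {1 / 8:.2f}")
```

Without weights the same record would be a 12.5% winrate, so the single recent win counts for about as much as all seven old losses together.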
With these weights, most old progamers (in my list of 75) have more than 50% of the influence of their winrates based on games played the last year or so. For Jaedong, perhaps surprisingly, the win rates are still impressive:
Versus terran, with no weights, Jaedong has 199 wins and 116 losses - a 63% winrate. With the weights, it becomes 9.99 wins and 6.18 losses - a winrate of almost 62%.
Jaedong slump? LIES!!
Similar numbers for vs Zerg and vs Protoss.
The weight function is of course arbitrarily chosen, but I did play around with the parameters a bit to get something that looked "reasonable". I want the recent games to count a lot, while still taking into consideration past merits to some degree. The weight function first decreases rather slowly, then faster and faster. It drops the fastest at 100 days, and after that the decline slows down (the rate of decrease first grows, reaches a peak at t = 100, then shrinks towards 0).
Even though Jaedong is not looking as strong as he once did, he is still one of the best players. And I'm not just saying that because I'm a fan. The power rank, while perhaps being good for other things, is not a good indicator of who's currently the best player. It's based too much on subjective opinions, and is also quite short sighted in that it only looks at the past month's achievements.
On April 02 2012 07:40 a176 wrote: you should do one of these for jin air, and then compare to actual results
Hey, that's actually a good idea. Challenge accepted!
I'll try first with the win rates that I have now, because it would take some time to update the win rates to correspond to what they were before that league started. I'll try to think of a way of updating the winrates automatically though. Time to transfer this project to Python perhaps...
Let's be honest here, if it is a JvF, Flash is going to win. Flash losing a BoX in TvZ is just not going to happen. As a massive JD fan I don't want a repeat of 2010. The best possible finals would be Jaedong vs P, because I feel like if JD does some serious ZvP practice, he won't lose to anyone.
@Reuental, while I would put my money on Flash vs anyone at this moment in time, you are underestimating Jaedongs chances imo. As I just mentioned in another thread, Jaedong has won 4 of the last FvJ games. I mean, this game here: http://www.teamliquid.net/tlpd/korean/games/64421_Flash_vs_Jaedong/vod
This game represents the epitome of standard solid build orders at the time it was played. Jaedong wins so convincingly and so quickly that I was reminded that JD really is a monster, and I think some people on this site are being overly dismissive of his chances vs Flash. Again, just highlighting both sides of the argument.
On April 03 2012 08:32 CardinalAllin wrote: @Reuental, while I would put my money on Flash vs anyone at this moment in time, you are underestimating Jaedongs chances imo.
I agree. Flash is no doubt the better player right now, but it's not like he wins 100% of the time.
My simulations say that Flash wins against Jaedong 60% of the time. In a Bo5 series, that's about a 70% chance to win for Flash.
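The Bo5 number can be checked with a short calculation. This Python sketch assumes every game in the series is independent and has the same 60% win chance, which ignores maps, momentum and so on. The function name is made up for illustration.

```python
from math import comb

def series_win_probability(p, best_of=5):
    """Chance of winning a best-of-N series given a per-game win chance p.

    Assumes independent games with a constant win chance. The series is
    won by taking `need` games while losing k < need games along the way.
    """
    need = best_of // 2 + 1
    q = 1.0 - p
    return sum(comb(need - 1 + k, k) * p**need * q**k for k in range(need))

if __name__ == "__main__":
    print(f"Bo5 at 60% per game: {series_win_probability(0.60, 5):.3f}")
```

With p = 0.6 this comes out at roughly 0.68, in line with the "about 70%" figure.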