|
Hey all,
I'm working on my master's degree in statistics, and I've wanted to do a Starcraft-related project for a while. So, for my semester project in my Bayesian Methods class, I built a ranking system for the GSL players. I hope to expand on this later, but right now it just takes all the brackets (the win/loss numbers with the player names), processes them using Bayesian magic, and spits out a skill parameter for each player, which can be translated into a probability statement about who will win a game. Specifically, I used a Bayesian hierarchical model with a binomial likelihood, where the win probability p is the inverse logit of the difference between the two players' skill parameters, and player skills are distributed Normal(0, sigma^2). I'll post the technical write-up later if anyone is interested.
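To give a flavor of the machinery, here's a minimal sketch of fitting that kind of model. This is not my actual code (which samples the full posterior); it's just a MAP (posterior mode) approximation with made-up games and a placeholder sigma:

import numpy as np
from scipy.optimize import minimize

# (winner, loser) rows like the ones shown later in this post
games = [("NesTea", "Jys"), ("NesTea", "Jys"), ("Jys", "NesTea"),
         ("Sen", "Vines"), ("Sen", "Vines"), ("Vines", "Sen")]
players = sorted({p for g in games for p in g})
idx = {p: i for i, p in enumerate(players)}
win = np.array([idx[w] for w, l in games])  # winner indices
los = np.array([idx[l] for w, l in games])  # loser indices
sigma = 1.0  # prior sd on skills; a placeholder, not my fitted value

def neg_log_posterior(s):
    d = s[win] - s[los]
    log_lik = -np.log1p(np.exp(-d)).sum()         # sum of log P(winner beats loser)
    log_prior = -0.5 * np.sum(s ** 2) / sigma**2  # Normal(0, sigma^2) skills
    return -(log_lik + log_prior)

fit = minimize(neg_log_posterior, np.zeros(len(players)))
for name in sorted(players, key=lambda p: -fit.x[idx[p]]):
    print(name, round(fit.x[idx[p]], 3))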
A lot of players didn't play enough games for me to estimate their skill with much confidence (Jinro and Choya are examples), but players with high uncertainty are pulled toward the group average.
Forgive the formatting.
Probabilities of each of the top 4 advancing to the next round, and also of advancing and then winning the whole tournament (based only on GSL Seasons 1-3 data). Remember, these are only based on the data, and they come from the chances of each player winning a Bo7 match against each of the other players. You should look at these predictions and say to yourself, "Those are almost all the same; based on just the data, this GSL could realistically go to any one of these 4 players."
Rank  Name    ProbWinNext  ProbWinFinal
1     Rain      0.5134       0.2443
2     HongUn    0.4865       0.2256
3     MC        0.4402       0.2183
4     Jinro     0.5597       0.3115
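For those wondering how a single-game probability becomes a match probability: a Bo7 is first to 4 wins, so you sum over how many games the loser takes. A quick sketch of that calculation (my reconstruction for illustration; the numbers above also average over skill uncertainty, which this doesn't):

from math import comb

def p_match(p, wins_needed=4):
    # P(take `wins_needed` games before the opponent does), given per-game
    # win probability p: sum over j = number of games the opponent wins
    return sum(comb(wins_needed - 1 + j, j) * p**wins_needed * (1 - p)**j
               for j in range(wins_needed))

print(p_match(0.54))  # a 54% per-game favorite wins a Bo7 about 58.7% of the time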
Edit: Fixed a coding problem that made the skills of people with high variance go incorrectly high. Top Player Rankings (final skill chosen to be mean - 2 * std. error)
Rankings on Google Spreadsheet
How to interpret the skill parameters:
The actual number is not important, only the distance between the numbers.
To get the probability that one player will beat another, the formula is exp(skill1)/(exp(skill1)+exp(skill2)), where skill1 is the skill of the player whose win probability you want and skill2 is the skill of his opponent.
So if you wanted to calculate FruitDealer vs NesTea and their skills were 1.48 and 1.32 respectively, calculate exp(1.48)/(exp(1.48)+exp(1.32)) = .5399, meaning FruitDealer has a 53.99% chance of beating NesTea in a single game. This is not exactly how I did it, but it is a simple approximation.
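Here's that calculation as a couple of lines of code, if you want to play with it (again, this is the single-game approximation, not my full posterior-averaged calculation):

import math

def p_win(skill1, skill2):
    # exp(s1) / (exp(s1) + exp(s2)) is the inverse logit of (s1 - s2)
    return math.exp(skill1) / (math.exp(skill1) + math.exp(skill2))

print(p_win(1.48, 1.32))  # FruitDealer vs NesTea: ~0.5399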
If this is interesting to anyone, I'd happily provide more information.
I can generate hypothetical match-ups for anyone I have good data on (as a rule of thumb, I have pretty good data on Code S players).
Disclaimer: Take the predictions and rankings with a grain of salt; they are only preliminary and will increase in accuracy over time. They are only based on GSL data, and they treat all seasons equally, assuming no skill change between seasons (not an assumption I wanted to make, there just isn't enough data). This makes the output more useful as a ranking tool than as a prediction tool at this point. There are also other tournaments I could potentially enter, but gathering the data and getting it properly formatted is proving to be a pain in the butt.
Future Work: There are additions I'd like to make to this analysis. If I can get some help with gathering and formatting data, I could adjust for and calculate race advantages, or how much of an advantage a certain map gives to a certain race. Also, if I could just get the quantity of data to increase, I could do a lot more calculation. I'd like to add in more tournaments besides the GSL. Using just the GSL is great for a fair, localized ranking system, but to predict future games, it's best to have as much data about each player as possible.
If I had hundreds of times more data, I could calculate, based on win/loss data, which build orders, strategies, transitions and such are more effective vs which others.
Right now my data looks like this:
NesTea 2 Jys 1
Vines 1 Sen 2
Goma 2 JookToJung 0
Maka 2 Sleep 0
etc.
No race information, no map information, nothing. I just copied it straight from Liquipedia and then organized it for analysis to look like this:
NesTea Jys
NesTea Jys
Jys NesTea
Vines Sen
Sen Vines
Sen Vines
etc.
After that it was just a matter of coding up the model and interpreting the results.
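The conversion itself is trivial; something like this little sketch (not my exact script, but the same idea) expands each score line into one winner/loser row per game:

def expand(line):
    # "NesTea 2 Jys 1" -> [("NesTea", "Jys"), ("NesTea", "Jys"), ("Jys", "NesTea")]
    a, wins_a, b, wins_b = line.split()
    return [(a, b)] * int(wins_a) + [(b, a)] * int(wins_b)

for line in ["NesTea 2 Jys 1", "Vines 1 Sen 2"]:
    for winner, loser in expand(line):
        print(winner, loser)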
So leave comments if you found this interesting or want to know more or whatever. If you want help implementing something similar, PM me. If you can think of some interesting Starcraft-related questions that can be answered through data, post them and I'll see about looking into them. Also, if you are interested in helping me find and/or format data, PM me.
|
Very cool stuff. I'd be one of those people interested in the technical write-up.
|
Well, I'm no stats or math wiz, but I think the fact that NEXLiveForever is 3rd on your rankings list means that your system could be improved a lot. Clearly some weighting should be given to more recent results, and possibly qualifiers should be taken into account as well. Someone like Rainbow, who can qualify for 3 GSLs and make a semifinal and a finals appearance, is clearly better than someone like NEXLiveForever, who was only able to qualify once and made one semi, or OGSTop, who made one Ro16 and was unable to qualify after that.
|
On December 08 2010 18:53 Wargizmo wrote: Well, I'm no stats or math wiz, but I think the fact that NEXLiveForever is 3rd on your rankings list means that your system could be improved a lot. Clearly some weighting should be given to more recent results, and possibly qualifiers should be taken into account as well. Someone like Rainbow, who can qualify for 3 GSLs and make a semifinal and a finals appearance, is clearly better than someone like NEXLiveForever, who was only able to qualify once and made one semi, or OGSTop, who made one Ro16 and was unable to qualify after that.
Didn't LF skip the second and third season to focus on school?
|
I agree with your concern completely, Wargizmo. I saw NEXLiveForever getting very highly ranked and was like, "wtf". He's boosted, probably unfairly, by having taken out NesTea in his first set. The model, as it is, treats all games equally, regardless of the round or season they were played in. At this point, however, with only the GSL data, trying to add in a time effect isn't feasible. There just isn't enough data.
The biggest issue is cross-over. The single-elimination bracket format is a statistical nightmare because it doesn't give you a lot of information about how players would do against other opponents. A round robin into brackets, or even double elimination, would be much better for an objective analysis.
Time effects are something I definitely have in mind for future use. I mean, it's pretty clear that a year from now, no one will care what happened in GSL Season 1 as far as predictions are concerned.
As far as fairness goes, at the very worst my ranking is as bad as the GomTV rankings, with their arbitrary point system that doesn't take into account how difficult your bracket is. My ranking system is almost entirely based on the strength of your opponents in how much a win affects your ranking. With more data, this Bayesian approach will be infinitely more fair and more reliable than a simple point system.
|
Yeah, I was just looking at the brackets again for LiveForever and Top; they are sandwiched at a high ranking by who they beat and who they lost to.
Top beat Polt, who beat MC, who is ranked very highly by the current season, which drags Top up. Then he lost to FruitDealer, which pushes him back down.
LiveForever knocked out NesTea and lost to FruitDealer. Ro4 is nothing to scoff at, though; he deserves to be ranked fairly high.
I'm sure if I had more data on these guys, they'd get dragged down somewhat. But LF took out bigger names on his way to the Ro4 than Jinro did, for example. I think this is a positive feature of the ranking algorithm, and more data will sort out any strange results.
|
You should use the stats to vote in your next GSL Liquibet and see how it goes. Cool stuff.
|
Interesting stuff, keep improving it
|
LiveForever didn't come to GSL 2 or 3.
|
On December 08 2010 18:53 Wargizmo wrote: Well, I'm no stats or math wiz, but I think the fact that NEXLiveForever is 3rd on your rankings list means that your system could be improved a lot. Clearly some weighting should be given to more recent results, and possibly qualifiers should be taken into account as well. Someone like Rainbow, who can qualify for 3 GSLs and make a semifinal and a finals appearance, is clearly better than someone like NEXLiveForever, who was only able to qualify once and made one semi, or OGSTop, who made one Ro16 and was unable to qualify after that.
Yeah, that's the most glaring flaw so far. I mean, Rain as 5th best? - -
Standardizing the scoring for someone who fails to qualify also has big problems (like for Tester, who got knocked out in a qualifier by Foxer).
May have to separate it by season until there is more data.
|
u rly got nothing better to do
User was warned for this post
|
On December 08 2010 19:19 Mip wrote: Yeah, I was just looking at the brackets again for LiveForever and Top; they are sandwiched at a high ranking by who they beat and who they lost to.
Top beat Polt, who beat MC, who is ranked very highly by the current season, which drags Top up. Then he lost to FruitDealer, which pushes him back down.
LiveForever knocked out NesTea and lost to FruitDealer. Ro4 is nothing to scoff at, though; he deserves to be ranked fairly high.
I'm sure if I had more data on these guys, they'd get dragged down somewhat. But LF took out bigger names on his way to the Ro4 than Jinro did, for example. I think this is a positive feature of the ranking algorithm, and more data will sort out any strange results.
That makes sense, but having said that, there still needs to be some weighting for recent results to account for a player improving over time. NesTea, for example, had only just switched over to SC2 in GSL1, and if you watched those games against LiveForever you would see that he was an absolute newbie at the game back then.
Also, I don't see why a player's performance in GSL3 would determine his rating for a game that occurred in GSL1 in the first place; that seems kind of silly to me. I mean, if someone else LiveForever beat ends up winning a GSL, then you could potentially have this guy rising up the rankings even further without playing a single game.
|
Could you give us the standard error of your probabilities? I'd like to know if they're relevant or not; obviously you don't have a lot of data with just GSL 1-3.
I like the idea though!
|
You have a high chance of winning the LB for GSL4! lol
|
My background: I have been studying mathematics toward a diploma for 12 semesters now. To be honest, I think your model should incorporate matchup deviation, and therefore needs to be at least 3-dimensional. Of course, considering every map would be even better, but then the sample pool would not return you any valuable information at all. But unless you at least consider matchups, I fear the model is too theoretical to have any substance. I ran into the same issue when I wanted to make a Starcraft manager toy and use winning probability to generate match results.
Addressing your transitive conclusion (A beats B beats C): you can't make it, regardless of your amount of data, because, as you already figured out with your approach, matches can be interpreted as weighted coinflips, and you cannot achieve any kind of transitive ordering in a competition (which is good, because that's the point of competition after all *g*).
Nice stuff though, I love mathematical approaches to this.
Edit: If you want to generate forecasts, you should consider reading the book by the guy who wrote that baseball statistics book and developed the "on-base percentage" stat. I can for heaven's sake not recall his name; I'll look it up when I'm home. You Americans, though, might know him right away; he was consulted by the Red Sox before they won the MLB.
|
What is the error for the probability that you generated? It's not something ridiculous like +-0.5, right?
|
I know trolls should not be fed, but seriously, if someone puts work and time into something concerning SC2, he should be credited and not patronized. You are the sad fuck with no life, flaming people like him.
|
I should be doing Matlab code for a Pattern Recognition course hand-in (with Hidden Markov Models, EM maximization, and Bayesian learning stuff), but I'm browsing TL.net instead. Win.
|
In my opinion it is rather unlikely that the "best" players won GSL1 and GSL2, as the win percentage for the favorite in a match is often only around 60 or sometimes 70%. This can only really be solved by having a larger sample size, and by the game evolving so that "luck" plays less of a role. Right now I think you have illusions about your ability to forecast accurately with the model you built.
Other flaws have already been pointed out by others: you forget that players got better over time, yet you raise their rating for past games using their future results...
So all in all, it might be good practice for your studies, and maybe you can summarize the past (although, when looking at the ranking your model produced, I'm not so sure about that). But please, don't use the model to try to make projections for the future, because I have a feeling that if I randomly assigned win percentages between 50/50 and 70/30 to the players, my "accuracy" would probably not be much lower than that of your model. If the accuracy of a model is not much better than just randomly putting in numbers, maybe it's not yet time to use it for projections.
|
I think this write-up should be taken as it is: an approximation of players' skill levels based on who has beaten whom, with a limited sample size. It's an approximation, nothing more. Even the OP said that players without a lot of data are driven toward the average; however, players like Davit, who weren't spectacular in their GSL1 games, would be a lot below the average, thereby pulling others up.
I actually like this analysis and would be interested in more of the theorycrafting behind this write-up.
Nice work
|
On December 08 2010 21:38 DarKFoRcE wrote: So all in all, it might be good practice for your studies, and maybe you can summarize the past (although, when looking at the ranking your model produced, I'm not so sure about that). But please, don't use the model to try to make projections for the future, because I have a feeling that if I randomly assigned win percentages between 50/50 and 70/30 to the players, my "accuracy" would probably not be much lower than that of your model. If the accuracy of a model is not much better than just randomly putting in numbers, maybe it's not yet time to use it for projections.
The nice thing about mathematical models is that they do not claim to be truth; interpretation is left to the reader (a fact that economists forget all the time). So if you don't think the model will be accurate enough (and I don't either), there is no need to trust it :-) After rethinking your model: even if you don't separate matchups, you should at least separate the map score into 3 different sets. If you don't, then when someone wins a tournament he gets 7 sets of 1.0-probability samples, which hurts the reality by quite a measure. In addition, weighting with the probabilities (Elo-like) would be a possible improvement.
|
I honestly don't know how you can model the probability of the players; it just blows my mind how complex putting a value on a player could be. It says nothing about a winning strategy or the countless variables of real-day events, but it seems to me that this system focuses more on averaging out past performance, which, as with following a market or a horse in its career, is no guarantee, and it gets even more sporadic the less data there is. I suppose it's a better guide than anything, but I'm convinced this method would in itself require a probability of being right.
|
On December 08 2010 22:20 aka_star wrote: I honestly don't know how you can model the probability of the players; it just blows my mind how complex putting a value on a player could be. It says nothing about a winning strategy or the countless variables of real-day events, but it seems to me that this system focuses more on averaging out past performance, which, as with following a market or a horse in its career, is no guarantee, and it gets even more sporadic the less data there is. I suppose it's a better guide than anything, but I'm convinced this method would in itself require a probability of being right.
You would be surprised. There are several professional bookmaking companies in the UK that have specialized in betting on football matches. Their model only incorporates past match data and hits almost 90% for win tendencies, which is unbelievably high for football. The model is secret for obvious reasons, but German journalist Christoph Biermann wrote a book about it.
|
You know what, most major tournament results in the West are entered into Liquipedia; you could probably get something much more interesting out if you took data from US and European tournaments and compared foreigner progamers. At least there would be much more data on different players, and these tournaments often have group stages. I'd suggest starting with a sample of the largest tournaments (maybe MLG, IEM, and Dreamhack) and seeing where that gets you. Maybe someone who doesn't have finals could give you a hand and compile results into whatever format your program takes as input.
|
Interesting read.
I cracked a smile at TSL_Rain having a higher % than Jinro, though.
|
Rain knows the Terran metagame very well; see the NexGenius/NesTea games.
|
Stupid question, but if you're messing around with statistics, shouldn't you write some words about the level of significance of your so-called predictions?
I absolutely don't want to say that you did no nice work here, but I have to go with the common opinion in this thread and say that your rankings seem to be utter bullshit.
PS: Could you please add 12 more people to your rankings? I would love to see whom you would send to the Code S league! ( http://www.teamliquid.net/forum/viewmessage.php?topic_id=170463 )
|
To all griefers, you're right.
To all those wanting the errors associated with the predictions: I have them. The reason I didn't include them is purely that I don't know how to make tables to post on these forums, and I didn't want to dump too much unformatted data. If someone could give me a quick how-to on tables and images for these forums, I can make this more readable. I'm planning on doing another one of these for the finals, and I'd like to make it much more readable.
The methodology is sound, and yeah, there should be a time effect in the data, but it's really just not feasible right now with the limited sample size. This makes the predictions highly volatile.
I really just threw the original post together in the middle of the night while I was working through my output. I'll have some more time after this weekend to do a proper write-up in excruciating detail.
To Heimatloser: the predictions are based on draws from the posterior predictive distribution, and significance doesn't mean very much in Bayesian statistics. Uncertainty about skill level is averaged out in the calculations. I will tell you that the variance associated with the predictions is large, and I reported only the average. The results you see are highly sensitive to additional data, which, as I've said repeatedly, is the biggest problem I have.
To strongandbig: the problem with just adding a lot of non-Korean tournaments is that it adds a lot more players, when the biggest weakness of the model is not enough information on the current players. But I'm totally up for exploring more tournaments, and I want to go in that direction.
If there's anyone out there who wants to help me find data, that would be awesome. Oh, and thanks to whoever suggested I use my predictions on Liquibet; that is actually an amazing data source because it has results + races + maps. If there's anyone out there who's really good at parsing HTML, or just the realizations on the webpages, I could really use help with that. Copy-pasting the brackets works pretty well, but it doesn't grab the race information, and the map information isn't there at all.
|
Awesome job, TS. I use live automated trading models to trade my accounts, and these rely almost solely on historical data. Methods like this can be amazingly powerful once you get enough data to make things significant. Nice to see people applying stuff like this to the things they enjoy.
|
I think your model is sound enough. As stated, some kind of time factor should be added... but you don't have enough data for that. In reality, I think that's why there are such messed-up stats: it's a matter of the amount of data you have. I think a similar model (with some kind of time variable) could be very accurate once we have more games to base scores on. Then, when a lower-ranked player takes a series off a highly ranked player, it will make less of a difference, because the lower player's score is based on, say, 30-40 games instead of 5-10.
Although, to be realistic, historical data is only semi-valid in any sporting event (i.e., upsets happen all the time).
|
The only problem is that you did not include qualifiers.
|
On December 09 2010 01:51 Centorian wrote: I think your model is sound enough. As stated, some kind of time factor should be added... but you don't have enough data for that. In reality, I think that's why there are such messed-up stats: it's a matter of the amount of data you have. I think a similar model (with some kind of time variable) could be very accurate once we have more games to base scores on. Then, when a lower-ranked player takes a series off a highly ranked player, it will make less of a difference, because the lower player's score is based on, say, 30-40 games instead of 5-10.
Although, to be realistic, historical data is only semi-valid in any sporting event (i.e., upsets happen all the time).
I would argue historical data is fully valid, and the very best measure to use when predicting future results. But no statistical model is perfect, and it will be wrong some of the time, especially when there's not a lot of data available. Just because there are upsets doesn't mean the favorite won't win most of the time, so if you had to make a prediction, unless there is some highly convincing information external to the model, it would make sense to bet on the favorite at 1:1 odds.
|
This will really be better once the other seasons go by and there is more data, but as an analysis project it's very cool. Since there has been a huge difference in skill between season 1 and season 3 (look at NesTea's play, for example), this won't be able to predict properly yet.
MLG would probably be a good addition, since it has double elimination and more tournaments to pool data from.
|
On December 08 2010 22:24 kazansky wrote:
On December 08 2010 22:20 aka_star wrote: I honestly don't know how you can model the probability of the players; it just blows my mind how complex putting a value on a player could be. It says nothing about a winning strategy or the countless variables of real-day events, but it seems to me that this system focuses more on averaging out past performance, which, as with following a market or a horse in its career, is no guarantee, and it gets even more sporadic the less data there is. I suppose it's a better guide than anything, but I'm convinced this method would in itself require a probability of being right.
You would be surprised. There are several professional bookmaking companies in the UK that have specialized in betting on football matches. Their model only incorporates past match data and hits almost 90% for win tendencies, which is unbelievably high for football. The model is secret for obvious reasons, but German journalist Christoph Biermann wrote a book about it.
The difference between football and Starcraft is variance, especially in SC2. Football teams have a lot of players, so the impact of one player having a bad/good day is relatively low compared to a team of one. If the solo player has a bad/good day, it skews the results immensely. Also, football teams have faced each other many times in the professional arena, so there is a lot more data to draw upon. SC2 is also a new game with evolving strategies, and nobody is at the top level yet, making the data even more inconsistent. Finally, I don't believe the formula accounts properly for player skill difference. In SC2, a player who is just slightly better than another will almost never lose on a favorable map, even though the data says it's 60/40.
I think it's a good effort, but I don't believe there is any formula that can rate SC2 players right now with any degree of accuracy. This would be better applied to BW where the data, players, and maps are more consistent.
|
@Mip, I am actually very interested in this topic and in helping to put together data. However, I think a very useful first step would be to develop a ranking system similar to the one used by the Association of Tennis Professionals (ATP). See: http://en.wikipedia.org/wiki/ATP_Rankings
The ATP uses a 52-week cumulative, rolling point system that awards points by finishing place at each qualifying tournament. However, more important tournaments are awarded more points, and the weights are approximately determined by the financial rewards from the tournaments. That is, the potential financial gain determines the "level" of the tournament, but other feedback from the community could be used as well.
I think a similar system would work well for SC2 because it and tennis share some (generally) common characteristics:
1) 1 vs. 1 tournaments with elimination brackets
2) Different surfaces (grass, clay, hard court) and conditions (day, night, hot, cold, windy) are analogous to different races and maps in SC2
3) Not all players play in all the tournaments
In the same way that "Upcoming Events" leverages the community's knowledge, one could develop a database or results form where the community could help submit tournament results.
Obviously, all ranking systems have their pros and cons, so only constructive criticism please! Finally, this ranking system (or an analogous one) can then inform the prediction models.
|
Following up my previous post: obviously, since there are so many small SC2 tournaments, some subjective measures would be needed to build the pool of tournaments that would count in an ATP-style ranking system.
An initial list includes: GSL(s), MLG(s), IEM(s), Dreamhack, BlizzCon.
I am sure there are others that would be good to also include, but I think a really good place to start would be the Blizzard-licensed events (all of the above are included I think). Does someone have a comprehensive list of the official Blizzard events? Other thoughts?
|
As an economist I understand all too well how frustrating it can be to have assumptions block the practical implementation of your model, so I definitely feel you there. 
I don't know if you've thought of this yet, but if you want to test the validity of your model, why not use SC1 data instead of SC2? Yes, the differences between SC1 and SC2 will introduce more uncertainty into your model, but the wealth of data points you'll gain might be worth it. Just a thought.
|
My problem with a point system is that it doesn't take into account the skill of the players you play against. The base of the system I used is the same as the Elo system used for chess ranking, except that I took a Bayesian approach.
The rankings that I posted are the posterior means of the skill parameters minus 1 standard deviation. It's somewhat arbitrary. If I made it 2 standard deviations, you'd see players like LiveForever drop down a lot. For players like FruitDealer and NesTea, there are a lot more games used to estimate their skill, so if you penalize uncertainty, players who have played a lot of GSL games will float to the top.
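Mechanically, the penalized ranking is just this (a sketch with fake posterior draws and hypothetical player names; rows are MCMC samples, columns are players):

import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(size=(4000, 3))         # placeholder posterior skill draws
names = ["PlayerA", "PlayerB", "PlayerC"]  # hypothetical names
score = draws.mean(axis=0) - 1 * draws.std(axis=0)  # mean - 1 sd (use 2 to penalize harder)
for name, s in sorted(zip(names, score), key=lambda t: -t[1]):
    print(name, round(s, 3))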
Here's a Google spreadsheet of the full ranking results: GSL Ranking Results
|
I've updated the rankings in the original post; see if you find them more agreeable. I think most will.
|
I totally agree with you that a point system with random tournament seeding does not tell you very much. However, large elimination tournaments with huge skill differences between players, like GSL and MLG, seed their tournament brackets. (Note: just like in tennis, these seedings can be independent of the ATP-style rankings, and it won't change the story.) These seedings will get better over time and thus reduce the luck aspect. In the end, good players will get farther in tournaments more often, and thus accumulate more points in a point system. Thus, a point system with NON-random tournament seeding should be a good approximation of skill, given the sparseness of games compared to all the possible 1v1 match-ups.
Again, I agree that a point system cannot account for everything, and a richer model would be preferable. All I am saying is that a point system can be informative.
|
Oh, I totally agree. I think seeding is always valuable, so as to maximize the opportunity to gather data from the players. I see a point system as an approximation to a dynamic Bayesian system, however. It's not that it doesn't work or that it's not valuable, but the Bayesian approach just lets the data inform the rankings entirely, whereas a point system is only informed by the round reached.
For example, in GSL Season 2, FruitDealer lost to MarineKing in the Ro32. By the point system, FruitDealer lost out on a lot of points by not getting further in the tournament. The Bayesian model takes into account that MarineKing is friggen good, so losing to him isn't really that big of an upset. The point system also gives no way to quantify uncertainty about a player's skill.
Realistically, either approach works fairly well; the Bayesian approach is just more dynamic in the way it ranks.
|
Can't access the Google spreadsheet in the OP, just to let you know.
Keep up the good work.
|
However, if it is true that FruitDealer and MarineKing are both "good" players, then under a "good" seeding system they would not be meeting in the Ro32. For example, no tennis tournament would ever allow the possibility of Roger Federer and Rafael Nadal meeting in the 2nd round, as both are considered good players, and thus the points-per-round system makes sense for tennis. Unfortunately, SC2 is not well developed enough yet to make clean seedings. In this respect, your Bayesian approach adds value in these early stages of the game, and I would be very curious to see the details of your analysis.
On a slightly different tack: on the State of the Game podcast a couple of weeks ago there was a big discussion about MLG's extended-series tiebreaker. The crux of the argument centered on whether different rounds of a tournament should be considered different; that is, does defeating someone in the Ro32 mean something different than defeating someone in the Ro8? In my opinion, yes, and thus the ending point of a tournament is important to incorporate. For instance, if MarineKing beat two great players in the Ro64 and Ro32, that is good, but not as good as beating them in the Ro16 and Ro8. I believe proper seeding and points for tournament finishing spot take this situation into account.
|
On December 08 2010 19:10 Mip wrote: Time effects are something I definitely have in mind for future use. I mean, it's pretty clear that a year from now, no one will care what happened in GSL Season 1 as far as predictions are concerned.
Here is a paper for accounting for time effects: "Whole-History Rating: A Bayesian Rating System for Players of Time-Varying Strength" http://remi.coulom.free.fr/WHR/
I thought it sounded like a cool concept and I'd like to see it used. On a different game server I play on (KGS, a Go server) they use a Bayesian system, but to account for time variation they use simple weight decay, and it has some strange side effects.
|
To Solon TLG: about the State of the Game podcast, my thought is that the only thing that matters is the skill of the players involved. Whether MarineKing beats FruitDealer depends only on how skillful they are; I don't think it matters which round they are in. I don't see that being in the round of 32 vs the finals will make a difference. Since they are both comfortable under pressure, I think it's reasonable to assume that the round affects them both in the same way. And if that's not true, who is favored? So if neither is favored, we should be able to treat the data as if the round doesn't matter.
To KillerDucky: thanks for the article. My thought for a time parameter was to have some measurement of the time passed and have the likelihood contribution of past events shrink toward 50/50 as the data becomes older, so past significant upsets shrink toward non-significance as time passes.
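Concretely, I'm picturing something like this (a sketch only, not settled methodology; the half-life is a made-up knob I'd have to tune):

import numpy as np

def decayed_weights(age_in_seasons, half_life=2.0):
    # weight 1.0 for a brand-new game, 0.5 for one a half-life ago, etc.
    return 0.5 ** (np.asarray(age_in_seasons, dtype=float) / half_life)

def weighted_log_lik(skill_winner, skill_loser, weights):
    # down-weighted Bradley-Terry-style log-likelihood over (winner, loser) pairs;
    # as a game's weight goes to 0, it stops informing the skills at all
    d = np.asarray(skill_winner) - np.asarray(skill_loser)
    return np.sum(weights * -np.log1p(np.exp(-d)))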
|
Is this somewhat like TrueSkill on the Xbox?
|
@beat farm: They are both Bayesian approaches... so probably.
|
I think the predictions could be made more accurate if you take into account each player's strength in each matchup. The problem is that it may require more games to become accurate (as each matchup is only one third of the games, and for a random player even fewer). Still, I think that once enough data is available, it would be more accurate to give the players separate rankings for each matchup.
|
On December 09 2010 02:31 Cel.erity wrote:
On December 08 2010 22:24 kazansky wrote:
On December 08 2010 22:20 aka_star wrote: I honestly don't know how you can model the probability of the players; it just blows my mind how complex putting a value on a player could be. It says nothing about a winning strategy or the countless variables of real-day events, but it seems to me that this system focuses more on averaging out past performance, which, as with following a market or a horse in its career, is no guarantee, and it gets even more sporadic the less data there is. I suppose it's a better guide than anything, but I'm convinced this method would in itself require a probability of being right.
You would be surprised. There are several professional bookmaking companies in the UK that have specialized in betting on football matches. Their model only incorporates past match data and hits almost 90% for win tendencies, which is unbelievably high for football. The model is secret for obvious reasons, but German journalist Christoph Biermann wrote a book about it.
The difference between football and Starcraft is variance, especially in SC2. Football teams have a lot of players, so the impact of one player having a bad/good day is relatively low compared to a team of one. If the solo player has a bad/good day, it skews the results immensely. Also, football teams have faced each other many times in the professional arena, so there is a lot more data to draw upon. SC2 is also a new game with evolving strategies, and nobody is at the top level yet, making the data even more inconsistent. Finally, I don't believe the formula accounts properly for player skill difference. In SC2, a player who is just slightly better than another will almost never lose on a favorable map, even though the data says it's 60/40. I think it's a good effort, but I don't believe there is any formula that can rate SC2 players right now with any degree of accuracy. This would be better applied to BW where the data, players, and maps are more consistent.
I just wanted to point out that it is possible to build very good models on match histories alone, not that it is in any way comparable; I'm sorry if I didn't make that clear enough :-) I totally agree with you that if it is to be at all accurate, only a very researched game with at least 5 years of history could fit something like that.
|
@Sandermatt Yeah, I would like to add something like that. It would take more data (which is already a problem). The way I would do it is have a skill rating for each player, and then an adjustment for the opponent's race. It would be very easy to add if I had more data.
|
Where are you going to school for statistics?
|
On December 09 2010 07:41 kazansky wrote:
On December 09 2010 02:31 Cel.erity wrote:
On December 08 2010 22:24 kazansky wrote:
On December 08 2010 22:20 aka_star wrote: I honestly don't know how you can model the probability of the players; it just blows my mind how complex putting a value on a player could be. It says nothing about a winning strategy or the countless variables of real-day events, but it seems to me that this system focuses more on averaging out past performance, which, as with following a market or a horse in its career, is no guarantee, and it gets even more sporadic the less data there is. I suppose it's a better guide than anything, but I'm convinced this method would in itself require a probability of being right.
You would be surprised. There are several professional bookmaking companies in the UK that have specialized in betting on football matches. Their model only incorporates past match data and hits almost 90% for win tendencies, which is unbelievably high for football. The model is secret for obvious reasons, but German journalist Christoph Biermann wrote a book about it.
The difference between football and Starcraft is variance, especially in SC2. Football teams have a lot of players, so the impact of one player having a bad/good day is relatively low compared to a team of one. If the solo player has a bad/good day, it skews the results immensely. Also, football teams have faced each other many times in the professional arena, so there is a lot more data to draw upon. SC2 is also a new game with evolving strategies, and nobody is at the top level yet, making the data even more inconsistent. Finally, I don't believe the formula accounts properly for player skill difference. In SC2, a player who is just slightly better than another will almost never lose on a favorable map, even though the data says it's 60/40. I think it's a good effort, but I don't believe there is any formula that can rate SC2 players right now with any degree of accuracy. This would be better applied to BW where the data, players, and maps are more consistent.
I just wanted to point out that it is possible to build very good models on match histories alone, not that it is in any way comparable; I'm sorry if I didn't make that clear enough :-) I totally agree with you that if it is to be at all accurate, only a very researched game with at least 5 years of history could fit something like that.
I think you guys are kind of off base; I already have a model that can rate Starcraft players with a decent amount of accuracy with only 400-something games. Is it perfect? No. But it has a lot of strength and will learn as it gets more data.
Statistical models of this sort are never going to give very high prediction accuracy. If you take players with similar skills, you are always going to have difficulty predicting the outcome. But to say that I need 5 years of "research" to start making predictions is just absurd.
As for this model and map imbalance: this model averages over all maps. Its primary function is to rate the players objectively based on their performance, which I believe it does quite nicely. If you want to optimize this for prediction, and I believe there is enough data out there that we could start, we need to pull together more data, which I would like help with if there is anyone out there good at parsing webpages.
Like I said in the original post, my data look like this:
[2343,] "MC"    "MarineKing"
[2344,] "MC"    "MarineKing"
[2345,] "MC"    "MarineKing"
[2346,] "Jinro" "Choya"
[2347,] "Jinro" "Choya"
[2348,] "Jinro" "Choya"
[2349,] "Choya" "Jinro"
[2350,] "Choya" "Jinro"
It actually starts out like this :
MarineKing 1 MC 3
Jinro 3 Choya 2
and then I convert it.
If instead I could get my data to look more like this:
MC Protoss MarineKing Terran Lost Temple
MC Protoss MarineKing Terran Blistering Sands
MC Protoss MarineKing Terran Jungle Basin
I could then start adjusting for those kinds of things. There should already be enough data to start something like this. As long as I have more data than the effective number of parameters I'm trying to estimate, I can do it, no problem.
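As a sketch of what I mean (the parameter names and every number here are hypothetical, not fitted values), the adjustment would just add terms to the skill difference before the inverse logit:

import math

def p_win(skill1, skill2, race_edge=0.0, map_edge=0.0):
    # race_edge: overall advantage of player 1's race vs player 2's race
    # map_edge:  extra edge player 1's race gets on this particular map
    d = (skill1 - skill2) + race_edge + map_edge
    # 1/(1+exp(-d)) equals exp(s1)/(exp(s1)+exp(s2)) when the edges are 0
    return 1.0 / (1.0 + math.exp(-d))

# e.g. MC (P) vs MarineKing (T) on Lost Temple, with made-up numbers:
print(p_win(1.2, 1.1, race_edge=0.05, map_edge=-0.10))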
|
@PROJECTILE I'm going to school at BYU in Provo, UT. They have a pretty good statistics program, but no PhD option; they stop at master's degrees.
|
On December 09 2010 15:58 Mip wrote: Statistical models of this sort are never going to give very high prediction accuracy. If you take players with similar skills, you are always going to have difficulty predicting the outcome. But to say that I need 5 years of "research" to start making predictions is just absurd.
As I said, yes, statistical models of this kind are able to give very high prediction accuracy. I didn't say yours will yet, and I think if you keep the work up, yours will in about 5 years, or let's say 2 years. That is at least what I meant. To provide high accuracy over the complete outcome of a tournament, and very reliable predictions, you need a huge amount of data to weight, on the one hand.
Why I said 5 years: if you had every result of the SC2 players right now to base your assumptions on, or every result of the BW players, you would very likely choose the Brood War players to predict, because the game is far more figured out, so your variation is narrowed down by far, because a new cheese doesn't appear every week.
You can start making predictions whenever you want, but if you want to hit 95%+ over a total GSL (every game) just based on a statistical model, I think you will have to rely on 5 years of tactical development and 2 years of data :-)
I didn't want to spoil your fun, I love your work and totally appreciate it.
|
Interesting and fun project, though, as you said, you don't have enough data to actually make predictions that strong. As others have said, you probably need to include a time factor as well.
|
@Kazansky Small variance and prediction accuracy are not the same thing in this kind of model.
Each player has an unmeasurable skill parameter that we can get glimpses of when they win or lose. So the more wins and losses I observe, the more I can nail down exactly what a player's skill parameter is. Over time, I can hope to achieve fairly high precision on many players' skill levels.
But a player's skill is only the parameter that feeds the function that tells me the probability that a player will win, which from the first post is exp(skill1)/(exp(skill1)+exp(skill2)). If in 5 years I have 2 players of the same skill, then according to this formula, the probability of either winning is 50/50, which makes sense for players of identical skill. So right now I might say there's a 30-70% chance player 1 wins (centered at 50-50, but I'm uncertain about exactly what it is); 5 years from now I can say there's a 49-51% chance player 1 wins (still 50-50, but now I'm certain it's about 50-50). I'll be able to narrow in only on the probability that a specific player can beat another, not on the actual outcome.
What you are saying is that in 5 years, there will be only <5% upsets and >95% perfect predictability. According to any paired-comparison model, that would imply that all players' skill levels are tremendously far apart, which is not likely to be the case. It would also imply that no rivalry would exist, no excitement in wondering who will come out on top in any match-up, because 95%+ of the time you'd know the victor in advance.
I don't understand how one could ever have high predictability for evenly matched opponents. I think that would, by definition, make them not evenly matched.
|
You must take into account that "Player A is better than Player B" does not ensure "Player A beats Player B". But the more historical data you have, the more you can take into account indicators for when the one or the other player wins.
You should be aware that it is highly likely that the model in its current state, as far as I understand it, will not converge, mainly because the best players don't have a distinct probability of winning but rather a set, or even a range, of measures as probabilities.
Another point is that the evenness of a comparison of two players depends on the granularity of the model.
You don't need very far-apart skill levels to reach a very high prediction score. The model just needs to take into account the most relevant data possible, which means more, and more precise, data.
That is what I meant by "if you keep the work up": you need to gain much knowledge and add a ridiculous number of parameters about the game state if you want to be really on the spot with predictions.
Your model is quite a good start, but it will naturally reach its limits owing to its low dimensionality and the cardinality of its data sets, two different factors of course.
And it definitely helps when the game is so far figured out that players stick to their guns rather than gamble, obviously, as well (technically speaking: variance) :-)
If you are keen on expanding this project further, I would love to help you figure out further improvements to the model, although I must admit my focus is game theory, so my experience and knowledge are limited.
|
I believe the concept that "Player A is better than Player B" does not ensure "Player A beats Player B" was the entire substance of my last post, on how the skill system calculates the win percentage.
You'll have to be more specific about what you mean by "converge." The skill parameters will most definitely converge under the current model. The probability of a player winning will not, and I don't expect it to. Assuming perfect predictability assumes that players will always make predictable decisions. Even late into Brood War, people made risky decisions to try to pull off an extra win. Players always have the decision to go fast/all-in or slow/macro or anywhere in between.
If a player is 95% predictable, then their opponent is going to snipe-build whatever they do. If the opponent is savvy to this, they will counter the snipe build, and the head games go on until you have no idea exactly which player is going to do what, thus leading to uncertainty in individual game outcomes.
I think it would help me if you spoke in more specific terms. Explain what the situation would look like where I could know exactly who was going to win, because I just don't see that ever being the case. I can't imagine the game becoming so one-dimensional, so it would help to have this explained to me.
|
Also, to anyone reading this: if anyone has some good parsing skills, the TeamLiquid Database is awesome. It has player names, races, maps, tournaments, and maybe some other stuff, I don't remember. If someone could help me figure out how to extract all that information in bulk, I could leap this project forward a considerable amount.
|
I mean, for example, you can expand the function with recent match history, matchup, and stuff like that. It's just quite hard to figure out how to weight these parameters to gain accuracy instead of losing it.
Highly mathematical inside: mathematically, your function is h = g(f(x,y)), with
g : Q -> {0,1} the "winner function",
f : R x R -> [0,1] the "match function",
and x, y the players' parameters (R, Q, N as usual).
What I mean is that you replace the function f with, for example, a function f2,
f2 : F x F x R x N x N^n x N^n -> [0,1], f2(x,y,z,a,b,c),
with x, y player functions, z a team factor (to simulate training partners), a the map, b the history of player A, and c the history of player B.
Here F is the space of parameter functions, with f3 in F being
f3 : N x R^3 x N^n -> R, f3(x,y,z),
where x is the enemy's race, y the player's parameters per matchup, and z the winning history over the last n matches.
This can obviously be expanded endlessly, with successively smaller improvements.
If you figure out how to model even some of these factors (which is by no means trivial), it may make the predictions more accurate.
By the skill parameter not converging, I meant that it may rise in a player's prime, fall again, and then rise again, and I don't know if it converges in the normal mathematical sense on infinitely many examples; that is, whether there is a distinct skill x0 for every player that his parameter would converge to.
I cannot state any of this with proof, so I could be wrong in my assumptions, of course :-)
|
A bit of a statistical upset; ogsMC is probably going to win it.
|
There is an RSS feed for the TLPD, although it only returns the 100 most recent games (is there a way to get it to send more?). You could parse this to get new game results.
#!/usr/bin/python
# Before running, do 'wget http://www.teamliquid.net/tlpd/sc2-korean/games/rss'
import feedparser  # From http://www.feedparser.org/

d = feedparser.parse("./rss")

# The link in the rss feed is missing the "sc2-korean" part:
# rss:    d['entries'][0]['link'] = http://www.teamliquid.net/tlpd/games/50656_InCa_vs_NaDa
# actual: http://www.teamliquid.net/tlpd/sc2-korean/games/50656_InCa_vs_NaDa

#print d['entries'][0]

for k in d['entries']:
    print k['title'], k['updated']
|
Mip, this is deeply intriguing, and I'm looking forward to seeing the final results. Having finished my elementary course in statistics, I was hoping to gain some inspiration for what I could do when I begin my exam project, and this really got me thinking, seeing as I want to do something game-related as well. Perhaps it would be interesting, as I think someone already mentioned, to do a comparative study with other tournaments?
|
Have you considered using TrueSkill? It's a Bayesian rating system that can be used for games with any number of players.
|