|
On February 07 2013 23:50 dcemuser wrote: I love Aligulac. The -only- issue that I can think of is that there is a 'flaw' to the way it handles GSL/PL. When the Korean scene stays largely separate from the foreign scene, Aligulac has a hard time distinguishing "the gap". For example, the first few GSL seasons had almost no players who had played against foreigners in recorded matches. Therefore, players who beat exclusively Koreans who did not play (and stomp) foreigners are not having their bar set high enough. This taints a lot of the 2010/early 2011 data. For example, look at NesTea in the Hall of Fame. He's below -Naniwa- and like 4 other foreign players, despite winning 3 GSLs in a 9 month time period.
Luckily, this probably isn't a major issue going forward, since MLG and other tournaments are bringing major Koreans to stomp foreigners, and then those Koreans go home and get beaten by better Koreans, which keeps things balanced. However, there was once a time when the scenes rarely mixed.
TL;DR: GSL Code S is the highest skill tournament in the world, but to Aligulac, this is something that needs to be -continuously- proven, and therefore runs the risk of -not- being proven due to the said players not traveling and focusing entirely on GSL. http://aligulac.com/periods/21/?sort=&race=ptzrs&nats=all is a big example of this. Some guy named Bubbles 3-0'd a bunch of foreigners (0 Korean players) and became the #1 player in the world, lol. I'm honestly not sure how you solve this, other than just mentioning it in the FAQ and admitting the early data is going to be kind of weird.
On February 08 2013 02:16 KillerDucky wrote: It's possible to fix this. Some papers I read call it parameter smoothing: using backward filtering to smooth the past ratings.
See for example this paper: http://tennis-skill-rankings.googlecode.com/hg-history/c977c53a3af2913e780e39666fe1a272cc298319/links/glicko.pdf
On February 08 2013 03:06 TheBB wrote: I thought about this (that's the paper I based my method on actually), but I didn't quite like the idea of past lists changing forever. When FIDE (chess) ratings are published they are set in stone, and you know for example that Kasparov's 2851 record from 1999 or Carlsen's 2872 at the moment will never be anything other than what they are. It makes it awkward for enthusiasts to track records. Not that I've noticed a lot of people tracking Aligulac records, since the past lists are changing anyway due to the expanding database (for the time being), but still, I wanted to give people the option. Thoughts?
On February 11 2013 08:09 Epamynondas wrote: Maybe you could do some kind of backwards adjustment (or this "smoothing" you guys speak of) only on new players? Like, compute things normally for them for about 4 periods or something like that (or for a set amount of games played, I guess?), and then adjust their ratings retroactively, and then don't mess with their past ever again. So imagine that I get a magical seed for Code S next season, and lose my first game of the group stages against Life (but only because I'm nervous). This doesn't give a lot of points to Life because I'm totally unknown at that point. Then I proceed to stomp all competition and win Code S without dropping another map. Then your script readjusts my ratings and suddenly Life has a rating of like 3000 because he took a game off me. And then pro players catch up to my silver strats and I don't win a game ever again.
Do you have a Code S seed that you haven't told anyone about? :D
|
Negotiations are still underway so I'm not supposed to talk about it.
;D
|
On February 12 2013 01:06 Epamynondas wrote: Negotiations are still underway so I'm not supposed to talk about it.
;D
In before EGEpamynondasRC
|
While this is definitely a rating system built around extremely high-level players and where they rank, I find it cool that I could even find some of my own results that I had completely forgotten about, from early 2011 and so on. This is a great system for the high-level professional players and also really useful for a lowly NA semi-pro.
|
I am a bit ashamed, studying statistics myself, but why would you consider your red line ideal?
I suppose the linear regression gives you the approximation with smallest prediction error. Why would you minimize/maximize something else?
|
On February 12 2013 02:57 fezvez wrote: I am a bit ashamed, studying statistics myself, but why would you consider your red line ideal?
I suppose the linear regression gives you the approximation with smallest prediction error. Why would you minimize/maximize something else?
Because ideally you want the actual win rate to match your predicted win rate.
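To make that concrete, here is a minimal sketch of a calibration check, assuming the red line is the diagonal where actual win rate equals predicted win rate. The function name `calibration_table` and the binning scheme are illustrative assumptions, not the code behind the chart.

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, n_bins=10):
    """Group games by predicted win probability and report the observed win rate per bin.

    predictions: predicted win probabilities in [0, 1] for the player listed first
    outcomes:    1 if that player actually won, else 0
    (Hypothetical helper for illustration only.)
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for p, won in zip(predictions, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        games[b] += 1
        wins[b] += won
    # Each row: (bin midpoint, observed win rate, number of games in the bin)
    return [((b + 0.5) / n_bins, wins[b] / games[b], games[b]) for b in sorted(games)]
```

A well-calibrated model has the observed win rate close to the bin midpoint in every row, which is the diagonal. A regression line through the rows only describes how far the model is from that diagonal; it is not the target itself.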
|
On February 12 2013 02:57 fezvez wrote: I am a bit ashamed, studying statistics myself, but why would you consider your red line ideal?
I suppose the linear regression gives you the approximation with smallest prediction error. Why would you minimize/maximize something else?
On February 12 2013 03:29 Traceback wrote: Because ideally you want the actual win rate to match your predicted win rate.
Ninja'd me
|
|
On February 07 2013 09:55 Greenei wrote: hmm the 80%+ winrates seem to be really poorly predicted.
On February 07 2013 09:57 TheBB wrote: Also this could be because of the underlying model (maybe using normal distribution wasn't the best idea, logistic might be better after all... more on this in a later edition maybe), or because there really aren't that many games with 80%+ skill gap.
Logistic is what gets used in chess Elo ratings specifically because it was found to be a better fit to the data. The only part the two functions handle significantly differently is the tail (so the 80%+ win-rate cases).
I mean, specifically looking at the chart you posted:
Ignore the data noise for a moment and look at the fitted curve. The fitted curve starts dead even with the ideal curve, but slowly diverges (and from 50%-83% the actual game data very closely follows the fitted curve). This is what I'd expect to see if the probability distribution being used was collapsing too tightly.
The problem with the normal distribution is that it's not built for transitive operations.
If Life has an 84% chance of beating Sheth, and Sheth has an 84% chance of beating Artosis, does this mean that Life has a 98% chance of beating Artosis?
Because this is what the normal distribution predicts when you combine two win percentages.
Two 31% chances combine into a 16% chance.
Two 23% chances combine into a 7% chance.
Two 16% chances combine into a 2% chance.
Two 7% chances combine into a 0.1% chance.
Which...doesn't feel right to me. If we take the above Life/Sheth/Artosis numbers, but make Sheth 50% less likely to win against Life, and Artosis 50% less likely to win against Sheth, suddenly Artosis is 95% less likely to win against Life? The odds against him really increase by a factor of 20, when the odds of the two intermediate matchups only get worse by a factor of 2?
Compare to the logistic curve. When you combine two win percentages to predict a more distant match...
Two 31% chances combine into a 17% chance.
Two 23% chances combine into a 9% chance.
Two 17% chances combine into a 4% chance.
Two 9% chances combine into a 1% chance.
Which is to say: if you take existing win percentages and change them so that Sheth is 1/2 as likely to beat Life, and Artosis is 1/2 as likely to beat Sheth, then Artosis becomes 1/4 as likely to beat Life (instead of 1/20 as likely).
Intuitively, this just feels like a more reasonable way to combine percentages. If you told me with absolute certainty "Life beats Sheth X% of the time" and "Sheth beats Artosis Y% of the time", and then asked me "What do you expect Life's winrate against Artosis to be?" My guess would be much closer to the logistic distribution than the normal distribution.
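For anyone who wants to try the two combination rules themselves, here is a rough sketch. It assumes that skill gaps simply add on each model's own scale (z-scores for the normal curve, log-odds for the logistic curve), which is what the figures above imply. The function names are made up for illustration and have nothing to do with Aligulac's actual code.

```python
from statistics import NormalDist  # standard library, Python 3.8+

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def chain_normal(p_ab, p_bc):
    """P(A beats C) if the skill gaps add as z-scores under a normal model."""
    z = _STD_NORMAL.inv_cdf(p_ab) + _STD_NORMAL.inv_cdf(p_bc)
    return _STD_NORMAL.cdf(z)

def chain_logistic(p_ab, p_bc):
    """P(A beats C) if the skill gaps add as log-odds, i.e. the odds multiply."""
    odds = (p_ab / (1 - p_ab)) * (p_bc / (1 - p_bc))
    return odds / (1 + odds)
```

With two 84% matchups, `chain_normal(0.84, 0.84)` comes out at roughly 0.977 while `chain_logistic(0.84, 0.84)` gives roughly 0.965, so the logistic rule is the more conservative of the two in the tail.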
|
I got quoted in an OP? That's awesome.
Also, this is a cool project.
|
When I look at TheBB's posts, I see the gathering mass of a star being born.
This is insanely useful information, an excellent use of statistics, and I hope to (insert deity or otherworldly influence here) that you can get some academic use out of this project as well. (A paper, an essay, something.)
|
On February 08 2013 04:56 ACrow wrote: Good job, always love your list! Glad you found a bug, it still seems a bit weird seeing Scarlett that high on the list, but w/e, math does not lie and it's only a model not the truth (whatever truth is).
As a big Scarlett fan...yeah, I would not put her ahead of Hyun.
In general, the place the ratings feel a little wrong is when players play in an overly easy (or overly hard) group.
Take the list of best foreign Terrans:
http://aligulac.com/periods/77/?page=1&sort=&race=t&nats=foreigners
Actually, let me quickly note that MaSa is also not on this list (Aligulac has MaSa as Korean; Liquipedia has MaSa as Canadian; I believe Canadian is correct here).
But what I really want to point out here is.... Look at the 5th best foreign Terran. Bunny! Who is Bunny? Someone who participated in a Danish Starcraft tournament, and I guess got more wins than losses. Boom, 5th best foreigner Terran, apparently!
If you want to get highly rated by Aligulac, then play opponents weaker than yourself...which sums up a decent number of Scarlett's tournaments (WCS Canada, WCS North America, IPL qualifier for North America...).
Conversely, if you want a low rating on Aligulac, then play stronger opponents. (Most of the foreigners who participated in the MLG vs Proleague event had their ratings dip, usually by about 300 points right around October 2012. For example, look at the rating graph of qxc: http://aligulac.com/players/261/ ).
I don't really know if there's a good statistical way to fix this issue, however. If all the Danish people collectively decide to never play anyone outside of Denmark, some of them are going to end up with very high ratings, some of them are going to end up with very low ratings. Not a whole lot that can be done about it.
|
On February 08 2013 04:56 ACrow wrote: Good job, always love your list! Glad you found a bug, it still seems a bit weird seeing Scarlett that high on the list, but w/e, math does not lie and it's only a model not the truth (whatever truth is).
On February 12 2013 08:20 metroid composite wrote: As a big Scarlett fan...yeah, I would not put her ahead of Hyun.
In general, the place the ratings feel a little wrong is when players play in an overly easy (or overly hard) group.
Take the list of best foreign Terrans:
http://aligulac.com/periods/77/?page=1&sort=&race=t&nats=foreigners
Actually, let me quickly note that MaSa is also not on this list (Aligulac has MaSa as Korean; Liquipedia has MaSa as Canadian; I believe Canadian is correct here).
But what I really want to point out here is.... Look at the 5th best foreign Terran. Bunny! Who is Bunny? Someone who participated in a Danish Starcraft tournament, and I guess got more wins than losses. Boom, 5th best foreigner Terran, apparently!
If you want to get highly rated by Aligulac, then play opponents weaker than yourself...which sums up a decent number of Scarlett's tournaments (WCS Canada, WCS North America, IPL qualifier for North America...).
Conversely, if you want a low rating on Aligulac, then play stronger opponents. (Most of the foreigners who participated in the MLG vs Proleague event had their ratings dip, usually by about 300 points right around October 2012. For example, look at the rating graph of qxc: http://aligulac.com/players/261/ ).
I don't really know if there's a good statistical way to fix this issue, however. If all the Danish people collectively decide to never play anyone outside of Denmark, some of them are going to end up with very high ratings, some of them are going to end up with very low ratings. Not a whole lot that can be done about it.
Very true. I believe a few pages back someone posted an explanation of how "islands" within a ranking system affect this. But please remember, this isn't "THE TRUTH". Bunny has very few matches; the problem always comes up when someone new enters a scene with a lot of "new" players (unranked players starting at 1000). And yes, it can become a problem if a subcommunity only plays each other.
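A small sketch of why islands are a problem, assuming an Elo-style logistic formula with a scale of 400 (an assumption for illustration; Aligulac's own formula may differ). Predictions depend only on rating differences, so results inside an isolated pool cannot pin down where that pool sits relative to everyone else.

```python
def win_probability(rating_a, rating_b, scale=400.0):
    """Elo-style logistic win chance for A over B; depends only on the rating difference.
    The scale of 400 is an assumed constant, not Aligulac's."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def shift_pool(ratings, offset):
    """Move every player of an isolated pool up or down by the same amount."""
    return {name: rating + offset for name, rating in ratings.items()}

# Hypothetical island of two players who only ever play each other.
island = {"player_x": 1200.0, "player_y": 1000.0}
shifted = shift_pool(island, 300.0)

before = win_probability(island["player_x"], island["player_y"])
after = win_probability(shifted["player_x"], shifted["player_y"])
print(before, after)  # the two values agree: games inside the island cannot tell the pools apart
```

Only games between the island and the rest of the player base can fix that offset, which is why a handful of cross-pool results carries so much weight.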
|
On February 07 2013 09:55 Greenei wrote: hmm the 80%+ winrates seem to be really poorly predicted.
On February 07 2013 09:57 TheBB wrote: Also this could be because of the underlying model (maybe using normal distribution wasn't the best idea, logistic might be better after all... more on this in a later edition maybe), or because there really aren't that many games with 80%+ skill gap.
On February 12 2013 07:00 metroid composite wrote: Logistic is what gets used in chess Elo ratings specifically because it was found to be a better fit to the data. The only part the two functions handle significantly differently is the tail (so the 80%+ win-rate cases).
I mean, specifically looking at the chart you posted:
Ignore the data noise for a moment and look at the fitted curve. The fitted curve starts dead even with the ideal curve, but slowly diverges (and from 50%-83% the actual game data very closely follows the fitted curve). This is what I'd expect to see if the probability distribution being used was collapsing too tightly.
The problem with the normal distribution is that it's not built for transitive operations.
If Life has an 84% chance of beating Sheth, and Sheth has an 84% chance of beating Artosis, does this mean that Life has a 98% chance of beating Artosis?
Because this is what the normal distribution predicts when you combine two win percentages.
Two 31% chances combine into a 16% chance.
Two 23% chances combine into a 7% chance.
Two 16% chances combine into a 2% chance.
Two 7% chances combine into a 0.1% chance.
Which...doesn't feel right to me. If we take the above Life/Sheth/Artosis numbers, but make Sheth 50% less likely to win against Life, and Artosis 50% less likely to win against Sheth, suddenly Artosis is 95% less likely to win against Life? The odds against him really increase by a factor of 20, when the odds of the two intermediate matchups only get worse by a factor of 2?
Compare to the logistic curve. When you combine two win percentages to predict a more distant match...
Two 31% chances combine into a 17% chance.
Two 23% chances combine into a 9% chance.
Two 17% chances combine into a 4% chance.
Two 9% chances combine into a 1% chance.
Which is to say: if you take existing win percentages and change them so that Sheth is 1/2 as likely to beat Life, and Artosis is 1/2 as likely to beat Sheth, then Artosis becomes 1/4 as likely to beat Life (instead of 1/20 as likely).
Intuitively, this just feels like a more reasonable way to combine percentages. If you told me with absolute certainty "Life beats Sheth X% of the time" and "Sheth beats Artosis Y% of the time", and then asked me "What do you expect Life's winrate against Artosis to be?" My guess would be much closer to the logistic distribution than the normal distribution.
Yes, I know all this now. But thanks for putting it into words anyway.
I will try and see what happens. I expect some improvement, too.
|
On February 12 2013 07:23 felisconcolori wrote: This is insanely useful information, an excellent use of statistics, and I hope to (insert deity or otherworldly influence here) that you can get some academic use out of this project as well. (A paper, an essay, something.)
Thanks. I asked my advisor whether the institute has a policy on publishing reports on topics outside the main area of research, and he said it was fine as long as I found some statistician to look at it. (None of the people I usually work with know anything about statistics, lol.)
|
I've now converted everything to using the logistic distribution. You should see somewhat more conservative predictions now.
Updated prediction analysis:
It didn't help as much in the 80%+ regime as I thought it would. I'm thinking the problem is more related to the sudden arrival of new player pools (Koreans in late 2010, KeSPA in 2012), and I may have to do something about that, such as one or more of:
- parameter smoothing over certain time periods
- use time-dependent parameters
At the moment I'm a bit tired of the mathematical part and I'll go back to working on the website for a few weeks.
|
|
Yeah, that's pretty fucking awesome, good luck with your thesis!
|
aaaah! as i see you have included my proposed feature, sweeeet!
|
|
|
|
|