|
On February 08 2013 02:16 KillerDucky wrote:
On February 07 2013 23:50 dcemuser wrote:
I love Aligulac. The -only- issue that I can think of is that there is a 'flaw' in the way it handles GSL/PL. When the Korean scene stays largely separate from the foreign scene, Aligulac has a hard time distinguishing "the gap". For example, the first few GSL seasons had almost no players who had played against foreigners in recorded matches. Therefore, players who beat exclusively Koreans who did not play (and stomp) foreigners are not having their bar set high enough. This taints a lot of the 2010/early 2011 data. For example, look at NesTea in the Hall of Fame. He's below -Naniwa- and like 4 other foreign players, despite winning 3 GSLs in a 9 month time period. Luckily, this probably isn't a major issue going forward, since MLG and other tournaments are bringing major Koreans to stomp foreigners, and then those Koreans go home and get beaten by better Koreans, which keeps things balanced. However, there was once a time when the scenes rarely mixed. TL;DR: GSL Code S is the highest skill tournament in the world, but to Aligulac, this is something that needs to be -continuously- proven, and therefore runs the risk of -not- being proven due to the said players not traveling and focusing entirely on GSL. http://aligulac.com/periods/21/?sort=&race=ptzrs&nats=all is a big example of this. Some guy named Bubbles 3-0'd a bunch of foreigners (0 Korean players) and became the #1 player in the world, lol. I'm honestly not sure how you solve this, other than just mentioning it in the FAQ and admitting the early data is going to be kind of weird.
It's possible to fix this; some papers I read call it parameter smoothing, using backward filtering to smooth the past ratings.
See for example this paper: http://tennis-skill-rankings.googlecode.com/hg-history/c977c53a3af2913e780e39666fe1a272cc298319/links/glicko.pdf
I thought about this (that's the paper I based my method on actually), but I didn't quite like the idea of past lists changing forever. When FIDE (chess) ratings are published they are set in stone, and you know for example that Kasparov's 2851 record from 1999 or Carlsen's 2872 at the moment will never be anything other than what they are. It makes it awkward for enthusiasts to track records. Not that I've noticed a lot of people tracking Aligulac records, since the past lists are changing anyway due to the expanding database (for the time being), but still, I wanted to give people the option.
Thoughts?
Edit: Just so it's clear, we're talking about basing ratings on both past and future results, so that the historical ratings look more correct in hindsight. It can fix some of the early problems by (for example) adjusting Koreans upwards because we now know that they have an average higher skill level.
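To make the difference between "ratings from past results only" and "ratings from past and future results" concrete, here is a minimal sketch. It is not Aligulac's model and not the method from the linked paper. It assumes something much simpler: one player's skill drifts as a random walk, each period yields one noisy performance number, a forward Kalman filter gives the "published at the time" ratings, and a backward (Rauch-Tung-Striebel) pass gives the hindsight ratings. All names and numbers (`obs_var`, `drift_var`, the sample ratings) are made up for illustration.

```python
def kalman_filter(observations, obs_var, drift_var, prior_mean, prior_var):
    """Forward pass: each period's estimate uses results up to that period only."""
    means, variances = [], []
    mean, var = prior_mean, prior_var
    for y in observations:
        # Predict: skill may have drifted since the previous period.
        var = var + drift_var
        # Update with this period's noisy performance number.
        gain = var / (var + obs_var)
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        means.append(mean)
        variances.append(var)
    return means, variances


def rts_smoother(means, variances, drift_var):
    """Backward pass: revise each period's estimate using later periods as well."""
    s_means, s_vars = list(means), list(variances)
    for t in range(len(means) - 2, -1, -1):
        # Under a random walk, the prediction for period t+1 has the same
        # mean as the filtered estimate at t, with the drift variance added.
        pred_var = variances[t] + drift_var
        c = variances[t] / pred_var
        s_means[t] = means[t] + c * (s_means[t + 1] - means[t])
        s_vars[t] = variances[t] + c * c * (s_vars[t + 1] - pred_var)
    return s_means, s_vars


# Hypothetical player who looks average early and is later shown to be strong.
observed = [1000.0, 1000.0, 1400.0, 1400.0, 1400.0]
filt_m, filt_v = kalman_filter(observed, obs_var=100.0 ** 2, drift_var=50.0 ** 2,
                               prior_mean=1000.0, prior_var=200.0 ** 2)
smooth_m, smooth_v = rts_smoother(filt_m, filt_v, drift_var=50.0 ** 2)
for t, (f, s) in enumerate(zip(filt_m, smooth_m)):
    print(f"period {t}: filtered {f:.0f}, smoothed {s:.0f}")
```

In this toy setup the smoothed early ratings get pulled up by the later results, which is the kind of hindsight correction being discussed. The trade-off is the one raised above: every new period changes the smoothed values for all earlier periods.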
|
Maybe just run smoothing once. Really as long as you start from around October 2012 (MvP matches) and smooth backwards from there, most of the problems would probably be fixed.
|
Just letting you know that I encountered an error when playing around with the prediction stuff:
It seems like the round robin thing doesn't like large groups. (Was just an 8-man round robin w/ bo1 using the players shown)
|
Lol funny rounding error that is.
|
Ah yeah, you used a group that was so big it was forced to use Monte Carlo simulation. Thanks for the heads up.
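Some context on why group size matters: an 8-player round robin has 28 matches, so with bo1 there are 2^28 (about 268 million) possible sets of results. Enumerating them all is expensive, while sampling random outcomes is cheap. Below is a minimal sketch of the sampling approach, not Aligulac's actual code. The `win_prob` callback, the random tie-break, and the player names are assumptions made for illustration.

```python
import random
from itertools import combinations


def simulate_round_robin(players, win_prob, trials=10000, seed=0):
    """Estimate each player's chance of finishing with the most match wins.

    win_prob(a, b) is a hypothetical callback returning the probability
    that a beats b in a bo1. Ties on wins are broken at random, which is
    a simplification and not a real tournament tie-break rule.
    """
    rng = random.Random(seed)
    first_place = {p: 0 for p in players}
    for _ in range(trials):
        wins = {p: 0 for p in players}
        # Play every pairing once, sampling the winner.
        for a, b in combinations(players, 2):
            if rng.random() < win_prob(a, b):
                wins[a] += 1
            else:
                wins[b] += 1
        best = max(wins.values())
        leaders = [p for p in players if wins[p] == best]
        first_place[rng.choice(leaders)] += 1
    return {p: first_place[p] / trials for p in players}


# Eight hypothetical, evenly matched players.
group = [f"player{i}" for i in range(1, 9)]
chances = simulate_round_robin(group, lambda a, b: 0.5)
for name, chance in chances.items():
    print(f"{name}: {chance:.1%}")
```

Because the results are sampled, the estimates carry some noise, and percentages rounded for display will not always add up to exactly 100%.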
|
There is not a lot of difference between First and Last in the recent list.
Good job, always love your list! Glad you found a bug, it still seems a bit weird seeing Scarlett that high on the list, but w/e, math does not lie and it's only a model not the truth (whatever truth is).
|
what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
|
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
Making a qualified guess, I would say it is +-3 standard deviations (meaning that 95% of the time, the actual rating falls within the confidence interval, i.e. Rating +- 3 St. Deviations.)
Edit: Of course 3 std's = 99% (I am retarded)
|
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
It's actually just one estimated standard deviation, so it's a pretty weak confidence interval.
|
Making a qualified guess, I would say it is +-3 standard deviations (meaning that 95% of the time, the actual rating falls within the confidence interval, i.e. Rating +- 3 St. Deviations.)
3 stds would be ~99%.
On February 08 2013 08:34 TheBB wrote:
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
It's actually just one estimated standard deviation, so it's a pretty weak confidence interval.
k thx. do you plan on making the database open source at any point? because i'd like to make some calculations of my own from time to time and there would be no point at all in starting an own database at this point.
|
On February 08 2013 11:52 Greenei wrote: k thx. do you plan on making the database open source at any point? because i'd like to make some calculations of my own from time to time and there would be no point at all in starting an own database at this point.
You can download an SQL database dump at http://aligulac.com/db/.
|
On February 08 2013 15:17 Conti wrote: You can download an SQL database dump at http://aligulac.com/db/.
ah thx, that was a bit hidden :D
|
On February 08 2013 08:34 TheBB wrote:
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
It's actually just one estimated standard deviation, so it's a pretty weak confidence interval.
Yeah ok, 68%
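For reference, the percentages being thrown around here can be checked in a few lines. This sketch assumes the rating error is approximately normally distributed, which is the usual reading of a "+-" figure quoted as one standard deviation. The function name `coverage` is made up for illustration.

```python
import math


def coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"+-{k} standard deviations: {coverage(k):.1%}")
```

Under that assumption, +-1 standard deviation covers about 68%, +-2 about 95%, and +-3 about 99.7%. So a rating shown as 2000 +-30 would have roughly a 95% interval of 1940 to 2060.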
|
As a stats buff, gotta say it really is a nice website, like a cleaner and better version of TLPD (or sc2charts, whatever floated your boat). Both infuriated me for the longest time because they had the data and did nothing with it. You on the other hand understand that a db is as good as what you do with it. I also love how well your data is historized.
Downloading that Db dump from work is so tempting...
|
you could revisit the EG curse with those stats
|
So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based not only on player rating for both rosters but also on maps? Or is it simply not going to be accurate enough to warrant the gargantuan effort involved in creating and implementing the system? xD
|
On February 11 2013 06:40 MasterOfPuppets wrote: So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based not only on player rating for both rosters but also on maps? Or is it simply not going to be accurate enough to warrant the gargantuan effort involved in creating and implementing the system? xD
There's currently no map information saved in the database, only matches and results. So before any kind of predictive magic math can be applied, we'd need that information for >100.000 games. And we'd need a whole lot more volunteers for that.
Nudge. Nudge.
|
On February 11 2013 07:14 Conti wrote:
On February 11 2013 06:40 MasterOfPuppets wrote: So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based not only on player rating for both rosters but also on maps? Or is it simply not going to be accurate enough to warrant the gargantuan effort involved in creating and implementing the system? xD
There's currently no map information saved in the database, only matches and results. So before any kind of predictive magic math can be applied, we'd need that information for >100.000 games. And we'd need a whole lot more volunteers for that. Nudge. Nudge.
Plus we (TheBB) would have to rework how the entire database is configured because matches =/= games.
Plus it would be hard since a lot of LP articles contain no map info; even for big tournaments like MLG it is impossible to find map info for stuff like the open bracket. So yeah, way too much work. Whenever a new feature has to be "backtracked", as I like to call it, it literally takes our small team of 4-5 (TheBB, Conti and kiekaboe each do a shit ton, and Inflicted and I do some as well) weeks. Just look at http://aligulac.com/db/: "only" 64% is catalogued in the event hierarchy.
|
On February 08 2013 03:06 TheBB wrote:
I thought about this (that's the paper I based my method on actually), but I didn't quite like the idea of past lists changing forever. When FIDE (chess) ratings are published they are set in stone, and you know for example that Kasparov's 2851 record from 1999 or Carlsen's 2872 at the moment will never be anything other than what they are. It makes it awkward for enthusiasts to track records. Not that I've noticed a lot of people tracking Aligulac records, since the past lists are changing anyway due to the expanding database (for the time being), but still, I wanted to give people the option. Thoughts?
Maybe you could do some kind of backwards adjustment (or this "smoothing" you guys speak of) only on new players? Like, compute things normally for them for about 4 periods or something like that (or for a set amount of games played, I guess?), and then adjust their ratings retroactively, and then don't mess with their past ever again.
So imagine that I get a magical seed for Code S next season, and lose my first game of the group stages against Life (but only because I'm nervous). This doesn't give a lot of points to Life because I'm totally unknown at that point.
Then I proceed to stomp all competition and win Code S without dropping another map. Then your script readjusts my ratings and suddenly Life has a rating of like 3000 because he took a game off me.
And then pro players catch up to my silver strats and I don't win a game ever again.
|
..and sorting matches into events is about a gazillion times faster to do than adding maps to matches would be.
|
|
|
|