Aligulac.com is an ongoing statistical project and website, in development since December 2012. It offers a comprehensive database of games from the pro and semipro SC2 scene, as well as a unique rating system aimed at rating players and teams and predicting games.
The FAQ might be able to answer your questions. If not, I'll be keeping an eye on this thread so you can ask away here.
Also, before we start, I want to quote Heartland from one of my previous threads, who put this into words better than I ever could.
On February 03 2013 02:50 Heartland wrote: I think what's cool and great about this work is that it does what statistics are good for. They give you the ability to create data and then to look at it critically. But maybe people in this thread confuse statistics with the Truth with a capital p (sic). That's not the way you should read statistics, whether in the morning paper or on TL. Rather statistics can make us think about deeper connections that we haven't seen before, twist and turn around concepts and play with them through statistical models. They're not meant to say "Scarlett should be in Code S." Obviously there are flaws or issues with these stats, but it's common for statistics everywhere. What you can do with that is to add or change some modifier, let it meet other forms of reasoning or to extrapolate on what we take for granted.
So yeah, tl;dr: "lies, damned lies, and statistics" goes for all stats, but that's not the point of stats.
So, let's get on with the news.
Ratings bug
Aside from a lot of minor improvements, two significant things have happened. On Monday I got an e-mail from a certain Zomia, who had lots of constructive feedback on the site.
You see, I initially envisioned this site to be about ratings, but now it's more of a TLPD-like thing, which is to say it's a database of results, and most of the development lately has been related to that. But Zomia convinced me to go back and look at the rating system once more, and lo and behold, there was a bug.
Strictly speaking, the bug was not in the rating system itself, but in the code that I used to analyze it and pick parameters. Until now I have basically been basing these numbers on flawed information. No longer! As it turns out, this is for everyone's benefit, because it allowed me to tweak the system to be more conservative (which a lot of people have been wanting) while giving it the predictive power that I thought it had, and which it now actually has.
This graph plots all 114k+ games, categorizing them by predicted winrate for the presumed stronger player on the horizontal axis, and the actual winrate for the same player on the vertical axis. That's the thick black line. The blue line shows a weighted linear fit, while the red line is the ideal that we want. The system tends to slightly overestimate the strength of the stronger player up to a game winrate of 80%. From then on the overestimation can be significant. (Note that this is game winrate, not Bo3 or Bo5 winrate, which is higher.)
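For the curious, a calibration check like this can be sketched in a few lines, assuming you have, for each game, the predicted winrate of the presumed stronger player and the actual outcome. The binning and weighting choices here are mine, not necessarily what the site does:

```python
# Minimal calibration-curve sketch (assumed data layout, not Aligulac's code):
# predicted: predicted winrates for the presumed stronger player (>= 0.5)
# outcomes: 1 if that player actually won the game, else 0
import numpy as np

def calibration_curve(predicted, outcomes, n_bins=50):
    predicted = np.asarray(predicted, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(predicted, edges) - 1, 0, n_bins - 1)
    centers, actual, weights = [], [], []
    for b in range(n_bins):
        mask = idx == b
        if not mask.any():
            continue
        centers.append(0.5 * (edges[b] + edges[b + 1]))   # predicted winrate
        actual.append(outcomes[mask].mean())              # observed winrate
        weights.append(mask.sum())                        # games in the bin
    return np.array(centers), np.array(actual), np.array(weights)

# The "blue line" would then be a weighted linear fit:
# slope, intercept = np.polyfit(centers, actual, 1, w=weights)
```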
Hopefully this can convince everyone that it's working just fine now.
The upshot of this is that the rating list looks much more sensible, as does the Hall of Fame (whose workings I've also changed a bit), which is now topped by people like MC, MKP and Mvp.
So to those of you who have been critiquing me, I owe you an apology.
Add results! Now easy even for grandma
Now, to the results database. We have 114 thousand games. That's about twice the size of sc2charts.net and about a third more than TLPD has, which makes us the most complete publicly available pro/semipro SC2 database in the world (Blizzard's internal database is neither). Which is fine by me: I don't much care for sc2charts.net, and while TLPD is great, the international database is not kept up to date very well anymore.
Yet, I don't blame them. Creating this database with little more than just four people pulling the cart, I've learned that this shit's hard.
That is why we have opened result submission to everyone. You can do that here: http://aligulac.com/add/
Publicly submitted results will be subject to review by us before becoming visible. If you are interested, you can still PM me and get an admin account, which will allow you to:
submit results directly without all the review bother.
change, create and delete players, teams, matches...
review other people's submitted results.
sort matches into our events catalogue, which is still missing about 40% of the database.
mark the offline games as offline (some people wanted a separate rating for offline games, but that can't happen until this sorting is done).
bug me for feature requests (priority given to helpers).
just about anything else you can think of.
So, if your favourite player is rated too low, I suggest you go out and find some games that they won and add them for us.
(Kinda joking, but also kinda not.)
Thanks to the unknown guy out there who submitted IPTL this morning, and thus unknowingly became the first to use this system. (No idea who you are...)
Now, as usual, the ratings have been updated to the latest two-week period. This one includes results from the first two days of GSL Code A and the preliminaries, most of GSL Code S Ro32, Proleague, IPL qualifiers, IPTL, the Iron Squid finals, a handful of Go4SC2s and ZOTACs, and a bunch of other tournaments.
Zerg is still ahead in the top of the ladder, but this time it's Terran who is lagging behind, and not (as is usually the case) Protoss.
(I'm a little bit earlier this time so the list can be out in time for the GSL Ro16.)
Current top 10
Life 2256 (+4 after Iron Squid)
PartinG 2239 (+15 after Code S Ro32, MLG quals and FXO inv. playoffs)
Leenock 2235 (no change after FXO Inv. playoffs)
Bomber 2233 (+12 after Code S Ro32)
Rain 2211 (-9 after Proleague and Code A prelims)
DongRaeGu 2203 (+41 after IPL intl. regionals, Iron Squid, and Code S Ro32)
TaeJa 2194 (-1 after Code S Ro32 and a delayed game from IPL intl. regional)
RorO 2179 (+93 after Proleague, Code S Ro32 and MLG quals)
Scarlett 2137 (no games)
viOLet 2132 (no games)
PartinG and Leenock have switched places, as have DongRaeGu and TaeJa. RorO and viOLet make their appearances while HerO and Last drop out. RorO and DRG make the biggest jumps upwards.
Foreigner top 10
Scarlett 2137
Stephano 2015
VortiX 2014
Snute 1973
LucifroN 1947
Sen 1909
Kas 1886
Fraer 1869
TitaN 1868
Nerchio 1826
The foreigner list is still Zerg-heavy, though maybe not as much as before. Nobody in the top 10 gained any points, except Snute, who shot up 136 after playing a ton of games.
Top 10 teams
MVP 91.44%
StarTale 91.11%
SK Telecom T1 91.04%
Incredible Miracle 90.14%
AZUBU 89.08%
Team Liquid 88.75%
Prime 86.25%
FXOpen e-Sports Korea 85.96%
STX SouL 84.76%
Evil Geniuses 84.44%
This is the allkill rank, and it's very close at the top. StarTale has lost their big advantage, and SKT T1 keeps making a name for themselves as the strongest Kespa team.
The proleague rank looks like this:
SK Telecom T1 79.26%
MVP 78.80%
Incredible Miracle 74.91%
AZUBU 74.71%
StarTale 72.00%
STX SouL 67.12%
Team Liquid 66.52%
Prime 62.50%
Evil Geniuses 61.92%
FXOpen e-Sports Korea 60.87%
Still close between SKT T1 and MVP, who might be the strongest teams (on paper) at the moment. StarTale is comparatively weaker in this format (their roster is top-heavy with Life and Bomber), while STX is stronger.
(Note team ranks are based on player ratings and rosters, not actual team matches.)
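Since the team percentages are derived from player ratings rather than team results, one plausible way to compute an all-kill number is by simulation. A minimal Monte Carlo sketch, under assumptions of mine (winner stays on, fixed player order, Elo-like logistic win probabilities), not Aligulac's actual formula:

```python
# All-kill format: the winner of each game stays on until one roster is empty.
import math, random

def p_win(r_a, r_b, scale=400 / math.log(10)):
    # Elo-like logistic win probability (an assumed model, not the site's).
    return 1.0 / (1.0 + math.exp((r_b - r_a) / scale))

def allkill_winrate(roster_a, roster_b, trials=100_000):
    wins = 0
    for _ in range(trials):
        i = j = 0                       # index of the active player on each team
        while i < len(roster_a) and j < len(roster_b):
            if random.random() < p_win(roster_a[i], roster_b[j]):
                j += 1                  # A's player stays, B sends the next one
            else:
                i += 1                  # B's player stays, A sends the next one
        wins += (j == len(roster_b))    # B ran out of players: team A wins
    return wins / trials

# Example with made-up ratings:
# allkill_winrate([2256, 2233, 2100, 2000], [2211, 2137, 2050, 1980])
```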
Thanks
To my team of trusted helpers: Conti, kiekaboe, Grovbolle, PhoenixVoid, Inflicted_ (new) and scisyhp (new). This project would never have been possible without you.
This week particularly to Zomia whose feedback led me to reconsider a few things.
Also to my academic advisor whose timely conference trip abroad allowed me the free time to waste.
Hahaha, the person who submitted the IPTL results was ME! The system reset, and I forgot I had to log in again, which I only realized after seeing the message showing I was not logged in.
Funnily enough, I still didn't do it properly, forgot the "Source".
hmm the 80%+ winrates seem to be really poorly predicted. but i guess that makes sense when new people have 1000 points and play against somewhat equally skilled players who have 2000 points. what really matters is the 50-70% region anyways. thx for the update, now i may FINALLY make some money with this thing :D
On February 07 2013 09:55 Greenei wrote: hmm the 80%+ winrates seem to be really poorly predicted.
Also, this could be because of the underlying model (maybe using the normal distribution wasn't the best idea; logistic might be better after all... more on this in a later edition maybe), or because there really aren't that many games with an 80%+ skill gap.
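To see what swapping the model would change: both distributions map a rating gap to a win probability, and they disagree mostly in the tails, which is exactly the 80%+ region. A quick comparison, with the logistic scale chosen (my choice, for comparability) so both curves have the same slope at zero gap:

```python
import math
from statistics import NormalDist

def p_normal(d):
    return NormalDist().cdf(d)                 # probit link

def p_logistic(d, s=0.6267):
    return 1.0 / (1.0 + math.exp(-d / s))      # logit link, slope-matched at d=0

for d in (0.5, 1.0, 2.0, 3.0):
    print(f"gap {d:.1f}: normal {p_normal(d):.3f}, logistic {p_logistic(d):.3f}")
# The logistic tail is fatter: at gap 3 it still gives the underdog ~0.8%,
# where the normal model gives ~0.1%.
```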
btw: i still think a timeframe-independent model would be nice, because it would make rating updates quicker. i could understand it though if that's not possible. so here is my idea:
how about a 'predicted rating development'? as in 'if these games, which have been reviewed, were the only games played in this timeframe, this rating would be the result.'
Just want to say that you're awesome, TheBB. It's great someone picked up the pieces after TLPD just kinda shattered and made something even better and more complete.
I've been using it quite a bit and it's now my go-to SC2 database site if I need to look up something, thanks a lot to you and all your helpers!
Also, I'm very happy with the foreigner rankings, ROX.KIS fighting!
On February 07 2013 10:02 Greenei wrote: how about a 'predicted rating development'? as in 'if these games, which have been reviewed, were the only games played in this timeframe, this rating would be the result.'
so what do you think about this? i just think it would be really cool stuff.
It makes no sense for me to do a write-up on the Kespa pros since a shit ton of old matches were added, meaning that everyone's ranking changed, so it's hard to compare between period 76 and 77. Will do it for the next ranking. I feel like quoting my own Skype log with TheBB, because I said the whole "Aligulac is like Caligula" thing first :D
Poor Socke, he lost his position as the best foreign Protoss. He isn't even in the top 10 EU toss right now. I guess he needs some more consistency to get up there again with the new ranking.^^
On February 07 2013 21:28 StarGalaxy wrote: One of my favourite sc2 related sites.
On February 07 2013 11:05 LockeTazeline wrote: Awesome stuff, as always.
What are you going to do with HotS? I noticed you've already put in the MLG Showdowns.
There are already various HotS showmatches and tournaments in the database, and we (and anyone who wants to help out!) are in the process of categorizing them as HotS matches/tournaments.
As far as I know, there won't be a distinction made between WoL and HotS matches with regard to the ratings. There might be some chaos during the launch of HotS because of this, but there's really no smart way to prevent it, and the ratings will adjust quickly enough.
On February 07 2013 21:41 JustPassingBy wrote: Only 5 Terrans in the top 40 non-Koreans now... ;; Is there a reason why DeMusliM does not show up in the non-Korean rankings, btw?
No matches played for 4 periods (8 weeks), meaning he is currently too hard to rate. His rating isn't deleted, just hidden until we again have an idea of his skill level.
"If a player has not played any games for four periods (eight weeks, or about two months), he or she is removed from the list. The entry is still there, and will be taken into account if the player plays a new game some time in the future. It's only kept from the published list to keep it from filling up with uncertain and possibly irrelevant data.
The same happens if your rating uncertainty goes over 200.
I'm aware that a strict limit of only two months may seem harsh, but as the game is so volatile, if a player doesn't play fairly frequently, the ratings quickly become very uncertain and useless. Remember, your rating is still there."
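The quoted rule amounts to a simple visibility test; a sketch of it (hypothetical names, not the site's source):

```python
INACTIVITY_PERIODS = 4   # four two-week periods, i.e. about two months
MAX_UNCERTAINTY = 200

def is_listed(periods_since_last_game, rating_uncertainty):
    # The rating itself is kept either way; this only controls the list.
    return (periods_since_last_game < INACTIVITY_PERIODS
            and rating_uncertainty <= MAX_UNCERTAINTY)
```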
On February 07 2013 10:02 Greenei wrote: how about a 'predicted rating development'? as in 'if these games, which have been reviewed, were the only games played in this timeframe, this rating would be the result.'
so what do you think about this? i just think it would be really cool stuff.
The -only- issue that I can think of is that there is a 'flaw' to the way it handles GSL/PL. When the Korean scene stays largely separate from the foreign scene, Aligulac has a hard time distinguishing "the gap". For example, the first few GSL seasons had almost no players who had played against foreigners in recorded matches.
Therefore, players who beat exclusively Koreans who did not play (and stomp) foreigners are not having their bar set high enough. This taints a lot of the 2010/early 2011 data. For example, look at NesTea in the Hall of Fame. He's below -Naniwa- and like 4 other foreign players, despite winning 3 GSLs in a 9 month time period.
Luckily, this probably isn't a major issue going forward, since MLG and other tournaments are bringing major Koreans to stomp foreigners, and then those Koreans go home and get beaten by better Koreans, which keeps things balanced. However, there was a once upon a time where the scenes rarely mixed.
TL;DR: GSL Code S is the highest skill tournament in the world, but to Aligulac, this is something that needs to be -continuously- proven, and therefore runs the risk of -not- being proven due to the said players not traveling and focusing entirely on GSL.
That's precisely the reason why we have an international TLPD and a Korean TLPD (at least I think so; do let me know if I'm wrong here). In Brood War, Koreans practically never played against foreigners, and if the TLPD Elo rankings had been combined back then, foreigners would have ranked evenly with Koreans, which was even more laughable back then in Brood War than it is now in SC2. So having separate databases for TLPD made perfect sense.
Now, however, Koreans play foreigners all the time, so we can compare them. With the exception, as you say, of the early days of GSL. Another more prominent exception is Proleague, and it's even worse there, IMO. You can see the jumps in the rankings of most Kespa players (Classic, Last, Trap, etc.), which denotes the point where Kespa players started to play against non-Kespa Koreans. As you can see, the rankings have adjusted pretty quickly.
Unfortunately, there are still a lot of lesser known Kespa players who have almost solely played in Proleague so far, and their ranking is unrealistically low. Heck, some of the worst ranked players in the entire ranking are Kespa players. I don't think there's any real way to "fix" this, other than waiting it out until the Kespa ratings have adjusted to the overall ratings.
On February 07 2013 23:50 dcemuser wrote: I love Aligulac.
[...]
I'm honestly not sure how you solve this, other than just mentioning it in the FAQ and admitting the early data is going to be kind of weird.
It speaks for itself that it takes a lot of matches from a large group of players to determine their skill, and I agree that the early stages of the lists are somewhat dubious. The same goes for a lot of the Kespa pros: the more matches we get to see from them, and the more interaction there is in the entire scene, the better the lists and the predictions get. They all started out at a 1000 rating, which isn't a lot since we knew that most of them would still be pretty good, but it is also hard to just say "well, they are better than 1000 for sure" when we have no definitive way of being sure.
Valid points/concerns though. The system will get shaken a lot when HotS hits as well.
[...]
Ah, thanks for the info. That rule does make a lot of sense to me.
On February 07 2013 10:07 Day[9] wrote: Aligulac is very nearly Caligula backwards. At least they anagram.
It's not even that complicated. Just move the C to the front or the back.
It's a word I came up with as a kid :-P. I don't remember if Caligula was the inspiration. I guess he could've been.
I thought it definitely was :D When you released Aligulac for the first time, I even looked into Caligula to find some kind of connection to your website. It wasn't very insightful. Except that he had decent macro, I guess.
On February 07 2013 23:50 dcemuser wrote: I love Aligulac.
[...]
I thought about this (that's the paper I based my method on, actually), but I didn't quite like the idea of past lists changing forever. When FIDE (chess) ratings are published they are set in stone: you know, for example, that Kasparov's 2851 record from 1999, or Carlsen's 2872 at the moment, will never be anything other than what they are. Changing lists make it awkward for enthusiasts to track records. Not that I've noticed a lot of people tracking Aligulac records, since the past lists are changing anyway due to the expanding database (for the time being), but still, I wanted to give people the option.
Thoughts?
Edit: Just so it's clear, we're talking about basing ratings on both past and future results, so that the historical ratings look more correct in hindsight. It can fix some of the early problems by (for example) adjusting Koreans upwards because we now know that they have an average higher skill level.
Maybe just run smoothing once. Really, as long as you start from around October 2012 (MvP matches) and smooth backwards from there, most of the problems would probably be fixed.
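To make the "smoothing" idea concrete, here is a toy backward pass in which earlier ratings borrow strength from later ones. The blend weight is arbitrary; a proper smoother, like the one in the paper mentioned above, would weight by the uncertainties involved:

```python
# Toy fixed-interval smoother over a player's per-period rating series.
def smooth(filtered, blend=0.5):
    smoothed = list(filtered)
    for t in range(len(smoothed) - 2, -1, -1):   # walk backwards in time
        smoothed[t] = (1 - blend) * filtered[t] + blend * smoothed[t + 1]
    return smoothed

# smooth([1000, 1400, 1900, 2100]) lifts the early estimates of a player who
# turned out to be strong, e.g. the early-GSL Koreans discussed above.
```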
There is not a lot of difference between First and Last in the recent list.
Good job, always love your list! Glad you found a bug. It still seems a bit weird seeing Scarlett that high on the list, but w/e, math does not lie, and it's only a model, not the truth (whatever truth is).
what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
Making a qualified guess, I would say it is +-3 standard deviations (meaning that 95% of the time, the actual rating falls within the confidence interval, i.e. Rating +- 3 St. Deviations.)
On February 08 2013 07:22 Greenei wrote: what does the '+-30' in the matchup points and general rating actually mean? does it mean ~100% of the time the rating is in that area? or is that 1 or 2 or 3 standard deviations? or the maximum amount that the rating will shift?
It's actually just one estimated standard deviation, so it's a pretty weak confidence interval.
k thx. do you plan on making the database open source at any point? because i'd like to make some calculations of my own from time to time and there would be no point at all in starting an own database at this point.
Making a qualified guess, I would say it is +-3 standard deviations (meaning that 95% of the time, the actual rating falls within the confidence interval, i.e. Rating +- 3 St. Deviations.)
3 stds would be ~99%.
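For reference, the coverage figures being debated are easy to check, assuming the rating error is normally distributed:

```python
from statistics import NormalDist

sigma = 30                                   # the "+-30" from the question
for z in (1, 2, 3):
    coverage = 2 * NormalDist().cdf(z) - 1   # P(rating within +- z sigma)
    print(f"+-{z * sigma}: {coverage:.1%}")  # 68.3%, 95.4%, 99.7%
```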
As a stats buff, I gotta say it really is a nice website, like a cleaner and better version of TLPD (or sc2charts, whatever floated your boat). Both infuriated me for the longest time because they had the data and did nothing with it. You, on the other hand, understand that a DB is only as good as what you do with it. I also love how well your data is historized.
Downloading that Db dump from work is so tempting...
So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based on not only on player rating for both rosters but also maps? Or is it simply not going to be accurate enough to warrant the gargantuan effort involved in creating and implementing the system? xD
On February 11 2013 06:40 MasterOfPuppets wrote: So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based on not only on player rating for both rosters but also maps? Or is it simply not going to be accurate enough to warrant the gargantuan effort involved in creating and implementing the system? xD
There's currently no map information saved in the database, only matches and results. So before any kind of predictive magic math can be applied, we'd need that information for >100,000 games. And we'd need a whole lot more volunteers for that.
On February 11 2013 06:40 MasterOfPuppets wrote: So BB if you ever get particularly bored, could you make a prediction system for ProLeague/GSTL based on not only on player rating for both rosters but also maps? [...]
Nudge. Nudge.
Plus we (TheBB) had to rework how the entire database is configured because matches =/= games.
Plus it would be hard, since a lot of LP articles contain no map info; even for big tournaments like MLG it is impossible to find map info for stuff like the open bracket, etc. So yeah, way too much work. Whenever a new feature has to be "backtracked", as I like to call it, it literally takes our small team of 4-5 (TheBB, Conti and kiekaboe do a shit ton each, and I + Inflicted do some as well) weeks. Just look at http://aligulac.com/db/ : "only" 64% is catalogued in the event hierarchy.
On February 07 2013 23:50 dcemuser wrote: I love Aligulac.
[...]
Maybe you could do some kind of backwards adjustment (or this "smoothing" you guys speak of) only on new players? Like, compute things normally for them for about 4 periods or something like that (or for a set amount of games played, i guess?), then adjust their ratings retroactively, and then don't mess with their past ever again.
So imagine that I get a magical seed for Code S next season, and lose my first game of the group stages against Life (but only because i'm nervous). This doesn't give a lot of points to Life because I'm totally unknown at that point.
Then I proceed to stomp all competition and win Code S without dropping another map. Then your script readjusts my ratings and suddenly Life has a rating of like 3000 because he took a game off me.
And then pro players catch up to my silver strats and I don't win a game ever again.
[...] So imagine that I get a magical seed for Code S next season, and lose my first game of the group stages against Life (but only because i'm nervous). [...]
Do you have a Code S seed that you haven't told anyone about? :D
While it is definitely a rating system built around extremely high level players and seeing where they are ranked, I find it cool that I could even find some of my own results that I had completely forgotten about, from like early 2011. This is a great system for the high level professional players, and also really useful for a lowly NA semi-pro.
On February 12 2013 02:05 KingDime wrote: While it is definitely a rating system built around extremely high level players and seeing where they are ranked, I find it cool that I could even find some of my own results that I had completely forgotten about, from like early 2011. This is a great system for the high level professional players, and also really useful for a lowly NA semi-pro.
If you ever participated in any of the major tournaments, or, as in what I assume is your case, NASL Season 4 and ZOTAC Cup #20 plus what we scraped from TLPD, there is a good chance you are in the system.
Edit: You are Canadian, not American, so your results are from WCS. Edit edit: I merged the players Dime (us) and Dime (ca). Please respond to my PM so I can get the results you submitted reviewed.
On February 07 2013 09:55 Greenei wrote: hmm the 80%+ winrates seem to be really poorly predicted.
Also, this could be because of the underlying model (maybe using the normal distribution wasn't the best idea; logistic might be better after all... more on this in a later edition maybe), or because there really aren't that many games with an 80%+ skill gap.
Logistic is what gets used in chess Elo rankings, specifically because it was found to be a better fit to the data. The only part the two functions handle significantly differently is the tail (so the 80%+ percentage gap cases).
I mean, specifically looking at the chart you posted:
Ignore the data noise for a moment and look at the fitted curve. The fitted curve starts dead even with the ideal curve, but slowly diverges (and from 50%-83% the actual game data very closely follows the fitted curve). This is what I'd expect to see if the probability distribution being used was collapsing too tightly.
The problem with the normal distribution is that it's not built for transitive operations.
If Life has an 84% chance of beating Sheth, and Sheth has an 84% chance of beating Artosis, does this mean that Life has a 98% chance of beating Artosis?
Because this is what the normal distribution predicts when you combine two win percentages.
Two 31% chances combine into a 16% chance.
Two 23% chances combine into a 7% chance.
Two 16% chances combine into a 2% chance.
Two 7% chances combine into a 0.1% chance.
Which...doesn't feel right to me. If we take the above Life/Sheth/Artosis numbers, but make Sheth 50% less likely to win against Life, and Artosis 50% less likely to win against Sheth, suddenly Artosis is 95% less likely to win against Life? The odds against him really increase by a factor of 20, when the odds of the two intermediate matchups only get worse by a factor of 2?
Compare to the logistic curve. When you combine two win percentages to predict a more distant match...
Two 31% chances combine into a 17% chance.
Two 23% chances combine into a 9% chance.
Two 17% chances combine into a 4% chance.
Two 9% chances combine into a 1% chance.
Which is to say: if you take existing win percentages, and change them so that Sheth is 1/2 as likely to beat Life, and Artosis is 1/2 as likely to beat Sheth, then that makes Artosis 1/4 as likely to beat Life (instead of 1/20 as likely).
Just intuitively, this feels like a more reasonable way to combine percentages. If you told me with absolute certainty "Life beats Sheth X% of the time" and "Sheth beats Artosis Y% of the time", and then asked me "What do you expect Life's winrate against Artosis to be?", my guess would be much closer to the logistic distribution than the normal distribution.
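The arithmetic in that post is easy to verify: under a normal model the rating gaps add on the probit scale, while under a logistic model the odds multiply. A quick check (not site code):

```python
from statistics import NormalDist

def combine_normal(p_ab, p_bc):
    nd = NormalDist()
    return nd.cdf(nd.inv_cdf(p_ab) + nd.inv_cdf(p_bc))   # gaps add

def combine_logistic(p_ab, p_bc):
    odds = (p_ab / (1 - p_ab)) * (p_bc / (1 - p_bc))     # odds multiply
    return odds / (1 + odds)

print(combine_normal(0.84, 0.84))    # ~0.98, the Life/Sheth/Artosis example
print(combine_logistic(0.84, 0.84))  # ~0.97, a noticeably fatter tail
for p in (0.31, 0.23, 0.16, 0.07):
    print(p, round(combine_normal(p, p), 3), round(combine_logistic(p, p), 3))
```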
When I look at TheBB's posts, I see the gathering mass of a star being born.
This is insanely useful information, an excellent use of statistics, and I hope to (insert deity or otherworldly influence here) that you can get some academic use out of this project as well. (A paper, an essay, something.)
On February 08 2013 04:56 ACrow wrote: Good job, always love your list! Glad you found a bug, it still seems a bit weird seeing Scarlett that high on the list, but w/e, math does not lie and it's only a model not the truth (whatever truth is).
As a big Scarlett fan...yeah, I would not put her ahead of Hyun.
In general, the place the ratings feel a little wrong is when players play in an overly easy (or overly hard) group.
Actually, let me quickly note that MaSa is also not on this list (Aligulac has MaSa as Korean; Liquipedia has MaSa as Canadian; I believe Canadian is correct here).
But what I really want to point out here is.... Look at the 5th best foreign Terran. Bunny! Who is Bunny? Someone who participated in a Danish Starcraft tournament, and I guess got more wins than losses. Boom, 5th best foreigner Terran, apparently!
If you want to get highly rated by Aligulac, then play opponents weaker than yourself...which sums up a decent number of Scarlett's tournaments (WCS Canada, WCS North America, IPL qualifier for North America...).
Conversely, if you want a low rating on Aligulac, then play stronger opponents. (Most of the foreigners who participated in the MLG vs Proleague event had their ratings dip, usually by about 300 points right around October 2012. For example, look at the rating graph of qxc: http://aligulac.com/players/261/ ).
I don't really know if there's a good statistical way to fix this issue, however. If all the Danish people collectively decide to never play anyone outside of Denmark, some of them are going to end up with very high ratings, some of them are going to end up with very low ratings. Not a whole lot that can be done about it.
Very true; I believe a few pages back someone posted an explanation of how "islands" within a ranking system affect this. But please remember, this isn't "THE TRUTH". Bunny has very few matches; the problem always arises when someone new enters a scene with a lot of "new" players (non-ranked players starting at 1000). And yes, it can become a problem if a subcommunity only plays each other.
Logistic is what gets used in chess Elo rankings, specifically because it was found to be a better fit to the data. [...]
Yes, I know all this now. But thanks for putting it into words anyway.
I will try and see what happens. I expect some improvement, too.
On February 12 2013 07:23 felisconcolori wrote: This is insanely useful information, an excellent use of statistics, and I hope to (insert deity or otherworldly influence here) that you can get some academic use out of this project as well. (A paper, an essay, something.)
Thanks. I asked my advisor about whether or not the institute has a policy on publishing reports on topics that are outside the main area of research, and he said it was fine as long as I found some statistician to look at it. (None of the people I usually work with know anything about statistics, lol.)
I've now converted everything to using the logistic distribution. You should see somewhat more conservative predictions now.
Updated prediction analysis:
It didn't help as much in the 80%+ regime as I thought it would. I'm thinking the problem is more related to the sudden arrival of new player pools (Koreans in late 2010, Kespa in 2012), and I may have to do something about that, such as one or more of:
- parameter smoothing over certain time periods
- use time-dependent parameters
At the moment I'm a bit tired of the mathematical part and I'll go back to working on the website for a few weeks.
I have a question about the way the results are updated on the website, because I was amazed by how fast the results of Francophone Championship Season 2 were taken into account this afternoon. Did someone submit them?
And a second related question: when someone submits a result, do you add it manually?
On February 17 2013 07:58 Boucot wrote: I have a question about the way the results are updated on the website, because I was amazed by how fast the results of Francophone Championship Season 2 were taken into account this afternoon. Did someone submit them?
And a second related question: when someone submits a result, do you add it manually?
I think that was just kiekaboe being really fast. (No guarantees for the future.)
When someone non-admin submits games, we have to review them before they are properly added, yes.
Would it be possible for you to implement byes into the single elimination bracket prediction? Like if you write # or BYE or something. The ESF IPL6 seeding tourney going on at the moment, for example, has 7 players with one receiving a first round bye. Lots of other tourneys like the IEMs do it too, with 12-man playoffs.
In the case of 12-man, I assume you would format it like this:
Seed 1 BYE Seed 8 Seed 9
Seed 4 BYE Seed 5 Seed 12
Seed 2 BYE Seed 7 Seed 10
Seed 3 BYE Seed 6 Seed 11
Thus giving you:
1 vs. (8 vs. 9)
4 vs. (5 vs. 12)
2 vs. (7 vs. 10)
3 vs. (6 vs. 11)
Perhaps implementing BYE as a fake player with a rating of 0 in all match-ups?
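That suggestion is straightforward to make concrete: a BYE entrant that loses with certainty simply drops out of the math. A hypothetical sketch (the names and the logistic curve are my assumptions, not the site's code):

```python
import math

BYE = None   # placeholder entrant for an empty bracket slot

def p_win(r_a, r_b, scale=400 / math.log(10)):
    if r_b is BYE:
        return 1.0          # a real player always advances past a bye
    if r_a is BYE:
        return 0.0
    return 1.0 / (1.0 + math.exp((r_b - r_a) / scale))

def first_round_advance(slots):
    """slots: ratings (or BYE) in bracket order; pairs adjacent slots and
    returns each entrant's probability of winning their first match."""
    probs = []
    for a, b in zip(slots[0::2], slots[1::2]):
        probs.extend([p_win(a, b), p_win(b, a)])
    return probs

# 12-man example from the post: Seed 1 gets a bye, Seeds 8 and 9 play.
# first_round_advance([seed1, BYE, seed8, seed9, seed4, BYE, seed5, seed12, ...])
```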
On February 17 2013 08:27 MCXD wrote: Would it be possible for you to implement byes into the single elimination bracket prediction? Like if you write # or BYE or something. The ESF IPL6 seeding tourney going on at the moment, for example, has 7 players with one receiving a first round bye. Lots of other tourneys like the IEMs do it too, with 12-man playoffs.
Good point, that's another one of those features that the backend code already has but I haven't worked it into the frontend. It's on the list.
In the meantime you can use nemuke as a placeholder. He is currently the lowest ranked player (even counting the inactive ones), and so he will have the least impact. Poor guy. :D
There's no restriction on using the same player more than once, either.
As far as I can tell, putting in a dummy player and then going down to the 'update the results' section to award a 2-0 win in that first match works too. You can then just crop the dummy player off the little code block you post on the forums, for cleanliness.