I also have a FAQ. Constructive criticism is welcome, but I might already have answered your question there. Ideas on cool graphs and statistics to produce are definitely desired.
The current period ends on Wednesday, and I will publish a new list after that. Hopefully I can keep publishing new lists every two weeks and keep things relevant.
(I just had the domain lying around; the name doesn't mean anything.) (Yes, I'm not exactly a web designer. Hopefully it's still tolerable.)
Edit: Looks like my server plan is not up for this, lol.
On December 11 2012 02:35 Faust852 wrote: Why is Stephano 130th, behind players like Krass? He just beat 4 of the best Code S players :o
The latest list is updated as of November 28th. There is a new one every two weeks. If you're referring to the Korea vs. The World games, they aren't included yet.
Stephano fell from 1st to 130th in four weeks (a stretch that includes winning LSC2, streaking against the Korean team at IPL 5, and falling out in the group stages at WCS and DreamHack).
I think your system may need some tinkering. Though I agree that SC2 is indeed quite volatile, I think your system for determining current strength is overly reliant on the most recent results.
*I just saw that it does not include games from IPL 5
Does the rating for one period regard other periods at all? And regarding the "Stephano issue": would you really take Stephano's actual rating (#130) for your new predictions, or something different?
On December 11 2012 03:17 00Visor wrote: Does the rating for one period regard other periods at all? And regarding the "Stephano issue": would you really take Stephano's actual rating (#130) for your new predictions, or something different?
1. Yes, the method for computing the rating in one period depends on the rating from the preceding period. There is no forward dependence.
2. I agree that it looks funny that Stephano is so low, but yes, I would use it. I don't think there's a place for making subjective tweaks to individual players.
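To make point 1 concrete: the computation is a single forward pass over the periods. Here's a minimal sketch, with a stand-in Elo-style update in place of the actual Glicko-based machinery, so treat the numbers as illustrative only:

def expected_score(ra, rb):
    # Standard logistic expectation on the familiar 400-point scale.
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))

def update_period(ratings, games, initial=1000.0, k=32.0):
    # One rating period: only the ratings carried in from the previous
    # period matter; nothing computed later can reach back into this one.
    new = dict(ratings)
    for winner, loser in games:
        ra = new.get(winner, initial)
        rb = new.get(loser, initial)
        e = expected_score(ra, rb)
        new[winner] = ra + k * (1.0 - e)
        new[loser] = rb - k * (1.0 - e)
    return new

def compute_history(periods):
    # periods: list of game lists, oldest first.
    ratings, history = {}, []
    for games in periods:
        ratings = update_period(ratings, games)
        history.append(dict(ratings))  # snapshot; later periods never alter it
    return history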
On December 11 2012 03:17 00Visor wrote: Does the rating for one period regard other periods at all? And regarding the "Stephano issue": would you really take Stephano's actual rating (#130) for your new predictions, or something different?
1. Yes, the method for computing the rating in one period depends on the rating from the preceding period. There is no forward dependence.
2. I agree that it looks funny that Stephano is so low, but yes, I would use it. I don't think there's a place for making subjective tweaks to individual players.
You definitely shouldn't be making tweaks to fit individual players. After all, it could be our judgment that is lagging behind.
You're using the database from sc2charts/mystarcraft, correct?
Pretty cool man! Fact is, you just can't make a tournament weighting perfect; you would need to give IPL 5, one of the most stacked tournaments in history, more points than GSL Code S, which would be counterintuitive. So I'm fine with your way of doing it <3
On December 11 2012 03:17 00Visor wrote: Does the rating for one period regard other periods at all? And regarding the "Stephano issue": would you really take Stephano's actual rating (#130) for your new predictions, or something different?
1. Yes, the method for computing the rating in one period depends on the rating from the preceding period. There is no forward dependence.
2. I agree that it looks funny that Stephano is so low, but yes, I would use it. I don't think there's a place for making subjective tweaks to individual players.
You're using the database from sc2charts/mystarcraft, correct?
Yeah, it's the most complete and up-to-date I could find. Still not perfect, though; I had to clean it up a bit and remove some duplicate entries. I'm also a bit concerned because, for example, they still don't have the Proleague games in there.
Having spent time researching providers for my own personal sites, I have never run across this site before. After reading a few reviews, it looks like they have really poor customer service and poorly managed servers. I highly suggest finding a new provider; these guys don't seem very responsible, nor is their pricing all that competitive.
Is there a definition of "Best" and "Most specialised" somewhere? I can't seem to find any such description in the FAQ.
– Best overall
– Best vs. [Race]
– Most specialised vs. [Race]
Does it imply most wins? Wins/losses against higher-rated players? I'd appreciate clarification on this. Thanks! (Edit: Hmm, you mentioned Glicko. So I assume "best vs." should be "most improved after the new round of match calculation in terms of Glicko rating". Not sure what "most specialised" means though...)
Anyway, from the way you are segmenting it (i.e. period by period), the rating is very short-term focused; it's difficult to see a long-term rating for a player, as it depends on their participation in tourneys to be listed.
Is there any way to extend to a multi-period rating calculation? It would be great for those who want to manipulate or look up a specific time frame (e.g. the patch xxx time frame Elo/WR).
Incredible site!!! Now I can quantify for my friends what I mean about Protoss being the most underpowered race throughout SC2 history (see May 5, 2011 to May 31, 2012 in the UP [underpowered] section).
On December 11 2012 11:59 rrwrwx wrote: Seems weird that Leenock now has a higher rating than Mvp ever did.
Mvp plays mostly GSL
In a 1-year period, Leenock got 2 MLG wins, 1 IPL win, and 1 GSL and 1 MLG second place. In a 1-year period, Mvp got 3 GSL wins, a GSL second place, and won an MLG, BlizzCon, and WCG.
Even if Mvp plays mostly GSL, he still achieved more results than Leenock in an equivalent amount of time.
On December 11 2012 11:59 rrwrwx wrote: Seems weird that Leenock now has a higher rating than Mvp ever did.
Mvp plays mostly GSL
In a 1-year period, Leenock got 2 MLG wins, 1 IPL win, and 1 GSL and 1 MLG second place. In a 1-year period, Mvp got 3 GSL wins, a GSL second place, and won an MLG, BlizzCon, and WCG.
Even if Mvp plays mostly GSL, he still achieved more results than Leenock in an equivalent amount of time.
These ratings are mostly over short two-week periods, and Leenock's had an intense Leenockvember for sure.
On December 11 2012 03:17 00Visor wrote: Does the rating for one period regard other periods at all? And regarding the "Stephano issue": would you really take Stephano's actual rating (#130) for your new predictions, or something different?
1. Yes, the method for computing the rating in one period depends on the rating from the preceding period. There is no forward dependence.
2. I agree that it looks funny that Stephano is so low, but yes, I would use it. I don't think there's a place for making subjective tweaks to individual players.
You're using the database from sc2charts/mystarcraft, correct?
Yeah, it's the most complete and up-to-date I could find. Still not perfect, though; I had to clean it up a bit and remove some duplicate entries. I'm also a bit concerned because, for example, they still don't have the Proleague games in there.
It's really a shame there isn't a better source for you; Proleague is essential. Good luck finding additional sources. Maybe you can find a way to work with TLPD?
I'd love to see you get graphs like financial websites have for stocks, all slick with the zooms and compares and whatnot. I could procrastinate like crazy on a site with that.
Just a thought: since the ratings fluctuate quite a lot, is it possible to have an average rating for each player? For example, the average rating for the past 3 or 6 months?
The advantage is that you put all the visualization on the client side, meaning you don't have to render PNG charts or send large HTML/XML data tables over the wire (JSON is much faster, especially when it's compressed).
On December 11 2012 11:49 playa wrote: I don't get Ranged being the most specialized against T, when, according to TLPD, P vs T is his worst mu.
TLPD is really behind with their data, I think. The other possibility is that he's on a real hot streak in vT, or perhaps that he's doing OK in vT and terrible in the other matchups?
At first glance I thought this was awesome, but then I noticed the ridiculously short timeframe of just a few weeks. That timeframe doesn't really mean anything; the ups and downs only show the daily form of the player, or even some experimenting with new builds.
If you used longer timeframes it would improve the actual precision of the data, since this much fluctuation is not really useful for predicting a trend.
No, I don't. Korean tournaments and players receive no special treatment. The GSL is difficult because good players play there; the players aren't good because they play in the GSL.
I can't agree with this; it's a flaw in the system. I'm not sure how to fix it, because it would be complicated, but not taking the opponent's skill into consideration is a flaw. Giving the same weight to a win over, say, an Mvp as to a win over a non- or semi-pro in an MLG open bracket is wrong.
Your reasoning of "The GSL is difficult because good players play there; the players aren't good because they play in the GSL" is also flawed, because the players in the GSL are good precisely because they play in the GSL: it's damn hard to qualify for it, and to remain in it without dropping to Code B.
No, I don't. Korean tournaments and players receive no special treatment. The GSL is difficult because good players play there; the players aren't good because they play in the GSL.
I can't agree with this; it's a flaw in the system. I'm not sure how to fix it, because it would be complicated, but not taking the opponent's skill into consideration is a flaw. Giving the same weight to a win over, say, an Mvp as to a win over a non- or semi-pro in an MLG open bracket is wrong.
Your reasoning of "The GSL is difficult because good players play there; the players aren't good because they play in the GSL" is also flawed, because the players in the GSL are good precisely because they play in the GSL: it's damn hard to qualify for it, and to remain in it without dropping to Code B.
No no, I think you've got it wrong; what he means is that if you beat Mvp in a GSL final, it's the same as beating Mvp in some random online tournament.
On December 11 2012 11:53 lazyitachi wrote: Is there a definition of the Best? Most specialised somewhere?
Best: highest rating. Most specialised: largest number of standard deviations between general rating and matchup rating.
The tendency is that new players with few games have wild matchup ratings, while the more established players have pretty even ones. That's why they come off as "most specialised." I tried to fix this with the standard deviation weighting, but it didn't work satisfactorily.
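In sketch form, the criterion is roughly this (the actual weighting I tried is a bit different, as mentioned, and combining the two deviations in quadrature is just one reasonable choice):

import math

def specialisation(overall, overall_rd, matchup, matchup_rd):
    # How many standard deviations the matchup rating sits away from
    # the general rating, combining the two RDs in quadrature.
    combined_sd = math.hypot(overall_rd, matchup_rd)
    return abs(matchup - overall) / combined_sd

# E.g. 1400 overall (RD 80) but 1550 vs. Zerg (RD 120) gives about 1.04,
# i.e. roughly one standard deviation of specialisation.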
On December 11 2012 12:26 Shellshock1122 wrote: We missed your stats in Code A yesterday Will they be making a return tonight?
Yeah, those have fallen by the wayside a bit. I'm sure I'll be there for the evening session.
On December 11 2012 13:36 Blisse wrote: Why Python 2.5.2, can I ask?
Hahaha... I had this virtual server and domain lying around, and I was halfway through the development before I realised it's on Debian Lenny, so I'm condemned to using somewhat outdated software. Thankfully all the libraries I needed were still available. Others have suggested I use a different host, so I don't think I will bother trying to update this one.
No, I don't. Korean tournaments and players receive no special treatment. The GSL is difficult because good players play there; the players aren't good because they play in the GSL.
I can't agree with this; it's a flaw in the system. I'm not sure how to fix it, because it would be complicated, but not taking the opponent's skill into consideration is a flaw. Giving the same weight to a win over, say, an Mvp as to a win over a non- or semi-pro in an MLG open bracket is wrong.
Your reasoning of "The GSL is difficult because good players play there; the players aren't good because they play in the GSL" is also flawed, because the players in the GSL are good precisely because they play in the GSL: it's damn hard to qualify for it, and to remain in it without dropping to Code B.
Like opterown said, I do take the opponent's skill into consideration. I just give the same consideration to wins over Mvp in the GSL as I do to wins over Mvp at your grandma's dinner party.
I find your rating system underwhelming, if not inconsistent with its purpose:
I'll assume that you created this site and rating in order to have a basis for your prediction tool (which is great, by the way). The whole sense of predicting player performance is based on the assumption that skill is robust, namely that your win from yesterday means (somehow) that you're more likely to win tomorrow. On the other hand, if you chose a volatile rating system (like you did), it means, on the contrary, that skill is volatile. If you think that the game is too volatile, you just cannot make predictions.
I feel like it's better to have a maybe less accurate but more robust rating, so that the predictions would not be overly different from one period to the next because of a bad week.
EDIT: Maybe I was too negative. I really like what you did, and I will probably check both your win probabilities and your site for ratings (since TLPD is not at all up to date). I just don't think you chose a good rating system.
I saw a featured streamer I'm not very familiar with... so I decided to check how the player ranked here. It was very cool to see their vs. P/T/Z ranks etc.; however, it would be really cool to see an additional
Rating vT XXXX (#yyy, #YY [race])
where you would get an additional rating for how high a player ranks within their own race. Maybe it would be simpler/cleaner to just allow a sort by each rating.
It's a beautiful homepage. It looks neat and professional, without the ever-present, dull character/artwork testosterone stuff. The data and stats are really awesome! Good work!! I have it bookmarked.
I love your work on TL threads when you put up some stats for betting on a player, and this website must have been very difficult to build, but how can a guy that nobody knows, like MilkEA (#101), be placed higher than players like IdrA, Stephano, NaNiwa, TheStC...? I get the period reason, but even with it, that makes no sense.
I hope you can find some adjustments for your rankings, so we could finally have an undeniable ranking, which is what everybody has wanted for a very long time.
This is a great site, nice work. I do have a suggestion, and that is being able to compare players side by side. For instance, if there's a match between PartinG and Life, it would be great to see their graphs and results side by side instead of having to open two tabs.
Really nice work :O Added to favorites! I just don't understand how come Mvp doesn't even make it to the top 10, while someone like INnoVation, who I've never heard of, is ranked no. 4 O_o
Can you add an option to filter the list to show only non-Koreans? It would be useful for foreign events etc. Can you see the game results that this is based on? Is the source code available?
Amazing work. I thought about doing something similar a few days ago and realized it's too much work for me. Some suggestions:
1) Like already mentioned: list the opponents in a period.
2) While you're at it: add the expected winrate against each opponent.
3) A performance rating for each period would be nice.
4) I would love to see a separate online/offline rating besides a combined (and perhaps weighted) overall rating.
I don't really like that you optimize your model for predictive power. In order to predict the outcome of a game, I would add more factors (online/offline; tournament structure; how far we are in the tournament; travel; ...). Rating (which should measure skill) is just one factor (albeit the most important one) for predicting the outcome of a game.
Neat tool, but I'm not sure I can take any rating system seriously that has Stephano so low... He's obviously a top 40 player. Is there something I'm missing?
Love the site. Would like to see the highest rating a player has achieved on their individual pages, then you can compare their current performance to their peak easily. Also would be curious about having the all-time highest rating listed on the front page, so we can see how the current leaders compare to the all-time best.
When it says 2 week rating periods, what exactly does that entail? I imagine it can't be just games played within those 2 weeks, so what sort of emphasis do you have on older games vs newer ones?
On December 13 2012 06:23 Apolo wrote: Really nice work :O Added to favorites! I just don't understand how come Mvp doesn't even make it to the top 10, while someone like INnoVation, who I've never heard of, is ranked no. 4 O_o
you are talking bogus, bro.
Also, read the FAQ; the rating system is really volatile, and Mvp didn't do so well last season.
Way too volatile imo. Stephano at 130? Yeah, sure...
Also, Stephano is indeed a little bit too low, even though I don't particularly like him. You seem to have altered the Glicko rating a lot, as at sc2charts.net he's #13.
On December 13 2012 08:03 BluePanther wrote: Neat tool, but I'm not sure I can take any rating system seriously that has Stephano so low... He's obviously a top 40 player. Is there something I'm missing?
Yes, you're placing your own opinions higher than the actual data. Stephano has lost 60% of his games in the last month. He's on a downtrend, and that's what the data is showing. The numbers don't lie.
As someone with a mathematics degree, I love the site, awesome job. The only thing I'd like to see is Ratings Peak on the player page. Otherwise, very informative, and I like the period size, good call on that.
Whoa, just realised there have been more posts here.
Thanks for all the feedback and nice words. What I think I will prioritise next is:
– Overview of games (so that it's transparent how a player has gotten the rating he or she has).
– Prediction tool for best-of-N matches (nothing fancier than that to begin with).
I hear what you're all saying about the volatility, and I guess maybe I went a bit overboard. This system is optimised for predictability, and one that is less volatile will tend to suffer more upsets than it should, but it seems what people really want is a system that is a little more in line with how we judge performances over time. Which is fair. So here is what I could do:
(a) Lower the volatility in the future. This will stabilise things a bit, but will negatively impact predictive power.
(b) Lower the volatility and recompute all ratings from the start. It's difficult to say what the latest rating list will look like in this case.
(c) Publish two concurrent ratings: one volatile for predictions and one less volatile for rankings. I'm afraid this might confuse people.
(d) ...any other ideas?
On December 13 2012 23:02 TheBB wrote: Whoa, just realised there have been more posts here.
Thanks for all the feedback and nice words. What I think I will prioritise next is:
– Overview of games (so that it's transparent how a player has gotten the rating he or she has).
– Prediction tool for best-of-N matches (nothing fancier than that to begin with).
I hear what you're all saying about the volatility, and I guess maybe I went a bit overboard. This system is optimised for predictability, and one that is less volatile will tend to suffer more upsets than it should, but it seems what people really want is a system that is a little more in line with how we judge performances over time. Which is fair. So here is what I could do:
(a) Lower the volatility in the future. This will stabilise things a bit, but will negatively impact predictive power.
(b) Lower the volatility and recompute all ratings from the start. It's difficult to say what the latest rating list will look like in this case.
(c) Publish two concurrent ratings: one volatile for predictions and one less volatile for rankings. I'm afraid this might confuse people.
(d) ...any other ideas?
Those are great ways to streamline, I think.
Perhaps you could add some options to filter the data. Your periods are rather small in terms of how you're actually presenting the data you have; perhaps you could make it adjustable, or give a few options.
Say you'd have:
A: The current short periods
B: A monthly tracker
C: The last 6 months
Or any variation thereof I suppose. Again though, perhaps that clutters things up too much/makes it a lot more difficult to design the website.
To the guy who said to separate Korean tournaments and foreign ones: I actually like that TheBB isn't doing that thus far. Considering many of the high-end tournaments are almost exclusively Korean nowadays, the TLPD Korean/International distinction almost seems outdated in terms of getting a good sense of where people are at in certain matchups.
On December 13 2012 23:02 TheBB wrote: Whoa, just realised there have been more posts here.
Thanks for all the feedback and nice words. What I think I will prioritise next is:
– Overview of games (so that it's transparent how a player has gotten the rating he or she has).
– Prediction tool for best-of-N matches (nothing fancier than that to begin with).
I hear what you're all saying about the volatility, and I guess maybe I went a bit overboard. This system is optimised for predictability, and one that is less volatile will tend to suffer more upsets than it should, but it seems what people really want is a system that is a little more in line with how we judge performances over time. Which is fair. So here is what I could do:
(a) Lower the volatility in the future. This will stabilise things a bit, but will negatively impact predictive power.
(b) Lower the volatility and recompute all ratings from the start. It's difficult to say what the latest rating list will look like in this case.
(c) Publish two concurrent ratings: one volatile for predictions and one less volatile for rankings. I'm afraid this might confuse people.
(d) ...any other ideas?
I would be really curious about (c). Not as a permanent solution, but just to see how the rankings differ. My guess is that, no matter the ranking, there will be people complaining because some long-time player is not high enough, or some up-and-coming player is ranked too low, etc., and I kinda doubt a different ranking will change that. Still, I'd be curious to see the difference.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
I would have expected some form of justification for this claim, but having checked out the site I can see why you didn't offer any.
Simply put, they don't have any serious justification for the arbitrary values they assign in their rankings process. The process also contains obvious flaws, such as treating all placements of a given sort equally rather than assessing runs by the actual opponents beaten along the way.
Why should anyone trust them over a method that aims for and tests itself against accurate predictions?
I added some results lists, so that people can see where the numbers come from, more or less.
If you now open a player page you will see, immediately below the graph, a list of games that have been added and which are scheduled for inclusion in the next period.
In the table for historical data, you will also see a "details" link. If you click it, you can see some information about the rating calculation for that player for that period. It shows which games were included, the average rating of the opposition, and the expected score for the player given the opposition. You should see that the rating adjustments correlate with how much the player over- or underperformed. Note that the correlation isn't necessarily exact, since there are a few other factors that come into play (see the FAQ for more details on those.)
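If you want to reproduce those two summary numbers yourself, they are conceptually just this, with win_prob standing in for the actual rating model:

def period_summary(player_rating, opponent_ratings, win_prob):
    # Average rating of the opposition, plus the expected score in games
    # (e.g. 6.4 expected wins out of 10 games played).
    avg_opposition = sum(opponent_ratings) / float(len(opponent_ratings))
    expected = sum(win_prob(player_rating, r) for r in opponent_ratings)
    return avg_opposition, expected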
I also want to thank Conti, Grovbolle and KristofferAG for aiding me with populating the database with results.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
sc2proranks.com appears to be using a totally different idea. Of course, while you are free to enjoy whatever system you desire, I don't particularly care for their method. Aligulac is more "results-oriented" (a player is rated above another if they can be expected to win against them right now) versus theirs, which I would call "impact-oriented" (a player is rated above another if they were recently more in the spotlight), and like frogrubdown said, it's not at all clear how they come up with the values they use.
Oh, and aligulac.com is totally better looking. </shameless plug>
This looks like a superb site! I've always liked the statistics you post in LR threads (they tend to be quite accurate as well), so with the new, more up-to-date database it'll be even better now. Good job!!
One thing would be nice: being able to sort by the three vs.-race winrates, so you could quickly view the best vs. Z players, for example.
If you now open a player page you will see, immediately below the graph, a list of games that have been added and which are scheduled for inclusion in the next period.
In the table for historical data, you will also see a "details" link. If you click it, you can see some information about the rating calculation for that player for that period. It shows which games were included, the average rating of the opposition, and the expected score for the player given the opposition. You should see that the rating adjustments correlate with how much the player over- or underperformed. Note that the correlation isn't necessarily exact, since there are a few other factors that come into play (see the FAQ for more details on those.)
I also want to thank Conti, Grovbolle and KristofferAG for aiding me with populating the database with results.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
sc2proranks.com appears to be using a totally different idea. Of course, while you are free to enjoy whatever system you desire, I don't particularly care for their method. Aligulac is more "results-oriented" (a player is rated above another if they can be expected to win against them right now) versus theirs, which I would call "impact-oriented" (a player is rated above another if they were recently more in the spotlight), and like frogrubdown said, it's not at all clear how they come up with the values they use.
Oh, and aligulac.com is totally better looking. </shameless plug>
So true!
I'd love to help out in some fashion, but I'm not the most mathematically inclined of folks. Hope the feedback/ideas were helpful, man; I really think this project has some potential, and the site looks damn sexy.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
I would have expected some form of justification for this claim, but having checked out the site I can see why you didn't offer any.
Simply put, they don't have any serious justification for the arbitrary values they assign in their rankings process. The process also contains obvious flaws, such as treating all placements of a given sort equally rather than assessing runs by the actual opponents beaten along the way.
Why should anyone trust them over a method that aims for and tests itself against accurate predictions?
Arbitrary values? Can you say what you are talking about? Did you actually read the FAQ? Tournaments are weighted based on prize pool, which is related to tournament relevance. Many people seem to prefer ratings based on which opponents a player has beaten; I simply can't understand why this is so popular, because it makes no sense if you think about it for more than 5 minutes. You can just look at the "unofficial world champion" thread, which takes this concept to the limit. The so-called unofficial world champion is a joke. You talked about predictions. If a rating system is designed mainly to make predictions, I can assure you it will fail miserably.
Finally, let's take a look at the top 5 players of each site:
Aligulac: Sniper, PartinG, Leenock, Life, Effort
Sc2proranks: Hero, PartinG, Rain, Taeja, Leenock
Hero (top 8 in GSL, top 8 in BWC, champion of the last DreamHack, and champion of the last NASL) is not even in the top 5 players of Aligulac. And Effort appears in the top 5. Effort! Really?
Aligulac is a very good site, but its concept (as with other rating systems) seems to result in a weird list of top players.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
I would have expected some form of justification for this claim, but having checked out the site I can see why you didn't offer any.
Simply put, they don't have any serious justification for the arbitrary values they assign in their rankings process. The process also contains obvious flaws, such as treating all placements of a given sort equally rather than assessing runs by the actual opponents beaten along the way.
Why should anyone trust them over a method that aims for and tests itself against accurate predictions?
Arbitrary values? Can you say what you are talking about? Did you actually read the FAQ? Tournaments are weighted based on prize pool, which is related to tournament relevance.
Yes, I read the FAQ, which I linked to precisely to illustrate how arbitrary their values are. For instance:
How the points of one tournament are distributed to players? Is it based on the prize gained of each player?
Actually no. Only the overall prize pool is used to determine the relevance of the tournament. The player position in one tournament will give points in this way: 1st: 100%; 2nd: 70%; 3rd and 4th: 45%, 5th to 8th: 25%; 9th to 16th: 10%
Want to guess where these numbers come from? If your answer wasn't 'Their ass', then you guessed wrongly.
The mere fact that the values have a vague tie to something correlated with player ability (positions in prestigious tournaments) does nothing to make the values assigned to said positions non-arbitrary. There is no justification for any numerical value used anywhere in the system, not even the cut-off of giving no credit whatsoever below the round of 16.
Many people seem to prefer ratings based on which opponents a player has beaten; I simply can't understand why this is so popular, because it makes no sense if you think about it for more than 5 minutes.
People like ratings based on who you've beaten because being good at Starcraft is having a disposition or skill that one expects, normally and in the long run, to result in wins. Beating the best player in the world to get to the round of 32 is extremely strong evidence that you have such a skill. But if you get eliminated in the next round, you will get zero credit for this on your favored system. Beating some of the worst players in the world to get to the round of 16, on the other hand, does not provide nearly as much evidence of skill.
Every run to a given position in a tournament involves a different difficulty level depending on whom you faced. There is no reason not to tie a player ranking directly to the actual players they have faced.
You can just look at the "unofficial world champion" thread, which takes this concept to the limit. The so-called unofficial world champion is a joke.
What!?!? That isn't this concept taken to the extreme, because it considers an absurdly small subset of the total gameplay evidence for who is the best player. The concept taken to the extreme is something like how the rankings on Aligulac work, because they take account of all the evidence from wins and losses, not just those against a single player selected for no particularly good reason.
You talked about predictions. If a rating system is designed mainly to make predictions, I can assure you it will fail miserably.
Once again, you provide no reasons for this claim, because you have none. Successful predictions are how you test statistical models like these. That's how it works in baseball stats, in poll-based election prediction models (which were monumentally successful this past cycle), everywhere. If your nonsense claim were correct, it would undermine a lot more than this SC2 ranking system. And yet somehow, with no specific reason given, you think you can rule out the success of prediction-based models a priori.
Finally, let's take a look at the top 5 players of each site:
Aligulac: Sniper, PartinG, Leenock, Life, Effort
Sc2proranks: Hero, PartinG, Rain, Taeja, Leenock
Hero (top 8 in GSL, top 8 in BWC, champion of the last DreamHack, and champion of the last NASL) is not even in the top 5 players of Aligulac. And Effort appears in the top 5. Effort! Really?
Aligulac is a very good site, but its concept (as with other rating systems) seems to result in a weird list of top players.
You once again demonstrate an amazingly powerful a priori insight that your fellows lack. Tell me, if you can be so confident about which players are actually the best currently without consulting the best available evidence, then why do you need any model to make predictions?
This is not to say that one should have a credence of 1 that the Aligulac rankings are correct, especially given SC2's volatility. But the idea of evaluating solely against the perceived best players is absurd. People get perceived as the best (among other noisy factors) by winning high-profile events, regardless of whether their runs to get there provide the best evidence of their goodness. We should expect our perceptions of who is best to be flawed to the extent that they rely directly on such unreliable evidence.
Hero is rated high because he goes to high-paying but relatively easy tournaments; he does this because he has the option. See how well he fares in the Korean TLPD (hint: not that well).
On December 15 2012 23:59 MarcoBrei wrote: Hero (top 8 in GSL, top 8 in BWC, champion of the last DreamHack, and champion of the last NASL) is not even in the top 5 players of Aligulac. And Effort appears in the top 5. Effort! Really?
Aligulac is a very good site, but its concept (as with other rating systems) seems to result in a weird list of top players.
It's not your achievements that are relevant, but WHO you win against and how often you lose. Hero plays tons of tournaments with some disappointing results (going 1-3 at MLG and so on), and the foreign tournaments don't have the toughest competition. HerO beat 1 Korean at BWC and 2 Koreans (ForGG, not really that notable at the moment) at DreamHack. It's pretty logical that he is not in the top 5.
Effort's results seem to be greatly influenced by the MLG vs. Proleague tournament.
Yeah, like I said, that system rewards name power and name recognition. You get recognised if you win a lot of titles, so a ranking of titles (which is what it is, essentially) will obviously correlate well with the public view of who the top players are. And that's fine, really; just keep in mind what you're looking at. Sc2ProRanks is a ranking of who actually won tournaments lately, and Aligulac is an attempt to rate who would win in a hypothetical game between players X and Y right now. If you tried to use the former to predict game outcomes, I daresay it would fail quite spectacularly.
Because of this they aren't really comparable, in my opinion. It was never my intention to design a system that confirmed what we already know. I don't find that very interesting, and I don't find their system very interesting, either.
I claim (and I haven't even seen his games) that Effort is probably better than many are giving him credit for.
I have not completely dived into the ranking system yet, but one thing that sticks out and bothers me is that you have decided that tournaments are to be weighted for relevance based on prizing. This is a hugely subjective variable that oftentimes can have very little to do with the actual skill levels that attend a specific event. A better idea would have been to set up a standard, based mostly on numerical values, to determine the 'tier' of each event. Once a baseline structure for this has been established, each tournament can be placed in a tier based on the participants that attend.
Also, and I have touched on this in the past when talking about rankings, any ranking system that uses a raw value to determine overall points is extremely flawed for eSports. Meaning that if a player attends 10 events and earns points at each, regardless of how they finish, they will most likely be ranked higher than a player that attends only 5 events and yet has much better relative finishes.
As soon as I can get the time needed, I will release full details of my Global Points System, which works to alleviate both of the issues that I have addressed.
Other than those 2 problems, the site looks good, and any rankings at this point are better than none (in most cases).
Good stuff, and I know it must have taken you a good amount of time.
On December 15 2012 00:54 MarcoBrei wrote: nice work, but sc2proranks.com is better.
I would have expected some form of justification for this claim, but having checked out the site I can see why you didn't offer any.
Simply put, they don't have any serious justification for the arbitrary values they assign in their rankings process. The process also contains obvious flaws, such as treating all placements of a given sort equally rather than assessing runs by the actual opponents beaten along the way.
Why should anyone trust them over a method that aims for and tests itself against accurate predictions?
Arbitrary values? Can you say what you are talking about? Did you actually read the FAQ? Tournaments are weighted based on prize pool, which is related to tournament relevance. Many people seem to prefer ratings based on which opponents a player has beaten; I simply can't understand why this is so popular, because it makes no sense if you think about it for more than 5 minutes. You can just look at the "unofficial world champion" thread, which takes this concept to the limit. The so-called unofficial world champion is a joke. You talked about predictions. If a rating system is designed mainly to make predictions, I can assure you it will fail miserably.
Finally, let's take a look at the top 5 players of each site:
Aligulac: Sniper, PartinG, Leenock, Life, Effort
Sc2proranks: Hero, PartinG, Rain, Taeja, Leenock
Hero (top 8 in GSL, top 8 in BWC, champion of the last DreamHack, and champion of the last NASL) is not even in the top 5 players of Aligulac. And Effort appears in the top 5. Effort! Really?
Aligulac is a very good site, but its concept (as with other rating systems) seems to result in a weird list of top players.
Wow, you really fall short on understanding statistics.
Prediction is the best way to determine one's true skills. The best player today is the player who has the highest probability of winning tomorrow. That's it.
Tournament prize pools are still arbitrary and can be very misleading.
You mention that you are borrowing heavily from Glicko; I'm assuming from Glicko-2?
Why not show the player's volatility rating/range in addition to their score? I see players start with a 1000 rating; are they starting with an RD of 350? You mention assigning and using category modifiers; how do you determine the player's category?
Apologies if this was explained in your write-up; I read it twice over trying to reference between what I knew about Glicko(-1). I'm a math nerd at heart, but not in education.
On December 16 2012 01:00 csn_JohnClark wrote: I have not completely dived into the ranking system yet, but one thing that sticks out and bothers me is that you have decided that tournaments are to be weighted for relevance based on prizing.
No he doesn't?
I believe he was referring to the other ratings site that was being argued about before, sc2proranks.com.
New feature: predict matches. (Will expand with fancy graphics when I get time.)
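The core of it is just the obvious best-of-N arithmetic, assuming independent games with a fixed per-game win probability p (a simplification, of course):

def match_win_prob(p, n):
    # P(winning a best-of-n match) when each game is won with probability p.
    # Sums over final scores need:k for k = 0..need-1; the winner always
    # takes the last game played.
    need = n // 2 + 1
    total, coeff = 0.0, 1.0  # coeff tracks C(need-1+k, k)
    for k in range(need):
        total += coeff * p ** need * (1 - p) ** k
        coeff = coeff * (need + k) / (k + 1.0)
    return total

# A 60% per-game favourite wins a Bo3 about 64.8% of the time,
# and a Bo5 about 68.3% of the time.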
On December 16 2012 06:11 Nublakhan wrote: You mention that you are borrowing heavily from Glicko; I'm assuming from Glicko-2?
Why not show the player's volatility rating/range in addition to their score? I see players start with a 1000 rating; are they starting with an RD of 350? You mention assigning and using category modifiers; how do you determine the player's category?
Apologies if this was explained in your write-up; I read it twice over trying to reference between what I knew about Glicko(-1). I'm a math nerd at heart, but not in education.
No, Glicko-1. That was what I tried first and I got it working, so I didn't try anything more fancy.
I don't show the rating deviation because for almost everyone it's at the floor. (I had to use a pretty high floor to keep up with rapid changes. That's why the ratings are so volatile, and that's why it's so easy for players to keep their RD at the minimal allowed level.) I just didn't think it'd be interesting information.
Actually, players start with a rating of 0 and a deviation of 0.5. The ratings you see on the site are scaled by adding 1 and multiplying by 1000, since this creates a scale that people are familiar with. The top players are usually around 1.5 on the internal scale. The starting deviation of 0.5 corresponds to an RD of 500. I use an RD floor of 0.13 (=130). I am debating lowering it to 0.1 in the future, and increasing the decay a bit. Presumably the scene has "settled" enough now to allow something like this.
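In other words, the scale conversion is trivial:

def to_display(internal):
    # Internal scale (new players at 0, top players around 1.5) maps to
    # the familiar display scale (1000 and 2500 respectively).
    return (internal + 1.0) * 1000.0

def to_internal(display):
    return display / 1000.0 - 1.0

assert to_display(0.0) == 1000.0  # new player
assert to_display(1.5) == 2500.0  # typical top player
# An RD is a spread, not a location, so it only picks up the factor of
# 1000: the floor of 0.13 is the "130" quoted above.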
On December 16 2012 06:44 TheBB wrote: New feature: predict matches. (Will expand with fancy graphics when I get time.)
Great job!
New feature request: Do the same just for groups: Let people enter a few names (4, usually), choose the format from a dropdown menu (round-robin, GSL, etc.), and have it calculate the group results.
Of course, the next step would be to calculate entire tournaments!
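Something like this quick Monte Carlo sketch would already be fun; win_prob here is a placeholder for whatever the site's model returns for a single match:

import random

def simulate_group(players, win_prob, trials=10000):
    # Single round-robin: every pair plays one match; count how often
    # each player ends up (tied for) first on match wins.
    firsts = dict((p, 0) for p in players)
    for _ in range(trials):
        wins = dict((p, 0) for p in players)
        for i, a in enumerate(players):
            for b in players[i + 1:]:
                winner = a if random.random() < win_prob(a, b) else b
                wins[winner] += 1
        best = max(wins.values())
        for p in players:
            if wins[p] == best:  # crude tie handling: ties all count as first
                firsts[p] += 1
    return dict((p, firsts[p] / float(trials)) for p in players)

Proper GSL-style groups would need the winners/losers match logic instead of a plain round-robin, but the idea is the same.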
This looks interesting, but it seems very volatile. Sniper wins the GSL so his rating catapults from 2513 to 2954, but then he loses a BO1 to Gumiho, who is a measly 18th, and his rating plummets to 2068. Am I missing something here?
On December 11 2012 00:59 TheBB wrote: (Yes, I'm not exactly a web designer. Hopefully it's still tolerable.)
If you did the web design, I'd say that's pretty good. I like the visual style. Even code-wise it's better than teamliquid.net's (main page) mess, although that's not saying much.
If you're looking for feedback though:
- Try to use em instead of pixels for sizes (like widths) of containers containing text, so that regardless of the font and text size used, it will scale properly for the user.
- I personally think making the whole row a link is a no-no. Maybe it's just me, but I find it really annoying.
- There's no reason [that I can think of] not to use HTML table elements (table, tr, td, etc.) to display your data; that's what tables are for.
Non-HTML/CSS related:
- A sort function would be nice.
- So would race-specific stats, or other stats in general.
Personally I'm not into following this sort of thing at all, but I'm sure many others appreciate the effort you're putting in.
On December 17 2012 11:38 JohnAdams wrote: This looks interesting, but it seems very volatile. Sniper wins the GSL so his rating catapults from 2513 to 2954, but then he loses a BO1 to Gumiho, who is a measly 18th, and his rating plummets to 2068. Am I missing something here?
I think you are. Sniper has never been 2954? He's one of those whose rise has been very steady.
On December 11 2012 00:59 TheBB wrote: (Yes, I'm not exactly a web designer. Hopefully it's still tolerable.)
- Try to use em instead of pixels for sizes (like widths) of containers containing text, so that regardless of the font and text size used, it will scale properly for the user.
- I personally think making the whole row a link is a no-no. Maybe it's just me, but I find it really annoying.
- There's no reason [that I can think of] not to use HTML table elements (table, tr, td, etc.) to display your data; that's what tables are for.
Non-HTML/CSS related:
- A sort function would be nice.
- So would race-specific stats, or other stats in general.
1. Ok! 2. Fair enough. 3. Well, you can't make the whole row a link with an HTML table. 4, 5. Yeah, it's "in the pipeline" so to speak. I try to dedicate an hour or two each day but I can't always do that.
On December 18 2012 08:25 opterown wrote: ok hmm after looking at recent results i think you may have them a bit too volatile, haha
Well, do I have good news for you then.
I made some tweaks today and I think I can make it a bit less volatile without impacting the predictive power. There are four parameters:
– RD (rating deviation) decay: how fast uncertainty grows when a player doesn't play. Currently 0.01.
– Initial RD: how uncertain the rating of a new player is. Currently set at 0.5.
– Minimal RD: currently set at 0.13.
– Period length: currently 14 days. I won't touch this one.
A player's rating changes quickly if his or her RD is high. Thus a large minimal RD will create volatility among "stable" players, a large RD decay will create volatility among players who play less frequently, and a large initial RD will create volatility among totally new players.
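Schematically, and ignoring the actual Glicko formulas (the growth and shrink terms below are stand-ins to show the roles of the parameters, not the real update):

INIT_RD, MIN_RD, DECAY = 0.5, 0.13, 0.01

def rd_after_idle(rd, idle_periods):
    # Uncertainty grows while a player doesn't play, capped at the
    # initial value: we never know less than we do about a newcomer.
    return min(INIT_RD, rd + DECAY * idle_periods)

def rd_after_playing(rd):
    # Playing shrinks the RD, but never below the floor.
    return max(MIN_RD, rd * 0.8)  # 0.8 is a placeholder shrink factor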
Here is a plot showing the predictive power of the original system.
How did I make this? Well, I went through every game in the training data set (containing almost 50,000 games), computed the ratings at the time each game was played, and assigned it a "slot" corresponding to how certain it was that the assumed stronger player would win. The slots are ranges of probabilities, i.e. 50-55%, 55-60% and so on. This is the "predicted winrate" on the x-axis. The black jagged line shows the actual winrate for each slot, and the dashed black line (slanting the other way) shows the number of games associated with each slot.
The dashed blue line shows the linear fit weighted by the number of games, and the dashed red line shows the "ideal," namely actual winrate = predicted winrate across the board.
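The binning itself boils down to something like this:

def calibration(games, bin_width=0.05):
    # games: list of (predicted_prob_of_favourite, favourite_won) pairs,
    # with predicted_prob_of_favourite always >= 0.5.
    bins = {}
    for p, won in games:
        lo = 0.5 + bin_width * int((p - 0.5) / bin_width)  # 50-55%, 55-60%, ...
        n, w = bins.get(lo, (0, 0))
        bins[lo] = (n + 1, w + int(won))
    # Map each slot to (actual winrate, game count); a well-calibrated
    # system has the actual winrate close to the slot's predicted range.
    return dict((lo, (w / float(n), n)) for lo, (n, w) in bins.items())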
So you can see that the system works pretty well already, but ok, so maybe it's too volatile. Can we fix that?
This uses a higher decay rate and a lower minimum. Essentially this means that we allow the ratings of the most frequently playing players to become "more certain," but the information about their skill level decays faster when they don't play.
Here I have upped the initial RD to 0.6 to try to fix the slight offset. Right now I think it looks almost perfect.
So this is what will happen. In a week, when the time comes to publish the new list, I will recompute all ratings, using a minimal RD of 0.06, initial RD of 0.6 and RD decay of 0.04.
What you should see is that the ratings of the most frequent players will be much more stable, but the ratings of players who play rarely will become unstable faster than before. Additionally, new players will adjust somewhat quicker than before.
Also, Conti has added a ton of missing SPL games to the database, so hopefully that will help with the Kespa players.
Is there any way of calculating the new rating and new predictions yourself? By "new" I mean the "Results for next list" games. Could you tell us how to calculate those rating changes, so I can do it myself when I need to?
On December 22 2012 23:24 Greenei wrote: Is there any way of calculating the new rating and new predictions yourself? By "new" I mean the "Results for next list" games. Could you tell us how to calculate those rating changes, so I can do it myself when I need to?
Hi, I am one of the contributors of games to the site. Currently, as far as I know, it works in a way where we add data directly to his database, but I am not sure if the functionality/logic is available in an online version; obviously TheBB will be able to tell you. But since not all games are added the second they are played, you would not get a "clean" rating if you updated it yourself, because some games might not have been added yet even though they have been played.
I'm glad you got a nice shout-out from TLO at HSC. I'm a statistics major and love to see some mathematical work. Don't get your model oversaturated; just stick to your data and keep it simple. For example, the best football predictions are based only on market values; if you base your research on "upsets" you might get specific results right, but overall it gets off track very fast.
-The results are much too one-sided when comparing mid-tier players and top players. For example, your prediction for Leenock vs. Sting at Fight Club was overwhelmingly in Leenock's favour (97.6%). You know that Sting won, of course, and it was an upset, but not as much of an upset as your prediction made it sound. Starcraft 2 is a game where most top-tier or mid-tier players can take games off each other seemingly at random. You should probably move the predictions towards the mean.
-The predictions don't seem to take into account head-to-head results, which can somehow defy the players' rankings or winrates in that matchup. For example, Goody's win-loss record versus Stephano is 7W-9L, while the (generally considered) much better player PuMa is only 2W-6L.
On December 24 2012 12:19 BrokenMirage wrote: -The results are much too one-sided when comparing mid-tier players and top players. For example, your prediction for Leenock vs. Sting at Fight Club was overwhelmingly in Leenock's favour (97.6%). You know that Sting won, of course, and it was an upset, but not as much of an upset as your prediction made it sound. Starcraft 2 is a game where most top-tier or mid-tier players can take games off each other seemingly at random. You should probably move the predictions towards the mean.
It should be pushed towards 50/50 if there's a higher uncertainty for the players, shouldn't it?
On Thursday when the new list comes I will recompute all the ratings from the start using some different parameters. Hopefully this will help with many of your issues.
Is there any way of calculating the new rating and new predictions yourself?
Yeah, but it involves a bit of programming. There is no closed-form expression. This feature would be kinda cool to add to the site, I agree.
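For the curious: the vanilla Glicko-1 period update (on the classical scale) looks like this. My implementation deviates from it in several ways, so don't expect it to reproduce the site's numbers exactly:

import math

Q = math.log(10) / 400.0

def g(rd):
    # Attenuation factor: uncertain opponents move your rating less.
    return 1.0 / math.sqrt(1.0 + 3.0 * (Q * rd) ** 2 / math.pi ** 2)

def E(r, rj, rdj):
    # Expected score against an opponent rated rj with deviation rdj.
    return 1.0 / (1.0 + 10 ** (-g(rdj) * (r - rj) / 400.0))

def glicko1_update(r, rd, results):
    # results: list of (opp_rating, opp_rd, score) with score 1, 0.5 or 0.
    d2_inv = Q * Q * sum(g(rdj) ** 2 * E(r, rj, rdj) * (1.0 - E(r, rj, rdj))
                         for rj, rdj, _ in results)
    denom = 1.0 / rd ** 2 + d2_inv
    new_rd = math.sqrt(1.0 / denom)
    delta = (Q / denom) * sum(g(rdj) * (s - E(r, rj, rdj))
                              for rj, rdj, s in results)
    return r + delta, new_rd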
Is there a way to see the Elo of, let's say, the top 10 players over 2 years on the same chart?
Not yet.
The results are much too one-sided when comparing mid-tier players and top players. For example, your prediction for Leenock vs. Sting at Fight Club was overwhelmingly in Leenock's favour (97.6%).
This is because the ratings adjust very quickly, so a player on a hot streak will be very highly rated. When the new ratings come on Thursday, they won't be so volatile, so presumably the top will be closer to the mid tier. Maybe.
I don't want to just adjust my predictions toward the mean based on gut feeling. Based on historical data, the assumed stronger player wins almost exactly as many games as he or she should according to the ratings, if not more in some cases.
It should be pushed towards 50/50 if there's a higher uncertainty for the players, shouldn't it?
Yes.
The predictions don't seem to take into account head-to-head results, which can somehow defy the players' rankings or winrates in that matchup.
That's right. There is a simple Bayesian model that can do this, but I need to work out a good way to weight past results (recent ones vs. older).
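One candidate, purely for illustration: a Beta prior centred on the rating-based prediction, updated with recency-discounted head-to-head games. The discounting is exactly the open question; the half-life below is made up:

def h2h_adjusted(p_rating, h2h, strength=10.0, half_life=180.0):
    # p_rating: win probability from the rating model alone.
    # h2h: list of (days_ago, won) for past meetings, won being 1 or 0.
    a = strength * p_rating          # prior pseudo-wins
    b = strength * (1.0 - p_rating)  # prior pseudo-losses
    for days_ago, won in h2h:
        w = 0.5 ** (days_ago / half_life)  # older games count less
        a += w * won
        b += w * (1 - won)
    return a / (a + b)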
The predictions don't seem to take into account head-to-head results, which can somehow defy the players' rankings or winrates in that matchup.
That's right. There is a simple Bayesian model that can do this, but I need to work out a good way to weight past results (recent ones vs. older).
I don't think it's a good idea to take head-to-head into consideration, because even though there do seem to be some players who struggle against a particular opponent in a matchup where they otherwise do quite well (hello MKP vs. Mvp :p), it doesn't seem to be a factor the majority of the time.
On December 26 2012 04:02 OrbitalPlane wrote: Wow, this is really impressive. I wish we had a ladder like that. Blizzard, hire that guy and make it happen!
Well, Blizzard's matchmaking system on ladder is already extremely good, isn't it?
On December 26 2012 04:02 OrbitalPlane wrote: Wow, this is really impressive. I wish we had a ladder like that. Blizzard, hire that guy and make it happen!
Well, Blizzard's matchmaking system on ladder is already extremely good, isn't it?
The matchmaking is great. The rating system is horrible (even if you take out the bonus pool, which inflates the rating). It's impossible to track your own development with the Blizzard ranking.