|
Yeah, like Conti said, the number listed next to each opponent is the rating of that opponent (in the given matchup) at the time the match was played.
Edit: The traffic stats skyrocket when this thread is bumped at a US-friendly time, lol.
|
opterown
ok hmm after looking at recent results i think you may have them a bit too volatile, haha
|
On December 18 2012 08:25 opterown wrote: ok hmm after looking at recent results i think you may have them a bit too volatile, haha Well, do I have good news for you then.
I made some tweaks today and I think I can make it a bit less volatile without impacting the predictive power. There are four parameters:
– RD (rating deviation) decay: how fast uncertainty grows when a player doesn't play. Currently 0.01.
– Initial RD: how uncertain the rating of a new player is. Currently set at 0.5.
– Minimal RD: currently set at 0.13.
– Period length: currently 14 days. I won't touch this one.
A player's rating changes quickly if his or her RD is high. Thus a large minimal RD will create volatility among "stable" players, a large RD decay will create volatility among players who play less frequently, and a large initial RD will create volatility among totally new players.
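To make the roles of the three parameters concrete, here is a toy Glicko-flavoured sketch. The function names, the additive decay, the quadratic step size and the 0.95 shrink factor are all illustrative assumptions, not the site's actual update rule:

```python
import math

# Hypothetical parameter names mirroring the post; the real update rule
# on the site may differ. This is only a Glicko-flavoured illustration.
RD_DECAY   = 0.01   # added uncertainty per rating period of inactivity
INITIAL_RD = 0.5    # uncertainty assigned to a brand-new player
MINIMAL_RD = 0.13   # floor that keeps frequent players' ratings adaptive

def decayed_rd(rd, inactive_periods):
    """Uncertainty grows while a player is idle (additive decay assumed),
    capped at the new-player level."""
    return min(INITIAL_RD, rd + RD_DECAY * inactive_periods)

def update(rating, rd, expected, outcome):
    """A high-RD player moves further for the same surprise.
    outcome is 1 for a win, 0 for a loss; expected is the predicted winrate."""
    step = rd * rd * (outcome - expected)  # step size scales with uncertainty
    new_rd = max(MINIMAL_RD, rd * 0.95)    # playing a game shrinks uncertainty
    return rating + step, new_rd
```

The point is only the qualitative behaviour: a new player (RD 0.5) moves roughly fifteen times as far per surprising result as a stable player sitting at the minimal RD.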
Here is a plot showing the predictive power of the original system.
How did I make this? Well, I went through every game in the training data set (containing almost 50,000 games), computed the ratings at the time each game was played, and assigned the game a "slot" corresponding to how certain the system was that the assumed stronger player would win. The slots are ranges of probabilities, e.g. 50-55%, 55-60% and so on. This is the "predicted winrate" on the x-axis. The black jagged line shows the actual winrate for each slot, and the dashed black line (slanting the other way) shows the number of games associated with each slot.
The dashed blue line shows the linear fit weighted by number of games, and the dashed red line shows the "ideal," namely actual winrate = predicted winrate across the board.
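The binning step described above is easy to reproduce. A minimal sketch (the input format and 5% bin width are assumptions; the site's scripts may differ):

```python
from collections import defaultdict

def calibration_table(games, bin_width=0.05):
    """games: iterable of (predicted_prob, stronger_won) pairs, where
    predicted_prob in [0.5, 1.0] is the predicted winrate of the assumed
    stronger player. Returns {slot_index: (actual_winrate, n_games)};
    slot s covers predicted winrates [0.5 + s*w, 0.5 + (s+1)*w)."""
    wins, counts = defaultdict(int), defaultdict(int)
    nbins = round(0.5 / bin_width)
    for p, won in games:
        slot = min(int((p - 0.5) / bin_width), nbins - 1)  # p = 1.0 lands in last bin
        counts[slot] += 1
        wins[slot] += won
    return {s: (wins[s] / counts[s], counts[s]) for s in counts}
```

The counts per slot then serve directly as the weights for the weighted linear fit shown as the dashed blue line.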
So you can see that the system works pretty well already, but ok, so maybe it's too volatile. Can we fix that?
This uses a higher decay rate and a lower minimum. Essentially this means that we allow the ratings of the most frequently playing players to become "more certain" but that the information of their skill level decays faster when they don't play.
Here I have upped the initial RD to 0.6 to try to fix the slight offset. Right now I think it looks almost perfect.
So this is what will happen. In a week, when the time comes to publish the new list, I will recompute all ratings, using a minimal RD of 0.06, initial RD of 0.6 and RD decay of 0.04.
What you should see is that the ratings of the most frequent players will be much more stable, but the ratings of players who play rarely will become unstable faster than before. Additionally, new players will adjust somewhat quicker than before.
Also, Conti has added a ton of missing SPL games to the database, so hopefully that will help with the Kespa players.
|
Just properly flicked through the site for the first time and I really like the work that's going into it.
|
Awesome. Nice design too. Good job, thx!
|
Is there any way of calculating the new ratings and new predictions yourself? By "new" I mean the "Results for next list" games. Could you tell us how to calculate those rating changes, so I can do it myself when I need to?
|
On December 22 2012 23:24 Greenei wrote: is there any way of calculating the new rating and new predictions yourself? with new i mean the "Results for next list" games. could you tell us how to calculate those rating changes, so i can do it myself when i need to?
Hi, I am one of the contributors of games to the site. As far as I know, we currently add data directly to his database; I'm not sure whether the rating logic is available in an online version, but TheBB will be able to tell you. Note that not all games are added the second they are played, so if you computed the ratings yourself you would not get a "clean" result, because some games that have already been played might not be in the database yet.
|
Is there a way to see the Elo of, say, the top 10 players over two years on the same chart?
|
On December 23 2012 00:52 Odoakar wrote: Is there a way to see ELO for let's say top 10 players through 2 years on the same chart? Not currently, but I assume that it is something TheBB would implement when/if he has the time for it
|
I'm glad you got a nice shout-out from TLO at HSC. I'm a statistics major and love to see some mathematical work. Don't oversaturate your model; just stick to your data and keep it simple. For example, the best football predictions are based only on market values; if you build your model around "upsets" you might get specific results right, but overall it goes off track very fast.
|
Some feedback on the predictions:
-The results are much too one-sided when comparing mid-tier players and top players. For example, your prediction about Leenock vs Sting for Fight Club was overwhelmingly in Leenock's favour (97.6%). You know that Sting won, of course, and it was an upset, but not as much of an upset as your prediction made it sound. Starcraft 2 is a game where most top-tier or mid-tier players can take games off each other seemingly at random. You should probably move the predictions towards the mean.
-The predictions don't seem to take into account head-to-head results, which can sometimes defy the players' rankings or winrates in that matchup. For example, Goody's win-loss record versus Stephano is 7W-9L, while the (generally considered) much better player PuMa is only 2W-6L.
|
On December 24 2012 12:19 BrokenMirage wrote: -The results are much too one-sided when comparing mid-tier players and top players. For example, your prediction about Leenock vs Sting for Fight Club was overwhelmingly in Leenock's favour (97.6%). You know that Sting won, of course, and it was an upset, but not as much of an upset as your prediction made it sound. Starcraft 2 is a game where most top-tier or mid-tier players can take games off each other seemingly at random. You should probably move the predictions towards the mean.
It should be pushed towards 50/50 if there's a higher uncertainty for the players, shouldn't it?
|
So, this is what a Nate Silver for Starcraft looks like. This will be awesome for my liquibet ranking!
|
On Thursday when the new list comes I will recompute all the ratings from the start using some different parameters. Hopefully this will help with many of your issues.
is there any way of calculating the new rating and new predictions yourself? Yeah, but it involves a bit of programming. There is no closed form expression. This feature would be kinda cool to add to the site, I agree.
Is there a way to see ELO for let's say top 10 players through 2 years on the same chart? Not yet.
The results are much too onesided when comparing mid-tier players, and top players. For example, your prediction about Leenock vs Sting for Fight Club was overwhelmingly in Leenock's favour(97.6%). This is because the ratings adjust very quickly, so a player on a hot streak will be very highly rated. When the new ratings come on Thursday, they won't be so volatile, so presumably the top will be closer to the mid tier. Maybe.
I don't want to just adjust my predictions toward the mean based on gut feeling. Based on historical data, the assumed stronger player wins almost exactly as many games as he or she should according to the ratings, if not more in some cases.
It should be pushed towards 50/50 if there's a higher uncertainty for the players, shouldn't it? Yes.
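This is exactly what the Glicko expected-score formula does: the g(RD) attenuation factor drags predictions toward 50/50 as uncertainty grows. A sketch on the standard Glicko scale (the site uses its own units, and combining both players' RDs in quadrature is one common convention, not necessarily the site's):

```python
import math

Q = math.log(10) / 400  # standard Glicko scale constant

def g(rd):
    """Glicko attenuation: higher RD drags the expected score toward 0.5."""
    return 1 / math.sqrt(1 + 3 * (Q * rd) ** 2 / math.pi ** 2)

def win_prob(r_a, rd_a, r_b, rd_b):
    """Expected score for A vs B, discounting both players' uncertainties
    (combined in quadrature -- an assumed convention)."""
    combined_rd = math.sqrt(rd_a ** 2 + rd_b ** 2)
    return 1 / (1 + 10 ** (-g(combined_rd) * (r_a - r_b) / 400))
```

With a 200-point gap, two well-established players (RD 30) get roughly a 76% prediction, while two very uncertain players (RD 350) get only about 65%: same gap, softer call.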
The predictions don't seem to take into account head-to-head results, which can sometimes defy the players' rankings or winrates in that matchup. That's right. There is a simple Bayesian model that can do this, but I need to work out a good way to weigh past results (recent ones vs. older ones).
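One simple way to weigh recent head-to-head results over older ones is an exponentially decayed Beta-binomial. This is only a sketch of the idea being discussed; the half-life and the flat Beta(1,1) prior are illustrative choices, not anything the site implements:

```python
def h2h_posterior(results, half_life_days=180, prior_wins=1.0, prior_losses=1.0):
    """results: list of (days_ago, won) for past head-to-head games.
    Each game's weight halves every half_life_days; the weighted counts
    feed a Beta posterior whose mean is returned as the win probability."""
    w_wins, w_losses = prior_wins, prior_losses
    for days_ago, won in results:
        weight = 0.5 ** (days_ago / half_life_days)
        if won:
            w_wins += weight
        else:
            w_losses += weight
    return w_wins / (w_wins + w_losses)
```

With no head-to-head history the prior gives 50/50, and two recent wins outweigh two losses from two years ago, which is the recency behaviour the weighting question is about.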
|
Is TheBB the new stats bonjwa?
Great site, I like the layout and feel. Very useful so far while poking at it.
|
Wow, this is really impressive. I wish we had a ladder like that. Blizzard, hire that guy and make it happen!
|
On December 26 2012 02:59 TheBB wrote: The predictions don't seem to take into account head-to-head results, which can somehow defy the players rankings or win-rates in that matchup. That's right. There is a simple Bayesian model that can do this, but I need to work out a good way to weigh past results (recent ones vs. older).
I don't think it's a good idea to take head-to-head into consideration, because even though there do seem to be some players who struggle against a particular opponent in a match up where they do quite well otherwise (hello MKP vs Mvp :p), it doesn't seem to be a factor the majority of the time.
|
On December 26 2012 04:02 OrbitalPlane wrote: wow this is really impressive. i wish we had a ladder like that. Blizzard hire that guy and make it happen! Well, Blizzard's matchmaking system on ladder is already extremely good, isn't it?
|
On December 26 2012 04:27 slowbacontron wrote: On December 26 2012 04:02 OrbitalPlane wrote: wow this is really impressive. i wish we had a ladder like that. Blizzard hire that guy and make it happen! Well, Blizzard's matchmaking system on ladder is already extremely good, isn't it?
The matchmaking is great; the rating system is horrible (even if you take out the bonus pool, which inflates the rating). It's impossible to track your own development with the Blizzard ranking.
|