|
Recently I used the TLPD game database to calculate Elo ratings split by matchup (as opposed to the overall ones that are currently in the DB). These are interesting because you can see who is/was best at a particular matchup even if they weren't necessarily the best player overall. I also find it interesting that the ratings for the most part correspond *very* well with common intuitions on which players were/are good at which matchups, which seems to indicate that they're at least somewhat correct/useful even when they surprise us.
Notes:
- These are peak ratings, not current ones.
- I've given the top 10 in each matchup here.
- Players of each race in the MU are listed together because their ratings are derived from the same set of games and therefore are comparable.
- Only the relative values between players in a particular MU matter, not the absolute values.
- "Inflation" is probably not an issue here because the average rating of players in this system is fixed at 2000.0 by definition.
- I used k = 25, which I think is the same as what TLPD uses for overall ratings (I also calculated overall ratings and ended up with values very close to the TLPD ones).
- Only players who have played at least 20 games in the relevant matchup are included on the lists.
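For anyone who wants to try something similar, the procedure in the notes above can be sketched roughly as follows. This is not the script used for these lists: it assumes the standard Elo formula, a chronological list of (winner, loser) pairs for a single matchup, and that the peak is tracked from a player's first game. The function and variable names are made up for illustration.

```python
from collections import defaultdict

K = 25            # K-factor stated in the notes
START = 2000.0    # assumed starting rating, matching the fixed 2000.0 average
MIN_GAMES = 20    # minimum games in the matchup to be listed


def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def peak_ratings(games):
    """games: chronological list of (winner, loser) names for ONE matchup.

    Returns {player: peak rating} for players with at least MIN_GAMES games.
    """
    rating = defaultdict(lambda: START)
    peak = defaultdict(lambda: START)
    played = defaultdict(int)
    for winner, loser in games:
        # The update is zero-sum: the winner gains exactly what the loser drops,
        # so the average rating of everyone in the pool never moves.
        delta = K * (1.0 - expected_score(rating[winner], rating[loser]))
        rating[winner] += delta
        rating[loser] -= delta
        for p in (winner, loser):
            played[p] += 1
            peak[p] = max(peak[p], rating[p])
    return {p: peak[p] for p in played if played[p] >= MIN_GAMES}
```

Running it once per matchup (ZvT games only, PvZ games only, and so on) gives one independent rating pool per list, which is why only the relative values within a list are comparable.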
So:
All-time ZvT/TvZ peak ratings:
iloveoov: 2232
Jaedong: 2225
NaDa: 2205
sAviOr: 2197
Sea: 2197
BoxeR: 2195
GoRush: 2188
XellOs: 2185
YellOw: 2183
Hwasin: 2177
All-time PvZ/ZvP peak ratings:
sAviOr: 2240
Bisu: 2165
rA: 2157
July: 2154
Anytime: 2143
ChoJJa: 2131
YellOw: 2125
Kal: 2107
Luxury: 2107
JinNam: 2105 (Jaedong is #11 by 1 point)
All-time PvT peak ratings:
Flash: 2224
Stork: 2206
iloveoov: 2179
rA: 2175
NaDa: 2174
Reach: 2154
Midas: 2151
Kingdom: 2143
Sea: 2136
Much: 2124
All-time ZvZ peak ratings:
Jaedong: 2204
ChoJJa: 2178
JJu: 2161
GoRush: 2160
Luxury: 2149
July: 2142
sAviOr: 2131
YellOw: 2128
ZergMaN: 2118
GGPlay: 2099
All-time PvP peak ratings:
BeSt: 2185
Stork: 2180
Reach: 2139
Pusan: 2137
Anytime: 2125
rA: 2104
Kingdom: 2099
Much: 2080
Bisu: 2078
free[gm]: 2069
All-time TvT peak ratings:
XellOs: 2193
Sea: 2192
NaDa: 2183
Flash: 2178
Iris: 2177
iloveoov: 2159
Goodfriend: 2153
Casy: 2145
Midas: 2141
Canata: 2138
There are lots of interesting things here, including both confirmed suspicions (yep, Jaedong is really good at ZvZ and Flash is really good at TvP) and things that might be a little more interesting, such as Xellos and Sea being #1 and #2 in TvT. Savior's peak ZvP was *far* beyond anyone else, so no wonder the guy who dethroned him made it to #2.
|
wow this is very interesting 0.0
|
IIRC TLPD used something like K=20 for players with fewer than 20 games and K=40 for others, or the other way around.
|
This is an awesome idea. Major props to you sir. I have to say, I'm surprised to see Goodfriend in the TvT ratings. Everyone else on that list has some (usually a lot) of fans, but I've rarely heard of Goodfriend. Damn that person who edged out JD in ZvP by one place, but I just know that will change in the series vs Much.
|
Bisu looks more and more like a one-matchup wunderkind.
Also surprised me that July's ZvP was *that* good. Wasn't around back then, but I tended to hear more about his ZvT virtuoso (ie. iloveoov) than his ZvP.
|
From the looks of it, iloveoov was the all-around most dominant player ever.
|
nada
|
On July 09 2008 16:27 pachi wrote: IIRC TLPD used something like K=20 for players with less than 20 games and K=40 for others or the other way around.
Probably the other way around, but at any rate yeah, I didn't use those exact values. It shouldn't make much difference for this purpose, since I'm not directly comparing these ratings to the overall ones (and anyway, the overall ones I calculated with k=25 were within about ±10 points of the TLPD values on average).
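For reference, K only scales how far a single result moves a rating. A minimal sketch, assuming the standard Elo expected-score formula; `rating_gain` is a hypothetical helper name, and the K values are simply the ones mentioned in this exchange, not confirmed TLPD settings.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rating_gain(winner_rating, loser_rating, k):
    """Points the winner gains (and the loser drops) for one game."""
    return k * (1.0 - expected_score(winner_rating, loser_rating))


# Two evenly rated players, under the three K values mentioned in this thread.
for k in (20, 25, 40):
    print(k, rating_gain(2000.0, 2000.0, k))
```

A game between evenly rated players is worth K/2 points either way, so 10, 12.5 and 20 points here.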
|
The list is surprisingly accurate, though obviously it has clear flaws that make it hard to draw concrete conclusions from.
|
Really glad oov is still on each list in a high spot.
|
No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close.
|
On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close.
If you care about records you can look at records. Ratings take into account the quality of the opposition too.
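A rough sketch of that point, assuming the standard Elo update with k = 25 and (unrealistically) opponents whose ratings stay fixed. The function name and the example ratings are invented for illustration and are not taken from TLPD.

```python
K = 25


def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rating_after_streak(start, wins, opponent_rating):
    """Rating after `wins` straight wins over opponents all rated `opponent_rating`."""
    rating = start
    for _ in range(wins):
        # Each win is worth less as the gap to the opposition grows.
        rating += K * (1.0 - expected_score(rating, opponent_rating))
    return rating


# The same 27-game streak against a weaker and a stronger field.
print(rating_after_streak(2000.0, 27, 1900.0))
print(rating_after_streak(2000.0, 27, 2100.0))
```

The streak against the 2100-rated field ends at a higher rating than the identical streak against the 1900-rated field, which is why two players with very different records can land close together.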
|
On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close.
The time period of said 27 game rapestreak plus time/maps/them being just STATS (seriously, when did stats matter when you're about to play someone?).
I motion you bump up Sea's TvT by two points so he can post smiley faces when he sees this thread.
|
On July 09 2008 16:55 gravity wrote: On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close. If you care about records you can look at records. Ratings take into account the quality of the opposition too.
I did. Except for some wins against thewind and silver, oov was playing the best of the best for the time. Yellow, jju, July, etc. ELO is supposed to reflect comparative dominance, and in this it has failed. OOV dominated against the best zvt's over a far longer stretch, but the poorer stats of his opponents have distorted the results. In reality, Jaedong's streak is laughable compared to Oov's.
|
On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close.
It's just ELO ratings. I mean, it's an interesting pointing system, but it's not the gospel truth. Actually I think this list has been arranged pretty well considering how flawed the all-time ELO peaks list is.
|
Eh, wasn't Xellos considered the god of TvT for a while, with that being his best MU by far? How is him being #1 surprising?
|
Good to hear that Flash's TvP is the best - hope he can manage to keep it that way and improve his other MUs as well.
|
kind of wished savior would come back, ya know?
|
Why isn't Flash in the TvZ Elo ratings? He had a record of like 13-4 a few games back and has only lost to Lux recently. E: 1000th post
|
lol thats raelly intresergin!! good job sir
|
ELO ratings have consistently increased with time, with players getting smaller streaks to boot.
Having an adjustment for the upward inflation would be nice.
|
On July 09 2008 17:00 ScarFace wrote: On July 09 2008 16:55 gravity wrote: On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close. If you care about records you can look at records. Ratings take into account the quality of the opposition too. I did. Except for some wins against thewind and silver, oov was playing the best of the best for the time. Yellow, jju, July, etc. ELO is supposed to reflect comparative dominance, and in this it has failed. OOV dominated against the best zvt's over a far longer stretch, but the poorer stats of his opponents have distorted the results. In reality, Jaedong's streak is laughable compared to Oov's.
Because there was a time when JD, in about 40 ZvT games, had only 5-6 losses and was on a 12-game winning streak; then he had 3-4 losses mixed with wins, and then he went on another 10-game winning streak. 1 or 2 losses do not lower your ELO that much compared to what you gain by dominating every terran there was at that time, and that kept his ELO high. So I think it is quite accurate, if you ask me. He certainly can challenge Oov's ELO. Oov could never restore his ELO peak after his streak, because July beat him in like 8 games after that...
Edit: It is a shame that July is not in the ZvT ELO peak rankings
|
its cool that nal_ra is in all the respective p match ups
|
On July 09 2008 17:26 Scaramanga wrote: Why isn't Flash in the TvZ Elo ratings? He had a record of like 13-4 a few games back and has only lost to Lux recently.
He's 12th.
|
comparing elo peaks is not very accurate, cause they had fewer games back then.
|
On July 09 2008 17:47 L wrote: ELO ratings have consistently increased with time, with players getting smaller streaks to boot.
Having an adjustment for the upward inflation would be nice.
The average rating is always exactly 2000 so I don't think inflation is a big issue. The only way inflation of sorts could occur is if, on average, players retire with less than 2000 points, but I wouldn't expect that to be a big issue. It's possible to check it though.
edit: in fact I will soon.
|
On July 09 2008 18:12 lamarine wrote: comparing elo peaks is not very accurate, cause they had fewer games back then.
You are right. I think this topic is very good, and it would be good if someone could edit it when someone breaks an ELO peak, because there are players who are at their peaks right now in their strongest or stronger MU, like BeSt for example. I'm curious how high BeSt and JD will set their ELO peaks in the mirror MUs and whether anyone can break them.
|
On July 09 2008 17:04 Letmelose wrote: On July 09 2008 16:52 ScarFace wrote: No offense but I don't think this is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just can't see how they're even remotely close. It's just ELO ratings. I mean, it's an interesting pointing system, but it's not the gospel truth. Actually I think this list has been arranged pretty well considering how flawed the all-time ELO peaks list is.
So how is the all-time ELO peaks list flawed?
The TvT rankings really make me want to see a game between XellOs and FlaSh. They've never played O.O. XellOs is known for owning high-level Terrans of their time.
|
Reach and Xellos is there no end to how good they are
|
It turns out that the average retired rating is 1955 compared to the average active rating of 2015 (using my system, and this is for overall ratings). I'm not sure how much inflation this would cause in the system as it's a function of time and new players come into the system with 2000 points regardless. This doesn't suggest large inflation though, as we would expect the skill of top players to have improved since the early days of BW.
Really, the whole question is irrelevant since not enough top players have ever retired from BW to make a significant impact on the amount of ratings points available. It's only been around for 10 years after all.
edit: in fact according to http://en.wikipedia.org/wiki/Elo_rating_system, player improvement over time will generally cause *de*flation, and since average players have certainly improved a lot since 1999, I doubt *in*flation is the problem, if anything.
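The retired-vs-active check described above amounts to a simple split of the rating table. A minimal sketch, assuming ratings sit in a dict and retirement status in a set (both hypothetical data structures; the player names and numbers in the example are made up, not TLPD data). The second helper is plain arithmetic and rests on an assumption: if every player is either retired or active and the overall mean is pinned at 2000, the two group means imply what share of players are retired.

```python
from statistics import mean

POOL_MEAN = 2000.0  # assumed fixed average over all players, retired or not


def split_means(ratings, retired):
    """Return (mean retired rating, mean active rating)."""
    gone = [r for p, r in ratings.items() if p in retired]
    active = [r for p, r in ratings.items() if p not in retired]
    return mean(gone), mean(active)


def implied_retired_share(retired_mean, active_mean, pool_mean=POOL_MEAN):
    """Share s of retired players solving s*retired + (1-s)*active = pool_mean."""
    return (active_mean - pool_mean) / (active_mean - retired_mean)


# Made-up four-player pool whose overall mean is exactly 2000.
ratings = {"p1": 1950.0, "p2": 1960.0, "p3": 2045.0, "p4": 2045.0}
print(split_means(ratings, retired={"p1", "p2"}))

# The 1955 / 2015 group means quoted in the post above.
print(implied_retired_share(1955.0, 2015.0))
```

With the 1955 and 2015 figures the implied share is 0.25, i.e. roughly a quarter of rated players would be retired, provided the 2000 average really is taken over everyone.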
|
On July 09 2008 18:22 ohhsuup wrote: On July 09 2008 17:04 Letmelose wrote: On July 09 2008 16:52 ScarFace wrote: No offense but I dont think is very accurate. Jaedong and oov are nearly identical on tvz/zvt. But who was the one who went on a 27 game win streak? Yeah....I just cant see how they're even remotely close. It's just ELO ratings. I mean, it's an interesting pointing system, but it's not the gospel truth. Actually I think this list has been arranged pretty well considering how flawed the all-time ELO peaks list is. So how is the all-time ELO peaks list flawed? The TvT rankings really want me to see a game between XellOs and FlaSh. They've never played O.O. XellOs is known for owning high-level Terrans of their time.
It's flawed for a number of reasons. I would have thought it would be obvious but I guess I'll point them out anyhow. Please take a minute to put aside your absolute belief in some pointing system, because some of the things that these points hint at are beyond stupid.
Do you realize all of the highest ELO peaks were achieved post 2003? It's probably because the ELO ratings only take official Kespa matches into account, which means that players of today with their 5 day proleague system benefit tremendously, and great players of yesteryear like IntoTheRain have embarrassing ELO peaks because many of their matches were played in prestigious tournaments that died out before the formation of Kespa.
So we're left with players like Iris having higher ELO peaks than Boxer. Lucifer with a higher ELO peak than IntoTheRain. Nal Ra and Nada having their ELO peaks past their actual prime. Actually, the ELO ratings are dominated by players of today with a few of the past legends squeezed here and there. I used to have a problem with this, but I don't anymore. Every pointing system has a flaw of some kind. I DO have a problem though, when people say ignorant things like "OOOOH Sea has the 2nd best TvT eva!!!" because his ELO points happen to represent him well.
|
On July 09 2008 20:53 Letmelose wrote: So we're left with players like Iris having higher ELO peaks than Boxer. Lucifer with a higher ELO peak than IntoTheRain. Nal Ra and Nada having their ELO peaks past their actual prime. Actually, the ELO ratings are dominated by players of today with a few of the past legends squeezed here and there. I used to have a problem with this, but I don't anymore. Every pointing system has a flaw of some kind. I DO have a problem though, when people say ignorant things like "OOOOH Sea has the 2nd best TvT eva!!!" because his ELO points happen to represent him well.
If you're talking about absolute strength then obviously Iris at his peak was way better than Boxer at his peak (and so forth) due to the continually increasing overall skill level, so I don't see the problem with this.
But yes, there's no such thing as a perfect rating system and this is just one way of measuring a player's skill, but I happen to think it's a pretty accurate one as far as things go.
|
On July 09 2008 22:48 gravity wrote: On July 09 2008 20:53 Letmelose wrote: So we're left with players like Iris having higher ELO peaks than Boxer. Lucifer with a higher ELO peak than IntoTheRain. Nal Ra and Nada having their ELO peaks past their actual prime. Actually, the ELO ratings are dominated by players of today with a few of the past legends squeezed here and there. I used to have a problem with this, but I don't anymore. Every pointing system has a flaw of some kind. I DO have a problem though, when people say ignorant things like "OOOOH Sea has the 2nd best TvT eva!!!" because his ELO points happen to represent him well. If you're talking about absolute strength then obviously Iris at his peak was way better than Boxer at his peak (and so forth) due to the continually increasing overall skill level, so I don't see the problem with this. But yes, there's no such thing as a perfect rating system and this is just one way of measuring a player's skill, but I happen to think it's a pretty accurate one as far as things go.
his point is not about absolute strength but relative dominance.
|
nada is the only one who's top 3 in all his matchups. seems accurate to me.
|
kingdom being as far down as 7th on the PvP list makes me feel dirty
|
On July 09 2008 22:48 gravity wrote: On July 09 2008 20:53 Letmelose wrote: So we're left with players like Iris having higher ELO peaks than Boxer. Lucifer with a higher ELO peak than IntoTheRain. Nal Ra and Nada having their ELO peaks past their actual prime. Actually, the ELO ratings are dominated by players of today with a few of the past legends squeezed here and there. I used to have a problem with this, but I don't anymore. Every pointing system has a flaw of some kind. I DO have a problem though, when people say ignorant things like "OOOOH Sea has the 2nd best TvT eva!!!" because his ELO points happen to represent him well. If you're talking about absolute strength then obviously Iris at his peak was way better than Boxer at his peak (and so forth) due to the continually increasing overall skill level, so I don't see the problem with this. But yes, there's no such thing as a perfect rating system and this is just one way of measuring a player's skill, but I happen to think it's a pretty accurate one as far as things go.
Of course current day players are better than past players in terms of absolute strength. Hell, B team players of today are better than Boxer of 2001. What does absolute strength have to do with anything, especially since you extended the discussion to "all time"?
If your argument is "the overall skill level is higher now, so it doesn't matter if ELO ratings don't do older players justice", then I guess to each his own. I personally was way more impressed by Boxer's domination during his prime than Iris's, ehem, "domination".
|
For the record, I'm totally fine with ELO ratings when it comes to comparing players of today. Hell, I even like the concept of taking the relative strength of players into account.
I just wish people would stop forming opinions about the "domination" of players from different eras purely by looking at the ELO peaks. It can lead to pretty retarded conclusions. I'm not saying every conclusion drawn from ELO ratings is wrong (the list on this thread is surprisingly accurate despite its faults), but it gets pretty frustrating when threads like this spawn ignorant comments by people who get too impressed by a bunch of numbers.
|
Nice job this is really interesting. Especially interesting to see some modern players topping those charts like BeSt's PvP and Jaedong being all over the place.
|
I think that makes rA the most balanced P player ever =D WOOT KANG MIN!
|
This is fascinating, thanks.
More proof that BeSt's PvP is the greatest of all time.
|
Dang Best is already the best PvPer ever. But I guess that is expected from someone who goes 23-5 these days....damn. Makes me wish that July had lost to Backho so Best could pump his elo up even higher :-P
On another note, what's up with Flash's TvZ?
|
In response to Letmelose, my statement above is not really based on the ELO rankings here but on all of the PvP's I've ever watched. You bring up some really good points about the validity of ranking players solely by ELO peak.
Would have edited this in rather than posting again, but I'm getting some kind of weird bug
|
On July 09 2008 23:06 oneofthem wrote: On July 09 2008 22:48 gravity wrote: On July 09 2008 20:53 Letmelose wrote: So we're left with players like Iris having higher ELO peaks than Boxer. Lucifer with a higher ELO peak than IntoTheRain. Nal Ra and Nada having their ELO peaks past their actual prime. Actually, the ELO ratings are dominated by players of today with a few of the past legends squeezed here and there. I used to have a problem with this, but I don't anymore. Every pointing system has a flaw of some kind. I DO have a problem though, when people say ignorant things like "OOOOH Sea has the 2nd best TvT eva!!!" because his ELO points happen to represent him well. If you're talking about absolute strength then obviously Iris at his peak was way better than Boxer at his peak (and so forth) due to the continually increasing overall skill level, so I don't see the problem with this. But yes, there's no such thing as a perfect rating system and this is just one way of measuring a player's skill, but I happen to think it's a pretty accurate one as far as things go. his point is not about absolute strength but relative dominance.
Well, given that the system forces the rating of new players to 2000.0, and given that the average strength of a new player keeps increasing, the system is a better measure of absolute strength than relative, so we should expect players' peaks to be higher in the modern era. Although I'm not enough of an expert on these systems to say for certain what the balance between relative vs. absolute measurement is for this particular implementation.
|
This is cool! However, the TLPD sadly doesn't contain every game played in proleagues/starleagues/other competitions... so it could be slightly off :/
|
On July 10 2008 01:30 ]343[ wrote: This is cool! However, the TLPD sadly doesn't contain every game played in proleagues/starleagues/other competitions... so it could be slightly off :/
I'm almost positive TLPD contains every KeSPA-sanctioned game ever, and all games from major tourneys prior to the existence of KeSPA.
|
I believe you're wrong. I remember checking a few times to find certain games go un-recorded. They may be added later, but some of the Proleague games didn't seem to be added for a stretch at the beginning of this year. (Can't say the same about the summer; they're usually added hours after the games are played.)
I completely agree with the ELO peaks; they definitely reflect what I thought to be accurate. During Jaedong's great ZvT stretch, I reflected on how good he was relative to oov's peak and I decided that he was probably even with oov.
There was never any doubt about Savior being the #1 ZvP'er in the world at any stage. No one was as dominant as he was.
|
These rankings explain everything about Sea's play
he is basically: #2(almost 1) TvT #3 TvZ #5 TvP
wow ;o!
and
iloveoov's picture below is just ridiculous!
this is his TvZ in his prime
|
Casy not in the TvZ rankings? Weird.
What place is he?
|
nal_ra appears in all lists!!! he is the protoss god of all times!!! =)
|
On July 10 2008 03:19 TheTyranid wrote: Casy not in the TvZ rankings? Weird.
What place is he?
14th. It's a little surprising but there are a lot of good TvZ players and as far as I remember Casy was known more for his excellent M&M micro than for being super-dominant in the matchup altogether.
|
1. There is definitely inflation. ELO neither gives us a comparison of absolute skill nor does it truly even give us a comparison of relative dominance. If it did the former, then the list would be wholly dominated by current players; if it did the latter, we could expect the bonjwas of the past to have a more complete sweep of the ratings.
If ELO compared relative dominance, then Boxer's ELO peak would be for sure top 10. Instead he is 22nd. He amassed 4 medals -- 3 gold -- with a truly awesome win record of 70% within his first 14 months as a progamer. He won with greater frequency than Flash (68% in his first 14 months) against the best of his era, but the difference in ELO is not even close. Oov had to win close to 80% for his first year in order to get close to Flash's ELO peak. Nada experienced peak ELO in IOPS, long after his skill had peaked.
Player deflation is not a problem because 1. players tend to retire below 2000, 2. when good players crash they tend to struggle to ever get back up again. If you look at current ELO ratings for past champions, you find: Xellos - 2026 Ra - 2003 ggplay - 2058 casy - 1991 ...etc
When a player falls from the top, the points they had gained get recycled back into the system. Meanwhile, points from players who are leaving at ELO below 2000 get permanently added to the system. Unlike in chess, dominance in Starcraft is short-lived, so ELO ratings change rapidly and new blood is cycled through very quickly.
2. A serious problem with ELO is that it assumes that you only win if you are the better player in absolute skill, but in Starcraft a lot depends on the map you play on and your playing style compared to your opponent's, not to mention luck. Having the right set of builds for the maps you play and/or facing off against the right set of opponents (your best match-up, styles you easily counter, etc) can lead to a string of victories that you wouldn't necessarily achieve on other maps or against other players.
It's not that I don't appreciate your work. This is very interesting stuff and there is currently no way better than ELO to compare peak performance. But in the end, I think stats if used correctly give a more coherent picture since you can break things down to examine trends as well as specializations in maps and match-ups and styles.
ELO cannot tell you things like Oov > Nada > Xellos > Oov, which fans observed to be the case back around 2004. The ELO just gives one the highest peak. But if we consider a system of only those 3 gamers and hypothetically assume that they win perfectly against the player they beat (Oov 100% vs nada, nada 100% vs xellos, xellos 100% vs oov), then the player who will achieve the highest ELO would be the player who has the highest ratio of games played against the person they beat relative to the number of games against the person who beats them. In this system where clearly no one is the best, a best is assigned. So while the stats may not be as absolute in terms of player ranking, the ELO ratings which do give us that also come with potentially serious flaws that CANNOT be unravelled.
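The three-player thought experiment can be simulated directly. A minimal sketch, assuming the standard Elo update with k = 25, a 2000 starting rating, and perfectly deterministic results inside the cycle; the schedules and game counts are invented for illustration.

```python
K = 25
# Who beats whom in the hypothetical cycle: oov > nada > xellos > oov.
BEATS = {("oov", "nada"), ("nada", "xellos"), ("xellos", "oov")}


def expected_score(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate(schedule):
    """Play the pairings in order; the result is always decided by BEATS."""
    rating = {"oov": 2000.0, "nada": 2000.0, "xellos": 2000.0}
    for a, b in schedule:
        winner, loser = (a, b) if (a, b) in BEATS else (b, a)
        delta = K * (1.0 - expected_score(rating[winner], rating[loser]))
        rating[winner] += delta
        rating[loser] -= delta
    return rating


# Every pairing played equally often.
balanced = [("oov", "nada"), ("nada", "xellos"), ("xellos", "oov")] * 30
# oov meets the player he beats three times as often as the one who beats him.
skewed = [("oov", "nada")] * 3 + [("nada", "xellos"), ("xellos", "oov")]
skewed = skewed * 30

print(simulate(balanced))
print(simulate(skewed))
```

With the balanced schedule all three stay close to 2000. With the skewed schedule oov ends up on top and nada at the bottom, even though nobody in the cycle is actually "the best": the ranking comes purely from who played whom how often.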
|
On July 10 2008 02:37 InfeSteD[rA] wrote: These rankings explain everything about Sea's play he is basically: #2(almost 1) TvT #3 TvZ #5 TvP wow ;o! and iloveoov's picture below is just ridiculous! this is his TvZ in his prime
wait, so oov won his FIRST 27 vsZ's ..oO
|
I'm glad oov got to play Jaedong a few times before he retired, if not it would've been a huge upset for me personally O_O
|
Jaedong is probably in the ZvP peaks right about now.
|
yeah i think he will climb some more spots
|
something to notice is the good position of nal_ra in all rankings
|
On July 10 2008 01:12 EtherealDeath wrote: Dang Best is already the best PvPer ever. But I guess that is expected from someone who goes 23-5 these days....damn. Makes me wish that July had lost to Backho so Best could pump his elo up even higher :-P
On another note, what's up with Flash's TvZ?
Flash has always been good in TvZ, but never as dominant as other terrans have been.
He's lost 3 of 5 series in the MSL/OSL in which he's played zergs.
Bo3:
Rumble - win, 2-0
Jaedong - win, 2-1
Luxury - loss, 0-2
Bo5:
GGPlay - loss, 2-3
Jaedong - loss, 1-3
Mainly because of game trading with Jaedong, he really hasn't been able to get on any kind of streak. Meanwhile, Jaedong continued to win vT after winning the MSL and losing to Flash in the other leagues (plus he had streaks of 13 and 8, and was 33-8 ZvT for the 8 months before that).
|
On July 10 2008 21:59 UnS)DeathTrap wrote: something to notice is the good position of nal_ra in all rankings
I was glad to see a few posts expressing that sentiment in this thread. Kang Min's position on these lists should spark quite a bit of attention from some of the more oblivious folk around here, too. The PvZ rankings especially seem to reflect reality, as Bisu is the only protoss I've seen that could handle zergs better than rA (albeit in a different and less dramatic fashion).
It's especially noticeable that rA is so high on all the lists simply because he is among the old school gamers out there. An earlier poster noted how much harder it was for the older players like Boxer to get a high ELO.
It's a shame he never got the respect he deserved on the power ranking.
|
On July 10 2008 17:03 Mortality wrote: Player deflation is not a problem because 1. players tend to retire below 2000, 2. when good players crash they tend to struggle to ever get back up again. If you look at current ELO ratings for past champions, you find: Xellos - 2026 Ra - 2003 ggplay - 2058 casy - 1991 ...etc
When a player falls from the top, the points they had gained get recycled back into the system. Meanwhile, points from players who are leaving at ELO below 2000 get permanently added to the system. Unlike in chess, dominance in Starcraft is short-lived, so ELO ratings change rapidly and new blood is cycled through very quickly.
As the Wiki link I posted shows, there can still be deflation even in this situation, because more points are not injected into the system when players get better over time - the average for new players stays at 2000 even though new players are much better now than then. This may very well be more significant than the retirement effect. But it's true that it's hard to say exactly how much the system is inflated/deflated.
2. A serious problem with ELO is that it assumes that you only win if you are the better player in absolute skill, but in Starcraft a lot depends on the map you play on and your playing style compared to your opponent's, not to mention luck. Having the right set of builds for the maps you play and/or facing off against the right set of opponents (your best match-up, styles you easily counter, etc) can lead to a string of victories that you wouldn't necessarily achieve on other maps or against other players.
Yes, this is an issue, as maps can have a big impact on balance. To get ideal accuracy, you'd actually want ratings for every matchup on every map, but unfortunately there isn't a large enough sample size - typically a player will only ever play a few games in a given matchup on a given map.
ELO cannot tell you things like Oov > Nada > Xellos > Oov, which fans observed to be the case back around 2004. The ELO just gives one the highest peak. But if we consider a system of only those 3 gamers and hypothetically assume that they win perfectly against the player they beat (Oov 100% vs nada, nada 100% vs xellos, xellos 100% vs oov), then the player who will achieve the highest ELO would be the player who has the highest ratio of games played against the person they beat relative to the number of games against the person who beats them. In this system where clearly no one is the best, a best is assigned. So while the stats may not be as absolute in terms of player ranking, the ELO ratings which do give us that also come with potentially serious flaws that CANNOT be unravelled.
Yes, the system does assume that skill is transitive (ie if a>b and b>c then a>c) and therefore doesn't directly account for circles like this. However, I don't think this is a particularly common situation over long time periods or for large numbers of games and therefore hopefully it doesn't have too much effect. You're right that it's possible for fans to observe trends that no simple numerical system can capture, but the Elo system does have the advantage of being consistent and unbiased.
|
omg, its only zergs in the ZvZ rankings
|
On July 10 2008 23:08 gravity wrote: On July 10 2008 17:03 Mortality wrote: Player deflation is not a problem because 1. players tend to retire below 2000, 2. when good players crash they tend to struggle to ever get back up again. If you look at current ELO ratings for past champions, you find: Xellos - 2026 Ra - 2003 ggplay - 2058 casy - 1991 ...etc
When a player falls from the top, the points they had gained get recycled back into the system. Meanwhile, points from players who are leaving at ELO below 2000 get permanently added to the system. Unlike in chess, dominance in Starcraft is short-lived, so ELO ratings change rapidly and new blood is cycled through very quickly.
As the Wiki link I posted shows, there can still be deflation even in this situation, because more points are not injected into the system when players get better over time - the average for new players stays at 2000 even though new players are much better now than then. This may very well be more significant than the retirement effect. But it's true that it's hard to say exactly how much the system is inflated/deflated.
The Wiki article does not demonstrate that because it does not take into account that an old star who rises to 2200 may fall back to 2000 and therefore have ALL of his points recycled into the system. For players like Ra and Casy, that is very close to being reality.
Furthermore, there is constant addition of new blood. Most of the progamers who actively participate in leagues today are post-LYH (Boxer) era and the few exceptions, players like ChRh, Yellow and Reach, are below 2000 in 1v1 rating.
Most of today's progamers are post-LYY (Nada) era even.
The fact that Boxer's ELO peak is so low despite his incredible success (AND streakiness -- something which the ELO system favors) is itself proof of inflation since nobody else from that era had an ELO close to his. I doubt anyone other than Boxer surpassed 2200 until Nada came along. By then new blood had come along and things began to change. Players from the LYH era lost momentum against the LYY era stars such as Ra and Xellos and Chojja -- some of whom had longevity compared to the LYH era stars.
It's not until the post-LYY era that ELO ratings really started to level out. At that point, enough of the game had been invented so that old stars could remain near the top for quite some time.
Also, a system that accurately measures SC skill SHOULD feature some bias. The question is not whether or not it should be biased, but in what areas it should be biased and how much. The reasons for this are 1. the maps themselves are inherently biased, so winning as T on a T-favoring map is less impressive than winning as T on a map bad for T -- but it should also be taken into consideration that a player may specialize on a map that other players of his race are bad at, like Anytime on Nemesis or Flash on Katrina, and 2. SL games are given higher priority than PL games and should therefore count for more. In particular, a player still alive in SL who faces a player no longer alive in SL is at a disadvantage, in that the PL-only player may fine-tune builds for the maps his team wants him to work on and then use them on the SL player in PL, but the SL player is better off saving creative builds for SL.
Anyway, I think we can agree that there is no 100% satisfying way to rate players.
|
I'm sure someone already said this, but Elo has special conditions depending on who's running it. I'm sure KeSPA has something special they use, and so does TLPD. For chess, at least, I believe it was:
//Players below 2100 -> K factor of 32 used
//Players between 2100 and 2400 -> K factor of 24 used
//Players above 2400 -> K factor of 16 used
Also, because you're using a fixed k, there is no inflation. You worry about inflation when someone with a score below 2100 wins over a guy rated over 2100, because the different k values give different results: the lower-rated player may get up to +32 to his score, but the higher-rated player only loses 24 at most, and thus inflation.
But then again, I think the current chess league uses something like k = bleh until a certain number of games.
Also, I'm amazed you could do all that, considering your Elo calculations had to be time-accurate to get proper ratings.
Anyway, what did you use to do all this? Excel?
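The point about fixed versus tiered k can be checked with a few lines of Python. This is only a sketch: the tier boundaries are the ones recalled in the post above (not verified against any federation's rules), and the ratings 2050 and 2150 are made-up example values.

```python
def expected(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def tiered_k(rating):
    """K tiers as recalled in the post above (an assumption, not verified)."""
    if rating < 2100:
        return 32
    if rating <= 2400:
        return 24
    return 16

def fixed_k(rating):
    """Fixed k = 25, as used for the ratings in this thread."""
    return 25

def update(winner, loser, k_for):
    """Return new (winner, loser) ratings; each side applies its own K."""
    surprise = 1.0 - expected(winner, loser)
    return winner + k_for(winner) * surprise, loser - k_for(loser) * surprise

def net_points(winner, loser, k_for):
    """Points created (positive) or destroyed (negative) by one game."""
    return sum(update(winner, loser, k_for)) - (winner + loser)

upset_tiered = net_points(2050, 2150, tiered_k)     # lower-rated player wins
upset_fixed = net_points(2050, 2150, fixed_k)
favourite_tiered = net_points(2150, 2050, tiered_k)  # higher-rated player wins
print(upset_tiered, upset_fixed, favourite_tiered)
```

With a fixed k every game is zero-sum, so the pool of points only changes when players enter or leave. With the tiers above, an upset across the 2100 boundary adds points to the pool, while a win by the higher-rated player removes some, so under these assumptions the tiers break conservation in both directions rather than only causing inflation.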
|
question, what's the formula for the 'big bang' points. everyone starts out evenly at a certain number?
|
On July 11 2008 03:07 Mortality wrote: The Wiki article does not demonstrate that because it does not take into account that an old star who rises to 2200 may fall back to 2000 and therefore have ALL of his points recycled into the system. For players like Ra and Casy, that is very close to being reality. Furthermore, there is constant addition of new blood. Most of the progamers who actively participate in leagues today are post-LYH (Boxer) era and the few exceptions, players like ChRh, Yellow and Reach, are below 2000 in 1v1 rating.
I think you misunderstood what I was trying to say. Sure, players retiring with an elo of less than 2000 do cause inflation, but the averages I posted earlier (1955 average for retired vs 2015 average for active) show that this isn't a huge effect. On the other hand, deflation (in the sense of a certain fixed rating representing higher and higher skill as time progresses) occurs because on average, Starcraft players keep getting better, but the average number of points available barely increases.
Boxer started with 2000 points and so does a new player, but a new player would kick 1999 Boxer's ass very easily, which means that the new player should actually have a rating of 2300+. This easily overwhelms the retirement effect that you're talking about, so that the system is likely actually significantly deflated in terms of absolute skill measurement. Basically, there aren't enough points to go round to represent the greater average skill of modern players compared to old ones.
It might be better to do things like chess and use a provisional rating for new players instead of a fixed 2000 rating, but that would make the calculation more complicated. I might look it up and try to implement such a system if it doesn't seem too bad, to see what the effect on the results is.
|
On July 11 2008 03:46 oneofthem wrote: question, what's the formula for the 'big bang' points. everyone starts out evenly at a certain number?
Yes, in this version of the system everyone starts at 2000.
|
On July 11 2008 03:13 IzzyCraft wrote: Anyways what did you use to do all this excell?
I used a Ruby script. First I downloaded the relevant parts of TLPD as HTML files with wget and cleaned up the results with an HTML-to-text converter to make them easier to parse.
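The Ruby script itself isn't shown in this thread. For illustration only, the rating step described here (everyone starts at 2000, k = 25, a separate rating pool per matchup, peaks tracked, at least 20 games to be listed) might look roughly like the Python sketch below. The input format, the function names and the choice to track the peak from the very first game are all assumptions, not details from the original script.

```python
from collections import defaultdict

K = 25          # k value stated in the opening post
START = 2000.0  # everyone starts at 2000
MIN_GAMES = 20  # minimum games in a matchup to be listed

def expected(r_a, r_b):
    """Standard Elo expected score of a player rated r_a against r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def matchup_key(race_a, race_b):
    """'T' vs 'Z' and 'Z' vs 'T' share one rating pool, e.g. 'TvZ'."""
    return "v".join(sorted((race_a, race_b)))

def peak_ratings(games):
    """Compute peak per-matchup ratings.

    games: chronologically ordered tuples
           (winner, winner_race, loser, loser_race)  # hypothetical format
    Returns {(matchup, player): peak} for players with enough games.
    """
    rating = defaultdict(lambda: START)
    peak = defaultdict(lambda: START)
    played = defaultdict(int)
    for winner, w_race, loser, l_race in games:
        mu = matchup_key(w_race, l_race)
        w, l = (mu, winner), (mu, loser)
        delta = K * (1.0 - expected(rating[w], rating[l]))
        rating[w] += delta
        rating[l] -= delta
        for key in (w, l):
            played[key] += 1
            peak[key] = max(peak[key], rating[key])
    return {key: peak[key] for key in played if played[key] >= MIN_GAMES}
```

The games have to be fed in chronological order, since each update depends on both players' ratings at the time of the game.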
|
MyLostTemple
wow this is quite interesting. thank you!
|
Nice. I've been wondering if there was something like this out there.
|