|
Okay, Elo (IT'S A NAME, NOT AN ACRONYM)-like algorithms have been widely used in StarCraft, some better (like Aligulac), some worse (like TLPD Elo). A quick overview of how they work:
- Everyone has a score, an indication of their skill level.
- When you play someone, your score goes up if you win and down if you lose.
- How much it moves depends on your opponent's skill level and your own. Beating a really good player gains you more than beating a weak one.
- Variants add things like confidence factors to make the rating more reliable.
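For a rough feel of the basic update, here's a minimal sketch of plain Elo in Python (the K-factor of 32 and the ratings are just illustrative numbers):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """New ratings for A and B after one game."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)   # upsets move ratings a lot, expected results barely
    return rating_a + delta, rating_b - delta

# Beating a much stronger player gains far more than beating a weaker one:
print(update(1500, 1800, a_won=True))   # roughly (1527, 1773)
print(update(1500, 1200, a_won=True))   # roughly (1505, 1195)
```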
In theory, this system should stop you from Elo-farming, i.e. just picking weak opponents to beat. You gain so few points from beating players you know you can beat, and you lose so many from that one loss you will eventually suffer to some stupid cheese, that it's not worth it. In theory, Elo-like algorithms are the most accurate indication of skill we currently have.
Elo-like algorithms do implicitly assume a couple of things about the game they are applied to, though:
- One's skill can be compressed into a single number with accurate results; skill in this sense is transitive.
- One's skill is static; it doesn't change all the time.
Now, in almost any game, the above are flat out not true. The issue is how close they are to true; we're obviously dealing with an approximation. The further these two assumptions are from the truth, the less reliable Elo-like algorithms become for a game.
And finally, Elo-like algorithms allow for putting an amount of 'weight' on a match; this is basically a multiplication factor on the number of points you may win or lose. Weight isn't hard science at all. The Glicko algorithm introduces a confidence factor in someone's skill: if the system is less confident in a player's skill, the weight of any matches with that player is reduced.
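For a concrete feel of that, here's the attenuation function g(RD) from Glickman's Glicko paper, which scales down a game's impact based on the opponent's rating deviation (the sample RD values below are just illustrative):

```python
import math

Q = math.log(10) / 400.0   # Glicko scaling constant

def g(rd: float) -> float:
    """The less confident the system is in a rating (larger RD), the
    closer g(RD) gets to 0 and the less that opponent's games count."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q * Q * rd * rd / math.pi ** 2)

print(g(30.0))    # ~0.995: a well-established opponent counts at nearly full weight
print(g(350.0))   # ~0.67:  a brand-new, uncertain opponent is heavily discounted
```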
TLPD Elo, why it's bad.
TLPD Elo simply took the weighting used for chess matches. I have long suspected this, and someone on reddit claiming to be involved in the original implementation confirmed it. They may be lying, but people don't just go on the internet to do that. The chess weighting is pretty much a mistake for StarCraft. The weight of chess matches is very high because the majority of professional chess matches end in a draw. Chess is at this moment fast approaching its skill ceiling: the majority of chess analysts believe that when both white and black play a perfect game, the game should draw. Chess professionals are ever closing in on this perfect game, so a very large share of chess matches ends in a draw, eventually resulting in the feared state of chess called 'draw death'. High-level chess matches can be Bo20s to overcome this. A Bo5 in chess would actually be pretty meaningless: the first player to take a game can draw the remaining games quite easily, assuming they aren't all draws to begin with.
For this reason, TLPD Elo gives way too many points for each win and deducts way too many for each loss, which leads to the top player changing pretty much every time I click on it. That is not the idea of Elo: it's supposed to give a more constant view of who is the best, and you're not supposed to automatically be the best after having won one tournament. It's about persistent results. But this is how the TLPD Elo system works, which makes it pretty useless in my opinion.
SC2Charts, why it is better
The SC2Charts algorithm claims to have had help from none other than Mark Glickman to adjust the algorithm to StarCraft, and it shows. The best player does not constantly rotate on that site and it seems all-around more reliable. It uses Glickman's Glicko algorithm, which expands Elo with a confidence factor: the system maintains a confidence variable for someone's skill level that says how confident the system is in the accuracy of that skill estimate. If a player keeps having results the system does not expect, the confidence goes down; otherwise it goes up. The SC2 MMR system is much like this, in fact.
Aligulac
Aligulac offers the novel idea of simply giving people an Elo (Glicko) rating per matchup and making the total score the average of the three. There are certainly arguments in favour, but also against: it could be argued that if 50% of pro players are Protoss, then one's *vP should be weighted more heavily in the average. But we'll take it. It allows the rating system to better express strong and weak matchups, which is instrumental in StarCraft. If you, say, take on Innovation in a PvT, it uses Innovation's TvP rating instead of his average one.
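A rough sketch of what that per-matchup bookkeeping might look like (this mirrors the description above, not Aligulac's actual code; the names and numbers are made up):

```python
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    race: str   # "P", "T" or "Z"
    # One rating per matchup, e.g. a Terran keeps separate vP, vT and vZ ratings.
    vs: dict = field(default_factory=lambda: {"P": 1500.0, "T": 1500.0, "Z": 1500.0})

    @property
    def overall(self) -> float:
        """The headline score: a plain average of the three matchup ratings."""
        return sum(self.vs.values()) / len(self.vs)

def win_probability(p1: Player, p2: Player) -> float:
    """Predict p1 beating p2 from the matchup-specific ratings,
    e.g. Innovation's TvP against a Protoss opponent's PvT."""
    r1, r2 = p1.vs[p2.race], p2.vs[p1.race]
    return 1.0 / (1.0 + 10 ** ((r2 - r1) / 400.0))

innovation = Player("Innovation", "T", {"P": 1720.0, "T": 1650.0, "Z": 1690.0})
protoss = Player("SomeProtoss", "P", {"P": 1600.0, "T": 1640.0, "Z": 1610.0})
print(innovation.overall)                    # ~1686.7, the number shown on a ranking
print(win_probability(innovation, protoss))  # uses TvP vs PvT, not the averages
```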
Why Elo-like ratings might not be suitable for StarCraft at all
Like I said, the effectiveness of Elo relies on a couple of assumptions. These are never fully true, but they need to be 'true enough'. If the skill level of a single player rapidly fluctuates up and down, Elo becomes more difficult to use; if it is a perfect constant, Elo will very accurately converge upon it. I would not be surprised if StarCraft skill, with players being 'figured out' at some point, actually moves around too rapidly in comparison to chess for Elo to be of significant use. Elo is not designed like a WCS-points-like system where you simply get points for a win. Elo can in fact deduct points if you win a finals 4-1 but the system expected you to win 4-0. Elo is designed to converge on your true skill level; you can lose points even if you win a tournament, because from your current rating the system expected you to win it in an even more convincing way.
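To make the 4-1 example concrete, here is a minimal sketch under plain per-game Elo (the ratings and K-factor are invented, and the opponent's rating is held fixed across the series for simplicity): a heavy favourite who 'only' goes 4-1 scores 0.8 per game against an expectation of roughly 0.91, so the updates net out negative.

```python
def expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def series_delta(r_a: float, r_b: float, wins: int, losses: int, k: float = 32.0) -> float:
    """Net rating change for A over a series, treating each game as an
    Elo update but keeping the opponent's rating fixed for simplicity."""
    games = wins + losses
    expected_total = games * expected_score(r_a, r_b)
    return k * (wins - expected_total)

# A 400-point favourite (~91% per game) who wins the final 'only' 4-1:
print(series_delta(1900, 1500, wins=4, losses=1))   # about -17: the winner loses points
```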
Most importantly, however, it is clear that one's skill is hard to compress into a single number in StarCraft. Aligulac does it in three numbers, which is a good start. Regardless, Aligulac's model cannot express the 'Polt > MMA > Mvp > Polt' relationship in a mirror matchup, and these things are quite common in StarCraft: people's styles matching up well against another player's. Elo relies on the idea that if you are likely to beat player X, and X is likely to beat Y, then you must also be likely to beat Y. Something seemingly far from true in StarCraft.
Aligulac gives us information about the accuracy of its predictions. This is misleading: it gives you the accuracy of its predictions of games in general (I assume; it doesn't tell you exactly what the graph shows, but this is the only thing with enough sample size). As in, if it predicted Life to beat Taeja with 60% and Life wins, and then it gives HongUn a 40% chance against LiveForever and LiveForever wins, the statistics count both as accurate predictions.
That's not what we are after. We already know that it works well when you take a large sample of players. What we're interested in is individual results between players. Say, for the sake of argument, the system gives Life a 60% chance to beat Taeja; what we want to see is whether that's true. We want 100 games between the two and to see if roughly 60 of them go to Life. The point is that this isn't feasible to ever test: you are never going to get a significant number of games between two specific players.
We can know Aligulac is accurate when taken over an entire player pool. But it's impossible to test the actual accuracy of the predictions between two specific players, because two specific players simply do not meet often enough in StarCraft, within a time frame before their ratings change too much, for it ever to be meaningful to test this. And this is exactly the thing we are interested in, because it is the thing the doubters of Elo for StarCraft raise: that Elo is insufficiently capable of expressing the idea that in StarCraft skill is not even approximately transitive and that people have each other's number too often. If we could test this, the measured accuracy would quantify exactly how well the system deals with that situation.
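To illustrate what I assume the site's accuracy graph is measuring, here's a sketch of an aggregate calibration check: bucket every prediction by its predicted probability and compare to the observed win rate per bucket. A system can look well calibrated here while still being off for any specific pairing, because a single pairing only contributes a handful of games to its bucket.

```python
from collections import defaultdict

def calibration(predictions):
    """predictions: iterable of (predicted_win_prob, actually_won) pairs,
    pooled over the whole player base.  Returns the observed win rate for
    each 10%-wide probability bucket, to compare against the bucket's label."""
    buckets = defaultdict(list)
    for prob, won in predictions:
        buckets[min(int(prob * 10), 9)].append(1 if won else 0)
    return {b / 10: sum(outcomes) / len(outcomes)
            for b, outcomes in sorted(buckets.items())}

# e.g. calibration([(0.6, True), (0.6, False), (0.62, True), ...]) tells you
# whether '60%' predictions really win about 60% of the time in aggregate,
# but says nothing about Life vs Taeja specifically.
```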
Elo Islands
Elo is not an absolute ranking; it's relative to a pool of players. An Elo Island is a situation that occurs when some players inside the larger pool play each other statistically way more often than they play all the other players. ProLeague is an example of an Island; the three WCS regions also arguably create Islands. What this leads to is that the rating of players in such a case becomes more and more an indication of relative strength within that Island, not within the entire pool. A perfectly isolated Island essentially just becomes a new player pool.
Elo relies on the assumption that there is sufficient bleed-through to compare ratings, and it has often been argued that there isn't. In hindsight, many people now realize that the initially high ratings of good KeSPA players were mostly because of the ProLeague Island. Obviously the ProLeague skill level is high, but it's not as high as the GSL/OSL. So if you play a lot in ProLeague, your Elo rating will appear higher than it should compared to a GSL player, because KeSPA players play in ProLeague quite a lot more often than eSF players do in GSTL. This led to a situation where their ratings suggested they would beat a lot of GSL players, which they couldn't live up to.
The various foreign scenes are also Islands, which explains why all these rating systems have a lot of foreigners placed higher than people expect them to be: their rating is more relative to other foreigners than it is to the entire world.
I have long thought about improving Elo for StarCraft by giving each match a 'distance' factor, which specifies how isolated players are from each other. If players play each other often, or often play players who play the other player, then the distance is low and the weight should be low. But if they are in completely different Islands, then the distance is high, and so is the weight. This would speed up the bleed-through of points between different Islands.
|
in an international scene with play taking place across the internet, I fail to see how your 'distance' factor provides any benefit at all
|
On October 04 2013 13:44 itsjustatank wrote: in an international scene with play taking place across the internet, I fail to see how your 'distance' factor provides any benefit at all

Unless I am mistaken, I believe it's a metaphorical distance. As in, how often players are likely to meet, or in other terms, how much overlap there is in their "islands" when those islands are expressed as a Venn diagram. More overlap might be something like both playing in the same tournaments and on the same ladder region, and thus statistically being more likely to encounter and play one another. More overlap = less "distance".
|
that's not something easily modeled. at the end of the day, all Elo-derivative systems simply attempt to rank players based on performance. it's in prediction where things really begin to fall apart.
|
Of course distance is metaphorical. It's a function of how likely players are to meet, or to meet people who meet them. And it's actually quite easy to model. Take this approach:
A: The direct distance between two players is simply a normalization of what percentage of all their matches they played against each other. In that sense, a distance of 1 between Goody and TLO means they play each other exactly as often as you would expect any two players to, on average, across the globe. A distance of 0.5 means they play each other twice as often as that expected value.

B: Compute the shortest distance between two players and use that as the effective distance. That is to say, if Naniwa's direct distance to Flash is 9, but Naniwa has a distance of 3 to DRG and DRG has 0.5 to Flash, the effective distance between Flash and Naniwa would be 3.5. You can apply any amount of modification depending on how many players you had to travel through to arrive at the shortest distance (this is again not hard science, but something you fiddle with until it seems to work).

C: Use the shortest distance as a weighting tool in determining the importance of matches. The longer the distance, the more the match counts. This serves to more quickly disperse the appropriate points between Islands. A rough sketch of the whole scheme follows below.
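Here is that sketch in Python (the expected share, the fallback distance for disconnected players, and the weight function are all arbitrary knobs to fiddle with, and the hop-count modification from B is left out):

```python
import heapq
from collections import defaultdict

def direct_distance(games_between: int, games_a: int, games_b: int,
                    expected_share: float = 0.01) -> float:
    """Step A: normalised direct distance.  expected_share is the fraction of a
    player's games an 'average' pair would play against each other (a knob).
    1.0 means the pair meets exactly as often as expected, 0.5 twice as often."""
    share = games_between / min(games_a, games_b)
    return expected_share / share if share > 0 else float("inf")

def effective_distance(direct: dict, start: str, goal: str, default: float = 10.0) -> float:
    """Step B: Dijkstra over the direct-distance graph.  The effective distance
    is the cheapest path, possibly through intermediate players; if the two
    players sit in fully disconnected Islands, fall back to a high default."""
    graph = defaultdict(list)
    for (a, b), d in direct.items():
        graph[a].append((d, b))
        graph[b].append((d, a))
    best, queue = {start: 0.0}, [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > best.get(node, float("inf")):
            continue
        for step, nxt in graph[node]:
            if d + step < best.get(nxt, float("inf")):
                best[nxt] = d + step
                heapq.heappush(queue, (d + step, nxt))
    return default

def match_weight(distance: float, base_k: float = 32.0) -> float:
    """Step C: the longer the distance, the heavier the match counts."""
    return base_k * max(1.0, distance)

# The Naniwa/DRG/Flash example from above: direct 9, but 3 + 0.5 via DRG = 3.5.
direct = {("Naniwa", "Flash"): 9.0, ("Naniwa", "DRG"): 3.0, ("DRG", "Flash"): 0.5}
print(effective_distance(direct, "Naniwa", "Flash"))                 # 3.5
print(match_weight(effective_distance(direct, "Naniwa", "Flash")))   # 112.0
```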
As in, say that there is an EU Island and a KR Island. Vortix leads the EU Island and consequently seems to have a disproportionately high number of points. The rare times he faces off against a KR player, the distance is high, so the match carries more stake. Vortix is expected to lose the match and will lose way more points than he normally would (because he's actually worse than his rating implies). Vortix then takes his loss back to the EU Island, and by beating EU players again, they too are hit by it more easily, thereby more quickly giving the EU and KR Islands the appropriate amounts of points for their skill level.
|
We are always open to people's feedback, as long as it is conveyed in a civilized manner
Good read.
|
Last I read, starcraft uses a bayesian system, not elo.
|
On October 04 2013 16:52 seequeue wrote: Last I read, starcraft uses a bayesian system, not elo.

From what the outside analysis reveals, the SC2 MMR uses a Bayesian extension of Glicko which is essentially Microsoft TrueSkill. This still falls under 'Elo-like algorithms', though, in my opinion. 'Elo' itself is actually the most primitive in this class of algorithms and many improvements have been made to address its flaws.
TrueSkill itself is indeed unique in that it does not rely on skill being at least somewhat constant, I'll give you that; it is in fact specifically designed around the idea that it is not.
|
very enlightening read! I was always wondering why the TLPD top list changed that fast. ty!
|
On October 04 2013 17:05 SiskosGoatee wrote: On October 04 2013 16:52 seequeue wrote: Last I read, starcraft uses a bayesian system, not elo. From what the outside analysis reveals, the SC2 MMR uses a Bayesian extension of Glicko which is essentially Microsoft TrueSkill. This still falls under 'Elo-like algorithms', though, in my opinion. 'Elo' itself is actually the most primitive in this class of algorithms and many improvements have been made to address its flaws. TrueSkill itself is indeed unique in that it does not rely on skill being at least somewhat constant, I'll give you that; it is in fact specifically designed around the idea that it is not.
I can confirm this.
However, reading your post still does not explain why we should change from Elo-like systems.
If you wanted to say "Elo systems are not perfect", then yes, I agree with you.
But it's by far the best system we have to rate a player. It is a mathematically well-founded system that does not apply perfectly to reality (see the transitivity problem of Polt, MMA and Mvp), but it applies damn well most of the time.
|
I have long thought about improving Elo for StarCraft by giving each match a 'distance' factor, which specifies how isolated players are from each other. If players play each other often, or often play players who play the other player, then the distance is low and the weight should be low. But if they are in completely different Islands, then the distance is high, and so is the weight. This would speed up the bleed-through of points between different Islands.

I like this idea, and I've had similar thoughts before. One problem is that it's really not so easy to accurately determine which island a player belongs to before he/she has already played several games. So I gather you need to pick some initial default choice, probably based on nationality. Remember, we calculate ratings in real time; we don't wait for a player to accumulate 20 games before calculating. So later on we need to assign them to an algorithmically chosen island. Will this impact their earlier games as well? Presumably island membership has to be a function of time. It's easy to come up with a static distance matrix, for example based on graph algorithms (see for example this older post of mine), but it's not totally obvious to me how to make it change.
Maybe once I've weeded out some of the latest bugs in Aligulac I can try experimenting with this. If you want to play around with my code, just let me know.
Some other things that I have gathered from experience:
- You may not need a full distance matrix, just distances between islands is probably fine.
- You may not even need many islands. The effect is likely to be sparse, i.e. after 3-4 islands the effect of further ones might be negligible. In fact you can probably get most of the change you want from just identifying Koreans and foreigners.
- Other things should be able to depend on island membership too. I'm specifically thinking of initial rating for new players, which is something we already do to an extent. This has a big effect.
|
On October 04 2013 18:12 fezvez wrote: On October 04 2013 17:05 SiskosGoatee wrote: On October 04 2013 16:52 seequeue wrote: Last I read, starcraft uses a bayesian system, not elo. From what the outside analysis reveals, the SC2 MMR uses a Bayesian extension of Glicko which is essentially Microsoft TrueSkill. This still falls under 'Elo-like algorithms', though, in my opinion. 'Elo' itself is actually the most primitive in this class of algorithms and many improvements have been made to address its flaws. TrueSkill itself is indeed unique in that it does not rely on skill being at least somewhat constant, I'll give you that; it is in fact specifically designed around the idea that it is not. I can confirm this. However, reading your post still does not explain why we should change from Elo-like systems. If you wanted to say "Elo systems are not perfect", then yes, I agree with you. But it's by far the best system we have to rate a player. It is a mathematically well-founded system that does not apply perfectly to reality (see the transitivity problem of Polt, MMA and Mvp), but it applies damn well most of the time.

Well, I'm saying they are worse for StarCraft than for other sports. I've actually long mused about a wholly different system which does not update ratings after a game, but rather keeps a record of all games played, as well as the dates they are played at, and extracts a rating from that. The problem is that it might be computationally exorbitant. It actually defines someone's 'true skill' in terms of an infinite series, and it's capable of (no doubt unreliably) expressing a Polt > MMA > Mvp > Polt relationship.
On October 04 2013 18:21 TheBB wrote: I have long thought about improving Elo for StarCraft by giving each match a 'distance' factor, which specifies how isolated players are from each other. If players play each other often, or often play players who play the other player, then the distance is low and the weight should be low. But if they are in completely different Islands, then the distance is high, and so is the weight. This would speed up the bleed-through of points between different Islands. I like this idea, and I've had similar thoughts before. One problem is that it's really not so easy to accurately determine which island a player belongs to before he/she has already played several games. So I gather you need to pick some initial default choice, probably based on nationality. Remember, we calculate ratings in real time; we don't wait for a player to accumulate 20 games before calculating. So later on we need to assign them to an algorithmically chosen island. Will this impact their earlier games as well? Presumably island membership has to be a function of time. It's easy to come up with a static distance matrix, for example based on graph algorithms (see for example this older post of mine), but it's not totally obvious to me how to make it change. Maybe once I've weeded out some of the latest bugs in Aligulac I can try experimenting with this. If you want to play around with my code, just let me know. Some other things that I have gathered from experience: - You may not need a full distance matrix, just distances between islands is probably fine.
- You may not even need many islands. The effect is likely to be sparse, i.e. after 3-4 islands the effect of further ones might be negligible. In fact you can probably get most of the change you want from just identifying Koreans and foreigners.
- Other things should be able to depend on island membership too. I'm specifically thinking of initial rating for new players, which is something we already do to an extent. This has a big effect.
Well, the distance strategy I outlined in the other post does not require you to put a player in an Island.
The strategy is quite simple:
A: The direct distance between two players is defined simply as the ratio of how many times they played each other to how many games they played in total, probably normalized for easy reading. So the direct distance between two players is 1 if they played each other the average amount you'd expect, and 2 if they played each other at half that rate.
B: The 'actual distance' is simply the shortest path. So even if the direct distance is, say, 5, but you can find a shorter path that comes down to 1.5 by going through another player, you use that. This also removes the problem of having an 'undefined' distance.
C: If no path at all can be constructed, then a rather high default value is used.
In this sense it is quite possible for a Korean player to have a high distance to other Koreans, simply by mostly playing against foreigners in some way.
|
On October 04 2013 18:56 SiskosGoatee wrote: On October 04 2013 18:12 fezvez wrote: On October 04 2013 17:05 SiskosGoatee wrote: On October 04 2013 16:52 seequeue wrote: Last I read, starcraft uses a bayesian system, not elo. From what the outside analysis reveals, the SC2 MMR uses a Bayesian extension of Glicko which is essentially Microsoft TrueSkill. This still falls under 'Elo-like algorithms', though, in my opinion. 'Elo' itself is actually the most primitive in this class of algorithms and many improvements have been made to address its flaws. TrueSkill itself is indeed unique in that it does not rely on skill being at least somewhat constant, I'll give you that; it is in fact specifically designed around the idea that it is not. I can confirm this. However, reading your post still does not explain why we should change from Elo-like systems. If you wanted to say "Elo systems are not perfect", then yes, I agree with you. But it's by far the best system we have to rate a player. It is a mathematically well-founded system that does not apply perfectly to reality (see the transitivity problem of Polt, MMA and Mvp), but it applies damn well most of the time. Well, I'm saying they are worse for StarCraft than for other sports. I've actually long mused about a wholly different system which does not update ratings after a game, but rather keeps a record of all games played, as well as the dates they are played at, and extracts a rating from that. The problem is that it might be computationally exorbitant. It actually defines someone's 'true skill' in terms of an infinite series, and it's capable of (no doubt unreliably) expressing a Polt > MMA > Mvp > Polt relationship.
This sounds a little similar to Rémi Coulom's Whole-History Rating which is said to produce better predictions than Elo, Glicko, TrueSkill, etc.
|
Reading that paper, it's not entirely what I had in mind, but it has the same goals, and it's probably computationally cheaper than my idea, which actually defines someone's rating via a form of mutual recursion in the ratings of others. The equation is then solved algorithmically by re-running and re-running until you get close enough.
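To illustrate just the 'ratings defined through each other, solved by re-running until it settles' part (not my actual scheme), here is the standard minorisation-maximisation iteration for a Bradley-Terry model fitted to a whole game history. Note that this particular model is still transitive, so the perfectly symmetric Polt/MMA/Mvp cycle below comes out dead even, which is exactly the limitation being discussed.

```python
from collections import defaultdict

def bradley_terry(wins: dict, iters: int = 200) -> dict:
    """wins[(i, j)] = number of games i won against j.  Returns a strength
    gamma per player, where the model predicts P(i beats j) as
    gamma_i / (gamma_i + gamma_j).  Each rating is defined in terms of all
    the others, so we simply re-run the update until it stops moving."""
    players = {p for pair in wins for p in pair}
    games = defaultdict(int)        # total games per ordered pair
    total_wins = defaultdict(int)
    for (i, j), w in wins.items():
        games[(i, j)] += w
        games[(j, i)] += w
        total_wins[i] += w
    gamma = {p: 1.0 for p in players}
    for _ in range(iters):
        new = {}
        for i in players:
            denom = sum(n / (gamma[i] + gamma[j])
                        for (a, j), n in games.items() if a == i)
            new[i] = total_wins[i] / denom if denom > 0 else gamma[i]
        mean = sum(new.values()) / len(new)
        gamma = {p: v / mean for p, v in new.items()}   # pin the overall scale
    return gamma

# A perfectly symmetric cycle (Polt > MMA > Mvp > Polt, each 3-1): all three
# strengths come out at 1.0, so the whole-history fit still can't express the circle.
history = {("Polt", "MMA"): 3, ("MMA", "Polt"): 1,
           ("MMA", "Mvp"): 3, ("Mvp", "MMA"): 1,
           ("Mvp", "Polt"): 3, ("Polt", "Mvp"): 1}
print(bradley_terry(history))
```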
|
Human psychology and the resulting behaviors cannot be expressed, measured or quantified by numbers.
Your whole argumentation about the Polt > MMA > Mvp circle is moot.
|
On October 04 2013 22:11 Otolia wrote: Human psychology and the resulting behaviors cannot be expressed, measured or quantified by numbers.
Your whole argumentation about the Polt > MMA > Mvp circle is moot.
It's so fun to try though.
|
On October 04 2013 22:11 Otolia wrote: Human psychology and the resulting behaviors cannot be expressed, measured or quantified by numbers.
Your whole argumentation about the Polt > MMA > Mvp circle is moot.

Great, ehh, but that's not the argument. The OP concedes that any such expression is going to be inaccurate; it just argues that for chess the approximation is accurate enough to be meaningful, but for StarCraft it might not be.
|