Designing ELO System

llatszer
Joined September 2010
United States, 70 Posts
August 09 2011 22:38 GMT
#1
Hey TeamLiquid! I wanted to show you the new ELO system I've built and ask for your input. First, the program reads text files, each listing all the games of a particular tournament; it then replays those games and recalculates ELO after each result. I know other ELO systems exist, but I wanted to see whether I could build one that finds out, with some degree of accuracy, who the best players are. Here is a screenshot of the tournament files, and throughout the post I'll show you results from different settings.

[Screenshot: the tournament input files]

The main reason I came here is to get input on how the system changes ratings. The system starts by taking a game, in this case, let's say MarineKing vs. Zenio, as shown in the screenshot. Before I describe the system, I should note that I took the whole thing straight from the Wikipedia page on the ELO system. The system first finds an expected score given by

$$E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}$$

where $E_A$ is the expected score for player A, $R_B$ is the ELO of player B, and $R_A$ is the ELO of player A. It then calculates the new ELO with

$$R_A' = R_A + K\,(S_A - E_A)$$

where $R_A'$ is the new ELO of player A, $R_A$ is player A's old ELO, $K$ is the weighting factor, $S_A$ is player A's actual score (1 for a win, 0 for a loss), and $E_A$ is player A's expected score. The problem I'm having right now is that traditional chess ELO is hard to apply because of how StarCraft competition works. In the chess system, players start with a high weighting factor for their first several games to catapult them to where they should be. The weighting factor is then lowered so that ratings are not extremely volatile. To me this doesn't work, because a player's skill relative to every other player IS volatile. To test different weighting factors, I set a base rating of 1200 and ran K factors of 20, 40, and 80, inputting only premier StarCraft events (MLG, GSL, DreamHack, IEM, etc.). These are the resulting top 10 players for each:

[Screenshots: top 10 players for K = 20, K = 40, and K = 80]

It seems that a very high weighting factor puts the players who have done well most recently at the top of the ELO list. I'm not sure that's desirable, but it does seem like NesTea should be leaps and bounds above everybody else in ELO, and a high K certainly does that. A higher weighting factor also stretches the gap, so the best players end up with a much larger ELO than average or new players. I think that's desirable too, but I really haven't thought through how a higher weighting factor affects players with a lower ELO.
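
A minimal Python sketch of the replay loop described above, assuming each tournament file boils down to a chronological list of (winner, loser) pairs. `expected_score` and `replay` are hypothetical names; the two formulas are the ones quoted above from the Wikipedia-based system, and the three-game history at the bottom is made up purely to show the effect of K.

```python
from collections import defaultdict


def expected_score(r_a, r_b):
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def replay(games, k=20.0, base=1200.0):
    """Replay (winner, loser) pairs in order and return the final ratings.

    Every player starts at `base`. Each game applies R'_A = R_A + K(S_A - E_A)
    to both players, with S = 1 for the winner and 0 for the loser.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in games:
        e_w = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_w)  # winner's gain equals the loser's drop
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


games = [("A", "B"), ("A", "C"), ("C", "B")]  # made-up history
for k in (20, 40, 80):
    print(k, {p: round(r) for p, r in replay(games, k).items()})
```

Because the winner gains exactly what the loser drops, the total number of points stays fixed; a larger K tends to spread those points further apart, which is the gap-stretching effect described above.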

Another point is how I calculate the change in ELO. Right now each match counts as a single game and has the same effect no matter what, so a best-of-one carries the same weight as a best-of-seven. I was thinking there should be a weight multiplier based on series length: a best-of-seven would have a multiplier of four, whereas a best-of-one would have a multiplier of one.
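
A sketch of the proposed series multiplier. The post only fixes Bo1 → 1 and Bo7 → 4; reading the multiplier as "games needed to win the series" (so Bo3 → 2 and Bo5 → 3) is an assumption, as are the function names and the K = 20 default.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def series_multiplier(best_of):
    """Assumed rule: wins needed to take the series (Bo1 -> 1, ..., Bo7 -> 4)."""
    if best_of < 1 or best_of % 2 == 0:
        raise ValueError("best_of must be a positive odd number")
    return (best_of + 1) // 2


def series_delta(r_winner, r_loser, best_of, k=20.0):
    """Points the series winner gains (and the loser drops) when the whole
    series is scored as one result, scaled by series_multiplier."""
    return series_multiplier(best_of) * k * (1.0 - expected_score(r_winner, r_loser))


print(series_delta(1200, 1200, 1), series_delta(1200, 1200, 7))
```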

Another open question is whether to include the MLG Open bracket, small tournaments and cups, team leagues, show matches, and so on. I could add them all and weight them the same way longer series might be weighted: premier events get a weight of 1, major events 0.9, small events 0.7, and then I could work out how well those weights reflect a player's actual skill. I also considered weighting by prize money, but for events like MLG, prize money does not correlate directly with the overall talent at the event.
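
A sketch of the tiered event weights, applied as a multiplier on K. The tier names and the 1 / 0.9 / 0.7 values come from the paragraph above; `EVENT_WEIGHTS`, `weighted_delta`, and the default K = 20 are illustrative assumptions.

```python
# Tier weights proposed in the post; the dict and names are hypothetical.
EVENT_WEIGHTS = {"premier": 1.0, "major": 0.9, "small": 0.7}


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def weighted_delta(r_winner, r_loser, tier, k=20.0):
    """Points exchanged for one game, with K scaled by the event's tier."""
    return EVENT_WEIGHTS[tier] * k * (1.0 - expected_score(r_winner, r_loser))


for tier in EVENT_WEIGHTS:
    print(tier, weighted_delta(1200, 1200, tier))
```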

Another issue is that some events, such as NASL, run over a long period, and right now I group all of their games by the event's start date. I could probably fix this for past tournaments, and it shouldn't be a problem for future events. The last issue I can think of right now is that some ladder systems have a kind of protection for players at the very top: if a highly rated player loses to a very low-rated player, their ELO does not plummet. HoN comes to mind, but I think that has to do with team ratings. Does anyone know what this is called? Do you think it should be implemented? One problem with it is that highly rated players could stay at the top even when they often lose to lower-rated players.
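
Nobody in the thread identifies the mechanism being asked about, so this is only one hypothetical form of top-player protection, not how HoN or any specific ladder works: cap the points a player rated above some threshold can lose in a single game. The threshold (2000), the cap (10 points), and K = 40 are made-up values.

```python
PROTECT_ABOVE = 2000.0  # hypothetical rating threshold for protection
MAX_LOSS = 10.0         # hypothetical cap on points lost per game


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_with_protection(r_winner, r_loser, k=40.0):
    """Return (new winner rating, new loser rating).

    The winner always gains the full Elo amount; a loser at or above
    PROTECT_ABOVE loses at most MAX_LOSS points.
    """
    delta = k * (1.0 - expected_score(r_winner, r_loser))
    loss = min(delta, MAX_LOSS) if r_loser >= PROTECT_ABOVE else delta
    return r_winner + delta, r_loser - loss


print(update_with_protection(1500, 2100))  # upset against a protected player
print(update_with_protection(1500, 1900))  # same upset, no protection
```

Note the side effect: the winner still gains the full amount while the protected loser drops less, so points are no longer conserved and top ratings can drift upward, which is the concern raised at the end of the paragraph above.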

One last thing before I wrap up: how this differs from the TLPD ELO system. As far as I know, TLPD uses a base rating of 2000 with a K factor of 40 for a player's first several games (I think someone in IRC said 20) and 20 after that. I mention this so there is something to compare against. For now, my system does not change the weighting based on games played.
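
A sketch of a TLPD-style varying K as the post recalls it: base rating 2000, K = 40 for a player's first 20 games, K = 20 afterwards. These numbers are second-hand (the post itself hedges the 20-game figure), and the code is an illustration, not TLPD's actual implementation.

```python
from collections import defaultdict

BASE = 2000.0
K_NEW, K_ESTABLISHED, NEW_GAMES = 40.0, 20.0, 20  # as recalled in the post


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def k_for(games_played):
    """Higher K while a player is new, lower K once established."""
    return K_NEW if games_played < NEW_GAMES else K_ESTABLISHED


def replay_varying_k(games):
    """Replay (winner, loser) pairs, giving each player their own K."""
    ratings = defaultdict(lambda: BASE)
    played = defaultdict(int)
    for winner, loser in games:
        e_w = expected_score(ratings[winner], ratings[loser])
        # Each side moves by its own K, so the two changes need not match.
        ratings[winner] += k_for(played[winner]) * (1.0 - e_w)
        ratings[loser] -= k_for(played[loser]) * (1.0 - e_w)
        played[winner] += 1
        played[loser] += 1
    return dict(ratings)


print(replay_varying_k([("A", "B")]))
```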

That's everything I can think of for now. I wanted to bring this to TL because I'm trying to build a fairly accurate system for SC2, and I don't think I can do it alone, since I tend to overlook things. If you want me to test something, clarify something, or answer questions, feel free to ask.

For a thread on how TLPD's ELO works:

http://www.teamliquid.net/forum/viewmessage.php?topic_id=59138

And for a thread with some ideas on why ELO doesn't work:

http://www.teamliquid.net/blogs/viewblog.php?topic_id=241535
Not_That
Joined April 2011
287 Posts
Last Edited: 2011-08-10 00:42:23
August 10 2011 00:37 GMT
#2
This subject is close to my heart. I think SC2 could gain a lot from a good ELO-based system, for example for comparing any two players, checking who the world's top players are, or estimating tournament favorites.

I've read the threads you linked, and I must say I'm not convinced by the arguments for why an ELO system doesn't work for SC2. I'll briefly address them (his original points are the numbered lines):

1) ELO only works in one direction
2) ELO does poorly with small amounts of games


- We don't have to start the ranking today with everyone at the same rating and go from there. With over a year of SC2 results, we can start from any historical date we choose and update players' ratings from there, which helps with both points. For lesser-known players who have played only a few "ranked" tournament games, the rating will obviously reflect their actual skill poorly, but that problem seems unavoidable to me no matter which system you use (and yes, such players should start with a higher K factor to mitigate it).

3) ELO measures dominance

- I don't think this is as big a concern as the post makes it out to be. In his example, if he is dominating his local community and Flash is dominating the Korean scene, then Flash is dominating a group of higher-ranked players, so Flash's rating will end up higher. There are only so many points to gain from even a 90%+ win rate against lower-skilled opponents. You can see this with the top players on the ladder, who regularly face opponents weaker than themselves yet whose points stay in check: eventually, gaining few points per win and losing many per loss catches up with you and keeps you within a reasonable range of your opponents.
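
To make the "only so many points" argument concrete: over many games, a player's expected change per game is $K\,(p - E_A)$, where $p$ is his actual win rate, so his rating stops drifting once $E_A = p$. Solving the expected-score formula for the rating gap $D = R_A - R_B$ at that point gives the following (a worked derivation from the standard Elo formula, not a figure from the linked thread):

$$K\,(p - E_A) = 0 \;\Rightarrow\; \frac{1}{1 + 10^{-D/400}} = p \;\Rightarrow\; D = 400\,\log_{10}\frac{p}{1 - p}$$

For a 90% win rate, $D = 400\,\log_{10} 9 \approx 382$: a player who beats his usual opponents 90% of the time settles roughly 382 points above them rather than climbing indefinitely.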

The only thing to worry about is a disconnect between the Korean scene and the international one so large that the two sets of ratings behave almost separately. For example, if only 1% of games are between Koreans and non-Koreans, the ratings will be out of sync. The first solution I came up with was to give games between Koreans and non-Koreans greater significance (a higher K factor), but if there are very few such games, I don't think that will be enough. Perhaps the only other option is to manually shift the ratings of one region to better reflect actual skill. For example, if the data shows that 2000-rated Koreans perform like 2300-rated players when facing non-Koreans, then perhaps the Korean ratings as a whole need to be shifted up by 300 points. This is obviously very crude, but it's the best I can think of.
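
A crude sketch of the regional shift idea, assuming cross-region games are recorded as (Korean rating, foreign rating, Korean won) tuples. It inverts the expected-score formula to find the rating gap implied by the Korean players' overall score, then subtracts the gap they actually had; the difference is the suggested shift. Averaging gaps like this is a simplification of a proper performance rating, and `implied_gap` and `region_offset` are hypothetical names.

```python
import math


def implied_gap(score):
    """Rating gap R_A - R_B at which the Elo expected score equals `score`
    (the inverse of E_A = 1 / (1 + 10^((R_B - R_A) / 400)))."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def region_offset(cross_games):
    """Suggested shift for the Korean ratings as a whole.

    cross_games: list of (rating_kr, rating_foreign, kr_won) tuples,
    one per game played between the two regions.
    """
    n = len(cross_games)
    score = sum(1.0 for _, _, won in cross_games if won) / n
    mean_gap = sum(r_kr - r_f for r_kr, r_f, _ in cross_games) / n
    return implied_gap(score) - mean_gap


games = [(2000, 2000, True)] * 9 + [(2000, 2000, False)]  # made-up: 9-1 at equal ratings
print(round(region_offset(games)))  # 382
```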

4) We don't live in an ideal world...

- Obviously we have to make do with what we have, but I think it's enough. Top players have histories of dozens if not hundreds of games against various opponents. If anything, it's the tier-2 players who have played in only one or a few tournaments who pose the greatest challenge to the system, since the data on them is scarce.


A few more comments / questions regarding your post:

I'm not sure how many games you included in your example analysis (K = 20, 40, 80), but I suspect that if you include more games or go further back in time, your results will vary less. Besides the obvious cause (a larger K gives more weight to later results), I think part of the discrepancy is that all players start at 1200, which obviously doesn't represent their skill. The more games you include, the smaller the effect of this error becomes.

Regarding different factors for best-of-1, 3, 5, and 7, I think it's wrong to give each a different factor, at least the ones you mentioned. Consider the following:
You and I both have a rating of 1000. We face player X, who has a rating of 1500. I play him in a best-of-1, and you play him in a best-of-7. In any single game, each of us has a 5% chance to beat him (a made-up number for 1000 vs. 1500 ratings). That means I have a 5% chance to take the series from him, while you have closer to a 0.02% chance (if I still remember my statistics). Since some tournaments use Bo1, others use other formats, and some (or most) tournaments even use different series lengths at different stages, I suggest we only look at individual games and give each game the same factor. I'm not sure how other ELO systems, for example in chess tournaments, handle this; if anyone knows, I'd be curious to hear.
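
A quick check of the series numbers above, assuming games are independent with a fixed per-game win probability p. For what it's worth, the standard expected-score formula gives about 5.3% for a 500-point gap, so the "made-up" 5% is close. `series_win_prob` is a hypothetical helper; `math.comb` needs Python 3.8 or later.

```python
from math import comb


def series_win_prob(p, best_of):
    """Probability of winning a best-of-`best_of` series when each game is
    won independently with probability p."""
    need = (best_of + 1) // 2  # wins needed to take the series
    # Take the deciding game after exactly `losses` earlier losses
    # (0 .. need - 1): C(need - 1 + losses, losses) * p^need * (1 - p)^losses.
    return sum(
        comb(need - 1 + losses, losses) * p ** need * (1 - p) ** losses
        for losses in range(need)
    )


for n in (1, 3, 5, 7):
    print(f"Bo{n}: {series_win_prob(0.05, n):.4%}")
```

With p = 0.05 this gives 5% for a Bo1 and about 0.019% for a Bo7, in line with the "closer to 0.02%" estimate.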

While we're on this subject, I'd be very careful about assigning different factors to different types of events. It seems unnecessary to me: either an event is ranked or it isn't. A show match (such as the Boxer vs. Yellow showmatch), for example, should be unranked because it obviously isn't entirely competitive. Other than that, if a player is playing in a tournament, you can only assume he's playing seriously and adjust his rating according to his performance, whether it's the GSL finals or the round of 32 of a small-time international tournament.

I didn't find it in your post: could you elaborate on what you find inadequate in the TLPD system? What is it that you want to fix? You said the difference between your system and TLPD's is that your K factor doesn't change, but I'm not sure why you find a changing-K system less desirable.

As for which K factor to choose, I think the standard approach, where new players start with a high K factor that drops after a certain number of games (a varying K factor), should work decently. I would tweak the starting K factor so that a newcomer with an amazing performance doesn't rise to the top of the list too quickly. Beyond that, the best number of games and the proper K factor can probably be found by running some simulations and seeing which values bring most players closest to their final ratings.
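
A sketch of the suggested simulation, with one substitution: instead of measuring closeness to each player's final rating, it gives simulated players a hidden "true" rating and measures how far the Elo estimates end up from it. The player count, number of games, rating spread, and the K / threshold combinations tried are all arbitrary assumptions.

```python
import math
import random


def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def simulate_rmse(k_new, k_base, n_new, n_players=30, n_games=3000, seed=1):
    """Hypothetical tuning experiment.

    Each simulated player has a hidden 'true' rating, and every game is won
    with the Elo probability implied by the true ratings. Estimates start flat
    at 1200 and use K = k_new for a player's first n_new games, then k_base.
    Returns the RMS error between estimated and true ratings at the end.
    """
    rng = random.Random(seed)
    true = [rng.gauss(1200.0, 200.0) for _ in range(n_players)]
    est = [1200.0] * n_players
    played = [0] * n_players
    for _ in range(n_games):
        a, b = rng.sample(range(n_players), 2)
        s_a = 1.0 if rng.random() < expected_score(true[a], true[b]) else 0.0
        e_a = expected_score(est[a], est[b])
        k_a = k_new if played[a] < n_new else k_base
        k_b = k_new if played[b] < n_new else k_base
        est[a] += k_a * (s_a - e_a)
        est[b] += k_b * (e_a - s_a)  # B's score is 1 - s_a, expectation 1 - e_a
        played[a] += 1
        played[b] += 1
    # Elo only pins down rating differences, so centre both sets first.
    mt, me = sum(true) / n_players, sum(est) / n_players
    sq = [((e - me) - (t - mt)) ** 2 for e, t in zip(est, true)]
    return math.sqrt(sum(sq) / n_players)


for cfg in [(20, 20, 0), (40, 20, 20), (80, 20, 20), (40, 40, 0)]:
    print(cfg, round(simulate_rmse(*cfg), 1))
```

Because the generator is seeded and game outcomes depend only on the hidden ratings, every configuration is scored on exactly the same sequence of games, so differences come from K alone. A single seed is noisy; averaging over several seeds would give a fairer comparison.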

I wish you success with your system; I'd really like to see this work properly. The ladder is obviously bad at comparing top players to one another, and anything is better than what we have now, where the only way to compare two players is by the last time(s) they played each other and by which of them won which tournament, recently or further back.
llatszer
Joined September 2010
United States, 70 Posts
August 10 2011 01:05 GMT
#3
To answer your question about why I might find the TLPD system inadequate: I don't think their system is necessarily wrong. I'm just testing different systems to understand how an ELO should be adjusted to fit the structure of a particular sport. One thing I want to do is combine the foreign ELO with the Korean ELO, which TLPD currently keeps separate. That opens a whole new can of worms, including foreigners boosting each other's ELO without ever having to test their skill against Koreans. The Koreans, on the other hand, are "beating each other up," so some very good Korean players may have a lower ELO than their foreign counterparts. As you said, though, these things tend to work themselves out to a degree the longer the system runs. As for TLPD's K-changing system, it just doesn't seem necessary to me. It tries to ease new players into the system while keeping pros from losing too many points to a player who is actually better than his rating says. It seems to me that these figures smooth themselves out over time, and the only real consequence of not varying K is that new players' ratings change somewhat less at first.

I've been thinking a lot about the K factor and whether it should change, and the only rationale I came up with is that if a game has higher volatility in skill (in terms of changes in who is arguably the best player), a higher K factor will show this more quickly. There is a danger in setting it too high, though: with a K factor of, say, 10000, the winner of the most recent tournament would always be number 1 in ELO. I did some testing where I added more and more tournaments and used a K value of 80, and it is EXTREMELY unforgiving. A player like Jinro, after reaching the RO4 in two GSLs, would lose around 60-70 points for a loss against very good players. What seems right about it, though, is that the people who have been on a tear are at the top and the people who have been struggling are not. Someone like Jinro, who has had very good results, isn't completely screwed by a high-K system, but unless you put up results, you're not going to be in the top tier. So I guess that raises the question: what is the system for? Is it to determine who is favored in an upcoming match? To determine the best player over the body of his work? To determine some kind of "skill rating"? I guess that is a question I need to answer.
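
As a rough check on the Jinro numbers: a loss changes a rating by $K\,(0 - E_A) = -K\,E_A$, so with $K = 80$ no single loss can cost more than 80 points, and losing 60-70 points implies $E_A$ was about 0.75-0.875. Inverting the expected-score formula, that corresponds to having been rated roughly 190-340 points above the opponent at the time (a worked calculation from the standard formula, assuming each loss is scored as a single game):

$$\Delta R_A = -K\,E_A \;\Rightarrow\; E_A = \frac{-\Delta R_A}{K} \in \left[\tfrac{60}{80},\ \tfrac{70}{80}\right] = [0.75,\ 0.875]$$

$$D = 400\,\log_{10}\frac{E_A}{1 - E_A}:\qquad 400\,\log_{10} 3 \approx 191,\qquad 400\,\log_{10} 7 \approx 338$$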

I agree with you about the best-of system, where you basically just take it a game at a time. This gives someone who dominates an opponent in a BO7 a larger point boost than before, and someone who loses a BO1 (like NesTea a couple of seasons ago) doesn't lose a lot of points for it. As for leagues and the like, it seems I just need to decide what matters and what doesn't. I'd like some opinions on things like money showmatches, such as the ones Destiny is doing, or The V, etc. I feel those matter and should be counted, whereas something like Yellow vs. Boxer should not.
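
A sketch of scoring a series game by game, as agreed above. A final score like 4-0 does not record the game order, so this assumes player A's wins come first; the final numbers depend slightly on that order. `apply_series` is a hypothetical name and K = 20 an arbitrary default.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def apply_series(r_a, r_b, a_wins, b_wins, k=20.0):
    """Update two ratings one game at a time for a series ending a_wins-b_wins.

    Assumes A's wins are played first, since the score alone has no order.
    """
    results = [True] * a_wins + [False] * b_wins
    for a_won in results:
        e_a = expected_score(r_a, r_b)
        delta = k * ((1.0 if a_won else 0.0) - e_a)
        r_a += delta  # A gains (or loses) exactly what B loses (or gains)
        r_b -= delta
    return r_a, r_b


print(apply_series(1200, 1200, 4, 0))  # 4-0 sweep
print(apply_series(1200, 1200, 1, 0))  # Bo1 win
```

A 4-0 sweep between equally rated players moves the winner up more than a single game would, while dropping a Bo1 costs only one game's worth of points.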

I'm doing this because I'm trying to learn GUI programming, and I like to program to relax after the day, so I felt I should make something I'd enjoy building. I have a hard time motivating myself to do things I don't enjoy, so I figured I'd make something that interests me.