GSL Code S Membership statistical analysis

Mip
Joined June 2010
United States, 63 Posts
December 10 2010 03:34 GMT
#1
So I've been working on an SC2 player ranking algorithm (see my other post).

So far I've only used the GSL, and I've only included player rankings: no race bias, map bias, or time-based skill evolution (all in progress, to be implemented as my data quantity increases).

Anyway, so I was looking over the list of Code S players and thought to myself that a lot of those players could easily have lost some of their matches and failed to qualify for Code S. So I wanted to see, based on the data, what was the probability of each player actually being in the Top 32.

Here are the results in a Google Spreadsheet

So as you look at that data, bear in mind that the observed wins/losses from the GSL's final-64 brackets are all the data in the world on the subject. This makes the algorithm non-ideal for prediction of the top skilled players. But it is ideal for assessing the uncertainty about whether the point system is actually getting the best players (at least for the top players).

Also bear in mind, this model implicitly assumes that not qualifying for the top 64 and not registering for the tournament are equivalent, which isn't a fair assumption, but there's no data available to fix this. JookToJung gets the raw end of this assumption: he must be very good to qualify all 3 seasons, but the model sees only his losses in the early rounds. This isn't something I like, but I don't have the proper data to correct this problem at this time.

So the table shows a lot of uncertainty about who actually belongs in Code S. There are plenty who could easily have been Code S if things had turned out slightly differently. July is easily Code S caliber, as is Ret; Loner only needed one more set and he'd be S class.

If I had more data on the qualifying rounds, I'm sure that people like JookToJung would look better. I might look into grouping all the players that have 3 or fewer games into one group, because they are hardly estimable with how little data there is on them.

But the higher up the spreadsheet you go, the more accurate the results get, since they are based on more games played. There are players that are clearly Top 32; a lot of people are really good, but the uncertainty associated with knowing their skill is fairly high (entirely an artifact of not having a lot of data on them). The way the bracket system works, it just doesn't give very good estimates for the people who get knocked out in the first rounds.

Anyway, it is what it is. It should give you a sense of what kind of information is in the data. You don't have to agree with the results; it's just what the data seem to be pointing to (under the constraints of the assumptions I had to make).
Treadmill
Joined July 2010
Canada, 2833 Posts
December 10 2010 04:07 GMT
#2
This is pretty cool. Thanks a lot.
dissonantharmony
Joined August 2010
United States, 46 Posts
December 10 2010 04:41 GMT
#3
Without going back through the match history, I'm curious to know why oGsTop ranks so high without being S Class...
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-10 05:05:49
December 10 2010 04:52 GMT
#4
That's kind of the thing: the whole system is based on match history. Before the data, I say that all players are equally skilled and then let the data inform the skill parameters.

To answer your question, honestly I can't say for certain. The skill parameters are calculated using a complex mix of all of the data, borrowing strength from how good the opponents you beat are and how good the people they beat are, etc.

My guess is that in the case of oGsTop, he took out Polt, who carried the information about beating MC, and then he took a game off of FruitDealer. So though he was not in a lot of games, he had a pretty difficult bracket that he performed well in. Then he didn't get S class because he lost to FruitDealer in the Ro16 and didn't qualify in Seasons 2 and 3. The point system doesn't care who you lost to; this model does.
Drizz
Joined August 2010
25 Posts
Last Edited: 2010-12-10 05:13:40
December 10 2010 05:11 GMT
#5
nice
wherebugsgo
Joined February 2010
Japan, 10647 Posts
December 10 2010 05:47 GMT
#6
Why is MC ranked below Jinro?

Why is Genius so low?

Why is Idra below Ret? Why is Ret so high?

Why is Butterflyeffect in there twice?

It seems like the weaknesses of certain playstyles, or at least certain weird occurrences where one player beats another but loses to someone else, are causing the rankings to become really weird. Players seem to get sandwiched between who they've won against and lost to. It looks really inaccurate.

Some players who have made it further into the GSL, or have qualified numerous times, and have been consistent, are being beaten out on this list by players who just have wins against them.
Treadmill
Joined July 2010
Canada, 2833 Posts
December 10 2010 05:54 GMT
#7
On December 10 2010 14:47 wherebugsgo wrote:
[…] Some players who have made it further into the GSL, or have qualified numerous times, and have been consistent, are being beaten out on this list by players who just have wins against them.

I think the reason for this is that winning is what matters.
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 06:06 GMT
#8
Winning is what matters at the end of the day. I didn't choose where the players went, but the model penalizes losing against low skill players, and rewards winning against high skilled players. I feel this is reasonable.

This isn't my model though, I don't want anyone to think that. This is the basic Bayesian Bradley-Terry model and it's used for thousands of pairwise comparison problems, so don't blame the method on me, it's just the commonly accepted Bayesian approach to pairwise comparison models. The data did all the determining of who goes where.

Butterflyeffect being in there twice is an error in the data and thanks for pointing it out. I copy-pasted the brackets from Liquipedia, and the player names are not consistent across seasons and sometimes not even within seasons.
danson
Joined April 2010
United States, 689 Posts
Last Edited: 2010-12-10 06:11:10
December 10 2010 06:10 GMT
#9
yeahh not too sure about your algorithm...

IdrA made 2x Ro32 and 1x Ro16 and he only has a 24% chance of being in the top 32?

like the most basic reading of that data would imply he's at LEAST top 15-20, and seeing as how few people have actually qualified for all three GSLs, much less advanced in all 3, he's probably much higher than that. i r confuse
danson
Joined April 2010
United States, 689 Posts
Last Edited: 2010-12-10 06:24:38
December 10 2010 06:13 GMT
#10
On December 10 2010 15:06 Mip wrote:
[…]

I can't read
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 06:16 GMT
#11
To answer your questions specifically: the spreadsheet for this post is based on simulated plausible skill parameters for each player, and it quantifies the percentage of posterior draws in which each player's skill parameter lands in the top 32.

Ret is high on this list because of the uncertainty associated with his skill level. Refer to my other post for the ranking where Ret is ranked lower.

Genius is low because his skill level has lower variance, but it is known to be smaller than FruitDealer's, NesTea's, etc. His probability of being in the Top 32 is being dragged down by the uncertainty in the skill of others.

IdrA is also suffering from a smaller uncertainty. He's actually amazingly good, and plays super solid, which is reflected better in the ranking spreadsheet from my other post.

Don't misinterpret this spreadsheet: this one is NOT a ranking. It's just a measure of uncertainty about membership in the Top 32. It is related to rank, but it is not exactly rank.
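Mechanically, that computation is nothing more than this (a minimal sketch with made-up names and draws, not the real posterior output):

```python
import numpy as np

# skill_draws has one row per MCMC draw and one column per player.
rng = np.random.default_rng(0)
names = ["NesTea", "FruitDealer", "Jinro", "JookToJung"]
skill_draws = rng.normal(loc=[1.5, 1.3, 0.8, -0.4], scale=0.7, size=(4000, 4))

k = 2  # top-k; the real spreadsheet uses k = 32 over the full player pool
order = np.argsort(-skill_draws, axis=1)           # best player first, per draw
in_top_k = np.zeros(skill_draws.shape, dtype=bool)
np.put_along_axis(in_top_k, order[:, :k], True, axis=1)
prob_top_k = in_top_k.mean(axis=0)                 # fraction of draws in top k

for name, p in zip(names, prob_top_k):
    print(f"{name}: {p:.3f}")
```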
wherebugsgo
Joined February 2010
Japan, 10647 Posts
December 10 2010 06:27 GMT
#12
I see, I understand now. It just seems weird that the likelihood of these players being in the top 32 doesn't actually square with what we know to be consistent play.
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 06:28 GMT
#13
@danson I agree that the prelims should be included, and it does skew the results somewhat not to have them in there, especially against people like IdrA and JookToJung, who qualified all three seasons but didn't advance much.

If you know where to find that data, I would be immensely grateful, but I have not been able to track down anything from the pre-lims.

I totally understand any feelings of dissatisfaction with the current results. The model is great, and it's backed by a lot of research, but it can't be better than the data that feeds into it, which for now is a problem, as you've pointed out. As time goes on, however, these problems will go away.

If you or anyone else is interested in helping me find data and/or format data, PM me and we can trade skills. Right now, I'd really like someone who can parse the TL database and extract that information.

As of now, my data consists only of player names, but if we could extract the TL database information, we could get information like, which matchups are imbalanced overall? Which maps favor which race matchups and by how much? Plus an overall increase in prediction accuracy.
TyPsi5
Joined May 2010
United States, 204 Posts
December 10 2010 06:30 GMT
#14
cool stuff -thanks for the effort
Plutonium
Joined November 2007
United States, 2217 Posts
Last Edited: 2010-12-10 06:32:43
December 10 2010 06:30 GMT
#15
There is absolutely not enough data to extract any sort of conclusions from so far in SC2.

The game is still evolving, maps and luck play a huge factor, and the sheer lack of volume of games precludes any sort of meaningful analysis.

Additionally, the idea that losing in the prelims and not registering at all are equivalent is absolutely not a fair assumption. It massively biases the results in favor of players who make a big run once but fail to qualify the other times, like Jinro, whereas a player like IdrA, who made the top 32 every single tournament, is somehow not in the top 32.
rwright
Joined December 2010
1 Post
December 10 2010 06:33 GMT
#16
This should be interesting when there's more data.
Plutonium
Joined November 2007
United States, 2217 Posts
Last Edited: 2010-12-10 08:16:15
December 10 2010 06:36 GMT
#17
hmm
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-10 07:21:21
December 10 2010 07:00 GMT
#18
@Plutonium You're absolutely wrong about not being able to do any meaningful analysis. If you feel you can still make that statement after taking a Bayesian analysis class (you could not honestly do so), we can talk then, but you don't know what you're talking about.

I don't see how you can be peeved by a statistical analysis that is perfectly honest about its assumptions. If I were to just present the results and not acknowledge my assumptions, you could say it was not sound. But I've been up front about my assumptions, and they do not stop me from obtaining meaningful results.

The sheer luck of the matter is captured beautifully in the model. If you look at my original post, the predictions for last night's and tonight's matches are both around 50/50. So yeah, the model works great at capturing our uncertainty.

Yeah, the edges are rough, but there is information to be learned by beginning an analysis already. My top 32 shares 26 players with the point system's; if there were absolutely no analysis that could be done, I would not be able to pull that off.

Your "add data" approach doesn't make much sense: the probability of winning a match also requires knowing who the opponent is, so how do you propose I decide the skill of players I have never measured for the matches that I am arbitrarily adding into the data? And why 6 games? Why not 4, or 8, or 20? Should I add data until I like the results? How sound a statistical practice is that?

My assumptions are based off of necessity, I had to make the implicit assumption that failing to qualify and not registering were the same thing because there is no data that allows me to separate the two.

You really shouldn't complain, however; the point system does the exact same thing: you get 0 points for not entering, you get 0 points for failing to qualify, and no one QQs about that. The points are also rigged so that someone like Jinro, with his one entry, will get more points than IdrA, who plays solid and qualifies every single time. Jinro only tried once, so take SSKS instead: he has failed to qualify twice, but because he made it to the Ro8 once, he's ranked higher than IdrA by 200-ish points.

Realistically, you should think of the current GSL point system as an approximation to an actual ranking system that assesses wins and losses fairly. The Bradley-Terry model that I'm using is backed by hundreds of research papers showing its effectiveness in ranking competitions. As I get more data, I can relax most assumptions, or they will simply wash out through repeated sampling. The biggest advantage I have with the B-T model is that, at the end of the day, I can make predictions based on the current state of knowledge provided by the data, whereas with the point system, all you have is a ranking. And as the amount of data increases, the predictions will be based on even more knowledge.
wherebugsgo
Joined February 2010
Japan, 10647 Posts
December 10 2010 07:08 GMT
#19
On December 10 2010 16:00 Mip wrote:
You really shouldn't complain, however; the point system does the exact same thing: you get 0 points for not entering, you get 0 points for failing to qualify, and no one QQs about that. The points are also rigged so that someone like Jinro, with his one entry, will get more points than IdrA, who plays solid and qualifies every single time. Jinro only tried once, so take SSKS instead: he has failed to qualify twice, but because he made it to the Ro8 once, he's ranked higher than IdrA by 200-ish points.


Just wanted to point out that Jinro did not only try once; he tried both times prior. He had some difficulties, and that's why he didn't qualify until GSL3. Hayder, for example, also tried out for GSL2 but didn't make it until GSL3.
skipgamer
Joined April 2010
Australia, 701 Posts
Last Edited: 2010-12-10 07:13:00
December 10 2010 07:09 GMT
#20
On December 10 2010 12:34 Mip wrote: This makes the algorithm non-ideal for prediction of the top skilled players. But it is ideal for assessing the uncertainty about whether the point system is actually getting the best players (at least for the top players).


I challenge this statement.

If an algorithm is not ideal for prediction of the top skilled players, how can it then be ideal for assessing the uncertainty about the point system, the point of which is determining the top skilled players? :s

I think the data's cool and all, and it would be an awesome way of comparing players if the GSL were a 64-player invitational tournament. But because of the unavailability of data beyond the Ro64, it's pretty inaccurate.
Plutonium
Joined November 2007
United States, 2217 Posts
Last Edited: 2010-12-10 08:16:53
December 10 2010 07:14 GMT
#21
I understand that you do not have the data required to separate between failing to qualify and not registering. All I'm suggesting is that you at least attempt to compensate for that, instead of trying to handwave it away.
Koshi
Joined August 2010
Belgium, 38799 Posts
Last Edited: 2010-12-10 07:30:37
December 10 2010 07:29 GMT
#22
Data is just data. People should stop trying to find holes or errors in raw data.

I always appreciate any sort of data collecting. I always check first if it is legit, and the Bayesian Bradley-Terry model is. Then I just read it as it is presented, and remember the things that seem useful.

Do not forget that data is always misleading and sometimes even misrepresented. Just try to gather as much as possible and compare.
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-10 07:40:05
December 10 2010 07:38 GMT
#23
@Plutonium The model is not a problem at all, and the results are not all that flawed. The Bradley-Terry model is ideally suited for this kind of data; there's no issue with that. You're not truly complaining about the model either: what I'm hearing from you is a complaint about the data, and that is not something I have control over.

I don't even know why this is such an issue for you. We are talking about the same assumption that is built into the current ranking system. They give no points for any round prior to the Ro64 and make no attempt to adjust for it in their rankings; they believe that only the Ro64 onward is rank-worthy. Because I have no data other than what they give me, I inherit that assumption implicitly through the data that is available, and there is nothing I can do about that.

This isn't even the tremendous problem you are trying to claim it is. There are a couple of players it affects adversely, I'll give you that, but all the top players are ranked appropriately, with reasonable skill levels estimated for each. It's not a greater problem than what the point system imposes, though, so there's no loss compared to the current alternative.

There is no feasible way for me to gather data on the >100 players that played in the 3 seasons. There is no central source to find which players didn't sign up and which players failed to qualify. So even if I wanted to make your irresponsible, arbitrary data additions, I could not do so. If you'd like to start making house calls to all the players to find out which tournaments they played in, then we can start to talk about measuring that sort of effect.
nath
Joined May 2010
United States, 1788 Posts
Last Edited: 2010-12-10 07:48:02
December 10 2010 07:43 GMT
#24
love how people who don't know shit are talking to this kid. he used a good model that has good results. people complaining about the data have to realize that the data gets better and better...

also, whoever talked about Jinro 'trying the other two times, he entered more than once': LOL, it's clearly only Ro64 and beyond... there's nothing about qualifications in this model.

gj man, I love the model you chose; I'm not a stats major, but I've worked with a lot of statistical methods in research and I liked what you did.

On December 10 2010 16:14 Plutonium wrote:
Talk all the statistical academics you want; it doesn't change the fact that your results are massively flawed by your unsound assumptions.

This took a lot of work, and I applaud you for that. However, your model is wrong.

I understand that you do not have the data required to separate between failing to qualify and not registering. All I'm suggesting is that you at least attempt to compensate for that, instead of trying to handwave it away.

He doesn't want to take that into account; it would cause more problems. Better to let the issue resolve itself over time as more data is collected than to get your panties in a twist.
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-10 08:10:28
December 10 2010 07:51 GMT
#25
To all supportive people, thank you guys, you guys are awesome.

@skipgamer Let me clarify what I mean by "not ideally suited for prediction": what I mean is that I only gathered data on the GSL.

If I wanted to get prediction accuracy, I should gather as many outside tournaments as I can and add that data to the GSL data, so that I have as much information about each player as possible.

On the other hand, if I want to do a fair ranking system for just the GSL (which is so far what I've been after), I absolutely must base my results only on GSL data. So when I created these results, it was under the context of: "Let's rate these players based only on their GSL performance."

The next step is to pull in more data, and since no one has volunteered to help me yet, it might be a slow process. When I have all possible available data, then we can start talking in terms of "ideal prediction accuracy" (ideal according to our current state of knowledge anyway).

People keep calling these results "inaccurate." That may have some truth to it, but against the benchmark of the GSL point system, these results are at worst on par, so I really wouldn't worry about that at this point. The model and methodology are sound, and the accuracy of the results will get better as more data comes in.

-----------------------------------------------------------------------------

Think about what we're trying to do: we have a bunch of progamers and they are all playing SC2. We want to know, what is their skill level? How do they compare to the other players?

Well we can't go measure their skill level directly. It's not something I can go stick a ruler on or anything. What I can see is what happens when they play a game. I can see whether they win or lose. This will provide hints at their skill level.

Take Season 2, for example: we know FruitDealer is amazing because he won Season 1. FruitDealer loses to Foxer in the Ro32; that's going to tell me that Foxer is at least skillful enough to beat FruitDealer, which is quite substantial. Then Foxer goes on to lose against NesTea in a nearly dead-even match. That's going to hint toward thinking Foxer and NesTea are about at the same level.

When you take all the data together, you'll get hundreds of hints about how the players compare to each other. You'll still be uncertain exactly how much different their skills are from each other, but you should have an idea.

In simplistic terms, that is how this model works: the wins and losses point toward the skill that each player has compared to whom they are playing.

So the issue with not having pre-Ro64 data is that everyone who loses in the early rounds may get unfairly ranked downward, because there's no data showing that they won their way through the Ro128, Ro256, etc. This is a real problem for players like IdrA and JookToJung, as we've said before, but there's no available data to fix it. I'm sure someone at GomTV has it, but it's not publicly available.
Plutonium
Joined November 2007
United States, 2217 Posts
Last Edited: 2010-12-10 08:09:37
December 10 2010 07:55 GMT
#26
I'm talking to my roommate, who's a biostatistician and an avid GSL watcher.

He's taken a look at your data and analysis and assures me that everything you say is correct, and that I am "so wrong".

However, he says that releasing this data can be misleading, because people like us are stupid and don't understand what you're doing.

A few key points he makes:

1) The amount of uncertainty in this analysis needs to be emphasized. The results are going to be extremely noisy, and these results should not be taken too seriously.

2) Saying that not making it to the Ro64 and not registering are equivalent is an acceptable assumption, given that we do not have this data.

3) It would be nice to "reward" those who qualify for the Ro64 multiple times. How much to reward this, however, is unclear. There are mechanisms for it (increasing initial means and reducing standard errors; see the sketch after this list), but the question is what values to assign, unless one wants to be arbitrary about it.

4) He then starts talking about reaching these values by integrating over the chances of getting into the Ro64 for GSL 1, 2, & 3, and using that to create initial skill levels for players. This is a tricky proposition, and it may overcomplicate the model.
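To make mechanism 3 concrete, the kind of prior assignment he's describing might look something like this (entirely hypothetical numbers; choosing them is exactly the arbitrariness he's worried about):

```python
# Hypothetical illustration: nudge the prior mean up and tighten the prior
# sd for repeat Ro64 qualifiers. The 0.3 and 0.1 are invented values.
def skill_prior(times_qualified, base_mean=0.0, base_sd=1.0):
    mean = base_mean + 0.3 * (times_qualified - 1)          # reward repeats
    sd = max(base_sd - 0.1 * (times_qualified - 1), 0.5)    # shrink uncertainty
    return mean, sd

print(skill_prior(1))  # (0.0, 1.0)  one-time qualifier: default prior
print(skill_prior(3))  # (0.6, 0.8)  three-time qualifier: stronger prior
```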
Hazuc
Joined August 2010
Canada, 471 Posts
December 10 2010 07:57 GMT
#27
I'm currently studying actuarial science so this kind of stuff really interests me. Keep up the good work.
roadrunner343
Joined November 2010
148 Posts
December 10 2010 08:06 GMT
#28
I just wanted to join in with the haters and tell you that this is not a perfect algorithm.

On the serious side, you already knew that (and admitted it), and I applaud the work you've done and love the data. Obviously we can't take it as 100% truth, but it gives us a lot of good data to work with. Thanks.
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-10 08:35:23
December 10 2010 08:28 GMT
#29
@Plutonium They are not giving me a master's degree in "sucking at statistics" so I would trust me. =)

But yeah, since posting these last two threads, I've definitely experienced people misunderstanding me. It's hard to be perfectly clear on a forum, where you kind of have to throw it all out there to see what people do and do not understand right away and what needs to be clarified, especially when I mostly converse with people who have been working for years on the same educational course and all speak the same language as me, so to speak.

When I came up with this idea, I thought it would be a good way to show how much uncertainty there is in the model. It's like: yeah, Jinro is awesome, but based on what we know about him from the data, there's a 25% chance that he might not actually be in the Top 32 best players in our pool. Maybe we'll know better after the match that starts in an hour and a half =). Compare this to NesTea, where we think there's a 1 in 200 chance that he might not actually be in the top 32, because he's won games against top players across multiple seasons.

The beautiful thing about Bayesian statistics is you can actually speak in the language of probability unlike in classical statistics where we speak in terms of meaningless confidence intervals.

In the data for this post, when you look at that spreadsheet, you can say that based on our current state of knowledge, assuming the model with its assumptions (everyone cringe a little bit at the assumptions), Player Whoever has a whatever % chance of being in the top 32 most skilled.

Then you look over the list and say, "But player X doesn't have a 20% chance of being top 32," and to you I say, "You're probably right, but the data doesn't know that." The data doesn't troll the TL forums analyzing the crap out of everyone's play; it just considers the wins and losses that it's given.

Once the GSL gets going with Code S tournaments, we will see a great refinement in our estimates for all Code S players, because we will get so much data about each of them: every tournament will feature them.
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 09:05 GMT
#30
Sometime this weekend or early next week I'll post my finals predictions, though I expect it will be somewhere close to 50/50, =).
Skytalker
Joined October 2010
Sweden, 671 Posts
December 10 2010 09:27 GMT
#31
Keep the awesome work up! Add data! :D
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 11:25 GMT
#32
So, just watched the GSL. It was... awesome, I guess. I thought I'd throw in the new data and calculate probabilities for the final. If they were interesting I was going to make a new thread. Instead, here they are:

Names ProbWinFinal
1 Rain 0.5055279
2 MC 0.4944721

Yeah, so how do you like that: 50/50, the data has no idea who to favor between these two. Had it been NesTea instead of Rain, it would give only a 35% chance to MC, but vs Rain, a dead-even split. So go flip your coin: heads bet Rain, tails bet MC.
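For anyone curious where that number comes from: it's just the Bradley-Terry formula averaged over the posterior draws, roughly like this (a sketch with made-up draws standing in for the real MCMC output):

```python
import numpy as np

# Made-up posterior skill draws for the two finalists (the real numbers
# come out of the sampler, not these stand-in distributions).
rng = np.random.default_rng(1)
rain = rng.normal(1.20, 0.55, size=4000)
mc = rng.normal(1.18, 0.50, size=4000)

# Bradley-Terry win probability per draw, averaged over the draws.
p_rain = np.mean(np.exp(rain) / (np.exp(rain) + np.exp(mc)))
print(f"P(Rain beats MC): {p_rain:.3f}")
```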

roadrunner343
Joined November 2010
148 Posts
December 10 2010 11:47 GMT
#33
Sorry if this is a retarded question, but I have no clue about statistics so...

Is there any way to make the algorithm take which race they are playing into account? For example, I'm sure some pros have much higher win rates against certain races than others. Is this simply impossible, way too much work, or is it doable? Either way, I like your spreadsheet.
hybridsc
Joined November 2010
United States, 63 Posts
December 10 2010 12:56 GMT
#34
You put TheWind in there twice, and also he is an S-class player, if I recall correctly.
greycubed
Joined May 2010
United States, 615 Posts
December 10 2010 13:34 GMT
#35
Same thing but with race color coding- https://spreadsheets.google.com/ccc?key=tH96B6ipTE7zgBFcbZ2MTAw&authkey=CPOA28ME#gid=0
http://i.imgur.com/N3ujB.png
GeorgeForeman
Joined April 2005
United States, 1746 Posts
December 10 2010 13:44 GMT
#36
The real problem with the spreadsheet is that you don't list any error estimates. You can state assumptions all you want, but good statistical analysis of any type, Bayesian or frequentist, necessarily includes error estimates. If you as a statistician (or someone performing statistical analysis) don't give error estimates, there's little difference between your results and those of a strip mall psychic.
confusedcrib
Joined August 2010
United States, 1307 Posts
Last Edited: 2010-12-10 13:50:50
December 10 2010 13:48 GMT
#37
I think it's too early to be using algorithms to determine the best player; 3 GSLs and an unfigured-out game mean anyone could do well. IdrA once said "anyone in the top 32 could win the tournament," and he's a naysayer. But this may pan out over time, so keep on it. And I agree with the above poster: is there any way you could implement a two-sample Z-test or t-test to determine which player may be better than another, and do rankings that way?
Mitosis.
Joined November 2010
Sweden, 16 Posts
December 10 2010 17:45 GMT
#38
Not sure if this is a recurring problem, but TheWind is listed at 0.5529 and not qualifying. At the same time there is a player at the bottom named "TheWinD" listed with only 0.0438 and marked as qualifying. I assume this is a glitch?
aristarchus
Joined September 2010
United States, 652 Posts
December 10 2010 18:39 GMT
#39
Suggestion for fixing the failing to qualify/not registering problem:

Why not add another fake player, something like "average person who lost in the final round of qualifying." Anytime someone qualified, give them a win over that player. Anytime they failed, give them a loss. It might be better to use a different fake player for each season, since the qualifying pool's difficulty presumably changed.

Obviously that wouldn't be ideal, but it might be less bad than ignoring the qualifying rounds entirely. (I have a math background, but no stat, so I trust your judgment, but I thought I'd throw out the suggestion.)
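In terms of the win/loss list, the augmentation would just append rows like this (a sketch; the names and qualification records are made up):

```python
# One synthetic "gatekeeper" per season; qualifying counts as a win over
# it, failing counts as a loss to it.
matches = [("FruitDealer", "Foxer"), ("NesTea", "Foxer")]   # (winner, loser)

qualified = {"S1": ["IdrA", "JookToJung"], "S2": ["IdrA"]}  # hypothetical
failed = {"S1": [], "S2": ["JookToJung"]}                   # hypothetical

for season in qualified:
    gatekeeper = f"Qualifier_{season}"          # one fake player per season
    for p in qualified[season]:
        matches.append((p, gatekeeper))         # qualifying = a win over him
    for p in failed[season]:
        matches.append((gatekeeper, p))         # failing = a loss to him
```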
Mip
Joined June 2010
United States, 63 Posts
December 10 2010 20:46 GMT
#40
@roadrunner343 Yes, there is; it's not even difficult, I just add a couple of terms to the model. I just need the proper data. In my other thread (linked in the first post of this one), I describe the format my data is in currently, and later in the thread, the format I'd like it in. As soon as someone helps me get it in that format, or when I get around to doing it myself, race matchups will be adjusted for, and we can start talking about race advantages, and even map-race advantages; that will be a much more exciting discussion, I think.
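Roughly, those extra terms would move the win probability to something like this on the log-odds scale (a sketch; the parameter names and values are invented, not fitted):

```python
import math

# Bradley-Terry on the log-odds scale: exp(s_i)/(exp(s_i)+exp(s_j)) is the
# same as 1/(1+exp(-(s_i - s_j))), so extra effects can be added to the
# skill difference. The matchup/map terms here are hypothetical.
def p_win(skill_i, skill_j, matchup_effect=0.0, map_effect=0.0):
    diff = (skill_i - skill_j) + matchup_effect + map_effect
    return 1 / (1 + math.exp(-diff))

# Equal skills, but a matchup worth +0.2 log-odds to player i:
print(p_win(1.0, 1.0, matchup_effect=0.2))  # ~0.55
```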

At the end of the day, I think I'm much more interested in how the races and maps are balanced than which players seem to be better right now.

@enjoyx and Mitosis Thanks for pointing that out; it's a pain in the butt to catch all of those. The TL brackets are inconsistent about how they capitalize names and whether or not they include clan tags, etc., and it's caused a lot of players to get split into two people. I wish I had a data source whose data didn't need cleaning, but alas, no.

@GeorgeForeman and confusedcrib I'm glad you paid attention in your intro stats classes, but in Bayesian statistics, you can integrate over the uncertainty in your estimates to obtain a single number that takes into account all of the uncertainty you have in your estimate. We can say with Bayesian statistics that based on our current state of knowledge (priors + data provided) that the probability of Player X actually being Top 32 is Y%.

That you would bring up a t-test for this model immediately puts you at an intro stats level in my brain. Your instinct is correct for that level of stats knowledge, but in this case, it should not be a concern to you. You should think of those percentages in terms of what I described at the end of the paragraph above.

However, to appease you guys, I added a column of standard errors. If you are using your intro stats knowledge, however, you will misinterpret them, because they mean different things when your data are not from a normal/Gaussian distribution.

For a binary outcome, the variance is prob * (1 - prob), and the standard error is the square root of that, but you have to throw away any thought that, for example, 3 standard errors gives you a confidence interval, or any nonsense like that that you are taught in intro stats. For example, for NesTea, if you tried to do that, you'd get a confidence interval that included probabilities greater than 1. To do it properly, you'd have to convert to an odds ratio, compute confidence intervals, then convert back to a probability metric.
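Done properly, it looks something like this (a sketch; the probability and standard error are invented numbers, just to show the mechanics):

```python
import math

p, se_logodds = 0.90, 0.40       # invented numbers for illustration

log_odds = math.log(p / (1 - p))                 # probability -> log-odds
lo, hi = log_odds - 2 * se_logodds, log_odds + 2 * se_logodds

def to_prob(x):                                  # log-odds -> probability
    return 1 / (1 + math.exp(-x))

print(to_prob(lo), to_prob(hi))  # ~0.80 to ~0.95: stays inside (0, 1),
                                 # unlike the naive p +/- 2*sqrt(p*(1-p))
```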

@aristarchus I could definitely make an indicator-variable adjustment for that problem, but I don't even have data on who registered and didn't qualify vs. who didn't register. For some of the players it's easy to find that information; if they are on a major team, the Liquipedia site has all that. But for the lesser-known people, I don't even know where to look, and I really don't want to look in a hundred different places to track down that information, because if I start, I don't know that I can find everyone, and if I can't find everyone, the whole thing is a waste.
Disastorm
Joined January 2008
United States, 922 Posts
December 10 2010 22:32 GMT
#41
On December 10 2010 16:51 Mip wrote:
Take Season 2, for example: we know FruitDealer is amazing because he won Season 1. FruitDealer loses to Foxer in the Ro32; that's going to tell me that Foxer is at least skillful enough to beat FruitDealer, which is quite substantial. Then Foxer goes on to lose against NesTea in a nearly dead-even match. That's going to hint toward thinking Foxer and NesTea are about at the same level.


Is this actually true, though? Do the calculations always assume things like: if player a > b and player b > c, then player a > c? Because I know this isn't the case in most competitive gaming. There are always many cases of rock-paper-scissors relationships like a > b, b > c, c > a.
See.Blue
Joined October 2008
United States, 2673 Posts
December 10 2010 22:39 GMT
#42
On December 10 2010 12:34 Mip wrote:
So I've been working on an SC2 player ranking algorithm (see my other post). […]

Out of curiosity, as a math person, how did you compute the likelihoods?
GeorgeForeman
Joined April 2005
United States, 1746 Posts
Last Edited: 2010-12-11 00:45:25
December 11 2010 00:44 GMT
#43
On December 11 2010 05:46 Mip wrote:
@GeorgeForeman and confusedcrib I'm glad you paid attention in your intro stats classes, but in Bayesian statistics, you can integrate over the uncertainty in your estimates to obtain a single number that takes into account all of the uncertainty you have in your estimate. […]

Kid, I'm a 4th-year grad student working on my dissertation in statistics. I've TAUGHT an intro class. If you're going to talk down to someone, at least make sure you know more than they do. Asking for uncertainty estimates only connotes a "t-test" if you're too narrow-minded to consider anything else. As far as I can understand (which is difficult, since you didn't exactly explain it in either of your OPs), you've calculated a posterior distribution for each player's "true skill level". Using the means of these distributions as point estimates, you constructed a ranking of them. (This was your previous post.) You've reported standard errors for these, though I'm not sure what those are. Are these numbers the posterior estimates for the standard deviation? Because that's not the same thing as a standard error.

Now, as best as I can tell, you took all of this data and calculated, for each player i, the probability that this player is better than all but at most 31 other players. In other words:

P(S_i > S_j for all j outside some set T, where T contains at most 31 elements)

Now, this last thing seems extraordinarily difficult to calculate, given that your estimates for each S_i all come with their own associated variances and that the posterior distributions are dependent upon each other. Basically, you've got a p-dimensional normal distribution (where p is the number of players in your data set) with a very confusing-looking covariance matrix. Maybe there's software that makes such a calculation trivial that I'm not aware of, but to me, that looks like a difficult problem. Bravo for taking the time to solve it.

Assuming this is your approach (and again, I'll emphasize that I'm forced to do a lot of inferring because your actual approach is nowhere explained with any degree of clarity), what you end up with are posterior probability estimates. If that is indeed what your spreadsheet is reporting, then I understand why you didn't report the standard deviation, as it's completely determined by the posterior probability estimate.

That said, I'm not sure how useful this second list is. I think the first (where you estimate each player's skill and rank them) does a far better job of not only giving us an idea of who the best players are but also of how volatile the estimates are. This "are they REALLY top 32" stuff just muddles the issue, IMO. In particular, it's easy for people to confuse whether someone has a high probability of being top 32 because they're really, really good or because you've just got a lot of data telling you to be pretty sure the guy is solid.

Just my $.02. I remember when I took Bayesian, a couple of classmates did an analysis of SC1 where they tried to predict winners of matches based on maps, races, and the number of days the players had since their last game. (I guess this was to measure prep time or something.) It was pretty fun stuff.
Mip
Joined June 2010
United States, 63 Posts
Last Edited: 2010-12-11 08:15:16
December 11 2010 07:29 GMT
#44
I've been hesitant to be too technical in these threads because most of the audience doesn't have a stats background.

The data is a list of names in this format:
Winner Loser
--------------------
Player1 Player2
Player1 Player2
Player2 Player1
Player2 Player3
etc.

The likelihood is the Bradley-Terry model: P(player 1 beats player 2) = exp(skill1) / (exp(skill1) + exp(skill2)).

The priors on the skill parameters are Normal(0, sigma^2). (The Bradley-Terry model depends only on the differences of the skills: players with skills 100 and 101 would yield the same probability comparisons as if we subtracted 100 to make them 0 and 1, so the 0 mean is arbitrary. It has the same theoretical backing that the Elo system is based on.)

My professor said that sigma^2 could probably be fixed; to test that, I gave it a somewhat informative prior around 1 to see if the data would alter it (they did not).

So the parameters are run through an MCMC algorithm. I had to use Metropolis steps to calculate draws from the posterior distributions of the skill parameters.

My first report was the mean of the posterior draws and the standard deviation of the posterior draws, then the mean minus 2 standard deviations to give a sort of "at their worst" skill parameter.

For the second report, I took each draw of the skill parameters and found the top 32 within each draw. Then I calculated the proportion of draws in which each player appeared in the top 32.
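For the stats-inclined, the whole pipeline, stripped down, looks roughly like this (a minimal sketch, not my actual code: toy data, sigma fixed at 1, and a whole-vector Metropolis update):

```python
import numpy as np

# Toy (winner, loser) records in the same format as my data.
data = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "C"), ("A", "C"), ("C", "D")]
players = sorted({p for game in data for p in game})
idx = {p: i for i, p in enumerate(players)}
wins = [(idx[w], idx[l]) for w, l in data]

def log_post(skill):
    # Bradley-Terry log-likelihood plus Normal(0, 1) log-prior (sigma fixed).
    loglik = sum(skill[w] - np.logaddexp(skill[w], skill[l]) for w, l in wins)
    return loglik - 0.5 * np.sum(skill ** 2)

rng = np.random.default_rng(42)
n_draws, step = 10000, 0.5
skill = np.zeros(len(players))
draws = np.empty((n_draws, len(players)))
lp = log_post(skill)

for t in range(n_draws):
    # One Metropolis update of the whole skill vector per iteration.
    prop = skill + rng.normal(0.0, step, size=skill.shape)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        skill, lp = prop, lp_prop
    draws[t] = skill

draws = draws[2000:]  # drop burn-in
for p in players:
    d = draws[:, idx[p]]
    print(f"{p}: posterior mean {d.mean():+.2f}, sd {d.std():.2f}")
```

From the retained draws, both reports fall out directly: the posterior means/SDs for the ranking, and the top-32 proportions for this thread's spreadsheet.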
Vorlik
Joined October 2010
1522 Posts
December 11 2010 08:02 GMT
#45
This is fascinating. I like it! :-]