|
On May 07 2013 17:19 Too_MuchZerg wrote:
List 84
Start: May 2nd, 2013
End: May 15th, 2013
Active players: 290 (10 new)
Leading race: Terran (8%)
Lagging race: Protoss (10%)
Games: 803
PvT 95–65 (59%)
PvZ 112–91 (55%)
TvZ 112–84 (57%)
More like zerg lagging and badly :D

The leading/lagging indicator is based on the five highest-ranked players of each race. It's in the FAQ :-)
On the period list you can see OP/UP fields, and in the infobox for each period the same data is given as the "leading" and "lagging" race. This is an indicator of which races are most and least prominent near the top of the list. Specifically, for each race, imagine a hypothetical player whose rating is the mean of the ratings of that race's top five players, and imagine these three hypothetical players playing very many games against each other. If the players were of equal strength, each of them would score about 50%; in reality, one of them may score, say, 10% more than that. The race that scores the most in this scenario is the "OP", or "leading", race, and the race that scores the least is the "UP", or "lagging", race.
This is provided as a way to analyse the metagame shifts near the top of the skill ladder, and should not be taken as actual evidence for real game imbalance.
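In code, the idea looks roughly like this. This is only a sketch with made-up numbers, and the logistic win-probability curve is an illustrative assumption rather than necessarily the exact model the site uses:

```python
# Sketch of the leading/lagging computation. Ratings are hypothetical and
# the logistic curve is an illustrative assumption, not the site's model.
top5_means = {"T": 1620.0, "Z": 1580.0, "P": 1540.0}  # made-up top-5 means

def win_prob(ra, rb, scale=400.0):
    """Chance that a player rated ra beats one rated rb."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / scale))

# Each race's expected score in a long round-robin against the other two.
scores = {
    race: sum(win_prob(r, top5_means[other])
              for other in top5_means if other != race) / 2
    for race, r in top5_means.items()
}

for race, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{race}: {s:.1%} ({s - 0.5:+.1%} relative to even)")
print("Leading:", max(scores, key=scores.get),
      "Lagging:", min(scores, key=scores.get))
```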
http://aligulac.com/faq
|
I just want to say that I absolutely love what the team has done on Aligulac. I discovered it yesterday and have been playing around a ton with all of the prediction features, and just really love the whole site design.
Idea: would you be able to rate team decision-making in Team Leagues with the all-kill format? Theoretically, since your model is fairly accurate, a team's best decision would be to choose the challenger on their team with the largest difference between the challenger's vX rating and the vY rating of the player who just won. Do some teams follow this guideline more closely? Do those teams win more often due to these decisions?
Question: Would adding certain datasets, for instance every round of every Playhem daily, skew the ratings? I've read some concerns that a lack of cross-region play or an oversampled region can cause certain populations to have higher ratings than they should. What would a mountain of games like that do to the ratings, and how would it change that really cool visualization of different "communities"?
Props to you (and everyone who works on Aligulac) again for already answering more questions than I could hope to pose :D
|
Thanks again, your site is great
|
On May 08 2013 11:32 justdmg wrote: Idea: would you be able to rate team decision-making in Team Leagues with the all-kill format? Theoretically, since your model is fairly accurate, a team's best decision would be to choose the challenger on their team with the largest difference between the challenger's vX rating and the vY rating of the player who just won. Do some teams follow this guideline more closely? Do those teams win more often due to these decisions?

I don't want to promise to implement something like this, but I've actually looked at it before. Monk and Waxangel contacted me and wanted to do some evaluation of the coaches in Proleague. I think this was after two rounds, when EG-TL sucked hard, and they wanted to know whether the EG-TL coach was to blame or not. (I don't think this was ever published, and it was two months ago, so I feel I can do it here instead.)
The idea we had was calculating rating discrepancies for each game played. That is, if a player has a mean rating of 1500, with 1400 vZ, 1500 vP and 1600 vT, that player has a discrepancy of -100 vZ, 0 vP and +100 vT. Clearly it is advisable to match him up against Terrans, and one of the jobs of a coach is to ensure that his players end up in good matchups.
In a sense, the mean discrepancy of a team is a measure of how much extra skill the coach is able to squeeze out of his team by manipulating the lineup.
So how did they do? I found these mean discrepancies for each team:
Woongjin Stars 4.2
STX SouL 0.5
CJ Entus -3.6
Team 8 -3.8
Samsung KHAN -8.4
EG-TL -8.5
KT Rolster -8.7
SK Telecom T1 -10.4

It is interesting to note that Woongjin is the only team with a significantly positive discrepancy, and for some reason SK is easily worst. EG-TL seems bad, but by no stretch of the imagination out of the league.
It is also curious that the overall mean discrepancy is negative, but one should note that this isn't necessarily a zero-sum game. It's okay to choose a player with a -50 discrepancy if that causes the opponent to get -100 (since you partially control which races meet). So actually what I should have been looking at is the net discrepancy. It'd be interesting to redo these calculations using that rule instead.
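For anyone who wants to experiment, here is a rough sketch of both measures. The Player structure and all numbers are hypothetical; this is not the actual code behind the table above:

```python
# Sketch: per-matchup rating discrepancy and the net variant discussed above.
# The Player structure and all numbers here are hypothetical.
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    mean: float  # overall rating
    vp: float    # rating vs. Protoss
    vt: float    # rating vs. Terran
    vz: float    # rating vs. Zerg

    def discrepancy(self, opp_race: str) -> float:
        """How far above/below his mean this matchup puts him."""
        return {"P": self.vp, "T": self.vt, "Z": self.vz}[opp_race] - self.mean

def net_discrepancy(a: Player, b: Player, a_race: str, b_race: str) -> float:
    """a's discrepancy minus b's: positive means the pairing favours a's
    side even when a's own discrepancy is negative."""
    return a.discrepancy(b_race) - b.discrepancy(a_race)

# The example from the post: mean 1500, with 1400 vZ, 1500 vP, 1600 vT.
a = Player("A", mean=1500, vp=1500, vt=1600, vz=1400)  # a is Protoss here
b = Player("B", mean=1550, vp=1400, vt=1550, vz=1600)  # b is Zerg here
print(a.discrepancy("T"))               # +100: match him against Terrans
print(net_discrepancy(a, b, "P", "Z"))  # -100 - (-150) = +50: a's bad
                                        # matchup can still be the right call
```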
If you're interested, in the spoiler is a list of mean discrepancies for each player. Poor Thorzain faced too many Zergs....
+ Show Spoiler +
JyJ (T) SK Telecom T1 56.4
BarrackS (T) KT Rolster 48.0
JuNi (Z) Samsung KHAN 38.7
Rudy (T) Woongjin Stars 37.8
HuK (P) EG-TL 35.6
Dear (P) STX SouL 33.1
EffOrt (Z) CJ Entus 22.6
hOpe (Z) Samsung KHAN 21.7
Kop (T) Samsung KHAN 20.6
Zest (P) KT Rolster 18.6
Aria (P) Woongjin Stars 15.4
Hydra (Z) CJ Entus 15.2
ZerO (Z) Woongjin Stars 13.9
Shine (Z) Samsung KHAN 11.8
BrAvO (T) Woongjin Stars 11.5
PuMa (T) EG-TL 11.3
Jaedong (Z) EG-TL 10.7
hitmaN (Z) KT Rolster 10.5
Light (T) Woongjin Stars 9.8
MyuNgSiK (P) KT Rolster 9.6
INnoVation (T) STX SouL 9.1
BeSt (P) SK Telecom T1 7.7
Zenio (Z) EG-TL 7.1
Classic (T) STX SouL 6.2
Cure (T) Team 8 5.9
RevivaL (Z) EG-TL 5.9
JYP (P) EG-TL 5.1
Last (T) STX SouL 4.0
Trap (P) STX SouL 2.8
sOs (P) Woongjin Stars 2.5
free (P) Woongjin Stars 2.2
Bong (P) CJ Entus 2.0
ALBM (Z) Team 8 1.0
s2 (Z) SK Telecom T1 0.8
Soulkey (Z) Woongjin Stars 0.7
Mind (T) KT Rolster 0.5
soO (Z) SK Telecom T1 0.1
Bear (Z) STX SouL 0.0
hudadak (T) SK Telecom T1 0.0
HoeJJa (Z) KT Rolster 0.0
Flash (T) KT Rolster -0.1
Savage (Z) Team 8 -1.9
TRUE (Z) Team 8 -2.7
Reality (T) Samsung KHAN -3.4
SonGDuri (Z) CJ Entus -3.7
FanTaSy (T) SK Telecom T1 -3.8
JangBi (P) Samsung KHAN -5.3
Terminator (P) Team 8 -5.8
Stephano (Z) EG-TL -6.2
TY (T) Team 8 -9.1
Bbyong (T) CJ Entus -9.3
Argo (P) Team 8 -9.7
RorO (Z) Samsung KHAN -13.6
ParalyzE (P) SK Telecom T1 -15.1
Flying (P) Woongjin Stars -16.0
HerO (P) EG-TL -17.1
Action (Z) KT Rolster -17.7
Mekia (Z) Woongjin Stars -18.3
Sola (Z) Samsung KHAN -20.6
Stats (P) KT Rolster -22.0
Stork (P) Samsung KHAN -22.0
Comet (P) STX SouL -22.6
Turn (T) Samsung KHAN -23.7
Sacsri (Z) SK Telecom T1 -26.2
Bisu (P) SK Telecom T1 -27.3
hyvaa (Z) STX SouL -28.6
Size (Z) STX SouL -31.8
herO (P) CJ Entus -32.8
Rain (P) SK Telecom T1 -33.2
sKyHigh (T) CJ Entus -40.4
Motive (P) KT Rolster -43.9
BisAnG (P) Woongjin Stars -48.1
Crazy (Z) KT Rolster -48.2
TaeJa (T) EG-TL -53.2
mini (P) STX SouL -59.1
rare (Z) CJ Entus -67.5
ThorZaIN (T) EG-TL -104.6
sSak (T) SK Telecom T1 -105.9
|
On May 08 2013 11:32 justdmg wrote: Question: Would adding certain datasets, for instance every round of every Playhem daily, skew the ratings? I've read some concerns that a lack of cross-region play or an oversampled region can cause certain populations to have higher ratings than they should. What would a mountain of games like that do to the ratings, and how would it change that really cool visualization of different "communities"?

Kinda. You can think of it like this:
More games within a community will cause the ratings in that community to spread out, so the gap between the top and the bottom is larger. Since the international scene usually plays more often, you get something like this:

|----------------------------------------------------| International
|-------------------------| Korean

More games across communities will cause the communities themselves to adjust relative to each other. So if we take the above ratings and then get some cross-region games, we might get something like this:

|----------------------------------------------------| International
                           |-------------------------| Korean

So the reason you see international players mixed in at the top is two-fold. First, there aren't enough cross-region games (yet), or in other words, Koreans are still consistently gaining points from foreigners whenever they meet. Second, the international rating pool is more spread out.
There will be an update coming up which will cause offline games to be weighted about twice as much as online games. Since the Korean scene is mostly offline compared to the international scene, this will widen the Korean pool a fair bit, but the other problem is still present.
Btw, those charts are purely qualitative. In reality the difference isn't as extreme as I made it seem.
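If you want to see the spreading effect yourself, a toy simulation along these lines shows it. Everything here is invented (pool size, game counts, K-factor); it only illustrates the qualitative point:

```python
# Toy simulation: two pools with the same true skill distribution, one of
# which plays far more internal games. All numbers are invented; this only
# illustrates the qualitative "spreading" effect described above.
import random

def pool_spread(n_players=50, n_games=2000, k=16):
    """Run an Elo pool and return the top-to-bottom rating gap."""
    skills = [random.gauss(0, 100) for _ in range(n_players)]  # true skill
    ratings = [1500.0] * n_players                             # all start equal
    for _ in range(n_games):
        i, j = random.sample(range(n_players), 2)
        exp_i = 1 / (1 + 10 ** ((ratings[j] - ratings[i]) / 400))
        p_i = 1 / (1 + 10 ** ((skills[j] - skills[i]) / 400))  # true win prob
        score_i = 1.0 if random.random() < p_i else 0.0
        ratings[i] += k * (score_i - exp_i)
        ratings[j] -= k * (score_i - exp_i)
    return max(ratings) - min(ratings)

random.seed(1)
print("busy pool spread: ", round(pool_spread(n_games=20000)))  # larger
print("quiet pool spread:", round(pool_spread(n_games=1000)))   # smaller
```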
|
Hey,
I'd like to take a look at the database, but I'm not into programming at all. Is there an easy way to transform the data into a simple matrix that I can use in Matlab, showing chronologically the games and who won against whom? That would be enough information for me.
|
Okay, so I tried punching through this with a bit of handwork. At the moment I am only using the Elo system and no race-specific matchups. I have applied the 'function' 1000 times to increase the point flow between the regions/KeSPA/ESF players for players with more than 20 games, and I think my result isn't that bad. My top 10:
1. SoS 1589.3
2. Innovation 1572.8
3. Leenock 1535.5
4. Soulkey 1530
5. RoRo 1515.5
6. Rain 1504.8
7. Flash 1496.1
8. Crazy 1494.2
9. Parting 1485.7
10. Life 1483.3
Though I have to say KeSPA seems a little too strong here. Maybe fewer iterations would be better. Also, I only use ~45k matches; I don't know where the rest have gone. All in all this just means that if the KeSPA pros and ESF pros keep playing at the same strength they have played with up until now, KeSPA will at some point smash ESF.
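To be clear about what I mean by the 'function': it is roughly a standard Elo sweep over the whole match history, applied over and over so points keep flowing between the pools. A simplified sketch of the idea (placeholder names, not my exact code):

```python
# Simplified sketch: one application of the 'function' is a standard Elo
# pass over the chronological match history; it is then applied repeatedly.
# Names and numbers are placeholders, not the actual data or code.
def elo_sweep(ratings, matches, k=10, min_games=20, game_counts=None):
    """One Elo pass over (winner, loser) pairs."""
    for winner, loser in matches:
        if game_counts is not None and (game_counts[winner] < min_games
                                        or game_counts[loser] < min_games):
            continue  # only players with enough games move points
        exp_w = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        ratings[winner] += k * (1 - exp_w)
        ratings[loser] -= k * (1 - exp_w)

ratings = {"KespaPlayer": 1500.0, "EsfPlayer": 1500.0}
matches = [("KespaPlayer", "EsfPlayer"), ("EsfPlayer", "KespaPlayer"),
           ("KespaPlayer", "EsfPlayer")]
for _ in range(1000):  # the repeated application
    elo_sweep(ratings, matches)
print(ratings)  # converges toward the ratings implied by the win ratio
```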
|
+ Show Spoiler +
On May 08 2013 18:24 TheBB wrote: I don't want to promise to implement something like this, but I've actually looked at it before. Monk and Waxangel contacted me and wanted to do some evaluation of the coaches in Proleague. I think this was after two rounds, when EG-TL sucked hard, and they wanted to know whether the EG-TL coach was to blame or not. (I don't think this was ever published, and it was two months ago, so I feel I can do it here instead.)

The idea we had was calculating rating discrepancies for each game played. That is, if a player has a mean rating of 1500, with 1400 vZ, 1500 vP and 1600 vT, that player has a discrepancy of -100 vZ, 0 vP and +100 vT. Clearly it is advisable to match him up against Terrans, and one of the jobs of a coach is to ensure that his players end up in good matchups. In a sense, the mean discrepancy of a team is a measure of how much extra skill the coach is able to squeeze out of his team by manipulating the lineup.

So how did they do? I found these mean discrepancies for each team:

Woongjin Stars 4.2
STX SouL 0.5
CJ Entus -3.6
Team 8 -3.8
Samsung KHAN -8.4
EG-TL -8.5
KT Rolster -8.7
SK Telecom T1 -10.4

It is interesting to note that Woongjin is the only team with a significantly positive discrepancy, and for some reason SK is easily worst. EG-TL seems bad, but by no stretch of the imagination out of the league. It is also curious that the overall mean discrepancy is negative, but one should note that this isn't necessarily a zero-sum game. It's okay to choose a player with a -50 discrepancy if that causes the opponent to get -100 (since you partially control which races meet). So actually what I should have been looking at is the net discrepancy. It'd be interesting to redo these calculations using that rule instead.
Haha, of course you've already done it! And I totally agree; I think the net discrepancy is what I was getting at (in a roundabout way) with my description. I was also thinking today about how the game number and match score affect a coach's decisions: up 4-1 with a couple of aces in your bag, playing a player with a negative net discrepancy isn't bad coaching. It's giving the "new guy" a chance in as close an analog to "garbage time" in basketball as you are really going to see.
Thanks for the explanation of the scoring as well. I'll be interested to see how the changes affect everything!
|
Hey, could you do me a favor and calculate something for me please?
I am trying to figure out a good "StartEloMatrix", and I use d = sum((won games of player 1 - expected won games of player 1)^2) as a rough measurement of how good my StartEloMatrix is. Could you calculate this d-value with your method of calculating Elo? I want to know how good my model is compared to yours. It's enough to do it just for the main Elo value.
For example, if in one match the predicted outcome is 1.8-1.2 and the result is 2-1, then d = (2 - 1.8)^2 = 0.04. Then do that for all the matches and sum it up.
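In code, the measure is just this (with made-up data):

```python
# Sketch of the d-measure: squared error between predicted and actual game
# wins, summed over all matches. The data below is made up.
def d_value(matches):
    """matches: list of (expected_wins_p1, actual_wins_p1) per match."""
    return sum((actual - expected) ** 2 for expected, actual in matches)

# The example from the post: predicted 1.8-1.2, actual result 2-1.
print(d_value([(1.8, 2)]))  # (2 - 1.8)^2 = 0.04 (up to float rounding)
```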
|
TheBB is on vacation right now. Also, we use Glicko, not Elo, as far as I know (not sure if that was what you asked, though).
|
On May 09 2013 08:07 Greenei wrote: Okay, so I tried punching through this with a bit of handwork. At the moment I am only using the Elo system and no race-specific matchups. I have applied the 'function' 1000 times to increase the point flow between the regions/KeSPA/ESF players for players with more than 20 games, and I think my result isn't that bad. My top 10:
1. SoS 1589.3
2. Innovation 1572.8
3. Leenock 1535.5
4. Soulkey 1530
5. RoRo 1515.5
6. Rain 1504.8
7. Flash 1496.1
8. Crazy 1494.2
9. Parting 1485.7
10. Life 1483.3
Though I have to say KeSPA seems a little too strong here. Maybe fewer iterations would be better. Also, I only use ~45k matches; I don't know where the rest have gone. All in all this just means that if the KeSPA pros and ESF pros keep playing at the same strength they have played with up until now, KeSPA will at some point smash ESF.
This is a list I could agree with. I've been saying forever that Elo is a 100x better system than what is being used here (which is based on Glicko, but isn't even close, tbh, due to adjustments for race matchups and fake games, and not actually taking proper RD into account).

Using the actual Glicko system would be great too.
|
On May 13 2013 04:29 Figgy wrote:

On May 09 2013 08:07 Greenei wrote: Okay, so I tried punching through this with a bit of handwork. At the moment I am only using the Elo system and no race-specific matchups. I have applied the 'function' 1000 times to increase the point flow between the regions/KeSPA/ESF players for players with more than 20 games, and I think my result isn't that bad. My top 10:

1. SoS 1589.3
2. Innovation 1572.8
3. Leenock 1535.5
4. Soulkey 1530
5. RoRo 1515.5
6. Rain 1504.8
7. Flash 1496.1
8. Crazy 1494.2
9. Parting 1485.7
10. Life 1483.3

Though I have to say KeSPA seems a little too strong here. Maybe fewer iterations would be better. Also, I only use ~45k matches; I don't know where the rest have gone. All in all this just means that if the KeSPA pros and ESF pros keep playing at the same strength they have played with up until now, KeSPA will at some point smash ESF.

This is a list I could agree with. I've been saying forever that Elo is a 100x better system than what is being used here (which is based on Glicko, but isn't even close, tbh, due to adjustments for race matchups and fake games). Using the actual Glicko system would be great too.

Question: Is it as predictive as ours? Not being defensive, I am genuinely interested/curious :-)
|
On May 13 2013 04:17 Grovbolle wrote: TheBB is on vacation right now. Also, we use Glicko, not Elo, as far as I know (not sure if that was what you asked, though).
Oh ok. Maybe he'll see it when he comes back.
I want to compare my specific Elo approach to his Glicko approach and see if we get similar results. Glicko is, by the way, pretty much a development of Elo.
On May 13 2013 04:29 Figgy wrote: This is a list I could agree with. I've been saying forever that Elo is a 100x better system than what is being used here (which is based on Glicko, but isn't even close, tbh, due to adjustments for race matchups and fake games, and not actually taking proper RD into account). Using the actual Glicko system would be great too.
Using just Elo does not improve the list by much. The list that I posted looked like this because I had a different start-Elo distribution. I have dropped it since, because it fucks with the predictability too much. I am now working on an alternative list that has good predictability AND makes sense to the knowledgeable StarCraft player (read: adjusts the Elo in the foreigner and Korean pools).
This is the list I am working with right now, but it is still subject to change:

1. Life
2. Innovation
3. Flash
4. Symbol
5. Parting
6. SoS
7. Leenock
8. Soulkey
9. Polt
10. RoRo
11. Bomber
12. Rain
13. Yoda
14. Violet
15. Yonghwa
16. Squirtle

(with data from 7 May 2013 or something)
|
Guys, LucifroN overtook Parting in the ranking today.
*braces for shitstorm*
|
Hello BB, is it possible to make a little statistic that shows how many times Aligulac predicted the right winner out of all matches? It would be interesting to see its guessing statistics.
|
On May 16 2013 22:31 graNite wrote: Hello BB, is it possible to make a little statistic that shows how many times Aligulac predicted the right winner out of all matches? It would be interesting to see its guessing statistics.

It gets the right winner in 59.9% of games and 62.4% of matches. That doesn't sound terribly impressive, but with that hit rate it would make #2 on Liquibet, with 315 points.
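The computation itself is nothing fancy; a sketch over hypothetical (predicted, actual) records, not the site's actual query:

```python
# Sketch: fraction of matches where the predicted winner actually won.
# The records below are hypothetical; this is not the site's actual query.
def hit_rate(records):
    """records: iterable of (predicted_winner, actual_winner) pairs."""
    records = list(records)
    hits = sum(1 for predicted, actual in records if predicted == actual)
    return hits / len(records)

toy = [("A", "A"), ("B", "A"), ("C", "C"), ("D", "D")]  # toy data
print(f"{hit_rate(toy):.1%}")  # 75.0%
```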
|
Can you see a trend showing that it is getting more accurate?
|
On May 16 2013 22:56 graNite wrote: Can you see a trend showing that it is getting more accurate?

Well, that requires more than three lines of code. I'll have to get back to you.
|
I have a question about rating changes and how they relate to the intervals.
When new ratings are determined at each interval, are they taken game by game, or are they, as the layout on a period adjustment page suggests (e.g. http://aligulac.com/players/48-INnoVation/period/85/), taken as a whole? In other words, is the rating determined by the fact that (at the time of linking) Innovation's TvZ for the period is 9-2 with an average opposition of 1772, or is this simply a summary, with ratings calculated game by game?
I ask this because if it's done as a whole per period, then Bo1s would have a much larger impact on the result, because the likelihood function works on expected ratios while individual games are binary. If a player has a 75% chance to win a Bo1 and does win it, then this isn't balanced against a median outcome; instead the player gets a much higher performance rating than should actually be derived from such a scenario.
In other words, if a player with a rating of 1900 goes 1-0 against 4 players whose average rating is 1700, is it the same as going 4-0 against a single player in a single series? The former's median outcome from prediction, on a game-by-game basis, would be the sum of the expected median outcomes of each individual game: 4-0, whereas the expected median outcome of the latter is 4-3. If both scenarios are weighted equally (winning 4 out of 4 single games versus winning a series 4-0, even though the last three games never had to be played), then chaining Bo1s would give a higher rating return than fewer long series.
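To make the difference concrete, here is a quick sketch of the two score distributions for a 75% favourite (my own illustration; I don't know how the site actually aggregates):

```python
# Sketch: compare "four Bo1s" with "one Bo7" for a player who wins each
# individual game with probability p. Purely illustrative numbers.
from math import comb

p = 0.75  # per-game win probability for the favourite

# Four Bo1s: total game wins are Binomial(4, p); score is (wins, 4 - wins).
bo1_dist = {(w, 4 - w): comb(4, w) * p**w * (1 - p)**(4 - w) for w in range(5)}

# One Bo7: play until someone reaches 4 wins.
bo7_dist = {}
for losses in range(4):  # favourite wins 4-0, 4-1, 4-2 or 4-3
    bo7_dist[(4, losses)] = comb(3 + losses, losses) * p**4 * (1 - p)**losses
for wins in range(4):    # favourite loses 0-4, 1-4, 2-4 or 3-4
    bo7_dist[(wins, 4)] = comb(3 + wins, wins) * (1 - p)**4 * p**wins

for label, dist in [("4x Bo1", bo1_dist), ("Bo7", bo7_dist)]:
    pretty = {f"{w}-{l}": round(pr, 3)
              for (w, l), pr in sorted(dist.items(), reverse=True)}
    print(label, pretty)  # note how often the Bo7 favourite accrues losses
```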
|