|
@Grumbels: Sounds pretty much like what you'd expect. Only a few things I find fishy:
-) the increased Terran vs Protoss winrate in your example when favoring T over Z. I'd expect the whole thing the other way around.
-) not sure how you'd reach the 50% balance in the long run, given that this is a matter of qualification format as well. Also, I think you won't reach 50, but rather something in between the imbalance and 50, depending on the qualification format.
Anyways, great initiative. I will try to replicate your experiment tomorrow.
|
On July 12 2014 06:10 Big J wrote: @Grumbels: Sounds pretty much like what you'd expect. Only a few things I find fishy: -) the increased Terran vs Protoss winrate in your example when favoring T over Z. I'd expect the whole thing the other way around. -) not sure how you'd reach the 50% balance in the longrun, given that this is a matter of qualification format as well. Also I think you wont reach 50, but rather something in between the imbalance and 50, depending on the qualification format.
Anyways, great initiative. I will try to replicate your experiment tomorrow.

Sorry, that's a typo. I meant the other way around.
I thought it would be fun to use "if you make protoss overpowered" as the main example everywhere specifically to annoy people, but then I thought it would undermine the point I was trying to make, so I changed it to terran. But then I forgot to change it elsewhere.
|
On July 09 2014 03:07 Grumbels wrote:
On July 09 2014 00:58 Spect8rCraft wrote:
On July 09 2014 00:42 Grumbels wrote:
The ultimate units in Starcraft 2 aren't that extreme compared to standard infantry, the game is scaled down to the point that a handful of marines can bring down massive space ships. To contrast, in Supreme Commander 2 the higher tech units can be absurdly powerful.
The mothership never worked out and Blizzard abandoned the Odin-esque thor. And Blizzard has essentially never bothered trying to make the carrier a useful unit, and I believe they also reverted some buffs for the battlecruiser in HotS beta under the pretext of it ruining other game modes(?). They've also reduced the mothership to irrelevancy and have publicly said that they're okay with capital ships making only rare appearances. (I think the tempest is different because it's a HotS unit and therefore has special protection.)
In my opinion, ultimate units slow the game down too much, feel too awkward to handle, and the infrastructure requirements are too high so that you're forced into passive play. I think better odds lie with redesigning the battlecruiser so that it has some useful ability that doesn't stack, so that you're immediately rewarded for having at least one of them. I don't think Blizzard will bother to make any of them viable in any upcoming patch though.

Although I can see the reason for a capital redesign, that would have to wait until LotV. I would rather not wait so long as to fix what might be amended now. D'you have any specifics on how to redesign the BC and/or carrier?

| Unit | Supply | Minerals | Gas | Build Time |
|---|---|---|---|---|
| Battlecruiser | 6 | 400 | 300 | 90 |
| Carrier | 6 | 350 | 250 | 120 |
| Tempest | 4 | 300 | 200 | 60 |
| Colossus | 6 | 300 | 200 | 75 |
Note that protoss units build quicker in practice because of chronoboost.

There are a number of reasons for the colossus and tempest to see more use:
- The battlecruiser and carrier are much more expensive and slower to manufacture than even the colossus and especially the tempest. This leads to huge opportunity and infrastructure costs, which makes getting only one of those units hardly worthwhile.
- Colossus attack upgrades are also more accessible.
- The colossus and tempest also fulfill necessary roles (ground control vs light units / deterrence vs zerg late-game), so you are more often immediately rewarded for building them, despite the set-up costs.
I think looking at these factors the obvious conclusion should be that battlecruisers and carriers have a quite different function than other T3 units (that of late-game ultimate army), one which is probably not necessary in this game, and not feasible at a pro level where timings are so strict. If you want them to see use you have to make them weaker and cut down on build time and cost so that they're more in line with the tempest. It would probably also help to give them a well-defined role, for instance like carriers had in Brood War.

There is also the question of critical mass: brood lords benefit from broodlings obstructing, and therefore become exponentially more powerful in higher numbers as you can no longer shoot down all the broodlings in time to attack their sires. I think that's why you don't really see brood lords any more: it takes too long to get to critical mass without the addition of the old fungal, and you need enough brood lords that they become vulnerable to anti-air. Nevertheless, they do fulfill a certain function and can effectively space out ground armies once they reach critical mass. I don't know if exponential increase in strength for such units is specifically harmful; an argument can be made that it simply denotes the point when the unit can actually do its job -- brood lords can finally succeed at space control, and will still have numerous strategic weaknesses to offset this strength.

Battlecruisers and carriers don't really reach critical mass, but that's also because they can't really do anything, so it becomes pointless to talk about critical mass. You could say that the units would be more worthwhile to build if even a few of them could change the battle, but without a well-defined role they will still feel a bit pointless imo. Though if the units are too weak they will obviously never be strong at anything, and have no function in the army, so it goes both ways. (sorry if this logic is a bit vague, slight large headache..)

So yeah, reduce costs...
Battlecruiser is fine for cost. You don't want to make too much of them anyway. Use it for what it really is, a bait and switch tech forcer.
Just four or five BCs are easy to support with consistent remaking, and they have reasonably powerful DPS, but even more powerful effective HP. There are pluses and minuses to using them in TvZ or TvP. In TvZ they can basically melt roach/hydra with a few tanks and a gob of marines. Those hydras get drawn to the BCs like a magnet, allowing your real army to do the work without losing many units at all. They also tangle with mutalisks nicely, although they need support like a Thor or a handful of marines to break down a massive ball.
I've melted ultra/muta 200/200 balls with Thor/marauder/BC. In terms of bait and switch, the BC forces a large number of either corruptors or void rays (now tempests). Knowing that, you can have reactor ports ready to pump out vikings to fill the hole left by the four or five BCs that die while your main unit comp continues to kick ass because it soft or hard counters the hard counter to BCs that had to be massed in order to work properly.
Carriers are in a similar state as BCs. They are tech bait, but also meat shields. All those interceptors provide a very dangerous meat shield that bites back. I tend to deal with those like I did in brood war, massive amounts of marines, stim and kill all the intys, then send in wraith/viking to mop up. Thor javelin missiles help crush the inty ball. The only issue is if they also have high templar. EMP or focus fired siege tank works there.
The funny thing is that carriers and the old brood war reaver worked on the concept of using up resources to rebuild their method of attack in scarabs or ints. I think Swarm Hosts would not create such problematic gameplay if it cost 5 minerals every time they fired. Zerg would then have to use them with more care, and games wouldn't stalemate.
|
On July 12 2014 06:26 Grumbels wrote:
Sorry, that's a typo. I meant the other way around. I thought it would be fun to use "if you make protoss overpowered" as the main example everywhere specifically to annoy people, but then I thought it would undermine the point I was trying to make, so I changed it to terran. But then I forgot to change it elsewhere.

Not only do imbalances in other matchups in the game affect it that way – variations in mirror matchups can affect balance statistics. I wrote a blog on it a while back
Edit: I should issue a correction for that blog: I said that an individual player plays three different matchups in SC2. Scarlett apparently plays four.
|
On July 12 2014 03:54 Grumbels wrote:
I made a tournament simulation for fun to check some things. Given a randomly distributed bracket with p, t, z of equal range of skill:
- If you favor, for instance, terran in its match-up vs zerg, then zerg vs protoss will become zerg favored and protoss vs terran will become protoss favored. Although these effects are quite small compared to the advantage terran has vs zerg. Another interesting detail is that if you compare the results of favoring terran vs either only zerg, or both zerg and protoss, that you will find that TvZ is more broken in the former case just by looking at the win rate.
- If you make mirror match-ups more 'coinflippy', for instance protoss vs protoss, then protoss win rates will fall down in other match-ups as more weaker protoss players are advancing. Similarly, if you make, say, terran vs terran more skill-intensive, terran will become favored vs zerg and protoss.
- People say that only the balance at the top matters, but if you add additional good, if not great players of one race, they will produce more tournament winners, sort of depending on the values you choose. I think it matters if one race is "easier to play" than another race for tournament results even if top players are still equally matched. Obviously this is because the better player doesn't always win and as a race you're going to do well if you have more players near the top.
- On the other hand, adding just one very strong player for a race can singlehandedly save their win rate, partly because a good player will play more ranked games than a weak player and have more influence on the win rates.
- By manipulating the bracket you can give one race an almost 50% chance of winning the tournament instead of 33%. Not to encourage tournament organizers to do this, of course.
Actually, I tried looking for the effect where the win rates will self-correct to 50% over time, but I couldn't find it. Anyone knows why? (I'm just running single elimination brackets and checking win rates per match-up per round after giving one race a global rating boost. I suspected the win rates would be closer to 50 every subsequent round, but in fact the opposite happened) I think I'm misunderstanding the concept somehow, or maybe there's a problem with the simulation.
Anyway, I think that these results reinforce the idea that balance is more complex than looking purely at win rates. It's just some quick tests though, don't know how representative the results are or if I did something really dumb.
Any thoughts?
So, without being quite finished with my GSL simulator, my first tests seem to produce the asymptotic 50% winrates. But I'm not sure how much work I can put into that project until the middle of next week, which is why I'm giving a quick update on it long before I can post details or graphs, TT
Edit: Aaaaaaaaaaaand, mistake identified and it's gone.
|
On July 12 2014 13:24 ChristianS wrote:
Edit: I should issue a correction for that blog: I said that an individual player plays three different matchups in SC2. Scarlett apparently plays four.
Because she's a fucking boss
|
On July 12 2014 23:01 Big J wrote:
So, without being quite finished with my GSL simulator, my first testings seem to produce the asymptotic 50% winrates. But I'm not sure how much work I can put into that project until middle of the next week, that's why I'm giving a quick update on it long before I can post details or graphs, TT
Edit: Aaaaaaaaaaaand, mistake identified and its gone.

I think it makes sense though.
With my simulation I was singling out the top players (i.e. the ones advancing in the tournament) and if you strengthen one race it will not only have better representation but stronger top players. However, I suppose that if you had the middling players face each other in endless consolation matches, that you would start to replicate the predicted effects of asymptotic win rates that you might also see in the ladder.
I don't know if it's true, but I imagine that code A qualifiers will be closer to 50% than Code S/A in times of imbalance. And that GM/Masters winrates might be more revealing of imbalance than Diamond and below.
Which in my opinion possibly makes it dubious to state without proof that win rates from, say, the Korean scene are meaningless because of this tendency to go to 50%. I'm sure it could be true, but it probably depends on your sample, so in that case it's not categorically true, and then we'd really need someone with more statistical expertise to inform us.
|
On July 13 2014 02:58 Grumbels wrote:
I think it makes sense though.
With my simulation I was singling out the top players (i.e. the ones advancing in the tournament) and if you strengthen one race it will not only have better representation but stronger top players. However, I suppose that if you had the middling players face each other in endless consolation matches, that you would start to replicate the predicted effects of asymptotic win rates that you might also see in the ladder.
I don't know if it's true, but I imagine that code A qualifiers will be closer to 50% than Code S/A in times of imbalance. And that GM/Masters winrates might be more revealing of imbalance than Diamond and below.
Which in my opinion possibly makes it dubious to state without proof that win rates from, say, the Korean scene are meaningless because of this tendency to go to 50%. I'm sure it could be true, but it probably depends on your sample, so in that case it's not categorically true, and then we'd really need someone with more statistical expertise to inform us.
yeah, it seems like the winrates are very close to the "true winchance" variable. I couldn't even create that 50%-effect when limiting to 2races only with 60:40 balance. The winrates are always like 60:40, and in terms of tournament winners it seems to be even rougher. (which of course makes sense, since many games with 40% balance of course leads to a much smaller chance to win a tournament). It's always the distribution that drops off, while the winrate still stays close to the balance.
I have been working with "skill values" following a normal distribution around mean=100, sd=20 with caps of 133 and 67. Maybe without these kinds of caps you may get some forms of "Superterrans" in the tournament that start winning "way more", though I highly doubt it, since in an overall playerpool of 128 I'm not finding too many of those 133-capped ones to begin with, not even talking about the ones from one specific race... Not to mention that those Superterrans should still eventually face Superzergs (with 40:60) and Superprotoss (with 50:50), which they cannot face in my simulations right now.
Those tests however start to convince me that the situation for Terrans currently is probably not as harsh as I thought, at least in one of their matchups. And that top-level winrates seem to be much more trustworthy than I previously thought (given a reasonable sample size of course).
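A minimal sketch of the capped skill values described above, in Python. It assumes the caps are applied by clamping a Gaussian draw to the range 67 to 133 (the post does not say whether capped draws are clamped or redrawn), and the names `sample_skill` and `upper_cap_probability` are made up for illustration.

```python
import math
import random

MEAN, SD, LOW, HIGH = 100.0, 20.0, 67.0, 133.0

def sample_skill(rng):
    """Draw one skill value and clamp it to [LOW, HIGH] (clamping is an assumption)."""
    return min(max(rng.gauss(MEAN, SD), LOW), HIGH)

def upper_cap_probability():
    """Chance that an uncapped Gaussian draw would land above HIGH."""
    z = (HIGH - MEAN) / SD
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

rng = random.Random(1)
pool = [sample_skill(rng) for _ in range(128)]
print(sum(s == HIGH for s in pool), "of 128 players sit at the upper cap")
print("expected count at the upper cap:", round(128 * upper_cap_probability(), 1))
```

Under these assumptions a little under 5% of draws hit the upper cap, which is roughly six players in a pool of 128, or about two per race. That seems in line with the remark that there are not many capped players to begin with.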
|
Hydras are so strong vs air. Toss needs some timings when he can use air. Maybe nerf hydra damage to mech-psionic or air-psionic units. That would be a mass oracle PvZ buff. Maybe 3-6 oracles could then fight vs hydras.
|
Actually, I tried looking for the effect where the win rates will self-correct to 50% over time, but I couldn't find it. Anyone knows why? (I'm just running single elimination brackets and checking win rates per match-up per round after giving one race a global rating boost. I suspected the win rates would be closer to 50 every subsequent round, but in fact the opposite happened) I think I'm misunderstanding the concept somehow, or maybe there's a problem with the simulation.
How does your model work? Does it take into account qualification for events, so that players may not show up in the statistics if they don't meet a certain level of results?
- On the other hand, adding just one very strong player for a race can singlehandedly save their win rate, partly because a good player will play more ranked games than a weak player and have more influence on the win rates.
What are we talking about here? GSL specifically? I think no one is really focusing too much on GSL win rates, but rather looking at Aligulac statistics. Is this what you're trying to replicate here?
If you make mirror match-ups more 'coinflippy', for instance protoss vs protoss, then protoss win rates will fall down in other match-ups as more weaker protoss players are advancing. Similarly, if you make, say, terran vs terran more skill-intensive, terran will become favored vs zerg and protoss.
Interesting, haven't thought of this. Let's use this as an argument for terran never being OP in WOL.
|
So, I have been running tests on a larger scale now, trying to find the 50% effect or disprove the theory. This is what I'm doing:
Initial1: Create a playerbase (e.g. 128 or 1024 players), where each player has a skill indicator and a race indicator.
Initial2: Choose 32 players randomly from the playerbase, let them play a Code S tournament and record the stats.
Step1: Choose 48 new players from the playerbase and let those plus the bottom 16 of the last Code S play a Code A tournament. The next Code S consists of the top 16 of this tournament and the top 16 of the last Code S tournament.
Step2: Play a Code S and record the stats.
... repeat Step1 and Step2 until 100 Code S have been played. Record the winrates of the Code S tournaments.
Then repeat the whole process 100 times with new playerbases and take an average over the winrates for each Code S (so the average over 100 initial Code S, the average over 100 "second" Code S, ...). For everything 50:50 balanced, this creates the following graph:
![[image loading]](http://i.imgur.com/7MZCkrU.jpg)
The amount of mirror matches is pretty even too.
For the blue matchup (race1 vs race2) having a 40:60 balance, it looks like this:
![[image loading]](http://i.imgur.com/pMXo9es.jpg)
The amount of mirror matches is:
race1: 54256
race2: 281895
race3: 134548
Note that, unlike what Grumbels was talking about and what I thought I'd see on a smaller scale, there seems to be hardly any effect on the other matchups.
For both matches of race1 being 40:60, I get this graph:
![[image loading]](http://i.imgur.com/txtM8yo.jpg)
Mirrors:
race1: 14404
race2: 263927
race3: 253063
And then I ran a race1<race2<race3<race1 (each with 40:60 balance) and got this result:
![[image loading]](http://i.imgur.com/T7FaOrh.jpg)
Again with an even amount of mirror matches (~140000 each).
For the sake of clarity:
All matches between two players are played as a Bo5. All tournaments are played knock-out style, start to end.
A single game of a Bo5 between two players p1 and p2 gets decided in the following way: Create a random variable that is 1 or 0. If 1 gets created p1 wins, if 0 gets created p2 wins. The probability to create 1 is given by the formula:
skill1/(skill1+skill2) * matchupindicator1v2
This of course means the chance to create a 0 is:
skill2/(skill1+skill2) * matchupindicator2v1
The matchupindicator describes the win chance of one race against another, e.g. the value for 50:50 balance is 1 (so only skill matters).
I have been working with a few calibrations for skill level. Currently, I'm creating Gaussian distributed values around 200 with a standard deviation of 50. So a standard player has roughly a value between 150 and 250, with extremes to 350 or 50 (and a mainly theoretical cap at 400 and 0). I realize this isn't the best calibration though, because given a player with 250 skill playing against one at 150 skill, this merely produces a 62.5% winrate (250/(250+150) = 0.625). Basically we could be talking about somewhat coinflippy matchups. I will try to work with this a little more, but given that results with other calibrations haven't been very different, my guess is that it does not matter that much for these grand scheme pictures.
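The game rule above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the code actually used for the graphs: the function names are made up, the value 0.8 for a 40:60 matchup indicator is an assumed encoding (the post only states that 50:50 corresponds to 1), and the sketch takes the first expression as the probability that p1 wins and gives p2 the complement, because the two expressions do not in general sum to 1 (with assumed indicators 0.8 and 1.2 and skills 350 vs 210 they give 0.50 and 0.45).

```python
import random

def game_win_probability(skill1, skill2, matchup_1v2=1.0):
    """Chance that player 1 wins a single game, following the posted formula."""
    p = skill1 / (skill1 + skill2) * matchup_1v2
    return min(max(p, 0.0), 1.0)  # clamping to [0, 1] is an assumption

def bo5_win_probability(p):
    """Exact chance to win a best-of-five given a per-game win chance p."""
    q = 1.0 - p
    # Win 3-0, 3-1 or 3-2: p^3 * (1 + 3q + 6q^2)
    return p ** 3 * (1.0 + 3.0 * q + 6.0 * q * q)

def play_bo5(skill1, skill2, matchup_1v2=1.0, rng=random):
    """Simulate one Bo5 (first to three games); True if player 1 wins."""
    p = game_win_probability(skill1, skill2, matchup_1v2)
    wins1 = wins2 = 0
    while wins1 < 3 and wins2 < 3:
        if rng.random() < p:
            wins1 += 1
        else:
            wins2 += 1
    return wins1 == 3

print(game_win_probability(250, 150))                       # per-game chance, 250 vs 150
print(bo5_win_probability(game_win_probability(250, 150)))  # same pairing over a Bo5
```

One thing the sketch makes visible: a Bo5 amplifies the per-game edge, so the 62.5% per-game chance for 250 vs 150 becomes roughly 72% per series. Whether the recorded winrates are per game or per series therefore changes the numbers, even if it may not change the overall picture.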
I may be making mistakes as well, so keep in mind this is just a personal little programming excursion of mine.
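For anyone who wants to try the Code S / Code A cycle described at the top of this post, here is a rough Python sketch. It is a reconstruction under several assumptions, not the original program: players are `(skill, race)` tuples, skills are clamped to 1..400 to avoid a division by zero, "new players" is read as "players not in the current Code S", winrates are recorded per game in Code S only, and `matchup[r1][r2]` holds a hypothetical matchup indicator with 1.0 meaning 50:50. All function names are made up.

```python
import random

RACES = 3

def make_playerbase(n, rng):
    """Return a list of (skill, race); skills are Gaussian(200, 50), clamped."""
    return [(min(max(rng.gauss(200, 50), 1.0), 400.0), rng.randrange(RACES))
            for _ in range(n)]

def play_bo5(a, b, base, matchup, rng, stats=None):
    """Play a Bo5 between player ids a and b; return True if a wins."""
    (skill_a, race_a), (skill_b, race_b) = base[a], base[b]
    p = skill_a / (skill_a + skill_b) * matchup[race_a][race_b]
    wins_a = wins_b = 0
    while wins_a < 3 and wins_b < 3:
        a_won = rng.random() < p
        wins_a += a_won
        wins_b += not a_won
        if stats is not None and race_a != race_b:
            # Record each non-mirror game as [wins of lower race id, games].
            lo, hi = min(race_a, race_b), max(race_a, race_b)
            stats[(lo, hi)][0] += a_won if race_a == lo else not a_won
            stats[(lo, hi)][1] += 1
    return wins_a == 3

def knockout_round(ids, base, matchup, rng, stats=None):
    """Randomly pair the players; return (winners, losers) of one round."""
    ids = ids[:]
    rng.shuffle(ids)
    winners, losers = [], []
    for a, b in zip(ids[::2], ids[1::2]):
        if play_bo5(a, b, base, matchup, rng, stats):
            winners.append(a)
            losers.append(b)
        else:
            winners.append(b)
            losers.append(a)
    return winners, losers

def code_s(ids, base, matchup, rng, stats):
    """Play a 32-player knockout to the end; return (top16, bottom16)."""
    top16, bottom16 = knockout_round(ids, base, matchup, rng, stats)
    remaining = top16
    while len(remaining) > 1:
        remaining, _ = knockout_round(remaining, base, matchup, rng, stats)
    return top16, bottom16

def code_a(ids, base, matchup, rng):
    """Play a 64-player knockout until 16 qualifiers remain."""
    while len(ids) > 16:
        ids, _ = knockout_round(ids, base, matchup, rng)
    return ids

def simulate(n_players=128, n_seasons=100, matchup=None, seed=0):
    """Return one stats dict per Code S season for a single playerbase."""
    rng = random.Random(seed)
    matchup = matchup or [[1.0] * RACES for _ in range(RACES)]
    base = make_playerbase(n_players, rng)
    current = rng.sample(range(n_players), 32)  # Initial2: random first Code S
    seasons = []
    for _ in range(n_seasons):
        stats = {(i, j): [0, 0] for i in range(RACES) for j in range(i + 1, RACES)}
        top16, bottom16 = code_s(current, base, matchup, rng, stats)  # Step2
        seasons.append(stats)
        in_code_s = set(current)
        outsiders = [i for i in range(n_players) if i not in in_code_s]
        newcomers = rng.sample(outsiders, 48)  # Step1: Code A field
        current = top16 + code_a(newcomers + bottom16, base, matchup, rng)
    return seasons
```

To try a 40:60 matchup under the assumed encoding, one could pass a table with `matchup[0][1] = 0.8` and `matchup[1][0] = 1.2`. Averaging the per-season winrates over many calls with different seeds would correspond to the "repeat the whole process 100 times" step.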
|
Initial1: Create a playerbase (e.g. 128 or 1024players), that have a skillindicator and a raceindicator Initial2: choose 32players randomly from the playerbase and let them play a Code S tournament Step1: choose 48new players from the playerbase and let those +the bottom 16play a Code A tournament. The next Code S are the top16 of this tournament and the top16 of the last Code S tournament.
I don't understand this process.
Why do you choose 32 players randomly from the database and let them play a code s tournament?
Here is what I would do:
Have a database of like 1xxx players with a random distribution of skill level points. Probability of winning should be based on skill level + race strength.
The top 128 (based on skill level + race strength) get into a Code A win-or-get-knocked-out tournament. The top 32 of the Code A tournament qualify for a Code S tournament. The losers don't play any more games. The Code S tournament is also win or get knocked out.
In this process you make sure that there will be fewer players of the UP race who get recorded games, which means the win rates of the UP race get much closer to 50/50. With your methodology the UP race gets a solid amount of games regardless of their success.
|
On July 13 2014 20:53 Big J wrote:
I have the feeling that player skill is not valued enough in your win formula. A 350 player has a win chance of 50% against someone with just 210 if he plays the weaker race.
Maybe try it with a lower imbalance setting?
EDIT: Do you actually see an accumulation of highly skilled players in your model?
|
@Hider,
My methodology was:
- create a player pool of 256 players
  - using Elo ratings to denote skill
  - evenly distributed skill & race
- for n times
  - put them into a 256 man single elimination tournament with random seeding
  - run the tournament and record the win rates per round and count winners
- get results.
It was a first try to see if I could replicate some of the effects. It's not representative of the GSL where you have players moving up and down in leagues, but it sparked some discussion so I think it was a worthwhile effort.
It's possible to recreate the GSL tournament format more faithfully, and perhaps, if one uses Aligulac, fairly realistic rating values could be assigned to the players. In that scenario I think you might be able to take the results of the simulation at face value, while in my initial testing one can only put stock in matters such as whether something is higher or lower than equilibrium.
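As a rough illustration of the methodology listed above, here is a Python sketch of a 256-player single elimination bracket with Elo-based outcomes. The rating range (1000 to 2000), the way races are assigned, the `race_boost` table and all function names are assumptions made for the example; only the standard Elo expected-score formula is taken as given.

```python
import random
from collections import Counter

RACES = "PTZ"

def elo_win_probability(rating_a, rating_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def run_bracket(players, race_boost, rng, stats):
    """Run one single elimination bracket with random seeding.

    players: list of (rating, race). stats maps (round, race_x, race_y)
    to [wins of race_x, games], with race_x < race_y alphabetically.
    Returns the winning (rating, race) tuple.
    """
    field = players[:]
    rng.shuffle(field)
    round_no = 0
    while len(field) > 1:
        advancing = []
        for a, b in zip(field[::2], field[1::2]):
            rating_a = a[0] + race_boost.get(a[1], 0)
            rating_b = b[0] + race_boost.get(b[1], 0)
            a_won = rng.random() < elo_win_probability(rating_a, rating_b)
            if a[1] != b[1]:
                first, second = sorted((a[1], b[1]))
                record = stats.setdefault((round_no, first, second), [0, 0])
                record[0] += a_won if a[1] == first else not a_won
                record[1] += 1
            advancing.append(a if a_won else b)
        field = advancing
        round_no += 1
    return field[0]

def simulate(n_tournaments=1000, race_boost=None, seed=0):
    """Return (tournament winners per race, per-round match-up stats)."""
    rng = random.Random(seed)
    race_boost = race_boost or {}
    # Assumed pool: uniform ratings, races assigned round-robin.
    players = [(rng.uniform(1000, 2000), RACES[i % 3]) for i in range(256)]
    winners, stats = Counter(), {}
    for _ in range(n_tournaments):
        winners[run_bracket(players, race_boost, rng, stats)[1]] += 1
    return winners, stats

winners, stats = simulate(n_tournaments=200, race_boost={"T": 50})
print(winners)
```

Here `race_boost={"T": 50}` stands in for "giving one race a global rating boost"; a boost that applies to only one match-up would need a table keyed by race pair instead.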
Note that, unlike what Grumbels was talking about and what I thought I'd see on a smaller scale, there seems to be hardly any effect on the other matchups.
The red line seems a bit below 50% though, which seems significant enough. In my testing the effect would be about ten times smaller than the initial imbalance, so it's quite small but still noticeable. In comparison, the effect of making certain match-ups more / less 'skillful' was stronger.
(of course we're using different tournament formats and yours seems more realistic, which is why I'm hesitant to state the exact values)
I have been working with a few calibrations for skilllevel. Currently, I'm creating Gaussian distributed values around 200 with a standard deviation of 50. So a standard player has roughly a value between 150 and 250, with extremes to 350 or 50 (and a mainly theoretical cap at 400 and 0). I realize this isn't the best calibration though, because given a player with 250 skill playing against one at 150skill, this merely produces a 62.5% winrate. Basically we could be talking about somewhat coinflippy matchups. I will try to work with this a little more, but given that results with other calibrations haven't been very different, my guess is that it does not matter that much for these grand scheme pictures.
Should the calibration matter at all? Also, does it matter that you're using Bo5 instead of Bo1? I don't think it should have any effect on the results. No matter how small the rating difference, the win chance will never be at exactly 50% and therefore will always show up given enough simulations, and no matter how large the difference, it will never be at exactly 100% either.
|
On July 13 2014 21:08 Hider wrote:
I don't understand this process.
Why do you choose 32 players randomly from the database and let them play a code s tournament?
Here is what I would do:
Have a database of like 1xxx players with a random distribution of skill level points. Probability of winning should be based on skill level + race strength.
The top 128 (based on skill level + race strength) get into a Code A win-or-get-knocked-out tournament. The top 32 of the Code A tournament qualify for a Code S tournament. The losers don't play any more games. The Code S tournament is also win or get knocked out.
In this process you make sure that there will be fewer players of the UP race who get recorded games, which means the win rates of the UP race get much closer to 50/50. With your methodology the UP race gets a solid amount of games regardless of their success.
The process of playing a tournament works like this: The 32 players get randomly matched into pairs. Those pairs play a Bo5. The outcome of a Bo5 is of course skill/race based.
The idea of the system: I have a starting point at which the racial distribution is even, since the players are randomly chosen for the first tournament. So I'd expect the winrates to reflect the balance (40:60, 50:50 and 50:50). In the consecutive tournaments, by means of the qualification process, I'm making sure that the weaker players and the players of the UP race drop out. The number of mirror matches of the UP race gets lower and the number of mirrors of the OP race gets higher over time.
[Figure: number of mirror matches for the UP race (40:60 against race2)]
[Figure: number of mirror matches for the OP race (60:40 vs race1)]
[Figure: number of mirror matches for the balanced race (50:50 in both matchups)]
What does not change is the winrate, however. Note, I'm only recording Code S stats, and Code S, apart from the first occurrence, does require you to qualify (either via Code S or Code A) before you can play there.
So yes, I am making sure that the UP race gets less games with that system! Not in the first GSL, but in the consecutive ones. You could say, I'm trying to simulate what happens when there once was balance (first GSL), and then a balance change occurs and makes one race lose ground.
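For anyone who wants to replicate this, the Code S / Code A loop described above can be sketched roughly as follows. This is only a minimal sketch under my own assumptions, not the actual program: the way skill and race balance are combined in `game_win_prob`, the floor of 1 on skill, single-elimination rounds with random pairings, Code A being played down from 64 to 16 in two rounds, and counting Bo5 series (not single games) in the stats are all guesses. The names `BALANCE`, `knockout_round` and `simulate` are made up for illustration.

```python
import random

# Assumed balance table: chance that the first race beats the second at equal
# skill. Race 1 is underpowered against race 2; every other matchup is even.
BALANCE = {(1, 2): 0.4, (2, 1): 0.6}

def game_win_prob(a, b):
    """Per-game chance that player a beats player b (assumed formula)."""
    w = BALANCE.get((a["race"], b["race"]), 0.5)
    return a["skill"] * w / (a["skill"] * w + b["skill"] * (1 - w))

def play_bo5(a, b, rng):
    """Play games until one side has three wins; return (winner, loser)."""
    p, wins_a, wins_b = game_win_prob(a, b), 0, 0
    while wins_a < 3 and wins_b < 3:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return (a, b) if wins_a == 3 else (b, a)

def knockout_round(players, rng, stats=None):
    """Pair players randomly, play Bo5s, return (winners, losers).
    If stats is given, record non-mirror series as [wins of lower race id, total]."""
    players = players[:]
    rng.shuffle(players)
    winners, losers = [], []
    for a, b in zip(players[::2], players[1::2]):
        won, lost = play_bo5(a, b, rng)
        winners.append(won)
        losers.append(lost)
        if stats is not None and won["race"] != lost["race"]:
            key = tuple(sorted((won["race"], lost["race"])))
            rec = stats.setdefault(key, [0, 0])
            rec[0] += won["race"] == key[0]
            rec[1] += 1
    return winners, losers

def simulate(seasons=50, pool_size=1024, seed=0):
    rng = random.Random(seed)
    # Skill: Gaussian around 200, sd 50, capped at 400; floor of 1 is my own
    # choice to avoid dividing by zero.
    pool = [{"id": i, "race": rng.choice((1, 2, 3)),
             "skill": min(400.0, max(1.0, rng.gauss(200, 50)))}
            for i in range(pool_size)]
    code_s = rng.sample(pool, 32)          # first Code S is drawn at random
    stats = {}
    for _season in range(seasons):
        top16, bottom16 = knockout_round(code_s, rng, stats)
        remaining = top16                  # play Code S out to a champion
        while len(remaining) > 1:
            remaining, _out = knockout_round(remaining, rng, stats)
        in_code_s = {p["id"] for p in code_s}
        outsiders = [p for p in pool if p["id"] not in in_code_s]
        code_a = rng.sample(outsiders, 48) + bottom16
        while len(code_a) > 16:            # 64 -> 32 -> 16, not recorded
            code_a, _out = knockout_round(code_a, rng)
        code_s = top16 + code_a            # next season's Code S
    return {k: wins / total for k, (wins, total) in stats.items()}
```

Calling `simulate()` returns the Code S series winrate of the lower-numbered race for each non-mirror matchup, e.g. the key `(1, 2)` is race1's winrate against race2. Mirror counts are left out to keep it short.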
The more I think about it, the more I believe the 50% effect does not exist. This is my current explanation: At first glance, it is right that only the better Terrans get to play when they are underpowered against, say, Zerg. Now what happens is that Zergs start to replace Terrans in the racial distribution. However, they are not taking over to a degree that would allow the top Terrans to brutalize them and even out the winrates. Why? Because Protoss prevents those Zergs from entering the competition at that level, since that matchup is still balanced. A patchzerg beating a Terran still also has to be able to beat Protoss and Zerg players to reach the tournaments, which he simply cannot. Even more so: since the number of Terrans is low, his ability to beat Terrans is only marginally important!
On July 13 2014 21:31 submarine wrote: I have the feeling that player skill is not valued enough in your win formula. A 350 player has a win chance of 50% against someone with just 210 if he plays the weaker race. Maybe try it with a lower imbalance setting? EDIT: Do you actually see an accumulation of highly skilled players in your model?
Yes. I have been watching single 100-tournament runs, and I keep track of the number of Code S participations and the number of Code S titles for each player. Here is an example of such a table: The first column is the player number, by which the system loads in the player data. The second column is his race. The third column is his skill. The 4th is the number of Code S appearances (in 100 Code S seasons). The 5th is the number of titles he has won (in 100 Code S seasons).
[Table: player number, race, skill, Code S appearances, Code S titles]
If you look at player 34 or 41 of the balanced race3, you will find that those are extraordinarily strong compared to the others and thus have much better stats. In the case of the overpowered race2, however, this kind of skill can result in some dominating performances (e.g. player10), while average players can still win GSLs (e.g. player21). Meanwhile it takes a real beast of the UP race1 to even win a title (player 45).
|
On July 13 2014 21:39 Grumbels wrote: "Note that, unlike what Grumbels was talking about and what I thought I'd see on a smaller scale, there seems to be hardly any effect on the other matchups."
The red line seems a bit below 50% though; it seems significant enough. In my testing the effect would be about ten times smaller than the initial imbalance, so it's quite small but still noticeable. In comparison, the effect of making certain match-ups more / less 'skillful' was stronger. (Of course we're using different tournament formats and yours seems more realistic, which is why I'm hesitant to state the exact values.)
Yes, the red line is a tiny bit under the green one. The red line however denotes race1:race3, so this actually means that the underpowered race1 wins a tiny bit less (between 0.5% and 1% on average) against race3 in its balanced matchup. (Back to what you originally said in your typo, haha.) I can't quite explain that effect, so I didn't want to talk about it, and I'm still looking for a mistake in my result-counting system instead. Truth be told, it seems fishy, but I can't quite see where I would have made a mistake.
On July 13 2014 21:39 Grumbels wrote: "I have been working with a few calibrations for skill level. Currently, I'm creating Gaussian-distributed values around 200 with a standard deviation of 50. So a standard player has roughly a value between 150 and 250, with extremes at 350 or 50 (and a mainly theoretical cap at 400 and 0). I realize this isn't the best calibration though, because a player with 250 skill playing against one with 150 skill wins only 62.5% of the time. Basically we could be talking about somewhat coinflippy matchups. I will try to work with this a little more, but given that results with other calibrations haven't been very different, my guess is that it does not matter that much for these grand scheme pictures."
Should the calibration matter at all? Also, does it matter that you're using Bo5 instead of Bo1? I don't think it should have any effect on the results. No matter how small the rating difference, the winrate will never be at exactly 50% and therefore will always show up given enough simulations, and no matter how large, it will never be at exactly 100% either.
I don't think it matters, especially at that large a scale. That was mostly just to point out how to interpret these rather low winrates against each other ("coinflippy matchups"). If I increased the deviation further, I guess skill would matter more, but it averages out over 100 runs anyway. The Bo5 format is not very important either. Both of those things only matter for a single playerbase, to see the effects of stronger/weaker playerbases for a race. In the 100-run average, it's not interesting.
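To put numbers on the calibration and the Bo5 question: the 62.5% figure for 250 vs 150 is what you get from skill_a / (skill_a + skill_b), so the sketch below assumes that is the per-game formula. The Bo5 part additionally assumes that the games of a series are independent with the same per-game chance, which is my assumption and not something stated above. The function names are made up for illustration.

```python
from math import comb

def game_win_prob(skill_a, skill_b):
    """Per-game win chance, assuming the simple skill-ratio formula."""
    return skill_a / (skill_a + skill_b)

def series_win_prob(p, wins_needed=3):
    """Chance to win a first-to-wins_needed series (Bo5 by default),
    assuming independent games with per-game win chance p."""
    q = 1 - p
    return sum(comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
               for k in range(wins_needed))

p = game_win_prob(250, 150)
print(p)                                   # per-game chance
print(series_win_prob(p))                  # Bo5 chance
print(series_win_prob(p, wins_needed=1))   # Bo1, same as per-game
```

Under these assumptions a 62.5% per-game edge becomes roughly 72.5% in a Bo5, so the series format stretches an existing edge but never creates or removes one: 50% stays 50%, and anything above 50% stays above 50%.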
|
In the consecutive tournaments, by means of the qualification process, I'm making sure that the weaker players and the players of the UP race drop out.
Yeah, and I don't think that's actually what happens with the Aligulac statistics. The players playing in a ROx tournament aren't randomly decided upon. I believe instead that it's much more realistic to assume that you need to have a certain level of success (skill + race advantage) in order to qualify for a tournament in the first place. I realize this isn't always the case, but it happens often enough that, to a large extent, it makes the model close to useless if you don't take it into account.
It's not enough to assume that only the weaker players drop out, when they shouldn't have been playing in the first place.
Let me try to use a simple example to show why your model doesn't go towards 50/50, but that if you make the adjustment that I believe is realistic, then it does.
Model 1
Assume the following:
- 2 races
- Equal average skill level
- 4 players invited into a tournament that starts from the semifinal
- The OP race gets 50 race points and the UP race gets 25 race points
- Probability of winning is based upon: skill points + race points
There are 4 players.
Player 1: Skill points = 100. Race points = 50. Total success points = 150
Player 2: Skill points = 100. Race points = 25. Total success points = 125
Player 3: Skill points = 100. Race points = 50. Total success points = 150
Player 4: Skill points = 100. Race points = 25. Total success points = 125
Player 1 and Player 2 meet each other, and let's assume that the probability of winning is determined in this way: (150/125)/2 = 60% probability of P1 winning.
In the other game, P3 and P4 are playing. P of winning for P3 = 60% (so 40% for P4).
So what happens here is that the stronger race (represented by P3 and P1) goes to the final more often, as it has a 60% probability of reaching the final while the weaker race has only 40%. That means that the total number of games played is higher for the stronger race, and win/rates don't go towards 50%, which is what your model shows.
Model 2
Make the following assumptions:
- 8 players
- Only the 4 best players get invited into the 4-player tournament
- Average skill level for each race is maintained, however, there are now 4 layers of skill
Player 1: Skill points = 125. Race points = 50. Total success points = 175
Player 2: Skill points = 125. Race points = 25. Total success points = 150
Player 3: Skill points = 100. Race points = 50. Total success points = 150
Player 4: Skill points = 100. Race points = 25. Total success points = 125
Player 5: Skill points = 90. Race points = 50. Total success points = 140
Player 6: Skill points = 90. Race points = 25. Total success points = 115
Player 7: Skill points = 80. Race points = 50. Total success points = 130
Player 8: Skill points = 80. Race points = 25. Total success points = 105
So the tournament consists of P1, P2, P3 and P5. In this case, only P2 represents the weaker race.
If we assume a random draw, then P2 is gonna have an average win/rate of: ((150*3)/(150+175+140))/2 = 48%.
Thus, quite close to 50%, but the total number of games played by the race is gonna be quite a bit lower than the total played by the other race. So it's quite clear that if you take into account a certain "success level" for being eligible to participate in the tournament, then win/rates go towards 50/50. The only real indicator of balance is either the total number of mirror matches played or the total number of games played by the race.
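The Model 2 numbers can be reproduced with a few lines, as a quick check. This is just a sketch that uses the win formula exactly as written in the example above (own points over opponents' points, divided by 2); the variable names are made up, and it makes no claim that this formula is a sensible probability model.

```python
# Players from Model 2: (name, skill points, race points).
players = [
    ("P1", 125, 50), ("P2", 125, 25), ("P3", 100, 50), ("P4", 100, 25),
    ("P5", 90, 50), ("P6", 90, 25), ("P7", 80, 50), ("P8", 80, 25),
]

def total(player):
    """Total success points = skill points + race points."""
    return player[1] + player[2]

# Invite the four players with the highest total success points.
invited = sorted(players, key=total, reverse=True)[:4]
invited_names = [p[0] for p in invited]

# Players of the UP race (25 race points) that made the cut.
up_invited = [p for p in invited if p[2] == 25]

# Average win rate of the UP player over a random draw, using the formula
# from the example: (own total * 3) / (sum of opponents' totals) / 2.
me = up_invited[0]
opponents = [p for p in invited if p is not me]
avg_winrate = (total(me) * len(opponents)
               / sum(total(o) for o in opponents)) / 2

print(invited_names, [p[0] for p in up_invited], round(avg_winrate, 3))
```

Changing the skill points in `players` shows how strongly the resulting win/rate depends on the chosen values.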
|
@Hider, those values are completely arbitrary though.
|
On July 13 2014 22:33 Grumbels wrote: @Hider, those values are completely arbitrary though.
Yes, and you can get whatever number you want. You can even get win/rates for the UP race to be above 50%. And that's kinda the point: Don't trust the naked win/rates, because they do not take into account the effect that my model shows exists.
The larger the sample size with a higher amount of randomly distributed values, the closer this model will go towards 50/50.
Then you can start assuming that only X% of tournaments work that way and the other (1-X)% don't. Then the long-term equilibrium win/rate is gonna be lower than 50% indeed (but still higher than what the race imbalance would imply).
I think if we compare the implications of my findings to the real world, then my model just makes more sense than BigJ's. If we can all agree that terran (balance-wise) has become a lot worse over the last 3-4 years, then according to BigJ's model we would be able to find a huge effect in the win/rates. But if we look at the Aligulac statistics, that's barely the case. Win/rates have only changed by 3-4 percentage points, while the distribution of competitive players has changed significantly.
BigJ's model would not be able to predict that development, but it fits perfectly into the assumptions of my model.
|
First of all, you are not describing to what extent your system is imbalanced. The winrate you achieve in your example is 48%, but what if I were to tell you that the expected imbalance is 48%? (It is not quite; a rough estimate for Model 2 says it is around 46%, but it can't really be calculated without a formula that determines how those skill points are used to create wins, given the second point I'll make.) Secondly, your formula for determining wins is wrong. It's not symmetric: if you are player1 in the calculation, you have an advantage over player2. The outcome of (p1 vs p2) is not the same as 1-(p2 vs p1). Thirdly, as Grumbels said, the values are arbitrary. I could just assign other skill values to create a different expected winrate for the underpowered race. I could probably assign them in a way that it is 48%, and your example would then tell us that there is no "closer to 50%" effect.
Lastly, my model does prevent the players with the lower skill+race from playing in Code S, apart from season 1. They usually won't even make it through Code A. That's where, in theory, the winrates should start growing towards 50, but they don't. After - say 50 - simulated seasons, only the highest-skilled players of the playerbase are left in Code S in my system. The randomly chosen ones for Code A have to fight against 16 of those "best" players to have a chance at Code S and get a mention in the statistics. Somewhere in the later seasons, SOME of the effect just HAS TO show up. It's just not probable (like at all) that enough of the weaker race1 players make it through Code A to prevent a winrate somewhat higher than 40%.
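The symmetry problem from the second point is easy to check numerically. The sketch below compares the formula from the quoted example, (a/b)/2, with the plain ratio a/(a+b), which is the one consistent with the 250 vs 150 = 62.5% calibration mentioned earlier in the thread. The function names are made up for illustration.

```python
def example_formula(a, b):
    """Win chance as computed in the quoted example: (a / b) / 2."""
    return (a / b) / 2

def ratio_formula(a, b):
    """Symmetric alternative: a / (a + b)."""
    return a / (a + b)

for formula in (example_formula, ratio_formula):
    p_ab = formula(150, 125)   # stronger side listed first
    p_ba = formula(125, 150)   # same matchup, weaker side listed first
    print(formula.__name__, round(p_ab, 4), round(p_ba, 4),
          round(p_ab + p_ba, 4))
```

With the example formula the two "probabilities" for the same matchup add up to more than 1 (0.6 plus about 0.417), and the value even exceeds 100% once one side has more than double the points of the other. The ratio formula always adds up to exactly 1.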
Edit: Also about Aligulac. I think you are underestimating how many games in Aligulac are played by very low-level players. Just look at the latest results: http://aligulac.com/results/ I actually believe that the bar to get a mention in Aligulac is much, much lower than the one I have implemented in my model. Simply put, there is no qualification needed to attend a weekly tournament. But the number of games played in weekly tournaments probably exceeds the number of games played by professionals. Similarly, the number of games played in the Dreamhack qualification rounds exceeds the number of games in the later rounds, when only professionals are left. All of those games are recorded, however.
Given how I calibrated my model, the best comparison is with the real CodeS.
|
|
|
|