Ladder-Balance-Data - Page 22

hunts
Profile Joined September 2010
United States2113 Posts
July 13 2012 01:28 GMT
#421
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

Continuing with lolcanoe's analysis, I found the 99% confidence intervals for the difference in mean for each group.

Race Results
For US:
ZvT
(62.0, 164.6)
PvT
(8.9, 115.0)
ZvP
(3.3, 99.4)

For EU:
ZvT
(19.6, 108.6)
PvT
(18.3, 113.2)
ZvP
(-45.3, 42.0)

US and EU:
ZvT
(51.5, 118.8)
PvT
(28.9, 99.6)
ZvP
(-11.1, 53.2)


As for US vs EU, the 99% confidence interval for the mean difference in MMR is:
(21.9, 77.1)

For each interval a positive difference indicates the mean of the first population is higher than the second, so for US vs EU it reads: we can be 99% confident that the mean MMR of the US player base is between 21.9 and 77.1 MMR higher than that of the EU player base.

The meaning of a 99% confidence interval for the difference of means is as follows:
If we were to repeatedly pick random samples of the same size* from each population and construct an interval from each pair of samples in the same way, about 99% of those intervals would contain the true difference of the population means.

*By same size I mean the same sizes as were sampled to construct the interval, so if the interval were constructed by sampling 10 Zergs and 15 Protosses, it would be random samples of 10 and 15, respectively.

I've provided the MATLAB code I used for the analysis if anyone can run it and wants to do analysis on future data:

Helper Function
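function [lower,upper] = findInterval(pop1,pop2,confidence)
% Welch confidence interval for mean(pop1) - mean(pop2) (unequal variances)
mu1 = mean(pop1);
mu2 = mean(pop2);
s1 = std(pop1); % sample standard deviation (normalised by n-1)
s2 = std(pop2);
n1 = length(pop1);
n2 = length(pop2);
delta = mu1-mu2; % renamed from diff to avoid shadowing the built-in diff()
% Welch-Satterthwaite approximation for the degrees of freedom
df = (s1^2/n1 + s2^2/n2)^2/((s1^2/n1)^2/(n1-1)+(s2^2/n2)^2/(n2-1));
tcrit = tinv(1-(1-confidence)/2,df); % tinv requires the Statistics Toolbox
s = sqrt(s1^2/n1 + s2^2/n2); % standard error of the difference
halfrange = tcrit*s;
lower = delta-halfrange;
upper = delta+halfrange;
end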


Main script
%script for calculating balance

%get data from file (would be ez if OP hadn't put quotes in the .csv, BAD!)
fid = fopen('balance.csv');
str = char(fread(fid))';
fclose(fid);

omitFirstLine = '(?<=\n).*';
stripped = str( regexp(str,omitFirstLine):end ); %strip first line
rawdata = textscan(stripped, '%s %s %d', 'delimiter',' \t\n,"',...
'MultipleDelimsAsOne', 1);

%define some constants (not saying protoss #1)
protoss=1;
zerg=2;
terran=3;
US = 1;
EU = 2;

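%combine into one big array: column 1 = server, 2 = race, 3 = MMR
nRows = length(rawdata{3});
data = zeros(nRows, 3);
data(:,3) = rawdata{3};
for i=1:nRows
    %server: anything not starting with 'U' is treated as EU
    if ( rawdata{1}{i}(1) == 'U')
        data(i,1) = US;
    else
        data(i,1) = EU;
    end

    %race: compare only the first letter so the test stays scalar
    r = lower(rawdata{2}{i}(1));
    if ( r == 'z')
        data(i,2) = zerg;
    elseif ( r == 'p')
        data(i,2) = protoss;
    else
        data(i,2) = terran;
    end
end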

%define filters
tF = data(:,2) == terran;
pF = data(:,2) == protoss;
zF = data(:,2) == zerg;
uF = data(:,1) == US;
eF = data(:,1) == EU;

%construct the 99% confidence intervals based on a two-sided t-test
%zerg vs protoss
confidence = 0.99;
place = eF | uF; %lets you quickly change if US,EU, or both (uF | eF)
[zpLower,zpUpper] = findInterval( data(zF & place,3), data(pF & place,3),confidence);
[ztLower,ztUpper] = findInterval( data(zF & place,3), data(tF & place,3),confidence );
[tpLower,tpUpper] = findInterval( data(tF & place,3), data(pF & place,3),confidence );
[UsEuLower,UsEuUpper] = findInterval( data(uF,3), data(eF,3), confidence);



Nice work, though it might be nice to narrow it down to a 95% CI to get a slightly better measurement I think. I'm too lazy to do it though :D
twitch.tv/huntstv 7x legend streamer
Jadoreoov
Profile Joined December 2009
United States76 Posts
July 13 2012 02:20 GMT
#422
Done:

95% confidence intervals for the EU and US combined:
ZvT:
(59.5, 110.7)
PvT
(37.3, 91.2)
ZvP
(-3.7, 45.5)

US vs EU
(28.5, 70.5)
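
For anyone without MATLAB, here is a minimal Python sketch of the same kind of interval at two confidence levels. It is not the code used above: it assumes a normal critical value in place of the t critical value (reasonable only for samples in the hundreds or more), and the sample data, names, and sizes are made up for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance

def welch_interval(pop1, pop2, confidence):
    """Approximate CI for mean(pop1) - mean(pop2) with unequal variances.

    Assumption: a normal critical value stands in for the t critical value,
    which is only reasonable when both samples are large.
    """
    n1, n2 = len(pop1), len(pop2)
    # variance() is the sample variance (normalised by n-1)
    se = math.sqrt(variance(pop1) / n1 + variance(pop2) / n2)
    crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean(pop1) - mean(pop2)
    return diff - crit * se, diff + crit * se

# Made-up MMR-like samples, purely for illustration (not the thread's data).
rng = random.Random(0)
zerg = [rng.gauss(1600, 500) for _ in range(2000)]
terran = [rng.gauss(1520, 500) for _ in range(1500)]

lo99, hi99 = welch_interval(zerg, terran, 0.99)
lo95, hi95 = welch_interval(zerg, terran, 0.95)
print(f"99%: ({lo99:.1f}, {hi99:.1f})")
print(f"95%: ({lo95:.1f}, {hi95:.1f})")
```

Both intervals are centred on the same difference; only the critical value changes, so the 95% interval is about 0.76 times as wide as the 99% one. The ZvT intervals above show the same ratio: 51.2 wide at 95% against 67.3 wide at 99%.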
lolcanoe
Profile Joined July 2010
United States57 Posts
Last Edited: 2012-07-13 02:46:42
July 13 2012 02:45 GMT
#423
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

No. No. No. No....More misinformation. Normal distributions are indeed pretty prevalent in the real world, and the central limit theorem is a good rule of thumb, but it's these sorts of assumptions that have lost certain financial entities billions as well.

Take stock price returns: approximately normal, but with a fat left tail. If you used a normal distribution you would severely underestimate the possibility of total disaster and hence under-price risk. Hence, returns are best modeled with a modified distribution to account for the extremes. Or take waiting times in a queue, where you have a very long right tail but a distinctly left-weighted distribution (think about it: you have a minimum of 0 but a max of infinity, with a peak that is much closer to the left than the right).

Most of all, we are dealing with an entirely man-made distribution here. If you counted by league only, you'd have 20 20 20 20 20 EVENLY distributed. For MMR, the shape of the curve is determined ENTIRELY by the modeling software. If Blizzard wanted to, they could create a distribution of any type. With our data we can only guess the distribution and approximate our statistics under reasonable normal guidelines (after establishing that normality is a possible model).

Hope this makes sense, and I really encourage you to keep this in mind, especially if you ever plan to work on Wall Street in your lifetime.
Excalibur_Z
Profile Joined October 2002
United States12240 Posts
Last Edited: 2012-07-13 02:58:35
July 13 2012 02:53 GMT
#424
Yes, the MMR cap exists. A floor likely also exists.

Don't get defensive when other community members demand more thorough data or a stronger analysis. Understanding the ladder is a communal effort. lolcanoe and Lysenko bring up salient points that should be addressed in order to produce more concrete hypotheses, even if this means refuting existing hypotheses.

We call the reverse-engineered values (points -> adj.pts -> adj.pts with offsets removed) "MMR" because that's the closest representation of MMR we have. We know that the "actual" hidden MMR factors in an uncertainty value when determining the degree of change after a match, but it's unlikely that will ever be deciphered.

The league and division offsets used by the MMR tool are not exact, but they're somewhat close. Still, this introduces a margin of error. This is probably mitigated by the volume of data, and even the relatively arbitrary values that are calculated can be used when compared to each other for the purposes of gauging race balance, because the margin of error applies universally to each race and matchup.

One thing I want to be very careful about is considering any part of this interpretation as "final" data. Every other person who has posted theories about how the ladder works in the past has fallen into the same trap of interpreting his data incorrectly until it fits his conclusions, so it's important we don't repeat that mistake. The data must remain impartial. The only additional information we have about the ladder comes from Josh himself.

Also a special side note: the ladder isn't 20/20/20/20/18/2 anymore. There were some offset corrections and I don't know the new targeted distribution, but I would say conservatively it's closer to 20/20/20/20/16/4. I don't expect Blizzard to release the new target values.
Moderator
Jadoreoov
Profile Joined December 2009
United States76 Posts
July 13 2012 03:15 GMT
#425
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.


Student's t-test assumes that the distribution of the sampling mean is approximately normal, but makes no assumptions regarding the underlying distribution of the data itself.
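
The point about the sampling mean can be checked numerically. What follows is a minimal Python sketch, not anything run in this thread: it assumes an exponential distribution as a stand-in for a heavily skewed population, and the function names, seed, and sample sizes are illustrative choices.

```python
import random

def skewness(values):
    """Population skewness; 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    s = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / s) ** 3 for v in values) / n

def sample_means(n, trials, rng):
    """Means of `trials` independent samples of size n drawn from Exp(1)."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n
            for _ in range(trials)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Skewness of the raw data, then of sample means for growing sample size.
skew_raw = skewness([rng.expovariate(1.0) for _ in range(20000)])
skew_4 = skewness(sample_means(4, 4000, rng))
skew_100 = skewness(sample_means(100, 4000, rng))

print(f"raw data:         skewness {skew_raw:.2f}")
print(f"means of n = 4:   skewness {skew_4:.2f}")
print(f"means of n = 100: skewness {skew_100:.2f}")
```

The raw data stay skewed no matter how many values are drawn; it is the distribution of the means that becomes more symmetric as n grows. For Exp(1) the theoretical skewness of the mean of n draws is 2/sqrt(n).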
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:38 GMT
#426
Oh, it's nice that you guys are redoing what I did back at page 10, but now with more statistics.

- Yes, I think we have enough statistics, and the distribution is well behaved enough so that central limit theorem will give a sufficiently accurate estimate of the statistical error.

- However, it does assume that the samples are uncorrelated. OP, you said that you removed duplicates from the list, but do you think there can be other correlations in the list of samples? You probably know best exactly what is in the list. If there are still correlations, it means that the error should be larger than what you get from a central limit analysis. But it seems like the (small) signal will still be significant, even if the error is increased a bit. Hopefully there shouldn't be large correlations in there?
lolcanoe
Profile Joined July 2010
United States57 Posts
July 13 2012 03:40 GMT
#427
On July 13 2012 12:15 Jadoreoov wrote:
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

You should scroll down the page you quoted.

"In a specific type of t-test, these conditions are consequences of the population being studied, and of the way in which the data are sampled. For example, in the t-test comparing the means of two independent samples, the following assumptions should be met:
Each of the two populations being compared should follow a normal distribution. This can be tested using a normality test, such as the Shapiro-Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot.
If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using F test, Levene's test, Bartlett's test, or the Brown–Forsythe test; or assessable graphically using a Q-Q plot). If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances.[7] Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.
The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results."

(http://en.wikipedia.org/wiki/Student's_t-test#Assumptions) Keep in mind we are using a two-sample t-test here... you did scroll down right?
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:45 GMT
#428
On July 11 2012 16:27 Not_That wrote:
On July 11 2012 16:15 Cascade wrote:
On July 11 2012 16:05 Not_That wrote:
On July 11 2012 15:35 Cascade wrote:
On July 11 2012 15:13 Not_That wrote:
On July 11 2012 14:53 Cascade wrote:
On July 11 2012 14:39 Not_That wrote:
MMR distribution by races.
[image loading]

Amount of players:
2014 Zerg
1784 Protoss
1516 Terran

The server does matter as MMR is non comparable cross servers. I've decided to remove KR and SEA and keep EU and NA as they are closest to each other in terms of MMRs, and that's where most of our data comes from.

Cool! Can you do 100 or even 200 granularity to make it easier to read? :o)
We are not trying to see any structure smaller than 200 MMR anyway.



Here you go:
[image loading]

We tried having % of total players on the y axis. The problem with that is that it doesn't have information regarding the amount of players. The dots at the edges of the graph look very strange, for example 100% of players above 3200 are Protoss. Obviously it's not very useful. We could snip the edges of the graph, but where? How many players are enough? Are 21 players between 2700 and 2750 enough? etc.

Thanks!

I mean % of the zerg players in that bin. That is, (number of zergs in that bin)/(number of zergs total). Just like you have plotted now, only divide all zerg entries with the number of zerg players, etc. Now the zerg plot is higher in mid-range, but it is not clear if that is because a larger fraction of zergs have mid-range MMR, or if there are just more zergs.


Good thinking.

Same graph normalized, each bar representing the percentage of players of each race in the bin:
[image loading]

Nice!

Now just put the error bars back on that plot, and it's perfect! *leaving*


How do I figure out error margins for a graph with granularity?
Fixed colors btw.

Sorry, missed this post...
The error is sqrt(N) in each bin, before normalisation. Then when you rescale, just scale the error with the same factor. Equivalently, the relative error in each bin is 1/sqrt(N). N is the number of entries in that bin btw.

That way, when you group up bins, you can expect the relative error to go down by a factor of 2 if you go from 50 to 200 granularity.

When N gets too low (rule of thumb: it is ok down to N = 20), this error estimate starts becoming a bit shaky, but for a plot like this, it is good enough. Below N = 20, we won't be able to see much anyway I think, so the bin will just say that there are not enough statistics.
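
A minimal Python sketch of that recipe, in case it helps. The counts are hypothetical and the function names are made up for illustration; it only assumes the sqrt(N) error per bin described above.

```python
import math

def bin_errors(counts, total):
    """Return (fraction, absolute error, relative error) for each bin.

    counts: raw number of players of one race in each MMR bin.
    total:  total number of players of that race (the normalisation factor).
    The error on a raw count N is taken as sqrt(N); dividing by `total`
    rescales the value and its error by the same factor.
    """
    rows = []
    for n in counts:
        err = math.sqrt(n)
        rel = err / n if n > 0 else float("nan")
        rows.append((n / total, err / total, rel))
    return rows

def merge_bins(counts, factor):
    """Sum every `factor` adjacent bins (e.g. 4 bins of 50 MMR -> 1 of 200)."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]

# Hypothetical counts in 50-MMR bins, for illustration only.
fine = [25, 25, 25, 25, 100, 100, 100, 100]
total = sum(fine)
coarse = merge_bins(fine, 4)  # 200-MMR bins

for label, counts in (("50 MMR bins", fine), ("200 MMR bins", coarse)):
    print(label)
    for frac, err, rel in bin_errors(counts, total):
        print(f"  {frac:.3f} +/- {err:.3f}  (relative error {rel:.2f})")
```

The factor of 2 holds when the merged bins hold similar counts: four bins of 25 entries each have a relative error of 0.2, and the merged bin of 100 has 0.1.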
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:59 GMT
#429
On July 13 2012 12:40 lolcanoe wrote:
On July 13 2012 12:15 Jadoreoov wrote:
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

You should scroll down the page you quoted.

"In a specific type of t-test, these conditions are consequences of the population being studied, and of the way in which the data are sampled. For example, in the t-test comparing the means of two independent samples, the following assumptions should be met:
Each of the two populations being compared should follow a normal distribution. This can be tested using a normality test, such as the Shapiro-Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot.
If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using F test, Levene's test, Bartlett's test, or the Brown–Forsythe test; or assessable graphically using a Q-Q plot). If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances.[7] Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.
The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results."

(http://en.wikipedia.org/wiki/Student's_t-test#Assumptions) Keep in mind we are using a two-sample t-test here... you did scroll down right?

No need for that tone imo. We are all working together here as far as I know.

Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.
VediVeci
Profile Joined October 2011
United States82 Posts
Last Edited: 2012-07-13 04:05:29
July 13 2012 04:03 GMT
#430
On July 13 2012 08:21 lolcanoe wrote:
On July 13 2012 08:13 VediVeci wrote:
Requiring someone to have a college education is a bit of an ivory tower buddy.

I'm not requiring anyone to have anything. My criticisms are objectively based on the analysis and not the source.

There is no ivory tower here. I've proven that my methods can be applied in a statistically coherent and easily understandable way, so your accusations that my suggestions are impractical (or "ivory tower") are pretty moot.



I'm not arguing that your methods aren't better; they probably are (I didn't read your post very closely). Your attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.

And I'm glad you have such good insight into how the financial crisis happened and can tell us about it. Now that you're on the case we can rest assured it won't happen again!!

And skeldark, when I say you "manipulated" the data, I don't mean you did anything negative, I just mean you performed a series of calculations or "manipulations" on the data.

Edit: clarity
DwindleFlip
Profile Joined April 2011
United States32 Posts
Last Edited: 2012-07-13 04:27:50
July 13 2012 04:23 GMT
#431
All this talk just to deny the simple truth that terran is in rough shape. Sc2 WOL is abandonware to Blizzard now.




User was temp banned for this post.
lolcanoe
Profile Joined July 2010
United States57 Posts
Last Edited: 2012-07-13 04:56:10
July 13 2012 04:40 GMT
#432
On July 13 2012 13:03 VediVeci wrote:

I'm not arguing that your methods aren't better; they probably are (I didn't read your post very closely). Your attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.
Edit: clarity

He had at least a 50% chance of getting it right. I'm going to ignore the rest of the post so as not to encourage further irrelevance from posters who self-admittedly don't read things carefully.


On July 13 2012 12:59 Cascade wrote:
Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.

Ok, let's separate the statements clearly so I can explain why your explanation is inaccurate and why his is pretty much entirely misplaced. I understand the confusion here because my high school math teacher needed to be corrected on the same misunderstanding.

Imagine a population with a distribution that is skewed in one way or another (not normally distributed). If you take a sample and increase the sample size from n in an orderly fashion, what happens? Eventually your sample size is the entire population, and your sample distribution and population distribution are unsurprisingly identical! So in this 1 sample situation, the shape of the distribution is dependent on the population being sampled. If the population is normal, and only if it is, the sampling distribution will become increasingly normal as n grows. This idea is pretty intuitive once you imagine a sample size equal to that of your population (that's exactly what's going on here). This is why a normality test is important!

The central limit theorem specifically relates to the distribution of sampling means and infinite random samples (which isn't exactly what we have here). The distribution of sampling means does NOT equal the sample distributions themselves, as you have incorrectly equated! It refers to the distribution of the AVERAGE values in each sample, and this distribution becomes increasingly normal, not as the number of samples increases but rather as n, the sample size, increases. In this regard it makes complete sense (with a formal mathematical proof) why the distribution of sampling means tends to be normal irrespective of the population distribution!
Please look into http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/stat_workshp/cnt_lim_therm/cnt_lim_therm_02.html
to understand why neither of your posts is accurate and how a completely non-normal distribution can have normally distributed sample means as n increases.

Hopefully, you'll begin to understand how you guys are misapplying CLT!
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 04:42 GMT
#433
On July 13 2012 13:23 DwindleFlip wrote:
All this talk just to deny the simple truth that terran is in rough shape. Sc2 WOL is abandonware to Blizzard now.




User was temp banned for this post.

ahaha, ok guys, we are busted. We can stop all this statistics BS now. You know, the one we make up out of thin air as we type, completely baseless. We got called on the bluff, nothing more to say. Was fun while it lasted. No point in trying to pretend that analyzing data is of any use when we have people like DwindleFlip laying down the simple truth like a B40UwwwwzzZZZzz!!!11oneone
SeAK
Profile Joined September 2010
Canada69 Posts
July 13 2012 05:30 GMT
#434
It's always easier to rip something apart than it is to build something... kinda like what I just did
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 05:34 GMT
#435
On July 13 2012 13:40 lolcanoe wrote:
On July 13 2012 13:03 VediVeci wrote:

I'm not arguing that your methods aren't better; they probably are (I didn't read your post very closely). Your attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.
Edit: clarity

He had at least a 50% chance of getting it right. I'm going to ignore the rest of the post so as not to encourage further irrelevance from posters who self-admittedly don't read things carefully.


On July 13 2012 12:59 Cascade wrote:
Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.

Ok, let's separate the statements clearly so I can explain why your explanation is inaccurate and why his is pretty much entirely misplaced. I understand the confusion here because my high school math teacher needed to be corrected on the same misunderstanding.

Imagine a population with a distribution that is skewed in one way or another (not normally distributed). If you take a sample and increase the sample size from n in an orderly fashion, what happens? Eventually your sample size is the entire population, and your sample distribution and population distribution are unsurprisingly identical! So in this 1 sample situation, the shape of the distribution is dependent on the population being sampled. If the population is normal, and only if it is, the sampling distribution will become increasingly normal as n grows. This idea is pretty intuitive once you imagine a sample size equal to that of your population (that's exactly what's going on here). This is why a normality test is important!

The central limit theorem specifically relates to the distribution of sampling means and infinite random samples (which isn't exactly what we have here). The distribution of sampling means does NOT equal the sample distributions themselves! It refers to the distribution of the AVERAGE values in each sample, and this distribution becomes increasingly normal, not as the number of samples increases but rather as n, the sample size, increases. In this regard it makes complete sense (with a formal mathematical proof) why the distribution of sampling means tends to be normal irrespective of the population distribution!
Please look into http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/stat_workshp/cnt_lim_therm/cnt_lim_therm_02.html
to understand why neither of your posts is accurate and how a completely non-normal distribution can have normally distributed sample means as n increases.

You guys are misapplying CLT!

Ok, let me prove it for you then.
My claim is that if the set of samples is large enough, we can use the normal distribution with S/sqrt(N) width to estimate the errors. For simplicity, let me prove that the 2*S/sqrt(N) interval is close to 95%:

Let the distribution f(x) have an average 0 and standard deviation S. An average X from a sufficiently large (specified in the proof) set of N samples from f(x) will fall within 2*S/sqrt(N) of the average 0 with a probability between 0.93 and 0.97.
proof:
Calculating the average x from N samples (from many different sets, each of N samples) will give a distribution of averages A_N(x) that approaches a normal distribution as N goes to infinity, centred around 0, and with a width of S/sqrt(N). This is the CLT.

Specify "sufficiently large N" such that A_N(x) is similar to a normal distribution g(x) of width S/sqrt(N). Close enough so that the integral from -2*S/sqrt(N) to 2*S/sqrt(N) is between 0.97 and 0.93 (it is close to 0.95 for g). As A_N approaches g as N-->infty, this will happen for some N. The more similar f(x) is to a normal distribution, the lower N is required.

Now take a single average X from f(x), using N samples (this would be the OP). This average is distributed according to A_N(x), and with a sufficiently large N, the probability that X is between -2*S/sqrt(N) and 2*S/sqrt(N) is larger than 0.93, and smaller than 0.97. QED.

Then at what N it reaches "sufficiently large" is a trickier matter. But I am personally convinced (from experience) that with the well behaved distribution of MMR we see, and with thousands of samples, the errors are accurate enough so that the conclusion stands. Ie, that there is a significant signal that the terran MMR is lower than the zerg MMR. Due to the finite (aawwwww ) sample size there is little point in claiming confidence levels of exactly 0.99957353526452, but if this method gives a confidence level of 99.9% I think it is safe to say that you are more than 99% sure. This would also include other errors, such as correlations in the sample (as I was nagging about earlier).
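
The claim can also be probed by simulation. Below is a minimal Python sketch under stated assumptions: it uses Exp(1) as a skewed f(x) (true mean 1 and S = 1, rather than the mean 0 of the proof), and the function name, seed, and trial count are illustrative. It estimates how often the average of N samples lands within 2*S/sqrt(N) of the true mean.

```python
import math
import random

def coverage(n, trials, rng):
    """Fraction of trials in which the mean of n Exp(1) draws lands within
    2*S/sqrt(n) of the true mean. For Exp(1), true mean = 1 and S = 1."""
    half_width = 2 * 1.0 / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        avg = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if abs(avg - 1.0) <= half_width:
            hits += 1
    return hits / trials

rng = random.Random(2)  # fixed seed so the run is repeatable
results = {n: coverage(n, 4000, rng) for n in (5, 50, 500)}
for n, frac in results.items():
    print(f"N = {n:3d}: within 2*S/sqrt(N) in {frac:.3f} of trials")
```

With only 4000 trials each estimate carries its own statistical error of roughly 0.003, so the printed fractions should be read with that margin in mind.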
skeldark
Profile Joined April 2010
Germany2223 Posts
Last Edited: 2012-07-13 07:25:42
July 13 2012 06:58 GMT
#436
Discussion:
I think it's time to forget the past and start anew.
Most of us did not behave in the past as we should have (me included).
Once we all agree on the main points, we can put the personal stuff aside.


On July 13 2012 12:38 Cascade wrote:
- However, it does assume that the samples are uncorrelated. OP, you said that you removed duplicates from the list, but do you think there can be other correlations in the list of samples? You probably know best exactly what is in the list. If there are still correlations, it means that the error should be larger than what you get from a central limit analysis. But it seems like the (small) signal will still be significant, even if the error is increased a bit. Hopefully there shouldn't be large correlations in there?



Duplicates
- I can 100% guarantee that there are no duplicated accounts.

The profile list is generated backwards (last uploaded game first) and filtered by:
- The MMR of the account is valid
- The race of the player is known
- The player is not a random player
- The account is not already in the list

In fact there is a mistake that makes me exclude data unnecessarily:
I forgot that the id is only unique within a server, and I only check for id, not for server+id.
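
To make the server+id point concrete, here is a minimal Python sketch of such a filter. It is not the tool's actual code: the field names ('server', 'id', 'race', 'mmr'), the race codes, and the sample records are all hypothetical.

```python
def dedupe_profiles(profiles):
    """Keep the first occurrence of each account (input is newest game first).

    profiles: iterable of dicts with hypothetical keys
              'server', 'id', 'race', 'mmr' (mmr None means not valid).
    An account is identified by (server, id), not by id alone.
    """
    seen = set()
    kept = []
    for p in profiles:
        if p["mmr"] is None:                  # the MMR must be valid
            continue
        if p["race"] not in ("z", "p", "t"):  # unknown race or random player
            continue
        key = (p["server"], p["id"])          # id is only unique per server
        if key in seen:                       # account already in the list
            continue
        seen.add(key)
        kept.append(p)
    return kept

profiles = [
    {"server": "EU", "id": 101, "race": "z", "mmr": 1800},
    {"server": "US", "id": 101, "race": "t", "mmr": 1650},  # same id, other server
    {"server": "EU", "id": 101, "race": "z", "mmr": 1750},  # true duplicate
    {"server": "EU", "id": 202, "race": "r", "mmr": 1500},  # random player
]
kept = dedupe_profiles(profiles)
print([(p["server"], p["id"]) for p in kept])
```

Keying on id alone would have dropped the US account with id 101 as a false duplicate, which is the unnecessary exclusion described above.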

Other correlations:
The only thing I can think of is that the user's MMR and the opponent's MMR are analysed in totally different ways.
And the analyser for the opponent takes the result of the player into account.
I can mark which data values are user data and which are opponent data.
Also, all opponents of one player are obviously not far away from each other.
I can also mark which opponent values were submitted by the same user.

Besides this, the analysis and collection of the MMR is very complicated.
I cannot guarantee that I don't have structural mistakes somewhere that could create correlations.
But at the moment I don't see such a factor.

Data
I can add some useful information to the profile list and publish it again.
What I am thinking of:
- Time the game was played (sadly this is user time, not server time; I should fix this in the long term)
- An ID of the user that submitted the data
- An ID of the account that is shown
- A mark for whether the data comes from a user or an opponent
- Main race of the account + the race the account played in its last game
Anything else?
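
To make the proposed fields concrete, one entry of the extended profile list could look roughly like this. All names and types are hypothetical; the real file format may differ.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class ProfileRecord:
    played_at: datetime     # user's local clock, not server time
    submitter_id: int       # ID of the user that submitted the data
    account_id: int         # ID of the account that is shown
    from_user: bool         # True = submitting user's own data, False = opponent data
    main_race: str          # main race of the account
    last_game_race: str     # race played in the account's last game

    def is_off_race(self) -> bool:
        """True if the last game was not played with the main race."""
        return self.main_race != self.last_game_race
```

Having `submitter_id` and `from_user` in every row would let anyone group the opponent values by the user who submitted them when checking for correlations.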


High MMR cap:
I have some more arguments, but it's off-topic and I just woke up.
Let us leave this topic for now and perhaps come back to it later.

Also a special side note: the ladder isn't 20/20/20/20/18/2 anymore. There were some offset corrections and I don't know the new targeted distribution, but I would say conservatively it's closer to 20/20/20/20/16/4. I don't expect Blizzard to release the new target values.

Totally agree with this. The data slowly moves away from normal and they try to correct it with offsets. However, I have the feeling they decided not to do so anymore because they don't want to create demotion/promotion waves. On the other hand, they could do it at season start, and obviously did not at the start of season 8. For example, the platinum offsets are not equal to the silver ones, which should be the case if the data were normal. So they have already corrected towards 20/20... with these offsets.




Save gaming: kill esport
Cascade
Profile Blog Joined March 2006
Australia 5405 Posts
July 13 2012 07:23 GMT
#437
Sure, add all the data you can think of.

I think a more interesting analysis can be made from the list of games though. Although there we will REALLY have to think of the systematics, as each player submits many games, and what if a player that is really good at say PvZ submits 30 games? That is for another thread though.

Do you think it is a problem that the samples are weighted by activity? I.e., if (X level) terrans feel frustrated and play less, they will face your users less often, and be less represented in the statistics (at X level). What we measure is actually not MMR as a flat average over all players, but an average weighted by their current activity.
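
A toy illustration of the difference between the two averages. The `mmr` and `games` fields and the numbers are made up, and weighting by games played is only a rough stand-in for "chance of showing up as someone's opponent".

```python
def flat_mean(players):
    """Plain average: every account counts once."""
    return sum(p["mmr"] for p in players) / len(players)

def activity_weighted_mean(players):
    """Average where each account counts in proportion to its games played."""
    total_games = sum(p["games"] for p in players)
    return sum(p["mmr"] * p["games"] for p in players) / total_games

# Made-up example: the stronger player has almost stopped playing.
players = [
    {"mmr": 2000, "games": 2},
    {"mmr": 1500, "games": 40},
    {"mmr": 1400, "games": 38},
]
print(flat_mean(players), activity_weighted_mean(players))
```

The two numbers only agree when activity is unrelated to MMR, so a race whose good players play less would look weaker in the weighted average.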

Otherwise I'm not sure there is much more I have to say. Doing measurement of single leagues (intervals in MMR) doesn't really make sense, as it would only measure the difference in slope of the distribution for the different races. Also I won't have much access to internet over the weekend.

cheers
skeldark
Profile Joined April 2010
Germany 2223 Posts
Last Edited: 2012-07-13 07:34:49
July 13 2012 07:31 GMT
#438
On July 13 2012 16:23 Cascade wrote:
Sure, add all the data you can think off.

I think a more interesting analysis can be made from the list of games though. Although there we will REALLY have to think of the systematics, as each player submits many games, and what if a player that is really good at say PvZ submits 30 games? That is for another thread though.

Do you think it is a problem that the samples are weighted by activity? Ie, if (X level) terrans feel frustrated and play less, they will face your users less often, and be less represented in the statistics (at X level). What we measure is actually not only MMR as a flat average over all players, but an average weighted by their current activity.

cheers


That is true.
I already noticed, when I tried to collect division data, that I see the same divisions all the time, because the first players of a new season create them and these are the guys who play all the time.
The active userbase is way smaller than the total userbase, and the very small, very active userbase alone creates most of the games.
It could become a problem if you make the time interval shorter.
But I have a feeling this is again a question of how you define balance. If good players of one race stop playing, is this a balance indicator?



Otherwise I'm not sure there is much more I have to say. Doing measurement of single leagues (intervals in MMR) doesn't really make sense, as it would only measure the difference in slope of the distribution for the different races. Also I won't have much access to internet over the weekend.

But the difference in slope of the distribution for the different races in different MMR intervals is an interesting fact too.


The total gamedata is published in my MMR-Tool thread.
I will update it soon with the race data and the game length.

Save gaming: kill esport
Thrombozyt
Profile Blog Joined June 2010
Germany 1269 Posts
July 13 2012 08:31 GMT
#439
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

Continuing with lolcanoe's analysis, I found the 99% confidence intervals for the difference in mean for each group.

US and EU:
ZvT: (51.5, 118.8)
PvT: (28.9, 99.6)
ZvP: (-11.1, 53.2)


On July 13 2012 11:20 Jadoreoov wrote:
Done:

95% confidence intervals for the EU and US combined:
ZvT: (59.5, 110.7)
PvT: (37.3, 91.2)
ZvP: (-3.7, 45.5)

US vs EU: (28.5, 70.5)


Shouldn't the interval in which the mean can fall become larger as you lower your level of confidence?
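
A quick way to check this against the numbers quoted above, assuming both sets of intervals are plain normal-approximation intervals of the form centre ± z·SE (an assumption; the posts do not say how they were computed). The helper names are made up.

```python
from statistics import NormalDist

def implied_center_and_se(lo, hi, confidence):
    """Back out the centre and standard error from a symmetric normal interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return (lo + hi) / 2, (hi - lo) / (2 * z)

def normal_interval(center, se, confidence):
    """Two-sided normal interval: center +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * se, center + z * se

# ZvT, 99% interval as quoted: (51.5, 118.8)
center, se = implied_center_and_se(51.5, 118.8, 0.99)
lo95, hi95 = normal_interval(center, se, 0.95)
print(round(lo95, 1), round(hi95, 1))
```

Under that assumption the recomputed 95% interval lands within rounding of the quoted (59.5, 110.7). The z multiplier shrinks as the confidence level drops, so for the same data a 95% interval is narrower than a 99% one, which is the pattern in the two quoted lists.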
skeldark
Profile Joined April 2010
Germany 2223 Posts
July 13 2012 09:04 GMT
#440
UPDATE

Games & Player:
datafile
Save gaming: kill esport