Ladder-Balance-Data - Page 22

Forum Index > SC2 General
hunts
Profile Joined September 2010
United States2113 Posts
July 13 2012 01:28 GMT
#421
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

Continuing with lolcanoe's analysis, I found the 99% confidence intervals for the difference in mean for each group.

Race Results
For US:
ZvT
(62.0, 164.6)
PvT
(8.9, 115.0)
ZvP
(3.3, 99.4)

For EU:
ZvT
(19.6, 108.6)
PvT
(18.3, 113.2)
ZvP
(-45.3, 42.0)

US and EU:
ZvT
(51.5, 118.8)
PvT
(28.9, 99.6)
ZvP
(-11.1, 53.2)


As for US vs EU, the 99% confidence interval for the mean difference in MMR is:
(21.9, 77.1)

For each interval, a positive difference indicates that the mean of the first population is higher than that of the second. So for US vs EU it reads: 99% of such samplings will yield a result such that the mean MMR of the US player base is between 21.9 and 77.1 MMR higher than that of the EU player base.

The meaning of a 99% confidence interval for the mean is as follows:
If we were to randomly pick samples of the same size* from each population and found the difference of the means between the groups, 99% of such samplings would result in a difference of means within the given interval.

*By same size I mean the same sizes as were sampled to construct the interval, so if the interval were constructed by sampling 10 Zergs and 15 Protosses, it would be random samples of 10 and 15, respectively.

I've provided the MATLAB code I used for the analysis, in case anyone wants to run it on future data:

Helper Function
function [lower,upper] = findInterval(pop1,pop2,confidence)
%FINDINTERVAL Two-sample confidence interval for mean(pop1) - mean(pop2),
%   using Welch's t-interval (no equal-variance assumption).
mu1 = mean(pop1);
mu2 = mean(pop2);
s1 = std(pop1,1);
s2 = std(pop2,1);
n1 = length(pop1);
n2 = length(pop2);
diff = mu1-mu2;
%Welch-Satterthwaite approximation for the degrees of freedom
df = (s1^2/n1 + s2^2/n2)^2/((s1^2/n1)^2/(n1-1)+(s2^2/n2)^2/(n2-1));
tcrit = tinv(1-(1-confidence)/2,df);
s = sqrt(s1^2/n1 + s2^2/n2); %standard error of the difference
halfrange = tcrit*s;
lower = diff-halfrange;
upper = diff+halfrange;
end


Main script
%script for calculating balance

%get data from file (would be ez if OP hadn't put quotes in the .csv, BAD!)
fid = fopen('balance.csv');
str = char(fread(fid))';
fclose(fid);

omitFirstLine = '(?<=\n).*';
stripped = str( regexp(str,omitFirstLine):end ); %strip the header line
rawdata = textscan(stripped, '%s %s %d', 'delimiter',' \t\n,"',...
    'MultipleDelimsAsOne', 1);

%define some constants (not saying protoss #1)
protoss = 1;
zerg = 2;
terran = 3;
US = 1;
EU = 2;

%combine into one big array: [region, race, MMR]
col = length(rawdata{3});
data = zeros(col, 3);
data(:,3) = rawdata{3};
for i = 1:col
    if ( rawdata{1}{i}(1) == 'U')
        data(i,1) = US;
    else
        data(i,1) = EU;
    end

    %compare only the first character, as with the region above, so that
    %full race names ('zerg') classify correctly, not just single letters
    if ( rawdata{2}{i}(1) == 'z')
        data(i,2) = zerg;
    elseif ( rawdata{2}{i}(1) == 'p')
        data(i,2) = protoss;
    else
        data(i,2) = terran;
    end
end

%define filters
tF = data(:,2) == terran;
pF = data(:,2) == protoss;
zF = data(:,2) == zerg;
uF = data(:,1) == US;
eF = data(:,1) == EU;

%construct the 99% confidence intervals based on a two-sided t-test
confidence = 0.99;
place = eF | uF; %select US (uF), EU (eF), or both (uF | eF)
[zpLower,zpUpper] = findInterval( data(zF & place,3), data(pF & place,3), confidence );
[ztLower,ztUpper] = findInterval( data(zF & place,3), data(tF & place,3), confidence );
[tpLower,tpUpper] = findInterval( data(tF & place,3), data(pF & place,3), confidence );
[UsEuLower,UsEuUpper] = findInterval( data(uF,3), data(eF,3), confidence );
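The Welch-style interval computed above can be cross-checked outside MATLAB. Here is a minimal sketch in Python (stdlib only, not from the thread); it substitutes the normal quantile for MATLAB's `tinv`, which is a close approximation at the sample sizes discussed here:

```python
# Sketch of a two-sample confidence interval for mean(pop1) - mean(pop2),
# mirroring the MATLAB findInterval helper. Uses the z critical value
# instead of a t quantile; with thousands of samples the two are nearly
# identical. (Note: stdev is the sample sd, vs. MATLAB's std(pop,1).)
from statistics import NormalDist, mean, stdev
from math import sqrt

def find_interval(pop1, pop2, confidence=0.99):
    n1, n2 = len(pop1), len(pop2)
    diff = mean(pop1) - mean(pop2)
    # standard error of the difference of means
    se = sqrt(stdev(pop1) ** 2 / n1 + stdev(pop2) ** 2 / n2)
    zcrit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - zcrit * se, diff + zcrit * se
```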



Nice work, though it might be worth also computing 95% CIs for a somewhat tighter interval, I think. I'm too lazy to do it though :D
twitch.tv/huntstv 7x legend streamer
Jadoreoov
Profile Joined December 2009
United States76 Posts
July 13 2012 02:20 GMT
#422
Done:

95% confidence intervals for the EU and US combined:
ZvT:
(59.5, 110.7)
PvT
(37.3, 91.2)
ZvP
(-3.7, 45.5)

US vs EU
(28.5, 70.5)
lolcanoe
Profile Joined July 2010
United States57 Posts
Last Edited: 2012-07-13 02:46:42
July 13 2012 02:45 GMT
#423
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

No. No. No. No... More misinformation. Normal distributions are indeed pretty prevalent in the real world, and the central limit theorem is a good rule of thumb, but it's these sorts of assumptions that have also lost certain financial entities billions.

Take stock price returns: approximately normal, but with a fat left tail. If you used a normal distribution you would severely undervalue the possibility of total disaster and hence under-price risk. So returns are best modeled with a modified distribution that accounts for the extremes. Or take waiting times in a queue, where you have a very long right tail but a distinctly left-weighted distribution (think about it: you have a minimum of 0 but a max of infinity, with a peak much closer to the left than the right).

Most of all, we are dealing with an entirely man-made distribution here. If you counted by league only, you'd have 20/20/20/20/20, EVENLY distributed. For MMR, the way the curve is shaped is ENTIRELY determined by the matchmaking software. If Blizzard wanted to, they could create a distribution of any type. With our data we can only guess the distribution and approximate our statistics under reasonable normal guidelines (after establishing that normality is a possible model).

Hope this makes sense, and I really encourage you to keep this in mind, especially if you ever plan to work on Wall Street in your life time.
Excalibur_Z
Profile Joined October 2002
United States12237 Posts
Last Edited: 2012-07-13 02:58:35
July 13 2012 02:53 GMT
#424
Yes, the MMR cap exists. A floor likely also exists.

Don't get defensive when other community members demand more thorough data or a stronger analysis. Understanding the ladder is a communal effort. lolcanoe and Lysenko bring up salient points that should be addressed in order to produce more concrete hypotheses, even if this means refuting existing hypotheses.

We call the reverse-engineered values (points -> adj.pts -> adj.pts with offsets removed) "MMR" because that's the closest representation of MMR we have. We know that the "actual" hidden MMR factors in an uncertainty value when determining the degree of change after a match, but it's unlikely that will ever be deciphered.

The league and division offsets used by the MMR tool are not exact, but they're somewhat close. Still, this introduces a margin of error. This is probably mitigated by the volume of data, and even the relatively arbitrary values that are calculated can be used when compared to each other for the purposes of gauging race balance, because the margin of error applies universally to each race and matchup.

One thing I want to be very careful about is considering any part of this interpretation as "final" data. Every other person who has posted theories about how the ladder works in the past has fallen into the same trap of interpreting his data incorrectly until it fits his conclusions, so it's important we don't repeat that mistake. The data must remain impartial. The only additional information we have about the ladder comes from Josh himself.

Also a special side note: the ladder isn't 20/20/20/20/18/2 anymore. There were some offset corrections and I don't know the new targeted distribution, but I would say conservatively it's closer to 20/20/20/20/16/4. I don't expect Blizzard to release the new target values.
Moderator
Jadoreoov
Profile Joined December 2009
United States76 Posts
July 13 2012 03:15 GMT
#425
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

In probability theory, the central limit theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.


Student's t-test assumes that the distribution of the sampling mean is approximately normal, but makes no assumptions regarding the underlying distribution of the data itself.
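The point above can be illustrated with a quick simulation, a sketch in Python (not from the thread): draw repeated samples from a heavily skewed population and look at the distribution of their means.

```python
# The POPULATION here is heavily skewed (exponential), yet the
# distribution of its SAMPLE MEANS is tight and near-normal once n is
# large. Stdlib only; seed fixed for repeatability.
import random
from statistics import mean, stdev

random.seed(0)
n = 2000           # roughly the per-race sample sizes in this thread
num_means = 500    # number of repeated samplings

sample_means = [mean(random.expovariate(1.0) for _ in range(n))
                for _ in range(num_means)]

# For an exponential(1) population, mean = 1 and sd = 1, so the sample
# means should cluster around 1 with spread about 1/sqrt(n) ≈ 0.022.
print(mean(sample_means), stdev(sample_means))
```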
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:38 GMT
#426
Oh, it's nice that you guys are redoing what I did back at page 10, but now with more statistics.

- Yes, I think we have enough statistics, and the distribution is well behaved enough so that central limit theorem will give a sufficiently accurate estimate of the statistical error.

- However, it does assume that the samples are uncorrelated. OP, you said that you removed duplicates from the list, but do you think there can be other correlations in the list of samples? You probably know best exactly what is in the list. If there are still correlations, it means that the error should be larger than what you get from a central limit analysis. But it seems like the (small) signal will still be significant, even if the error is increased a bit. Hopefully there shouldn't be large correlations in there?
lolcanoe
Profile Joined July 2010
United States57 Posts
July 13 2012 03:40 GMT
#427
On July 13 2012 12:15 Jadoreoov wrote:
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

You should scroll down the page you quoted.

"In a specific type of t-test, these conditions are consequences of the population being studied, and of the way in which the data are sampled. For example, in the t-test comparing the means of two independent samples, the following assumptions should be met:
Each of the two populations being compared should follow a normal distribution. This can be tested using a normality test, such as the Shapiro-Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot.
If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using F test, Levene's test, Bartlett's test, or the Brown–Forsythe test; or assessable graphically using a Q-Q plot). If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances.[7] Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.
The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results."

(http://en.wikipedia.org/wiki/Student's_t-test#Assumptions) Keep in mind we are using a two-sample t-test here... you did scroll down right?
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:45 GMT
#428
On July 11 2012 16:27 Not_That wrote:
Show nested quote +
On July 11 2012 16:15 Cascade wrote:
On July 11 2012 16:05 Not_That wrote:
On July 11 2012 15:35 Cascade wrote:
On July 11 2012 15:13 Not_That wrote:
On July 11 2012 14:53 Cascade wrote:
On July 11 2012 14:39 Not_That wrote:
MMR distribution by races.
[image loading]

Amount of players:
2014 Zerg
1784 Protoss
1516 Terran

The server does matter as MMR is non comparable cross servers. I've decided to remove KR and SEA and keep EU and NA as they are closest to each other in terms of MMRs, and that's where most of our data comes from.

Cool! Can you do 100 or even 200 granularity to make it easier to read? :o)
We are not trying to see any structure smaller than 200 MMR anyway.



Here you go:
[image loading]

We tried having % of total players on the y axis. The problem with that is that it doesn't have information regarding the amount of players. The dots at the edges of the graph look very strange, for example 100% of players above 3200 are Protoss. Obviously it's not very useful. We could snip the edges of the graph, but where? How many players are enough? Are 21 players between 2700 and 2750 enough? etc.

Thanks!

I mean % of the zerg players in that bin. That is, (number of zergs in that bin)/(number of zergs total). Just like you have plotted now, only divide all zerg entries by the total number of zerg players, etc. Now the zerg plot is higher in mid-range, but it is not clear if that is because a larger fraction of zergs have mid-range MMR, or if there are just more zergs.


Good thinking.

Same graph normalized, each bar representing the percentage of players of each race in the bin:
[image loading]

Nice!

Now just put the error bars back on that plot, and it's perfect! *leaving*


How do I figure out error margins for a graph with granularity?
Fixed colors btw.

Sorry, missed this post...
The error is sqrt(N) in each bin, before normalisation. Then when you rescale, just scale the error by the same factor. Equivalently, the relative error in each bin is 1/sqrt(N). N is the number of entries in that bin, btw.

That way, when you group up bins, you can expect the error to go down by a factor of 2 if you go from 50 to 200 granularity.

When N gets too low (rule of thumb: it is ok down to N = 20), this error estimate starts becoming a bit shaky, but for a plot like this, it is good enough. Below N = 20, we won't be able to see much anyway I think, so the bin will just say that there is not enough statistics.
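The recipe above can be sketched in Python (synthetic stand-in data, not the thread's): count entries per bin, take sqrt(N) as the absolute error, then scale both by the race total.

```python
# Sketch of the sqrt(N) error recipe for a normalised histogram: height
# is the fraction of this race in the bin, and the error is scaled by
# the same factor, so the relative error per bin is 1/sqrt(N).
import random
from math import sqrt

random.seed(1)
mmr = [random.gauss(1800, 400) for _ in range(2000)]  # stand-in for one race

granularity = 200
bins = {}
for x in mmr:
    b = int(x // granularity) * granularity
    bins[b] = bins.get(b, 0) + 1

total = len(mmr)
normalised = {b: (n / total,          # bar height: fraction of this race
                  sqrt(n) / total)    # error, rescaled by the same factor
              for b, n in bins.items()}

# Merging four 50-MMR bins into one 200-MMR bin quadruples N per bin,
# which halves the relative error 1/sqrt(N).
```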
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 03:59 GMT
#429
On July 13 2012 12:40 lolcanoe wrote:
On July 13 2012 12:15 Jadoreoov wrote:
@lolcanoe

The issue wasn't whether the distribution itself was close to normal at all. It can be the most skewed thing in the world. The issue is that the sample size is very large, so the distribution of the SAMPLING MEAN is approximately normal.

You should scroll down the page you quoted.

"In a specific type of t-test, these conditions are consequences of the population being studied, and of the way in which the data are sampled. For example, in the t-test comparing the means of two independent samples, the following assumptions should be met:
Each of the two populations being compared should follow a normal distribution. This can be tested using a normality test, such as the Shapiro-Wilk or Kolmogorov–Smirnov test, or it can be assessed graphically using a normal quantile plot.
If using Student's original definition of the t-test, the two populations being compared should have the same variance (testable using F test, Levene's test, Bartlett's test, or the Brown–Forsythe test; or assessable graphically using a Q-Q plot). If the sample sizes in the two groups being compared are equal, Student's original t-test is highly robust to the presence of unequal variances.[7] Welch's t-test is insensitive to equality of the variances regardless of whether the sample sizes are similar.
The data used to carry out the test should be sampled independently from the two populations being compared. This is in general not testable from the data, but if the data are known to be dependently sampled (i.e. if they were sampled in clusters), then the classical t-tests discussed here may give misleading results."

(http://en.wikipedia.org/wiki/Student's_t-test#Assumptions) Keep in mind we are using a two-sample t-test here... you did scroll down right?

No need for that tone imo. We are all working together here as far as I know.

Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.
VediVeci
Profile Joined October 2011
United States82 Posts
Last Edited: 2012-07-13 04:05:29
July 13 2012 04:03 GMT
#430
On July 13 2012 08:21 lolcanoe wrote:
On July 13 2012 08:13 VediVeci wrote:
Requiring someone to have a college education is a bit of an ivory tower buddy.

I'm not requiring anyone to have anything. My criticisms are objectively based on the analysis and not the source.

There is no ivory tower here. I've proven that my methods can be applied in a statistically coherent and easily understandable way, so your accusations that my suggestions are impractical (or "ivory tower") are pretty moot.



I'm not arguing that your methods aren't better; they probably are (I didn't read your post very closely). Your attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.

And I'm glad you have such good insight into how the financial crisis happened and can tell us about it. Now that you're on the case we can rest assured it won't happen again!!

And skeldark, when I say you "manipulated" the data, I don't mean you did anything negative, I just mean you performed a series of calculations or "manipulations" on the data.

Edit: clarity
DwindleFlip
Profile Joined April 2011
United States32 Posts
Last Edited: 2012-07-13 04:27:50
July 13 2012 04:23 GMT
#431
All this talk just to deny the simple truth that terran is in rough shape. Sc2 WOL is abandonware to Blizzard now.




User was temp banned for this post.
lolcanoe
Profile Joined July 2010
United States57 Posts
Last Edited: 2012-07-13 04:56:10
July 13 2012 04:40 GMT
#432
On July 13 2012 13:03 VediVeci wrote:

I'm not arguing that your methods aren't better; they probably are (I didn't read your post very closely). Your attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.
Edit: clarity

He had at least a 50% chance of getting it right. I'm going to ignore the rest of the post so as to not encourage further irrelevance from posters who self-admittedly don't read things carefully.


On July 13 2012 12:59 Cascade wrote:
Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.

Ok, let's separate the statements clearly so I can explain why your explanation is inaccurate and why his is pretty much entirely misplaced. I understand the confusion here, because my high school math teacher needed to be corrected on the same misunderstanding.

Imagine a population with a distribution that is skewed in one way or another (not normally distributed). If you take a sample and increase the sample size n in an orderly fashion, what happens? Eventually your sample size is the entire population, and your sample distribution and population distribution are unsurprisingly identical! So in this one-sample situation, the shape of the distribution is dependent on the population being sampled. If the population is normal, and only if it is, the sampling distribution will become increasingly normal as n grows. This idea is pretty intuitive once you imagine a sample size equal to that of your population (that's exactly what's going on here). This is why a normality test is important!

The central limit theorem specifically relates to the distribution of sampling means and infinite random samples (which isn't exactly what we have here). The distribution of sampling means does NOT equal the sample distributions themselves, as you have incorrectly equated! It refers to the distribution of the AVERAGE values in each sample, and this distribution becomes increasingly normal not as the number of samples increases, but rather as n, the sample size, increases. In this regard it makes complete sense (with a formal mathematical proof) why the distribution of sampling means tends toward normal irrespective of the population distribution!
Please look into http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/stat_workshp/cnt_lim_therm/cnt_lim_therm_02.html
to understand why neither of your posts is accurate and how a completely non-normal distribution can have normally distributed sample means as n increases.

Hopefully, you'll begin to understand how you guys are misapplying the CLT!
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 04:42 GMT
#433
On July 13 2012 13:23 DwindleFlip wrote:
All this talk just to deny the simple truth that terran is in rough shape. Sc2 WOL is abandonware to Blizzard now.




User was temp banned for this post.

ahaha, ok guys, we are busted. We can stop all this statistics BS now. You know, the one we make up out of thin air as we type, completely baseless. We got called on the bluff, nothing more to say. Was fun while it lasted. No point in trying to pretend that analyzing data is of any use when we have people like DwindleFlip laying down the simple truth like a B40UwwwwzzZZZzz!!!11oneone
SeAK
Profile Joined September 2010
Canada69 Posts
July 13 2012 05:30 GMT
#434
It's always easier to rip something apart than it is to build something... kinda like what I just did
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 05:34 GMT
#435
On July 13 2012 13:40 lolcanoe wrote:
On July 13 2012 13:03 VediVeci wrote:

Im not arguing that your methods aren't better, they probably are, (I didn't read your post very closely). You're attacks have been pretty consistently derisive, rude, and especially condescending though, in my opinion. And I know it's not a smoking gun, but his results seem pretty consistent with yours, so he didn't do too poorly.
Edit: clarity

He had at least a 50% chance of getting it right. I'm going to ignore the rest of the post so as to not encourage further irrelevance from posters who self-admittedly don't read things carefully.


On July 13 2012 12:59 Cascade wrote:
Yes, for these probability calculations to be mathematically accurate, you need normal distributions. But according to central limit theorem, the more you sample any distribution, the more it will look like a normal distribution. The better behaved (ie, normal distribution-like) the distribution is, the faster the convergence. So while these errors are not 100% mathematically accurate, with a distribution that is well behaved like this (no strong tails), and with sample sizes of thousands, they are close enough.

Ok, let's separate the statements clearly so I can explain why your explanation is inaccurate and why his is pretty much entirely misplaced. I understand the confusion here, because my high school math teacher needed to be corrected on the same misunderstanding.

Imagine a population with a distribution that is skewed in one way or another (not normally distributed). If you take a sample and increase the sample size n in an orderly fashion, what happens? Eventually your sample size is the entire population, and your sample distribution and population distribution are unsurprisingly identical! So in this one-sample situation, the shape of the distribution is dependent on the population being sampled. If the population is normal, and only if it is, the sampling distribution will become increasingly normal as n grows. This idea is pretty intuitive once you imagine a sample size equal to that of your population (that's exactly what's going on here). This is why a normality test is important!

The central limit theorem specifically relates to the distribution of sampling means and infinite random samples (which isn't exactly what we have here). The distribution of sampling means does NOT equal the sample distributions themselves! It refers to the distribution of the AVERAGE values in each sample, and this distribution becomes increasingly normal not as the number of samples increases, but rather as n, the sample size, increases. In this regard it makes complete sense (with a formal mathematical proof) why the distribution of sampling means tends toward normal irrespective of the population distribution!
Please look into http://www.wadsworth.com/psychology_d/templates/student_resources/workshops/stat_workshp/cnt_lim_therm/cnt_lim_therm_02.html
to understand why neither of your posts is accurate and how a completely non-normal distribution can have normally distributed sample means as n increases.

You guys are misapplying the CLT!

Ok, let me prove it for you then.
My claim is that if the set of samples is large enough, we can use the normal distribution with S/sqrt(N) width to estimate the errors. For simplicity, let me prove that the 2*S/sqrt(N) interval is close to 95%:

Let the distribution f(x) have an average 0 and standard deviation S. An average X from a sufficiently large (specified in the proof) set of N samples from f(x) will fall within 2*S/sqrt(N) of the average 0 with a probability between 0.93 and 0.97.
proof:
Calculating the average x from N samples (from many different sets, each of N samples) will give a distribution of averages A_N(x) that approaches a normal distribution as N goes to infinity, centred around 0, and with a width of S/sqrt(N). This is the CLT.

Specify "sufficiently large N" such that A_N(x) is similar to a normal distribution g(x) of width S/sqrt(N). Close enough so that the integral from -2*S/sqrt(N) to 2*S/sqrt(N) is between 0.93 and 0.97 (it is close to 0.95 for g). Since A_N approaches g as N --> infinity, this will happen for some N. The more similar f(x) is to a normal distribution, the lower the required N.

Now take a single average X from f(x), using N samples (this would be the OP). This average is distributed according to A_N(x), and with a sufficiently large N, the probability that X is between -2*S/sqrt(N) and 2*S/sqrt(N) is larger than 0.93, and smaller than 0.97. QED.

Then at what N it reaches "sufficiently large" is a trickier matter. But I am personally convinced (from experience) that with the well behaved distribution of MMR we see, and with thousands of samples, the errors are accurate enough so that the conclusion stands. Ie, that there is a significant signal that the terran MMR is lower than the zerg MMR. Due to the finite (aawwwww ) sample size there is little point in claiming confidence levels of exactly 0.99957353526452, but if this method gives a confidence level of 99.9% I think it is safe to say that you are more than 99% sure. This would also include other errors, such as correlations in the sample (as I was nagging about earlier).
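The claim in the proof can be checked numerically. A sketch in Python (not the thread's MMR data) with a deliberately skewed population:

```python
# Monte Carlo check: for an exponential(1) population (mean 1, sd 1,
# strongly right-skewed), the fraction of sample means falling within
# 2*S/sqrt(N) of the true mean should sit near 0.95, inside the 0.93-0.97
# window from the proof above. Stdlib only; seed fixed for repeatability.
import random
from math import sqrt
from statistics import mean

random.seed(2)
N = 1500          # samples per average, similar order to the thread's data
trials = 2000     # number of independent averages

true_mean, true_sd = 1.0, 1.0          # exponential(1)
half = 2 * true_sd / sqrt(N)           # the 2*S/sqrt(N) half-width

hits = sum(
    abs(mean(random.expovariate(1.0) for _ in range(N)) - true_mean) <= half
    for _ in range(trials))
coverage = hits / trials
print(coverage)
```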
skeldark
Profile Joined April 2010
Germany2223 Posts
Last Edited: 2012-07-13 07:25:42
July 13 2012 06:58 GMT
#436
Discussion:
I think it's time to forget the past and start fresh. Most of us did not behave as we should have (me included). Once we all agree on the main points, we can set the personal stuff aside.


On July 13 2012 12:38 Cascade wrote:
- However, it does assume that the samples are uncorrelated. OP, you said that you removed duplicates from the list, but do you think there can be other correlations in the list of samples? You probably know best exactly what is in the list. If there are still correlations, it means that the error should be larger than what you get from a central limit analysis. But it seems like the (small) signal will still be significant, even if the error is increased a bit. Hopefully there shouldn't be large correlations in there?



Duplicates
- I can 100% guarantee that there are no duplicated accounts.

The profile list is generated backwards (last uploaded game first) and filtered by:
- The MMR of the account is valid
- The race of the player is known
- The player is not a random player
- The account is not already in the list

In fact there is a mistake where I exclude data unnecessarily: I forgot that the ID is only unique per server, and I only check for ID, not for server+ID.

Other correlations:
The only thing I can think of is that the user's MMR and the opponent's MMR are analysed in totally different ways, and the analyser for the opponent takes the result of the player into account.
I can mark which data value is user data and which is opponent data.
Also, all opponents of one player are obviously not far away from each other in MMR.
I can also mark which opponent values were submitted by the same user.

Beside this, the analysis and collection of the MMR is very complicated.
I cannot guarantee that I don't have structural mistakes somewhere that could create correlations, but at the moment I don't see such a factor.

Data
I can add some useful information to the profile list and publish it again.
What I'm thinking of:
- time the game was played (this is sadly user time, not server time; I should fix this in the long term)
- an id of the user that submitted the data
- an id of the account that is shown
- a mark whether the data comes from a user or from an opponent
- main race of the account + the race the account played in its last game
Anything else?


High MMR cap:
I have some more arguments, but it's off-topic and I just woke up.
Let us leave this topic for now and perhaps come back to it later.

Also a special side note: the ladder isn't 20/20/20/20/18/2 anymore. There were some offset corrections and I don't know the new targeted distribution, but I would say conservatively it's closer to 20/20/20/20/16/4. I don't expect Blizzard to release the new target values.

Totally agree with this. The data drifts away from normal slowly, and they try to correct for it with offsets. However, I have the feeling they decided not to do so anymore because they don't want to create demotion/promotion waves. On the other hand, they could do it at season start and obviously did not with the start of season 8. For example, the Platinum offsets are not equal to the Silver ones, which should be the case if the data were normal. So they have already corrected towards 20/20/... with these offsets.




Save gaming: kill esport
Cascade
Profile Blog Joined March 2006
Australia5405 Posts
July 13 2012 07:23 GMT
#437
Sure, add all the data you can think of.

I think a more interesting analysis can be made from the list of games though. Although there we will REALLY have to think of the systematics, as each player submits many games, and what if a player that is really good at say PvZ submits 30 games? That is for another thread though.

Do you think it is a problem that the samples are weighted by activity? I.e., if Terrans at level X feel frustrated and play less, they will face your users less often and be less represented in the statistics (at level X). What we measure is actually not MMR as a flat average over all players, but an average weighted by their current activity.
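The distinction can be made concrete with a toy sketch (the `(mmr, games_played)` pairs are hypothetical): sampling players through the games they appear in yields a games-weighted mean, not the flat per-player mean.

```python
def activity_weighted_mean(players):
    """Compare the games-weighted mean MMR (what sampling via games
    measures) with the flat per-player mean.
    `players` is a list of (mmr, games_played) pairs."""
    total_games = sum(g for _, g in players)
    weighted = sum(mmr * g for mmr, g in players) / total_games
    flat = sum(mmr for mmr, _ in players) / len(players)
    return weighted, flat
```

If frustrated players at one MMR level play less, their weight in `weighted` shrinks even though they still exist in the population, so the two averages diverge.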

Otherwise I'm not sure there is much more I have to say. Doing measurement of single leagues (intervals in MMR) doesn't really make sense, as it would only measure the difference in slope of the distribution for the different races. Also I won't have much access to internet over the weekend.

cheers
skeldark
Profile Joined April 2010
Germany2223 Posts
Last Edited: 2012-07-13 07:34:49
July 13 2012 07:31 GMT
#438
On July 13 2012 16:23 Cascade wrote:
Sure, add all the data you can think of.

I think a more interesting analysis can be made from the list of games though. Although there we will REALLY have to think of the systematics, as each player submits many games, and what if a player that is really good at say PvZ submits 30 games? That is for another thread though.

Do you think it is a problem that the samples are weighted by activity? I.e., if Terrans at level X feel frustrated and play less, they will face your users less often and be less represented in the statistics (at level X). What we measure is actually not MMR as a flat average over all players, but an average weighted by their current activity.

cheers


That is true.
I already noticed, when I tried to collect division data, that I see the same divisions all the time, because the first players of the new season create them, and these are the guys who play all the time.
The active userbase is way smaller than the total userbase, and the very small, very active userbase alone creates most of the games.
It could become a problem if you make the time interval shorter.
But I have a feeling this is again a question of the definition of balance: if good players of one race stop playing, is that a balance indicator?



Otherwise I'm not sure there is much more I have to say. Doing measurement of single leagues (intervals in MMR) doesn't really make sense, as it would only measure the difference in slope of the distribution for the different races. Also I won't have much access to internet over the weekend.

But the difference in slope of the distribution for the different races in different MMR intervals is an interesting fact too.


The total game data is published in my MMR-Tool thread.
I will update it soon with the race data and the game length.

Save gaming: kill esport
Thrombozyt
Profile Blog Joined June 2010
Germany1269 Posts
July 13 2012 08:31 GMT
#439
On July 13 2012 10:23 Jadoreoov wrote:
First off I'd like to point out that the normality of the data doesn't really matter because of the Central Limit Theorem, so please stop discussing that like it matters.

Continuing with lolcanoe's analysis, I found the 99% confidence intervals for the difference in mean for each group.

US and EU:
ZvT
(51.5, 118.8)
PvT
(28.9, 99.6)
ZvP
(-11.1, 53.2)


On July 13 2012 11:20 Jadoreoov wrote:
Done:

95% confidence intervals for the EU and US combined:
ZvT:
(59.5, 110.7)
PvT
(37.3, 91.2)
ZvP
(-3.7, 45.5)

US vs EU
(28.5, 70.5)


Shouldn't the interval in which the mean can fall become larger as you lower your level of confidence?
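It's the other way around: lowering the confidence level shrinks the critical z value and hence narrows the interval, which matches the 95% intervals above being tighter than the 99% ones. A stdlib-only sketch under the normal approximation (the inverse CDF is found by bisection on `math.erfc`):

```python
import math

def ci_halfwidth(se, confidence):
    """Half-width z * se of a normal-approximation confidence interval.
    Lower confidence -> smaller z -> narrower interval."""
    target = 1.0 - confidence  # total two-sided tail mass
    lo, hi = 0.0, 10.0
    for _ in range(100):  # bisect for z with P(|Z| > z) == target
        mid = (lo + hi) / 2
        if math.erfc(mid / math.sqrt(2)) > target:
            lo = mid  # tail mass still too large: z must grow
        else:
            hi = mid
    return lo * se
```

For the same standard error, this gives roughly 1.96·se at 95% versus 2.58·se at 99%, so the 95% intervals are the narrower ones.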
skeldark
Profile Joined April 2010
Germany2223 Posts
July 13 2012 09:04 GMT
#440
UPDATE

Games & Player:
datafile
Save gaming: kill esport