What you don't know about statistics

Cyberonic
Profile Blog Joined September 2011
Germany80 Posts
Last Edited: 2012-05-04 07:33:28
May 04 2012 00:00 GMT
#1
This post is originally written by Drabzalver on Reddit. Since he does not have a TL account I asked and was allowed to repost if I include this:

Drabzalver on Reddit:
TL is the [swear word] of praetentious [swear word] where moderators reward supposed 'high quality posts' which are full of statistical and scientific garbage I just outlined just because they are 'praesented nicely' and the mods basically think that any long post with a lot of images and no swear words is intellectually advanced while often a lot of it is total garbage filled with wrong interpretations and grave statistical errors.
Also, I don't have a TL account, you're welcome to repost, but do include this qualifier, should be fun,

Clarification: This is not my opinion of this community and I only posted the quote because I was asked to and respect his author rights. The content is very interesting.


EDIT: A lot of people claim the OP/author just wants to bash the TL community. That is wrong! The text is directed towards the reddit community and was originally called "What reddit doesn't know about statistics"; I deliberately altered the title... I see this might have led to some confusion.
-----------------------------------------------------------------------------------------------------------------------------------

What you don't know about statistics
by Drabzalver

Not a day goes by without some stat or graph being posted here or elsewhere in the community. 99% of the time this deals with winrates and balance, and 99% of the people commenting on it have no clue how to interpret these numbers or how little they actually mean. And no, this has nothing to do with sample size. Statistics and probability theory are something you study for years, a specialized field that exists precisely to keep people from making the errors people make here. So stay a while and listen:

- Fallacy: winrates indicate balance

Yes and no. For the most part, they do not indicate balance; rather, they indicate balance shifts. It's so trivial to see that it boggles my mind that people don't figure this out on their own. Assume that race X was underpowered last month and is balanced this month. This means that the X players who qualified last month and stayed in tournaments are better than the Y and Z players. Now that the game is balanced, they start smashing Y and Z because they are better overall, which makes the graph look as if X has been 'overbuffed', while in fact the few X players still around in the scene were simply better. This will continue until the mediocre Y and Z players, who were carried by their race more than the X players were, get weeded out.

This extends even further. Most tournaments have qualifiers, so say X is underpowered: the X players who get into the tournament are simply better, because they got in despite the imbalance. As they are better, they will continue to win despite the imbalance stacked against them, thereby skewing the results closer to 50-50 than they really are.

In fact, there are even more things wrong with this idea. Because tournaments generally feature some form of elimination, better players stay in the tournament longer and therefore contribute more to the number of games played. Since these are the players that overcome imbalance, again, it skews closer to 50-50 than it really is.

So yes, no matter how you look at it, winrates will skew closer to 50-50 than the true balance, and monthly spikes and variation indicate balance shifts more than absolute balance. The only way to find out absolute balance is to get a random pool of pros, force them to play random, and hold a round robin tournament to ensure that everyone plays the same number of games. Getting enough games that way for a reasonable sample size is very unfeasible.
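To make the qualifier effect concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: skill is drawn from a normal distribution, race X carries a fixed handicap, the "qualifier" simply keeps the top 5% by effective strength, and win chances follow a logistic (Elo-style) curve. None of these numbers come from real tournament data.

```python
import math
import random

def win_prob(strength_a, strength_b):
    """Logistic (Elo-style) chance that A beats B; an assumed toy model."""
    return 1.0 / (1.0 + math.exp(-(strength_a - strength_b)))

def mean_x_winrate(x_pool, y_pool, rng, n_pairs=50_000):
    """Average chance that a random X player beats a random Y player."""
    total = 0.0
    for _ in range(n_pairs):
        total += win_prob(rng.choice(x_pool), rng.choice(y_pool))
    return total / n_pairs

def simulate(handicap=0.5, n_per_race=20_000, top_fraction=0.05, seed=1):
    rng = random.Random(seed)
    # Effective strength = raw skill minus a racial handicap (X is underpowered).
    x_all = [rng.gauss(0.0, 1.0) - handicap for _ in range(n_per_race)]
    y_all = [rng.gauss(0.0, 1.0) for _ in range(n_per_race)]
    # "Qualifier": only the top slice by effective strength gets in.
    ranked = sorted(x_all + y_all, reverse=True)
    cutoff = ranked[int(2 * n_per_race * top_fraction)]
    x_qual = [s for s in x_all if s >= cutoff]
    y_qual = [s for s in y_all if s >= cutoff]
    return (mean_x_winrate(x_all, y_all, rng),
            mean_x_winrate(x_qual, y_qual, rng))

everyone, qualified = simulate()
print(f"X winrate, whole ladder:   {everyone:.3f}")
print(f"X winrate, qualified only: {qualified:.3f}")
```

In this toy model the X players who survive the cut need more raw skill to get there, so the winrate measured among qualified players sits much closer to 50% than the winrate across the whole player pool, even though the handicap never changed.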

- Fallacy: sample size is big enough

The TLPD winrate graphs are praetentious and amateuristic, sorry to say it but that's how it is. The error bars there are pure bollocks: they are calculated using the rules of independent probability experiments, that is to say, it is assumed that the result of every series has no effect on the others, as if you were flipping a coin. If they were independent, the sample size would be enough by a large margin to say something, but they are not independent, because you're dealing with players, not just games. Good players simply ruin the idea of independent experiments. The fact that Stephano won 4-0 this time and 4-0 the last time is not independent; the two results have a common cause: Stephano is fucking good. More than that, as indicated before, most tournaments feature a form of elimination, which means that because Stephano is fucking good he simply contributes more to the sample than players who are not as good. One has to realize that in some single elimination tournaments, it's possible for the champion to have played 40% of all the games in the tournament...
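The champion's share is easy to check for an idealized bracket. The sketch below counts series rather than individual games and assumes a plain single-elimination bracket whose size is a power of two, with no byes and no third-place match; the function name is made up for this example.

```python
def champion_series_share(n_players):
    """Share of all series in a single-elimination bracket that involve the champion.

    Assumes n_players is a power of two, no byes, no third-place match.
    """
    rounds = n_players.bit_length() - 1   # e.g. 8 players -> 3 rounds
    total_series = n_players - 1          # every series eliminates exactly one player
    return rounds / total_series

for size in (4, 8, 16, 32):
    share = champion_series_share(size)
    print(f"{size:>2} players: champion is in {share:.0%} of all series")
```

For an 8-player bracket the champion appears in 3 of the 7 series, roughly 43%. The share of individual games will differ with series length and with how close each series is, so treat this as a rough order of magnitude only.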

As an extreme example to show that the idea is fundamentally flawed: say you have 2 and only 2 players, the best 2 in the world, let's say MKP vs DRG. They play a thousand games, and it ends 720-280 in MKP's favour. Can you then say 'Clearly, since this is taken from the absolute top, TvZ is highly imbalanced, as a thousand games is a very large sample size'? No, you cannot. While this is an extreme example, it shows the general idea and the fallacy in it: the games are not independent probability experiments.

Especially in Korea, it is very likely that the TLPD graphs we're all so fond of do not indicate balance overall; they just indicate whichever race has a couple of good players this month that dominated everything. The Korean graph shifts around every month, while the number of games should be large enough to stop that from happening if they were independent experiments. But they aren't; they are quite dependent, and how much the KR graph flips around each month demonstrates how unreliable it is.

Simply put, the number of games is large enough to be significant, but the number of players is way too small.
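A rough sketch of why coin-flip error bars fall apart when the games come from a handful of players. The setup is assumed, not taken from TLPD: each simulated month has 10 players of one race, each with their own personal win chance, playing 100 games apiece. The naive standard error treats all 1000 games as independent coin flips.

```python
import math
import random
import statistics

def simulate_month(rng, n_players=10, games_each=100, skill_sd=1.0):
    """Overall winrate of one race in a month where a few players supply all games."""
    wins = 0
    for _ in range(n_players):
        # Each player has a personal win chance; this is the common cause
        # that links all of that player's results together.
        p = 1.0 / (1.0 + math.exp(-rng.gauss(0.0, skill_sd)))
        wins += sum(rng.random() < p for _ in range(games_each))
    return wins / (n_players * games_each)

rng = random.Random(7)
monthly = [simulate_month(rng) for _ in range(500)]

observed_sd = statistics.stdev(monthly)
naive_se = math.sqrt(0.5 * 0.5 / 1000)   # error bar if all 1000 games were coin flips

print(f"naive standard error:       {naive_se:.3f}")
print(f"actual month-to-month s.d.: {observed_sd:.3f}")
```

Under these assumptions the month-to-month swing is several times larger than the naive error bar, purely because the effective sample is 10 players rather than 1000 games.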

- Fallacy: advantages at certain times of matchups expressed in graphs

There are also a lot of graphs posted which supposedly indicate that some races have an advantage at certain times in a matchup. Oh boy do people misread what these graphs mean. Take this bad boy.

A naïve way to read it would simply be 'Hmm, Z has an early game advantage in ZvP, then it becomes about even, then P has a slight advantage up to the end, then Z again.' Wrong. Look closely at what the graph actually says. It says this:
IF a ZvP game ends in the 0-5 minute range, the chance is 60% it ends in Z's favour.
Now the 'if' is so bloody important here, the game needn't end. Now, everyone of course realizes that that part is caused by early pools. Does Zerg really have a large advantage at that point? Can Zerg force a win at that point if they want to, are early pools overpowered? No, not at all, so what is going on?

Imagine a ZvP, Z decides to 7pool, P doesn't scout soon enough, lings get in, kill every probe, traalalala, P GG's. Game over in the first 5 minutes in Z's favour.
Okay, now imagine a ZvP where Z decides to 7pool, P scouts in time and gets his wall up. Damn, Z's like 'fuck man, shouldn't have done that', but he doesn't necessarily GG unless he's IdrA. The game goes on, but Z plays at such a disadvantage that in the next 5-10 minutes P will surely claim his victory unless P messes up.

See the fallacy? That Z has that 'early game advantage' doesn't mean that Z is more powerful or that 7pools are too powerful, it just means that IF the game immediately ends due to a 7pool it will most likely be in Z's favour. If the 7pool fails, the game doesn't end at that point, Z will most likely stay in the game and play from a significant disadvantage to lose later.

It is a grave statistical error of the magnitude of interpreting 'If an 8-year-old child dies, the most likely cause is a car accident' as 'It is very likely that 8-year-old children die in car accidents'.

The graph doesn't even say how likely it is that the game ends in a given interval. For all you know, it's far more likely for P to win in the late game than in the mid game, even though the graph indicates that if the game ends in the late game, the chance is higher that Z takes it. And even so, that still says nothing about advantages of races at certain times. One would assume that if a race is likely to win at time X, that race enjoys an advantage slightly before that time, no?

What would be far more interesting, though also not conclusive, would be a graph which outlines 'How large was the percentage of Z wins in ZvP at each interval', which is fundamentally different from 'at each interval, if the game ends, how often does it end in Z's favour in ZvP'. My bet is that because 7pools are actually quite rare, it would not at all show the huge spike for Z in the early game.
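The difference between the two graphs is just the difference between a conditional and a joint probability. Here is a small sketch with invented numbers (the counts below are hypothetical, chosen only so that the 0-5 minute bucket shows the same 60% as in the example above), reading the second graph as 'what share of all ZvP games are Z wins that end in this interval'.

```python
# Hypothetical ZvP results by game length: (Z wins, P wins). Invented numbers.
results = {
    "0-5 min":   (6, 4),
    "5-10 min":  (60, 90),
    "10-15 min": (150, 170),
    "15-20 min": (140, 160),
    "20+ min":   (120, 100),
}
total_games = sum(z + p for z, p in results.values())

def conditional_z_winrate(bucket):
    """P(Z wins | game ends in this bucket): what the posted graph shows."""
    z, p = results[bucket]
    return z / (z + p)

def z_wins_share_of_all_games(bucket):
    """P(Z wins AND game ends in this bucket): share of all ZvP games."""
    z, _ = results[bucket]
    return z / total_games

def chance_game_ends_in(bucket):
    """P(game ends in this bucket), which the posted graph does not show."""
    z, p = results[bucket]
    return (z + p) / total_games

for bucket in results:
    print(f"{bucket:>9}: Z wins {conditional_z_winrate(bucket):.0%} of games ending here, "
          f"which is {z_wins_share_of_all_games(bucket):.1%} of all games "
          f"({chance_game_ends_in(bucket):.0%} of games end here)")
```

With these made-up counts, Z wins 60% of the games that end in the first five minutes, but those wins are only 0.6% of all games played, so the early 'spike' all but disappears in the second view.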
halfies
Profile Joined November 2011
United Kingdom327 Posts
May 04 2012 00:09 GMT
#2
praetentious?
really?
i hope that's how he spelt it too, because it would be really funny if he wrote this much about people being pretentious and used big words that he couldn't even spell.
Lorizean
Profile Blog Joined March 2011
Germany1330 Posts
May 04 2012 00:12 GMT
#3
None of this is new or even in-depth. He starts off by saying that probability and stats are a field that people study and then goes on to bring arguments anybody could make. Not really proving a point there.
Not that I'm saying he's necessarily wrong, but having such an arrogant tone and then bringing something this trivial to the table is a bit... pretentious.
imMUTAble787
Profile Joined November 2011
United States680 Posts
May 04 2012 00:13 GMT
#4
reddit is a cesspool of memes and retardation

that being said, i love the job the mods do on these forums.

as far as stats etc go i really dont pay any mind to them. the only ones i think have any validity would be the GSL player stats because they are tournament level and reliable.
*eternalenvy fanboy*
Ripper41
Profile Joined July 2011
284 Posts
Last Edited: 2012-05-04 00:16:54
May 04 2012 00:16 GMT
#5
On May 04 2012 09:09 halfies wrote:
praetentious?
really?
i hope that's how he spelt it too, because it would be really funny if he wrote this much about people being pretentious and used big words that he couldn't even spell.

Well he explains...
it's worse, he spelled it that way intentionally. still, he makes some good points in the substance of what he says.
Chocolate
Profile Blog Joined December 2010
United States2350 Posts
May 04 2012 00:17 GMT
#6
So basically the person who came up with this thinks that he is smarter than 99% of everyone else and then proceeds to list "fallacies" which tend to skew game results. Then he explains that ZvP ends early more often in favor of the zerg because of early pools.

I don't really see the value in this. It only shows what we already knew, that statistics aren't always true due to a couple factors. He also calls us pretentious but spells it oddly :/. The author himself strikes me as very pretentious.
Integra
Profile Blog Joined January 2008
Sweden5626 Posts
Last Edited: 2012-05-04 00:19:21
May 04 2012 00:18 GMT
#7
Yes, statistics taken out of context and without understanding of what they are supposed to prove are misleading. Good for reddit that they finally understand this

EDIT. @Chocolate: there is no value in the post, the poster just wanted to bash on TL.Net community and used statistics as an excuse.
"Dark Pleasure" | | I survived the Locust war of May 3, 2014
dmasterding
Profile Blog Joined January 2011
United States205 Posts
May 04 2012 00:19 GMT
#8
On May 04 2012 09:13 imMUTAble787 wrote:
reddit is a cesspool of memes and retardation

that being said, i love the job the mods do on these forums.

as far as stats etc go i really dont pay any mind to them. the only ones i think have any validity would be the GSL player stats because they are tournament level and reliable.


Please stop bashing the r/sc forums if you don't actually know what goes on over there. That was a lot more prevalent before, but even if you look at the front page right now you can see that there's a lot of legitimate StarCraft-related content and not just random shitty memes.
No tears now, only dreams.
Trumpstyle
Profile Joined May 2011
Sweden114 Posts
May 04 2012 00:23 GMT
#9
Sorry, his theory is total garbage, as he thinks the skill gap between the races is massive (it's minimal) and screws with the statistics.
dmasterding
Profile Blog Joined January 2011
United States205 Posts
May 04 2012 00:26 GMT
#10
Did any of you guys actually read the thing? He didn't actually give any opinions about the matchups, he was just trying to get rid of some misconceptions people had about interpretation of results. I am pretty sure that if the OP never mentioned this person was from r/SC you guys wouldn't be so biased against the author.
No tears now, only dreams.
i)awn
Profile Joined October 2011
United States189 Posts
Last Edited: 2012-05-04 00:28:04
May 04 2012 00:26 GMT
#11
While tournaments are indeed not reliable to calculate balance statistics because of the small pool and elimination process as Drabzalver said; tournaments and statistics still contain quite useful information. Many people might not interpret or analyze the numbers correctly but that doesn't mean that the numbers are useless.
Cyberonic
Profile Blog Joined September 2011
Germany80 Posts
May 04 2012 00:33 GMT
#12
On May 04 2012 09:26 dmasterding wrote:
Did any of you guys actually read the thing? He didn't actually give any opinions about the matchups, he was just trying to get rid of some misconceptions people had about interpretation of results. I am pretty sure that if the OP never mentioned this person was from r/SC you guys wouldn't be so biased against the author.


I think so, too... There's no reason for bashing around. I wrote why I included his text but I'd rather have a discussion on the topic and not its preface.
Bidj
Profile Joined September 2010
France98 Posts
May 04 2012 00:37 GMT
#13
At least a very interesting read.
Rooooaaaar
Jinsho
Profile Joined March 2011
United Kingdom3101 Posts
May 04 2012 00:37 GMT
#14
IdrA has always been referring to points 1 and 2 when asked about winrates.
windsupernova
Profile Joined October 2010
Mexico5280 Posts
May 04 2012 00:40 GMT
#15
As much as I agree with him on some points, I don't like how he comes off as pretty arrogant and doesn't even present any credentials for why he understands statistics better than 99% of people. I mean, for all we know he could be some arrogant college kid who just passed his first statistics class.

Arguing from authority only works if you prove you have some kind of authority. But his arguments are nice and he does seem to have some understanding of statistics. Then again, he doesn't say how we should go about interpreting those statistics, nor does he provide proof.

That being said I do think most of the people take a really simplistic approach to statistics, but well statistics are a hard subject to tackle
"Its easy, just trust your CPU".-Boxer on being good at games
LaM
Profile Blog Joined September 2011
United States1321 Posts
May 04 2012 00:42 GMT
#16
On May 04 2012 09:13 imMUTAble787 wrote:
reddit is a cesspool of memes and retardation

that being said, i love the job the mods do on these forums.

as far as stats etc go i really dont pay any mind to them. the only ones i think have any validity would be the GSL player stats because they are tournament level and reliable.


Maybe you should read the OP instead of just the qualifier.
Anything is Possible
LaM
Profile Blog Joined September 2011
United States1321 Posts
May 04 2012 00:43 GMT
#17
On May 04 2012 09:23 Trumpstyle wrote:
Sorry, his theory is total garbage, as he thinks the skill gap between the races is massive (it's minimal) and screws with the statistics.


He makes literally 0 claims about the skill gap between races. Reread what he said.
Anything is Possible
JitnikoVi
Profile Joined May 2010
Russian Federation396 Posts
May 04 2012 00:45 GMT
#18
i dont understand the point of this... most of this is common sense yet it states that 99% of people dont know this?

really?
In theory yes, but theoretically, no.
Mephiztopheles1
Profile Blog Joined December 2010
1124 Posts
May 04 2012 00:49 GMT
#19
On May 04 2012 09:26 dmasterding wrote:
Did any of you guys actually read the thing? He didn't actually give any opinions about the matchups, he was just trying to get rid of some misconceptions people had about interpretation of results. I am pretty sure that if the OP never mentioned this person was from r/SC you guys wouldn't be so biased against the author.

I read it. He is extremely aggressive so he begets extremely aggressive answers.

Regardless of his tone and of any tl vs reddit thing brewing here, his points should accompany threads (better phrased, of course) like the TLPD win rates thread in my opinion so that users with no background in statistics (for whatever reason) can have a better grasp of what the data presented means and/or doesn't.
xrapture
Profile Blog Joined December 2011
United States1644 Posts
May 04 2012 00:55 GMT
#20
If that is a top of the line post from reddit then I'm glad I stick to TL lol.

He just said random stuff in an aggressive way (stuff everyone knows anyway) and called TLPD statistics pretentious? hypocrite much?
Everyone is either delusional, a nihlilst, or dead from suicide.
Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2025 TLnet. All Rights Reserved.