Statisticians of TL! Some advice

Duka08
Profile Blog Joined July 2010
3391 Posts
July 18 2011 17:49 GMT
#1
From occasionally reading the college/class-related threads in General, I've noticed a good number of high-caliber, mathematically inclined posters on TL! So I've come to prompt some discussion that may point my research group in a direction we've been unable to find for a few weeks.

I'm currently doing undergraduate research (Physics) at my university and we (myself, my partner, and my advisor) have had a dilemma over the past few weeks on how to quantify something statistically. To avoid subtleties and unnecessarily dense description, I won't talk about the research in detail; instead I'll use a fun (and hopefully understandable) example that is essentially a counterpart to our data.

It's going to be a rough example, but bear with me. Let's imagine the traffic of Teamliquid. Each time someone visits TL, we'll tag that exact moment in time as "a visitor", and this is how we'll track these events. Over the course of each day or week (long time spans) we'd expect some pretty regular patterns (background) that depend mostly on the time of day, since a large majority of the same people check in on a daily or weekly basis. Ignoring smaller subtle fluctuations (noise), there would presumably be some general trend on a daily/weekly/monthly scale that we could see and account for as background.

Now, larger events such as showmatches or the final rounds of some tournaments (a much shorter time scale than the background) could cause an increase in the flux of visitors, who post in LR threads, view the stream(s), and whatnot. Assuming we subtract the daily background, these events would show up in a graph of visitors over time, and we could attribute this "burst" of visitors to the larger event in question.

With these "bursts" in mind (with a source we can associate with good certainty) we come to my actual dilemma. Let's say there is a showmatch between Idra and Tyler. This would generate the burst of visitors previously discussed, in both posters and viewers. In Game 3, Idra 6 pools, some TL notables tweet about it as it happens live, and there is a spike of people who read the tweets and immediately go to TL and tune in. So not only is there a general increase in visitor flux due to the showmatch as a whole, but there is also a momentary SPIKE in arrival times (remember, we're tagging these events exactly at the time people join, not counting "how many people are currently viewing") that is presumably associated with a specific event (in this case, the 6pool/tweet).

We want to quantify this bunching. The data in question are simply large numbers of arrival times, and we can histogram them to see general trends/flux over long times, or zoom in and look at each time individually to see the small-scale structure. After subtracting background, let's say we see exactly 100 events semi-randomly distributed in some time interval. These events are above background and can be associated with a larger-scale event (in the example, the showmatch as a whole). Now, these 100 time-tagged events appear randomly distributed, but upon closer inspection 5 of them come EXTREMELY close together (the tweet-induced visitors). Visually, they are clearly bunched together and hopefully associated with some event, and the goal is to statistically quantify "how bunched they are" in comparison to the overall randomness of the 100 background-subtracted events. Basically, there's a bunch of stuff that should be random or looks random, but there's a clump that is unusually close, and we want to be able to somehow say "in this bunch of random stuff, these are so unusual that they aren't just random".

We've tried most of the basic, "common/acceptable" tests, such as chi-square and something we had high hopes for, the Kolmogorov-Smirnov (K-S) test. Most tests we've tried don't capture what we're looking for, in that they don't properly "observe" the closeness in a way that works with small amounts of data (the numbers in the last paragraph are fabricated but basically on the same scale). The basic measure that we're using "in-house" for our own measurements is the ratio [events seen in a specific time window / events expected in the same window], where the time window is chosen to be close to the scale of the "bunching" and the expected count is simply [total events seen times the fraction of the total time that our chosen window covers]. The larger this ratio is, the more significant the clumping in a chosen time window somewhere in the data. However, this ratio is, to our knowledge, simply arbitrary. We need a way to quantify this that makes sense statistically and that others in the field will accept as significant.
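The in-house ratio described above can be sketched in a few lines. This is only an illustration under assumptions that are not in the post: the function name `window_ratio`, the choice to start a candidate window at each event, and the toy arrival times are all hypothetical.

```python
from bisect import bisect_left, bisect_right

def window_ratio(times, t_start, t_end, window):
    """Return (best_ratio, window_start) for the densest window that starts at an event.

    ratio = events seen in the window / events expected in the window,
    where expected = total events * (window length / total time).
    """
    times = sorted(times)
    expected = len(times) * window / (t_end - t_start)
    best_ratio, best_start = 0.0, None
    for t in times:
        # Count events falling in the closed interval [t, t + window].
        seen = bisect_right(times, t + window) - bisect_left(times, t)
        ratio = seen / expected
        if ratio > best_ratio:
            best_ratio, best_start = ratio, t
    return best_ratio, best_start

# Hypothetical toy data: sparse arrivals plus a tight clump near t = 50.
toy = [0, 10, 20, 30, 40, 50, 50.1, 50.2, 50.3, 60, 70, 80, 90]
print(window_ratio(toy, t_start=0, t_end=100, window=1.0))
```

Note that the sketch picks the densest window after scanning the whole record, so the ratio alone says nothing about how often pure chance would produce a window that dense somewhere in the data.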


Hopefully this doesn't fall into the category of homework help since... it's not?!? We're not after a single answer, just as many possible ways of approaching the problem as we can find. We're looking for options that we haven't tried or don't know well enough.

MisterD
Profile Blog Joined June 2010
Germany 1338 Posts
July 18 2011 18:14 GMT
#2
So you are basically looking for a high-frequency spike in low-frequency background noise? I don't know for sure, but the Fourier transform came to mind, though I'm not sure if that's applicable here. Maybe you can steal some techniques from signal analysis; they should have ways to deal with this stuff in continuous spaces, which should possibly be transferable to your discrete space in some way.
Gold isn't everything in life... you need wood, too!
McFortran
Profile Joined October 2010
United States 79 Posts
July 18 2011 18:22 GMT
#3
I'm not a statistician, but what you're describing is a stochastic process. Perhaps you should look up methods for modeling time series.
ComaDose
Profile Blog Joined December 2009
Canada 10357 Posts
July 18 2011 19:11 GMT
#4
I'm sad that this is all the advice you can get,
but alas I cannot help you either.
I'd be interested to know what the real-world parallel to this example is.
BW pros training sc2 is like kiss making a dub step album.
n.DieJokes
Profile Blog Joined November 2008
United States 3443 Posts
July 18 2011 19:42 GMT
#5
Who's the grad statistician at Duke, OneOther or Empyrean? Whichever one it is, it probably wouldn't hurt to ask him.
MyLove + Your Love= Supa Love
Primadog
Profile Blog Joined April 2010
United States 4411 Posts
Last Edited: 2011-07-19 00:32:38
July 18 2011 22:23 GMT
#6
I take pride in my blog "PrimeCoverage" for its analytical bent. See if you can find anything useful there~~

I study SC2 tournaments' data extensively because I intend to bring sports statistics into StarCraft.

Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.

EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I have about this is: how do you propose collecting this "visitor" and traffic data? By simply scraping the Active and Logged In numbers at the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from an excess of data and an insufficiency of data. It's a brand new industry, so very little statistical analysis has ever been done on it, even in BW. Unlike in mature sports like baseball, advanced research like sabermetrics simply does not exist, and no effort has ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there - tournament results, matchups, stream numbers - but no one is collecting it.

For example, I was once curious about the NASL viewership, but that data is hard to come by because Justin.tv has horrid (non-existent) site analytics, so I ended up using the views and posts of the daily live report threads as a rough proxy for "viewership interest". It's imprecise and I don't think the dataset I have properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done without building those datasets up yourself.

Regarding the non-StarCraft-related issue, I think you will care less about the dataset itself and more about its first and perhaps second derivatives.
Thank God and gunrun.
theonemephisto
Profile Blog Joined May 2008
United States 409 Posts
July 19 2011 01:42 GMT
#7
In reply to Primadog (#6):

He doesn't actually want to do research on TL views, it was just an analogy for his physics research.
Milkis
Profile Blog Joined January 2010
5003 Posts
July 19 2011 01:54 GMT
#8
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves on an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of the "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it simpler, like this: after detrending, 100 people arrive every hour. Out of these, 5 are really close to each other, i.e. they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended data to a Poisson distribution, and I think you may be able to show that there is something that causes the mean of the Poisson distribution to change at those spikes.
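A minimal sketch of that fixed-window Poisson check, assuming the detrended arrivals have a constant rate and that the window was fixed before looking at the data. The function name `poisson_tail` and the example numbers (100 events per hour, a 3.6-second window, 5 events seen) are illustrative only and are not taken from the real data.

```python
import math

def poisson_tail(k, mu, terms=200):
    """Approximate P(N >= k) for N ~ Poisson(mu), with mu > 0.

    Sums the upper tail directly in log space, which stays accurate for
    very small probabilities. Assumes mu is well below `terms`.
    """
    total = 0.0
    for i in range(k, k + terms):
        total += math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
    return total

# Hypothetical numbers: 100 events per hour, window of 3.6 seconds.
rate_per_second = 100 / 3600
mu = rate_per_second * 3.6          # expected events in the window
print(poisson_tail(5, mu))          # chance of 5 or more in that one window
```

The result is the probability for one pre-chosen window only. It does not account for having searched the whole record for the tightest clump.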

Not sure if this is what you're looking for, since there are a lot of ways of studying this (more complicated ways include fitting some time-dependent data), but I think this may do the trick? It depends on how sophisticated you want to be with it.
theonemephisto
Profile Blog Joined May 2008
United States 409 Posts
July 19 2011 03:31 GMT
#9
In reply to Milkis (#8):

I was thinking about telling him to compare it to a Poisson process, but the problem if you do that is that you have to manually choose the endpoints of where the event's influence lies. If there are sensible time endpoints to use, that's great, but if it's an event that has an indeterminate influence on the future then you want something more sophisticated, I think.
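One way to avoid choosing endpoints by hand is a Monte Carlo scan: measure the tightest clump of k events in the data, then see how often uniformly random arrival times produce a clump at least that tight. The sketch below is a swapped-in illustration, not something proposed in the thread. The names `min_span` and `scan_p_value`, the uniform null model, and the toy data are all assumptions.

```python
import random

def min_span(times, k):
    """Smallest time span that contains k consecutive events."""
    ts = sorted(times)
    return min(ts[i + k - 1] - ts[i] for i in range(len(ts) - k + 1))

def scan_p_value(times, t_start, t_end, k, n_sim=1000, seed=0):
    """Estimate how often uniform random arrivals clump as tightly as the data.

    Null model (an assumption): the same number of events, placed
    independently and uniformly on [t_start, t_end].
    """
    rng = random.Random(seed)
    observed = min_span(times, k)
    n = len(times)
    hits = 0
    for _ in range(n_sim):
        fake = [rng.uniform(t_start, t_end) for _ in range(n)]
        if min_span(fake, k) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sim + 1)

# Hypothetical toy data: 95 evenly spread events plus 5 within 4 ms.
toy = [i * 36 for i in range(95)] + [1810.0 + j * 0.001 for j in range(5)]
print(scan_p_value(toy, 0, 3600, k=5))
```

Because every simulated record is searched the same way as the real one, the search over window positions is built into the p-value. The clump size k still has to be chosen, ideally before looking at the data.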

Not knowing very much about time dependent data, I can't give any more than that.
Duka08
Profile Blog Joined July 2010
3391 Posts
Last Edited: 2011-07-19 03:48:20
July 19 2011 03:41 GMT
#10
In reply to Milkis (#8):

What you're describing is essentially it, yes. There is a "larger" background-subtracted / detrended event (your example's 100 people), and within this broader event there is a short burst of even higher frequency data (times) that we want to say is somehow quantifiably correlated in excess of the rest of the (essentially random) background-subtracted data.

We've been working with Poisson statistics from numerous approaches, especially over the past few days. Any quantitative assessment we arrive at with the Poissonian methods and distributions seems too situationally specific and gives even TOO low of a probability. Sure, lower is better, that's what we'd like! In Physics you have to account for everything and second guess even yourself... we'd like a lower probability to be able to say that the bunching is more significant, but the numbers we were arriving at were TOO absurdly low to even find reasonable haha. Perhaps we should be integrating over some Poissonian distribution, or choosing a more general, flexible set of parameters instead of getting so specific. As I said, we've done some work with Poisson stats and still are, but we're also looking for more options.
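One possible contributor to implausibly small Poisson probabilities, and this is an assumption since the thread does not show the actual calculation, is that the window is chosen after seeing where the clump is. A common rough correction is a trials factor: if the record could have held `n_trials` roughly independent windows of that size, the chance that at least one of them looks this extreme is 1 - (1 - p)^n. The function name and the example numbers below are hypothetical.

```python
import math

def global_p_value(p_local, n_trials):
    """Chance that at least one of n_trials independent windows is this extreme.

    Computes 1 - (1 - p_local)**n_trials using log1p/expm1 so that very
    small p_local values do not get rounded away. Requires p_local < 1.
    """
    return -math.expm1(n_trials * math.log1p(-p_local))

# Hypothetical numbers: a one-window probability of 1e-8 and a record
# long enough to hold 1000 non-overlapping windows of that size.
print(global_p_value(1e-8, 1000))
```

Overlapping windows are not independent, so this is only a rough guide. Simulating the full search on random data is a more direct way to get the same kind of correction.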

At this point I may just delve into the actuality of what we're doing / working with, but I was trying to avoid that as it's active research blah blah mumbo jumbo. If anyone (Milkis?) is genuinely interested in the specifics and thinks it'll help with ideas, feel free to PM.