• Log InLog In
  • Register
Liquid`
Team Liquid Liquipedia
EDT 06:49
CET 11:49
KST 19:49
  • Home
  • Forum
  • Calendar
  • Streams
  • Liquipedia
  • Features
  • Store
  • EPT
  • TL+
  • StarCraft 2
  • Brood War
  • Smash
  • Heroes
  • Counter-Strike
  • Overwatch
  • Liquibet
  • Fantasy StarCraft
  • TLPD
  • StarCraft 2
  • Brood War
  • Blogs
Forum Sidebar
Events/Features
News
Featured News
[ASL21] Ro24 Preview Pt1: New Chaos0Team Liquid Map Contest #22 - Presented by Monster Energy7ByuL: The Forgotten Master of ZvT30Behind the Blue - Team Liquid History Book19Clem wins HomeStory Cup 289
Community News
Weekly Cups (March 16-22): herO doubles, Cure surprises3Blizzard Classic Cup @ BlizzCon 2026 - $100k prize pool42Weekly Cups (March 9-15): herO, Clem, ByuN win42026 KungFu Cup Announcement6BGE Stara Zagora 2026 cancelled12
StarCraft 2
General
Explore the Palmistry Certificate Course at Bivs Weekly Cups (March 16-22): herO doubles, Cure surprises Weekly Cups (August 25-31): Clem's Last Straw? Team Liquid Map Contest #22 - Presented by Monster Energy What mix of new & old maps do you want in the next ladder pool? (SC2)
Tourneys
Sparkling Tuna Cup - Weekly Open Tournament World University TeamLeague (500$+) | Signups Open RSL Season 4 announced for March-April WardiTV Team League Season 10 KSL Week 87
Strategy
Custom Maps
Publishing has been re-enabled! [Feb 24th 2026]
External Content
The PondCast: SC2 News & Results Mutation # 518 Radiation Zone Mutation # 517 Distant Threat Mutation # 516 Specter of Death
Brood War
General
ASL21 General Discussion Soulkey's decision to leave C9 BGH Auto Balance -> http://bghmmr.eu/ JaeDong's form before ASL [ASL21] Ro24 Preview Pt1: New Chaos
Tourneys
[ASL21] Ro24 Group B [ASL21] Ro24 Group A ASL Season 21 LIVESTREAM with English Commentary [Megathread] Daily Proleagues
Strategy
Fighting Spirit mining rates Simple Questions, Simple Answers Soma's 9 hatch build from ASL Game 2
Other Games
General Games
General RTS Discussion Thread Stormgate/Frost Giant Megathread Nintendo Switch Thread Path of Exile Dawn of War IV
Dota 2
Official 'what is Dota anymore' discussion The Story of Wings Gaming
League of Legends
G2 just beat GenG in First stand
Heroes of the Storm
Simple Questions, Simple Answers Heroes of the Storm 2.0
Hearthstone
Deck construction bug Heroes of StarCraft mini-set
TL Mafia
TL Mafia Community Thread Five o'clock TL Mafia Mafia Game Mode Feedback/Ideas Vanilla Mini Mafia
Community
General
US Politics Mega-thread European Politico-economics QA Mega-thread Things Aren’t Peaceful in Palestine YouTube Thread Canadian Politics Mega-thread
Fan Clubs
The IdrA Fan Club
Media & Entertainment
[Req][Books] Good Fantasy/SciFi books Movie Discussion! [Manga] One Piece
Sports
2024 - 2026 Football Thread Cricket [SPORT] Formula 1 Discussion Tokyo Olympics 2021 Thread General nutrition recommendations
World Cup 2022
Tech Support
Laptop capable of using Photoshop Lightroom?
TL Community
The Automated Ban List
Blogs
Funny Nicknames
LUCKY_NOOB
Money Laundering In Video Ga…
TrAiDoS
Iranian anarchists: organize…
XenOsky
FS++
Kraekkling
Shocked by a laser…
Spydermine0240
Unintentional protectionism…
Uldridge
ASL S21 English Commentary…
namkraft
Customize Sidebar...

Website Feedback

Closed Threads



Active: 2188 users

Statisticians of TL! Some advice

Blogs > Duka08
Post a Reply
Duka08
Profile Blog Joined July 2010
3391 Posts
July 18 2011 17:49 GMT
#1
Through reading some of the college/class related threads in General occasionally I've noticed quite a good number of high-caliber mathematically-inclined posters on TL! So I've come to prompt some discussion that may lead my research group in a direction that we've been unable to find for a few weeks.

I'm currently doing undergraduate research (Physics) at my university and we (myself, my partner, and my advisor) have had a dilemma over the past few weeks on how to quantify something statistically. Rather than talk about the research in detail, in order to avoid subtleties and unnecessarily dense description, I'll use a fun (and hopefully understandable) example that is essentially counterpart to our data.

It's going to be a rough example, but bear with me. Let's imagine the traffic of Teamliquid. Each time someone visits TL, we'll tag that exact moment in time as "a visitor", and this is how we'll track these events. Over the course of each day or week (long time spans) we'd expect some pretty regular patterns (background) depending on the time of day for the most part, where a large majority of the same people check on a daily or weekly basis. Ignoring smaller subtle fluctuations (noise), there would be presumably some general trend on a daily/weekly/monthly scale we could see and account for as background.

Now, larger events such as showmatches or the final rounds of some tournaments (a much shorter time scale than the background) could cause an increase in flux of visitors, posting in LR threads and viewing the stream(s) and what not. Assuming we subtract the daily background, these events would show up in a graph of visitors over time, and we could attribute this "burst" of visitors to the larger event in question.

With these "bursts" in mind (with a source we can associate with good certainty) we come to my actual dilemma. Let's say there is a showmatch between Idra and Tyler. This would generate a burst of visitors previously discussed, in both posters and viewers. In Game 3, Idra 6 pools, and some TL notables tweet about it as it happens live and there is a spike of people that read it and immediately go to TL and tune in. So not only is there a general increase in visitor flux due to the showmatch as a whole, but this momentary SPIKE in arrival times (remember, we're tagging these events exactly as the time people join, not "how many people currently viewing") that are presumably associated with a specific event (in this case, the 6pool/tweet).

We want to quantify this bunching. The events in question are simply large amounts of arrival times, and we can histogram them to see general trends/flux over long times, or zoom in and look at each time individually to see the small scale structure. After subtracting background, let's say we see exactly 100 events semi-randomly distributed in some time interval. These events are above background and can be associated with a larger-scale event (in the example, the showmatch as a whole). Now, these 100 time-tagged events appear randomly distributed, but upon closer inspection 5 of them come EXTREMELY close together (the tweet-induced visitors). Visually, they are clearly bunched together and hopefully associated with some event, and the goal is to statistically quantify "how bunched they are" in comparison to the overall randomness of the 100 background-subtracted events. Basically, there's a bunch of stuff that should be random or looks random, but there's a clump that is unusually close, and we want to be able to somehow say "in this bunch of random stuff, these are so unusual that they aren't just random".

We've tried most of the basic, "common/acceptable" tests, such as chi-square and something we had high hopes for called a K-S Test. Most tests we've tried either don't capture what we're looking for, in that they don't properly "observe" the closeness in a way that works with small amounts of data (the numbers in the last paragraph are fabricated but basically on the same scale). The basic ratio that we're using "in-house" for our own measurements is a ratio of [events seen in a specific time window / events expected in the same window] where the time window is chosen to be close to the scale of the "bunching" and the expected rate is simply related to the [total events seen times the percentage of the total time our chosen window is]. The larger this ratio is the more significant the clumping in a chosen time window somewhere in the data. However this ratio is, to our knowledge, simply arbitrary. We need a way to actually quantify this in a way that makes sense statistically that others in the field will accept as significant.


Hopefully this doesn't fall into the category of homework help since... it's not?!? No answers to be found here really, just as many possible ways of approaching the problem. We're looking for options that we haven't tried or don't know well enough.

MisterD
Profile Blog Joined June 2010
Germany1338 Posts
July 18 2011 18:14 GMT
#2
so you are basically looking for a high frequency spike in a low frequency background noise? I don't know, but fourier transform came to my mind, but i'm not sure if thats applicable here. But maybe you can steal some techniques from signal analysis, they should have ways to deal with this stuff in continuous spaces, which should be transferable to your discrete space in some way possibly maybe.
Gold isn't everything in life... you need wood, too!
McFortran
Profile Joined October 2010
United States79 Posts
July 18 2011 18:22 GMT
#3
I'm not a statistician, but what you're describing is a stochastic process. Perhaps you should look up methods for modeling time series.
ComaDose
Profile Blog Joined December 2009
Canada10357 Posts
July 18 2011 19:11 GMT
#4
Im sad that this is all the advice you can get
but alas i cannot help you either.
I'd be interested to know what parallel comes with this example.
BW pros training sc2 is like kiss making a dub step album.
n.DieJokes
Profile Blog Joined November 2008
United States3443 Posts
July 18 2011 19:42 GMT
#5
Who's the grad. statistician at Duke, OneOther or Empyrean? Which ever one it is, it probably wouldn't hurt to ask him
MyLove + Your Love= Supa Love
Primadog
Profile Blog Joined April 2010
United States4411 Posts
Last Edited: 2011-07-19 00:32:38
July 18 2011 22:23 GMT
#6
I take pride in my blog "PrimeCoverage" for its analytical bend. See if you can find anything useful there~~

I study sc2 tournaments' data extensively because I intend bring to sports statistics into starcraft


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I will have towards this is, how do you propose collecting this "visitor" and traffic data? Simply attempt to scrap the Active and Logged In numbers on the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from excess amount of data and insufficient amount of data. It's a brand new industry, so very little statistical analysis was ever done on it, even in BW. Unlike mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort have ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there - tournament results, matchups, stream numbers - but no one is collecting them.

For example, I once was curious about the NASL viewership, but those data is hard to come by because Justin.tv has horrid (non-existent) site analytic, and I end up using the daily livereport threads views and posts as a rough proxy of "viewership interest". It's imprecise and I don't think the dataset I have properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done, without building those datasets up yourself.




Regarding the non-StarCraft related issue, I think you will care less about the dataset itself but more about its first and perhaps second-derivatives.
Thank God and gunrun.
theonemephisto
Profile Blog Joined May 2008
United States409 Posts
July 19 2011 01:42 GMT
#7
On July 19 2011 07:23 Primadog wrote:
I take pride in my blog "PrimeCoverage" for its analytical bend. See if you can find anything useful there~~

Show nested quote +
I study sc2 tournaments' data extensively because I intend bring to sports statistics into starcraft


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I will have towards this is, how do you propose collecting this "visitor" and traffic data? Simply attempt to scrap the Active and Logged In numbers on the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from excess amount of data and insufficient amount of data. It's a brand new industry, so very little statistical analysis was ever done on it, even in BW. Unlike mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort have ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there - tournament results, matchups, stream numbers - but no one is collecting them.

For example, I once was curious about the NASL viewership, but those data is hard to come by because Justin.tv has horrid (non-existent) site analytic, and I end up using the daily livereport threads views and posts as a rough proxy of "viewership interest". It's imprecise and I don't think the dataset I have properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done, without building those datasets up yourself.




Regarding the non-StarCraft related issue, I think you will care less about the dataset itself but more about its first and perhaps second-derivatives.

He doesn't actually want to do research on TL views, it was just an analogy for his physics research.
Milkis
Profile Blog Joined January 2010
5003 Posts
July 19 2011 01:54 GMT
#8
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves in an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a poisson distribution and i think you may be able to show that there is something that causes the mean of the poisson distribution to change at those spikes.

Not sure if this is what you're looking for since there are a lot of ways of studying this (more complicated ways including fitting in some time dependent data) but I think this may do the trick? not sure depending on how sophisicated you want to be with it
theonemephisto
Profile Blog Joined May 2008
United States409 Posts
July 19 2011 03:31 GMT
#9
On July 19 2011 10:54 Milkis wrote:
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves in an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a poisson distribution and i think you may be able to show that there is something that causes the mean of the poisson distribution to change at those spikes.

Not sure if this is what you're looking for since there are a lot of ways of studying this (more complicated ways including fitting in some time dependent data) but I think this may do the trick? not sure depending on how sophisicated you want to be with it

I was thinking about tell him to compare it to a poisson process, but the problem if you do that is that you have to manually choose the endpoints on where the event's influence lies. If there are sensical time endpoints to use that's great, but if it's an event that has a indeterminate influence on the future then you want something more sophisticated I think.

Not knowing very much about time dependent data, I can't give any more than that.
Duka08
Profile Blog Joined July 2010
3391 Posts
Last Edited: 2011-07-19 03:48:20
July 19 2011 03:41 GMT
#10
On July 19 2011 10:54 Milkis wrote:
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves in an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a poisson distribution and i think you may be able to show that there is something that causes the mean of the poisson distribution to change at those spikes.

Not sure if this is what you're looking for since there are a lot of ways of studying this (more complicated ways including fitting in some time dependent data) but I think this may do the trick? not sure depending on how sophisicated you want to be with it

What you're describing is essentially it yes. There is a "larger" background-subtracted / detrended event (your example's 100 people), and within this broader event there is a short burst of even higher frequency data (times) that we want to say is somehow quantifiably correlated in excess to the rest of the (essentially random) background-subtracted data.

We've been working with Poisson statistics from numerous approaches, especially over the past few days. Any quantitative assessment we arrive at with the Poissonian methods and distributions seem too situationally specific and even TOO low of a probability. Sure lower is better, that's what we'd like! In Physics you have to account for everything and second guess even yourself... we'd like a lower probability to be able to say that the bunching is more significant, but the numbers we were arriving at were TOO absurdly low to even find reasonable haha. Perhaps integrating over some Poissonian distribution or choosing a more general, flexible set of parameters instead of getting so specific. As I said, we've done some with Poisson stats and still are, but we're also looking for more options.

At this point I may just delve into the actuality of what we're doing / working with, but I was trying to avoid that as it's active research blah blah mumbo jumbo. If anyone (Milkis?) is genuinely interested in the specifics if they think it'll help with ideas feel free to PM.
Please log in or register to reply.
Live Events Refresh
Afreeca Starleague
10:00
Ro24 Group B
Soulkey vs Ample
JyJ vs sSak
Afreeca ASL 8783
StarCastTV_EN222
Liquipedia
Sparkling Tuna Cup
10:00
Weekly #124
Percival vs YoungYakovLIVE!
CranKy Ducklings75
LiquipediaDiscussion
[ Submit Event ]
Live Streams
Refresh
StarCraft 2
OGKoka 204
SortOf 127
ProTech119
StarCraft: Brood War
Calm 13578
Flash 6062
Bisu 4434
GuemChi 1863
BeSt 804
firebathero 550
EffOrt 355
Pusan 286
Light 272
Zeus 271
[ Show more ]
Stork 238
actioN 227
ZerO 223
Leta 218
HiyA 121
ToSsGirL 79
Rush 75
Mind 73
Killer 68
Sharp 54
PianO 51
Barracks 47
Snow 35
Nal_rA 28
Hm[arnc] 21
GoRush 20
Bale 19
Shinee 17
Terrorterran 16
Noble 12
yabsab 11
Purpose 10
sorry 10
soO 10
Dota 2
XcaliburYe281
BananaSlamJamma177
canceldota148
League of Legends
JimRising 346
Counter-Strike
olofmeister2104
shoxiejesuss748
byalli509
x6flipin259
Super Smash Bros
Westballz13
Other Games
singsing2007
Sick311
XBOCT263
crisheroes248
Happy133
Livibee56
Trikslyr19
B2W.Neo11
Organizations
Other Games
gamesdonequick874
StarCraft: Brood War
UltimateBattle 250
Dota 2
PGL Dota 2 - Main Stream125
StarCraft 2
Blizzard YouTube
StarCraft: Brood War
BSLTrovo
sctven
[ Show 15 non-featured ]
StarCraft 2
• Berry_CruncH201
• StrangeGG 47
• LUISG 26
• CranKy Ducklings SOOP5
• AfreecaTV YouTube
• intothetv
• Kozan
• IndyKCrew
• LaughNgamezSOOP
• Migwel
• sooper7s
StarCraft: Brood War
• iopq 4
• BSLYoutube
• STPLYoutube
• ZZZeroYoutube
Upcoming Events
Replay Cast
22h 11m
Afreeca Starleague
23h 11m
hero vs YSC
Larva vs Shine
Kung Fu Cup
1d
Replay Cast
1d 13h
KCM Race Survival
1d 22h
The PondCast
1d 23h
WardiTV Team League
2 days
Replay Cast
2 days
WardiTV Team League
3 days
RSL Revival
3 days
Cure vs Zoun
herO vs Rogue
[ Show More ]
WardiTV Team League
4 days
Platinum Heroes Events
4 days
BSL
4 days
RSL Revival
4 days
ByuN vs Maru
MaxPax vs TriGGeR
WardiTV Team League
5 days
BSL
5 days
Replay Cast
5 days
Afreeca Starleague
5 days
Light vs Calm
Royal vs Mind
Wardi Open
6 days
Monday Night Weeklies
6 days
Sparkling Tuna Cup
6 days
Afreeca Starleague
6 days
Rush vs PianO
Flash vs Speed
Liquipedia Results

Completed

Proleague 2026-03-23
WardiTV Winter 2026
Underdog Cup #3

Ongoing

KCM Race Survival 2026 Season 1
BSL Season 22
CSL Elite League 2026
CSL Season 20: Qualifier 1
ASL Season 21
Acropolis #4 - TS6
RSL Revival: Season 4
Nations Cup 2026
NationLESS Cup
BLAST Open Spring 2026
ESL Pro League S23 Finals
ESL Pro League S23 Stage 1&2
PGL Cluj-Napoca 2026
IEM Kraków 2026
BLAST Bounty Winter 2026
BLAST Bounty Winter Qual

Upcoming

2026 Changsha Offline CUP
CSL Season 20: Qualifier 2
CSL 2026 SPRING (S20)
Acropolis #4
IPSL Spring 2026
BSL 22 Non-Korean Championship
CSLAN 4
Kung Fu Cup 2026 Grand Finals
HSC XXIX
uThermal 2v2 2026 Main Event
IEM Cologne Major 2026
Stake Ranked Episode 2
CS Asia Championships 2026
IEM Atlanta 2026
Asian Champions League 2026
PGL Astana 2026
BLAST Rivals Spring 2026
CCT Season 3 Global Finals
IEM Rio 2026
PGL Bucharest 2026
Stake Ranked Episode 1
TLPD

1. ByuN
2. TY
3. Dark
4. Solar
5. Stats
6. Nerchio
7. sOs
8. soO
9. INnoVation
10. Elazer
1. Rain
2. Flash
3. EffOrt
4. Last
5. Bisu
6. Soulkey
7. Mini
8. Sharp
Sidebar Settings...

Advertising | Privacy Policy | Terms Of Use | Contact Us

Original banner artwork: Jim Warren
The contents of this webpage are copyright © 2026 TLnet. All Rights Reserved.