Statisticians of TL! Some advice

Duka08
Joined July 2010
3391 Posts
July 18 2011 17:49 GMT
#1
From occasionally reading the college- and class-related threads in General, I've noticed a good number of high-caliber, mathematically inclined posters on TL! So I've come to start a discussion that may point my research group in a direction we've been unable to find for the past few weeks.

I'm currently doing undergraduate research (Physics) at my university, and for the past few weeks we (myself, my partner, and my advisor) have been stuck on how to quantify something statistically. Rather than describe the research in detail, which would bring in subtleties and unnecessarily dense description, I'll use a fun (and hopefully understandable) example that is essentially analogous to our data.

It's going to be a rough example, but bear with me. Let's imagine the traffic of TeamLiquid. Each time someone visits TL, we'll tag that exact moment in time as "a visitor", and this is how we'll track these events. Over the course of each day or week (long time spans), we'd expect some pretty regular patterns (background), depending mostly on the time of day, since a large majority of visitors are the same people checking in on a daily or weekly basis. Ignoring smaller, subtle fluctuations (noise), there would presumably be some general trend on a daily/weekly/monthly scale that we could see and account for as background.

Now, larger events such as showmatches or the final rounds of some tournaments (on a much shorter time scale than the background) could cause an increase in the flux of visitors, who post in LR threads, watch the stream(s), and so on. Assuming we subtract the daily background, these events would show up in a graph of visitors over time, and we could attribute this "burst" of visitors to the larger event in question.

With these "bursts" in mind (each with a source we can identify with good certainty), we come to my actual dilemma. Let's say there is a showmatch between Idra and Tyler. This would generate the kind of burst of visitors discussed above, among both posters and viewers. In Game 3, Idra 6 pools, some TL notables tweet about it live, and there is a spike of people who read the tweets and immediately go to TL to tune in. So not only is there a general increase in visitor flux due to the showmatch as a whole, but there is also a momentary SPIKE in arrival times (remember, we're tagging each event at the exact time a person joins, not counting "how many people are currently viewing") that is presumably associated with a specific event (in this case, the 6 pool/tweet).

We want to quantify this bunching. The data in question are simply a large number of arrival times; we can histogram them to see general trends/flux over long times, or zoom in and look at each time individually to see the small-scale structure. After subtracting background, let's say we see exactly 100 events semi-randomly distributed in some time interval. These events are above background and can be associated with a larger-scale event (in the example, the showmatch as a whole). Now, these 100 time-tagged events appear randomly distributed, but on closer inspection 5 of them come EXTREMELY close together (the tweet-induced visitors). Visually, they are clearly bunched together and hopefully associated with some event, and the goal is to statistically quantify "how bunched they are" compared to the overall randomness of the 100 background-subtracted events. Basically, there's a bunch of stuff that should be random or looks random, but there's a clump that is unusually close together, and we want to be able to say, in some quantitative way, "in this bunch of random stuff, these are so unusual that they aren't just random".

We've tried most of the basic, "common/acceptable" tests, such as chi-square and something we had high hopes for called the K-S (Kolmogorov-Smirnov) test. Most of them don't capture what we're looking for: they don't properly "observe" the closeness in a way that works with small amounts of data (the numbers in the last paragraph are made up, but they're on basically the same scale). The basic ratio we're using "in-house" for our own measurements is [events seen in a specific time window / events expected in the same window], where the time window is chosen to be close to the scale of the "bunching" and the expected count is simply [total events seen × the fraction of the total time that the window covers]. The larger this ratio, the more significant the clumping in the chosen time window. However, this ratio is, to our knowledge, simply arbitrary. We need a way to quantify this that makes sense statistically and that others in the field will accept as significant.
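For concreteness, here is a minimal Python sketch of the in-house ratio as described above. It assumes (an assumption, not part of the original setup) that the background-subtracted arrival times are a list of floats on a known interval [t_start, t_end) and that the window was fixed before looking at the data; the function names and the numbers are made up. Alongside the ratio it computes the Poisson tail probability P(N >= seen) for N ~ Poisson(expected), one standard way to turn "seen vs. expected" into a probability.

```python
import math


def window_ratio(times, t_start, t_end, window_start, window_len):
    """Return (seen, expected, seen / expected) for one pre-chosen window.

    expected = total events * (window length / total observation time),
    i.e. the count a flat (uniform) rate would put in the window.
    """
    window_end = window_start + window_len
    seen = sum(1 for t in times if window_start <= t < window_end)
    expected = len(times) * window_len / (t_end - t_start)
    return seen, expected, seen / expected


def poisson_tail(k, mu):
    """P(N >= k) for N ~ Poisson(mu), computed as 1 - P(N <= k - 1)."""
    below = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return 1.0 - below


# Made-up data on the scale of the example: 95 evenly spread events plus 5 bunched ones.
times = [i + 0.5 for i in range(95)] + [97.1, 97.2, 97.3, 97.4, 97.5]
seen, expected, ratio = window_ratio(times, 0.0, 100.0, 97.0, 1.0)
print(seen, expected, ratio)          # 5 1.0 5.0
print(poisson_tail(seen, expected))   # about 0.0037
```

If the window is instead placed on the cluster after seeing it, this tail probability ignores all the other places the cluster could have appeared and will come out too small.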


Hopefully this doesn't fall into the category of homework help since... it's not?!? No single answer is to be found here, really, just many possible ways of approaching the problem. We're looking for options that we haven't tried or don't know well enough.

MisterD
Joined June 2010
Germany, 1338 Posts
July 18 2011 18:14 GMT
#2
So you are basically looking for a high-frequency spike on top of low-frequency background noise? I don't know, but a Fourier transform came to mind, though I'm not sure if that's applicable here. Maybe you can borrow some techniques from signal analysis; they should have ways to deal with this kind of thing in continuous spaces, which might transfer to your discrete case in some way.
Gold isn't everything in life... you need wood, too!
McFortran
Joined October 2010
United States, 79 Posts
July 18 2011 18:22 GMT
#3
I'm not a statistician, but what you're describing is a stochastic process. Perhaps you should look up methods for modeling time series.
ComaDose
Joined December 2009
Canada, 10357 Posts
July 18 2011 19:11 GMT
#4
I'm sad that this is all the advice you can get, but alas, I can't help you either.
I'd be interested to know what real problem this example parallels.
BW pros training sc2 is like kiss making a dub step album.
n.DieJokes
Joined November 2008
United States, 3443 Posts
July 18 2011 19:42 GMT
#5
Who's the statistics grad student at Duke, OneOther or Empyrean? Whichever one it is, it probably wouldn't hurt to ask him.
MyLove + Your Love= Supa Love
Primadog
Joined April 2010
United States, 4411 Posts
Last Edited: 2011-07-19 00:32:38
July 18 2011 22:23 GMT
#6
I take pride in my blog "PrimeCoverage" for its analytical bent. See if you can find anything useful there~~

I study sc2 tournament data extensively because I intend to bring sports statistics into StarCraft.


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I have about this is: how do you propose collecting this "visitor" and traffic data? Simply try to scrape the Active and Logged In numbers at the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from an excess of data and a shortage of data. It's a brand-new industry, so very little statistical analysis has ever been done on it, even in BW. Unlike mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort has ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there (tournament results, matchups, stream numbers), but no one is collecting it.

For example, I was once curious about NASL viewership, but that data is hard to come by because Justin.tv has horrid (non-existent) site analytics, so I ended up using the views and posts of the daily live report threads as a rough proxy for "viewership interest". It's imprecise, and I don't think my dataset properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done, without building those datasets up yourself.




Regarding the non-StarCraft-related issue, I think you will care less about the dataset itself and more about its first and perhaps second derivatives.
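Applied to event data, "first and second derivatives" would in practice mean finite differences of a histogram. A minimal Python sketch, assuming a fixed, hypothetically chosen bin width and made-up arrival times on the scale of the opening post's example (100 events, 5 of them bunched); the helper names are hypothetical. A bunch shows up as a large positive first difference followed by a large negative one.

```python
def bin_counts(times, t_start, t_end, n_bins):
    """Histogram arrival times into n_bins equal-width bins on [t_start, t_end)."""
    width = (t_end - t_start) / n_bins
    counts = [0] * n_bins
    for t in times:
        if t_start <= t < t_end:
            # min() guards against floating-point round-up at the top edge
            idx = min(int((t - t_start) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def finite_difference(values):
    """Discrete first derivative: change from one bin to the next."""
    return [b - a for a, b in zip(values, values[1:])]


times = [i + 0.5 for i in range(95)] + [97.1, 97.2, 97.3, 97.4, 97.5]
counts = bin_counts(times, 0.0, 100.0, 100)   # 1-unit bins (a hypothetical choice)
first = finite_difference(counts)             # jumps by +5, then -5, around the bunch
second = finite_difference(first)
print(max(first), min(first))                 # 5 -5
```

The result depends strongly on the bin width: with 20 bins of width 5 on this same data, every bin holds exactly 5 events and the bunch disappears from the differences entirely.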
Thank God and gunrun.
theonemephisto
Joined May 2008
United States, 409 Posts
July 19 2011 01:42 GMT
#7
Replying to Primadog (post #6):

He doesn't actually want to do research on TL views; it was just an analogy for his physics research.
Milkis
Joined January 2010
5003 Posts
July 19 2011 01:54 GMT
#8
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves on an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it simpler, like this: after detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, i.e., they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a Poisson distribution; I think you may be able to show that something causes the mean of the Poisson distribution to change at those spikes.
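One way to make this concrete is a likelihood-ratio test: fit the Poisson mean mu to the detrended counts (for a Poisson model the maximum-likelihood estimate is simply the average count per bin, ideally computed away from the suspected spike), then ask whether the count k in the suspected window is better explained by a rate of its own. A minimal Python sketch under those assumptions; the function name and the counts are hypothetical. The chi-square approximation (Wilks' theorem) is rough at counts this small, so an exact Poisson tail probability is the safer number to quote.

```python
import math


def poisson_lr_test(k, mu):
    """Likelihood-ratio test of 'window rate = mu' against 'window rate = k'.

    With log L(lam) = k*ln(lam) - lam + const, the statistic is
    Lambda = 2 * [k*ln(k/mu) - (k - mu)]  (equal to 2*mu when k == 0).
    It is approximately chi-square with 1 degree of freedom, whose
    survival function is erfc(sqrt(x / 2)); this p-value is two-sided.
    """
    if k == 0:
        stat = 2.0 * mu
    else:
        stat = 2.0 * (k * math.log(k / mu) - (k - mu))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical detrended counts per bin, with the suspected spike bin left out of the fit.
background_counts = [1, 0, 2, 1, 1, 0, 1, 2, 1, 1]
mu = sum(background_counts) / len(background_counts)  # Poisson MLE: 1.0 per bin
stat, p = poisson_lr_test(5, mu)                       # 5 events in the spike bin
print(f"Lambda = {stat:.2f}, approximate p = {p:.4f}")
```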

Not sure if this is what you're looking for, since there are a lot of ways of studying this (more complicated ways include fitting in some time-dependent data), but I think this may do the trick? Not sure; it depends on how sophisticated you want to be with it.
theonemephisto
Joined May 2008
United States, 409 Posts
July 19 2011 03:31 GMT
#9
Replying to Milkis (post #8):

I was thinking about telling him to compare it to a Poisson process, but the problem with doing that is that you have to manually choose the endpoints of where the event's influence lies. If there are sensible time endpoints to use, that's great, but if the event has an indeterminate influence on the future, then I think you want something more sophisticated.
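The endpoint problem is what the scan statistic is built for: instead of fixing a window in advance, take the shortest interval that contains k events and ask how often data with no bunching would produce one at least that short. A minimal Monte Carlo sketch in Python, assuming the null hypothesis is the same number of events placed uniformly at random over the interval (i.e., background subtraction leaves a flat rate) and that k is chosen in advance; k = 5, the data, and the function names are made up for illustration.

```python
import random


def min_span(times, k):
    """Length of the shortest interval that contains k of the events."""
    s = sorted(times)
    return min(s[i + k - 1] - s[i] for i in range(len(s) - k + 1))


def scan_p_value(times, k, t_start, t_end, n_sims=10000, seed=0):
    """Monte Carlo p-value for 'some k events are at least this bunched'.

    Null hypothesis (an assumption): the same number of events placed
    independently and uniformly on [t_start, t_end]. No window position is
    fixed in advance; every position is searched, in the data and in each
    simulated data set alike.
    """
    rng = random.Random(seed)
    observed = min_span(times, k)
    n = len(times)
    hits = 0
    for _ in range(n_sims):
        sim = [rng.uniform(t_start, t_end) for _ in range(n)]
        if min_span(sim, k) <= observed:
            hits += 1
    # (hits + 1) / (n_sims + 1) keeps the estimate from ever being exactly zero.
    return observed, (hits + 1) / (n_sims + 1)


times = [i + 0.5 for i in range(95)] + [97.1, 97.2, 97.3, 97.4, 97.5]
span, p = scan_p_value(times, k=5, t_start=0.0, t_end=100.0, n_sims=2000)
print(f"shortest span holding 5 events: {span:.2f}, p ~ {p:.3f}")
```

Because every window position is searched in both the data and the simulations, this p-value already pays for "looking everywhere", so expect it to be much less extreme than a single-window Poisson probability on the same data. Choosing k after seeing the cluster is itself a (smaller) degree of freedom.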

Not knowing very much about time dependent data, I can't give any more than that.
Duka08
Joined July 2010
3391 Posts
Last Edited: 2011-07-19 03:48:20
July 19 2011 03:41 GMT
#10
Replying to Milkis (post #8):

What you're describing is essentially it, yes. There is a "larger" background-subtracted/detrended event (your example's 100 people), and within this broader event there is a short burst of even higher-frequency data (times) that we want to show is quantifiably correlated, in excess of the rest of the (essentially random) background-subtracted data.

We've been working with Poisson statistics from numerous approaches, especially over the past few days. Any quantitative assessment we arrive at with the Poissonian methods and distributions seems too situationally specific and gives even TOO low a probability. Sure, lower is better; that's what we'd like! In Physics you have to account for everything and second-guess even yourself... we'd like a lower probability so we can say the bunching is more significant, but the numbers we were arriving at were TOO absurdly low to even seem reasonable haha. Perhaps the way forward is integrating over some Poissonian distribution, or choosing a more general, flexible set of parameters instead of getting so specific. As I said, we've done some work with Poisson stats and still are, but we're also looking for more options.
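One possible reason the Poisson probabilities come out absurdly low is the "look-elsewhere" (trials-factor) effect: a tail probability computed for the window where the bunch happens to sit ignores all the other windows where a bunch could have appeared. A crude correction, sketched in Python under the assumption of n independent, non-overlapping windows (a Šidák-style correction), is below; the function name, the window count, and the local p-value in the example are made up. Overlapping, sliding windows are correlated, so a Monte Carlo scan over all window positions is the more defensible version of the same idea.

```python
def global_p_value(local_p, n_windows):
    """Chance that at least one of n_windows independent windows looks at least
    this extreme by chance alone (a Sidak-style trials-factor correction)."""
    if not 0.0 <= local_p <= 1.0:
        raise ValueError("local_p must be a probability")
    return 1.0 - (1.0 - local_p) ** n_windows


# Made-up example: a 1-unit window with local p = 0.0037, and 100 such
# non-overlapping windows fitting in the observation interval.
print(round(global_p_value(0.0037, 100), 2))   # 0.31
```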

At this point I may just delve into the specifics of what we're actually doing/working with, but I was trying to avoid that since it's active research, blah blah mumbo jumbo. If anyone (Milkis?) is genuinely interested in the specifics and thinks they'll help with ideas, feel free to PM me.