Statisticians of TL! Some advice

Duka08
Joined July 2010
3391 Posts
July 18 2011 17:49 GMT
#1
From occasionally reading the college- and class-related threads in General, I've noticed quite a few high-caliber, mathematically inclined posters on TL! So I've come to prompt some discussion that may point my research group in a direction we've been unable to find for a few weeks.

I'm currently doing undergraduate research (Physics) at my university, and for the past few weeks we (myself, my partner, and my advisor) have been stuck on how to quantify something statistically. Rather than describe the research in detail, to avoid subtleties and unnecessarily dense description, I'll use a fun (and hopefully understandable) example that is essentially analogous to our data.

It's going to be a rough example, but bear with me. Let's imagine the traffic of TeamLiquid. Each time someone visits TL, we'll tag that exact moment in time as "a visitor", and this is how we'll track these events. Over the course of each day or week (long time spans) we'd expect some pretty regular patterns (background), mostly depending on the time of day, since a large majority of visitors are the same people checking in on a daily or weekly basis. Ignoring smaller, subtle fluctuations (noise), there would presumably be some general trend on a daily/weekly/monthly scale that we could see and account for as background.
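One common way to model a background like this (an assumption here, not something the OP says they use) is an inhomogeneous Poisson process: arrivals are independent, but the rate varies with the time of day. The sketch below draws such arrival times by thinning; `daily_rate` and its constants are hypothetical placeholders for a background you would actually measure.

```python
import math
import random
from typing import Callable


def daily_rate(t_hours: float) -> float:
    """Hypothetical background rate in visitors/hour with a 24-hour cycle.

    The constants are made up for illustration; a real analysis would use a
    rate measured from quiet stretches of the data.
    """
    return 50.0 + 40.0 * math.sin(2.0 * math.pi * t_hours / 24.0)


def simulate_arrivals(rate: Callable[[float], float], rate_max: float,
                      t_end: float, seed: int = 0) -> list[float]:
    """Arrival times on [0, t_end) from an inhomogeneous Poisson process.

    Thinning: propose candidates from a homogeneous process with rate
    rate_max (which must be >= rate(t) everywhere), then keep each candidate
    at time t with probability rate(t) / rate_max.
    """
    rng = random.Random(seed)
    arrivals: list[float] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # gap to the next candidate
        if t >= t_end:
            return arrivals
        if rng.random() < rate(t) / rate_max:
            arrivals.append(t)


if __name__ == "__main__":
    week = simulate_arrivals(daily_rate, rate_max=90.0, t_end=24.0 * 7)
    print(len(week), "background arrivals in one simulated week")
```

Simulated background-only data like this can serve as the "null" sample to compare any bunching statistic against, instead of assuming a perfectly flat background.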

Now, larger events such as showmatches or the final rounds of some tournaments (on a much shorter time scale than the background) could cause an increased flux of visitors posting in LR threads, viewing the stream(s), and whatnot. Assuming we subtract the daily background, these events would show up in a graph of visitors over time, and we could attribute this "burst" of visitors to the larger event in question.

With these "bursts" in mind (with a source we can associate with good certainty), we come to my actual dilemma. Let's say there is a showmatch between IdrA and Tyler. This would generate the kind of burst of visitors discussed above, among both posters and viewers. In Game 3, IdrA goes for a 6 pool, some TL notables tweet about it as it happens live, and there is a spike of people who read the tweets and immediately go to TL to tune in. So not only is there a general increase in visitor flux due to the showmatch as a whole, but there is also a momentary SPIKE in arrival times (remember, we're tagging the exact times people join, not "how many people are currently viewing") that is presumably associated with a specific event (in this case, the 6 pool/tweet).

We want to quantify this bunching. The data in question are simply a large number of arrival times; we can histogram them to see general trends/flux over long times, or zoom in and look at each time individually to see the small-scale structure. After subtracting background, let's say we see exactly 100 events semi-randomly distributed in some time interval. These events are above background and can be associated with a larger-scale event (in the example, the showmatch as a whole). Now, these 100 time-tagged events appear randomly distributed, but upon closer inspection 5 of them come EXTREMELY close together (the tweet-induced visitors). Visually, they are clearly bunched together and hopefully associated with some event, and the goal is to statistically quantify "how bunched they are" compared to the overall randomness of the 100 background-subtracted events. Basically, there's a bunch of stuff that should be random or looks random, but there's a clump that is unusually close, and we want to be able to somehow say "in this bunch of random stuff, these are so unusual that they aren't just random".
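One standard way to phrase "how bunched are these 5 out of 100?" is a scan statistic: take the shortest time span containing k consecutive events, then ask how often the same number of events scattered uniformly (the assumed null model) would produce a span at least that short. A minimal Monte Carlo sketch, with fabricated numbers on the scale of the example:

```python
import random


def min_k_span(times: list[float], k: int) -> float:
    """Shortest time interval containing k consecutive events (the k-spacing)."""
    ts = sorted(times)
    if len(ts) < k:
        raise ValueError("need at least k events")
    return min(ts[i + k - 1] - ts[i] for i in range(len(ts) - k + 1))


def bunching_p_value(times: list[float], k: int, t_start: float, t_end: float,
                     n_sims: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo p-value for 'some k events are this tightly bunched'.

    Null hypothesis (an assumption): the same number of events scattered
    uniformly over [t_start, t_end]. Returns the fraction of simulated data
    sets whose tightest k-cluster is at least as tight as the observed one.
    """
    rng = random.Random(seed)
    observed = min_k_span(times, k)
    n = len(times)
    hits = 0
    for _ in range(n_sims):
        sim = [rng.uniform(t_start, t_end) for _ in range(n)]
        if min_k_span(sim, k) <= observed:
            hits += 1
    # The +1 terms count the observed data as one draw, so p is never 0.
    return (hits + 1) / (n_sims + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Fabricated example: 95 scattered events plus a tight clump of 5.
    events = [rng.uniform(0.0, 1000.0) for _ in range(95)]
    events += [500.0, 500.1, 500.2, 500.3, 500.4]
    print(bunching_p_value(events, k=5, t_start=0.0, t_end=1000.0))
```

Because every position in the record is checked in each simulation, the result already accounts for the clump being allowed to appear anywhere. It does not account for k itself being picked after seeing the clump; fixing k in advance, or also scanning over k inside the simulation, would address that.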

We've tried most of the basic, "common/acceptable" tests, such as chi-square and something we had high hopes for called the Kolmogorov-Smirnov (K-S) test. Most of the tests we've tried don't capture what we're looking for: they don't properly "observe" the closeness, at least not in a way that works with small amounts of data (the numbers in the last paragraph are fabricated but on basically the same scale). The basic ratio we're using "in-house" for our own measurements is [events seen in a specific time window / events expected in the same window], where the time window is chosen to be close to the scale of the "bunching" and the expected count is simply [total events seen × the fraction of the total time our chosen window covers]. The larger this ratio, the more significant the clumping in a chosen time window somewhere in the data. However, this ratio is, to our knowledge, simply arbitrary. We need a way to quantify this that makes sense statistically and that others in the field will accept as significant.
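For a window fixed before looking at the data, the in-house ratio has a natural statistical counterpart. If the background-subtracted events behave like a Poisson process (an assumption), the count in a window of width w is Poisson with mean μ = N·w/T, and the chance of seeing at least n events there is the Poisson upper tail. A minimal sketch with fabricated numbers:

```python
import math


def window_ratio(n_in_window: int, n_total: int, window: float, duration: float) -> float:
    """The in-house ratio: observed / expected, where expected = N * (w / T)."""
    expected = n_total * window / duration
    return n_in_window / expected


def poisson_tail(n: int, mu: float) -> float:
    """P(X >= n) for X ~ Poisson(mu), summed term by term from P(X = n) upward.

    Summing the upper tail directly avoids the cancellation in 1 - CDF when
    the probability is tiny.
    """
    if n <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0
    term = math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))  # P(X = n)
    total = 0.0
    j = n
    while term > 0.0 and term >= total * 1e-17:
        total += term
        j += 1
        term *= mu / j  # P(X = j) from P(X = j - 1)
    return total


if __name__ == "__main__":
    # Fabricated numbers on the scale of the example: 100 events in T = 1000 s,
    # 5 of them inside one pre-chosen window of w = 0.5 s.
    n_total, duration, window, n_in = 100, 1000.0, 0.5, 5
    mu = n_total * window / duration  # expected count in the window
    print("ratio:", window_ratio(n_in, n_total, window, duration))
    print("P(X >= 5):", poisson_tail(n_in, mu))
```

This tail probability only means what it says if the window was chosen in advance. Placing the window on the clump after seeing it, when it could have landed anywhere in the record, makes a tiny local probability much less surprising (the "look-elsewhere" effect), which may be part of why such numbers can come out implausibly small.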


Hopefully this doesn't fall into the category of homework help since... it's not?!? There's no single answer to be found here really, just many possible ways of approaching the problem. We're looking for options that we haven't tried or don't know well enough.

MisterD
Joined June 2010
Germany, 1338 Posts
July 18 2011 18:14 GMT
#2
So you are basically looking for a high-frequency spike in low-frequency background noise? I don't know, but a Fourier transform came to mind, though I'm not sure it's applicable here. Maybe you can borrow some techniques from signal analysis; they should have ways to deal with this in continuous spaces that might be transferable to your discrete space somehow.
McFortran
Joined October 2010
United States, 79 Posts
July 18 2011 18:22 GMT
#3
I'm not a statistician, but what you're describing is a stochastic process. Perhaps you should look up methods for modeling time series.
ComaDose
Joined December 2009
Canada, 10357 Posts
July 18 2011 19:11 GMT
#4
I'm sad that this is all the advice you're getting, but alas, I can't help you either.
I'd be interested to know what real problem this example parallels.
n.DieJokes
Joined November 2008
United States, 3443 Posts
July 18 2011 19:42 GMT
#5
Who's the statistics grad student at Duke, OneOther or Empyrean? Whichever one it is, it probably wouldn't hurt to ask him.
Primadog
Joined April 2010
United States, 4411 Posts
Last Edited: 2011-07-19 00:32:38
July 18 2011 22:23 GMT
#6
I take pride in my blog "PrimeCoverage" for its analytical bent. See if you can find anything useful there~~

I study SC2 tournament data extensively because I intend to bring sports statistics into StarCraft.


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I have about this is: how do you propose to collect this "visitor" and traffic data? Simply try to scrape the Active and Logged In numbers on the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from too much data and too little data. It's a brand-new industry, so very little statistical analysis has ever been done on it, even in BW. Unlike in mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort has ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there (tournament results, matchups, stream numbers), but no one is collecting it.

For example, I was once curious about NASL viewership, but that data is hard to come by because Justin.tv has horrid (non-existent) site analytics, so I ended up using the daily live report threads' views and posts as a rough proxy for "viewership interest". It's imprecise and I don't think my dataset properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on this problem in secret. For now, I see no substantial resources where proper statistics can be done without building those datasets up yourself.




Regarding the non-StarCraft-related issue, I think you will care less about the dataset itself than about its first and perhaps second derivatives.
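Taking the derivative suggestion literally, one simple way to get at derivatives of event data is to bin the arrival times into a rate and take finite differences; a sharp onset shows up as a large positive first difference. A minimal sketch, where the bin width is a free (assumed) parameter. With only a handful of events per bin, differencing amplifies Poisson noise considerably, so the bin width matters a lot.

```python
import math


def binned_rate_derivatives(times: list[float], t_start: float, t_end: float,
                            bin_width: float):
    """Bin arrival times into a rate, then take first and second differences.

    Returns (rate, d1, d2): rate has one value per bin (events per unit time);
    d1 and d2 are finite-difference estimates of its first and second
    derivatives, one and two entries shorter respectively.
    """
    n_bins = math.ceil((t_end - t_start) / bin_width)
    counts = [0] * n_bins
    for t in times:
        if t_start <= t < t_end:
            idx = min(int((t - t_start) // bin_width), n_bins - 1)
            counts[idx] += 1
    rate = [c / bin_width for c in counts]
    d1 = [(b - a) / bin_width for a, b in zip(rate, rate[1:])]
    d2 = [(b - a) / bin_width for a, b in zip(d1, d1[1:])]
    return rate, d1, d2


if __name__ == "__main__":
    rate, d1, d2 = binned_rate_derivatives([0.5, 1.5, 1.6, 2.5], 0.0, 3.0, 1.0)
    print(rate, d1, d2)  # [1.0, 2.0, 1.0] [1.0, -1.0] [-2.0]
```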
theonemephisto
Joined May 2008
United States, 409 Posts
July 19 2011 01:42 GMT
#7
On July 19 2011 07:23 Primadog wrote:
Here's the non-self-advertising response. The big question I have about this is: how do you propose to collect this "visitor" and traffic data? Simply try to scrape the Active and Logged In numbers on the top left of TL?


He doesn't actually want to do research on TL views; it was just an analogy for his physics research.
Milkis
Joined January 2010
5003 Posts
July 19 2011 01:54 GMT
#8
So, let me make sure I understand the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shock. So, you know how x moves on an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that they are caused by the major event? That these shocks *are* abnormal, and you want to see how they affect x? Are you looking for causality or just correlation?

Or is it simpler, like this: after detrending, 100 people arrive every hour. Out of these, 5 are really close to each other, i.e., they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended data to a Poisson distribution; I think you may be able to show that something causes the mean of the Poisson distribution to change at those spikes.

Not sure if this is what you're looking for, since there are a lot of ways of studying this (more complicated ones include fitting some time-dependent model), but I think this may do the trick? Not sure; it depends on how sophisticated you want to be with it.
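One concrete way to check "does the detrended data fit a Poisson distribution?" (our choice of test, not a specific proposal from the thread) is Fisher's index of dispersion: bin the arrivals and compare the variance of the bin counts to their mean, which should be roughly equal for Poisson data, while clustering inflates the variance. The sketch calibrates the statistic by simulating uniformly scattered events rather than relying on the chi-square approximation; the number of bins is an assumed free choice.

```python
import random


def bin_counts(times: list[float], t_start: float, t_end: float, n_bins: int) -> list[int]:
    """Count events in n_bins equal-width bins (assumes all times are in range)."""
    width = (t_end - t_start) / n_bins
    counts = [0] * n_bins
    for t in times:
        idx = min(int((t - t_start) / width), n_bins - 1)
        counts[idx] += 1
    return counts


def dispersion_index(counts: list[int]) -> float:
    """Fisher's index of dispersion, sum((x - mean)^2) / mean.

    For Poisson counts this is approximately chi-square distributed with
    len(counts) - 1 degrees of freedom; clustering inflates it.
    """
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("no events to test")
    return sum((c - mean) ** 2 for c in counts) / mean


def dispersion_p_value(times: list[float], t_start: float, t_end: float,
                       n_bins: int, n_sims: int = 5000, seed: int = 0) -> float:
    """Monte Carlo p-value: how often uniformly scattered events (same total)
    give a dispersion index at least as large as the observed one."""
    rng = random.Random(seed)
    observed = dispersion_index(bin_counts(times, t_start, t_end, n_bins))
    hits = 0
    for _ in range(n_sims):
        sim = [rng.uniform(t_start, t_end) for _ in range(len(times))]
        if dispersion_index(bin_counts(sim, t_start, t_end, n_bins)) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)
```

A caveat worth knowing: this is a global check. A single small clump (5 extra events among 100) moves the statistic only modestly, so it answers "is the whole record over-dispersed?" more than "is this one bunch unusual?".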
theonemephisto
Joined May 2008
United States, 409 Posts
July 19 2011 03:31 GMT
#9
On July 19 2011 10:54 Milkis wrote:
Try fitting the detrended data to a Poisson distribution; I think you may be able to show that something causes the mean of the Poisson distribution to change at those spikes.


I was thinking about telling him to compare it to a Poisson process, but the problem with that is you have to manually choose the endpoints of the interval where the event's influence lies. If there are sensible time endpoints to use, that's great, but if the event has an indeterminate influence on the future, then I think you want something more sophisticated.

Not knowing very much about time-dependent data, I can't give any more than that.
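One way around hand-picking endpoints (a swapped-in technique, roughly in the spirit of scan statistics such as Kulldorff's, not something proposed in the thread) is to let the data choose them: try every window whose edges sit on two events, score each with a binomial likelihood ratio for "more events than the window's share of the time predicts", keep the best, and calibrate that best score by running the identical search on uniformly scattered events. Because the simulation repeats the whole search, the p-value already pays for having looked at every possible window. The uniform null and `min_count` are assumptions.

```python
import math
import random


def _log_likelihood_ratio(n_in: int, n_total: int, frac: float) -> float:
    """Binomial log-likelihood ratio for an excess of events in a window that
    covers a fraction `frac` of the total time. Returns 0 for no excess."""
    expected = n_total * frac
    if n_in <= expected:
        return 0.0
    n_out = n_total - n_in
    llr = n_in * math.log(n_in / expected)
    if n_out > 0:
        llr += n_out * math.log(n_out / (n_total - expected))
    return llr


def best_window(times, t_start, t_end, min_count=3):
    """Search all windows whose edges are events; return (score, lo, hi)."""
    ts = sorted(times)
    n = len(ts)
    duration = t_end - t_start
    best = (0.0, None, None)
    for i in range(n):
        for j in range(i + min_count - 1, n):
            width = ts[j] - ts[i]
            if width <= 0.0:
                continue  # skip degenerate windows (tied times)
            score = _log_likelihood_ratio(j - i + 1, n, width / duration)
            if score > best[0]:
                best = (score, ts[i], ts[j])
    return best


def scan_p_value(times, t_start, t_end, n_sims=200, min_count=3, seed=0):
    """Monte Carlo p-value for the best window, repeating the full search on
    uniformly scattered events so the search itself is accounted for."""
    rng = random.Random(seed)
    observed, lo, hi = best_window(times, t_start, t_end, min_count)
    hits = 0
    for _ in range(n_sims):
        sim = [rng.uniform(t_start, t_end) for _ in range(len(times))]
        if best_window(sim, t_start, t_end, min_count)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1), lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Fabricated example: 95 scattered events plus a tight clump of 5.
    events = [rng.uniform(0.0, 1000.0) for _ in range(95)]
    events += [500.0, 500.1, 500.2, 500.3, 500.4]
    p, lo, hi = scan_p_value(events, 0.0, 1000.0)
    print(f"best window [{lo:.2f}, {hi:.2f}], p = {p:.3f}")
```

The smallest p-value this can report is 1/(n_sims + 1), so resolving very significant clumps needs many more simulations (or a faster implementation than this O(n²) loop).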
Duka08
Joined July 2010
3391 Posts
Last Edited: 2011-07-19 03:48:20
July 19 2011 03:41 GMT
#10
On July 19 2011 10:54 Milkis wrote:
Or is it simpler, like this: after detrending, 100 people arrive every hour. Out of these, 5 are really close to each other, i.e., they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?


What you're describing is essentially it, yes. There is a "larger" background-subtracted/detrended event (your example's 100 people), and within this broader event there is a short burst of even higher-frequency data (times) that we want to show is quantifiably correlated, in excess of the rest of the (essentially random) background-subtracted data.

We've been working with Poisson statistics from numerous approaches, especially over the past few days. Any quantitative assessment we arrive at with Poissonian methods and distributions seems too situationally specific, and gives TOO low a probability. Sure, lower is better; that's what we'd like! In physics you have to account for everything and second-guess even yourself... we'd like a lower probability so we can say the bunching is more significant, but the numbers we were arriving at were too absurdly low to even find reasonable, haha. Perhaps the fix is integrating over some Poissonian distribution, or choosing a more general, flexible set of parameters instead of getting so specific. As I said, we've done some work with Poisson stats and still are, but we're also looking for more options.

At this point I may just delve into the actual details of what we're doing/working with, but I was trying to avoid that since it's active research, blah blah mumbo jumbo. If anyone (Milkis?) is genuinely interested in the specifics and thinks they'll help with ideas, feel free to PM me.
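On the "absurdly low" Poisson probabilities: one likely contributor in this kind of analysis is the look-elsewhere (trials-factor) effect. A Poisson probability computed for one window is the chance of an excess in *that* window, but if the window was placed where the clump turned out to be, there were many places an excess could have shown up. A crude correction, assuming roughly T/w independent windows, is the Šidák-style formula p_global = 1 - (1 - p_local)^(T/w). Simulating the entire search procedure on background-only data is usually the more defensible option.

```python
import math


def global_p_value(p_local: float, duration: float, window: float) -> float:
    """Crude look-elsewhere correction: 1 - (1 - p_local) ** n_trials, with
    n_trials taken as duration / window (an assumed count of independent
    windows). A sliding search actually looks at more, overlapping and
    correlated, windows, and the cluster size may also have been chosen
    after looking, so treat this as a rough guide only."""
    if p_local >= 1.0:
        return 1.0
    n_trials = max(1.0, duration / window)
    # log1p/expm1 keep the result accurate when p_local is tiny.
    return -math.expm1(n_trials * math.log1p(-p_local))


if __name__ == "__main__":
    # Fabricated numbers: a local probability of 2.5e-9 for a 0.5 s window
    # inside 1000 s of data gives roughly 2000 * 2.5e-9 = 5e-6 globally.
    print(global_p_value(2.5e-9, duration=1000.0, window=0.5))
```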
The contents of this webpage are copyright © 2026 TLnet. All Rights Reserved.