Statisticians of TL! Some advice

Duka08
Profile Blog Joined July 2010
3391 Posts
July 18 2011 17:49 GMT
#1
Through reading some of the college/class related threads in General occasionally I've noticed quite a good number of high-caliber mathematically-inclined posters on TL! So I've come to prompt some discussion that may lead my research group in a direction that we've been unable to find for a few weeks.

I'm currently doing undergraduate research (Physics) at my university and we (myself, my partner, and my advisor) have had a dilemma over the past few weeks on how to quantify something statistically. Rather than talk about the research in detail, in order to avoid subtleties and unnecessarily dense description, I'll use a fun (and hopefully understandable) example that is essentially a counterpart to our data.

It's going to be a rough example, but bear with me. Let's imagine the traffic of Teamliquid. Each time someone visits TL, we'll tag that exact moment in time as "a visitor", and this is how we'll track these events. Over the course of each day or week (long time spans) we'd expect some pretty regular patterns (background) depending on the time of day for the most part, where a large majority of the same people check on a daily or weekly basis. Ignoring smaller subtle fluctuations (noise), there would be presumably some general trend on a daily/weekly/monthly scale we could see and account for as background.

Now, larger events such as showmatches or the final rounds of some tournaments (a much shorter time scale than the background) could cause an increase in flux of visitors, posting in LR threads and viewing the stream(s) and what not. Assuming we subtract the daily background, these events would show up in a graph of visitors over time, and we could attribute this "burst" of visitors to the larger event in question.

With these "bursts" in mind (with a source we can associate with good certainty) we come to my actual dilemma. Let's say there is a showmatch between Idra and Tyler. This would generate the burst of visitors previously discussed, in both posters and viewers. In Game 3, Idra 6 pools, and some TL notables tweet about it as it happens live, and there is a spike of people who read it and immediately go to TL and tune in. So not only is there a general increase in visitor flux due to the showmatch as a whole, but there is also a momentary SPIKE in arrival times (remember, we're tagging these events exactly as the time people join, not "how many people currently viewing") that is presumably associated with a specific event (in this case, the 6pool/tweet).

We want to quantify this bunching. The events in question are simply large amounts of arrival times, and we can histogram them to see general trends/flux over long times, or zoom in and look at each time individually to see the small scale structure. After subtracting background, let's say we see exactly 100 events semi-randomly distributed in some time interval. These events are above background and can be associated with a larger-scale event (in the example, the showmatch as a whole). Now, these 100 time-tagged events appear randomly distributed, but upon closer inspection 5 of them come EXTREMELY close together (the tweet-induced visitors). Visually, they are clearly bunched together and hopefully associated with some event, and the goal is to statistically quantify "how bunched they are" in comparison to the overall randomness of the 100 background-subtracted events. Basically, there's a bunch of stuff that should be random or looks random, but there's a clump that is unusually close, and we want to be able to somehow say "in this bunch of random stuff, these are so unusual that they aren't just random".

We've tried most of the basic, "common/acceptable" tests, such as chi-square and something we had high hopes for called a K-S Test. Most tests we've tried don't capture what we're looking for, in that they don't properly "observe" the closeness in a way that works with small amounts of data (the numbers in the last paragraph are fabricated but basically on the same scale). The basic ratio that we're using "in-house" for our own measurements is [events seen in a specific time window / events expected in the same window], where the time window is chosen to be close to the scale of the "bunching" and the expected count is simply [total events seen times the fraction of the total time that our chosen window covers]. The larger this ratio is, the more significant the clumping in a chosen time window somewhere in the data. However, this ratio is, to our knowledge, simply arbitrary. We need a way to actually quantify this in a way that makes sense statistically and that others in the field will accept as significant.
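
A minimal sketch of the observed/expected ratio described above, assuming the background-subtracted arrival times are available as a plain list of floats (in seconds). The function name `window_ratio`, the one-hour span, and the made-up data (95 evenly spaced events plus 5 bunched ones) are illustrative assumptions, not the actual research data.

```python
from bisect import bisect_left


def window_ratio(times, t_start, t_end, w_start, w_end):
    """Return (observed, expected, ratio) for the window [w_start, w_end).

    The expected count assumes the events are spread uniformly
    over the full interval [t_start, t_end).
    """
    times = sorted(times)
    # Count events with t_start <= t < t_end, and likewise for the window.
    total = bisect_left(times, t_end) - bisect_left(times, t_start)
    observed = bisect_left(times, w_end) - bisect_left(times, w_start)
    expected = total * (w_end - w_start) / (t_end - t_start)
    return observed, expected, observed / expected


# Hypothetical data: 95 "ordinary" arrivals over one hour, plus 5 arrivals
# bunched within half a second (the tweet-induced visitors in the analogy).
base = [i * 36.0 for i in range(95)]
clump = [1810.0 + 0.1 * j for j in range(5)]
times = base + clump

observed, expected, ratio = window_ratio(times, 0.0, 3600.0, 1810.0, 1811.0)
print(observed, expected, ratio)
```

The ratio grows as the window shrinks around the clump, which is one way to see why the number on its own is hard to interpret: it depends on a window that was chosen after looking at the data.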


Hopefully this doesn't fall into the category of homework help since... it's not?!? No answers to be found here really, just as many possible ways of approaching the problem. We're looking for options that we haven't tried or don't know well enough.

MisterD
Profile Blog Joined June 2010
Germany1338 Posts
July 18 2011 18:14 GMT
#2
So you are basically looking for a high-frequency spike in low-frequency background noise? A Fourier transform came to my mind, but I'm not sure if that's applicable here. Maybe you can steal some techniques from signal analysis; they should have ways to deal with this stuff in continuous spaces, which should be transferable to your discrete space in some way, possibly.
Gold isn't everything in life... you need wood, too!
McFortran
Profile Joined October 2010
United States79 Posts
July 18 2011 18:22 GMT
#3
I'm not a statistician, but what you're describing is a stochastic process. Perhaps you should look up methods for modeling time series.
ComaDose
Profile Blog Joined December 2009
Canada10357 Posts
July 18 2011 19:11 GMT
#4
I'm sad that this is all the advice you can get,
but alas I cannot help you either.
I'd be interested to know what the real-world parallel to this example is.
BW pros training sc2 is like kiss making a dub step album.
n.DieJokes
Profile Blog Joined November 2008
United States3443 Posts
July 18 2011 19:42 GMT
#5
Who's the grad. statistician at Duke, OneOther or Empyrean? Whichever one it is, it probably wouldn't hurt to ask him.
MyLove + Your Love= Supa Love
Primadog
Profile Blog Joined April 2010
United States4411 Posts
Last Edited: 2011-07-19 00:32:38
July 18 2011 22:23 GMT
#6
I take pride in my blog "PrimeCoverage" for its analytical bent. See if you can find anything useful there~~

I study SC2 tournament data extensively because I intend to bring sports statistics into StarCraft.


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I have about this is: how do you propose collecting this "visitor" and traffic data? Simply attempt to scrape the Active and Logged In numbers on the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from an excess of data and an insufficient amount of data. It's a brand new industry, so very little statistical analysis was ever done on it, even in BW. Unlike mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort has ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there - tournament results, matchups, stream numbers - but no one is collecting it.

For example, I once was curious about the NASL viewership, but that data is hard to come by because Justin.tv has horrid (non-existent) site analytics, and I ended up using the daily live report threads' views and posts as a rough proxy of "viewership interest". It's imprecise and I don't think the dataset I have properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done, without building those datasets up yourself.




Regarding the non-StarCraft related issue, I think you will care less about the dataset itself and more about its first and perhaps second derivatives.
Thank God and gunrun.
theonemephisto
Profile Blog Joined May 2008
United States409 Posts
July 19 2011 01:42 GMT
#7
On July 19 2011 07:23 Primadog wrote:
I take pride in my blog "PrimeCoverage" for its analytical bend. See if you can find anything useful there~~

I study sc2 tournaments' data extensively because I intend bring to sports statistics into starcraft


Note that I am not a trained statistician (Electrical Engineering), just a hobbyist mathematician.



EDIT: Completely misread the OP, sorry.

Here's the non-self-advertising response. The big question I will have towards this is, how do you propose collecting this "visitor" and traffic data? Simply attempt to scrap the Active and Logged In numbers on the top left of TL?

While doing my own StarCraft analytics, I found that it suffers simultaneously from excess amount of data and insufficient amount of data. It's a brand new industry, so very little statistical analysis was ever done on it, even in BW. Unlike mature sports like baseball, advanced research like Sabermetrics simply does not exist, and no effort have ever been made to properly document data useful to curious amateur statisticians like yours truly (besides the TLPD). Plenty of data is out there - tournament results, matchups, stream numbers - but no one is collecting them.

For example, I once was curious about the NASL viewership, but those data is hard to come by because Justin.tv has horrid (non-existent) site analytic, and I end up using the daily livereport threads views and posts as a rough proxy of "viewership interest". It's imprecise and I don't think the dataset I have properly answered the question, but in the end, it's the best that StarCraft has right now.

Perhaps you can find a better solution, or perhaps somebody is working on resolving this problem in secret. For now, I see no substantial resources where proper statistics can be done, without building those datasets up yourself.




Regarding the non-StarCraft related issue, I think you will care less about the dataset itself but more about its first and perhaps second-derivatives.

He doesn't actually want to do research on TL views; it was just an analogy for his physics research.
Milkis
Profile Blog Joined January 2010
5003 Posts
July 19 2011 01:54 GMT
#8
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves on an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting a Poisson distribution to the detrended data, and I think you may be able to show that there is something that causes the mean of the Poisson distribution to change at those spikes.

Not sure if this is what you're looking for, since there are a lot of ways of studying this (more complicated ways include fitting some time-dependent data), but I think this may do the trick? It depends on how sophisticated you want to be with it.
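
One possible reading of the Poisson suggestion, sketched under assumptions: if the detrended arrivals behaved like a Poisson process with a constant rate, the count in a fixed window would be Poisson with mean `mu`, and the chance of seeing at least `k` events there is the upper tail of that distribution. The helper name `poisson_tail` and the numbers (100 events per hour, a 1-second window, 5 events) are illustrative only, taken from the fabricated example in the thread.

```python
import math


def poisson_tail(k, mu):
    """P(N >= k) for N ~ Poisson(mu), summed directly over the upper tail.

    Summing the tail (rather than computing 1 - CDF) avoids losing
    precision when the probability is extremely small.
    """
    if k <= 0:
        return 1.0
    if mu <= 0.0:
        return 0.0
    # Start from P(N = k), computed in log space to avoid overflow.
    term = math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while True:
        total += term
        i += 1
        term *= mu / i  # P(N = i) from P(N = i - 1)
        if term <= total * 1e-16:
            break
    return total


# Assumed numbers: 100 events per hour, window of 1 second, 5 events seen.
mu = 100 * 1.0 / 3600.0
p = poisson_tail(5, mu)
print(p)  # on the order of 1e-10
```

Note that this is the probability for one window fixed in advance. It does not account for the window having been picked after looking at the data, so it tends to overstate the significance.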
theonemephisto
Profile Blog Joined May 2008
United States409 Posts
July 19 2011 03:31 GMT
#9
On July 19 2011 10:54 Milkis wrote:
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves in an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a poisson distribution and i think you may be able to show that there is something that causes the mean of the poisson distribution to change at those spikes.

Not sure if this is what you're looking for since there are a lot of ways of studying this (more complicated ways including fitting in some time dependent data) but I think this may do the trick? not sure depending on how sophisicated you want to be with it

I was thinking about telling him to compare it to a Poisson process, but the problem if you do that is that you have to manually choose the endpoints of where the event's influence lies. If there are sensible time endpoints to use, that's great, but if it's an event that has an indeterminate influence on the future then you want something more sophisticated, I think.

Not knowing very much about time dependent data, I can't give any more than that.
Duka08
Profile Blog Joined July 2010
3391 Posts
Last Edited: 2011-07-19 03:48:20
July 19 2011 03:41 GMT
#10
On July 19 2011 10:54 Milkis wrote:
So, let me get the situation before I start thinking of a solution.

You are measuring x, and you see how x behaves without any sort of external shocks. So, you know how x moves in an hourly basis without any shocks.

Now, there is a shock, a "major event". Due to this, x starts behaving abnormally -- specifically, it spikes up (does the spiking up matter?). Within this event, there is an additional shock, caused by some element of "major event", and it affects x even more.

Your goal is to prove that these spikes aren't random and that these are something that is caused by the major event? That these shocks *are* abnormal and you want to see how they affect X? Are you looking for causality or just correlation?

Or is it more simple like this: After detrending, 100 people arrive every hour. Out of these, 5 of them are really close to each other, ie: they arrive at a much higher frequency than the other 95. You want to somehow show that these 5 follow a different distribution?

Try fitting the detrended model to a poisson distribution and i think you may be able to show that there is something that causes the mean of the poisson distribution to change at those spikes.

Not sure if this is what you're looking for since there are a lot of ways of studying this (more complicated ways including fitting in some time dependent data) but I think this may do the trick? not sure depending on how sophisicated you want to be with it

What you're describing is essentially it yes. There is a "larger" background-subtracted / detrended event (your example's 100 people), and within this broader event there is a short burst of even higher frequency data (times) that we want to say is somehow quantifiably correlated in excess to the rest of the (essentially random) background-subtracted data.

We've been working with Poisson statistics from numerous approaches, especially over the past few days. Any quantitative assessment we arrive at with the Poissonian methods and distributions seems too situationally specific, and the probability even comes out TOO low. Sure, lower is better, that's what we'd like! In Physics you have to account for everything and second-guess even yourself... we'd like a lower probability to be able to say that the bunching is more significant, but the numbers we were arriving at were TOO absurdly low to even find reasonable haha. Perhaps integrating over some Poissonian distribution, or choosing a more general, flexible set of parameters instead of getting so specific, would help. As I said, we've done some work with Poisson stats and still are, but we're also looking for more options.
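
One option that might address the "absurdly low" numbers is a Monte Carlo scan statistic, sketched here under assumptions and not as the method actually used in this research. Instead of asking how unlikely 5 events are in one hand-picked window, it asks how often purely random data with the same number of events would contain a clump at least that dense anywhere in the interval. The null model (arrivals uniform over the interval after background subtraction), the function names, and all numbers are hypothetical.

```python
import random


def max_window_count(times, width):
    """Largest number of events found in any sliding window of this width."""
    times = sorted(times)
    best = 0
    left = 0
    for right, t in enumerate(times):
        # Advance the left edge until the window [times[left], t] fits.
        while t - times[left] > width:
            left += 1
        best = max(best, right - left + 1)
    return best


def scan_p_value(n_events, duration, width, observed_max,
                 n_trials=2000, seed=0):
    """Estimate how often uniform random data clumps as much as observed.

    Assumes the null hypothesis is n_events arrivals placed uniformly
    at random on [0, duration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sim = [rng.uniform(0.0, duration) for _ in range(n_events)]
        if max_window_count(sim, width) >= observed_max:
            hits += 1
    # The +1 terms keep the estimate from ever being exactly zero.
    return (hits + 1) / (n_trials + 1)


# Assumed numbers from the analogy: 100 events in an hour, with the
# densest 1-second window holding 5 of them.
print(scan_p_value(100, 3600.0, 1.0, 5))
```

Because the simulation slides the window over the whole interval, the resulting probability already includes the freedom to choose where the window sits. The smallest value it can report is limited by `n_trials`, so quoting a very small probability would need many more trials or an analytic approximation.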

At this point I may just delve into the actuality of what we're doing / working with, but I was trying to avoid that as it's active research blah blah mumbo jumbo. If anyone (Milkis?) is genuinely interested in the specifics if they think it'll help with ideas feel free to PM.