|
|
Hmmmm that's a pretty neat trick.
|
Alternatively, you could replace the usernames by the rank of their length (i.e. the numbers 1-5, shortest to longest, assuming no two lengths are equal). Then the only arrangements that will look "diagonal" are 12345 and 54321, so the probability is 2/(5!) = 1/60.
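For anyone who would rather count than trust the shortcut, here is a minimal sketch that walks through all 5! orderings of the ranks and counts the ones that run strictly up or strictly down. The names `DiagonalCount` and `count_diagonal_orderings` are made up for this illustration, and it assumes all five lengths are distinct, same as the argument above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

// Hypothetical helper type for this sketch.
struct DiagonalCount {
    int diagonal;  // orderings that run strictly up or strictly down
    int total;     // all orderings of the five ranks
};

// Enumerates every ordering of the ranks 1..5 and counts the "diagonal" ones.
// Assumption: the five name lengths are all different, so ranks are distinct.
DiagonalCount count_diagonal_orderings() {
    std::array<int, 5> ranks{};
    std::iota(ranks.begin(), ranks.end(), 1);  // 1 2 3 4 5, the sorted start

    DiagonalCount result{0, 0};
    do {
        // With distinct values, "sorted" means strictly increasing, and
        // "sorted when read backwards" means strictly decreasing.
        bool increasing = std::is_sorted(ranks.begin(), ranks.end());
        bool decreasing = std::is_sorted(ranks.rbegin(), ranks.rend());
        if (increasing || decreasing) ++result.diagonal;
        ++result.total;
    } while (std::next_permutation(ranks.begin(), ranks.end()));

    assert(result.total > 0);
    return result;
}
```

Dividing `diagonal` by `total` gives the 2/120 = 1/60 from the shortcut.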
|
Yeah that's the shortcut infinitestory :p
ty for pointing it out
|
re: the whole repeating decimal thing, here's another way to look at it.
How to create any repeating decimal you want: let's say you're pro and you wanna repeat 31337, so 0.313373133731337...
x = 0.3133731337...
100000x = 31337.31337...
Subtract the two: 99999x = 31337
So the fraction that makes the original x is 31337/99999 (which is sadly irreducible; sometimes you get cool fractions though).
Same for the 0.0166...
x = 0.0166...
100x = 1.66...
1000x = 16.66...
Subtract the last two: 900x = 15, so x = 15/900 = 1/60
it's pretty cute
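The same subtraction trick can be written as a small function. This is only a sketch: `Fraction`, `gcd64`, `ten_to_the` and `repeating_to_fraction` are names invented here, and the digit limit is an assumption that keeps everything inside a 64-bit integer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch.
struct Fraction {
    std::int64_t num;
    std::int64_t den;
};

// Euclid's algorithm; both arguments are assumed non-negative.
std::int64_t gcd64(std::int64_t a, std::int64_t b) {
    while (b != 0) {
        std::int64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Returns 10^n for small non-negative n.
std::int64_t ten_to_the(int n) {
    std::int64_t p = 1;
    for (int i = 0; i < n; ++i) p *= 10;
    return p;
}

// Turns 0.<prefix><block><block>... into a reduced fraction.
//   prefix, prefix_len: the digits before the repeating part and how many
//                       there are (leading zeros count toward prefix_len)
//   block, block_len:   the repeating digits and how many there are
Fraction repeating_to_fraction(std::int64_t prefix, int prefix_len,
                               std::int64_t block, int block_len) {
    assert(prefix_len >= 0 && block_len >= 1);
    assert(prefix_len + block_len <= 18);  // assumption: stay inside int64

    std::int64_t nines = ten_to_the(block_len) - 1;  // e.g. 99999
    std::int64_t shift = ten_to_the(prefix_len);     // e.g. 100

    // x = prefix/shift + block/(shift * nines), put over a common denominator.
    std::int64_t num = prefix * nines + block;
    std::int64_t den = shift * nines;
    std::int64_t g = gcd64(num, den);
    return Fraction{num / g, den / g};
}
```

For example, `repeating_to_fraction(0, 0, 31337, 5)` is the 31337 case, and `repeating_to_fraction(1, 2, 6, 1)` is 0.01666..., where the prefix "01" has two digits and the block "6" repeats.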
|
Short thread titles attract the eyes of mods. Those threads are usually instalocked.
|
On August 29 2011 13:53 JeeJee wrote:
re: the whole repeating decimal thing another way to look at it:
how to create any repeating decimal you want: let's say you're pro and you wanna repeat 31337 so 0.313373133731337....
x = 0.3133731337...
100000x = 31337.31337...
subtract the 2: 99999x = 31337
so the fraction that makes the original x is 31337/99999 (which is sadly irreducible. sometimes you get cool fractions though).
same for the 0.0166...
x = 0.0166...
100x = 1.66...
1000x = 16.66...
900x = 15
it's pretty cute

Ah yes, that reminds me of learning 0.99999999... = 1. That blew my mind in 9th grade.
|
If you wanted to be picky, there's a chance that some names are of equal length. That means one of three things:
Either equal-length names are always unacceptable, in which case a diagonal becomes significantly less likely.
Or they are always acceptable, in which case it becomes much MORE likely, because you now have two chances to hit the right name at that point.
Or they are sometimes acceptable (two names of equal length still look diagonal, but five names of equal length don't), in which case you're on your own.
|
On August 29 2011 14:07 Clerseri wrote:
If you wanted to be picky, there's a chance that names are of equal length. Which means one of three things: either equal names are always unacceptable, in which case diagonal becomes significantly less likely. Or they are always acceptable, in which case it becomes much MORE likely, because you now have two chances to hit the right name at that point. Or they are sometimes acceptable (two names of equal length still makes it look diagonal, but 5 names of equal length don't) in which case you're on your own

Yeah. In the cases I can think of, amazingly, none of the names were of equal length!
|
I thought this thread would have used the database of blog posters to get the lengths of the posters' nicks and how often they post, and calculated it from that. This is nice as well though.
|
On August 29 2011 14:17 Yurie wrote:
I thought this thread would have used the database over blog posters to get length of posters nicks and the frequency of them posting to calculate it. This is nice as well though.

This is an interesting question. I wonder how hard it is to pull the TL username length distribution from the database?
If we have the probability curve (actually a histogram), we can use an analysis similar to the one above, multiplied by the probability that no two of the five recent blog names have the same length, to get an even more precise answer.
And if we want to take it one step further, we can dig through the database for the probability curve of TL bloggers only, weighted using the full set of blogs and blog counts.
|
On August 29 2011 14:47 Primadog wrote:
This is an interesting question. I wonder how hard is it to pull TL username length distribution from the database? If we have the probability curve (actually a histogram), we can use an analysis similar to above, multiply by the probability that none of the five recent blog names have the same length, to get an even more precise answer. And if we want to take it one step further, we can dig the database for the probability curve for TL bloggers only, skimmed using the full set of blog and blog counts.

There are 10 pages of bloggers (slightly fewer than 800 bloggers in total); putting the data into a spreadsheet is actually pretty easy. The tricky part is accounting for active vs inactive bloggers :p
|
Oh ya, we can use the blogger ladder to estimate that data set, good thinking.
Here's what I got so far:
Let's suppose Pn = the probability that the next blog post's username has length n, i.e. P1 + P2 + P3 + ... = 1 (anyone know the length limit for TL usernames?)
For an arbitrary set of 5 recent blog posts, let the username lengths be: a, b, c, d, e
Then the probability that none of the usernames in this set have equal length is 1(1-Pa)(1-Pa-Pb)(1-Pa-Pb-Pc)(1-Pa-Pb-Pc-Pd)
Then we apply the magic of combinatorics in this step, but this is where I got stuck.
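One possible way past the stuck step, offered as a sketch rather than as the intended derivation: the product above depends on which lengths a, b, c, d actually come up, so it still has to be averaged over them. Summing over the allowed length tuples directly avoids that. The assumption here (not stated in the post) is that the five names are independent draws from the same distribution Pn; S is a symbol introduced only for this note.

```latex
% Assumption: five independent draws, P_n = probability of username length n.
% Probability that the five lengths come out strictly increasing:
S = \sum_{a < b < c < d < e} P_a P_b P_c P_d P_e

% A diagonal is strictly increasing or strictly decreasing:
P(\text{diagonal}) = 2S

% Every set of five distinct lengths can appear in 5! orders:
P(\text{all five lengths distinct}) = 5!\, S

% Consistency check against the rank shortcut:
P(\text{diagonal} \mid \text{all distinct}) = \frac{2S}{5!\, S} = \frac{1}{60}
```

So the histogram only changes how often all five lengths are distinct; given that they are, the 1/60 still holds.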
|
I had a little spare time at work, so here it goes.
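First, let's parse the poster names from blogs written in August.
Perl:
#!/usr/bin/perl -w
# Tally the username lengths of blog posters from the last 30 days of blogs.
require LWP::UserAgent;
my $net = LWP::UserAgent->new;
$net->agent("Mozilla");
$url = "http://www.teamliquid.net/forum/index.php?viewdays=30&show_part=18&currentpage=";
# Walk listing pages 34 down to 1.
for ($i = 34; $i >= 1; $i--) {
    $_ = $net->get("$url" . "$i");
    $_ = $_->content;
    # Each successful substitution consumes one blog row; $2 is the poster name.
    while (s/href="\/blogs[^>]+>([^<]+)<[^\n]+\n<td [^>]+>([^<]+)<//) {
        $lengths[length($2)]++;
    }
}
# Print the histogram; lengths that never occurred are printed as 0.
for ($i = 1; $i < 20; $i++) {
    print $i . ": " . ($lengths[$i] // 0) . "\n";
}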
The program outputs the following results (username length: number of blog posts):
1: 0
2: 13
3: 28
4: 68
5: 103
6: 188
7: 131
8: 179
9: 115
10: 121
11: 77
12: 64
13: 46
14: 28
15: 25
16: 4
17: 0
18: 0
19: 0
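By using these results, we can count the number of ways of getting a strictly monotonically increasing set of 5 posters. I used a simple dynamic programming algorithm.
C++:
#include <cstdint>
#include <iostream>
using namespace std;

// Blog posts per username length, for lengths 2..16 (from the output above).
int64_t counts[] = {13,28,68,103,188,131,179,115,121,77,64,46,28,25,4};
// ways[pos][num]: number of strictly increasing sequences of pos+1 posts
// whose last post has length index num; -1 means "not computed yet".
int64_t ways[5][15];

int64_t num_of_ways(int pos, int num) {
    if (pos == 0) return counts[num];
    if (ways[pos][num] >= 0) return ways[pos][num];
    int64_t sum = 0;
    // The previous post must have a strictly shorter username.
    for (int i = 0; i < num; i++) sum += num_of_ways(pos - 1, i);
    sum *= counts[num];
    ways[pos][num] = sum;
    return sum;
}

int main() {
    for (int i = 0; i < 5; i++)
        for (int j = 0; j < 15; j++)
            ways[i][j] = -1;
    int64_t ans = 0;
    // Sum over every possible length of the fifth (longest) name.
    for (int i = 0; i < 15; i++) ans += num_of_ways(4, i);
    cout << ans << endl;
}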
The output is 6385086626728.
With sum = 1190 blog posts in total, we can select an ordered set of 5 posts in 2366359177588560 different ways [sum*(sum-1)*...*(sum-4)], but only 2*6385086626728 of them are strictly monotonic (increasing or decreasing).
2*6385086626728/2366359177588560 ≈ 0.005396549
The probability of a diagonal list is then about 0.54%.
I have no idea if this is correct, but I just wasted 20 minutes of my life doing this.
Edit: Fixed an error in the number of all sets. The result appears to be approximately the same as in a Monte Carlo simulation, so this is probably correct. Hurray.
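The Monte Carlo check mentioned in the edit was not posted, so the following is only a guess at what such a simulation could look like, not the poster's actual code. It draws 5 of the 1190 posts without replacement and tests whether the username lengths are strictly monotonic. The function name, the seed and the trial count are all arbitrary choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Blog posts per username length, for lengths 2..16 (from the output above).
const std::vector<int> kCounts = {13, 28, 68, 103, 188, 131, 179, 115,
                                  121, 77, 64, 46, 28, 25, 4};

// Estimates the chance that 5 posts drawn without replacement have strictly
// increasing or strictly decreasing username lengths.
double estimate_diagonal_probability(std::uint64_t trials, std::uint32_t seed) {
    // Expand the histogram: one entry per post, holding its username length.
    std::vector<int> posts;
    for (std::size_t i = 0; i < kCounts.size(); ++i) {
        posts.insert(posts.end(), static_cast<std::size_t>(kCounts[i]),
                     static_cast<int>(i) + 2);
    }
    assert(posts.size() >= 5 && trials > 0);

    std::mt19937 rng(seed);
    std::uint64_t hits = 0;
    for (std::uint64_t t = 0; t < trials; ++t) {
        // Partial Fisher-Yates shuffle: afterwards the first 5 entries are a
        // uniformly random ordered sample of 5 distinct posts.
        for (std::size_t k = 0; k < 5; ++k) {
            std::uniform_int_distribution<std::size_t> pick(k, posts.size() - 1);
            std::swap(posts[k], posts[pick(rng)]);
        }
        bool increasing = true;
        bool decreasing = true;
        for (std::size_t k = 1; k < 5; ++k) {
            if (posts[k] <= posts[k - 1]) increasing = false;
            if (posts[k] >= posts[k - 1]) decreasing = false;
        }
        if (increasing || decreasing) ++hits;
    }
    return static_cast<double>(hits) / static_cast<double>(trials);
}
```

With around a million trials the estimate should land close to the 0.54% computed above, give or take sampling noise.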
|
Programmers ruin the fun and games with probability. The answer is lower than micronesia's probability, so it passes the smell test and feels like it's in the right ballpark.
edit: great work, btw.
editedit: who is the 2 letter name guy? 3 letter name guy? Can't think of a regular blogger with such a short name.
|
User NB made several blog posts in August. The other bloggers with names of length 2-3 are: BG1 ckw cz DNB DnX giX GTR hnQ JFO Jh JWD LML MiB Noe PH qxc Qzy rei Ryo VIB W2 Yew
|
Was about to post what infinitestory said. Cool blog!
|
EDIT: Nevermind, I misinterpreted. I thought you wanted every next username to be longer/shorter (> or <), while you meant same size or longer/shorter (>= or <=).
---
You missed one thing: the probability of two names being the same length.
So the odds are smaller than what you suggested, since you have to take out all the cases in which two names are the same length. However, there is no way to calculate these odds exactly.
|
We also have to realize that more characters do not necessarily make a name look longer on screen.
Example from the above list:
W2 rei
|
On August 29 2011 14:55 micronesia wrote:
There's 10 pages of bloggers (slightly less than 800 total bloggers); putting the data into a spreadsheet is actually pretty easy. The tricky part is accounting for active vs inactive bloggers :p

There are probably way more than 800 bloggers: your blog drops off the list if you haven't posted a new blog in a while. The inactive bloggers are already culled from the list.
|