Math: Odds of Blogs Sidebar Being Diagonal

micronesia

United States 24670 Posts

August 29 2011 04:28 GMT

Every once in a while I notice the blogs section in the sidebar has an interesting look to it: the usernames of the five most recent posters are arranged in ascending or descending length order. I was just wondering what the odds are of that happening so... here goes!

In the first slot we need either the shortest or longest name (let's assume no two names are the same length). This means we have 2 out of 5 odds. The next slot has to be the next longest (or shortest) name, so the odds are 1 in 4. The next slot has odds of 1 in 3, then the next 1 in 2. The last doesn't matter.

So the odds are:

2/5 * 1/4 * 1/3 * 1/2 = 2/120 = 1/60.

Not too difficult, but seems like a reasonable answer given the fact that I notice this phenomenon from time to time, but not often.
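The slot-by-slot argument can be checked with exact fractions. This is only a minimal sketch, assuming n >= 2 names that all have distinct lengths; the function name `diagonal_odds` is made up for illustration.

```python
from fractions import Fraction

def diagonal_odds(n: int) -> Fraction:
    """Chance that n names of distinct lengths appear in ascending or descending order."""
    # First slot: 2 acceptable names (shortest or longest) out of n.
    odds = Fraction(2, n)
    # Each later slot has exactly one acceptable name among those remaining;
    # the last slot (remaining == 1) contributes a factor of 1.
    for remaining in range(n - 1, 0, -1):
        odds *= Fraction(1, remaining)
    return odds

print(diagonal_odds(5))  # 1/60
```

Using `Fraction` instead of floats avoids the 0.01666... display problem entirely, since the result stays in lowest terms.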

Another interesting thing that came up... when I first did this a minute ago I typed it into MS calculator and got an answer of 0.01666...

I recognized that that was probably a fraction but couldn't figure out how to convert it offhand without using a calculator's answer->fraction function. According to someone I just spoke to:

In general, if a decimal eventually repeats with period n, you can multiply by (10^n - 1) to get a terminating one.

So 9*0.01666... = 0.15 = 15/100. So 0.01666... = 15/900 = 1/60.
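Here is a small sketch of that rule using Python's `fractions` module. The helper `terminates` is a made-up name, not a library function; it relies on the fact that a fraction in lowest terms has a terminating decimal exactly when its denominator has no prime factors other than 2 and 5.

```python
from fractions import Fraction

def terminates(x: Fraction) -> bool:
    """True if x has a terminating decimal expansion."""
    d = x.denominator  # Fraction is always stored in lowest terms
    for p in (2, 5):
        while d % p == 0:
            d //= p
    return d == 1

x = Fraction(1, 60)           # 0.01666..., repeating with period n = 1
scaled = (10**1 - 1) * x      # multiply by 10^n - 1 = 9
print(terminates(x), terminates(scaled), scaled)  # False True 3/20
print(scaled / 9)             # 1/60, since 3/20 = 0.15 = 15/100
```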

An interesting trick/rule I had no idea about.

Ingenol

United States 1328 Posts

August 29 2011 04:32 GMT

Hmmmm that's a pretty neat trick.

infinitestory

United States 4053 Posts

August 29 2011 04:32 GMT

Alternatively, you could replace the usernames by their ordinal length (i.e. numbers 1-5, based on length). Then, the only arrangements that will look "diagonal" are 12345 and 54321, so 2/(5!) = 1/60.
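The counting argument is small enough to brute-force. A rough sketch, assuming the five names are replaced by ranks 1-5 as described; `is_diagonal` is just an illustrative helper name.

```python
from fractions import Fraction
from itertools import permutations

def is_diagonal(ranks) -> bool:
    """True if the sequence is strictly ascending or strictly descending."""
    pairs = list(zip(ranks, ranks[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

orders = list(permutations(range(1, 6)))   # all 5! = 120 orderings of ranks 1-5
hits = [p for p in orders if is_diagonal(p)]
print(hits)                                # [(1, 2, 3, 4, 5), (5, 4, 3, 2, 1)]
print(Fraction(len(hits), len(orders)))    # 1/60
```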

micronesia

United States 24670 Posts

August 29 2011 04:34 GMT

Yeah that's the shortcut infinitestory :p

ty for pointing it out

JeeJee

Canada 5652 Posts

August 29 2011 04:53 GMT

re: the whole repeating decimal thing
another way to look at it:

how to create any repeating decimal you want:
let's say you're pro and you wanna repeat 31337
so 0.313373133731337....

x=0.3133731337...
100000x = 31337.31337....
subtract the 2
99999x = 31337

so the fraction that makes the original x is 31337/99999 (which is sadly irreducible. sometimes you get cool fractions though).

same for the 0.0166...

x=0.0166..
100x=1.66...
1000x=16.66....
900x = 15

it's pretty cute
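The same subtraction works for any decimal of the form 0.(prefix)(block)(block)..., so it can be wrapped in a tiny function. This is a sketch only; `repeating_to_fraction` and its string arguments are an invented interface, not something from a library.

```python
from fractions import Fraction

def repeating_to_fraction(prefix: str, block: str) -> Fraction:
    """Convert 0.<prefix><block><block>... to an exact fraction."""
    k, n = len(prefix), len(block)
    # 10^(k+n) * x - 10^k * x leaves only the integer parts, mirroring
    # the "1000x - 100x = 900x = 15" subtraction above.
    shifted_long = int(prefix + block)            # integer part of 10^(k+n) * x
    shifted_short = int(prefix) if prefix else 0  # integer part of 10^k * x
    return Fraction(shifted_long - shifted_short, 10**(k + n) - 10**k)

print(repeating_to_fraction("", "31337"))  # 31337/99999
print(repeating_to_fraction("01", "6"))    # 1/60
```

`Fraction` reduces automatically, which is why 15/900 comes out as 1/60 while 31337/99999 stays as it is.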

nitdkim

1264 Posts

August 29 2011 04:54 GMT

Short thread titles attract the eyes of mods. Those threads are usually instalocked.

Ingenol

United States 1328 Posts

August 29 2011 04:59 GMT

On August 29 2011 13:53 JeeJee wrote:
re: the whole repeating decimal thing
another way to look at it:

how to create any repeating decimal you want:
let's say you're pro and you wanna repeat 31337
so 0.313373133731337....

x=0.3133731337...
100000x = 31337.31337....
subtract the 2
99999x = 31337

so the fraction that makes the original x is 31337/99999 (which is sadly irreducible. sometimes you get cool fractions though).

same for the 0.0166...

x=0.0166..
100x=1.66...
1000x=16.66....
900x = 15

it's pretty cute

Ah yes, that reminds me of learning 0.99999999...=1. That blew my mind in 9th grade.

Clerseri

Australia 150 Posts

August 29 2011 05:07 GMT

If you wanted to be picky, there's a chance that names are of equal length. Which means one of three things -

either equal names are always unacceptable, in which case diagonal becomes significantly less likely.

Or they are always acceptable, in which case it becomes much MORE likely, because you now have two chances to hit the right name at that point.

Or they are sometimes acceptable (two names of equal length still makes it look diagonal, but 5 names of equal length don't) in which case you're on your own

micronesia

United States 24670 Posts

August 29 2011 05:13 GMT

On August 29 2011 14:07 Clerseri wrote:
If you wanted to be picky, there's a chance that names are of equal length. Which means one of three things -

either equal names are always unacceptable, in which case diagonal becomes significantly less likely.

Or they are always acceptable, in which case it becomes much MORE likely, because you now have two chances to hit the right name at that point.

Or they are sometimes acceptable (two names of equal length still makes it look diagonal, but 5 names of equal length don't) in which case you're on your own

Yeah. In the cases I can think of, I think there were none of equal length, amazingly!

Yurie

11814 Posts

August 29 2011 05:17 GMT

#10

I thought this thread would have used the database of blog posters to get the lengths of posters' nicks and how often they post to calculate it. This is nice as well though.

Primadog

United States 4411 Posts

August 29 2011 05:47 GMT

#11

On August 29 2011 14:17 Yurie wrote:
I thought this thread would have used the database of blog posters to get the lengths of posters' nicks and how often they post to calculate it. This is nice as well though.

This is an interesting question. I wonder how hard it is to pull the TL username length distribution from the database?

If we have the probability curve (actually a histogram), we can use an analysis similar to above, multiply by the probability that none of the five recent blog names have the same length, to get an even more precise answer.

And if we want to take it one step further, we can dig the database for the probability curve for TL bloggers only, skimmed using the full set of blog and blog counts.

micronesia

United States 24670 Posts

August 29 2011 05:55 GMT

#12

On August 29 2011 14:47 Primadog wrote:

Show nested quote +

There's 10 pages of bloggers (slightly less than 800 total bloggers); putting the data into a spreadsheet is actually pretty easy. The tricky part is accounting for active vs inactive bloggers :p

Primadog

United States 4411 Posts

August 29 2011 06:00 GMT

#13

Oh ya, we can use the blogger ladder to estimate that data set, good thinking.

Here's what I got so far:

Let's suppose
Pn = the probability the next blogger post's username is length n
ie P1 + P2 + P3.... = 1 (anyone know what's the length limit for TL usernames?)

For an arbitrary set of 5 recent blog posts, let the username lengths be:
a, b, c, d, e

then the probability that none of the usernames have equal length for this set is
1(1-Pa)(1-Pa-Pb)(1-Pa-Pb-Pc)(1-Pa-Pb-Pc-Pd)

then we apply the magic of combinatorics in this step, but this is where I got stuck.
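One possible way past the stuck step, offered as a sketch rather than a finished answer: if we assume each of the five posts draws its username length independently from the distribution Pn (an assumption the post above does not state), then the chance of a strictly increasing run is the sum, over every 5-element subset of lengths, of the product of their probabilities. All five lengths being different is 5! times that sum, and a diagonal (ascending or descending) is 2 times it. The distribution `P` below is invented purely for illustration, and `subset_product_sum` is a hypothetical helper name.

```python
from fractions import Fraction
from math import factorial

def subset_product_sum(probs, k):
    """Sum, over all k-element subsets of probs, of the product of the subset."""
    # e[j] holds that sum for subsets of size j among the values seen so far.
    e = [Fraction(1)] + [Fraction(0)] * k
    for p in probs:
        for j in range(k, 0, -1):   # go downward so each p is used at most once
            e[j] += e[j - 1] * p
    return e[k]

# Hypothetical distribution: six possible lengths, all equally likely.
P = [Fraction(1, 6)] * 6
increasing = subset_product_sum(P, 5)
print(factorial(5) * increasing)  # P(all five lengths differ) = 5/54
print(2 * increasing)             # P(strictly ascending or descending) = 1/648
```

With a real histogram of username lengths, `P` would be replaced by the observed frequencies.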

slmw

Finland 233 Posts

August 29 2011 08:37 GMT

#14

I had a little spare time at work, so here it goes.

First, let's parse the poster names from blogs written in August:
Perl:
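#!/usr/bin/perl -w
# Tally the username lengths of blog posters from the last 30 days of blog listings.
require LWP::UserAgent;
my $net = LWP::UserAgent->new;
$net->agent("Mozilla");
$url="http://www.teamliquid.net/forum/index.php?viewdays=30&show_part=18&currentpage=";
# Fetch listing pages 34 down to 1.
for($i=34;$i>=1;$i--) {
    $_=$net->get("$url"."$i");
    $_=$_->content;
    # Strip one blog row per iteration; $2 is taken to be the poster's name.
    while(s/href="\/blogs[^>]+>([^<]+)<[^\n]+\n<td [^>]+>([^<]+)<//) {
        $lengths[length($2)]++;
    }
}
# Print the count for each length, showing 0 for lengths that never occurred.
for($i=1;$i<20;$i++) {
    print $i.": ".($lengths[$i] // 0)."\n";
}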

The program outputs the following results:
1: 0
2: 13
3: 28
4: 68
5: 103
6: 188
7: 131
8: 179
9: 115
10: 121
11: 77
12: 64
13: 46
14: 28
15: 25
16: 4
17: 0
18: 0
19: 0

By using these results, we can count the number of ways of getting a strictly monotonically increasing set of 5 posters. I used a simple dynamic programming algorithm.
C++:
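#include <iostream>
#include <cstdint>
using namespace std;

// counts[k] = number of posts whose poster name has length k+2 (lengths 2..16).
int64_t counts[] = {13,28,68,103,188,131,179,115,121,77,64,46,28,25,4};
// ways[pos][num] memoizes num_of_ways(pos, num); -1 means "not computed yet".
int64_t ways[5][15];

// Number of ways to fill slots 0..pos with strictly increasing lengths,
// where slot pos holds a post from length bucket num.
int64_t num_of_ways(int pos, int num) {
    if(pos==0) return counts[num];
    if(ways[pos][num]>=0) return ways[pos][num];
    int64_t sum=0;
    for(int i=0;i<num;i++) sum+=num_of_ways(pos-1,i);
    sum*=counts[num];
    ways[pos][num] = sum;
    return sum;
}

int main() {
    for(int i=0;i<5;i++) for(int j=0;j<15;j++) {
        ways[i][j]=-1;
    }
    // Sum over every possible bucket for the last (fifth) slot.
    int64_t ans=0;
    for(int i=0;i<15;i++) {
        ans+=num_of_ways(4,i);
    }
    cout<<ans<<endl;
}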

The output is 6385086626728.

We can select an ordered list of 5 posts in 2366359177588560 different ways [sum*(sum-1)*..(sum-4), where sum = 1190 is the total number of posts counted above], but only 2*6385086626728 of them are strictly monotonic.

2*6385086626728/2366359177588560 = 0.00539654899999984538

The probability of a diagonal list is then 0.54%.
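For anyone who wants to double-check the numbers without the dynamic programming, a brute-force version is short. This is just a sketch of an independent check, not the method used above: every strictly increasing run of 5 lengths corresponds to a 5-element subset of the 15 length buckets, and there are only 3003 such subsets.

```python
from itertools import combinations
from math import prod

# Posts per username length 2..16, copied from the program output above.
counts = [13, 28, 68, 103, 188, 131, 179, 115, 121, 77, 64, 46, 28, 25, 4]

# For each choice of 5 buckets (in increasing length order), multiply the
# bucket sizes to count the ways to pick one post from each.
increasing = sum(prod(c) for c in combinations(counts, 5))

total = sum(counts)                           # 1190 posts
ordered = prod(range(total - 4, total + 1))   # total*(total-1)*...*(total-4)
print(increasing, ordered)
print(2 * increasing / ordered)
```

If the figures above are right, the first line should reproduce the DP output and the ordered-selection count, and the last line should come out near 0.0054.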

I have no idea if this is correct, but I just wasted 20 minutes of my life doing this.

Edit: Fixed an error in the number of all sets. The result appears to be approximately the same as in a Monte Carlo simulation, so this is probably correct. Hurray.
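The simulation itself isn't shown, so here is one way such a Monte Carlo check might look. It is a sketch under stated assumptions, not the code actually used: it treats the five sidebar entries as five distinct posts drawn at random from the August counts, and the names `posts`, `is_diagonal` and `estimate` are invented for the example.

```python
import random

counts = [13, 28, 68, 103, 188, 131, 179, 115, 121, 77, 64, 46, 28, 25, 4]
# One entry per post: the length of its poster's name (lengths 2..16).
posts = [length for length, c in enumerate(counts, start=2) for _ in range(c)]

def is_diagonal(lengths) -> bool:
    """True if the lengths are strictly ascending or strictly descending."""
    pairs = list(zip(lengths, lengths[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

def estimate(trials: int, seed: int = 0) -> float:
    """Fraction of random 5-post sidebars that come out diagonal."""
    rng = random.Random(seed)
    # rng.sample draws 5 distinct posts in random order.
    hits = sum(is_diagonal(rng.sample(posts, 5)) for _ in range(trials))
    return hits / trials

print(estimate(100_000))
```

With 100,000 trials the estimate should land within a few hundredths of a percent of the exact figure; more trials tighten it further.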

Primadog

United States 4411 Posts

August 29 2011 09:27 GMT

#15

Programmers ruin the fun and games with probability

The answer is lower than the probability by micronesia, so it passes the smell test and feels in the ball park.

edit: great work, btw.

editedit: who is the 2 letter name guy? 3 letter name guy? Can't think of a regular blogger with such a short name.

slmw

Finland 233 Posts

August 29 2011 09:49 GMT

#16

User NB made several blog posts in August.
The other bloggers of length 2-3 are:
BG1
ckw
cz
DNB
DnX
giX
GTR
hnQ
JFO
Jh
JWD
LML
MiB
Noe
PH
qxc
Qzy
rei
Ryo
VIB
W2
Yew

See.Blue

United States 2673 Posts

August 29 2011 15:07 GMT

#17

Was about to post what infinitestory said. Cool blog!

Khenra

Netherlands 885 Posts

August 29 2011 15:47 GMT

#18

EDIT: Nevermind, I misinterpreted. I thought you wanted every next username to be longer/shorter (> or <), while you meant same size or longer/shorter (>= or <=).

---

You missed one thing: the probability of two names being the same length.

So the odds are smaller than what you suggested, since you have to take out all the cases in which two names are the same length. However, there is no way to calculate these odds exactly.

XXGeneration

United States 625 Posts

August 29 2011 15:49 GMT

#19

We also have to realize that more characters does not necessarily represent a longer name.

Example from the above list:

W2
rei

Zona

40426 Posts

September 01 2011 04:08 GMT

#20

On August 29 2011 14:55 micronesia wrote:

Show nested quote +

There's 10 pages of bloggers (slightly less than 800 total bloggers); putting the data into a spreadsheet is actually pretty easy. The tricky part is accounting for active vs inactive bloggers :p

There's probably way more than 800 bloggers - your blog drops off the list if you haven't posted a new blog in a while. The inactive bloggers are already culled from the list.
