Ask and answer stupid questions here! - Page 136

The_Templar

your Country52797 Posts

August 28 2014 15:18 GMT

#2701

There was an argument in my information and coding class today about two binary strings, where I was the only person who thought my point was valid at all.
1010101001 0001110101 0110100110 1001010100 1001001101
1000111010 0111101101 1110111111 1011001111 1100010110

Which of these is randomly generated, and which of these was created by a human?

Acrofales

Spain18132 Posts

August 28 2014 15:54 GMT

#2702

On August 29 2014 00:18 The_Templar wrote:
There was an argument in my information and coding class today about two binary strings, where I was the only person who thought my point was valid at all.
1010101001 0001110101 0110100110 1001010100 1001001101
1000111010 0111101101 1110111111 1011001111 1100010110

Which of these is randomly generated, and which of these was created by a human?

Honestly, there's not really enough info to go on... what is the human trying to do when creating this? Generate a random sequence? Or give it some kind of meaning? If he's trying to create a random sequence, I'll go with him writing the first sequence, because it has fewer long runs, and humans tend to think long runs of the same character are atypical of random strings. However, it's pretty tenuous.

EDIT: I say it's tenuous because these strings actually represent something else, and if we were to generate the objects they represent (for instance, integers between 0 and 1023) and then convert them to string form, this interpretation of human bias is invalidated. Another argument, this time for the second one maybe being the human's, is that the last two groups of the first sequence are quite similar (both start with 10010). A "human random generator" doesn't like that kind of pattern either.

ComaDose

Canada10357 Posts

August 28 2014 16:04 GMT

#2703

wouldn't there be about the same chance that the computer generated either of those strings

Acrofales

Spain18132 Posts

August 28 2014 16:05 GMT

#2704

On August 29 2014 01:04 ComaDose wrote:
wouldn't there be about the same chance that the computer generated either of those strings

Yes. But I think the question is not so much about the computer, but about human bias.

Najda

United States3765 Posts

August 28 2014 16:49 GMT

#2705

First, third, fourth and fifth look most human generated, since the longest run is only 2 digits. Radiolab has a really interesting podcast about randomness that I'd recommend for your entertainment; I can find the link later when I'm on my computer.

Zess

Adun Toridas!9144 Posts

August 28 2014 17:30 GMT

#2706

The probability of not having a run of 3 or more in a set of 10 Bernoulli trials is actually quite low, so the ones with just runs of 2 are more likely to be human generated.
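
That probability can be checked by brute force. A minimal sketch in Python, assuming a fair coin (each bit is 0 or 1 with probability 1/2, independently) and simply enumerating all 2^10 strings; `longest_run` is a helper name introduced here for illustration:

```python
from itertools import groupby, product


def longest_run(bits):
    """Length of the longest run of identical consecutive symbols."""
    return max(len(list(group)) for _, group in groupby(bits))


# Enumerate every length-10 binary string and count those whose
# longest run is at most 2 (i.e. no run of 3 or more).
n = 10
total = 2 ** n
no_long_run = sum(1 for bits in product("01", repeat=n) if longest_run(bits) <= 2)

print(no_long_run, total, no_long_run / total)
```

Under these assumptions, 178 of the 1024 possible strings (about 17%) avoid a run of 3 or more.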

The_Templar

your Country52797 Posts

August 28 2014 17:39 GMT

#2707

There are only two strings, I just happened to divide them into groups of ten

Najda

United States3765 Posts

August 28 2014 17:55 GMT

#2708

On August 29 2014 02:39 The_Templar wrote:
There are only two strings, I just happened to divide them into groups of ten

Oh I see that now, my phone broke the format and it's much more obvious on the computer. I'll just say the first string then.

LSB

United States5171 Posts

August 28 2014 18:52 GMT

#2709

For a serious answer.

Assumption #1: One of the strings is Human Generated, One of the Strings is Computer Generated
Assumption #2: The computer picks 0 and 1 at true random.

String 1 has 24 ones; this seems to be the one most likely to be generated by a random number generator.
String 2 has 33 ones.

The chance of observing 33 or more successes in 50 trials is 1.64%; double this if you want to include the chance of 17 or fewer, for 3.28%, which is less than the 5% value typically used for "statistical significance".

Thus it is far more likely the first is randomly generated.

My statistics is rusty so correct me if I'm wrong plox.
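
The figures in this post can be checked with a short script. This is a sketch under the post's own assumptions (50 independent fair bits); the variable names are made up for illustration:

```python
from math import comb

# The two strings from the opening post, with the spaces removed.
string_1 = "1010101001 0001110101 0110100110 1001010100 1001001101".replace(" ", "")
string_2 = "1000111010 0111101101 1110111111 1011001111 1100010110".replace(" ", "")

ones_1 = string_1.count("1")
ones_2 = string_2.count("1")

n = len(string_2)  # number of trials

# Exact upper tail of Binomial(n, 1/2): P(X >= ones_2).
upper_tail = sum(comb(n, k) for k in range(ones_2, n + 1)) / 2 ** n

# The distribution is symmetric for p = 1/2, so the two-sided
# probability (X >= ones_2 or X <= n - ones_2) is twice the upper tail.
two_sided = 2 * upper_tail

print(ones_1, ones_2)
print(f"{upper_tail:.4f} {two_sided:.4f}")
```

The counts come out as 24 and 33, and the tail probabilities as roughly 1.64% and 3.28%, matching the numbers quoted above.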

Acrofales

Spain18132 Posts

August 28 2014 19:20 GMT

#2710

On August 29 2014 03:52 LSB wrote:

Eh, I kinda disagree. While you seem to be right on the math (I calculated part of the tails manually rather than plugging it into R, and got bored after 36/14, but it seems to be heading for the percentages you give), you're dismissing the fact that it's not just one string being drawn up by a computer, but also the other one being drawn up by a human, who we are assuming is doing his best to generate a "random" sequence. Maybe a "bias towards 1s" is a human bias (it might be, for all I know), but I think the human would generate fewer than 3 in 100 sequences with such a lopsided count: if asked to draw a random distribution of 50 1s and 0s, I for one would take good care to never stray too far from 25 of each.

LSB

United States5171 Posts

August 28 2014 19:34 GMT

#2711

On August 29 2014 04:20 Acrofales wrote:

I considered that approach; however, you are adding even more assumptions.

Theoretically we can assume that the collection of human biases is normally distributed around some number. However, we have no idea what that number is (it might not even be 50%), and if we do make an assumption of 50% we would be sampling an assumption, which would introduce a boatload of unmeasurable error.

ComaDose

Canada10357 Posts

August 28 2014 19:38 GMT

#2712

how much someone knows about statistics and random number generation would also affect how well they made a random string of numbers, so it would vary greatly from person to person.

can you tell us what your point was and what the answer is, if there is one? my answer is that it could be either; we don't know.

GettingIt

1656 Posts

August 28 2014 19:58 GMT

#2713

Why are you guys so smart?

The_Templar

your Country52797 Posts

August 28 2014 20:13 GMT

#2714

On August 29 2014 04:38 ComaDose wrote:
how much someone knows about statistics and random number generation would also affect how well they made a random string of numbers, so it would vary greatly from person to person.

can you tell us what your point was and what the answer is, if there is one? my answer is that it could be either; we don't know.

The point I made is that, in isolation, both are far more likely to be human generated, and there was therefore no way to actually tell. Nobody agreed with me, and everyone found it obvious that the second one was computer generated and not the first. Of course this was correct.

LSB

United States5171 Posts

August 28 2014 20:25 GMT

#2715

On August 29 2014 05:13 The_Templar wrote:

Welcome to peer pressure and confirmation bias.

Najda

United States3765 Posts

August 28 2014 20:28 GMT

#2716

On August 29 2014 05:13 The_Templar wrote:

I'll agree with that now that I see LSB's statistical analysis

Acrofales

Spain18132 Posts

August 28 2014 20:31 GMT

#2717

On August 29 2014 05:13 The_Templar wrote:

I don't think you phrased that properly, because I don't really see why either of the strings is "far more likely" to be generated by a human than by a computer. I do agree that the underlying assumptions for stating the second one is computer-generated are tenuous... and a better argument is that in isolation it is not easy to state which is which. As LSB's math above shows, a computer will only generate a similarly lopsided string in 3% of the cases, so it's not exactly a "typical" outcome for a random string generator either.

@LSB: you have to make some assumptions. Otherwise all you're saying is that a string similar to the bottom one is less likely to be generated by a computer than the top one, which throws away the information that the other one was generated by a human... and it's not the case that we know absolutely nothing about humans and should therefore simply assign to them the one that is less likely to be generated by a computer.

LSB

United States5171 Posts

August 28 2014 20:35 GMT

#2718

On August 29 2014 05:31 Acrofales wrote:

Just because you have data doesn't mean you have to, or should, incorporate it in a model. In fact, in this case incorporating the data would induce a huge amount of error, rather than simplify it.

EDIT: Technically speaking it is impossible to incorporate it into the model unless you want to throw out statistics.
There are a variety of reasons, the chief being that you can't use two variables to describe two data points.

Acrofales

Spain18132 Posts

August 28 2014 20:42 GMT

#2719

On August 29 2014 05:35 LSB wrote:

Just because you have data doesn't mean you have to, or should, incorporate it in a model. In fact, in this case incorporating the data would induce a huge amount of error, rather than simplify it.

I disagree, as long as you do it in a principled manner. I think I could make a fairly simple Bayesian classifier that does better than random at predicting human strings, looking at "longest run of identical consecutive digits" as one of the features. Perhaps "deviation from the expected number of 1s" is another one, although I have no evidence to back the second one up.
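
The two features named here are easy to compute. The sketch below, in Python, covers only the "computer" half of such a classifier: it extracts both features from the two strings and gives the exact distribution of the longest run for fair, independent bits. The human-side likelihood would have to be estimated from real data on human-written strings, which this thread does not provide, so it is left out. The function names are made up for illustration.

```python
from itertools import groupby


def longest_run(bits):
    """Length of the longest run of identical consecutive symbols."""
    return max(len(list(group)) for _, group in groupby(bits))


def prob_longest_run_at_most(n, m):
    """Exact P(longest run <= m) for n fair, independent bits."""
    # counts[j] = number of prefixes whose final run has length j + 1
    # and in which no run exceeds m. A one-bit prefix: 2 strings, run 1.
    counts = [2] + [0] * (m - 1)
    for _ in range(n - 1):
        switched = sum(counts)             # next bit differs: run restarts at 1
        counts = [switched] + counts[:-1]  # next bit repeats: run grows by 1
    return sum(counts) / 2 ** n


string_1 = "1010101001 0001110101 0110100110 1001010100 1001001101".replace(" ", "")
string_2 = "1000111010 0111101101 1110111111 1011001111 1100010110".replace(" ", "")

for name, s in (("string 1", string_1), ("string 2", string_2)):
    run = longest_run(s)
    deviation = s.count("1") - len(s) / 2  # distance from the expected 25 ones
    # Probability that a fair coin gives a longest run this short or shorter.
    p_short = prob_longest_run_at_most(len(s), run)
    print(name, run, deviation, round(p_short, 4))
```

The first string's longest run is 3 and the second's is 7. A classifier in the sense described would compare the fair-coin probabilities printed here against the corresponding probabilities for human-written strings.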

LSB

United States5171 Posts

August 28 2014 20:54 GMT

#2720

On August 29 2014 05:42 Acrofales wrote:

Show nested quote +

This is the fatal trap that I am pointing out you are falling into.

You have three assumptions
1) Computer behaves a certain way
2) A typical human behaves a certain way
3) The specific human who picked the number sequence behaves like a typical human

I make one. See the difference?
