I'm not necessarily going to be right, but I am 90% confident that I am right with that range.
On A) you know what the odds are beforehand. The die has 6 sides, so betting on one is 1 in 6. That is an objective assumption. You calculated that, so you're safe. You can picture exactly in your head what 16.6% is. But on B), when you say "90% confident that I am right", where does that number come from? You cannot calculate what "90% confidence" is. You don't have the variables for that. So you guessed a range that gives you a warm and fuzzy feeling inside and sounds 90%-ish.

I get what you are saying here, but it doesn't really make sense. 90% confidence means I am willing to accept a 9:1 bet on it. The whole point of the exercise is to show that people who say they are willing to accept 90% bets should, going by their actual accuracy, only be accepting 70% or 50% bets. Now, you can say that individuals change on a day-to-day basis, so one day they will be perfectly right while the next day they will get everything wrong, and that might be true. But this test is measured over thousands of people, and randomization will remove individual differences. So in your highly exaggerated example, the guy with the perfect understanding won't get exactly 8376 answers right (that's what standard deviation is about), but if you had thousands of people like him, their mean should be fairly close to 8376. Furthermore, the point of this test is not to show that some people have a 70% confidence interval while others have a 40% confidence interval. It is to show that people who should get 8376 questions right tend to get 3000 right, and people who should get 9 questions right tend to get 3-7 right.
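To make that averaging point concrete, here is a rough simulation sketch. The question count (9307, so that 90% of it is roughly 8376) and the sample size are assumptions for illustration, not numbers from the actual test:

```python
import numpy as np

rng = np.random.default_rng(0)
n_questions = 9307      # assumed question count: 90% of it is roughly 8376
true_accuracy = 0.9     # this person really is right 90% of the time
n_people = 5000         # assumed sample size

# Each person's score is one binomial draw: how many of the questions a
# genuinely 90%-accurate person happens to get right.
scores = rng.binomial(n_questions, true_accuracy, size=n_people)

print("expected score:        ", round(n_questions * true_accuracy))  # ~8376
print("mean over all people:  ", round(scores.mean()))                # also ~8376
print("individual spread (SD):", round(scores.std(), 1))              # ~29 answers
```

Any single person can easily land a few dozen answers above or below 8376, but the mean over thousands of them barely moves.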
Of course this is a big exaggeration. But it shows you the two points where I disagree with the test's method. There are two axioms that you build at the start of the test that I disagree with:
1) The test subject can calculate off the top of his head what 90% of an unknown value is.
2) The test subject can provide a reliable confidence range off the top of his head to an unknown answer.
I think you can discard the unknown part of both those axioms, as it is irrelevant. Firstly, none of those values are presented to you in a void. You have some knowledge about all of them, so you have a starting point. How old was MLK when he died? Most people die between the ages of 0 and 110, so you know where to start. From there you can estimate how old he was when he was still alive... at least 25. And from there you can decide on the upper limit you would be willing to take, so that you would accept a 9:1 bet on the range.

Secondly, and more importantly, not having knowledge shouldn't matter, because the point of the exercise is to choose a limit at which you are confident that you will be right. So even if you have no knowledge of the answer, you should still be able to choose a limit at which you are confident that you are right. For example, I have no idea what the shortest distance from Earth to the next galaxy is; in fact, I don't even have a starting point. Yet I can still say that I am 90% confident that the next galaxy is between 10 light years and 10,000,000,000 light years away. I didn't just type a big number out either; I feel, based on no information whatsoever, that the next galaxy should not be more than ten billion light years away. Even if I get asked 10 questions like that, of which I have no knowledge, I should be able to provide intervals with which I feel confident.
The problem with questions like that, obviously, is that people might vastly underestimate or overestimate the phenomenon if they have no basis of knowledge, which is why they give us questions we know a little bit about. We know how old most people get, and we have seen photos of MLK, so we have a basis for our estimates. With the elephant question, we know how long human pregnancies last, so we use that as our basis. It is unlikely that anyone will say 300 years, or 2 days, for that question, because they have a basic estimate. So people aren't likely to get the ballpark completely wrong. I therefore think you can leave the unknown quantity out of your axioms: firstly, it should not be relevant, and secondly, the questions are designed so that people are not without any knowledge.
That leaves us with two axioms:
1) The test subject can calculate what 90% is.
2) The test subject can provide a reliable confidence range.
I think we can both agree that everyone knows what 90% is theoretically (9 times out of 10). If they don't know what it is practically, then that's what the test is trying to show. This is not a general test of overconfidence; it's a test of decision-making overconfidence, and if people theoretically know what 90% confidence is but can't apply it practically, then it shows that people assign 90% confidence to decisions in real life that should not get it - exactly what this test is trying to prove.
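For the sake of illustration, here's a minimal sketch of how such a quiz gets scored; the "true" values and the ranges below are made up, not taken from the actual test:

```python
# Hypothetical scoring of a confidence-interval quiz. Someone who really means
# "90% confident" should land inside their own ranges about 9 times in 10;
# a hit rate near 50% is exactly the overconfidence the test is looking for.
answers = [
    # (true value, stated lower bound, stated upper bound) -- all invented
    (39,   35,   45),
    (22,   18,   30),
    (6650, 5000, 6000),
    (1969, 1960, 1975),
    (8850, 7000, 8000),
]

hits = sum(lower <= truth <= upper for truth, lower, upper in answers)
print(f"claimed confidence: 90%, actual hit rate: {hits / len(answers):.0%}")
```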
The second axiom has two elements to it: that participants know what a confidence range is, and that they can provide a reliable confidence range. I think the confidence range is explained reasonably well (for those who can and do read) in the test itself, and in practical applications of this test it would likely be explained again. So, can people provide a reliable answer? You seem to imply that they can't, that people pick figures out of the blue. Here's an example: the computer you are using now is probably not brand new, so how much would I need to pay you for it? $2,000? How about tomorrow, when nothing has changed except the memories in your head and your breakfast. $1,500 now? Then the next day you have a bad day, so $4,000, right? People don't work like that. Yes, values might change: on a good day you might ask me $1,900 for it, and on a bad day you might ask me $2,100. But that doesn't mean you do not make a logical, systematic decision.
Furthermore, the test is averaged over lots of people, as said earlier, so individual differences shouldn't have an impact. The only way in which individual differences can be a threat to validity is if they change systematically for the participants; for example, if this test was done at a school the day after prom, the elation of the night before might make people more optimistic than normal, leading to more overconfidence. However, if this test is done in a normal situation, then the ratio of positive vs. negative people should be the same as usual.
Yes, reliability won't be perfect, and yes, you're not unequivocally proving that people tend to be overconfident in their decisions, and some people will get different results based on situational factors. That's part of all research in the social sciences. That's why we don't work with causal factors; we work with correlations.
In this case, the theory passed the test. If the theory is right, then a very low number of people should get 9/10 answers correct, and according to the OP, only 1% did. But that doesn't mean the test is correct. I'm pretty sure that if you had asked the test subjects to roll a die of a random size 10 times instead of asking those 10 questions, the results would be very similar. Does that mean rolling a die is effectively measuring how confident one can be? I think not.
You have to actually substantiate what you think is confounding the variables. The test says "choose a range with which you are 90% confident" and then finds that most people get the answer right only 50% of the time. It specifically asks them to state their confidence level, and then it shows that their confidence is unfounded. I do not see how that is comparable to the results of rolling a die.
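To spell out why that unfounded confidence matters, here is a small sketch of the 9:1 bet mentioned earlier; the stake sizes (risk 9 to win 1) are an illustrative assumption, not something specified by the test:

```python
# Accepting 9:1 odds is only fair if you really are right 90% of the time.
def expected_value(true_accuracy, stake=9, payout=1):
    """Average result of one bet: win `payout` when right, lose `stake` when wrong."""
    return true_accuracy * payout - (1 - true_accuracy) * stake

print(expected_value(0.9))  #  0.0 -> break even, which is what a 90% claim implies
print(expected_value(0.5))  # -4.0 -> the typical test result: you lose badly
```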
Actually, I'm pretty sure that if you repeated the same test with the same person, but with completely different questions, on completely different days and at different times, the results from the first and the second test would vary more often than not. That would objectively disprove the test, I think.

All that would prove is that the test has high variance or low reliability ("The reliability of a measurement procedure is the stability or consistency of a measurement. If the same individuals are measured under the same conditions, a reliable measurement procedure will produce identical (or nearly identical) measurements."). This would be relevant if confidence were relatively fixed, like IQ. You can't have an IQ test that says a person has an IQ of 150 on day 1 and 80 on day 2. However, do people's confidence levels change depending on the day? If yes, then reliability isn't important to the test, and scores are expected to vary. As mentioned earlier, the only risk then would be a systematic variance in confidence levels.
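As a rough sketch of that point (all the numbers here are assumptions, not data from the test): even if each person's hit rate wobbles from day to day, so that test-retest reliability is low, the population average still comes out far below the claimed 90% on both days, unless the wobble is systematic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_questions = 2000, 10
base_hit_rate = 0.55   # assumed true calibration, well below the claimed 0.9

def run_one_day():
    # every person's hit probability wobbles day to day around the same base level
    daily_prob = np.clip(base_hit_rate + rng.normal(0, 0.1, n_people), 0, 1)
    return rng.binomial(n_questions, daily_prob)

day1, day2 = run_one_day(), run_one_day()
print("test-retest correlation:", round(float(np.corrcoef(day1, day2)[0, 1]), 2))  # near 0
print("mean hit rate, day 1:   ", round(day1.mean() / n_questions, 2))             # ~0.55
print("mean hit rate, day 2:   ", round(day2.mean() / n_questions, 2))             # ~0.55
```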
Now about the authors' credibility. I'm gonna say something that is not completely relevant to what we're talking about, but it's so funny that I'm gonna post it anyway:
On August 03 2009 18:10 Daigomi wrote: And just so that you know, the test was designed by Prof. Russo and Prof. Schoemaker. Russo is a prof at Cornell, and if I remember correctly, he did his BA in maths, his masters in statistics, and his PhD in cognitive psychology. Schoemaker did a BS in physics, then a masters in management, an MBA in finance, and a PhD in decision making.

I had this professor some years ago who had a PhD in statistics. He was pretty well known around here because of his veeeery unconventional style. He often bragged about all his awards from mathematics contests and olympiads (not sure what those are called in English) and how he could solve any complex trigonometry problem using only Thales and Pythagoras. Anyway, we had heard many times that he used to have serious money problems because of gambling, but for someone with a PhD in fucking statistics that sounded more like gossip. Until one day, during class, he was trying to prove that the odds of a specific sequence happening were so low that he pulled a die out of his pocket, asked a girl in the front row to roll it x times, and said that if the numbers matched that sequence he would pass everyone in the final exams. Well, the girl rolled the die, got the numbers right, and then he was all desperate, begging us not to tell anyone because he could get fired, and saying how he needed the money because he had already lost so much to gambling lol
And that's how I passed statistics
Not trying to imply anything about the authors of the test. I don't know them. Just saying you should always be skeptical about anyone 


I don't really get the point of the example you give. Are you implying that he wasn't good at stats because he couldn't gamble? I've got stunning handwriting; it doesn't mean I can write novels. Or are you implying that intelligent people also make mistakes? Because from your example, it doesn't seem like he made a mistake, he just had terrible luck. If the sequence was really rare (let's say four sixes in a row), then what he did with your class would have worked for 1295 other classes.
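Just to check that arithmetic, assuming "really rare" means four sixes in a row:

```python
# Probability of four sixes in a row, and how many classes that implies.
p_match = (1 / 6) ** 4
print(p_match)                 # ~0.00077, i.e. one chance in 1296
print(round(1 / p_match) - 1)  # 1295: the other classes where his gamble pays off
```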
I've got endless respect for professors, as I think does anyone studying post-grad. That doesn't mean they are never wrong, not at all, but it does mean that compared to a lay person, and on their topic of specialisation, they are basically never wrong. Getting a PhD in psychology is 8-10 years of studying, of which half is focused on your specialisation. The requirements to become a professor change from uni to uni, but one of the general conditions is that you need to publish a set number of articles (like 6) in scientific journals every year. What that means is that the people who designed this test studied for a combined total of roughly 20 years, with half of it focused on this topic, and that they designed an average of six experiments per year, experiments that were accepted through peer review by equally knowledgeable people. What this means to me is that they probably know how to set up valid experiments in their field of specialisation, and that your arguments are more likely to come from a misunderstanding of the experiment than from them completely screwing up the experiment.
I don't mean that to sound harsh, and I'm not saying that because you didn't study in this direction your opinions should be ignored. That's why I addressed everything you said. What I am saying is that you should consider how confident you are that you are right here, then consider the odds of you actually being right, and see if the two are the same :p