|
Read the rules in the OP before posting, please. In order to ensure that this thread continues to meet TL standards and follows the proper guidelines, we will be enforcing the rules in the OP more strictly. Be sure to give them a re-read to refresh your memory! The vast majority of you are contributing in a healthy way, keep it up! NOTE: When providing a source, explain why you feel it is relevant and what purpose it adds to the discussion if it's not obvious. Also take note that unsubstantiated tweets/posts meant only to rekindle old arguments can result in a mod action. |
United Kingdom13775 Posts
On November 03 2016 02:15 ChristianS wrote: Someone in the thread was saying recently (I forget who, sorry) that the 538 election probabilities are Bayesian probabilities, that is they indicate a "degree of belief" chance rather than a frequentist "how many times out of 100" probability.
That doesn't seem quite right to me though. Among other reasons, they come from running a simulated version of the election 10000 times and calculating totals, rather than assigning a prior probability at the beginning of the race and then doing Bayesian updating.
Problem is, I don't really know what they represent. Nate Silver talks about it like a frequentist probability (e.g. comparing Trump's ~17% chance to Russian Roulette), but I'm not really clear what variables his simulation is changing randomly to produce random results. Like, is it modeling individual demographics' preference and turnout as normal curves, and randomly generating values for each demographic? Determining an average and standard deviation for each state's poll numbers, and then randomly generating results state-by-state?
It kinda seems like its movement up or down is meaningful, and when it's at 50% it's probably even, but otherwise the precise value doesn't mean much
On November 03 2016 02:29 TheTenthDoc wrote: I think they're Bayesian, just reasoning from a prior of complete ignorance (which is basically what all frequentism is). Which is probably a fair prior in this setting.
The frequencies the model creates are based upon t distributions, I think, with correlation between each states' vote results based upon demographics. Alright, I finally have a chance to explain this issue. I'll also take this chance to explain the entire issue of election predictions because that's the context in which we actually care about probabilities and how they are interpreted. There's a lot of reading you could do depending on how much you care about the issue. It's rarely discussed though, because most people don't know and don't care about the "what is a probability" issue and just take the most intuitive approach that happens to be the frequentist one. It's one of those mistakes that is really easy to make because even among the people who are supposed to know better, it's common that they just don't really care.
I'm going to start this discussion by linking this wiki page on interpretations of probability, which contains a link to the "Bayesian probability" page previously referenced. This takes a more philosophical than mathematical approach to the issue of probabilities (and philosophical Bayesianism is... kind of odd), but it is pretty comprehensive and you'll find more than you need to know. I'm actually going to just focus on a very limited dichotomy, the one most relevant to mathematical statistics: the frequentist versus the Bayesian-logical approach. Frequentism considers probabilities in terms of limiting frequencies - like if I say I have a 45% chance of rolling a 6 (say, with a loaded die), that means that if I rolled the die 100 billion times then I'd get something really, really close to 45 billion 6's. A Bayesian-logical approach (right now the most commonly used Bayesian approach, and henceforth referred to simply as "Bayesian") interprets probabilities as "degrees of belief." To put it in the context of election predictions: if, from a Bayesian perspective, I say that "Hillary has a 65 percent chance of winning the election," what I'm really saying is, "based on the model that I have designed, I am 65 percent sure that the outcome of the coming election is that Hillary will win." This difference manifests itself mostly in how you can use the machinery of probability theory to make predictions about future events.
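To make the "limiting frequency" idea concrete, here is a minimal sketch (my own toy example, not taken from any of the linked sources) that rolls an ordinary fair die and watches the relative frequency of 6's settle toward 1/6 as the number of rolls grows:

```python
import random

# Frequentist illustration: the relative frequency of an event approaches
# its probability as the number of trials grows (law of large numbers).
random.seed(0)

for n in (100, 10_000, 1_000_000):
    sixes = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    print(f"n={n:>9,}  relative frequency of 6: {sixes / n:.4f}  (true value ~0.1667)")
```

For a one-off event like a single election there is no analogous sequence of trials to count over, which is exactly the gap the Bayesian reading fills.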
The frequentist approach is basically the classical approach to probability, and the one that is mostly quite mature as a field of mathematics. I'm not going to go into it much; either you have the relevant mathematical background to already know what "classical probability" is, or you won't get it. The best book on probability theory from a frequentist/classical perspective is William Feller's two-part series An Introduction to Probability Theory and Its Applications (Part 1 Part 2) which is at a graduate level of mathematics, but is also the definitive text on classical probability (and one of my personal favorites among math books). Back when Feller wrote his book (1950 for the first edition), Bayesianism wasn't really very well-received in the mathematical community; it was a "big new idea" that more experienced mathematicians were rightfully skeptical about; it did end up being very useful but the concerns about its questionable reliability were really spot-on.
The Bayesian approach is an interesting outgrowth of the Bayes' rule or Bayes' theorem, a simple and almost-trivial statement of relation between conditional probabilities (e.g. probability of some event happening given that some other event happened). This simple rule forms the backbone of a range of techniques called Bayesian inference which use Bayes' theorem as a tool to update predictions based on new evidence. There are a lot of approaches, and a lot of techniques that are used, and unlike classical probability I wouldn't call Bayesian inference anything close to mature. It's proven to be extremely useful in a wide range of applications of probability, but it does suffer from a few issues, which I'll get to in a bit. There isn't really any definitive book I can recommend on Bayesianism since it's not a mature field, but I will offer Phil Gregory's Bayesian Logical Data Analysis for the Physical Sciences (link) as something that develops a Bayesian-logical approach to probability in a useful way.
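Since Bayes' rule itself is so compact, a tiny worked example may help (the numbers are invented purely for illustration and have nothing to do with any real poll): suppose a pollster's screen flags someone as a "likely Clinton voter," and we want P(votes Clinton | flagged) from the prior and the flag's assumed hit rates.

```python
# Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)
# All numbers below are made up for illustration.
prior_clinton = 0.48            # P(votes Clinton) before seeing the flag
p_flag_given_clinton = 0.90     # P(flagged | votes Clinton)
p_flag_given_other = 0.20       # P(flagged | does not vote Clinton)

# Marginal likelihood P(flagged): the normalizing factor
p_flag = (p_flag_given_clinton * prior_clinton
          + p_flag_given_other * (1 - prior_clinton))

posterior = p_flag_given_clinton * prior_clinton / p_flag
print(f"P(votes Clinton | flagged) = {posterior:.3f}")   # ~0.806
```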
Frequentism basically only uses the classical machinery of probability theory, which is useful but limited in what you can accomplish with it. The use of Bayesian approaches expands what probabilities can be used for to a lot of further applications - the one we are interested in here is the application to events that happen only once, something that from a philosophical perspective is fundamentally irreconcilable with the definition of frequentist probability. In the Bayesian-logical context, we use evidence (here, polls and possibly fundamentals) within some model to get what we hope is an increasingly accurate picture of how sure we are that the evidence points to a certain outcome. Although the validity of Bayesian approaches is very clear by now, the classical criticism that they are easily misled is a pointed concern that justifies the skepticism of the mathematicians who were suspicious of them.
So if you've taken any statistics or probability coursework, the basic idea of Bayesian inference is rather simple: you have your prior assumption about the state of the world, you have some evidence (the likelihood of the data you observed), and you have a marginal likelihood (a normalizing factor that ensures the probabilities of all outcomes sum to one). The usual approach is to make as uninformative an assumption about the prior state of affairs as you can, given only what you automatically assume to be true (not a trivial topic; I'll link maximum entropy as one approach to look at but there's a lot to this), and then let that assumption lose its influence as you add new data. The real issue with Bayesianism, though, is this: when you choose priors and probabilistic representations of the likelihood of your observed data, you impose a probabilistic structure on that data. Nothing inherently wrong with that - that's basically the entire idea of what a "model" does, in math, in science, or anywhere else. The problem is that if your model sucks then your results do too, and Bayesianism is especially vulnerable to this issue.
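As a sketch of the "prior loses its influence" point (a standard textbook beta-binomial toy, not anything resembling 538's actual model; the poll counts are invented): start from a flat prior and from a heavily opinionated prior on a candidate's share of poll respondents, feed in progressively more data, and watch the two posteriors converge.

```python
# Conjugate beta-binomial updating: with prior Beta(a, b) and k "Clinton"
# answers out of n respondents, the posterior is Beta(a + k, b + n - k).
# The prior's influence fades as n grows. All counts are invented.
def posterior_mean(a, b, k, n):
    return (a + k) / (a + b + n)

flat_prior = (1, 1)           # uniform prior on the share
opinionated_prior = (80, 20)  # starts out ~80% convinced

for n, k in [(10, 6), (100, 52), (10_000, 5_200)]:
    flat = posterior_mean(*flat_prior, k, n)
    strong = posterior_mean(*opinionated_prior, k, n)
    print(f"n={n:>6}: flat prior -> {flat:.3f}, opinionated prior -> {strong:.3f}")
```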
Now that we have established a context for what Bayesianism is, let's get into the issue of polls and predictions. I'm going to focus most on FiveThirtyEight, both because it's popular here on TL and because I personally have always been very impressed with Nate Silver's work on election predictions. I'm also going to briefly touch upon the Princeton Election Consortium predictions, which I'm less fond of but that will make a good example for one point I have to make about models. As with any non-trivial predictions, the models used are both complex and partially opaque (partly to avoid people stealing their stuff, partly because not too many people care about all the technical details). To put it simply though, most models focus on two general broad ranges of factors: poll results, and fundamentals (economics, historical results, approval ratings, etc). If you want more background reading, look at this paper on fundamental models and this Pew piece on polling methodology.
The major idea behind 538's model is this general line of thought: averages of polls are better than single polls, the results of polls are accurate to some measurable margin of error, and fundamental factors have an effect but are insufficient to predict the results. If you want some further reading on the model that Nate Silver and 538 have made, you should look at their list of pollster ratings, Nate Silver's technical explanation of how his pollster ratings work, their article on the usefulness of polls in making predictions, and their Endorsement Primary article (one of their "fundamentals" factors, of many). As an example of a different methodology, the PEC uses an approach called the meta margin as a "buffer factor" to predict elections, and their model gives a much tighter bound on the chances of Hillary Clinton winning.
You may or may not remember that Nate Silver became famous four years ago in the 2012 elections, when he differed from most other predictions and gave a model that looked ridiculously biased in favor of Obama, but ended up matching the exact result of the election (he guessed 1 Senate seat and 0 electoral votes wrong). While he admitted that some luck played in his favor, the other thing that helped was the soundness of his statistical methods. What I remember most about Nate Silver in 2012 was the controversy over how Gallup (the most reliable pollster in terms of raw prediction rate) had polls showing Romney winning, and how his rather scathing criticism of those numbers generated a lot of pushback but ultimately ended up being really, really spot-on. That is basically how he became famous; I'm fine with that because I have generally been very impressed with the technical soundness of his statistical analysis (despite some factors I disagree with him on) and I think that the 538 model is the one most representative of the truth at this point. He did have a somewhat famous failure in predicting the 2015 UK General Election (read this Quartz piece, this 538 admission of failure, and this follow-up). I attribute that to the fact that while his success in 2012 was well-founded in good statistical modeling, being able to predict results with that uncanny level of accuracy every time is rare. Polling is a difficult task that is getting increasingly unreliable; read the earlier Pew piece on polling methodology and the 538 piece on polling accuracy if you want more details.
Now, after all that, we can talk about Bayesianism in the context of polling predictions. Nate Silver's approach is very Bayesian and best thought of in the previously described Bayesian-logical framework. His inference comes from a form of the model I have described, in which he updates the forecast based on new polls (both state and national) and fundamental factors (the latter only in the polls-plus model). Everything he does culminates in a bunch of distributions for the predicted result in each state (likely some variant of a Student t distribution, with a few added details), from which we can predict the results of the electoral college. The effect of the "fundamental" factors is essentially to "tip the scales" in one direction or the other in a way that makes sense; Nate Silver has noted that while the polls-plus model is more accurate than polls-only the majority of the time (57% versus 43%), they are most useful when you look at both of them together. In any case, this is very much the Bayesian-logical framework I described above.
Your objection was that they run ten thousand simulations, so you don't see how it could be a "state of belief." What's happening here is that he is applying a technique called a Monte Carlo simulation. The simple explanation is that the overall probabilistic description is analytically intractable, but it's built from a whole bunch of simple probability distributions which are individually trivial to handle. So all you do is run a simulation, where you generate a completely random sample from each individual distribution, combine those results, and record the ultimate outcome of all of those factors. Then you repeat until you have enough. The methods for proving "what is enough" are too difficult to get into, and I'm not going to bother trying to explain them. But I will tell you that practically, 10,000 is the general cutoff that is used - it's small enough to be computationally feasible, and big enough to be a good simulation size. Over a broad range of applications I have found that your results don't change enough to matter whether you take ten thousand or ten billion runs of a simulation. So 10,000 it is.
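Here is a bare-bones sketch of that workflow on a made-up problem (none of these distributions come from 538; they are only there to show the mechanics and the sample-size point): the quantity of interest is awkward to work out analytically, but each ingredient is trivial to sample, and the estimate barely moves between 10,000 and 1,000,000 runs.

```python
import random

# Generic Monte Carlo: sample each simple ingredient, combine, record,
# repeat. The distributions and threshold below are invented.
def one_run(rng):
    total = rng.gauss(50, 3) + rng.gauss(30, 5) + rng.uniform(-10, 10)
    return total > 82

for n in (1_000, 10_000, 1_000_000):
    rng = random.Random(42)
    hits = sum(one_run(rng) for _ in range(n))
    print(f"n={n:>9,}: estimated probability = {hits / n:.4f}")
```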
The basic idea of all those runs is to see to what extent the variance in the prediction model (e.g. polling averages in this state or that state, the national margin, etc.) affects the chance of a certain election result. The repeated running of the simulation generates enough samples of the possible variation in the results to get a good idea of what chances a candidate really has. But at the end of the day, the statement we're making is something akin to this: "The model was run with the parameters specified, and performed 10,000 simulations of the results. In 7501 of those runs, Hillary Clinton was victorious, in 2406 Donald Trump was victorious, and in 93 we had an electoral draw. So based on these results of my model, I am 75.01% sure that Hillary Clinton is going to win." Ultimately it is a probabilistic statement of belief, because it's a one-time event. You can't run an election more than once; the idea of "frequencies" is just not applicable. You are merely collecting evidence supporting a certain hypothesis and using simulations to model to what extent you are sure that that outcome will occur. And as new data is added, that evidence is updated - with new polls, new fundamentals data, and new pollster ratings. Bayesian through-and-through.
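For what a statement like that looks like end to end, here is a toy electoral-college Monte Carlo (three fictional states, invented means, spreads, and electoral votes, and a plain normal error standing in for the t-style distributions mentioned above - nothing here is 538's actual model or data):

```python
import random

random.seed(538)

# Toy electoral-college simulation. Margins are Clinton-minus-Trump
# percentage points; every number below is invented.
states = {            # name: (mean margin, spread, electoral votes)
    "A": ( 4.0, 3.5, 20),
    "B": (-1.0, 3.5, 15),
    "C": ( 0.5, 3.5, 10),
}
RUNS = 10_000
clinton_wins = trump_wins = draws = 0

for _ in range(RUNS):
    clinton_ev = trump_ev = 0
    for mean, spread, ev in states.values():
        margin = mean + random.gauss(0, spread)   # sample this state's result
        if margin > 0:
            clinton_ev += ev
        else:
            trump_ev += ev
    if clinton_ev > trump_ev:
        clinton_wins += 1
    elif trump_ev > clinton_ev:
        trump_wins += 1
    else:
        draws += 1

print(f"Clinton won {clinton_wins} of {RUNS} runs, Trump {trump_wins}, draws {draws}")
print(f"-> based on this toy model, I am {clinton_wins / RUNS:.1%} sure Clinton wins")
```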
You will often find people talking about polls like "if we re-ran this election 100 times, we would expect Remain to win 52 times." The people who say that are actually just wrong. I certainly hope Nate Silver didn't say it exactly that way; I've generally been very impressed with how technically accurate he has been in his dialogues. If he did, either he was just making an analogy for saying "you can think of Trump's probability of victory as about this much" or he was just not being careful (for shame). But my description above is really how you should think about the probabilities in the context of the election predictions. And of course, those predictions change a lot based on how the evidence evolves. Trump's chances in the past week have increased substantially, and some recent articles from 538 have discussed that in some depth. There is not a "random chance" of Hillary Clinton and Donald Trump winning - we just don't know who will win. We have some idea, but we only have a general sense for how much the evidence points to one result or the other. At this moment in time, the 538 model is about 67% sure that the factors will line up in favor of a Hillary Clinton win. And that is the long answer for what these probabilities are actually saying.
|
It doesn't have to be a big mess if Democrats seize upon the coming election mandate in pushing for single payer or further exchange controls. If Obamacare is left alone or outright repealed, yes, big mess indeed.
|
So I gotta ask. Where the fuck is all of this noise about the emails from Weiner's computer showing that the Clintons are part of a child sex ring coming from? Is this some highly elaborate troll job?
|
United States41989 Posts
On November 03 2016 13:12 xDaunt wrote: So I gotta ask. Where the fuck is all of this noise about the emails from Weiner's computer showing that the Clintons are part of a child sex ring coming from? Is this some highly elaborate troll job? You understand you're the noise, right? This is the first I've heard of it, and it's from you. You probably heard someone else repeating it. That's how noise happens.
|
On November 03 2016 13:15 KwarK wrote:Show nested quote +On November 03 2016 13:12 xDaunt wrote: So I gotta ask. Where the fuck is all of this noise about the emails from Weiner's computer showing that the Clintons are part of a child sex ring coming from? Is this some highly elaborate troll job? You understand you're the noise, right? This is the first I've heard of it, and it's from you. You probably heard someone else repeating it. That's how noise happens. We don't go to the right websites.
|
According to the WSJ it's lower level FBI agents who want to go aggressive and do grand jury hearings (in the CF case, not the private server case) based on the claims of a non-CF person making allegations about CF in an unrelated corruption investigation. They presented their case to the actual DOJ prosecutors who would handle the case, who were unimpressed by the hearsay. The entire CF investigation originates from Clinton Cash, a conservative book.
|
On November 03 2016 12:12 Buckyman wrote:Show nested quote +On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account.
Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know.
|
United States41989 Posts
On November 03 2016 13:24 Doodsmack wrote: According to the WSJ it's lower level FBI agents who want to go aggressive and do grand jury hearings (in the CF case, not the private server case) based on the claims of a non-CF person making allegations about CF in an unrelated corruption investigation. They presented their case to the actual DOJ prosecutors who would handle the case, who were unimpressed by the hearsay. The entire CF investigation originates from Clinton Cash, a conservative book. Clinton Cash is a work of fiction, it's not even conservative, it's just false.
|
On November 03 2016 13:28 Doodsmack wrote:Show nested quote +On November 03 2016 12:12 Buckyman wrote:On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account. Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know.
Wikileaks can't sign documents using the private key from Podesta's email account.
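For anyone who wants to check this independently rather than take anyone's word for it: the "cryptographic signatures" being referred to are presumably DKIM signatures, which Gmail's servers attach to outgoing mail and which verify against the public key the sending domain publishes in DNS. A rough sketch using the third-party dkimpy package (the filename is made up, and the check needs network access for the DNS lookup):

```python
# pip install dkimpy
import dkim  # third-party "dkimpy" package

# Load a raw, headers-intact email file (hypothetical filename).
with open("podesta_example.eml", "rb") as f:
    raw_message = f.read()

# dkim.verify() looks up the signing domain's public key in DNS and checks
# the DKIM-Signature header against the signed headers and body.
print("DKIM signature valid:", dkim.verify(raw_message))
```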
|
On November 03 2016 13:28 Doodsmack wrote:Show nested quote +On November 03 2016 12:12 Buckyman wrote:On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account. Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know. It could not be entirely fabricated: someone was able to steal his Apple ID precisely because he had written it and the password in one of the emails. Wikileaks also doesn't have the resources to make enormous and passable forgeries. It's also unlikely because people keep losing their positions and nobody denies the authenticity.
The only thing there is a threat of, disregarding any technical arguments I'm ignorant of, is making up some fake emails. The arguments against that are 1) historical: why would Wikileaks destroy their past and future credibility, and all those documents, just to try and pull a fast one here? And 2) if they faked an email with a bombshell, it would stand out; but many people keep arguing that the Podesta emails contain nothing of public interest, so faked emails would end up being about nothing, in which case we're back to "who cares."
|
On November 03 2016 13:39 Buckyman wrote:Show nested quote +On November 03 2016 13:28 Doodsmack wrote:On November 03 2016 12:12 Buckyman wrote:On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account. Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know. Wikileaks can't sign documents using the private key from Podesta's email account. Yes, but they are not dumping the raw data files either.
|
On November 03 2016 12:48 farvacola wrote:It doesn't have to be a big mess if Democrats seize upon the coming election mandate in pushing for single payer or further exchange controls. If Obamacare is left alone or outright repealed, yes, big mess indeed.
i skimmed the article and i have no idea how the economist is getting the "this is a mess" angle - maybe because of the use of "turmoil"? i mean, the author is right on all counts but this is all stuff people know and conclusions that have already been reached.
i've said before, any big new federal program has some hiccups. we didn't get social security, medicare, etc. right on the first try, and these programs need tune-ups over time. however, the difference is that back in the day both parties would get together for the common good and say "hey you know we need to make some adjustments to make this work". instead, we have an intransigent GOP that's tried to repeal the ACA god knows how many times and has offered nothing remotely reasonable as an alternative, let alone an improvement.
|
On November 03 2016 12:43 Sermokala wrote: I wonder how it would have played out if Hillary just up and released all the emails on her own and embraced the "transparency" angle. It would have buried the story under a mountain of emails months ago, removed the power the Russians have over her, and get her the only positives she can get out of it.
The simplest explanation is that she could not or did not do this because there is damning evidence in those emails.
If there was nothing in them, why would she put so much at risk? This was a huge story when it first broke and her decision to delete a bunch of the emails would have been seen in a negative light no matter what.
|
On November 03 2016 13:39 Buckyman wrote:Show nested quote +On November 03 2016 13:28 Doodsmack wrote:On November 03 2016 12:12 Buckyman wrote:On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account. Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know. Wikileaks can't sign documents using the private key from Podesta's email account.
Is the private key anything other than a string of text displayed on the wikileaks website? How do we verify that what we're seeing as the key on the wikileaks website actually came from Google? I'm just saying Russian intelligence could in theory modify these emails before giving them to wikileaks. That's why they could maybe be challenged in court.
Sputnik (Russian news agency) was caught publishing a modified email from Podesta's account, and it was published so soon after Wikileaks' actual release of the alleged email that Sputnik had to have known about it beforehand. Funnily enough, Sputnik took it down quickly, but not quickly enough for Trump to reference it in a rally later that day.
|
On November 03 2016 13:53 Vin{MBL} wrote:Show nested quote +On November 03 2016 12:43 Sermokala wrote: I wonder how it would have played out if Hillary just up and released all the emails on her own and embraced the "transparency" angle. It would have buried the story under a mountain of emails months ago, removed the power the Russians have over her, and get her the only positives she can get out of it. The simplest explanation is that she could not or did not do this because there is damning evidence in those emails. If there was nothing in them, why would she put so much at risk? This was a huge story when it first broke and her decision to delete a bunch of the emails would have been seen in a negative light no matter what.
It could just be Hillary's hubris and desire to not have her privacy invaded.
|
On November 03 2016 14:21 Doodsmack wrote:Show nested quote +On November 03 2016 13:39 Buckyman wrote:On November 03 2016 13:28 Doodsmack wrote:On November 03 2016 12:12 Buckyman wrote:On November 03 2016 12:10 Doodsmack wrote: I'd have to a imagine Clinton lawyers could get the wikileaks evidence thrown out pretty easily. Unless the FBI subpoenaed Podesta's gmail or something. A lot of the Podesta emails have cryptographic signatures that prove they were sent from his account. Isn't that just us taking Wikileaks at their word or is there some way to verify? Ultimately Wikileaks is just splashing text up on their website, it could be entirely fabricated for all we know. Wikileaks can't sign documents using the private key from Podesta's email account. Is the private key anything other than a string of text displayed on the wikileaks website? How do we verify that what we're seeing as the key on the wikileaks website came from google? I'm just saying Russian intelligence could in theory modify these emails before giving them to wikileaks. Thus why they could maybe be challenged in court. Sputnik (Russian news agency) was caught publishing a modified email from Podesta's account, and it was published so soon after Wikileak's actual release of the alleged email that Sputnik had to have known about it beforehand. Funnily enough, Sputnik took it down quickly, but not quickly enough for Trump to reference it in a rally later that day  .
That wasn't a modified email, they were mistaken about who sent it.
|
Again, she was legally entitled to delete her 33K "personal" emails... Now if you're questioning whether any email was deleted maliciously (such as to cover up any crimes), the FBI didn't have any reason to believe so because by all accounts she wasn't in charge of filtering out her personal emails from her work emails. If you're questioning how something like half her emails were considered personal, given that we've had a month of Podesta emails and we have been reporting on Podesta talking about how to make risotto...
I brought this up before with an older article, but at what point does the Wikileaks Podesta email release become a violation of Podesta's constitutional right to privacy? Because in the past Wikileaks would work with news agencies to filter the emails and eventually release what was relevant. But this time they blasted everything they had on Podesta. And some emails like the transcripts of Hillary's paid speeches have relevance to the current election, but do other emails like Podesta telling his friends how to make better risotto fall in the same category? I agree that transparency-wise if Hillary just released all her emails it would have likely put this issue to rest for all but the most die-hard anti-Clintons, but where do you draw the line here? She hasn't raised this defence as far as I'm aware, but I still think it's an important issue overlooked by this whole email scandal.
On November 03 2016 12:47 LegalLord wrote:+ Show Spoiler +
whoo math
very nice summary
|
@LegalLord:
Thanks for the in-depth post. Some of that I had picked up in various wikipedia dives around the internet (and I'll have to spend some time on those links you gave), but it's nice to have it all laid out in a manner easy enough for a layman to understand. One time on State of the Game, Nony told everyone to go to lesswrong.com and I wound up going down a rabbit hole trying to figure out what Bayesianism is, and this would have helped quite a bit – and still helps quite a bit anyway.
I guess there's a few questions I still have. If you have two different types of probabilities, one of which is sort of summarized as "if an event has an X % probability, it means that out of 100 trials we would expect that event to occur X times," and the other summarized as "if a proposition has X % probability, it means we can say that proposition is true with X % certainty," that makes some intuitive sense. I guess the latter is still a little unclear to me in exactly what it means; if we say in the Bayesian sense that we're 75% sure Hillary Clinton will win the election, my only intuition for explaining what that number means is to imagine 10,000 parallel universes, and then in 7500 of them, Clinton becomes president. But that's the frequentist approach; It might be that the true result of the election has very little variance, but we're just fairly uncertain what the result will be. So due to our lack of information we estimate a 75% certainty, but if we checked in on our 100,000 possible universes, all 100,000 would go to Clinton. I imagine I'll have to read through some of the philosophy of probability wikis you linked to figure out why that 75% isn't a relatively arbitrary number, then, if you take the frequentist definition away.
The other question I have, though, is how to judge what possibilities are factoring into a probability like that. In 538's analysis they've done plenty of talk basically explaining why the model says what it says, and they'll often reference things like "it's accounting for the possibility of a large polling error" or "it's accounting for the possibility that the race will tighten in the week or two before the election." Where exactly do you model factors like that in a Monte Carlo simulation? It would seem very difficult for a model which gets a lot of its work done by averaging polls to account for the likelihood that those polls will change by x amount in one direction or the other by such and such date, and it would seem near impossible for a model with an input consisting almost entirely of polls to calculate the probability that those polls are wrong by x amount in one direction or the other.
I mean, put it this way: it would be pretty easy to come up with averages and t-distributions of polls in individual states (easy enough that I could probably do it). You could then randomly generate results based on these distributions that would, in a frequentist sort of way, mimic the "10,000 parallel universes" idea to generate probabilities. If you did this with all the states, randomly generating an election result for each one based on your distribution, and then adding up the electoral votes of all of them, and then repeating that 10,000 more times, you'd get an estimate of the likelihood that each candidate would win the race. But this would be kind of like saying "well Trump only has a 65% chance in FL, 72% chance in OH, 33% chance in Pennsylvania... and he needs to win all of those, so I'll just multiply .65 * .72 * .33 ...and see? he has a tiny chance of winning the election!" But as 538 has often pointed out, you can't do that because their errors are correlated, so in the scenarios where he wins Ohio, he's more likely to win those other states, too; not to mention this doesn't account for the possibility of new events moving the polling average this way or that before the election. I don't know how you could just take poll numbers as input and use that to calculate the probability of those polls systematically underestimating one candidate; surely it's impossible to discern that from those data.
I guess this is all to say that the 538 forecast boasts the ability to state an absolute probability of the election coming out one way or the other. And in an absolute sense, it's at least true that when the forecast is above 50% that candidate is more likely than not to win. It's also true that if one candidate was at 75% last week and they moved to 85% this week that means their position has improved.
What doesn't seem clear to me is that if you were a bookie, and you gave people odds based on the 538 forecast for the next 100 elections, that you would come out even most of the time. If you look at 2012, for instance, and see that 99% of the time the candidate he said was winning went on to win, then unless he was giving those candidates a 99% chance of winning, isn't that actually a failure of his model? If most of those winning candidates were only getting 60-80% from the 538 model, and then 98% of them won, then surely he was actually underestimating those candidates quite a bit – or am I thinking of all this wrong?
|
United Kingdom13775 Posts
I'll briefly answer.
I guess there's a few questions I still have. If you have two different types of probabilities, one of which is sort of summarized as "if an event has an X % probability, it means that out of 100 trials we would expect that event to occur X times," and the other summarized as "if a proposition has X % probability, it means we can say that proposition is true with X % certainty," that makes some intuitive sense. I guess the latter is still a little unclear to me in exactly what it means; if we say in the Bayesian sense that we're 75% sure Hillary Clinton will win the election, my only intuition for explaining what that number means is to imagine 10,000 parallel universes, and then in 7500 of them, Clinton becomes president. But that's the frequentist approach; It might be that the true result of the election has very little variance, but we're just fairly uncertain what the result will be. So due to our lack of information we estimate a 75% certainty, but if we checked in on our 100,000 possible universes, all 100,000 would go to Clinton. I imagine I'll have to read through some of the philosophy of probability wikis you linked to figure out why that 75% isn't a relatively arbitrary number, then, if you take the frequentist definition away. Basically, if we had 100,000 parallel universes, they will all have the same result. And I'm 75% sure that they will all have Hillary winning, and 24% sure that they will all have Trump winning. That's what the degree of belief is here.
The other question I have, though, is how to judge what possibilities are factoring into a probability like that. In 538's analysis they've done plenty of talk basically explaining why the model says what it says, and they'll often reference things like "it's accounting for the possibility of a large polling error" or "it's accounting for the possibility that the race will tighten in the week or two before the election." Where exactly do you model factors like that in a Monte Carlo simulation? It would seem very difficult for a model which gets a lot of its work done by averaging polls to account for the likelihood that those polls will change by x amount in one direction or the other by such and such date, and it would seem near impossible for a model with an input consisting almost entirely of polls to calculate the probability that those polls are wrong by x amount in one direction or the other.
I mean, put it this way: it would be pretty easy to come up with averages and t-distributions of polls in individual states (easy enough that I could probably do it). You could then randomly generate results based on these distributions that would, in a frequentist sort of way, mimic the "10,000 parallel universes" idea to generate probabilities. If you did this with all the states, randomly generating an election result for each one based on your distribution, and then adding up the electoral votes of all of them, and then repeating that 10,000 more times, you'd get an estimate of the likelihood that each candidate would win the race. But this would be kind of like saying "well Trump only has a 65% chance in FL, 72% chance in OH, 33% chance in Pennsylvania... and he needs to win all of those, so I'll just multiply .65 * .72 * .33 ...and see? he has a tiny chance of winning the election!" But as 538 has often pointed out, you can't do that because their errors are correlated, so in the scenarios where he wins Ohio, he's more likely to win those other states, too; not to mention this doesn't account for the possibility of new events moving the polling average this way or that before the election. I don't know how you could just take poll numbers as input and use that to calculate the probability of those polls systematically underestimating one candidate; surely it's impossible to discern that from those data. The Monte Carlo simulation just takes samples from each individual distribution. Like, we say that Kansas has polling results and errors that follow such-and-such distribution, and Louisiana has polling results that follow some other distribution. We generate some random value from the Kansas distribution, and one from the Louisiana distribution, and so on. By that I mean, you generate a random number distributed according to that state's original distribution. That is what the Monte Carlo simulation does; it works just fine as long as you have enough runs.
Of course, these factors aren't entirely independent, and I'm sure 538 has some covariance terms to account for that. I doubt they have an exact technical specification of their model available though, so I just can't tell you.
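To make the correlation point concrete (again a toy of my own, not 538's actual covariance structure): one simple way to correlate state errors is to add a shared national error on top of each state's independent error. Comparing the toy win probability with and without that shared term shows why you can't just multiply the individual state probabilities together.

```python
import random

# Toy illustration of correlated polling errors: each state's margin gets
# an independent error plus a shared national error. All numbers invented.
states = {"A": (4.0, 20), "B": (-1.0, 15), "C": (0.5, 10)}  # mean margin, EVs
TOTAL_EV = sum(ev for _, ev in states.values())
RUNS = 10_000

def clinton_win_probability(shared_sd, seed=1):
    rng = random.Random(seed)
    wins = 0
    for _ in range(RUNS):
        national_error = rng.gauss(0, shared_sd)   # hits every state at once
        clinton_ev = 0
        for mean, ev in states.values():
            margin = mean + national_error + rng.gauss(0, 3.0)
            if margin > 0:
                clinton_ev += ev
        if clinton_ev > TOTAL_EV / 2:
            wins += 1
    return wins / RUNS

print("independent state errors only :", clinton_win_probability(0.0))
print("plus a shared national error  :", clinton_win_probability(3.0))
```

With the shared term switched on, the runs where Trump beats his polls in one state tend to be the same runs where he beats them everywhere, so his overall chances come out meaningfully higher than the naive multiplication would suggest.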
I guess this is all to say that the 538 forecast boasts the ability to state an absolute probability of the election coming out one way or the other. And in an absolute sense, it's at least true that when the forecast is above 50% that candidate is more likely than not to win. It's also true that if one candidate was at 75% last week and they moved to 85% this week that means their position has improved.
What doesn't seem clear to me is that if you were a bookie, and you gave people odds based on the 538 forecast for the next 100 elections, that you would come out even most of the time. If you look at 2012, for instance, and see that 99% of the time the candidate he said was winning went on to win, then unless he was giving those candidates a 99% chance of winning, isn't that actually a failure of his model? If most of those winning candidates were only getting 60-80% from the 538 model, and then 98% of them won, then surely he was actually underestimating those candidates quite a bit – or am I thinking of all this wrong? If 538 gave only a modestly-better-than-even chance of winning to 300 candidates and they all won, then yes, I would think that is a little bit questionable and that he really got pretty lucky. In 2012 he definitely got lucky that the polls systematically underestimated Obama's advantage; if his probabilities had been exactly right, Obama would probably have lost a few more states than he did just due to poll variance. Sometimes you're wrong but it systematically works in your favor. To his credit, Nate Silver has acknowledged this plenty.
If you're asking whether you can be frequentist over many independent Bayesian predictions, I have no idea. It's not something that is easy to test in a way that any statistician would find satisfactory.
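That said, the bookie question is essentially a calibration question, and calibration is something you can at least eyeball if you collect enough past forecasts: bucket them by stated probability and check how often the predicted side actually won. A sketch with invented forecast data:

```python
from collections import defaultdict

# Calibration check: among forecasts that said "~70%", did the predicted
# side win ~70% of the time? The (probability, outcome) pairs are invented.
forecasts = [(0.62, 1), (0.71, 1), (0.68, 0), (0.93, 1), (0.55, 1),
             (0.88, 1), (0.74, 1), (0.97, 1), (0.66, 0), (0.81, 1)]

buckets = defaultdict(list)
for prob, won in forecasts:
    buckets[round(prob, 1)].append(won)   # group into 10%-wide bins

for p in sorted(buckets):
    outcomes = buckets[p]
    rate = sum(outcomes) / len(outcomes)
    print(f"forecast ~{p:.0%}: predicted side won {sum(outcomes)}/{len(outcomes)} times ({rate:.0%})")
```

If the 60-80% buckets keep coming in at 98-99% observed wins over many races, that is exactly the kind of systematic underconfidence the question describes.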
|
Sarcasm gone wrong.
+ Show Spoiler +Thought it amusing considering the talk on Silver and probability.
|