2 Contextualizing statistics
The general body of researches in mathematical statistics during the last fifteen years is fundamentally a reconstruction of logical rather than mathematical ideas, although the solution of mathematical problems has contributed essentially to this reconstruction.
— R.A. Fisher, The Logic of Inductive Inference
In Chapter 1, we saw that inductive arguments are such that, even if the premises are true, the conclusions may be false. For example, it might be true that
- P:
-
up to the current time, \(t\), all observed ravens have been black
and false that
- C:
-
All ravens, including those yet to be observed, are black.
Arguably, most scientific and statistical arguments are inductive in this way: the available data and modeling assumptions (encoded in premises) do not guarantee the veracity of the inferred scientific theory or statistical hypothesis (the argument’s conclusion). Most inferences to theories or hypotheses go beyond the observations at hand. Scientific laws are general, in the sense that they refer not to particular entities or cases but to broad categories. For example, Hubble’s Law of Cosmic Expansion states that \(V = h \times d\), where \(V\) is a galaxy’s recessional velocity, \(h\) is a parameter representing the rate of the universe’s expansion, and \(d\) is the galaxy’s distance from a reference galaxy. Hubble’s Law is not only about the relationship between velocity and distance for galaxies that have been observed, but about the relationship between distance and velocity for all galaxies, including those yet to be observed. Further, the constant \(h\) is, strictly speaking, unobservable; it represents “the constant rate of cosmic expansion caused by the stretching of space-time itself” (Bagla, 2009). \(h\) can only be inferred through scientific or statistical methods rather than directly observed.
Inferences to broad generalizations or unobservable entities aren’t particular to the physical sciences. In the social sciences, psychologists are often interested in measuring unobservable psychological traits, called latent variables, such as general intelligence, \(g\), self-esteem, or extroversion. To “measure” latent variables, psychologists must first measure observable variables—e.g., responses to a questionnaire—and have a statistical model describing how the latent variables relate to what was measured.
In this chapter, we study statistical inference as a form of inductive inference. What forms can inductive inference take? What problems arise in attempting to justify inductive inference? How do statistical models contribute to the proper foundation for inductive inference and, by extension, scientific knowledge? How strong are the arguments that justify statistical methodologies? By exploring induction and these related questions, we gain a broader, contextualized view of the nature of statistical inference. From there, we will be in a position to begin to evaluate different statistical methodologies.
2.1 Types of inductive inference
To better understand inductive inferences, it may be helpful to study different types of inductive inference. Here, we will study three types: inference to the best explanation, induction by enumeration, and inference from analogy. For more information on types of inductive inference, see Vickers (2006).
2.1.1 Inference to the best explanation
Today, Estelle woke up late. She was in a rush to get ready and quickly grabbed her phone off the charger on her way out of the house. Soon after, on her way to work, she noticed that her phone battery was only at 20 percent. Oh no! What could be the explanation for why her phone was not charged to (or near) 100 percent? There are many logically possible explanations. Here are a few:
- \(H_1\)
-
Estelle plugged her phone in properly the night before, but, unbeknownst to her, the power went out for a long period of time, and as a result, her phone did not charge.
- \(H_2\)
-
Estelle plugged her phone in properly the night before, but the phone charging cord is faulty and no longer working, and as a result, her phone did not charge.
- \(H_3\)
-
Estelle plugged her phone in properly the night before, but a being from another planet visited her room and unplugged it for most of the night. As a result, her phone did not charge.
- \(H_4\)
-
Estelle, in fact, didn’t plug her phone in properly the night before, and as a result, her phone did not charge.
Our intuition says that some of these explanations are plausible, and others are not. For example, in the absence of additional information, \(H_1\), \(H_2\), and \(H_4\) seem plausible. \(H_3\) seems implausible because we have no good reasons to believe that such beings exist or can travel to Earth, and even if they did and could, we have no reason to believe that they have the goal of unplugging our phones.
Now, suppose that Estelle thinks a bit more, and remembers two things: First, she remembers that the digital clock on her stove displayed the correct time on the way out of the house. Second, she remembers that a few other times in the last month, she’s plugged in her phone improperly, and once she secured the connection, her phone charged without issue. This information changes which explanations are plausible. In particular, \(H_1\) now becomes much less plausible, and \(H_4\) becomes much more plausible. In fact, we might infer that \(H_4\) is the best explanation for the fact that the phone is only charged to 20 percent, based on the information at hand.
The reasoning employed in this example is a type of inductive inference1 called inference to the best explanation (IBE). Generally, IBE might be characterized as the process of “accepting a hypothesis on the grounds that [it] provides [a] better explanation of the given evidence comparing to the other competing hypotheses” (Erdenk, 2015). Notice that IBE is clearly not deductive, because there is no requirement that, with limited information, the best explanation is logically entailed by the observed phenomena. In the example above, \(H_2\) has not been eliminated on the basis of logical impossibility; rather, it just seems less plausible than \(H_4\).
1 Note that some philosophers do not classify IBE as a type of induction (or deduction); such philosophers carve up the space of non-deductive arguments differently than we have here, to leave space for IBE as its own type of inference. See Chapter 2 of Okasha (2016) for more details.
2 Arguably, using BIC for explanation rather than prediction would require that we know something about the extent to which each input variable in the statistical model is causally related to the output variable. BIC does not, on its own, select for causal relationships, and such relationships are typically what is desired in an explanation.
In science, we often use statistical models to construct explanations of the phenomena that generated the data. In many cases, there will be several candidate models for a particular set of data. For example, we might like to explain atmospheric ozone concentration based on certain known conditions, such as temperature, windspeed, and humidity, and the concentration of certain pollutants, such as sulfur dioxide. Many plausible models could be constructed with respect to these data—some models might include possible pollutants as explanatorily relevant to the variation in atmospheric ozone concentration, while other models might exclude (some of) these pollutants. Statisticians have developed procedures to select a “best” model among the candidates. Some criteria that measure “best”—for example, the Bayesian Information Criterion (BIC)—might be thought of as a formalization of inference to the best explanation. That is, among several explanations (models) of the regularities in the data, BIC selects a “best” explanation by balancing goodness of fit with simplicity (Faraway, 2015).2
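To make the idea concrete, here is a minimal sketch of BIC-based model selection in Python. The data are simulated, and the variable names merely echo the ozone example above (this is not a real atmospheric dataset); the BIC is computed, up to an additive constant, from the least-squares residual sum of squares.

```python
import numpy as np

def bic(y, X):
    """BIC for a Gaussian linear model fit by least squares:
    n * log(RSS / n) + k * log(n), up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# Simulated data: sulfur dioxide (so2) has no true effect on "ozone" here.
rng = np.random.default_rng(0)
m = 200
temp = rng.normal(size=m)
wind = rng.normal(size=m)
so2 = rng.normal(size=m)
ozone = 2.0 + 1.5 * temp - 0.8 * wind + rng.normal(scale=0.5, size=m)

ones = np.ones(m)
X_small = np.column_stack([ones, temp, wind])        # excludes SO2
X_full = np.column_stack([ones, temp, wind, so2])    # includes SO2

print("BIC without SO2:", bic(ozone, X_small))
print("BIC with SO2:   ", bic(ozone, X_full))
```

Because SO2 has no true effect in this simulation, the \(k \log n\) penalty will typically give the smaller model the lower (better) BIC, illustrating the balance between fit and simplicity.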
2.1.2 Induction by enumeration
What justifies our knowledge that all electrons have a mass of \(9.1 \times 10^{-31}\)g? Or that a hot stove will burn my hand? Or that there will be a full moon on January 18, 2030? The argument for such knowledge is often of the form (Norton, 2002):
- P:
-
All observed instances of \(A\) have had property \(p\).
- C:
-
Therefore, all (including unobserved) instances of \(A\) will have property \(p\).3
3 A more modest version of the conclusion of enumerative induction is (C) Therefore, the next unobserved instance of \(A\) will have property \(p\).
This type of argument—often called induction by enumeration, or enumerative induction—allows us to generalize from observed regularities to unobserved regularities, and as such, is indispensable to science. Often, induction by enumeration is the only justification that we have for a particular scientific fact, as is the case for the mass of electrons (Norton, 2002). In other cases, such as those that predict the phases of the moon, physical theories describe the necessary causes that produce the effect that \(A\) has property \(p\), and we don’t necessarily need to rely on induction by enumeration directly. But the justification for the physical theories themselves seems to rely on induction by enumeration: how do we know that the laws of planetary motion will hold on January 18, 2030, so that our predictive model will be accurate? We know this because all observed phenomena in the universe (\(A\)) have had the property of obeying the laws of planetary motion (\(p\)), and we infer that all phenomena—including future phenomena—in the universe will obey the laws of planetary motion. That is, we know they will hold because of induction by enumeration!
2.1.3 Inference from analogy
A 1978 study of the artificial sweetener saccharin concluded that “saccharin is carcinogenic for the urinary bladder in rats and mice, and most likely is carcinogenic in human beings” (Reuber, 1978). How might we reason from the premise that saccharin is carcinogenic in rats to the conclusion that it is (likely) carcinogenic in humans? We might argue something like the following:
- P1:
-
Humans, on the one hand, and rats and mice on the other, share many anatomical, physiological, and genetic properties.4
4 See Bryda (2013) for evidence of the claim that there are such similarities.
- P2:
-
Many of these shared properties are relevant to the development of different types of cancer.
- P3:
-
Saccharin has been shown to be carcinogenic in rats and mice.
- C:
-
Therefore, saccharin is (likely) carcinogenic in humans.
This argument might be strengthened by another premise that claims that often in the past, when a result has been demonstrated in rats, it has also been demonstrated in humans (“Animal Research at the ICR,” 2019). We might interpret such an argument form as an argument from analogy. The general form of an argument from analogy might look something like this:
- P1’:
-
\(A\) and \(B\) share properties \(p_1,...,p_n\).
- P2’:
-
\(A\) has property \(p\) (\(p \ne p_i, \,\,\, i = 1,...,n\)).
- C’:
-
Therefore, \(B\) has property \(p\).
Such an argument is (almost) always categorized as inductive, because it is (almost) never logically inconsistent for \(B\) to not have property \(p\). And in fact, to the best of our knowledge as of this writing, C is believed to be false; there is “no consistent evidence that saccharin is associated with bladder cancer incidence” (“Artificial Sweeteners and Cancer,” 2016).
Arguments by analogy are often used in science and statistics, as suggested by the saccharin case above. For another example, in Origin of Species, Darwin draws an analogy between domestic selection by breeders and the selective process that arises in nature to argue for natural selection as a key mechanism of evolution (Norton, 2018).
2.2 The problem of induction
Common to all types of inductive inference is the fact that the inferences from premises to conclusions are risky: even if the premises are true, the conclusion does not necessarily follow. Consider the following inductive inference:
- P:
-
In a sample of \(n = 100\) University of Colorado Boulder students, 85 students claimed to have some amount of student loan debt.
- \(C^\dagger\):
-
Therefore, 85% of all University of Colorado Boulder students have some amount of student loan debt.
How can we justify this inference from P to \(C^\dagger\)? More generally, what makes inductive inference a reliable form of inference? Can we come up with an argument for the conclusion that C: inductive inferences are justified? Intuitively, we believe that inductive inference is a reliable form of inference, for example, when we believe that the key to our home will work today because it worked yesterday. Many of the conclusions that we draw, including scientific conclusions supported by statistical arguments, rely on inductive inference. However, as philosopher David Hume argued, there is no strong argument for the conclusion that C: inductive inferences are justified. That is, there is no rigorous justification for inductive inference. This fact is called the problem of induction. Let’s briefly work through Hume’s argument that leads to the problem of induction.5
5 My explanation of Hume’s argument relies on Henderson (2018).
To gain some insight into Hume’s argument, let’s first consider the ways in which the conclusion of an inductive inference, e.g., \(C^\dagger\), might be wrong. With respect to \(C^\dagger\), it might be the case that the chosen sample is biased in some way: students with student debt may have had a higher chance (or lower chance) of being chosen for the sample. To address this, we might attempt to take a truly random sample, in which every student has the same chance of being chosen. We could then modify our argument:
- \(P^\dagger\)
-
In a random sample of \(n = 100\) University of Colorado Boulder students, 85 students claimed to have some amount of student loan debt.
- \(C^\dagger\)
-
Therefore, 85% of all University of Colorado Boulder students have some amount of student loan debt.
This modification does not solve the issue; \(C^\dagger\) can still be false while \(P^\dagger\) is true. Even with a large random sample, it is possible that we are unlucky, in the sense that the sample percentage differs greatly from the population percentage. A second issue with our argument is that, in inferring from a sample of University of Colorado Boulder students to the population of all University of Colorado Boulder students, we are making some assumptions about the uniformity of nature across time and space. For example, in choosing a random sample, we are assuming that:
- the parameter, the percentage of University of Colorado Boulder students who have some amount of student debt, stays constant, at least across short periods of time; and
- students that we have not observed are similar in the relevant ways (e.g., with respect to finances and student debt) to students that we have observed.
Taken together, philosophers call generalized versions of these assumptions the “Uniformity Principle” (UP). The UP states that there is a kind of stability to the world: the parameters that we are attempting to estimate, and the laws of nature and regularities associated with those parameters, stay stable, or themselves change in predictable, lawlike ways. The UP plays a critical role in Hume’s claim that there is no strong justification for inductive inference. First, Hume claims that the UP appears to be assumed in any inductive inference. This claim seems quite plausible: any time that we infer a conclusion based on one of the argument types from section 2.1—e.g., that all observed electrons have a mass of \(9.1 \times 10^{-31}\)g, therefore all electrons (observed and unobserved) have this mass—we are implicitly assuming the UP. So, inductive inference cannot be justified without some justification for the UP. And in fact, the UP seems like the crucial premise in need of justification.
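The earlier worry about sampling variability, that even a truly random sample can land far from the population percentage, can be illustrated with a short simulation. The 70 percent population rate below is purely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
true_pct = 0.70   # hypothetical population rate of student debt
n = 100           # sample size, as in the argument above

# Draw 10,000 independent random samples and record each sample percentage.
draws = rng.binomial(n, true_pct, size=10_000) / n

# How often does the sample percentage miss the truth by 10 points or more?
miss = np.mean(np.abs(draws - true_pct) >= 0.10)
print(f"P(|sample % - 70%| >= 10 points) is about {miss:.3f}")
```

Misses of ten percentage points or more are uncommon but not impossible here, which is exactly the sense in which \(P^\dagger\) can be true while \(C^\dagger\) is false.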
Once Hume has established the centrality of the UP, he notes that any justification of the UP must be either deductive or inductive. That is, either the UP will follow necessarily from the premises (deductive), or it will be possible for the premises to be true while the UP is false (inductive). As Hume argues, the UP cannot be justified deductively, because its negation does not imply a contradiction; there is nothing incoherent about a universe that isn’t uniform across space or time. So, the deductive route will not work. But the UP cannot be justified inductively either, because any inductive argument justifying the UP would assume the UP itself, and therefore be circular. Thus, according to Hume, the project of justifying inductive inference is hopeless: we cannot justify the UP, which was a necessary condition for justifying inductive inference.
2.3 The problem of induction and statistical philosophies
Hume’s problem of induction is well-known among philosophers, and especially philosophers of science. To be sure, it is a philosophical problem about science and statistics; contemporary practicing scientists and statisticians do not often engage directly with this problem. Some might even claim that worries about the justifications for the UP and inductive inference are just philosophical quibbling: the lack of a bedrock justification for induction and the UP, they argue, are not real problems for science, at least not in practice. But there are good reasons to engage with these issues. First, many scientific and statistical methods were developed as a response to known problems with inductive inference, including Hume’s problem of induction. The developers of these methods often had the explicit goal of making inductive inferences stronger. The frequentist statistician Ronald Fisher (1890 – 1962) contextualized his work as a kind of inductive logic in various places, including in papers titled “Statistical Methods and Scientific Induction” (Fisher, 1955) and “The Logic of Inductive Inference” (Fisher, 1935). The first Bayesian analyses—including the work of the Reverend Thomas Bayes (1702 – 1761) and Pierre-Simon Laplace (1749 – 1827)—were also developed to solve Hume’s problem of induction (Clayton, 2021; Stigler, 2018). Engaging with the problem of induction, along with proposed solutions, may help us gain a deeper understanding of the origins, justifications, and utility of standard statistical methods. In turn, we may then be in a better position to critique and apply them correctly.
2.3.1 The falsification solution
Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis.
— R.A. Fisher, Design of Experiments
Philosopher of science Karl Popper recognized that Hume’s problem of induction was, in a certain sense, insurmountable. Popper writes:
Hume, I felt, was perfectly right in pointing out that induction cannot be logically justified. He held that there can be no valid logical arguments allowing us to establish ‘that those instances, of which we have had no experience, resemble those, of which we have had experience’. Consequently ‘even after the observation of the frequent or constant conjunction of objects, we have no reason to draw any inference concerning any object beyond those of which we have had experience’ (Popper, 2010 [1963]).
As a result, Popper made no attempt to solve the problem of induction by justifying induction. Rather, he denied that induction was necessary for the proper functioning of science. Instead of generalizing from observations to theories (e.g., scientific laws), Popper believed that science properly functions by first posing the theories, and then testing those theories against particular relevant data. In this way, the proper justificatory structure of science is deductive rather than inductive: a scientific theory \(T\), so Popper claimed, can be conclusively falsified given certain empirical evidence. As an example, consider the theory \(T\): All ravens are black. This theory can be conclusively and deductively falsified with the observation of (at least) one non-black raven. The argument would be:
- P1:
-
If \(T\): all ravens are black, then any raven observed will be black.
- P2:
-
A white raven was observed.
- C:
-
Therefore, \(T\) is false.
This general argument form,
- P1:
-
If \(T\), then \(e\).
- P2:
-
Not \(e\).
- C:
-
Therefore, not \(T\).
is valid, and therefore, deductive. For Popper, falsification—the process of proposing theories and attempting to refute them—rather than induction, is the real mode of scientific progress.
To be sure, this view has some problems. For one, we might notice that there is an asymmetry between our ability to (1) reject \(T\) as false, i.e., when evidence \(e\) contradicts \(T\); and (2) accept \(T\) as true, i.e., when \(e\) does not contradict \(T\). In case (1), practically speaking, most scientific theories and hypotheses are not as easily and clearly falsifiable as \(T\). Consider the health science hypothesis \(H\): A high carbohydrate diet causes an increase in body weight. What evidence would conclusively falsify \(H\)? Perhaps, in theory, such evidence exists. We can imagine a world in which any time someone increases their carbohydrate intake for several weeks, they also increase their body weight. In such a world, we might argue:
- P1:
-
If \(H\): a high carbohydrate diet causes an increase in body weight, then any individual observed eating a high carbohydrate diet will see an increase in body weight.
- P2:
-
Thom eats a high carbohydrate diet but has not seen an increase in body weight.
- C:
-
Therefore, \(H\) is false.
However, our imagined world is not the real world; in the real world, diet is complicated. There are many other factors that also influence body weight. Strict falsification is much more difficult to attain. Statistical methods attempt to control for these other factors—as well as random variation—to isolate the effect of diet on body weight. But even then, do we know we controlled for all of the right factors? How do we know that we did not leave something out, or that random variation, rather than diet, led us to reject \(H\)? Conclusive falsification seems, in practice at least, unattainable. It is not clear exactly what evidence we could practically attain that would allow us to conclusively falsify most real-world scientific hypotheses. As we will see in ?sec-frequentist, ?sec-Bayesian, and ?sec-causation, statistical philosophies, including causal inference, can help us make inferences and practical decisions in the absence of conclusive falsification.
In case (2), \(e\) being broadly consistent with \(T\) does not confirm \(T\), because \(e\) will be consistent with other (in fact, infinitely many other) theories, \(T_i\), each of which is not equivalent to \(T\). Yet another observed black raven does not confirm \(T\), and there are many other theories consistent with the new observation (e.g., \(T_1\): ravens are \(60\%\) black and \(40\%\) white). Popper’s solution to this problem is to introduce the notion of corroboration. A theory \(T\) is corroborated by \(e\) if \(e\) were produced by a “severe test”. By a “severe test”, Popper means “tests that would probably have falsified a claim if false” (Mayo, 2018). Note that corroboration is not strict confirmation, if by ‘confirmation’ one means conclusively true. Instead, corroboration is a building up of support for \(T\), through the right kinds of probes.
6 Hypothesis testing was developed separately by Fisher, on the one hand, and Neyman and Pearson on the other. The version often taught is a blend of these two methods.
If one is familiar with the statistical hypothesis testing developed by Ronald Fisher, Jerzy Neyman (1894 – 1981), and Egon Pearson (1895 – 1980), Popper’s logic of conjecture and refutation should not be entirely foreign.6 In statistical hypothesis testing, and in Popper’s falsification paradigm, a hypothesis is put forward, and a statistical procedure is conducted to attempt to falsify it. Interestingly, classical frequentist hypothesis testing starts and ends with conjecture and refutation; there is no formal method for corroboration or so-called severe testing. More recently, philosophers Deborah Mayo and Aris Spanos developed a set of statistical tools that formalize the notion of a severe test, which, when used correctly, can help corroborate hypotheses (Mayo, 2018; Mayo & Spanos, 2011). In ?sec-frequentist, we will study the statistical procedures that Fisher, Neyman, Pearson, Mayo, and Spanos have developed to deal with messy, real-world scientific theories and hypotheses.
So, do falsification and hypothesis testing succeed in solving the problem of induction? We will not be able to adequately address this question until ?sec-frequentist. But, as we will see, under the statistical assumptions posed in a statistical model, hypothesis testing provides a framework for quantifying uncertainty in our conclusions and behaviors by controlling error rates over the long run. This error control represents an important step forward in strengthening inductive inference: if the modeling assumptions are (roughly) met, we know how often we will be in error in the long run. While this paradigm does make explicit and precise statements about probabilities, it still assumes the UP—i.e., by making claims about “the long run”. But, as we saw in section 2.2, the UP cannot be justified without circularity. So, in failing to avoid the UP, strictly speaking, these statistical methods have failed to circumvent the problem of induction. Nevertheless, these methods provide some guidance for belief and action under uncertainty.
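A small simulation can illustrate what “controlling error rates over the long run” means. Below, we repeatedly test a true null hypothesis (mean zero) with an approximate two-sided z-test; the normally distributed data and the 1.96 cutoff are illustrative assumptions, not part of any particular study design:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha = 0.05
n, trials = 100, 20_000

# Simulate many studies in which the null hypothesis (mean = 0) is TRUE.
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))
z = samples.mean(axis=1) / (samples.std(axis=1, ddof=1) / np.sqrt(n))

# Reject whenever |z| exceeds the two-sided normal critical value 1.96.
reject = np.abs(z) > 1.96
print(f"Long-run false-rejection rate: {reject.mean():.3f} (target: {alpha})")
```

The observed false-rejection rate hovers near the nominal level \(\alpha = 0.05\): the procedure does not tell us whether any single rejection is correct, only how often we err across many repetitions, and even this guarantee leans on the UP (the assumption that future repetitions behave like simulated ones).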
2.3.2 The Bayesian solution
The most popular alternative to Popper’s falsificationist framework—and falsificationist statistical methods like hypothesis testing—is called probabilism. Probabilism is the view that conclusions, theories, and hypotheses can be assigned a degree of support through the use of probability theory (Mayo, 2018). Probabilism assigns theories a number between zero and one, which represents, roughly, how plausible the theory is. Perhaps we have the following argument:
- P:
-
Over one million ravens have been observed, and all have been black.
- \(T\):
-
Therefore, all ravens are black.
Intuitively, \(T\) has strong support. Probabilism might assign \(T\) a number close to one. It is possible that \(\neg T\): some ravens are not black. But given the lack of evidence for \(\neg T\), it would be assigned a low number, close to zero.
Various attempts have been made to formalize probabilism as a theory of inductive logic (Bayes & Price, 1763; Carnap, 1962; Cox, 1946). The most famous form of probabilism, with the closest connection to statistical practice, is Bayesian probabilism. Bayesian probabilism makes use of Bayes’ theorem to update probabilities assigned to theories based on observed evidence. For example, suppose that we start out by assigning \(H\): a high carbohydrate diet causes an increase in body weight a probability of \(0.3\). Nutrition researchers studying this hypothesis conduct a study and find that \(e\): on average, participants on a high carbohydrate diet gained 3 pounds more than those on a low carbohydrate diet. Suppose that the probability of observing \(e\) if \(H\) were true is \(P(e \, | \, H) = 0.8\), and the probability of \(e\) if \(H\) were false is \(P(e \,| \, \neg H) = 0.4\). Bayes’ theorem states that7 \[ \begin{aligned} P(H \,|\, e) &= \frac{P(e \, | \, H)P(H)}{P(e \,|\, H)P(H) + P(e \,| \,\neg H)P(\neg H)} \\ &= \frac{(0.8)(0.3)}{(0.8)(0.3) + (0.4)(0.7)} \approx 0.46. \end{aligned} \]
7 If this formulation of Bayes’ theorem does not look familiar to you, do not worry. We will discuss Bayes’ theorem in detail in ?sec-Bayesian.
Thus, the updated probability of \(H\), based on observing \(e\), is higher. Probabilism aids inductive logic, in the sense that it provides a number that quantifies the strength of the conclusion (i.e., the theory or hypothesis) given the premises (evidence and assumptions).
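The arithmetic above is easy to check. Here is the same update written as a small Python function (the names `prior`, `like_h`, and `like_not_h` are ours, chosen for illustration):

```python
def posterior(prior, like_h, like_not_h):
    """Bayes' theorem for a single hypothesis H and evidence e:
    P(H|e) = P(e|H)P(H) / (P(e|H)P(H) + P(e|~H)P(~H))."""
    return (like_h * prior) / (like_h * prior + like_not_h * (1 - prior))

p = posterior(prior=0.3, like_h=0.8, like_not_h=0.4)
print(round(p, 2))  # prints 0.46
```

Observing \(e\) raises the probability of \(H\) from \(0.3\) to about \(0.46\); observing further supporting evidence would raise it again, with the new posterior serving as the next prior.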
Probabilism also has its problems. Many prominent philosophers and statisticians—Popper and Fisher among them—were vehemently opposed to the use of probability to confirm hypotheses. Popper argued that the degree of confirmation that \(e\) confers on \(H\) is not the same as the probability of \(H\) given \(e\) (Mayo, 2018; Popper, [1959] 2005). Chapman (2015) argues that, contrary to the starting point of probabilism, probability is not equipped to extend deductive logic to reasoning about plausibility (i.e., uncertain reasoning). Fisher wrote that “probability is a ratio of frequencies, and about the frequencies of such [hypotheses] we can know nothing whatever” (Fisher, 1922).8 A primary problem for these thinkers is that probabilism relies on an epistemic interpretation of probability. Such an interpretation allows for probabilities to be assigned to fixed, non-repeatable features of the world. It’s not clear how such probability assignments arise. How did we come up with \(P(H) = 0.3\)? It is not tied to any repeatable process. It seems like we just made it up! For those who reject Bayesian probabilism, all probabilities must arise from empirical phenomena, and ought to be reserved for events that are (at least theoretically) repeatable.
8 We will consider issues related to interpretations of probability—e.g., whether probability just is a ratio of frequencies or not—in ?sec-probability.
As with falsification and frequentist hypothesis testing, we might ask: does Bayesian probabilism provide a solution to the problem of induction? Again, we will not be able to adequately address this question until we study Bayesian inference in more depth, in ?sec-Bayesian. Bayesian inference provides a formal framework for assessing how evidence bears on different hypotheses. Specifically, under the statistical assumptions, Bayesian inference assigns a “plausibility score”, in the form of a probability, to each hypothesis, e.g., \(P(H \,|\, e) \approx 0.46\). These probability assignments represent an important step forward in strengthening inductive inference: if the modeling assumptions are (roughly) met, then the probabilities of various hypotheses can be interpreted as our degrees of belief in those hypotheses. The higher the probability, the more plausible the hypothesis. While this framework does make explicit and precise statements about degrees of belief in various hypotheses, it still assumes that the future will be roughly like the past, i.e., it assumes the UP. But, as we saw in section 2.2, the UP cannot be justified without circularity. So, in failing to avoid the UP, this statistical method has failed to circumvent the problem of induction. Nevertheless, Bayesian methods provide some guidance for belief and action under uncertainty.
Although, strictly speaking, the statistical inference methods described in this chapter do not solve the problem of induction, they go a long way toward placing induction on a stronger foundation. These methods are also quite different from each other. One champions falsification and refutation, and the other assigns probabilities directly to theories and hypotheses. What allows for these differences? Which one provides a stronger foundation for inductive inference? Are there other statistical inference paradigms that do better? The goal of the next few chapters will be to answer these questions.