# loading tests
source("testing_hypothesis_testing.r")
# importing packages
library(tidyverse)
library(haven)
# reading in the data
<- read_dta("../datasets/01_census2016.dta")
census_data <- filter(census_data, !is.na(census_data$wages)) census_data
04 - ECON 325: Hypothesis Testing
Outline
Prerequisites
- Introduction to Jupyter
- Introduction to R
- Introduction to Visualization
- Central Tendency
- Distribution
- Dispersion and Dependence
- Confidence Intervals
Outcomes
After completing this notebook, you will be able to:
- Set up hypotheses to address a research question
- Conduct 1-sample and 2-sample \(t\)-tests to address these questions in the context of population means
- Use the critical value and \(p\)-value approaches to determine whether or not to reject a null hypothesis
- Interpret type I and type II errors in order to explore how sample and population statistics relate
Introduction
In the previous notebook, we covered a fundamental tool in statistics: confidence intervals. In this notebook, we will build on this knowledge and learn about an important inference technique, perhaps one of the most important concepts in elementary statistics: hypothesis testing.
Hypothesis testing allows us to test precise statements about data, using a straight-forward process.
- We first create a hypothesis about some phenomenon (i.e. the relationship between two variables in our dataset).
- We then select a test to determine whether the sample data gives us credible reason to reject this initial hypothesis.
- Finally, we conduct our test and draw conclusions about the validity of our hypothesis.
This is a very high-level summary of hypothesis testing: we will dive into the concept in much more detail throughout this notebook; along the way, we will rely on some helpful built-in functions in R to make this process more convenient.
However, as you go through this notebook, pay careful attention to not just the mechanics but also the logic of hypothesis testing. This is perhaps the single most important concept in introductory econometrics, so a careful understanding of this material will serve you well in future courses and beyond. Let’s get started!
The Hypothesis Testing Procedure
A hypothesis test always involves two hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis (\({H_0}\)) expresses a “default” claim which is to be tested, while the alternative hypothesis (\({H_1}\)) expresses the contrary to the null hypothesis. Typically, our alternative hypothesis expresses what we may hope to prove about our data.
Example: Perhaps we suspect that the mean wage of Canadian men is greater than $50,000 per year. As a default claim, we will assert the null hypothesis that the mean wage of Canadian men is less than or equal to 50,000. Our alternative hypothesis will then express that the mean wage of Canadian men is greater than 50,000. If we find sufficient evidence in the data to reject the null hypothesis, we can argue with a certain degree of certainty that we should favour this alternative hypothesis. If we don’t find this strong evidence, we fail to reject the null hypothesis (and our suspicion is probably false).
To determine whether we should reject the null hypothesis in favour of the alternative hypothesis, we need two key features:
- A significance level (denoted by \(\alpha\)) which is the probability which determines the criterion for deciding if a sample statistic is “unlikely” if the null hypothesis is true.
- A test statistic which is number we calculate from our data: this is usually a function of various features of that data such as its mean, standard deviation, and sample size.
Together, these two features provide the criterion under which we can accept or reject our null hypothesis. We can implement these using the following approaches:
There are two common approaches we can use when testing a hypothesis: the critical value approach (rejection region) and \(p\)-value approach. Both have their uses, and we will demonstrate both in this notebook. They also have a series of steps, some of which they share in common.
Steps in Hypothesis Testing
Let’s now concretely explore the steps in both approaches. Steps 1-3 apply identically to both the critical value and \(p\)-value approaches, and are always followed. Step 4, the interpretation step, diverges between the two approaches.
We will start with one important type of test: the one sample \(t\)-test. This kind of test is used to evaluate statements about whether the population average is equal to a particular value - for instance, our example above with average wages being greater than $50,000. This test is appropriate in situations where:
- The statistic is normally distributed: in the case of the sample mean, when \(n > 120\), invoking the Central Limit Theorem for normality.
- We don’t know the population standard deviation of the variable we are testing.
This is very similar to when we constructed confidence intervals for a sample mean when we didn’t know the population standard deviation in the previous notebook. While a common type of test, this is just one of many possibilities for constructing a hypothesis test; however, we will use it as our example for brevity’s sake as we go through the general hypothesis testing procedure.
Tip: Wikipedia actually has quite a useful article containing a chart of Common Hypothesis Tests for different kinds of statistics.
Our Example
Let’s work with our Census data, and suppose that our census data represents the entire Canadian population and we have no prior knowledge of it. That is, pretend we do not observe any population values from our census data, just like in real life where it is impossible to observe population parameters! We will then randomly select a sample of observations from our census data (the population) to represent our sample. We can then test to see if the average wage in our sample data is equal to the hypothesized average wage of the population as a whole. Let’s draw a random sample first!
set.seed(123) # ensures the reproducibility of our code
# (we get the same sample if we start with that same seed each time you run the same process)
<- census_data %>%
sample_data slice_sample(n = 100, # number of observations to sample from the population
replace = FALSE) # without replacement
Step 1: State the Null Hypothesis and Alternative Hypothesis
The null hypothesis that a population mean wage \(\mu^{wage}\) is equal to a certain value \(\mu_{0}\) is:
\[ {H_0}: \mu^{wage} = \mu_{0} \]
At this point, we have 3 choices for how to formulate our alternative hypothesis:
- Two-Sided Test: If we want the rejection of the null hypothesis to allow us to argue that \(\mu^{wage}\) is different from the specific value \(\mu_{0}\), then we can express our alternative hypothesis as:
\[ {H_1}: \mu^{wage} \neq \mu_{0} \]
- One-Sided Test (Left-Tailed): If we want the rejection of the null hypothesis to allow us to argue that \(\mu^{wage}\) is less than the specific value \(\mu_{0}\), then we can express our alternative hypothesis as:
\[ {H_1}: \mu^{wage} < \mu_{0} \]
- One-Sided Test (Right-Tailed): If we want the rejection of the null hypothesis to allow us to argue that \(\mu^{wage}\) is greater than the specific value \(\mu_{0}\), then we can express our alternative hypothesis as:
\[ {H_1}: \mu^{wage} > \mu_{0} \]
In all of these hypotheses, and for hypotheses in general, it is important to remember that we construct our hypotheses about population parameters, not sample statistics. This is because the purpose of the hypothesis testing procedure is for us to make inferences about a population. When we collect a sample, we can immediately calculate its sample mean, variance or other features. There is thus no use to construct hypotheses about these parameters! Further, while the example above considers hypotheses about a population mean, we can make hypotheses about a population variance, proportion, or various other parameters of interest. The population mean is just the default we are considering since we are focusing our attention on the one-sample \(t\)-test for now.
Let’s suppose that the literature on labour market earnings presents a null hypothesis that the mean wage of Canadians is $54,000 per year. We will set this up against a two-sided alternative, since this is a more stringent alternative hypothesis and requires us to have more certainty in our results in order to reject the null (as explained above).
\[ H_{0}: \mu = 54000 \] \[ H_{1}: \mu \neq 54000 \]
Exercise
Suppose that you are investigating the mean years of education among all citizens in a country who are over the age of 18. Let’s say you want to find support for the idea that the mean years of education among adults in this country equals 12, indicating that the average years of schooling accumulated per person amounts to some degree of post-secondary education. Think about how you would set up your null hypothesis. Which of the following is NOT a correct alternative hypothesis?
- $ {H_1}: ^{education} < 12 $
- $ {H_1}: ^{education} $
- $ {H_1}: ^{education} > 12 $
<- ... # fill in th e.. with your answer of either 1, 2 or 3
answer_0 test_0()
Step 2: Choose a Significance Level \(\alpha\)
Before any calculation of test statistics, we must choose a significance level. As a reminder, this is the probability of seeing a sample statistic at least as extreme as the one we find from our data when we assume that our null hypothesis is actually true. We most commonly set our significance level at 0.05, or 5%, but other common values include 1%, 10% and even 20%. You will understand the importance of this number more when we reach the interpretation stage.
Tip: Remember in the previous notebook on confidence intervals that the confidence level is denoted as \(1 - \alpha\). Here, \(\alpha\) itself is the significance level, meaning that the confidence level and significance level add up to 1. It is important not to use these two terms interchangeably!
We will choose our confidence level to be 5% since this is a common standard in applied economic work:
\[ \alpha = 0.05 \]
Step 3: Compute the Test Statistic
This is the most mathematical step, requiring some calculation. Our test statistic gives us a numeric benchmark at which we can reject our null hypothesis in Step 4. Calculating the test statistic is quick, but it is important to understand the intuition behind it and how it is derived.
When we calculate our test statistic with the one-sample \(t\)-test we use the following approach:
- We take our sample statistic
- We subtract it from the mean of our sampling distribution
- We then divide this value by the standard deviation of our sampling distribution
In general, this process will differ slightly depending on the situation we find ourselves in. The type of parameter we are making inferences about, as well as our sample size and the shape of our population distribution will all play important roles in determining how exactly we calculate the mean and standard deviation of our desired sampling distribution; however, the general process outlined above will always hold for calculating a test statistic.
As noted, we will look below for calculating the test statistic for one case: one sample \(t\)-tests.
Since we don’t know the standard deviation of our population but do know that the distribution of our sampling statistic is normal (since the sample size is > 30), we calculate our test statistic using the following one sample \(t\)-statistic:
\[
\frac{\bar x - \mu_{0}}{(s / \sqrt n)}
\]
where \(\bar x\) is the sample mean we have found, \(\mu_{0}\) is the population mean we are assuming to be true under the null hypothesis \(H_{0} : \mu = \mu_{0}\), \(s\) is the sample standard deviation and \(n\) is the sample size.
Again, the test statistic will be calculated using many different formulas depending on the test being performed, what we know about it, the size of our sample, and whether our hypothesis is about a population mean, proportion, or variance.
# Compute the t-statistic/critical value for the one sample t-test
<- (mean(sample_data$wages) - 54000)/(sd(sample_data$wages)/sqrt(100))
t_stat t_stat
Step 4: Interpret the Results
This the final step of our hypothesis testing. It allows us to conclusively reject or fail to reject the null hypothesis we have set up. Importantly, this is the first step in the hypothesis testing process that allows us to choose one of a couple of methods to arrive at the same answer. We can either use the critical value approach or the \(p\)-value approach. Let’s look at each of them individually.
The Critical Value (or Rejection Region) Approach:
In this approach, we compare our calculated test statistic to a critical value (or values) corresponding to our chosen significance level. The critical value serves as the cutoff point beyond which we reject our null hypothesis. This means that if our calculated test statistic is more extreme than the critical value (situated more within the tail of the investigated distribution), we reject our null hypothesis. If the test statistic is instead within these bounds, we fail to reject our null hypothesis.
How are critical values computed? Depending on our test, we determine a critical value by determining what values of statistics have an \(\alpha\)-percent chance of being more extreme than the critical value. These values are called the rejection region and are specific to the test. The diagrams below illustrate this process.
# alt text for accessibility
<- "A plot visualizing the left-tailed rejection region in a probability distribution curve."
alt_text
draw_cr("left-tailed", df = 30, cv = -2)
mtext(alt_text, side=3, line=2.75, cex=1.1)
In this one-sided test, our null hypothesis that \(\mu \leq\mu_0\), and the alternative is that \(\mu > \mu_0\). The value which had an \(\alpha\)-percent of lying above it is called the critical value, and the red region represents the rejection region. The first diagram shows us where we can reject a null hypothesis such as \(H_{0}:\mu = \mu_{0}\) in favour of the alternative hypothesis \(H_{1}:\mu < \mu_{0}\)”
# alt text for accessibility
<- "A plot visualizing the right-tailed rejection region in a probability distribution curve."
alt_text
draw_cr("right-tailed", df = 30, cv = 2)
mtext(alt_text, side=3, line=2.75, cex=1.1)
This second diagram shows us where we can reject a null hypothesis such as \(H_{0}:\mu = \mu_{0}\) in favour of the alternative hypothesis \(H_{1}:\mu > \mu_{0}\). In either case, the red region is our rejection region. If our calculated test statistic falls in this region, it is “more extreme” than the critical value corresponding to our chosen significance level. This signals to us that we can can reject the null hypothesis in favour of the alternative hypothesis. If our calculated test statistic falls within the white region, we fail to reject our null.
# alt text for accessibility
<- "A plot visualizing the two-tailed rejection region in a probability distribution curve"
alt_text
draw_cr("two-tailed", df = 30, cv = 2)
mtext(alt_text, side=3, line=2.75, cex=1.1)
The third diagram, on the other hand, shows us how to use the critical value approach to choose whether or not to reject a null hypothesis for a two-sided test. There are now two red regions for this type of test, indicative of the fact that our alternative hypothesis is now \(H_{1}: \mu \neq \mu_{0}\). Again, if our calculated test statistic falls within either of these rejection regions, we reject our null hypothesis. If it falls within the white region, we fail to reject our null. Pay attention to the fact that in all three cases, the area of the total red region equals \(\alpha\), our chosen significance level.
However, for the two-sided test, this area is split up. Considering that our test statistic will deterministically fall near (or in) either the left-hand rejection region or right-hand region, the probability that it falls within that specific region itself is now half as likely as it was for the one-sided tests (the probability is now \(\alpha / 2\)). In this sense, the two-sided test is more conservative. It is less likely for our calculated test statistic to fall inside the rejection region and allow us to reject our null hypothesis.
# examples of how to compute critical values for different types of test
# suppose the significance level is 0.05
# finding the lower and upper critical values for a two-sided test
qt(p=0.025, df=8, lower.tail=TRUE)
qt(p=0.025, df=8, lower.tail=FALSE)
# finding the critical value for a left-sided test
qt(p=0.05, df=8, lower.tail=TRUE)
# finding the critical value for a right-sided test
qt(p=0.05, df=8, lower.tail=FALSE)
Let’s compute the rejection regions for our sample data:
# finding the lower and upper critical values for our sample data
qt(p=0.025, df=99, lower.tail=TRUE)
qt(p=0.025, df=99, lower.tail=FALSE)
We can see from the above that our test statistic of about 0.72 fits nicely within the upper and lower bound critical values of -1.98 and 1.98. Thus, again, it is very likely we see the sample statistic we have actually found conditioned on the null hypothesis being true. We thus do not have strong evidence to reject the null hypothesis. Further, we say that our sample mean is not statistically significant. Statistically significant results are those which we find upon rejecting the null hypothesis.
The \(p\)-value Approach:
In this approach, we again use our test statistic to make inferences about our population. However, we no longer rely on the diagrams above. In this approach, we instead calculate what is called a \(p\)-value. The \(p\)-value is the probability of observing a value at least as extreme as the test statistic if \({H_0}\) is true. \(p\)-value lies between 0 and 1 because it is a probability. For example, a \(p\)-value of 0.05 would mean that if the \({H_0}\) is true, there is a 5% chance of getting a test-statistic as extreme or more extreme than the one we obtained.
Small \(p\)-values provide evidence to reject the null hypothesis, since they indicate that the observed data is not likely if the hypothesis was true. Think of it this way, would you be more happy if I say that there is a 10% chance of seeing this test statistic if \({H_0}\) was true or if I said there is a 1% chance of seeing this test statistic if \({H_0}\) was true. I think you would be more happy if there was a 1% chance because that would mean that it is more unlikely to see this test statistic if \({H_0}\) was true which means that we can be more confident that \({H_0}\) is not true. Recall that in hypothesis testing we are trying to find evidence to reject the null hypothesis. A lower \(p\)-value provide evidence to reject the null hypothesis. Formally, if the \(p\)-value is less than or equal to our significance level \(\alpha\), the null hypothesis is rejected; otherwise, the null hypothesis is not rejected.
For example,
To find the \(p\)-value associated with a t-score in R, we can use the pt()
function:
# Examples of how to compute p values for different types of test
# If the t-score/critical value is 1.8 for a right-tailed test
# find p-value if the degrees of freedom is 30
pt(q=1.8, df=30, lower.tail=FALSE)
# If the t-score/critical value is 1.8 for a two-tailed test
#find two-tailed p-value if the degrees of freedom is 30
2*pt(q=1.8, df=30, lower.tail=FALSE)
Let’s compute the p-value for our sample data:
# Compute the p-value
2*pt(q=t_stat, df=99, lower.tail=FALSE)
Our \(p\)-value is about 0.47, which is much larger than our confidence level of 0.05. This means that, assuming the null is true, it is very likely that we see a value as extreme as our sample mean. It is thus not bizarre to imagine pulling such a sample statistic when the null of 54000 is in fact true. This causes us to fail to reject the null hypothesis.
Exercise
Let’s say that you choose a 5% significance level and conduct a one sample \(t\)-test (since you’re testing a hypothesis about the mean of a single population for which you don’t know the standard deviation). You receive a \(p\)-value of 0.02 and correctly reject your null hypothesis. Have you proved that your null hypothesis is false?
<- "x" # your answer of "yes" or "no" in place of "x" here
answer_1 test_1()
Let’s now run through the hypothesis testing procedure more quickly with a few examples, but this time through an automation process with t.test()
. It performs one and two sample t-tests on vectors of data.
Applications of the Procedure with the R function t.test()
Example 1: One Sample \(t\)-test
Recall from our previous example:
\[ H_{0}: \mu = 54000 \] \[ H_{1}: \mu \neq 54000 \]
# conduct one sample t-test
t.test(x = sample_data$wages, mu = 54000, alternative = "two.sided", conf.level = 0.95)
Now we may go back and check if the t statistic and \(p\)-value here match with our manual calculation above. It matches!
The t.test()
function in R is super helpful in that it allows us to either reject or fail to reject the null hypothesis immediately. This is because it outputs a \(p\)-value and test statistic immediately. In our sample data, our large \(p\)-value and non-extreme test statistic prevent us from rejecting the null. If we received much different results (say a \(p\)-value < 0.05 or test statistic very large in magnitude), we would say that the probability of finding the particular sample mean under the null being true is incredibly unlikely. This sample mean would thus be a statistically significant result which we could try with a high degree of uncertainty, allowing us to reject the null in favour of the alternative hypothesis.
One final crucial point about the \(p\)-value: In this example, the \(p\)-value was about 0.47. This does not mean that the probability the null hypothesis is true is 47%. It instead means that, conditioning on the null hypothesis being true, the probability of seeing a sample mean at least as far away from 54000 is 47%. This is why we cannot reject the null and did not have statistically significant results. It is quite likely to pull a sample mean this large by chance under the given situation.
Example 2: Two Sample \(t\)-test
Next let us look at the two-sample \(t\)-test. Unlike the one-sample \(t\)-test where we use a sample mean point estimate to test a hypothesis about a population mean, the two-sample \(t\)-test uses two sample means to test a hypothesis about the difference between two population means. More specifically, in this case, the two sample \(t\)-test will test whether the means of two independent populations differ from each another. We will specifically use the two sample unpooled \(t\)-test with unequal variances, which is used to tests hypotheses about the difference between population means when we know: 1. That both populations are normally distributed (or the sum of their sample sizes exceeds 40, invoking normality), 2. That the observations are independent between the two groups (i.e. observations are not paired between populations), 3. And we assume that both population standard deviations, while unknown, are different.
For this example, we will test the hypothesis that there is no difference between the mean wages of Canadians and Non-Canadians. We will set this up against a two-sided alternative.
\[ H_{0}: \mu_{Canadian} = \mu_{Non-Canadian} \] \[ H_{1}: \mu_{Canadian} \neq \mu_{Non-Canadian} \]
We will again set our significance level at 5%.
\[ \alpha = 0.05 \]
Again, we will assume our census data represents our population and take two random samples from it, each of which will consistent exclusively of Canadians or Non-Canadians.
set.seed(123) # ensures the reproducibility of our code (we get the same sample if we start with that same seed each time you run the same process)
<- census_data %>%
sample_cad filter(immstat == 1) %>%
slice_sample(n = 100, # number of observations to sample from the population
replace = FALSE) # without replacement
<- census_data %>%
sample_noncad filter(immstat == 2) %>%
slice_sample(n = 100, # number of observations to sample from the population
replace = FALSE) # without replacement
For fun, let’s look at our sample statistics.
mean(sample_cad$wages)
mean(sample_noncad$wages)
We can already see a large difference in mean wages between Canadians and Non-Canadians here. However, we will have to conduct our \(t\)-test to determine whether or not this difference is statistically significant and thus whether or not we can reject our null hypothesis.
# conducting our two sample t-test
t.test(x=sample_cad$wages, y=sample_noncad$wages, conf.level=0.95)
Our \(t\)-test yields a \(p\)-value of about 0.1478, greater than our significance level of 0.05. Thus, our result is not statistically significant and we cannot reject our null hypothesis; however, this reveals nothing about why this is the case and does not control for any relevant factors. You will learn more about this in upcoming courses.
We ran this \(t\)-test on two independent populations (Canadians and Non-Canadians). Alternatively, if we want to compare the means of dependent populations and test whether or not they are the same, we can employ the y ~ x
option to our t.test()
function. The variable on the left-hand side of the tilde (~
) is the dependent variable, while the variables on the right-hand side is the independent variable (or variables). We also need to specify within the t.test()
function arguments to the options paired
and var.equal
. Both of these are set to FALSE by default, but we can change one or both of them to TRUE if we believe that our two samples come in pairs (a specific case of dependent samples) or the variances of the two populations are equal.
Note: The dependent sample t-test is also called the paired sample t-test. A perfect example of two dependent samples would be before-treatment and after-treatment patient groups in medical research.
# Let's create some fake data to demonstrate paired sample t-test
<-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
before <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
after
<- data.frame(
df group = rep(c("before", "after"), each = 10),
weight = c(before, after)
)
t.test(weight ~ group,
data = df,
paired = TRUE,
conf.level=0.95)
Exercise
Suppose you instead want to compare the mean earnings of those who have and have not graduated from high school. You determine that these are independent populations. Further, even though you don’t know the population standard deviations of earnings in each group, you determine that these standard deviations must not be the same, arguing that there is a wider spread of earnings among those who graduated high school. For these reasons, you conduct an unpooled, unequal variance two sample \(t\)-test, the type of two sample \(t\)-test we explored earlier in our applications. You then set up the following hypotheses.
\[ H_{0}: \mu_{graduated} = \mu_{didn't \ graduate} \] \[ H_{1}: \mu_{graduated} \neq \mu_{didn't \ graduate} \]
You then choose a significance level of 5%, the default level used. Suppose a friend instead sets up a one-sided alternative, namely that \(\mu_{graduated} > \mu_{didn't \ graduate}\). Assuming the null hypothesis, significance level, sample data and type of test used are identical for both you and your friend, who is more likely to receive statistically significant results?
<- "" # your answer for "you" or "your friend" in place of "x" here
answer_2 test_2()
Moving forward with your two-sided hypothesis test, you find a sample mean statistic of 60000 for those who graduated high school and 25000 for those who didn’t graduate high school. You find for your chosen significance level and distribution of sample means for each population that the resulting test statistic in your test is 1.5, while the critical values from the student’s t-distribution are -2 and 2 respectively. Should you reject the null hypothesis that there is no statistically significant difference between the mean earnings of each population?
<- "x" # your answer for "yes" or "no" in place of "x" here
answer_3 test_3()
Applications of Hypothesis Testing with Pearson’s Correlation Coefficient
Pearson Correlation Test
Another parameter we can make hypotheses about is the correlation coefficient. We can use hypothesis testing to test inferences about the correlation between two variables by analyzing random samples. Let’s do this with wages
and mrkinc
. Recall from the Dependence and Dispersion Notebook that two variables are highly positively correlated if their correlation coefficient is close to 1, while they are highly negatively correlated if it is close to -1. Let’s suppose that we have reason to believe that wages
and mrkinc
are quite correlated (hence their correlation coefficient is not 0). To find support for this, we will set this up as an alternative hypothesis to be supported after rejecting the null hypothesis that there is no correlation. In this way, we have to work to reject our null hypothesis in order to find support for this belief. Let’s set up the hypotheses below.
\[ {H_0}: r = 0 \] \[ {H_1}: r \neq 0 \]
where \(r\) is the population correlation coefficient between the wages and market income of Canadians. Let’s further choose our significance level to be the default 5% (95% confidence level).
\[ \alpha = 0.05 \]
Let’s now look at our sample statistic (sample correlation coefficient) to shed some light on the number whose significance we will be testing in our hypothesis test.
# finding the cor() between wages and mrkinc, including use="complete.obs" to remove NA entries
cor(census_data$wages, census_data$mrkinc, use="complete.obs")
This correlation coefficient appears to be quite different from 0, giving us some reason to believe we will likely be able to reject the null hypothesis in favour of our alternative hypothesis of some relationship between wages
and mrkinc
(in this case, a possibly very strongly positive relationship). However, there is always the small chance that we happen to have pulled a sample with a strong correlation which does not otherwise exist. To guard against this error of a false positive, we will conduct a Pearson Correlation test. Luckily, instead of having to calculate a test statistic and then calculate critical values or a \(p\)-value, we can invoke the cor.test()
function.
# Pearson correlation test
cor.test(census_data$wages, census_data$mrkinc, use="complete.obs")
The correlation test yields an incredibly small \(p\)-value of 2.2e-16 < \(\alpha\) = 0.05. Thus, we see that this correlation is statistically significant and reject the null hypothesis in favour of the alternative hypothesis that the true correlation coefficient is not zero.
Type I and Type II Errors
One thing that is crucial to remember is that our hypothesis test may not always be correct. While a hypothesis test provides strong evidence for us to reject or fail to reject a null hypothesis, it is not concrete. That is why we never say that we “accept” the null hypothesis, instead preferring to say that we “fail to reject” the null hypothesis when no strong evidence exists against it. Similarly, we never say that we “accept” the alternative hypothesis, only that we “reject the null hypothesis in favour of the alternative hypothesis”. Neither hypothesis can conclusively be proven as true or false. Otherwise, we could calculate these parameters directly and there would be no need for constructing hypotheses about them!
Due to this lack of concreteness, we may occasionally make incorrect decisions about rejecting or failing to reject a null hypothesis. These errors are called type I errors and type II errors. A type I error occurs when we incorrectly reject a true null hypothesis, otherwise known as a “false positive”. This can occur when we draw a sample statistic which appears incredibly unlikely under the null hypothesis and then falsely assume that means our null hypothesis is incorrect. In reality, that sample statistic could have just been an unlikely pull under a true null hypothesis. Type II errors, on the other hand, occur when we to fail to reject a false null hypothesis, otherwise known as a “false negative”. This can occur when we pull a sample statistic which is seemingly reasonable under our null hypothesis and falsely assume that we cannot reject the null. In reality, that sample statistic could have just been an unlikely pull which would have otherwise encouraged us to reject the null.
The probability of making a type I error is denoted by \(\alpha\) and is the significance level of our test. Conversely, the probability of correctly failing to reject a true null hypothesis is \(1 - \alpha\). The probability of a type II error is denoted by \(\beta\), while the probability of correctly rejecting a false null hypothesis is \(1 - \beta\) and is known as the power of the test. A higher confidence level (corresponds to lower significance value) reduces the chances of getting a false positive and increases the chances of getting a false negative. In other words, for a fixed sample size, the smaller the \(\alpha\), the larger the \(\beta\). This indicates that there is a constant tradeoff betweeen making type I and II errors. As well, it is important to remember that we select our significance level and hence the probability of falsely rejecting a true null hypothesis before we even calculate our test statistic (it is Step 2). Conversely, we can never select for the probability of failing to reject a false null, \(\beta\). This probability instead emerges in the testing process.
Exercise
Suppose you choose a 5% significance level and conduct a one sample t-test with \(p\)-value of 0.13 and correctly reject the null hypothesis, but then concludes that the results are not statistically significant. What error have you made?
<- "x" # your answer of "type 1" or "type 2" in place of "x" here
answer_4 test_4()