{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 1.4.2 - Beginner - Hypothesis Testing (325)\n", "\n", "COMET Team
*Oliver (Junye) Xu, Colby Chambers, Jonathan Graves,\n", "Jasmine Arora* \n", "2023-01-12\n", "\n", "## Outline\n", "\n", "### Prerequisites\n", "\n", "- Introduction to Jupyter\n", "- Introduction to R\n", "- Introduction to Visualization\n", "- Central Tendency\n", "- Distribution\n", "- Dispersion and Dependence\n", "- Confidence Intervals\n", "\n", "### Outcomes\n", "\n", "After completing this notebook, you will be able to:\n", "\n", "- Set up hypotheses to address a research question\n", "- Conduct 1-sample and 2-sample $t$-tests to address these questions\n", " in the context of population means\n", "- Use the critical value and $p$-value approaches to determine whether\n", " or not to reject a null hypothesis\n", "- Interpret type I and type II errors in order to explore how sample\n", " and population statistics relate\n", "\n", "## Introduction\n", "\n", "In the previous notebook, we covered a fundamental tool in statistics:\n", "*confidence intervals*. In this notebook, we will build on this\n", "knowledge and learn about an important inference technique, perhaps one\n", "of the most important concepts in elementary statistics: **hypothesis\n", "testing**.\n", "\n", "Hypothesis testing allows us to test precise statements about data,\n", "using a straightforward process.\n", "\n", "1. Create a hypothesis about some phenomenon (e.g. the relationship\n", " between two variables in our dataset).\n", "2. Select a test to determine whether the sample data gives us credible\n", " reason to reject this initial hypothesis.\n", "3. 
Conduct the test and draw conclusions about the validity of our\n", " hypothesis.\n", "\n", "This is a very high-level summary of hypothesis testing: we will dive\n", "into the concept in much more detail throughout this notebook; along the\n", "way, we will rely on some helpful built-in functions in R to make this\n", "process more convenient.\n", "\n", "However, as you go through this notebook, pay careful attention to not\n", "just the mechanics but also the **logic** of hypothesis testing. This is\n", "perhaps the single most important concept in introductory econometrics,\n", "so a careful understanding of this material will serve you well in\n", "future courses and beyond. Let’s get started!" ], "id": "81ed7120-c392-48c1-a3af-84c71292cb15" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# loading tests\n", "source(\"beginner_hypothesis_testing(325)_tests.r\")\n", "\n", "# importing packages\n", "library(tidyverse)\n", "library(haven)\n", "\n", "# reading in the data and dropping observations with missing wages\n", "census_data <- read_dta(\"../datasets_beginner/01_census2016.dta\")\n", "census_data <- filter(census_data, !is.na(census_data$wages))" ], "id": "8382ade5-04b4-4eb3-b014-f4ba2a5daccc" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Hypothesis Testing Procedure\n", "\n", "A hypothesis test always involves two hypotheses: the **null\n", "hypothesis** and the **alternative hypothesis**.\n", "\n", "- The null hypothesis (${H_0}$) expresses a “default” claim which is\n", " to be tested\n", "\n", "- The alternative hypothesis (${H_1}$) expresses the contrary of the null\n", " hypothesis. 
Typically, our alternative hypothesis expresses what we\n", " may hope to find support for in our data.\n", "\n", "> **Example**: Perhaps we suspect that the mean wage of Canadian men is\n", "> greater than \\$50,000 per year.\n", ">\n", "> - Null hypothesis (${H_0}$): The mean wage of Canadian men is less\n", "> than or equal to 50,000.\n", ">\n", "> - Alternative hypothesis (${H_1}$): The mean wage of Canadian men is\n", "> greater than 50,000.\n", ">\n", "> - If we find sufficient evidence in the data to reject the null\n", "> hypothesis, we can argue with a certain degree of certainty that\n", "> we should favour this alternative hypothesis. If we don’t find\n", "> this strong evidence, we fail to reject the null hypothesis (and\n", "> our suspicion remains unsupported by this sample).\n", "\n", "To determine whether we should reject the null hypothesis in favour of\n", "the alternative hypothesis, we need two key features:\n", "\n", "1. A **significance level** (denoted by $\\alpha$):\n", " - The probability which determines the criterion for deciding if a\n", " sample statistic is “unlikely” if the null hypothesis is true.\n", "2. A **test statistic**:\n", " - The number we calculate from our data: this is usually a\n", " function of various features of that data such as its mean,\n", " standard deviation, and sample size.\n", "\n", "Together, these two features provide the criterion under which we\n", "reject or fail to reject our null hypothesis. 
We can apply this criterion using\n", "two common approaches:\n", "\n", "- The **critical value approach (rejection region)**\n", "\n", "- The **$p$-value approach**\n", "\n", "Both have their uses, and we will demonstrate both in this notebook.\n", "They follow a series of steps, most of which they share.\n", "\n", "## Steps in Hypothesis Testing\n", "\n", "- Steps 1-3 apply identically to *both* the critical value and\n", " $p$-value approaches\n", "\n", "- Step 4, the interpretation step, diverges between the two\n", " approaches.\n", "\n", "We will start with one important type of test: the **one sample**\n", "$t$-test. This kind of test is used to evaluate statements about how\n", "the population average compares to a particular value - for instance,\n", "our example above with average wages being greater than \\$50,000. This\n", "test is appropriate in situations where:\n", "\n", "1. The statistic is **normally distributed**: in the case of the sample\n", " mean, when $n > 30$, invoking the Central Limit Theorem for\n", " normality.\n", "2. 
We **don’t know the population standard deviation** of the variable\n", " we are testing.\n", "\n", "This is very similar to when we constructed confidence intervals for a\n", "sample mean when we didn’t know the population standard deviation in the\n", "previous notebook.\n", "\n", "> **Tip**: Wikipedia actually has quite a useful article containing a\n", "> chart of [Common Hypothesis\n", "> Tests](https://en.wikipedia.org/wiki/Test_statistic) for different\n", "> kinds of statistics.\n", "\n", "### Our Example\n", "\n", "Let’s work with our Census data, and suppose that our census data\n", "represents the **entire Canadian population** and we have no prior\n", "knowledge of it.\n", "\n", "- Let’s pretend we do not observe any population values from our\n", " census data, just like in real life where it is impossible to\n", " observe population parameters!\n", "\n", "- We will randomly select a sample of observations from our census\n", " data (the population) to represent our sample.\n", "\n", "- We can then test to see if the average wage in our sample data is\n", " equal to the hypothesized average wage of the population as a whole.\n", " Let’s draw a random sample first!" 
], "id": "39f01c1c-4776-4c04-a10f-daff171ff978" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "set.seed(123) # ensures the reproducibility of our code\n", "# (we get the same sample each time we run this cell with the same seed)\n", "\n", "sample_data <- census_data %>% \n", " slice_sample(n = 100, # number of observations to sample from the population\n", " replace = FALSE) # without replacement" ], "id": "eb73fbed-1b5b-45c8-9600-94298078adfa" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: State the Null Hypothesis and Alternative Hypothesis\n", "\n", "The null hypothesis that a population mean wage $\\mu^{wage}$ is equal to\n", "a certain value $\\mu_{0}$ is:\n", "\n", "$$\n", "{H_0}: \\mu^{wage} = \\mu_{0}\n", "$$\n", "\n", "At this point, we have 3 choices for how to formulate our alternative\n", "hypothesis:\n", "\n", "1. **Two-Sided Test**: If we want the rejection of the null hypothesis\n", " to allow us to argue that $\\mu^{wage}$ is different from the\n", " specific value $\\mu_{0}$, then we can express our alternative\n", " hypothesis as:\n", "\n", "$$\n", "{H_1}: \\mu^{wage} \\neq \\mu_{0}\n", "$$\n", "\n", "2. **One-Sided Test (Left-Tailed)**: If we want the rejection of the\n", " null hypothesis to allow us to argue that $\\mu^{wage}$ is less than\n", " the specific value $\\mu_{0}$, then we can express our alternative\n", " hypothesis as:\n", "\n", "$$\n", "{H_1}: \\mu^{wage} < \\mu_{0}\n", "$$\n", "\n", "3. 
**One-Sided Test (Right-Tailed)**: If we want the rejection of the\n", " null hypothesis to allow us to argue that $\\mu^{wage}$ is greater\n", " than the specific value $\\mu_{0}$, then we can express our\n", " alternative hypothesis as:\n", "\n", "$$\n", "{H_1}: \\mu^{wage} > \\mu_{0}\n", "$$\n", "\n", "Note: We should always construct our hypotheses about *population\n", "parameters*, not sample statistics (i.e. the sample mean, variance or\n", "other features that can be calculated directly from the sample).\n", "\n", "We can make hypotheses about a population variance, proportion, or\n", "various other parameters of interest. The population mean is just the\n", "default we are considering since we are focusing our attention on the\n", "one-sample $t$-test for now.\n", "\n", "Let’s take an example of the kind found in the literature on labour\n", "market earnings:\n", "\n", "- A null hypothesis that the mean wage of Canadians is \\$54,000 per\n", " year.\n", "\n", "- We will set this up against a two-sided alternative, the more\n", " stringent alternative hypothesis that requires more certainty in\n", " findings to reject the null (as explained in Step 4).\n", "\n", "$$\n", "H_{0}: \\mu = 54000\n", "$$ $$\n", "H_{1}: \\mu \\neq 54000\n", "$$\n", "\n", "### Exercise\n", "\n", "Aim: Investigate the mean years of education among all citizens in a\n", "country who are over the age of 18.\n", "\n", "Hypothesis: The average years of education among adults in this country\n", "is 12 years (some degree of post-secondary education).\n", "\n", "Think about how you would set up your null hypothesis. Which of the\n", "following is *NOT* a correct alternative hypothesis?\n", "\n", "1. ${H_1}: \\mu^{education} < 12$\n", "2. \\$ {H_1}: ^{education} \\$\n", "3. ${H_1}: \\mu^{education} > 12$" ], "id": "99c489ee-88a9-474f-bad0-809ca9c06325" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer_0 <- ... # fill in the ... 
with your answer of either 1, 2 or 3 \n", "test_0()" ], "id": "1ff31982-0de2-4ea3-befe-f350448cfa3d" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Choose a Significance Level $\\alpha$\n", "\n", "Before any calculation of test statistics, we must choose a\n", "**significance level**.\n", "\n", "- This is the probability threshold below which we treat a sample\n", " statistic as too unlikely to have arisen if our null hypothesis is\n", " actually true. Equivalently, it is the probability of rejecting the\n", " null hypothesis when it is in fact true.\n", "\n", "- We most commonly set our significance level at 0.05, or 5%, but\n", " other common values include 1%, 10% and even 20%.\n", "\n", "> **Tip**: Remember that confidence level is denoted as $1 - \\alpha$.\n", "> Here, $\\alpha$ itself is the significance level, meaning that the\n", "> **confidence level and significance level add up to 1**. It is\n", "> important not to use these two terms interchangeably!\n", "\n", "We will choose our significance level to be 5% since this is a common\n", "standard in applied economic work:\n", "\n", "$$\n", "\\alpha = 0.05\n", "$$\n", "\n", "### Step 3: Compute the Test Statistic\n", "\n", "This is the most mathematical step, requiring some calculation. Our\n", "**test statistic** gives us a number that we compare against a benchmark\n", "to decide whether to reject our null hypothesis in Step 4. Calculating\n", "the test statistic is quick, but it is important to understand the\n", "intuition behind it and how it is derived.\n", "\n", "When we calculate our test statistic with the one-sample $t$-test we use\n", "the following approach:\n", "\n", "1. Take our sample statistic\n", "2. Subtract from it the mean of the sampling distribution under the\n", " null hypothesis\n", "3. Divide this value by the standard deviation of our sampling\n", " distribution\n", "\n", "The general process outlined above will always hold for calculating a\n", "test statistic. 
However, determining how exactly we calculate the mean\n", "and standard deviation of our desired sampling distribution will differ\n", "slightly depending on the situation, the type of parameter we are making\n", "inferences about, as well as our sample size and the shape of our\n", "population distribution.\n", "\n", "As noted, we will focus below on calculating the test statistic for one\n", "case: the one sample $t$-test.\n", "\n", "Since we don’t know the standard deviation of our population but do know\n", "that the distribution of our sampling statistic is normal (since the\n", "sample size is \\> 30), we calculate our test statistic using the\n", "following **one sample** $t$-statistic:\n", "\n", "$$\n", "t = \\frac{\\bar x - \\mu_{0}}{(s / \\sqrt n)}\n", "$$\n", "\n", "- $\\bar x$ is the sample mean we have found\n", "\n", "- $\\mu_{0}$ is the population mean we are assuming to be true under\n", " the null hypothesis $H_{0} : \\mu = \\mu_{0}$\n", "\n", "- $s$ is the sample standard deviation and $n$ is the sample size.\n", "\n", "Again, the formula for calculating the test statistic will\n", "differ depending on the test being performed, the size of our sample,\n", "and whether our hypothesis is about a population mean, proportion, or\n", "variance." ], "id": "7859501e-f624-4458-b82c-87217d801037" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute the t-statistic for the one sample t-test\n", "# (sample mean minus hypothesized mean, divided by the standard error)\n", "t_stat <- (mean(sample_data$wages) - 54000)/(sd(sample_data$wages)/sqrt(100))\n", "t_stat" ], "id": "faf0b704-8c13-43d1-9940-d0018621290b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Interpret the Results\n", "\n", "The last step in hypothesis testing requires us to decide whether to\n", "**reject or fail to reject the null hypothesis**. We can use either the\n", "**critical value approach** or the $p$**-value approach**; both\n", "bring us to the same answer. 
Let’s look at each of them individually.\n", "\n", "### The Critical Value (or Rejection Region) Approach:\n", "\n", "- The critical value defines the threshold of statistical\n", " significance in a statistical test, just as it defines the upper\n", " and lower bounds of a confidence interval.\n", "\n", "- In this approach, we **compare our calculated test statistic to a\n", " critical value** (or values) corresponding to our chosen\n", " significance level.\n", "\n", "The critical value serves as the cutoff point beyond which we reject our\n", "null hypothesis.\n", "\n", "- We **reject our null hypothesis** if our calculated **test statistic\n", " is more extreme than the critical value** (situated further into the\n", " tail of the investigated distribution).\n", "\n", "- We **fail to reject our null hypothesis** if the **test statistic is\n", " within these bounds.**\n", "\n", "#### How are critical values computed?\n", "\n", "- Depending on our test, we determine a critical value by finding the\n", " value that the test statistic has a probability of $\\alpha$ of\n", " being more extreme than when the null hypothesis is true.\n", "\n", "- The values more extreme than the critical value make up the\n", " **rejection region**, which is specific to the test. 
The diagrams below illustrate this process.\n", "\n", "#### **One-Sided Test (Left-Tailed)**:" ], "id": "9d5cc80b-fc70-4f6d-aeba-ca401bea2f09" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# alt text for accessibility\n", "alt_text <- \"A plot visualizing the left-tailed rejection region in a probability distribution curve.\"\n", "\n", "draw_cr(\"left-tailed\", df = 30, cv = -2)\n", "mtext(alt_text, side=3, line=2.75, cex=1.1)" ], "id": "e1cf314c-eb49-4cdc-9706-55d4bef884eb" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first diagram shows where we can reject a null hypothesis such as\n", "$H_{0}:\\mu = \\mu_{0}$ in favour of the alternative hypothesis\n", "$H_{1}:\\mu < \\mu_{0}$.\n", "\n", "- Null hypothesis: $\\mu \\geq\\mu_0$ (which includes the boundary case\n", " $\\mu = \\mu_0$)\n", "\n", "- Alternative: $\\mu < \\mu_0$.\n", "\n", "- The value which the test statistic has a probability of $\\alpha$ of\n", " lying below is called the critical value, and the red region\n", " represents the rejection region.\n", "\n", "#### **One-Sided Test (Right-Tailed)**:" ], "id": "140b4e12-ae3e-44ad-b206-24679a37ac18" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# alt text for accessibility\n", "alt_text <- \"A plot visualizing the right-tailed rejection region in a probability distribution curve.\"\n", "\n", "draw_cr(\"right-tailed\", df = 30, cv = 2)\n", "mtext(alt_text, side=3, line=2.75, cex=1.1)" ], "id": "4b5e5706-f86c-4ef4-aa92-859f8dc47f3b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "This second diagram shows us where we can reject a null hypothesis such\n", "as $H_{0}:\\mu = \\mu_{0}$ in favour of the alternative hypothesis\n", "$H_{1}:\\mu > \\mu_{0}$.\n", "\n", "- If our calculated test statistic falls in this red rejection region,\n", " it is “more extreme” than the critical value corresponding to our\n", " chosen significance level. 
This means we can reject the null\n", " hypothesis in favour of the alternative hypothesis.\n", "\n", "- If our calculated test statistic falls within the white region, we\n", " fail to reject our null.\n", "\n", "#### **Two-Sided Test**:" ], "id": "cb12d486-70b3-4f9c-804e-4f26c683cc0a" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# alt text for accessibility\n", "\n", "alt_text <- \"A plot visualizing the two-tailed rejection region in a probability distribution curve\"\n", "\n", "draw_cr(\"two-tailed\", df = 30, cv = 2)\n", "mtext(alt_text, side=3, line=2.75, cex=1.1)" ], "id": "e349ce4e-3c5d-4bfe-a172-3a7b3a3c735c" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The third diagram shows us how to use the critical value approach to\n", "choose whether or not to reject a null hypothesis for a two-sided test.\n", "\n", "Like previous cases:\n", "\n", "- If our calculated test statistic falls within either of these\n", " rejection regions, we reject our null hypothesis. If it falls within\n", " the white region, we fail to reject our null.\n", "\n", "- The area of the total red region equals $\\alpha$, our chosen\n", " significance level.\n", "\n", "However, for a two-sided test:\n", "\n", "- There are now two red regions since our alternative hypothesis is\n", " now $H_{1}: \\mu \\neq \\mu_{0}$.\n", "\n", "- Our test statistic will fall near (or in) either the left-hand\n", " rejection region or right-hand region.\n", "\n", "- The probability that it falls within one specific region is now\n", " half what it was for the one-sided tests (the probability is now\n", " $\\alpha / 2$), so each critical value sits further out in its tail.\n", "\n", "- Therefore, the two-sided test is more conservative as it is less\n", " likely for our calculated test statistic to fall inside the\n", " rejection region and allow us to reject our null hypothesis." 
], "id": "2becf011-eda2-49ba-8aa8-863a61aa818c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# examples of how to compute critical values for different types of test\n", "# suppose the significance level is 0.05\n", "\n", "# finding the lower and upper critical values for a two-sided test\n", "qt(p=0.025, df=8, lower.tail=TRUE)\n", "qt(p=0.025, df=8, lower.tail=FALSE)\n", "\n", "# finding the critical value for a left-tailed test\n", "qt(p=0.05, df=8, lower.tail=TRUE)\n", "\n", "# finding the critical value for a right-tailed test\n", "qt(p=0.05, df=8, lower.tail=FALSE)" ], "id": "7c436214-adda-4263-8f3d-174421f25b33" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s compute the rejection regions for our sample data:" ], "id": "5145eb50-0ba3-4c7a-9e8e-6641c6a0c6f9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# finding the lower and upper critical values for our sample data\n", "# (degrees of freedom = sample size - 1 = 99)\n", "qt(p=0.025, df=99, lower.tail=TRUE)\n", "qt(p=0.025, df=99, lower.tail=FALSE)" ], "id": "b32e400e-d942-4b6c-9bca-21932e75446c" }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can see from the above that our test statistic of about 0.72 fits\n", " within the upper and lower bound critical values of -1.98 and 1.98.\n", "\n", "- Therefore the sample statistic is consistent with what we would\n", " expect to see if the null hypothesis were true.\n", "\n", "- We thus do not have strong evidence to reject the null hypothesis.\n", "\n", "- We say that our sample mean is **not** **statistically\n", " significant**. A result is statistically significant when it leads\n", " us to reject the null hypothesis.\n", "\n", "### The $p$-value Approach:\n", "\n", "- In this approach, we again use our test statistic to make inferences\n", " about our population. 
However, we no longer rely on the diagrams\n", " above.\n", "- We instead calculate what is called a $p$-value: a number between 0\n", " and 1 indicating the probability of observing a value at least as\n", " extreme as the test statistic if ${H_0}$ is true. For example, a\n", " $p$-value of 0.05 would mean that if ${H_0}$ is true, there is a\n", " 5% chance of getting a test statistic as extreme or more extreme\n", " than the one we obtained.\n", "- Small $p$-values provide evidence to reject the null hypothesis,\n", " since they indicate that the observed data would be unlikely if the\n", " hypothesis were true. Recall that in hypothesis testing we are trying\n", " to find evidence to reject the null hypothesis. Formally, if the\n", " $p$-value is less than or equal to our significance level $\\alpha$,\n", " the null hypothesis is rejected; otherwise, the null hypothesis is\n", " not rejected.\n", "\n", "To find the $p$-value associated with a t-score in R, we can use the\n", "`pt()` function:" ], "id": "8705797b-c09a-4602-ac6b-d159afa6a801" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Examples of how to compute p values for different types of test\n", "\n", "# If the t-score is 1.8 for a right-tailed test\n", "# find p-value if the degrees of freedom is 30\n", "pt(q=1.8, df=30, lower.tail=FALSE)\n", "\n", "# If the t-score is 1.8 for a two-tailed test\n", "# find two-tailed p-value if the degrees of freedom is 30\n", "2*pt(q=1.8, df=30, lower.tail=FALSE)" ], "id": "8d57072d-96c4-4c29-9a21-8977c3f287ba" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s compute the *p*-value for our sample data:" ], "id": "ddcdc6a9-a1c2-4a40-a938-061ec554d18d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compute the p-value\n", 
"# abs() keeps the two-sided p-value valid even if the t-statistic is negative\n", "2*pt(q=abs(t_stat), df=99, lower.tail=FALSE)" ], "id": "3b91058d-23ee-4fbf-9910-08ce0c0619ef" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our $p$-value is about 0.47, which is much larger than our significance\n", "level of 0.05. This means that, assuming the null is true, it is quite\n", "likely that we see a value as extreme as our sample mean. It is thus not\n", "bizarre to imagine pulling such a sample statistic when the null of\n", "54000 is in fact true. We therefore fail to reject the null\n", "hypothesis.\n", "\n", "## Exercise\n", "\n", "Let’s say that you choose a 5% significance level and conduct a one\n", "sample $t$-test (since you’re testing a hypothesis about the mean of a\n", "single population for which you don’t know the standard deviation). You\n", "receive a $p$-value of 0.02 and correctly reject your null hypothesis.\n", "Have you proved that your null hypothesis is false?" ], "id": "ac986882-502c-4759-99d7-05c0c616d0af" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer_1 <- \"x\" # your answer of \"yes\" or \"no\" in place of \"x\" here\n", "test_1()" ], "id": "1346a438-7e6d-4d8a-b228-730f4458f026" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s now run through the hypothesis testing procedure more quickly with\n", "a few examples, but this time automating the process with\n", "`t.test()`. 
It performs one and two sample t-tests on vectors of data.\n", "\n", "## Applications of the Procedure with the R function `t.test()`\n", "\n", "## Example 1: One Sample $t$-test\n", "\n", "Recall from our previous example:\n", "\n", "$$\n", "H_{0}: \\mu = 54000\n", "$$ $$\n", "H_{1}: \\mu \\neq 54000\n", "$$" ], "id": "14f5f79a-e3b9-46a5-8dbe-56b4822d2d99" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# conduct one sample t-test\n", "t.test(x = sample_data$wages, mu = 54000, alternative = \"two.sided\", conf.level = 0.95)" ], "id": "843684fb-411b-4747-a771-5aae734009f8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Now we may go back and check if the t statistic and $p$-value here\n", "> match our manual calculation above. They match!\n", "\n", "The `t.test()` function in R is helpful in that it outputs a\n", "$p$-value and test statistic immediately, allowing us to either reject\n", "or fail to reject the null hypothesis right away.\n", "\n", "- In our sample data, our large $p$-value and non-extreme test\n", " statistic prevent us from rejecting the null.\n", "- If we had different results (i.e. a $p$-value \\< 0.05 or test\n", " statistic very large in magnitude), we would say that\n", " finding the particular sample mean when the null is\n", " true is very unlikely. 
This sample mean would thus be a\n", " statistically significant result which we could trust with a high\n", " degree of certainty, allowing us to reject the null in favor of\n", " the alternative hypothesis.\n", "\n", "> Note: In this example, the $p$-value was about 0.47.\n", ">\n", "> - This does not mean that the probability the null hypothesis is\n", "> true is 47%.\n", ">\n", "> - Rather, it means that if the null hypothesis is true, the\n", "> probability of seeing a sample mean at least as far away from\n", "> 54000 as ours is 47%.\n", ">\n", "> - Therefore, since it is quite likely to pull a sample mean this\n", "> far from 54000 by chance, we cannot reject the null and we do not\n", "> have statistically significant results.\n", "\n", "## Example 2: Two Sample $t$-test\n", "\n", "- Unlike the one-sample $t$-test where we use a sample mean point\n", " estimate to test a hypothesis about a population mean, the\n", " two-sample $t$-test uses two sample means to test a hypothesis about\n", " whether the means of two independent populations differ from one\n", " another.\n", "\n", "- We will use the two sample unpooled $t$-test with unequal variances\n", " when we know:\n", "\n", " - 1\\. That both populations are normally distributed (or the sum\n", " of their sample sizes exceeds 40, invoking normality),\n", "\n", " - 2\\. That the observations are independent between the two groups\n", " (i.e. observations are not paired between populations),\n", "\n", " - 3\\. And we assume that both population standard deviations,\n", " while unknown, are different.\n", "\n", "For this example, we will test the hypothesis that there is no\n", "difference between the mean wages of Canadians and Non-Canadians. 
We\n", "will set this up against a two-sided alternative.\n", "\n", "$$\n", "H_{0}: \\mu_{Canadian} = \\mu_{Non-Canadian}\n", "$$ $$\n", "H_{1}: \\mu_{Canadian} \\neq \\mu_{Non-Canadian}\n", "$$\n", "\n", "We will again set our significance level at 5%.\n", "\n", "$$\n", "\\alpha = 0.05\n", "$$\n", "\n", "Again, we will assume our census data represents our population and take\n", "two random samples from it, each of which will consist exclusively of\n", "Canadians or Non-Canadians." ], "id": "9eab0060-51ee-4261-9919-d8da502c76af" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "set.seed(123) # ensures the reproducibility of our code (we get the same samples each time we run this cell with the same seed)\n", "\n", "\n", "sample_cad <- census_data %>% \n", " filter(immstat == 1) %>% \n", " slice_sample(n = 100, # number of observations to sample from the population\n", " replace = FALSE) # without replacement\n", "\n", "sample_noncad <- census_data %>% \n", " filter(immstat == 2) %>% \n", " slice_sample(n = 100, # number of observations to sample from the population\n", " replace = FALSE) # without replacement" ], "id": "3fe00cce-a15c-482e-84b5-23287c2b53bc" }, { "cell_type": "markdown", "metadata": {}, "source": [ "For fun, let’s look at our sample statistics." ], "id": "c42277a8-83d5-40ca-940f-33acfcfb0f5d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mean(sample_cad$wages)\n", "mean(sample_noncad$wages)" ], "id": "03b60ef0-1c1b-46d9-9b55-032d5fcc7a2c" }, { "cell_type": "markdown", "metadata": {}, "source": [ "- We can already see a large difference in mean wages between\n", " Canadians and Non-Canadians here.\n", "\n", "- However, we will have to conduct our $t$-test to determine if this\n", " difference is statistically significant and thus if we can reject\n", " our null hypothesis." 
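, "\n", "\n", "As a sketch of what the two sample test computes in the unpooled case\n", "(the subscripts $C$ and $N$ for the Canadian and Non-Canadian samples\n", "are our own shorthand, not notation used elsewhere in this notebook),\n", "the test statistic has the same structure as the one sample version,\n", "a difference divided by its estimated standard error:\n", "\n", "$$\n", "t = \\frac{\\bar x_{C} - \\bar x_{N}}{\\sqrt{s_{C}^2 / n_{C} + s_{N}^2 / n_{N}}}\n", "$$\n", "\n", "Here $\\bar x$, $s$ and $n$ are each sample’s mean, standard deviation\n", "and size. Under the null hypothesis this statistic approximately follows\n", "a $t$-distribution whose degrees of freedom are estimated from the data\n", "(the Welch-Satterthwaite approximation), which is why the degrees of\n", "freedom reported by R need not be a whole number."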
], "id": "2fd9800e-c529-471a-82ef-2dbd1bb68040" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# conducting our two sample t-test\n", "t.test(x=sample_cad$wages, y=sample_noncad$wages, conf.level=0.95)" ], "id": "4df4f0d6-db67-4edc-8f08-287ae6a10b82" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our $t$-test yields a $p$-value of about 0.1478, greater than our\n", "significance level of 0.05. Thus, our result is not statistically\n", "significant and we cannot reject our null hypothesis.\n", "\n", "> **Note**: This reveals nothing about why this is the case and does not\n", "> control for any relevant factors. You will learn more about this in\n", "> upcoming courses.\n", "\n", "- We ran this $t$-test on two independent populations (Canadians and\n", " Non-Canadians).\n", "\n", "- Alternatively, if we want to compare the means of dependent\n", " populations and test whether or not they are the same, we can use\n", " the `paired` option of our `t.test()` function.\n", "\n", "- `t.test()` can also take a formula of the form `y ~ x`:\n", "\n", " - The dependent variable is on the left of the tilde (`~`)\n", "\n", " - The grouping variable (with two levels) is on the right of the\n", " tilde (`~`)\n", "\n", "- Two options within the `t.test()` function matter here: `paired`\n", " and `var.equal`.\n", "\n", "- Both of these are set to FALSE by default, but we can change one or\n", " both of them to TRUE if we believe that our two samples come in\n", " pairs (a specific case of dependent samples) or the variances of the\n", " two populations are equal.\n", "\n", "> **Note**: The dependent sample t-test is also called the paired sample\n", "> t-test, e.g. before-treatment and after-treatment patient groups in\n", "> medical research." 
], "id": "fe3301df-4fb4-4964-aa92-ebd18e801b35" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Let's create some fake data to demonstrate paired sample t-test\n", "\n", "before <-c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)\n", "after <-c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)\n", "\n", "df <- data.frame( \n", " group = rep(c(\"before\", \"after\"), each = 10),\n", " weight = c(before, after)\n", " )\n", "\n", "# pass the two vectors directly: recent versions of R (4.4.0 onwards)\n", "# no longer accept `paired` together with the formula interface\n", "t.test(x = after, \n", " y = before, \n", " paired = TRUE,\n", " conf.level=0.95)" ], "id": "7f1231c4-55b5-4da2-807f-b3fa6c905f91" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise\n", "\n", "Suppose you want to compare the mean earnings of those who have and have\n", "not graduated from high school. Although you don’t know the population\n", "standard deviations of earnings in each group, you determine that these\n", "are independent populations and their standard deviations must not be\n", "the same, arguing that there is a wider spread of earnings among those\n", "who graduated high school.\n", "\n", "Method:\n", "\n", "- You conduct an unpooled, unequal variance two sample $t$-test (the\n", " type of two sample $t$-test we explored earlier in our\n", " applications).\n", "\n", "- You choose a significance level of 5%, the default level used.\n", "\n", "- You set up the following hypotheses.\n", "\n", "$$\n", "H_{0}: \\mu_{graduated} = \\mu_{didn't \\ graduate}\n", "$$ $$\n", "H_{1}: \\mu_{graduated} \\neq \\mu_{didn't \\ graduate}\n", "$$\n", "\n", "- Suppose a friend instead sets up a one-sided alternative, namely\n", " that $\\mu_{graduated} > \\mu_{didn't \\ graduate}$.\n", "\n", "Assuming the null hypothesis, significance level, sample data and type\n", "of test used are identical for both you and your friend, who is more\n", "likely to receive statistically significant results?" 
], "id": "7af36b86-6353-41ca-8c7d-0f9994cc5258" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer_2 <- \"x\" # your answer for \"you\" or \"your friend\" in place of \"x\" here\n", "test_2()" ], "id": "6cf5838d-49f9-430a-8dc8-a5d6441a527e" }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Moving forward with your two-sided hypothesis test, you find a\n", " sample mean of 60000 for those who graduated high school and 25000\n", " for those who didn’t graduate high school." ], "id": "f8921af4-6750-44f8-b222-d4a7c0f722fa" }, { "cell_type": "raw", "metadata": { "raw_mimetype": "text/html" }, "source": [ "" ], "id": "ca3d85e0-5ec6-461c-ba10-23778da84391" }, { "cell_type": "markdown", "metadata": {}, "source": [ "- For your chosen significance level, you find that the test statistic\n", " is 1.5, while the critical values from Student’s t-distribution are\n", " -2 and 2 respectively.\n", "\n", "Should you reject the null hypothesis that there is no difference\n", "between the mean earnings of the two populations?" ], "id": "b52cded0-6fd4-421b-bad3-a6ecd2bbd2ea" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer_3 <- \"x\" # your answer for \"yes\" or \"no\" in place of \"x\" here\n", "test_3()" ], "id": "e299e587-3627-497e-a73c-b8ea83428360" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Applications of Hypothesis Testing with Pearson’s Correlation Coefficient\n", "\n", "### Pearson Correlation Test\n", "\n", "Another parameter we can make hypotheses about is the correlation\n", "coefficient.
We can use hypothesis testing to draw inferences about the\n", "correlation between two variables by analyzing random samples.\n", "\n", "Let’s do this with `wages` and `mrkinc`.\n", "\n", "> Recall that two variables are highly positively correlated if their correlation coefficient is close to 1, while they are highly negatively correlated if it is close to -1.\n", "\n", "Let’s suppose that we have reason to believe that `wages` and `mrkinc`\n", "are quite correlated (hence their correlation coefficient is not 0).\n", "\n", "To find support for this, we treat it as the alternative hypothesis and\n", "test it against the null hypothesis that there is no correlation. We\n", "can only claim support for the alternative if the data lead us to reject\n", "the null hypothesis. Let’s set up the hypotheses below.\n", "\n", "$$\n", "{H_0}: r = 0\n", "$$ $$\n", "{H_1}: r \\neq 0\n", "$$\n", "\n", "where $r$ is the population correlation coefficient between the wages\n", "and market income of Canadians. Let’s set the significance level at 5%\n", "(95% confidence level).\n", "\n", "$$\n", "\\alpha = 0.05\n", "$$\n", "\n", "Let’s now look at our sample statistic (sample correlation coefficient)\n", "to shed some light on the number whose significance we will be testing\n", "in our hypothesis test."
], "id": "cc42c6e0-121c-4eed-8845-3272696da407" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# finding the cor() between wages and mrkinc, including use=\"complete.obs\" to remove NA entries\n", "cor(census_data$wages, census_data$mrkinc, use=\"complete.obs\")" ], "id": "0c9690f0-9ae4-4ba9-b0af-ac32ce80e535" }, { "cell_type": "markdown", "metadata": {}, "source": [ "This correlation coefficient appears quite far from 0, hence we will\n", "likely be able to reject the null hypothesis in favour of our\n", "alternative hypothesis of some relationship between `wages` and `mrkinc`\n", "(possibly a very strong positive relationship).\n", "\n", "However, there is always the small chance that we happen to have pulled\n", "a sample with a strong correlation which does not otherwise exist. To\n", "guard against such a false positive, let’s conduct a Pearson\n", "Correlation test. Instead of having to calculate a test statistic and\n", "then calculate critical values or a $p$-value, we can just invoke the\n", "`cor.test()` function." ], "id": "2b584249-7a27-4f1b-ab5c-2e5d4aeb0f45" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Pearson correlation test (cor.test() drops incomplete pairs itself, so no `use` argument is needed)\n", "cor.test(census_data$wages, census_data$mrkinc)" ], "id": "603f27a8-ee73-4f59-a854-0f39ea7adfb5" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The correlation test yields a very small $p$-value (2.2e-16 or less),\n", "far below $\\alpha$ = 0.05.\n", "\n", "Thus, we see that this correlation is statistically significant and\n", "reject the null hypothesis in favour of the alternative hypothesis that\n", "the true correlation coefficient is not zero.\n", "\n", "## Type I and Type II Errors\n", "\n", "One thing that is crucial to remember is that our hypothesis test may\n", "not always be correct.
While a hypothesis test can provide strong evidence\n", "for us to reject or fail to reject a null hypothesis, it is not\n", "conclusive.\n", "\n", "We never say that we “accept” the null hypothesis, instead preferring to\n", "say that we “fail to reject” the null hypothesis when no strong evidence\n", "exists against it.\n", "\n", "Similarly, we never say that we “accept” the alternative hypothesis,\n", "only that we “reject the null hypothesis in favour of the alternative\n", "hypothesis”.\n", "\n", "Neither hypothesis can conclusively be proven as true or false.\n", "Therefore, we may occasionally make incorrect decisions about rejecting\n", "or failing to reject a null hypothesis.\n", "\n", "These errors are called **type I errors** and **type II errors**.\n", "\n", "| Null Hypothesis is… | True | False |\n", "|----------------------|-------------------------|-------------------------|\n", "| **Rejected** | Type I Error: false positive, probability = $\\alpha$ | Correct Decision: true positive, probability = $1 - \\beta$ |\n", "| **Not rejected** | Correct Decision: true negative, probability = $1 - \\alpha$ | Type II Error: false negative, probability = $\\beta$
|\n", "\n", "#### Type I error (false positive):\n", "\n", "- This happens when we draw a sample statistic which appears\n", " incredibly unlikely under the null hypothesis and then **falsely\n", " conclude that our null hypothesis is incorrect.** In reality, that\n", " sample statistic could have just been an unlikely pull under a true\n", " null hypothesis.\n", "- It means concluding that results are **statistically significant**\n", " when, in reality, they came about purely by chance or because of\n", " unrelated factors.\n", "- The probability of making a type I error is denoted by $\\alpha$ and\n", " is the significance level that we choose in the beginning.\n", "\n", "#### Type II error (false negative):\n", "\n", "- This can occur when we pull a sample statistic which is seemingly\n", " reasonable under our null hypothesis and therefore **fail to reject\n", " a null hypothesis that is actually false**. In reality, the null\n", " hypothesis is false and our sample just happened to look\n", " consistent with it.\n", "- The probability of a type II error is higher when the test **does\n", " not have enough statistical power**, such as when our sample size is\n", " too small, our level of significance is too low (too strict), or we\n", " are using a test that is not sensitive enough to detect a true\n", " difference.\n", "- The probability of making a type II error is denoted by $\\beta$,\n", " while the probability of correctly rejecting a false null\n", " hypothesis is $1 - \\beta$ and is known as the power of the test.\n", "\n", "> Note:\n", ">\n", "> - A higher confidence level (that is, a lower significance level\n", "> $\\alpha$) decreases the Type I error risk but, for a fixed sample\n", "> size, increases $\\beta$, the Type II error risk.\n", ">\n", "> - Conversely, for a fixed sample size, raising statistical power by\n", "> choosing a larger $\\alpha$ decreases $\\beta$, the Type II error\n", "> risk, but increases the Type I error risk.\n", ">\n", "> - For a fixed sample size, the smaller the $\\alpha$, the larger the\n", "> $\\beta$.\n", ">\n", "> -
Hence, there is a constant tradeoff between making type I and II\n", "> errors.\n", ">\n", "> - Remember that we select $\\alpha$, our significance level and hence\n", "> the probability of falsely rejecting a true null hypothesis, before\n", "> we even calculate our test statistic. However, we cannot directly\n", "> choose $\\beta$, the probability of failing to reject a false null.\n", "> This probability instead emerges in the testing process.\n", "\n", "### Exercise\n", "\n", "Suppose you choose a 5% significance level and conduct a one sample\n", "t-test that yields a $p$-value of 0.13. You conclude that the results\n", "are not statistically significant and fail to reject the null\n", "hypothesis, but the null hypothesis is in fact false. What error have\n", "you made?" ], "id": "9da52076-f68a-4d68-8529-a8521bb5ba97" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "answer_4 <- \"x\" # your answer of \"type 1\" or \"type 2\" in place of \"x\" here\n", "test_4()" ], "id": "02994233-1bc0-43e4-b20d-fa42d37b2f0d" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "ir", "display_name": "R", "language": "r" } } }