{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 09 - Combining Graphs\n",
        "\n",
        "Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt  \n",
        "2024-05-29\n",
        "\n",
        "## Prerequisites\n",
        "\n",
        "1.  Load data and packages.\n",
        "2.  Create variables and objects.\n",
        "3.  Have some familiarity with the syntax of commands to create basic\n",
        "    graphs.\n",
        "\n",
        "## Learning Outcomes\n",
        "\n",
        "1.  Identify best practices for data visualization.\n",
        "2.  Feel comfortable with combining graphs using facets in `ggplot2`.\n",
        "\n",
        "## 9.0 Intro\n",
        "\n",
        "We’ll continue working with the fake data set we have been using so far.\n",
        "Recall that this data set is simulating information for workers in the\n",
        "years 1982-2012 in a fake country where a training program was\n",
        "introduced in 2003 to boost their earnings."
      ],
      "id": "6f42e5cc-e4ad-4aeb-98b5-ac87d6fff705"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Clear the memory from any pre-existing objects\n",
        "rm(list=ls())\n",
        "\n",
        "# Load packages\n",
        "library(tidyverse)\n",
        "library(magrittr)\n",
        "library(ggplot2)\n",
        "library(haven)\n",
        "library(tidyverse)\n",
        "\n",
        "# Import dataset\n",
        "fake_data <- read_dta(\"../econ490-r/fake_data.dta\")"
      ],
      "id": "d8f96685-ac15-4523-81d8-af11de0e3c9b"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In this module, we will work on two examples. The first example covers\n",
        "combining two graphs with the same schema, while the second covers\n",
        "combining two graphs with different schemas. It will soon be very clear\n",
        "what we mean by *schema*.\n",
        "\n",
        "## 9.1 Example 1\n",
        "\n",
        "For this example, we want to generate two graphs with the same schema\n",
        "(they are the same type of graph and use the same variables as their x\n",
        "and y axis). For example, let’s say we want to see the evolution of\n",
        "average earnings over time for treated and untreated workers in two\n",
        "different regions. Instead of having four lines in one graph, we would\n",
        "like to separate the two regions in two different panels of the same\n",
        "graph.\n",
        "\n",
        "Let’s do this step by step. We start by creating the data: we want a\n",
        "data frame with average earnings by year and treatment status for the\n",
        "first two regions. We use `group_by`, as seen in [Module\n",
        "6](https://comet.arts.ubc.ca/docs/Research/econ490-r/06_Within_Group.html)."
      ],
      "id": "f1025141-ff81-46ca-8be6-cba43deff698"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "figure1_data <- fake_data %>%\n",
        "        mutate(log_earnings = log(earnings)) %>% # take log of earnings\n",
        "        group_by(year, region, treated) %>%      # group by time, treatment status, and region\n",
        "        summarise(mean_earnings = mean(log_earnings)) %>% # take average by group\n",
        "        filter(region==1|region==2) %>%          # keep only first two regions\n",
        "        mutate(treatment = case_when(treated == 1 ~ 'Treated', treated == 0 ~ 'Untreated')) # create a character variable for treatment"
      ],
      "id": "2b6408fb-86c5-44c2-9d09-52033bdd093c"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Once we have created our data, we proceed with the same steps we have\n",
        "used in section 8.2.2 of [Module\n",
        "8](https://comet.arts.ubc.ca/docs/Research/econ490-r/08_ggplot_graphs.html)\n",
        "to create a line plot with one line for each treatment status (*treated*\n",
        "and *untreated*).\n",
        "\n",
        "In this case, we need to add a crucial component: `facet_grid`. This\n",
        "allows to split up our data by one or two variables that vary on the\n",
        "horizontal and/or vertical direction. The syntax is\n",
        "`facet_grid(vertical ~ horizontal)`.\n",
        "\n",
        "In the code below, we split vertically our data for the two regions, by\n",
        "adding `facet_grid(region ~ .)` to our code."
      ],
      "id": "28a9c726-a5bb-4dee-97dd-6df203516b18"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Specify data and axis\n",
        "figure1 <- ggplot(data = figure1_data, # referencing the data we want to use\n",
        "                 aes(\n",
        "                     x = year,  # x is year\n",
        "                     y = mean_earnings, # our y is avg logearnings\n",
        "                     group=treatment, # each line is data for one value of treatment\n",
        "                     color=treatment # each value of treatment as one color\n",
        "                 ))\n",
        "\n",
        "# Tell R the graph will be a line graph\n",
        "figure1 <- figure1 + geom_line() \n",
        "\n",
        "# Add labels\n",
        "figure1 <- figure1 + labs(x = \"Year\", y = \"Average Log-earnings\")\n",
        "\n",
        "# \"split\" vertically graph by region\n",
        "figure1 + facet_grid(region ~ .)"
      ],
      "id": "c3083fa9-628a-432f-91fc-0e7938fa413f"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Notice that now our graph is made of two panels whose titles are the\n",
        "names of the regions. However, we do not know what region 1 and region 2\n",
        "mean. We can add a character variable to our data, named *region_name*,\n",
        "and split horizontally the graph into the two names stored in\n",
        "*region_name*."
      ],
      "id": "85c84a49-efdf-4bc2-93f1-87e476d8949f"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Add names to regions\n",
        "figure1_data <- figure1_data %>%\n",
        "                mutate(region_name = case_when(region == 1 ~ 'Ontario', region == 2 ~ 'Manitoba'))\n",
        "\n",
        "# Specify data and axis\n",
        "figure1 <- ggplot(data = figure1_data, # referencing the data we want to use\n",
        "                 aes(\n",
        "                     x = year,  # x is year\n",
        "                     y = mean_earnings, # our y is avg logearnings\n",
        "                     group=treatment, # each line is data for one value of treatment\n",
        "                     color=treatment # each value of treatment as one color\n",
        "                 ))\n",
        "\n",
        "# Tell R the graph will be a line graph\n",
        "figure1 <- figure1 + geom_line() \n",
        "\n",
        "# Add labels\n",
        "figure1 <- figure1 + labs(x = \"Year\", y = \"Average Log-earnings\")\n",
        "\n",
        "# Split horizontally and use labels\n",
        "figure1 + facet_grid(. ~ region_name)"
      ],
      "id": "6feb5b77-9df9-4c71-b74e-c23fa42422f6"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also add a vertical line for year 2003, the moment in which the\n",
        "treatment has been introduced. More information about how to do this is\n",
        "available in [Module\n",
        "8](https://comet.arts.ubc.ca/docs/Research/econ490-r/08_ggplot_graphs.html)"
      ],
      "id": "d5893dc6-5836-42e3-addf-ea3fc6cf2045"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Add dashed red vertical line\n",
        "figure1 <- figure1 + geom_vline(aes(xintercept=2002), color=\"#bb0000\", linetype=\"dashed\")\n",
        "\n",
        "# Split vertically by region_name\n",
        "figure1 + facet_grid(. ~ region_name)"
      ],
      "id": "21240923-2866-427d-8591-0d4608a17314"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 9.2 Example 2\n",
        "\n",
        "For this example, we want to combine graphs that do not follow the same\n",
        "schema. Let’s say we are interested in seeing if there is any\n",
        "relationship between the distribution of earnings (*log_earnings*) and\n",
        "how worker’s earnings change over time in region 1. Which graphs do you\n",
        "think would best present this information?\n",
        "\n",
        "As we have seen in [Module\n",
        "8](https://comet.arts.ubc.ca/docs/Research/econ490-r/08_ggplot_graphs.html),\n",
        "we usually use histograms to represent density distributions and we can\n",
        "use a scatterplot or a line plot for the graph of earnings over time.\n",
        "\n",
        "We will now see how to use `grid.arrange` to put multiple graphs on one\n",
        "page. The `grid.arrange` function is stored in the library *gridExtra*,\n",
        "which we need to install. We do so below."
      ],
      "id": "6e7138b3-8ac7-451c-8992-c7ce1d3a0c9e"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Install and laod\n",
        "#install.packages(\"gridExtra\")\n",
        "library(gridExtra)"
      ],
      "id": "b12af347-3a15-4b0f-8cc2-27209265e728"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let’s first create the two plots separately and then combine them\n",
        "together.\n",
        "\n",
        "Start with the histogram: use `geom_histogram` to create a histogram for\n",
        "the density of log-earnings and store it into the object *plot1*. You\n",
        "can find a detailed explanation in section 8.2.3 of [Module\n",
        "8](https://comet.arts.ubc.ca/docs/Research/econ490-r/08_ggplot_graphs.html)."
      ],
      "id": "d320819c-6f97-46da-a7dd-7e23722cd5f9"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Add log earnings to dataset\n",
        "fake_data <- fake_data %>% mutate(log_earnings = log(earnings))\n",
        "\n",
        "# Plot 1: histogram\n",
        "plot1 <- ggplot(data = fake_data, aes(x = log_earnings)) + geom_histogram()"
      ],
      "id": "1539cc69-0c85-414d-ae79-046f15549264"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Then, create a line graph for the average log-earning by year using\n",
        "`geom_line` and store it into the object *plot2*. You can find a\n",
        "detailed explanation in section 8.2.2 of [Module\n",
        "8](https://comet.arts.ubc.ca/docs/econ_490/econ490-r/08_ggplot_graphs.html)."
      ],
      "id": "2a56e22a-d224-42e2-8f72-3d63119c3600"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Create a dataframe with the average wage by year\n",
        "plot2_data <- fake_data %>%\n",
        "              group_by(year) %>%\n",
        "              summarise(mean_earnings = mean(log_earnings))\n",
        "\n",
        "# Plot 2: line graph\n",
        "plot2 <- ggplot(data = plot2_data, aes(x = year, y = mean_earnings)) + geom_line()"
      ],
      "id": "91931bfc-9e7a-437c-ad53-f419999f59ee"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now combine the objects *plot1* and *plot2* into one single page using\n",
        "the function `grid.arrange`. Notice that we can specify how many numbers\n",
        "of columns or rows we want with `ncol` or `nrow`, respectively."
      ],
      "id": "a1c9404a-e4c0-46eb-87bf-2ca87a10f7c1"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "grid.arrange(plot1, plot2, nrow=1)"
      ],
      "id": "4af46756-61c7-4c06-99f9-9b0f8c65a1c4"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 9.3 Wrap Up\n",
        "\n",
        "In this module we learned how to combine graphs using , whether they\n",
        "have the same schema or not. When producing a research paper we might\n",
        "want to compare statistics from different countries or regions, such as\n",
        "GDP, population density, inflation, exports, etc. These types of graphs\n",
        "allow us to see how the same variables diverge across different\n",
        "categories (as in Example 1) or how different variables influence each\n",
        "other (as in Example 2).\n",
        "\n",
        "## 9.4 Wrap-up Table\n",
        "\n",
        "| Command | Function |\n",
        "|--------------------------------|----------------------------------------|\n",
        "| `facet_grid(vertical ~ horizontal)` | It combines two graphs with the same or different schemas. |\n",
        "| `grid.arrange(plot1, plot2, nrow=, ncol=)` | It combines two graphs with the same or different schemas. |"
      ],
      "id": "3c1335bf-4aaf-4c41-a2c8-2046b2a57bd7"
    }
  ],
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "name": "ir",
      "display_name": "R",
      "language": "r"
    }
  }
}