{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 11 - Exporting Regression Output\n", "\n", "Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt \n", "2024-05-29\n", "\n", "## Prerequisites\n", "\n", "1. Run OLS Regressions.\n", "\n", "## Learning Outcomes\n", "\n", "1. Being able to export regression output in a table.\n", "2. Being able to plot regression coefficients in a graph.\n", "\n", "## 11.1 Exporting Regression Output\n", "\n", "When doing our project, presenting our results in a clear and organized\n", "manner is as important as obtaining the results themselves. R’s output\n", "is very clear on the computer display, but at some point we need to\n", "“move” it from R to our draft. In this module, we will see how to save a\n", "regression output in a table.\n", "\n", "Once again, we will be using the fictional data set. Recall that this\n", "data is simulating information of workers in the years 1982-2012 in a\n", "fictional country where a training program was introduced in 2003 to\n", "boost their earnings.\n", "\n", "Let’s start by loading our packages and opening the dataset." 
], "id": "b378f6ee-4dd6-488b-884e-0cec0deafca6" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Loading in our packages\n", "library(tidyverse)\n", "library(haven)\n", "library(IRdisplay)\n", "\n", "# Open the data\n", "fake_data <- read_dta(\"../econ490-stata/fake_data.dta\")" ], "id": "ddcdcf4e-6f13-44ac-bfd0-cb4f5d8fc47e" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine we are interested in estimating a multivariate regression of the\n", "following form:\n", "\n", "$$\n", "\\text{earnings}_{it} = \\alpha + \\beta_1 \\text{age}_{it} + \\beta_2 \\text{sex}_i + \\varepsilon_{it}\n", "$$\n", "\n", "where $\\text{earnings}_{it}$ is the logarithm of earnings of individual\n", "$i$ at time $t$, $\\text{age}_{it}$ is the logarithm of age of individual\n", "$i$ at time $t$, and $\\text{sex}_i$ is a dummy variable equal to one if\n", "the sex of individual $i$ is female.\n", "\n", "First, we create the variables we need." ], "id": "ea0a7e94-1302-4d1b-b618-64099cb44071" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Create the log variables and a factor version of sex\n", "fake_data <- fake_data %>%\n", " mutate(log_earnings = log(earnings)) %>%\n", " mutate(log_age = log(age)) %>%\n", " mutate(sexdummy = as.factor(sex))" ], "id": "549b90cd-28e2-4bbe-843f-a9414a2bddfd" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we can estimate our regression using the function `lm`. We have\n", "seen how to do it in [Module\n", "10](https://comet.arts.ubc.ca/docs/Research/econ490-r/10_Linear_Reg.html)." ], "id": "0f26b4e5-2897-4a24-9be4-efa46cb88bd1" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "summary(lm(data=fake_data, log_earnings ~ log_age + sexdummy))" ], "id": "d8202196-842c-4a03-bd31-b71c59e0bef1" }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are different options available to export this table to another\n", "file. 
In this module, we will use `stargazer`.\n", "\n", "`stargazer` can take several options. In its simplest form, we just need\n", "to type `stargazer(modelname, type=\"filetype\", out=\"filename\")` to\n", "save the results of the model *modelname* in a file of type *filetype*\n", "named *filename*. The `type` option accepts `\"text\"`, `\"latex\"`, and\n", "`\"html\"`.\n", "\n", "For example, let’s save our results in a text file named *table.txt*.\n", "First, we have to load the stargazer library." ], "id": "17129d71-7459-4726-9c1a-31f4f734ef3e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Uncomment the next line to install the package\n", "# install.packages(\"stargazer\")\n", "library(stargazer)" ], "id": "b08230b8-92a5-4013-8929-e680af9624c1" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, we can save our linear model in an object called *model1* and use\n", "it as input of the `stargazer` function." ], "id": "6e7eb759-a8b1-473f-8771-112487066110" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1 <- lm(data=fake_data, log_earnings ~ log_age + sexdummy)\n", "stargazer(model1, type=\"text\", out=\"table.txt\")" ], "id": "3ba0554f-3bd1-45f1-9f9c-bc99fff8252d" }, { "cell_type": "markdown", "metadata": {}, "source": [ "A file named *table.txt* should appear in your folder. 
Notice that this\n", "worked, but our table does not have a very professional appearance yet.\n", "We can add more options to the function `stargazer` to make our results\n", "clearer and better organized.\n", "\n", "Here are some of the options we can add:\n", "\n", "- we can align the numeric values within our table with option\n", " `align=TRUE`;\n", "- we can keep only selected statistics using `keep.stat`;\n", "- we can add a title *titlename* with the option `title=\"titlename\"`;\n", "- we can modify the labels of covariates in the regression table with\n", " the option `covariate.labels`;\n", "- we can show only some coefficients by listing them in the option\n", " `keep`, as in `keep=c(\"coeffname\")`. Similarly, we can omit some of\n", " the coefficients by listing them in the option `omit`.\n", "\n", "Let’s try them in practice. Let’s save the same table again, with\n", "the following modifications:\n", "\n", "- keep only the coefficients for *log_age* and *sexdummy*;\n", "- rename those coefficients;\n", "- keep only the statistics on number of observations and adjusted\n", " R$^2$;\n", "- add a title." ], "id": "ce21528b-2bc1-40a6-8b83-35158f8fd3de" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "stargazer(model1, type=\"text\", out=\"table.txt\", title=\"Earnings analysis\", keep.stat=c(\"n\",\"adj.rsq\"), keep=c(\"log_age\",\"sexdummy\"), covariate.labels=c(\"Age (ln)\", \"Male\"))" ], "id": "293124ed-d233-4bf1-b0cc-bd112ad96c4a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks much better, but what if we want to show the results of multiple\n", "models in the same table?\n", "\n", "Suppose we want to first estimate a model with only *age* or only *sex*\n", "as an explanatory variable, and then a multivariate model encompassing\n", "both. 
In this case, we only need to store the results of each model in a\n", "separate object and then add all of them as inputs of `stargazer`.\n", "\n", "In the example below, we store the three models in objects *model1*,\n", "*model2*, and *model3* before adding them as inputs of `stargazer`." ], "id": "982b08d5-728f-4110-9eeb-f30900f91a91" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Store regressions\n", "model1 <- lm(data=fake_data, log_earnings ~ log_age)\n", "model2 <- lm(data=fake_data, log_earnings ~ sexdummy)\n", "model3 <- lm(data=fake_data, log_earnings ~ log_age + sexdummy)\n", "\n", "# Create table\n", "stargazer(model1, model2, model3, title=\"Comparison\", align=TRUE, type=\"text\", out=\"table.txt\", keep.stat=c(\"n\",\"adj.rsq\"))" ], "id": "f58f8d33-f496-4bd3-9323-181f682430e5" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11.2 Plotting Regression Coefficients\n", "\n", "Visual representations can be better than tables. Sometimes we need to\n", "plot our estimated coefficients and their confidence intervals.\n", "\n", "In R, this is easily done with the function `coefplot`. The graphs obtained\n", "with `coefplot` are easy to customize. In its simplest use, we only need\n", "to save our regression results in an object and then give that object as\n", "input of `coefplot`.\n", "\n", "Once again, let’s try it on our multivariate model. The first thing to\n", "do is to load the corresponding library." ], "id": "bc28ec21-71d0-4471-9e0e-5004d135463d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load package\n", "# Uncomment the next line to install the package\n", "# 
install.packages(\"coefplot\")\n", "\n", "library(coefplot)" ], "id": "894e947d-81b8-4d75-bd6f-def4a0c04c91" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can save our estimated coefficients in an object named *model1*\n", "and use it as input for the `coefplot` function. Note that we can omit\n", "the constant by adding the option `intercept=FALSE`." ], "id": "a6e151dd-43ec-47d2-a86e-dab93fdb948d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model1 <- lm(data=fake_data, log_earnings ~ log_age + sexdummy)\n", "coefplot(model1, intercept=FALSE)" ], "id": "e1dea7db-e9f2-4979-ab76-cbfd6d7996e4" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can customize our graph further by using options that are specific to\n", "`coefplot`. By default, `coefplot` draws two confidence intervals: the first at\n", "one standard deviation from the coefficient, and the second at two\n", "standard deviations from the coefficient. We can modify them with the\n", "options `innerCI` and `outerCI`, respectively. By default, they are set\n", "to `innerCI=1` and `outerCI=2`.\n", "\n", "We can also change the color of the estimates and their confidence\n", "intervals with the option `color`.\n", "\n", "Finally, we can display the estimated coefficients horizontally with the\n", "option `horizontal=TRUE`.\n", "\n", "Let’s apply these options to our example and generate a horizontal plot\n", "drawn in red, with only one confidence interval at a distance of 1.5\n", "standard deviations." 
], "id": "b775aa96-abf7-49e4-bcd9-bae3c66f8e85" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coefplot(model1, intercept=FALSE, horizontal=TRUE, color=\"red\", innerCI=0, outerCI=1.5)" ], "id": "98c5c606-29e7-41fa-a706-2d75a89153cb" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11.3 Wrap Up\n", "\n", "We have learned in this module how to store regression output in a clear\n", "and organized manner using the command `stargazer` and how to plot\n", "regression coefficients using the command `coefplot`.\n", "\n", "Remember to check the R documentation when creating graphs and exporting\n", "tables. The documentation can be your best ally.\n", "\n", "## 11.4 Wrap-up Table\n", "\n", "| Command | Function |\n", "|--------------------------------|----------------------------------------|\n", "| `stargazer(modelname, type=\"filetype\", out=\"filename\")` | It saves *modelname* in a file of type *filetype* named *filename*. |\n", "| `coefplot(modelname)` | It plots regression coefficients and two confidence intervals, one at 1 standard deviation and the other at 2 standard deviations distance. |" ], "id": "55abdfbc-7b65-4e2a-802f-baf452fa278e" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "ir", "display_name": "R", "language": "r" } } }