{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 12 - Exporting Regression Output\n",
"\n",
"Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt \n",
"2024-05-29\n",
"\n",
"## Prerequisites\n",
"\n",
"1. Run OLS Regressions.\n",
"\n",
"## Learning Outcomes\n",
"\n",
"1. Being able to export regression output in a table.\n",
"2. Being able to plot regression coefficients in a graph.\n",
"\n",
"## 12.0 Intro"
],
"id": "41b7fa96-255a-48c0-a9dd-820c126119fa"
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import stata_setup\n",
"stata_setup.config('C:\\Program Files\\Stata18/','se')"
],
"id": "7b712b13"
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
">>> import sys\n",
">>> sys.path.append('/Applications/Stata/utilities') # make sure this is the same as what you set up in Module 01, Section 1.3: Setting Up the STATA Path\n",
">>> from pystata import config\n",
">>> config.init('se')"
],
"id": "bff2b793"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12.1 Exporting Regression Output\n",
"\n",
"When doing our project, presenting our results in a clear and organized\n",
"manner is as important as obtaining the results themselves. Stata’s\n",
"output is very clear on the computer display, but at some point we need\n",
"to “move” it from Stata to our draft. In this module, we will see how to\n",
"save a regression output in a table.\n",
"\n",
"Once again, we will be using the fictional data set. Recall that this\n",
"data is simulating information of workers in the years 1982-2012 in a\n",
"fictional country where a training program was introduced in 2003 to\n",
"boost their earnings.\n",
"\n",
"Let’s start by opening the dataset."
],
"id": "3dcf7479-b3ce-4657-8f2e-c1ff3852f997"
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"* Load the dataset\n",
"clear *\n",
"*cd \"\"\n",
"use \"fake_data.dta\", clear"
],
"id": "9a7df255"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Imagine we are interested in estimating a multivariate regression of the\n",
"following form:\n",
"\n",
"$$\n",
"\\text{Earnings}_{it} = \\alpha + \\beta_1 \\text{Age}_{it} + \\beta_2 \\text{Sex}_i + \\varepsilon_{it}\n",
"$$\n",
"\n",
"where $\\text{Earnings}_{it}$ is the logarithm of earnings of individual\n",
"$i$ at time $t$, $\\text{Age}_{it}$ is the logarithm of age of individual\n",
"$i$ at time $t$, and $\\text{Sex}_i$ is a dummy variable equal to one if\n",
"the sex of individual $i$ is female.\n",
"\n",
"First, we create the variables we need."
],
"id": "13db0d54-ff36-4e38-bad4-bc1840f35bcf"
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"* Create the variables\n",
"generate logearn = log(earnings)\n",
"generate logage = log(age)\n",
"generate sexdummy = 1 if sex == \"F\"\n",
"replace sexdummy = 0 if missing(sexdummy)"
],
"id": "c6fa66a7"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we can estimate our specification using the command `regress`. We\n",
"have seen how to do it in [Module\n",
"11](https://comet.arts.ubc.ca/docs/Research/econ490-pystata/11_Linear_Reg.html)."
],
"id": "93355a7e-9352-4a33-87ed-791c9fb9d198"
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"regress logearn logage sexdummy"
],
"id": "1be62358"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are different options available to export this table to another\n",
"file. In this module, we will use `etable`, a command available by\n",
"default in Stata 17 and subsequent versions.\n",
"\n",
"`etable` can take several options. In its simplest form, we just need to\n",
"type `etable, export(filename)` after fitting a model to save a table in\n",
"a file named *filename*. We can use files of Microsoft Word, Microsoft\n",
"Excel, LATEX, Markdown, or PDF, but we need to specify the right\n",
"extension.\n",
"\n",
"For example, let’s save our results in a Microsoft Word file named\n",
"*table.docx*."
],
"id": "d411ac4f-0a89-4581-8d8f-5ef8e2a74ecd"
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"regress logearn logage sexdummy\n",
"etable, export(table.docx)"
],
"id": "d4141eb7"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A file named *table.docx* should appear in your folder. Notice that this\n",
"worked, but our table does not have a very professional appearance yet.\n",
"We can add more options to the command `etable` to make our results more\n",
"clear and organized. Here are some of the options we can add:\n",
"\n",
"- we can add more statistics, such as the number of observations\n",
" (*N*), the R$^2$ (*r2*), the adjusted R$^2$ (*r2_a*), and the F\n",
" statistic (*F*), with the options `mstat(N)`, `mstat(r2)`,\n",
" `mstat(r2_a)`, and `mstat(F)`;\n",
"- we can add a title *titlename* with the option `title(titlename)`;\n",
"- we can show the stars indicating the level of significance of our\n",
" coefficients with the option `showstars` and add a footnote\n",
" explaining them with `showstarsnote`;\n",
"- for the coefficients, we can display the variable labels instead of\n",
" their names by adding the option `varlabel`;\n",
"- for the dependent variable, we can display its variable label\n",
" instead of its name by adding the option `column(dvlabel)`;\n",
"- we can show only some coefficients, by including them in\n",
" `keep(coeffnames)`. For example, we can show only the coefficients\n",
" for age and sex by adding the option `keep(logage sexdummy)`.\n",
"\n",
"Let’s try all of them in practice. Notice that now we add the option\n",
"*replace* when we save the file because there is already a Microsoft\n",
"Word file named *table.docx*: `export(table.docx, replace)`."
],
"id": "984c5684-0868-4a70-8ca4-b06d99faed97"
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"* Add labels to variables\n",
"label var logearn \"Earnings (ln)\"\n",
"label var logage \"Age (ln)\"\n",
"label var sexdummy \"Female\"\n",
"\n",
"* Run regression\n",
"regress logearn logage sexdummy\n",
"\n",
"* Store results\n",
"etable, export(table.docx, replace) mstat(N) mstat(r2_a) title(Earnings) showstars showstarsnote keep(logage sexdummy) varlabel column(dvlabel)"
],
"id": "dda20b5e"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is way nicer, but what if we want to show the results of multiple\n",
"models in the same table?\n",
"\n",
"Suppose we want to first estimate a model with only *age* or only *sex*\n",
"as an explanatory variable, and then a multivariate model encompassing\n",
"both. In this case, we just need to store the results of each model\n",
"using the command `estimates store`.\n",
"\n",
"In the example below, we store the three models in objects *model1*,\n",
"*model2*, and *model3*."
],
"id": "05e6a461-f8c8-4f89-bf56-e4f8c4a46f83"
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"* Store first regression in model1\n",
"regress logearn logage\n",
"estimates store model1\n",
"\n",
"* Store second regression in model2\n",
"regress logearn sexdummy\n",
"estimates store model2\n",
"\n",
"* Store third regression in model3\n",
"regress logearn logage sexdummy\n",
"estimates store model3"
],
"id": "f03818d9"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can export all the objects in one single table by calling their\n",
"names in the options `estimates()`."
],
"id": "815ff703-4878-416d-bf7f-762875f0cf3a"
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"etable, estimates(model1 model2 model3) mstat(N) mstat(r2_a) showstars showstarsnote varlabel column(dvlabel) export(table.docx, replace)"
],
"id": "a3e5daa2"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12.2 Plotting Regression Coefficients\n",
"\n",
"Visual representations can be better than tables. Sometimes we need to\n",
"plot our estimated coefficients and their confidence intervals.\n",
"\n",
"In Stata, this is easily done with command `coefplot`. The graphs\n",
"obtained with `coefplot` are easy to customize. In its simplest use, we\n",
"only need to run `coefplot` right after our regression.\n",
"\n",
"**Note:** You will need to install command `coefplot` from the SSC\n",
"Archive the first time you use it on your local computer. To do so, type\n",
"`ssc install coefplot`.\n",
"\n",
"Once again, let’s try it on our multivariate model. We can omit the\n",
"constant by adding the option `drop(_cons)`. Remember to save the graph."
],
"id": "d36cb680-74d6-45f3-8c64-4a741b048366"
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"regress logearn logage sexdummy\n",
"coefplot, drop(_cons)\n",
"graph export graph1.jpg, as(jpg) replace"
],
"id": "6195128e"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since it is a graph, we can add most of the options that we have seen in\n",
"[Module\n",
"9](https://comet.arts.ubc.ca/docs/Research/econ490-pystata/09_Stata_Graphs.html).\n",
"For example, we can change the color of the background from light blue\n",
"to white with the option `graphregion(color(white))`.\n",
"\n",
"There are some options that are specific to `coefplot`. By default,\n",
"confidence intervals are drawn at 95% significance levels. We can\n",
"specify different and multiple levels in the option `levels()`. For\n",
"example, we can show both the 95% and 99.9% confidence intervals with\n",
"`levels(99.9 95)`.\n",
"\n",
"Additionally, we can use a vertical layout with the option `vertical`.\n",
"\n",
"Let’s apply these options to our example."
],
"id": "b624c739-ab78-42b1-87b0-fc673097608b"
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"%%stata\n",
"\n",
"regress logearn logage sexdummy\n",
"coefplot, drop(_cons) graphregion(color(white)) levels(99.9 95) vertical\n",
"graph export graph1.jpg, as(jpg) replace"
],
"id": "ac3efa77"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12.3 Wrap Up\n",
"\n",
"We have learned in this module how to store regression output in a clear\n",
"and organized manner using the command `etable` and how to plot\n",
"regression coefficients using the command `coefplot`.\n",
"\n",
"Remember to check the Stata documentation when creating graphs and\n",
"exporting tables. The documentation can be your best ally if you end up\n",
"using it.\n",
"\n",
"## 12.4 Wrap-up Table\n",
"\n",
"| Command | Function |\n",
"|--------------------------------|----------------------------------------|\n",
"| `etable, export(filename)` | It exports the regression output to a file named *filename*. |\n",
"| `coefplot` | It plots regression coefficients and their 95% confidence intervals. |\n",
"\n",
"## References\n",
"\n",
"[etable manual](https://www.stata.com/manuals/retable.pdf)
[How to\n",
"use\n",
"coefplot](https://repec.sowi.unibe.ch/stata/coefplot/getting-started.html)\n",
"
"
],
"id": "314d37cc-768e-444a-8a13-24a4834c4538"
}
],
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"path": "/usr/local/share/jupyter/kernels/python3"
},
"language_info": {
"name": "python",
"codemirror_mode": {
"name": "ipython",
"version": "3"
},
"file_extension": ".py",
"mimetype": "text/x-python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
}
}