{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 04 - Working with Locals and Globals\n", "\n", "Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt \n", "2024-05-29\n", "\n", "## Prerequisites\n", "\n", "1. View the characteristics of any dataset using the command\n", " `describe`.\n", "2. Use `help` to learn how to run new commands and understand their\n", " options.\n", "3. Understand the Stata command syntax.\n", "4. Create loops using the commands `for`, `while`, `forvalues` and\n", " `foreach`.\n", "\n", "## Learning Outcomes\n", "\n", "1. Recognize the difference between data set variables and Stata\n", " variables.\n", "2. Recognize the difference between local and global Stata variables.\n", "3. Use the command `local` to create temporary macros.\n", "4. Use the command `global` to create permanent macros.\n", "5. Forecast how you will use macros in your own research.\n", "\n", "## 4.0 Intro" ], "id": "3ee22a64-5766-4f4d-b6d3-ad5c54cfc26c" }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import stata_setup\n", "stata_setup.config('C:\\Program Files\\Stata18/','se')" ], "id": "c866891b" }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [], "source": [ ">>> import sys\n", ">>> sys.path.append('/Applications/Stata/utilities') # make sure this is the same as what you set up in Module 01, Section 1.3: Setting Up the STATA Path\n", ">>> from pystata import config\n", ">>> config.init('se')" ], "id": "376ac5ea" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.1 Stata Variables\n", "\n", "In early econometrics courses, we learned that “variables” are\n", "characteristics of a data set. For example, if we had a data set that\n", "included all of the countries in the world, we might have a variable\n", "which indicates each country’s population. As another example, if we had\n", "a data set that included a sample of persons in Canada, we might have a\n", "variable which indicates each person’s marital status. These are data\n", "set variables, and they can be qualitative (strings) or quantitative\n", "(numeric).\n", "\n", "In Stata, there is a separate category of variables available for use\n", "which we call “macros”. Macros work as placeholder variables for values\n", "that we want to store either temporarily or permanently in our\n", "workspace. Locals are macros that store data temporarily (within the\n", "span of the executed code), while globals are macros that store data\n", "permanently, or at least as long as we have Stata open on our computer.\n", "We can think of Stata macros as analogous to workspace objects in Python\n", "or R. Below, we are going to learn how to use these macros in our own\n", "research.\n", "\n", "## 4.2 Locals\n", "\n", "Locals are an extremely useful object in Stata. A local name is usually\n", "enwrapped between two backticks.\n", "\n", "Here we will cover two popular applications of locals.\n", "\n", "### 4.2.1 Storing Results\n", "\n", "The first use of local macros is to store the results of our code. Most\n", "Stata commands have hidden results stored after they are run. We can\n", "then put those into local macros to use later. Consider the following\n", "example:" ], "id": "2442a345-806b-47ab-a82b-25624dea0c6e" }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "sysuse auto, clear\n", "\n", "summarize price" ], "id": "a51d9859" }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we ran `summarize` above, Stata produced output that was stored in\n", "several local variables. We can access those stored results with the\n", "command `return list` (for regular commands) or `ereturn list` (for\n", "estimation commands, which we’ll cover later in [Module\n", "11](https://comet.arts.ubc.ca/docs/Research/econ490-pystata/11_Linear_Reg.html).\n", "Since `summarize` is not an estimation command, we can run the\n", "following:" ], "id": "b869b136-1ae9-4309-85a7-f98414de123e" }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "return list" ], "id": "6e174118" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that Stata has reported that variables have been stored as\n", "scalars, where a scalar is simply a quantity.\n", "\n", "If we want Stata to tell us the mean price from the automobile data set\n", "that was just calculated using `summarize`, we can use the following:" ], "id": "0620c036-3e7f-4980-bbb9-c46b3b8094c4" }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "display return(mean)" ], "id": "7908f641" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now store that scalar as a local, and use that local in other\n", "Stata commands:" ], "id": "bc83d59a-1f6f-4a75-9351-96ea45eb5a23" }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "local price_mean = return(mean)\n", "display \"The mean of price variable is `price_mean'.\" " ], "id": "b97577e1" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also modify the format of our local, so that the average price is\n", "rounded to the closest integer and there is a comma separator for\n", "thousand units. We do so by typing `%5.0fc`. To learn more about\n", "different formats in Stata, type `help format`." ], "id": "842cc944-e3c6-4cb4-a93a-d8a166acefb6" }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "local price_mean_formatted : display %5.0fc return(mean)\n", "display \"The average price is `price_mean_formatted'.\"" ], "id": "8df8cc53" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine that we wanted to create a new variable that is equal to the\n", "price minus the mean of that same variable. We would do this if we\n", "wanted to de-mean that variable or, in other words, create a new price\n", "variable that has a mean of zero. To do this, we could use the\n", "`generate` command along with the local we just created to do exactly\n", "that:" ], "id": "76e76bdb-4ec1-4fb7-8d5f-5a4f6ece7859" }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "local price_mean = return(mean)\n", "generate price_demean = price - `price_mean'" ], "id": "0cd36b00" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that there is no output when we run this command.\n", "\n", "If we try to run this command a second time, we will get an error\n", "because Stata doesn’t want us to accidentally overwrite an existing\n", "variable. In order to correct this problem, we need to use the command\n", "`replace` instead of the command `generate`. Try it yourself above!\n", "\n", "Let’s take a look at the mean of our new variable using `summarize`\n", "again." ], "id": "1c702d2c-7250-43f8-bf6f-447111f71893" }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "su price_demean" ], "id": "65189319" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the mean is roughly zero just as we expected.\n", "\n", "### 4.2.2 Executing loops\n", "\n", "When we looked at loops in [Module\n", "3](https://comet.arts.ubc.ca/docs/Research/econ490-stata/03_Stata_Essentials.html),\n", "we took a look at the second popular use of locals. Specifically, our\n", "examples of `foreach`, `forvalues`, and `while` use locals to iterate\n", "over strings or integers.\n", "\n", "In this subsection, we will see how to use locals both **inside** of a\n", "loop (these locals are automatically generated by Stata) and **outside**\n", "of the loop (when we store the list of values into a local for the loop\n", "to loop from).\n", "\n", "Consider the following common application here involving a categorical\n", "variable that can take on 5 possible values." ], "id": "cd6e7efb-9b75-44c2-9643-6227a369be6c" }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "summarize rep78" ], "id": "84dc16b2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if we run the command that we used to display the mean of\n", "price, we will now get a different value. Try it yourself!\n", "\n", "There are times when we might want to save all of the possible\n", "categorical values in a local. When we use the `levelsof` command as is\n", "done below, we can create a new local with a name that we choose. Here,\n", "that name is *levels_rep*." ], "id": "d8a6682a-87f0-473d-a9e7-e466d14a53fe" }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "levelsof rep78, local(levels_rep)" ], "id": "d6017e16" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do different things with this new list of values. For instance,\n", "we can now summarize a variable based on every distinct value of\n", "*rep78*, by creating a loop using `foreach` and looping through all the\n", "values of the newly created local." ], "id": "66cdb3b7-45f0-46b3-a8e3-428cf719c346" }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "foreach x in `levels_rep' {\n", "summarize price if rep78 == `x'\n", "}" ], "id": "3f721bee" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that in the loop above there are two locals:\n", "\n", "1. *levels_rep* : the local containing the list of values taken by\n", " variable *rep*;\n", "2. *x* : the local containing, in each loop, one specific value from\n", " the list stored in *levels_rep*.\n", "\n", "## 4.3 Globals\n", "\n", "Globals are equally useful in Stata. They have the same applications as\n", "locals, but their values are stored permanently. Due to their permanent\n", "nature, globals cannot be used *inside* loops. They can be used for all\n", "the other applications for which locals are used.\n", "\n", "Here we will cover two popular applications of globals.\n", "\n", "### 4.3.1 Storing Lists\n", "\n", "Globals are used to store lists of variable names, paths, and/or\n", "directories that we need for our research project.\n", "\n", "Consider the following example where we create a global called\n", "*covariates* that is simply a list of two variable names:" ], "id": "dd335ec5-8f87-4adc-8d72-8cf3db863e62" }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "global covariates \"rep78 foreign\"" ], "id": "662e65a8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use this global anywhere we want to invoke the two variables\n", "specified. When we want to indicate that we are using a global, we refer\n", "to this type of macro with the dollar sign symbol `$`.\n", "\n", "Here we `summarize` these two variables." ], "id": "f8a824cf-fcda-43de-baf6-21a7db3d4864" }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "summarize ${covariates}" ], "id": "5889d4bb" }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the empty cell below, `describe` these three variables using the\n", "macro we have just created." ], "id": "d76ad2ed-27a8-43f1-9f00-e28bd7071971" }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "%%stata" ], "id": "00f5fad3" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that lists of variables can be very useful when we estimate\n", "multiple regression models. Suppose that we want to estimate how price\n", "changes with mileage, controlling for the car origin and the trunk\n", "space. We can store all our control variables in one global called\n", "*controls* and then call that global directly when estimating our\n", "regression." ], "id": "ea3f34e4-b038-4dc6-9726-5a6a217be420" }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "global controls trunk foreign\n", "reg price mpg $controls" ], "id": "07ce2f26" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using globals for estimating regressions is very helpful when we have to\n", "estimate many specifications, as it reduces the likelihood of making\n", "typos or mistakes.\n", "\n", "### 4.3.2 Changing Directories\n", "\n", "Globals are useful to store file paths. We will see more of them in the\n", "module of project workflow ([Module\n", "18](https://comet.arts.ubc.ca/docs/Research/econ490-stata/18_Wf_Guide2.html)).\n", "\n", "In the following example, we are saving the file path for the folder\n", "where our data is stored in a global called *datadirectory* and the file\n", "path where we want to save our results in a global called\n", "*outputdirectory*.\n", "\n", "Note that this is a fictional example, so no output will be produced." ], "id": "9ba8691e-c4b6-40b0-9fa0-54aafea780e9" }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "global datadirectory C:\\project\\mydata\\\n", "global outputdirectory C:\\project\\output\\" ], "id": "fc7fb58b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the global *datadirectory* to load our data more easily:" ], "id": "993aeef0-5b1b-47a5-936d-ee05d641e3d8" }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "use \"$datadirectory\\data.dta\", clear" ], "id": "16f238c2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, once we have finished editing our data, we can store our\n", "results in the folder saved within the global *outputdirectory*:" ], "id": "7ee7a88f-b57c-4515-b2c5-e3148fa62645" }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "save using \"$outputdirectory\\output.dta\", replace" ], "id": "ea21cc13" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.4 Common Mistakes\n", "\n", "The most common mistake that happens when using locals or globals is to\n", "accidentally save an empty macro. In those cases, the local or global\n", "will contain no value. This can happen if we run only some lines of the\n", "do-file in our local machine, as the local macros defined in the\n", "original do-file are not defined in the smaller subset of the do-file\n", "that we are running. These errors can happen if we run Stata on our\n", "local machine, but not if we run our code on JupyterLab. To avoid this\n", "kind of mistake, run your do-file entirely, not pieces of it.\n", "\n", "Another common mistake is to save the wrong values in our local\n", "variable. Stata always updates the automatically created locals in\n", "`return list` or `ereturn list`. In the following example, we fail to\n", "save the average price because Stata has updated the value of\n", "`return(mean)` with the average length." ], "id": "bf5313de-57a8-4234-8bdc-f0e3ba86decf" }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "summarize price length" ], "id": "cd8787d3" }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "return list" ], "id": "1c27f754" }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "local price_mean = r(mean)\n", "display \"The average price is `price_mean'.\" " ], "id": "5486c120" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4.5 Wrap Up\n", "\n", "In this module, we learned how Stata has its own set of variables that\n", "have some very useful applications. We will see these macros throughout\n", "the following modules. You will also use them in your own research\n", "project.\n", "\n", "To demonstrate how useful macros can be, we can use our *covariates*\n", "global to run a very simple regression in which *price* is the dependent\n", "variable and the explanatory variables are *rep78* and *foreign*. That\n", "command using our macro would be:" ], "id": "b0f01cad-8a88-4a99-8cde-1d8a8c54474a" }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "regress price ${covariates}" ], "id": "cc8be34f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we only wanted to include observations where price is above average,\n", "then using the local we created earlier in this module the regression\n", "would be:" ], "id": "952d22be-d5d0-41c0-988e-eb9747b7be68" }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "%%stata\n", "\n", "regress price ${covariates} if price > `price_mean'" ], "id": "7e90fb75" }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see for yourself that Stata ran the regression on only a subset\n", "of the data.\n", "\n", "In the next module, we will work on importing data sets in various\n", "formats." ], "id": "df028d2a-5882-4e19-93f7-ad88183800c1" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3 (ipykernel)", "language": "python", "path": "/usr/local/share/jupyter/kernels/python3" }, "language_info": { "name": "python", "codemirror_mode": { "name": "ipython", "version": "3" }, "file_extension": ".py", "mimetype": "text/x-python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } } }