>>> import sys
>>> sys.path.append('/Applications/Stata/utilities')  # make sure this is the same path you set up in Module 1, Section 1.5.1: Setting Up PyStata
>>> from pystata import config
>>> config.init('se')
4.1 Stata Variables
In ECON 325 and ECON 326, you learned that “variables” are characteristics of a data set. For example, if we had a data set that included all of the countries in the world, we might have a variable which indicates each country’s population. As another example, if we had a data set that included a sample of persons in Canada, we might have a variable which indicates each person’s marital status. These are data set variables, and they can be qualitative (strings) or quantitative (numeric).
In Stata, there is a separate category of variables which we call “macros”. Macros work as placeholders for values that we want to store either temporarily or for the whole session. Locals are macros that store data temporarily (within the span of the executed code), while globals are macros that keep their values for as long as we have Stata open on our computer, unless we drop them. We can think of Stata macros as analogous to workspace objects in Python or R. Below, you are going to learn how to use these macros in your own research.
4.2 Locals
Locals are an extremely useful object in Stata. To reference a local, we enclose its name between a backtick (`) and an apostrophe ('), as in `name'.
Here we will cover two popular applications of locals.
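Before turning to those applications, here is a minimal sketch of defining and referencing a local. The names greeting and n are hypothetical and chosen only for illustration.

```stata
* Store a string in a local (no equals sign: the text is copied as is)
local greeting "hello"
* Store the result of an expression (the equals sign tells Stata to evaluate it)
local n = 2 + 3
* Reference each local with a backtick before and an apostrophe after its name
display "`greeting', n is `n'"
```

Stata substitutes the contents of each local into the command before running it, so the last line displays hello, n is 5.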
4.2.1 Storing results
The first use of local macros is to store the results of your code. To understand how powerful this is, you should be aware that most Stata commands store hidden results after they run. Consider the following example:
%%stata
sysuse auto, clear
summarize price
.
. sysuse auto, clear
(1978 automobile data)
.
. summarize price
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
.
When we ran summarize above, Stata saved several results behind the scenes in addition to the output it displayed. We can access those stored results with the command return list (for regular commands) or ereturn list (for estimation commands, which we’ll cover later in Module 12). Since summarize is not an estimation command, we can run the following:
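A minimal sketch of that step: return list is run immediately after summarize, so that r() still holds the results for price. The data are reloaded here only so that the block runs on its own.

```stata
* Load the example data and summarize price
sysuse auto, clear
summarize price
* List everything summarize left behind in r()
return list
```

The list includes entries such as r(N), r(mean), r(sd), r(min) and r(max), which can be used in later commands until another command overwrites them.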
Notice that Stata reports these stored results as scalars, where a scalar is simply a single quantity.
If we want Stata to tell us the mean price from the automobile data set that was just calculated using summarize, we can use the following:
%%stata
display r(mean)
.
. display r(mean)
6165.2568
.
We can now store that scalar as a local, and use that local in other Stata commands:
%%stata
local price_mean = r(mean)
display "The mean of price variable is `price_mean'."
.
. local price_mean = r(mean)
. display "The mean of price variable is `price_mean'."
The mean of price variable is 6165.256756756757.
.
We can also change the format of our local so that the average price is rounded to the nearest integer and includes a comma as the thousands separator. We do so by passing the format %5.0fc to display when we define the local. To learn more about different formats in Stata, type help format.
%%stata
local price_mean_formatted : display %5.0fc r(mean)
display "The average price is `price_mean_formatted'."
.
. local price_mean_formatted : display %5.0fc r(mean)
. display "The average price is `price_mean_formatted'."
The average price is 6,165.
.
Imagine that we wanted to create a new variable equal to price minus its own mean. We would do this if we wanted to de-mean the variable or, in other words, create a new price variable that has a mean of zero. To do this, we can use the generate command along with the local we just created:
.
. local price_mean = r(mean)
. g price_demean = price - `price_mean'
.
Note that there is no output when we run this command.
If we try to run this command a second time, we will get an error because Stata doesn’t want us to accidentally overwrite an existing variable. In order to correct this problem, we need to use the command replace instead of the command generate. Try it yourself above!
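As a hedged sketch of the two usual ways around that error, the block below first overwrites the existing variable with replace and then shows an alternative that drops the variable before generating it again. It reloads the data and recomputes price_mean so that it runs on its own.

```stata
sysuse auto, clear
quietly summarize price
local price_mean = r(mean)
generate price_demean = price - `price_mean'

* A second generate would stop with an error because price_demean already exists,
* so we overwrite the existing variable with replace instead
replace price_demean = price - `price_mean'

* Alternative: drop the variable if it exists, then generate it again.
* capture suppresses the error that drop would raise if the variable were missing
capture drop price_demean
generate price_demean = price - `price_mean'
```

The capture drop pattern is convenient at the top of a do-file that may be run several times in the same session.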
Let’s take a look at the mean of our new variable using summarize again.
%%stata
su price_demean
.
. su price_demean
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price_demean | 74 -.0000154 2949.496 -2874.257 9740.743
.
We can see that the mean is roughly zero just as we expected.
4.2.2 Executing loops
Locals are automatically generated whenever we use loops (as discussed in Module 3). In this subsection, we will see how to use locals both inside the loop (these locals are automatically generated by Stata) and outside the loop (when we store the list of values to loop from into a local).
Consider another common application here involving a categorical variable that can take on 5 possible values.
%%stata
su rep78
.
. su rep78
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
rep78 | 69 3.405797 .9899323 1 5
.
Note that if we now run display r(mean), the command we used earlier to display the mean of price, we will get a different value: summarize has overwritten r(mean) with the mean of rep78. Try it yourself!
There are times when we might want to save all the possible categorical values in a local. When we use the levelsof command as is done below, we can create a new local with a name that we choose. Here, that name is levels_rep.
%%stata
levelsof rep78, local(levels_rep)
.
. levelsof rep78, local(levels_rep)
1 2 3 4 5
.
We can do different things with this new list of values. For instance, we can now summarize a variable based on every distinct value of rep78, by creating a loop using foreach and looping through all the values of the newly created local.
%%stata
foreach x in `levels_rep' {
    su price if rep78 == `x'
}
.
. foreach x in `levels_rep' {
2. su price if rep78 == `x'
3. }
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 2 4564.5 522.5519 4195 4934
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 8 5967.625 3579.357 3667 14500
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 30 6429.233 3525.14 3291 15906
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 18 6071.5 1709.608 3829 9735
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 11 5913 2615.763 3748 11995
.
Notice that in the loop above there are two locals:
1. levels_rep: the local containing the list of values taken by the variable rep78;
2. x: the local containing, in each iteration, one specific value from the list stored in levels_rep.
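The two applications of locals can also be combined. The sketch below, which is an illustration rather than part of the original example, stores the mean price for each level of rep78 in its own local; the names mean_1 to mean_5 are hypothetical.

```stata
sysuse auto, clear
levelsof rep78, local(levels_rep)

* "of local" tells foreach to read the list from the local levels_rep
foreach x of local levels_rep {
    * quietly hides the summary table; r(mean) is still stored
    quietly summarize price if rep78 == `x'
    * Build the local's name from the loop index: mean_1, mean_2, ...
    local mean_`x' = r(mean)
    display "rep78 = `x': mean price = " %9.2f `mean_`x''
}
```

After the loop, each mean_`x' local can be reused, for example to compare the mean price across repair records.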
4.3 Globals
Globals are equally useful in Stata. To reference a global, we put a dollar sign before its name, as in $name or ${name}.
Globals have the same applications as locals, but their values persist until we close Stata or explicitly drop them. The one exception involves loops: the index macro that foreach and forvalues create is always a local, although globals can still be referenced inside the loop body.
Here we will cover two popular applications of globals.
4.3.1 Storing lists
Globals are used to store lists of variable names, paths and/or directories that we need for our research project.
Consider the following example where we create a global called covariates that is simply a list of two variable names:
%%stata
global covariates "rep78 foreign"
.
. global covariates "rep78 foreign"
.
We can now use this global anywhere we want to invoke the two variables specified. When we want to indicate that we are using a global, we refer to this type of macro with the dollar sign symbol $.
Here we summarize these two variables.
%%stata
su ${covariates}
.
. su ${covariates}
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
rep78 | 69 3.405797 .9899323 1 5
foreign | 74 .2972973 .4601885 0 1
.
In the empty cell below, describe these two variables using the macro we have just created.
%%stata
Notice that lists of variables can be very useful when we estimate multiple regression models. Suppose that we want to estimate how price changes with mileage, controlling for the car origin and the trunk space. We can store all our control variables in one global called controls and then call that global directly when estimating our regression.
Using globals when estimating regressions is very helpful when you have to estimate many specifications, as it reduces the likelihood of typos or mistakes.
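A minimal sketch of that regression, assuming that mileage is measured by mpg, car origin by foreign, and trunk space by trunk in the auto data set:

```stata
sysuse auto, clear
* Store the control variables once
global controls "foreign trunk"
* Reuse the same list in every specification
regress price mpg ${controls}
```

If we later want to add or remove a control, we only need to edit the global definition, and every regression that references ${controls} picks up the change.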
4.3.2 Changing directories
Globals are also useful for storing file paths, and we will see more of them in the module on project workflow.
In the following example, we are saving in global datadirectory the file path for the folder where our data is stored and in global outputdirectory the file path where we want to save our results.
Note that this is a fictional example, so no output will be produced.
.
. global datadirectory C:\project\mydata\
. global outputdirectory C:\project\output\
.
We can use global datadirectory to load our data more easily:
%%stata
use "$datadirectory\data.dta", clear
SystemError:
.
. use "$datadirectory\data.dta", clear
file C:\project\mydata\\data.dta not found
r(601);
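The error message above shows a doubled backslash, because the global already ends with a separator. One way to sidestep this, sketched below, is to store the path without a trailing separator and to use forward slashes, which Stata accepts on Windows as well as on macOS and Linux. Forward slashes also avoid a known pitfall: a backslash placed immediately before a macro reference stops Stata from expanding the macro. The global name datadir and the folder are hypothetical.

```stata
* Hypothetical project folder; replace it with a path that exists on your machine
global datadir "C:/project/mydata"

* confirm file checks that the file exists;
* capture keeps a failed check from stopping the do-file
capture confirm file "${datadir}/data.dta"
if _rc == 0 {
    use "${datadir}/data.dta", clear
}
else {
    display as error "File not found: ${datadir}/data.dta"
}
```

The braces around the global name make it explicit where the name ends and the rest of the path begins.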
Similarly, once we have finished editing our data, we can store our results in the folder saved within global outputdirectory. Note that save takes the file name directly, without the word using:
%%stata
save "$outputdirectory\output.dta", replace
4.5 Common mistakes
The most common mistake when using locals or globals is to reference an empty macro by accident. In that case, the local or global contains no value, and Stata silently substitutes nothing in its place. This can happen if you run only some lines of a do-file on your local machine: local macros defined elsewhere in the do-file are not defined in the subset of lines you are running. These errors can happen if you run Stata on your local machine, but not if you run your code on JupyterLab. To avoid this kind of mistake, run your do-file in its entirety rather than in pieces.
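A small sketch of what an empty macro looks like in practice, together with one possible guard. The local name not_defined_yet is hypothetical and is deliberately never defined.

```stata
* Referencing an undefined local does not raise an error:
* Stata substitutes nothing, so this displays "The average price is ."
display "The average price is `not_defined_yet'."

* A simple guard: compare the macro's contents with the empty string
if "`not_defined_yet'" == "" {
    display as error "not_defined_yet is empty; define it before using it"
}
```

Because the substitution is silent, a guard like this can be worth adding before a macro is used in a calculation or an if condition.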
Another common mistake is to save the wrong value in your local. Stata overwrites the stored results shown by return list or ereturn list each time a new command of that kind runs. In the following example, we fail to save the average price because summarize was run on two variables, so r(mean) holds the mean of the last one, length.
%%stata
summarize price length
.
. summarize price length
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
price | 74 6165.257 2949.496 3291 15906
length | 74 187.9324 22.26634 142 233
.
%%stata
local price_mean = r(mean)
display "The average price is `price_mean'."
.
. local price_mean = r(mean)
. display "The average price is `price_mean'."
The average price is 187.9324324324324.
.
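One way to avoid this, sketched here, is to summarize one variable at a time and store r(mean) in a local immediately afterwards, before any other command can overwrite it. The local name length_mean is introduced only for this illustration.

```stata
sysuse auto, clear

* Summarize price on its own and store its mean right away
quietly summarize price
local price_mean = r(mean)

* Only then summarize the next variable
quietly summarize length
local length_mean = r(mean)

display "The average price is `price_mean'."
display "The average length is `length_mean'."
```

Each local now holds the mean of the variable its name refers to.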
4.6 Wrap Up
In this module we learned how Stata has its own set of variables that have some very useful applications. We will see these macros throughout the following modules. You will also use them in your own research project.
To demonstrate how useful macros can be, we can use our covariates global to run a very simple regression in which price is the dependent variable and the explanatory variables are rep78 and foreign. That command using our macro would be:
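A minimal sketch of that regression, assuming the covariates global is defined as in Section 4.3.1 (it is redefined here so that the block runs on its own):

```stata
sysuse auto, clear
global covariates "rep78 foreign"
* Stata expands ${covariates} to: rep78 foreign
regress price ${covariates}
```

Observations with a missing value of rep78 are dropped from the estimation sample automatically.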
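If we only wanted to include observations where price is above average, we could add an if condition that uses the local price_mean. Before running it, make sure that price_mean holds the mean of price, not the mean of length from the mistaken example in the previous section. The regression would then be: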
%%stata
regress price ${covariates} if price > `price_mean'