COMET
  • Get Started
    • Quickstart Guide
    • Install and Use COMET
    • Get Started
  • Learn By Skill Level
    • Getting Started
    • Beginner
    • Intermediate - Econometrics
    • Intermediate - Geospatial
    • Advanced

    • Browse All
  • Learn By Class
    • Making Sense of Economic Data (ECON 226/227)
    • Econometrics I (ECON 325)
    • Econometrics II (ECON 326)
    • Statistics in Geography (GEOG 374)
  • Learn to Research
    • Learn How to Do a Project
  • Teach With COMET
    • Learn how to teach with Jupyter and COMET
    • Using COMET in the Classroom
    • See COMET presentations
  • Contribute
    • Install for Development
    • Write Self Tests
  • Launch COMET
    • Launch on JupyterOpen (with Data)
    • Launch on JupyterOpen (lite)
    • Launch on Syzygy
    • Launch on Colab
    • Launch Locally

    • Project Datasets
    • Github Repository
  • |
  • About
    • COMET Team
    • Copyright Information
  1. STATA Notebooks
  2. STATA Essentials (3)
  • Learn to Research


  • STATA Notebooks
    • Setting Up (1)
    • Working with Do-files (2)
    • STATA Essentials (3)
    • Locals and Globals (4)
    • Opening Datasets (5)
    • Creating Variables (6)
    • Within Group Analysis (7)
    • Combining Datasets (8)
    • Creating Meaningful Visuals (9)
    • Combining Graphs (10)
    • Conducting Regression Analysis (11)
    • Exporting Regression Output (12)
    • Dummy Variables and Interactions (13)
    • Good Regression Practices (14)
    • Panel Data Regression (15)
    • Difference in Differences (16)
    • Instrumental Variable Analysis (17)
    • STATA Workflow Guide (18)

  • R Notebooks
    • Setting Up (1)
    • Working with R Scripts (2)
    • R Essentials (3)
    • Opening Datasets (4)
    • Creating Variables (5)
    • Within Group Analysis (6)
    • Combining Datasets (7)
    • Creating Meaningful Visuals (8)
    • Combining Graphs (9)
    • Conducting Regression Analysis (10)
    • Exporting Regression Output (11)
    • Dummy Variables and Interactions (12)
    • Good Regression Practices (13)
    • Panel Data Regression (14)
    • Difference in Differences (15)
    • Instrumental Variable Analysis (16)
    • R Workflow Guide (17)

  • Pystata Notebooks
    • Setting Up (1)
    • Working with Do-files (2)
    • STATA Essentials (3)
    • Locals and Globals (4)
    • Opening Datasets (5)
    • Creating Variables (6)
    • Within Group Analysis (7)
    • Combining Datasets (8)
    • Creating Meaningful Visuals (9)
    • Combining Graphs (10)
    • Conducting Regression Analysis (11)
    • Exporting Regression Output (12)
    • Dummy Variables and Interactions (13)
    • Good Regression Practices (14)
    • Panel Data Regression (15)
    • Difference in Differences (16)
    • Instrumental Variable Analysis (17)
    • STATA Workflow Guide (18)

On this page

  • Prerequisites
  • Learning Outcomes
  • 3.1 Describing Your Data
  • 3.2 Introduction to Stata Command Syntax
    • 3.2.1 Using HELP to understand commands
    • 3.2.2 Imposing IF conditions
    • 3.2.3 Imposing IN conditions
    • 3.2.4 Command options
  • 3.3 Using Loops
    • 3.3.1 Creating loops Using forvalues
    • 3.3.2 Creating loops using foreach
    • 3.3.3 Writing loops with conitions using while
  • 3.4 Errors
  • 3.5 Wrap Up
  • 3.6 Wrap-up Table
  • References
  • Report an issue

Other Formats

  • Jupyter
  1. STATA Notebooks
  2. STATA Essentials (3)

03 - Stata Essentials

econ 490
stata
describe
summarize
loops
help
This notebook dives into a few essentials commands in Stata, including describe, summarize, loops, and help.
Author

Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt

Published

29 May 2024

Prerequisites

  1. Understand how to effectively use Stata do-files and know how to generate log files.

Learning Outcomes

  1. View the characteristics of any dataset using the command describe.
  2. Use help to learn best how to run commands.
  3. Understand the Stata command syntax using the command summarize.
  4. Create loops using the commands for, while, forvalues and foreach .

3.1 Describing Your Data

Let’s start by opening a dataset that was provided when we installed Stata onto our computers. We will soon move on to importing our own data, but this Stata data set will help get us started. This is a data set on automobiles and their characteristics. We can install this dataset by running the command in the cell below:

sysuse auto.dta, clear

We can begin by checking the characteristics of the data set we have just downloaded. The command describe allows us to see the number of observations, the number of variables, a list of variable names and descriptions, and the variable types and labels of that data set.

describe 

Notice that this data set consists of 12 variables and 74 observations. We can see that the first variable is named make, which indicates the make and model of the vehicle. We can also see that the variable make is a string variable (made up of text). Other variables in this data set are numeric. For example, the variable mpg indicates the vehicle’s mileage (miles per gallon) as an integer. The variable foreign is also numeric, and it only takes the values 0 or 1, indicating whether the car is foreign or domestically made; this is a dummy variable.

3.2 Introduction to Stata Command Syntax

3.2.1 Using HELP to understand commands

Stata has a help manual installed in the program which provides documentation for all Stata published commands. This information can be reached by typing the command help and then the name of the command we need extra information about.

Let’s try to see what extra information Stata provides by using the help command with the summarize command. summarize gives us the basic statistics from any variable(s) in the data set, such as the variables we have discussed above, but what else can it do? To see the extra information that is available by using summarize, let’s run the command below:

help summarize

We need to run this command directly into the Stata console on our computer in order to able to see all of the information provided by help. Running this command now will allow us to see that output directly.

When we do, we can see that the first 1-2 letters of the command are often underlined. This underlining indicates the shortest permitted abbreviation for a command (or option).

For example, if we type help rename, we can see that rename can be abbreviated as ren, rena, or renam, or it can be spelled out in its entirety.

Other examples are, generate, append, rotate, run.

If there is no underline, then no abbreviation is allowed. For example, the command replace cannot be abbreviated. The reason for this is that Stata doesn’t want us to accidentally make changes to our data by replacing the information in the variable.

We can write the summarize command with its shortest abbreviation su or a longer abbreviation such as sum.

Also, in the Stata help output we can see that some words are written in blue and are encased within square brackets. We will talk more about these options below, but in Stata we can directly click on those links for more information from help.

Finally, help provides a list of the available options for a command. In the case of summarize, these options allow us to display extra information for a variable. We will learn more about this below in section 3.2.4.

3.2.2 Imposing IF conditions

When the syntax of the command allows for [if], we can run the command on a subset of the data that satisfies any condition we choose. Here is the list of conditional operators available to us:

  1. Equal: ==
  2. Greater than and less than: > and <
  3. Greater than or equal and less than or equal: >= and <=
  4. Not Equal: !=

We can also compound different conditions using the list of logical operators:

  1. And: &
  2. Or: |
  3. Not: ! or ~

Let’s look at an example which applies this new knowledge: summarizing the variable price when the make of the car is domestic (i.e. not foreign):

su price if foreign == 0

Let’s do this again, but now we will impose the additional condition that the mileage must be less than 25.

su price if foreign == 0  & mpg < 25

Maybe we want to restrict to a particular list of values. Here we can write out all of the conditions using the “or” operator, or we can simply make use of the option inlist():

su price if mpg == 10 | mpg == 15 | mpg == 25 | mpg == 40

This works exactly the same way as this command:

su price if inlist(mpg,10,15,25,40)

Maybe we want to restrict to values in a particular range. Here we can use the conditional operators, or we can make use of the option inrange():

su price if mpg >= 5 & mpg <= 25

Notice the output returned by the code below is equal to the previous cell:

su price if inrange(mpg,5,25) 

There might be variables for which there is no information recorded for some observations. For example, when we summarize our automobile data we will see that there are 74 observations for most variables, but that the variable rep78 has only 69 observations - for five observations there is no repair record indicated in the data set.

su price rep78 

If, for some reason, we only want to consider observations without missing values, we can use the option !missing() which combines the command missing() with the negative conditional operator “!”. For example, the command below says to summarize the variable price for all observations for which rep78 is NOT missing.

su price if !missing(rep78)

This command can also be written using the conditional operator since missing numeric variables are indicated by a “.”. This is shown below:

su price if rep78 != .

Notice that in both cases there are only 69 observations.

If we wanted to do this with missing string variables, we could indicate those with ““.

3.2.3 Imposing IN conditions

We can also subset the data by using the observation number. The example below summarizes the data in observations 1 through 10.

su price in 1/10

But be careful! This type of condition is generally not recommended because it depends on how the data is ordered.

To see this, let’s sort the observations in ascending order by running the command sort:

sort price 
su price in 1/10

We can see that the result changes because the observations 1 through 10 in the data are now different.

Always avoid using in whenever you can. Try to use if instead!

3.2.4 Command options

When we used the help command, we saw that we can introduce some optional arguments after a comma. In the case of the summarize command, we were shown the following options: detail, meanonly, format and separator(#).

If we want additional statistics apart from the mean, standard deviation, min, and max values, we can use the option detail or just d for short.

su price, d

3.3 Using Loops

Much like any other programming language, there are for and while loops that we can use to iterate through many times. In particular, the for loops are also sub-divided into forvalues (which iterate across a range of numbers) and foreach (which iterate across a list of names).

It is very common that these loops create a local scope (i.e. the iteration labels only exist within a loop). A local in Stata is a special variable that we create ourselves that temporarily stores information. We’ll discuss locals in the next module, but consider this simple example in which the letter “i” is used as a place holder for the number 95 – it is a local.

For a better understanding of locals and globals, please visit Module 4.

local i = 95

display `i'

We can also create locals that are strings rather than numeric in type. Consider this example:

local course = "ECON 490"

display "`course'"

We can store anything inside a local. When we want to use that information, we include the local encased in a backtick (`) and apostrophe (’).

local course = "ECON 490"

display "I am enrolled in `course' and hope my grade will be `i'%!"

3.3.1 Creating loops Using forvalues

Whenever we want to iterate across a range of values defined as forvalues = local_var_name = min_value(steps)max_value, we can write the command below. Here we are iterating from 1 to 10 in increments of 1.

forvalues counter=1(1)10{
    *Notice that now counter is a local variable
    display `counter'
}

Notice that the open brace { needs to be on the same line as the for command, with no comments after it. Similarly, the closing brace } needs to be on its own line.

Experiment below with the command above by changing the increments and min or max values. See what your code outputs.

/*
forvalues counter=???(???)???{
    display `counter'
}
*/ 

3.3.2 Creating loops using foreach

Whenever we want to iterate across a list of names, we can use the foreach command below. This asks Stata to summarize for a list of variables (in this example, mpg and price).

The syntax for foreach is similar to that of forvalues: foreach local_var_name in "list of variables". Here, we are asking Stata to perform the summarize command on two variables (mpg and price):

foreach name in "mpg" "price"{
    summarize `name'
}

We can have a list stored in a local variable as well. Here, we are storing a list, which includes two variable names (mpg and price) in a local called namelist. Then, using foreach, we summarize name which runs through the list we created above, called namelist.

local namelist "mpg price"
foreach name in `namelist'{
    summarize `name'
}

3.3.3 Writing loops with conitions using while

Whenever we want to iterate until a condition is met, we can write the command below. The condition here is simply “while counter is less than 5”.

local counter = 1 
while `counter'<5{
    display `counter'
    local counter = `counter'+1
}

3.4 Errors

A common occurrence while working with Stata is encountering various errors. Whenever an error occurs, the program will stop executing and an error message will pop-up. Most commonly occuring errors can be attributed to syntax issues, so we should always verify our code before execution. Below we have provided 3 common errors that may pop up.

summarize hello

We must always verify that the variable you use for a command exists and that you are using its correct spelling. Stata alerts you when you try to execute a command with a non-existing variable.

su price if 5 =< mpg =< 25

In this example, the error is due to the use of invalid conditional operators. To make use of the greater than or equal to operator, you must use the symbol (mpg >= ) and to use the less than or equal to operator, you use the symbol (mpg <= ).

local word = 95

display "I am enrolled in `course' and hope my grade will be 'word'%!" // this is incorrect 

display "I am enrolled in `course' and hope my grade will be `word'%!" // this is correct

The number 95 does not display in the string due to the wrong punctuation marks being used to enclose the local. We make the error of using two apostraphes instead of a backtick (`) and an apostrophe (’).

3.5 Wrap Up

In this module, we looked at the way Stata commands function and how their syntax works. In general, many Stata commands will follow the folllowing structure:

name_of_command [varlist] [if] [in] [weight] [, options]

At this point, you should feel more comfortable reading a documentation file for a Stata command. The question that remains is how to find new commands!

You are encouraged to search for commands using the command search. For example, if you are interested in running a regression you can write:

search regress 

We can see that a new Stata window pops up on our computer, and we can click on the different options that are shown to look at the documentation for all these commands. Try it yourself in the code cell below!

In the following modules, whenever there is a command which confuses you, feel free to write search command or help command to redirect to the documentation for reference.

Note: These commands have to be used on your Stata console!

In the next module, we will expand on our knowledge of locals, as well as globals, another type of variable.

3.6 Wrap-up Table

Command Function
describe Provides the characteristics of our dataset including the number of observations and variables, and variable types
summarize Calculates and provides a variety of summary statistics of the general dataset or specific variables
help Provides information on each command including its definition, syntax, and the options associated with the command
if-conditions Used to verify a condition before executing a command. If conditions make use of logical and conditional operators and are preceded by the desired command
sort Used to sort the observations of the data set into ascending order
detail Provides additional statistics, including skewness, kurtosis, the four smallest and four largest values, and various percentile
display Displays strings and values of scalar expressions
search Can be used to find useful commands
while A type of loop that iterates until a condition is met
forvalues A type of for-loop that iterates across a range of numbers
foreach A type of for-loop that iterates across a list of items

References

PDF documentation in Stata
Stata Interface tour
One-way tables of summary statistics
Two-way tables of summary statistics

Working with Do-files (2)
Locals and Globals (4)
  • Creative Commons License. See details.
 
  • Report an issue
  • The COMET Project and the UBC Vancouver School of Economics are located on the traditional, ancestral and unceded territory of the xʷməθkʷəy̓əm (Musqueam) and Sḵwx̱wú7mesh (Squamish) peoples.