{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 03 - Stata Essentials\n", "\n", "Marina Adshade, Paul Corcuera, Giulia Lo Forte, Jane Platt \n", "2024-05-29\n", "\n", "## Prerequisites\n", "\n", "1. Understand how to effectively use Stata do-files and know how to\n", " generate log files.\n", "\n", "## Learning Outcomes\n", "\n", "1. View the characteristics of any dataset using the command\n", " `describe`.\n", "2. Use `help` to learn how best to run commands.\n", "3. Understand the Stata command syntax using the command `summarize`.\n", "4. Create loops using the commands `while`, `forvalues` and\n", " `foreach`.\n", "\n", "## 3.1 Describing Your Data\n", "\n", "Let’s start by opening a dataset that was provided when we installed\n", "Stata onto our computers. We will soon move on to importing our own\n", "data, but this Stata data set will help get us started. This is a data\n", "set on automobiles and their characteristics. We can load this\n", "dataset by running the command in the cell below:\n", "\n", "``` {stata}\n", "sysuse auto.dta, clear\n", "```\n", "\n", "We can begin by checking the characteristics of the data set we have\n", "just loaded. The command `describe` allows us to see the number of\n", "observations, the number of variables, a list of variable names and\n", "descriptions, and the variable types and labels of that data set.\n", "\n", "``` {stata}\n", "describe \n", "```\n", "\n", "Notice that this data set consists of 12 variables and 74 observations.\n", "We can see that the first variable is named *make*, which indicates the\n", "make and model of the vehicle. We can also see that the variable *make*\n", "is a string variable (made up of text). Other variables in this data set\n", "are numeric. For example, the variable *mpg* indicates the vehicle’s\n", "mileage (miles per gallon) as an integer. 
The variable *foreign* is also\n", "numeric, and it only takes the values 0 or 1, indicating whether the car\n", "is foreign or domestically made; this is a dummy variable.\n", "\n", "## 3.2 Introduction to Stata Command Syntax\n", "\n", "### 3.2.1 Using HELP to understand commands\n", "\n", "Stata has a help manual installed in the program which provides\n", "documentation for all Stata published commands. This information can be\n", "reached by typing the command `help` and then the name of the command we\n", "need extra information about.\n", "\n", "Let’s try to see what extra information Stata provides by using the\n", "`help` command with the `summarize` command. `summarize` gives us the\n", "basic statistics from any variable(s) in the data set, such as the\n", "variables we have discussed above, but what else can it do? To see the\n", "extra information that is available by using `summarize`, let’s run the\n", "command below:\n", "\n", "``` {stata}\n", "help summarize\n", "```\n", "\n", "We need to run this command directly into the Stata console on our\n", "computer in order to be able to see all of the information provided by\n", "`help`. Running this command now will allow us to see that output\n", "directly.\n", "\n", "When we do, we can see that the first few letters of the command are\n", "often underlined. This underlining indicates the shortest permitted\n", "abbreviation for a command (or option).\n", "\n", "For example, if we type `help rename`, we can see that `rename` can be\n", "abbreviated as `ren`, `rena`, or `renam`, or it can be spelled out in\n", "its entirety.\n", "\n", "Other examples are `g`enerate, `ap`pend, `rot`ate, and `ru`n.\n", "\n", "If there is no underline, then no abbreviation is allowed. For example,\n", "the command `replace` cannot be abbreviated. 
The reason for this is that\n", "Stata doesn’t want us to accidentally make changes to our data by\n", "replacing the information in the variable.\n", "\n", "We can write the `summarize` command with its shortest abbreviation `su`\n", "or a longer abbreviation such as `sum`.\n", "\n", "Also, in the Stata help output we can see that some words are written in\n", "blue and are encased within square brackets. We will talk more about\n", "these options below, but in Stata we can directly click on those links\n", "for more information from help.\n", "\n", "Finally, help provides a list of the available options for a command. In\n", "the case of `summarize`, these options allow us to display extra\n", "information for a variable. We will learn more about this below in\n", "section 3.2.4.\n", "\n", "### 3.2.2 Imposing IF conditions\n", "\n", "When the syntax of the command allows for `[if]`, we can run the command\n", "on a subset of the data that satisfies any condition we choose. Here is\n", "the list of conditional operators available to us:\n", "\n", "1. Equal: ==\n", "2. Greater than and less than: \\> and \\<\n", "3. Greater than or equal and less than or equal: \\>= and \\<=\n", "4. Not Equal: !=\n", "\n", "We can also compound different conditions using the list of logical\n", "operators:\n", "\n", "1. And: &\n", "2. Or: \\|\n", "3. Not: ! or ~\n", "\n", "Let’s look at an example which applies this new knowledge: summarizing\n", "the variable *price* when the make of the car is domestic (i.e. not\n", "foreign):\n", "\n", "``` {stata}\n", "su price if foreign == 0\n", "```\n", "\n", "Let’s do this again, but now we will impose the additional condition\n", "that the mileage must be less than 25.\n", "\n", "``` {stata}\n", "su price if foreign == 0 & mpg < 25\n", "```\n", "\n", "Maybe we want to restrict to a particular list of values. 
Here we can\n", "write out all of the conditions using the “or” operator, or we can\n", "simply make use of the function `inlist()`:\n", "\n", "``` {stata}\n", "su price if mpg == 10 | mpg == 15 | mpg == 25 | mpg == 40\n", "```\n", "\n", "This works exactly the same way as this command:\n", "\n", "``` {stata}\n", "su price if inlist(mpg,10,15,25,40)\n", "```\n", "\n", "Maybe we want to restrict to values in a particular range. Here we can\n", "use the conditional operators, or we can make use of the function\n", "`inrange()`:\n", "\n", "``` {stata}\n", "su price if mpg >= 5 & mpg <= 25\n", "```\n", "\n", "Notice that the output returned by the code below is the same as in\n", "the previous cell:\n", "\n", "``` {stata}\n", "su price if inrange(mpg,5,25) \n", "```\n", "\n", "There might be variables for which there is no information recorded for\n", "some observations. For example, when we `summarize` our automobile data\n", "we will see that there are 74 observations for most variables, but that\n", "the variable *rep78* has only 69 observations; for five observations\n", "there is no repair record indicated in the data set.\n", "\n", "``` {stata}\n", "su price rep78 \n", "```\n", "\n", "If, for some reason, we only want to consider observations without\n", "missing values, we can use the expression `!missing()`, which combines\n", "the function `missing()` with the logical “not” operator “!”. For\n", "example, the command below says to summarize the variable *price* for\n", "all observations for which *rep78* is NOT missing.\n", "\n", "``` {stata}\n", "su price if !missing(rep78)\n", "```\n", "\n", "This command can also be written using the conditional operator `!=`, since\n", "missing numeric values are indicated by a “.”. 
This is shown below:\n", "\n", "``` {stata}\n", "su price if rep78 != .\n", "```\n", "\n", "Notice that in both cases there are only 69 observations.\n", "\n", "If we wanted to do this with missing string values, we would compare\n", "against the empty string `\"\"` instead.\n", "\n", "### 3.2.3 Imposing IN conditions\n", "\n", "We can also subset the data by using the observation number. The example\n", "below summarizes the data in observations 1 through 10.\n", "\n", "``` {stata}\n", "su price in 1/10\n", "```\n", "\n", "But be careful! This type of condition is generally not recommended\n", "because it depends on how the data is ordered.\n", "\n", "To see this, let’s sort the observations in ascending order of price by\n", "running the command `sort`:\n", "\n", "``` {stata}\n", "sort price \n", "su price in 1/10\n", "```\n", "\n", "We can see that the result changes because the observations 1 through 10\n", "in the data are now different.\n", "\n", "Avoid using `in` whenever possible. Try to use `if` instead!\n", "\n", "### 3.2.4 Command options\n", "\n", "When we used the `help` command, we saw that we can introduce some\n", "optional arguments after a comma. In the case of the `summarize`\n", "command, we were shown the following options: `d`etail, `mean`only,\n", "`f`ormat and `sep`arator(#).\n", "\n", "If we want additional statistics apart from the mean, standard\n", "deviation, min, and max values, we can use the option `detail` or just\n", "`d` for short.\n", "\n", "``` {stata}\n", "su price, d\n", "```\n", "\n", "## 3.3 Using Loops\n", "\n", "As in most programming languages, Stata provides for-style and `while`\n", "loops that let us repeat a set of commands many times. The for-style\n", "loops come in two forms: `forvalues` (which iterates across a range of\n", "numbers) and `foreach` (which iterates across a list of names).\n", "\n", "These loops store the current iteration value in a *local* (i.e. the\n", "iteration label only exists within the loop). 
A `local` in Stata is a\n", "special variable that we create ourselves that temporarily stores\n", "information. We’ll discuss locals in the next module, but consider this\n", "simple example in which the letter “i” is used as a placeholder for the\n", "number 95; it is a `local`.\n", "\n", "For a better understanding of locals and globals, please visit [Module\n", "4](https://comet.arts.ubc.ca/docs/Research/econ490-stata/04_Locals_and_Globals.html).\n", "\n", "``` {stata}\n", "local i = 95\n", "\n", "display `i'\n", "```\n", "\n", "We can also create locals that are strings rather than numeric in type.\n", "Consider this example:\n", "\n", "``` {stata}\n", "local course = \"ECON 490\"\n", "\n", "display \"`course'\"\n", "```\n", "\n", "We can store anything inside a local. When we want to use that\n", "information, we include the local encased in a backtick (\\`) and\n", "apostrophe (').\n", "\n", "``` {stata}\n", "* Define both locals here so this cell runs on its own\n", "local i = 95\n", "local course = \"ECON 490\"\n", "\n", "display \"I am enrolled in `course' and hope my grade will be `i'%!\"\n", "```\n", "\n", "### 3.3.1 Creating loops using `forvalues`\n", "\n", "Whenever we want to iterate across a range of values, we can use the\n", "syntax `forvalues local_var_name = min_value(steps)max_value`, as in\n", "the command below. Here we are iterating from 1 to 10 in increments of\n", "1.\n", "\n", "``` {stata}\n", "forvalues counter=1(1)10{\n", " *Notice that now counter is a local variable\n", " display `counter'\n", "}\n", "```\n", "\n", "Notice that the open brace `{` needs to be on the same line as the\n", "`forvalues` command, with no comments after it. Similarly, the closing\n", "brace `}` needs to be on its own line.\n", "\n", "Experiment below with the command above by changing the increments and\n", "min or max values. 
See what your code outputs.\n", "\n", "``` {stata}\n", "/*\n", "forvalues counter=???(???)???{\n", " display `counter'\n", "}\n", "*/ \n", "```\n", "\n", "### 3.3.2 Creating loops using `foreach`\n", "\n", "Whenever we want to iterate across a list of names, we can use the\n", "`foreach` command. It asks Stata to run the commands inside the braces\n", "once for each item in the list.\n", "\n", "The syntax for `foreach` is similar to that of `forvalues`:\n", "`foreach local_var_name in \"list of variables\"`. Here, we are asking\n", "Stata to perform the `summarize` command on two variables (*mpg* and\n", "*price*):\n", "\n", "``` {stata}\n", "foreach name in \"mpg\" \"price\"{\n", " summarize `name'\n", "}\n", "```\n", "\n", "We can have a list stored in a local variable as well. Here, we are\n", "storing a list, which includes two variable names (*mpg* and *price*) in\n", "a local called *namelist*. Then, using `foreach`, we summarize *name*,\n", "which runs through the list we created above, called *namelist*.\n", "\n", "``` {stata}\n", "local namelist \"mpg price\"\n", "foreach name in `namelist'{\n", " summarize `name'\n", "}\n", "```\n", "\n", "### 3.3.3 Writing loops with conditions using `while`\n", "\n", "Whenever we want to keep iterating for as long as a condition holds, we\n", "can write the command below. The condition here is simply “while counter\n", "is less than 5”.\n", "\n", "``` {stata}\n", "local counter = 1 \n", "while `counter'<5{\n", " display `counter'\n", " local counter = `counter'+1\n", "}\n", "```\n", "\n", "## 3.4 Errors\n", "\n", "A common occurrence while working with Stata is encountering various\n", "errors. Whenever an error occurs, the program will stop executing and an\n", "error message will pop up. Most commonly occurring errors can be\n", "attributed to syntax issues, so we should always verify our code before\n", "execution. 
Below we have provided 3 common errors that may pop up.\n", "\n", "``` {stata}\n", "summarize hello\n", "```\n", "\n", "We must always verify that the variable we use in a command exists and\n", "that it is spelled correctly. Stata alerts us when we try to execute a\n", "command with a nonexistent variable.\n", "\n", "``` {stata}\n", "su price if 5 =< mpg =< 25\n", "```\n", "\n", "In this example, the error is due to the use of invalid conditional\n", "operators. The greater than or equal to operator is written \\>= and\n", "the less than or equal to operator is written \\<=; the symbol `=<`\n", "does not exist. Each comparison must also be written separately and\n", "joined with &, as in `mpg >= 5 & mpg <= 25`.\n", "\n", "``` {stata}\n", "* Define both locals here so this cell runs on its own\n", "local course = \"ECON 490\"\n", "local word = 95\n", "\n", "display \"I am enrolled in `course' and hope my grade will be 'word'%!\" // this is incorrect \n", "\n", "display \"I am enrolled in `course' and hope my grade will be `word'%!\" // this is correct\n", "```\n", "\n", "The number 95 does not display in the first string because the wrong\n", "punctuation marks are used to enclose the local. We make the error of\n", "using two apostrophes instead of a backtick (\\`) and an apostrophe (').\n", "\n", "## 3.5 Wrap Up\n", "\n", "In this module, we looked at the way Stata commands function and how\n", "their syntax works. In general, many Stata commands will follow the\n", "following structure:\n", "\n", "``` stata\n", "name_of_command [varlist] [if] [in] [weight] [, options]\n", "```\n", "\n", "At this point, you should feel more comfortable reading a documentation\n", "file for a Stata command. 
The question that remains is how to find new\n", "commands!\n", "\n", "You are encouraged to search for commands using the command `search`.\n", "For example, if you are interested in running a regression you can\n", "write:\n", "\n", "``` {stata}\n", "search regress \n", "```\n", "\n", "We can see that a new Stata window pops up on our computer, and we can\n", "click on the different options that are shown to look at the\n", "documentation for all these commands. Try it yourself in the code cell\n", "below!\n", "\n", "``` {stata}\n", "```\n", "\n", "In the following modules, whenever there is a command which confuses\n", "you, feel free to write `search command` or `help command` to redirect\n", "to the documentation for reference.\n", "\n", "**Note:** These commands have to be used on your Stata console!\n", "\n", "In the next module, we will expand on our knowledge of locals, as well\n", "as globals, another type of variable.\n", "\n", "## 3.6 Wrap-up Table\n", "\n", "| Command | Function |\n", "|-------------------------------|-----------------------------------------|\n", "| `describe` | Provides the characteristics of our dataset including the number of observations and variables, and variable types |\n", "| `summarize` | Calculates and provides a variety of summary statistics of the general dataset or specific variables |\n", "| `help` | Provides information on each command including its definition, syntax, and the options associated with the command |\n", "| `if-conditions` | Used to verify a condition before executing a command. 
If conditions make use of logical and conditional operators and are preceded by the desired command |\n", "| `sort` | Used to sort the observations of the data set into ascending order |\n", "| `detail` | Option of `summarize` that provides additional statistics, including skewness, kurtosis, the four smallest and four largest values, and various percentiles |\n", "| `display` | Displays strings and values of scalar expressions |\n", "| `search` | Can be used to find useful commands |\n", "| `while` | A type of loop that iterates for as long as a condition holds |\n", "| `forvalues` | A type of for-loop that iterates across a range of numbers |\n", "| `foreach` | A type of for-loop that iterates across a list of items |\n", "\n", "## References\n", "\n", "[PDF documentation in\n", "Stata](https://www.youtube.com/watch?v=zyJ8Wk3rV2c&list=PLN5IskQdgXWnnIVeA_Y0OBGmnw21fvcmU&index=2)\n", "
[Stata Interface tour](https://www.youtube.com/watch?v=FQ1MBQw_MTI)\n", "
[One-way tables of summary\n", "statistics](https://www.youtube.com/watch?v=ug0LihyIzvM)
[Two-way\n", "tables of summary\n", "statistics](https://www.youtube.com/watch?v=u_Efw1oWxWk)" ], "id": "5e1cd4bb-da5f-4224-9c0f-78f8a254708a" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "python3", "display_name": "Python 3 (ipykernel)", "language": "python", "path": "/usr/local/share/jupyter/kernels/python3" }, "language_info": { "name": "python", "codemirror_mode": { "name": "ipython", "version": "3" }, "file_extension": ".py", "mimetype": "text/x-python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } } }