{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 02 - Introduction to R\n", "\n", "COMET Team
*Colby Chambers, Anneke Dresselhuis, Oliver Xu, Colin\n", "Grimes, Jonathan Graves* \n", "2023-01-12\n", "\n", "## Outline\n", "\n", "### Prerequisites:\n", "\n", "- Introduction to Jupyter\n", "\n", "### Learning Objectives\n", "\n", "- Understand variables, functions and objects in R\n", "- Import and load data into Jupyter Notebook\n", "- Access and perform manipulations on data" ], "id": "22da7b05-f61a-4a3d-b8a7-9da77ce5234d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run this cell\n", "\n", "source(\"intro_to_r_tests.r\")" ], "id": "33ca1337-3693-44d4-af7b-7f5a8dedca14" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Using R\n", "\n", "In this notebook, we will be introducing **R**, which is a programming\n", "language that is particularly well-suited for statistics, econometrics,\n", "and data science. If you are familiar with other programming languages,\n", "such as Python, this will likely be very familiar - if this is your\n", "first time, don’t be intimidated! Try to play around with the examples\n", "and exercises as you work through this notebook; it’s easiest to learn R\n", "(or any programming language) by trying things for yourself.\n", "\n", "# Basic Data Types\n", "\n", "To begin, it’s important to get a good grasp of the different **data\n", "types** in R and how to use them. Whenever we work with R, we will be\n", "manipulating different kinds of information, which is referred to as\n", "“data”. Data comes in many different forms, and these forms define how\n", "we can use it in calculations or visualizations - these are called\n", "*types* in R.\n", "\n", "R has 6 basic data types. Data types are used to store information about\n", "a variable or object in R:\n", "\n", "1. **Character**: data in text format, like “word” or “abc”\n", "2. **Numeric** (real or decimal): data in real number format, like 6,\n", " or 18.8 (referred to as **Double** in R)\n", "3. 
**Integer**: data in a whole number (integer) format, like 2L (the L\n", " tells R to store this as an integer)\n", "4. **Logical**: truth values, like TRUE or FALSE\n", "5. **Complex**: data in complex (i.e. imaginary) format, like 1+6i\n", " (where $i$ is the $\\sqrt{-1}$)\n", "6. **Raw**: raw digital data, an unusual type which we will not cover\n", " here\n", "\n", "If we are ever wondering what type an object in R is, or what its\n", "properties are, we can use the following two functions, which allow us\n", "to examine the data type and elements contained within an object:\n", "\n", "- `typeof()`: this function returns a character string that\n", " corresponds to the data type of an object\n", "- `str()`: this function displays a compact internal structure of an R\n", " object\n", "\n", "We will see some examples of these in just a moment.\n", "\n", "# Data Structures\n", "\n", "We often need to store more than a single value. Beyond the basic data\n", "types, R provides **data structures**, which are tools for holding\n", "multiple values. They can be complicated, but some of them are very\n", "important and are worth discussing here.\n", "\n", "- **Vectors**: a vector of values, like $(1,3,5,7)$\n", "- **Matrices**: a matrix of values, like $[1,2; 3,4]$ (usually\n", " displayed as a rectangular grid)\n", "- **Lists**: a list of elements, like $($pet = “cat”, “dog”, “mouse”$)$,\n", " with named properties\n", "- **Dataframe**: a collection of vectors or lists, organized into rows\n", " and columns according to observations\n", "\n", "Note that vectors don’t need to be numeric! 
There are some useful\n", "built-in functions to create data structures (we don’t have to create\n", "our own functions to do so).\n", "\n", "- `c()`: this function combines values into a vector\n", "- `matrix()`: this function creates a matrix from a given set of\n", " values\n", "- `list()`: this function creates a list from a given set of values\n", "- `data.frame()`: this function creates a data frame from a given set\n", " of lists or vectors\n", "\n", "Okay, enough background - let’s see this in action!\n", "\n", "## Working with Vectors\n", "\n", "Vectors are important. We can create them from values or other elements,\n", "using the `c()` function:" ], "id": "dfbc9148-c124-4020-a305-0268374e8d21" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generates a vector containing numeric values\n", "z <- c(1, 2, 3)\n", "\n", "# generates a vector containing characters\n", "countries <- c(\"Canada\", \"Japan\", \"United Kingdom\")" ], "id": "10b12444-2f6e-4c32-84d4-bf4bf493204f" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access the elements of the vector. Since a vector is made of\n", "basic data, we can get those elements using the `[ ]` index notation.\n", "This is very similar to how in mathematical notation we refer to\n", "elements of a vector.\n", "\n", "> ***Note***: if you’re familiar with other programming languages, it’s\n", "> important to note that R is 1-indexed. So, the first element of a\n", "> vector has index 1, not 0. Keep this in mind!"
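, "\n", "\n", "As a minimal sketch of what 1-indexing means in practice (the vector `v` is\n", "a hypothetical example, not an object used elsewhere in this notebook):\n", "\n", "```r\n", "v <- c(10, 20, 30) # hypothetical example vector\n", "\n", "v[1] # first element: 10\n", "v[3] # last element: 30\n", "v[0] # index 0 selects nothing, so this returns an empty vector\n", "length(v) # number of elements: 3\n", "```"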
 ], "id": "025e2e23-f9f0-4da3-b076-ec5369c37921" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If we want to access specific parts of the vector:\n", "\n", "# the 2nd component of \"z\"\n", "z[2]\n", "\n", "# the 2nd component of \"countries\"\n", "countries[2]" ], "id": "d5d5aa77-01b2-4b39-9005-781fa00858f2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned above, we can use the `typeof()` and `str()` functions to\n", "glimpse the kind of data stored in our objects. Run the cell below to\n", "see how this works:" ], "id": "bd10fda4-dfaa-408d-874d-c78348b652bf" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# view the data type of countries\n", "typeof(countries)\n", "\n", "# view the data structure of countries\n", "str(countries)\n", "\n", "# view the data type of z\n", "typeof(z)\n", "\n", "# view the data structure of z\n", "str(z)" ], "id": "d0fc95c7-c169-4e1d-b3c7-6b074ac2c08b" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of `str(countries)` begins by indicating that the\n", "contained data is of a character (chr) type. The `[1:3]` that follows\n", "shows that the vector’s elements are indexed from 1 to 3 (one position\n", "for each of the 3 countries), and the values themselves come last.\n", "\n", "## Check Your Knowledge! Vectors\n", "\n", "Let’s see if you understand how to create new vectors! In the block\n", "below:\n", "\n", "1. Create an object named `my_vector` which is a vector and which\n", " contains the numbers from 10 to 15.\n", "2. Extract the 4th element of `my_vector` and store it in the object\n", " `answer1`" ], "id": "385cffee-81dd-4d6f-807e-a56b802c8164" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_vector <- c(...) # replace ... with the appropriate code\n", "\n", "answer1 <- my_vector[...] # replace ... 
with the appropriate code\n", "\n", "test_1()" ], "id": "f0e3163d-10bc-411c-a503-f6f3965e9ba0" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Matrices\n", "\n", "Just like vectors, we can also create matrices; you can think of them as\n", "organized collections of rows (or columns), which are vectors. They’re a\n", "little bit more complicated to create manually, since you need to use a\n", "more complex function.\n", "\n", "The simplest way to make a matrix is to provide a vector of all the\n", "values you are interested in including, and then tell R how the matrix\n", "is organized. R will then fill in the values:" ], "id": "290ca6b4-e3eb-4e62-8781-e2fb4126de9a" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generates a 2 x 3 matrix\n", "m <- matrix(c(2,3,6,7,7,3), nrow=2,ncol=3)\n", "\n", "print(m)" ], "id": "ebfd7511-b4db-48c1-b7e1-6ad2a41d1fc6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Take note of the order in which the values are filled in; it might be\n", "unexpected!\n", "\n", "Just like with vectors, we can also access parts of the matrix. If you\n", "look at the cell output above, you will see some notation like `[1,]` or\n", "`[,2]`. These are the *rows* and *columns* of the matrix. We can refer\n", "to them using this notation. We can also refer to elements using\n", "`[1,2]`. Again, this is very similar to the mathematical notation for\n", "matrices."
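, "\n", "\n", "On the fill order noted earlier: by default, `matrix()` fills values down\n", "each column. A minimal sketch, assuming we want row-by-row filling instead,\n", "uses the `byrow` argument of `matrix()` (the names `m_by_col` and\n", "`m_by_row` are hypothetical and not used elsewhere in this notebook):\n", "\n", "```r\n", "# default: values fill down each column first\n", "m_by_col <- matrix(c(2, 3, 6, 7, 7, 3), nrow = 2, ncol = 3)\n", "\n", "# byrow = TRUE: values fill across each row first\n", "m_by_row <- matrix(c(2, 3, 6, 7, 7, 3), nrow = 2, ncol = 3, byrow = TRUE)\n", "\n", "m_by_col[1, 2] # row 1, column 2 is 6\n", "m_by_row[1, 2] # row 1, column 2 is 3\n", "```"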
 ], "id": "29ea7f2e-1b41-444f-9709-9450204a6f5f" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If we want to access specific parts of the matrix:\n", "\n", "# 2nd column of matrix\n", "m[,2] \n", "\n", "# 1st row of matrix\n", "m[1,] \n", "\n", "# Element in row 1, column 2\n", "\n", "m[1,2]" ], "id": "3e004294-f91a-4e85-b669-a686457f5f56" }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with vectors, we can also observe and inspect the data structures of\n", "matrices using the helper functions introduced above." ], "id": "c75bbe9b-11fa-4326-ad13-d46e8d7aa758" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# what type is m?\n", "\n", "typeof(m)\n", "\n", "# glimpse data structure of m\n", "str(m)" ], "id": "f80a42f8-ea9d-46d7-b26d-0688080ddcff" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output of `str(m)` begins by displaying that the data in the matrix\n", "is of a numeric (num) type. The `[1:2, 1:3]` shows the structure of the\n", "rows and columns (2 rows and 3 columns). The final part displays the\n", "values in the matrix.\n", "\n", "## Test Your Knowledge! Matrices\n", "\n", "In this exercise:\n", "\n", "1. Create an object named `mat` which is a matrix with 2 rows and 2\n", " columns. The first column will take on values 1,2, while the second\n", " column will take on values 3,4.\n", "2. Extract the value in the first row, second column from `mat` and\n", " store it in the object `answer2`" ], "id": "d50da7a2-4fe6-46ed-81af-b9627554b7ef" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "mat <- matrix(..., nrow=...,ncol=...) # fill in the missing code\n", "answer2 <- mat[...] 
# fill in the missing code\n", "\n", "test_2()" ], "id": "9f8728db-bcfe-40dc-96e9-d4c3181e6da8" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Lists\n", "\n", "Lists are a little bit more complex because they can store many\n", "different data types and objects, each of which can be given *names*\n", "which are specific ways to refer to these objects. Names can be any\n", "useful descriptive term for an element of the list. You can think of\n", "lists like flexible vectors with names." ], "id": "843cd405-45bc-46d3-ac9e-5aabe6cc64c9" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generates a list with 3 components named \"text\" \"a_vector\" and \"a_matrix\"\n", "my_list <- list(text=\"test\", a_vector = z, a_matrix = m) " ], "id": "3f083db8-cf75-4afd-b325-4a02b20198a2" }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access elements of the list using the `[ ]` or `[[ ]]`\n", "operations. There is a difference:\n", "\n", "- `[ ]` accesses the *elements of the list* which is the name and\n", " object\n", "- `[[ ]]` accesses the *object* directly\n", "\n", "We usually want to use `[[ ]]` when working with data stored in lists.\n", "One very nice feature is that you can refer to elements of a list by\n", "number (like a vector) or by their name." 
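, "\n", "\n", "The difference between the two operators can be sketched as follows,\n", "assuming a small hypothetical list called `pets` (it is not used elsewhere\n", "in this notebook):\n", "\n", "```r\n", "pets <- list(name = 'Rex', ages = c(3, 5)) # hypothetical example list\n", "\n", "typeof(pets[1]) # 'list': a one-element list holding the name and object\n", "typeof(pets[[1]]) # 'character': the object itself\n", "pets[['ages']][2] # second value of the ages vector: 5\n", "```"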
 ], "id": "6108e122-8ca8-48f3-a6c9-ac968de23352" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If we want to access specific parts of the list:\n", "\n", "# 1st component in list\n", "my_list[[1]] \n", "\n", "# 1st component in list by name (text)\n", "my_list[[\"text\"]]\n", "\n", "# 1st part of the list (note the brackets)\n", "my_list[1] \n", "\n", "# glimpse data type of my_list\n", "typeof(my_list)" ], "id": "88cc6b09-a3fd-4efd-ba9f-864a6436b649" }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is one final way to access elements of a list by name: using the\n", "`$` or **access** operator. This works basically like `[[name]]` but is\n", "more transparent when writing code. You put the object you want to\n", "access, followed by the operator, followed by the property:" ], "id": "bc128380-e868-4135-bd18-157de2952692" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# access the named property \"text\"\n", "my_list$text\n", "\n", "# access the named property \"a_matrix\"\n", "my_list$a_matrix" ], "id": "495c3c9e-aa1e-46cc-883f-46ec54778de6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "You will notice that this *only* works for named objects - which is\n", "particularly convenient for data frames, which we will discuss next.\n", "\n", "## Test Your Knowledge! Lists\n", "\n", "In this exercise, you will need to:\n", "\n", "1. Create an object named `a_list`, which is a list with two\n", " components: an element called `String` which stores the character\n", " string “Hello World”, and an element called `Range` which contains a\n", " vector with values 1 through 5.\n", "2. Extract the value of the second element, and store it in the object\n", " `answer3`" ], "id": "b3679df1-55f8-4c9f-ad95-47e795698be6" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a_list <- list(... 
= \"Hello World\", Range = ...) # fill in the missing code\n", "\n", "answer3 <- a_list... # fill in the missing code\n", "\n", "test_3()" ], "id": "43767e1b-fcc2-4f56-850c-349cc4c22fca" }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Dataframes\n", "\n", "Dataframes are the most complex objects you will work with in this course\n", "but also the most important. They represent data - like the kind of data\n", "we would use in econometrics. In this course, we will primarily focus on\n", "**tidy data**, which refers to data in which the columns represent\n", "variables, and the rows represent observations. In terms of R, you can\n", "think of dataframes as a combination of a matrix and a list.\n", "\n", "We can access columns (variables) using their names, or their ordering:" ], "id": "5228e918-2507-49cb-b31f-0f9175bb6218" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# generates a dataframe with 2 columns and 3 rows\n", "df <- data.frame(ID=c(1:3),\n", " Country=countries)\n", "\n", "# If we want to access specific parts of the dataframe:\n", "\n", "# 2nd column in dataframe\n", "df[2] \n", "\n", "# column named \"Country\" (returned as a vector)\n", "df$Country\n", "\n", "# glimpse compact data structure of df\n", "str(df)" ], "id": "b09b8e85-556d-4fdb-ae7c-00b01f2c193c" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the `str(df)` command shows us what the names of the columns\n", "are in this dataset and how we can access them.\n", "\n", "## Test Your Knowledge: Dataframes\n", "\n", "In this exercise:\n", "\n", "1. Create an object `my_dataframe` which is a dataframe with two\n", " variables and two observations. The first column `var1` will take on\n", " values `c(1,2)`. The second column `var2` will take on values\n", " `c(\"A\", \"B\")`.\n", "2. 
Extract the column `var1` and store it in the object `answer4`" ], "id": "2f5451fb-163b-4274-b7c0-8c0c14b26537" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_dataframe <- data.frame(var1=..., ...=c(\"A\",\"B\")) # fill in the missing code\n", "answer4 <- ... # fill in the missing code\n", "\n", "\n", "test_4()" ], "id": "d24ffa32-6dc0-4787-ad5e-0e0b69d855ee" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Objects and Variables\n", "\n", "At this point, you are familiar with some of the different types of data\n", "in R and how they work. However, let’s understand how we can work with\n", "these data types in more detail by writing R code. A **variable** or\n", "**object** is a name assigned to a memory location in the R workspace\n", "(working memory). For now we can use the terms variable and object\n", "interchangeably. An object will always have an associated type,\n", "determined by the information assigned to it. Clear and concise object\n", "assignment is essential for **reproducible data analysis**, as mentioned\n", "in the module *Intro to Jupyter*.\n", "\n", "When it comes to code, we can assign information (stored in a specific\n", "data type) to variables and objects using the **assignment operator**\n", "`<-`. Using the assignment operator, the information on the right-hand\n", "side is assigned to the variable/object on the left-hand side; we’ve\n", "seen this before, in some of the examples earlier.\n", "\n", "In the example \\[2\\] below, `\"Hello\"` has been assigned to the object\n", "`var_1`. `\"Hello\"` will be stored in the R workspace as an object named\n", "`\"var_1\"`.\n", "\n", "> **Important Note**: R is case sensitive. When referring to an object,\n", "> it must *exactly* match the assignment. 
`Var_1` is not the same as\n", "> `var_1` or `var1`" ], "id": "2521b1f2-e8d1-455e-a0e0-9caa64e6730c" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var_1 <- \"Hello\"\n", "\n", "var_1\n", "\n", "typeof(var_1)" ], "id": "f8ea3e43-2f97-4a27-9dc0-7af903484c09" }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create variables of many different types, including all of the\n", "basic and advanced types we discussed above." ], "id": "de786923-974d-472b-a0e9-45415d29ee03" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "var_2 <- 34.5 # numeric/double\n", "var_3 <- 6L # integer\n", "var_4 <- TRUE # logical/boolean\n", "var_5 <- 1 + 3i # complex" ], "id": "03b658f2-fc9b-4e25-8e51-20c94bf1507a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Operations\n", "\n", "In R, we can also perform **operations** on objects; the type of an\n", "object defines what operations are valid. All of the basic mathematical\n", "and logical operations you are familiar with are examples of these, but\n", "there are many more. For example:" ], "id": "c80c3739-6175-4871-a089-1b634dbf9fcc" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a <- 4 # creates an object named \"a\" assigned to the value: 4\n", "b <- 6 # creates an object named \"b\" assigned to the value: 6\n", "c <- a + b # creates an object \"c\" assigned to the value (a = 4) + (b = 6)" ], "id": "b450a536-8eaf-4e51-bcdc-60408e0b65d6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Try and think about what value `c` holds!\n", "\n", "We can view the assigned value of `c` in two different ways:\n", "\n", "1. By printing `a + b`\n", "2. By printing `c`\n", "\n", "Run the code cell below to see for yourself!"
 ], "id": "3430c2bd-c06b-4e66-8d4b-dc372e7bc7af" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a + b\n", "c" ], "id": "a3febf23-1bfb-4e34-9c10-01c00765eab6" }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to change the value of our objects. In the example\n", "below, the object `b` has been reassigned the value 5." ], "id": "2ad5d022-94c9-4b5b-929b-c1c65cfa3378" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b <- 5 " ], "id": "c3eee858-413d-4595-a6f2-e5e58dcad33e" }, { "cell_type": "markdown", "metadata": {}, "source": [ "R will now store the updated value of 5 in the object `b`. This\n", "overrides the original assignment of 6 to `b`. Note that `c` is not\n", "updated automatically: it keeps the value computed from the old `b`\n", "until we run `c <- a + b` again. The ability to change the value of an\n", "object is a key benefit of using variables in R. We can simply\n", "reassign the value to a variable without having to change that value\n", "everywhere in our code. This will be quite useful when we want to do\n", "things such as change the name of a column in a dataset.\n", "\n", "> ***Tip:*** Remember to use a unique object name that hasn’t been used\n", "> before in order to avoid unplanned object reassignments when creating\n", "> a new object. The more descriptive, the better!\n", "\n", "## Test Your Knowledge! Basic Operations\n", "\n", "In this exercise:\n", "\n", "1. Create an object `u` which is equal to 1\n", "2. Create an object `y` which is equal to 7\n", "3. Create an object `w` which is equal to 10\n", "4. Create an object `answer6` which is equal to the sum of `u` and `y`,\n", " divided by `w`" ], "id": "96223dbb-2ed4-449f-b76f-1b2be21cb5fd" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "... <- 1 # fill in the missing code\n", "y <- ...\n", "... 
<- ..\n", "\n", "answer6 <- ...\n", "\n", "\n", "test_6()" ], "id": "9d637e56-8dae-471a-957a-3c5121911a28" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Comments\n", "\n", "While developing our code, we do not always have to use markdown cells\n", "to document our process. We can also write notes in code cells using\n", "something called a **comment**. A comment simply allows us to write\n", "lines in our code cell which will not run when we run the cell itself.\n", "When we type the `#` sign, anything written after it on the same line\n", "will not run; it is a comment. To comment out\n", "multiple lines of code, simply include the `#` sign at the start of each\n", "line.\n", "\n", "> In general, the purpose of comments is to make the source code easier\n", "> for readers to understand. Remember the concept of **reproducibility**\n", "> from our last notebook?\n", "\n", "It is important to comment on our code for three main reasons:\n", "\n", "1. It allows us to **keep track of our actions and thought process**:\n", " Commenting is a great way to help us stay organized. Code comments\n", " provide an ordered process for everyone to follow. In case we need\n", " to debug our code, we can easily track which step is problematic\n", " and come back to that particular line of code that may be the source\n", " of the problem.\n", "\n", "2. It **helps readers understand** why we’re coding in a particular\n", " way: While coding something like `a + b` may be a more or less\n", " straightforward computation, our reader may not be able to\n", " understand what `a` or `b` are referring to, or why they need to be\n", " added to each other. Our readers or other developers may ask: why is\n", " addition used instead of multiplication or division? With comments,\n", " we can explain why this particular method was used for this\n", " particular code block and how it relates to other code blocks.\n", "\n", "3. 
It **saves everyone’s time** in the future, including yourself: It’s\n", " far easier than you might expect to forget what a piece of code\n", " does, or is supposed to do. Keeping good comments ensures that your\n", " code remains comprehensible.\n", "\n", "> **Tip**: an old woodworker’s tip is to always label something when\n", "> taking it apart so that a stranger could put the pieces back together.\n", "> The same advice applies to comments and coding: write code so that a\n", "> stranger could figure out what it is supposed to do.\n", "\n", "Generally, it is always a good idea to add comments to our code.\n", "However, if we find ourselves needing to explain an important block of\n", "code using lines upon lines of comments, it is preferable to use a\n", "markdown cell instead to give ourselves more room. Comments are best\n", "reserved for the reasons above.\n", "\n", "# More on Operators\n", "\n", "Earlier, we discussed operations and used the example of `+` to run\n", "the addition of `a` and `b`. `+` is an R arithmetic **operator**,\n", "that is, a symbol that tells R to perform a specific operation. We\n", "can use different R operators with variables. R has 4 types of\n", "operators:\n", "\n", "1. **Arithmetic operators**: used to carry out mathematical operations.\n", " Ex. `*` for multiplication, `/` for division, `^` for exponent etc.\n", "2. **Assignment operators**: used to assign values to variables. Ex.\n", " `<-`\n", "3. **Relational operators**: used to compare between values. Ex. `>`\n", " for greater than, `==` for equal to, `!=` for not equal to etc.\n", "4. **Logical operators**: used to carry out Boolean operations. Ex. 
`!`\n", " for Logical NOT, `&` for Logical AND etc.\n", "\n", "We won’t cover all of these right now, but you can look them up online;\n", "for now, keep an eye out for them when they occur.\n", "\n", "# Functions\n", "\n", "These simple operations are great to start with, but what if we want to\n", "do operations on different values of X and Y over and over and don’t\n", "want to constantly rewrite this code? This is where **functions** come\n", "in. Functions allow us to carry out specific tasks. We simply pass in a\n", "parameter or parameters to the function. Code is then executed in the\n", "function body based on these parameters, and output may be returned." ], "id": "e4c5138e-57cb-4489-b06c-e939c10d5d9e" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Functionname <- function(arguments)\n", "# {code operating on the arguments\n", "# }" ], "id": "4041e84c-c9f2-435c-9d10-4d6a9e670afc" }, { "cell_type": "markdown", "metadata": {}, "source": [ "This structure says that we start with a name for our function\n", "(`Functionname`, here) and we use the assignment operator similarly to\n", "when we assign values to variables. We then pass **arguments or\n", "parameters** to our function (which can be numeric, characters, vectors,\n", "collections such as lists, etc.); think of them as the *inputs* to the\n", "function.\n", "\n", "Finally, within the curly brackets we write our code needed to\n", "accomplish our desired task. Once we have done this, we can call this\n", "function anywhere in our code (after having run the cell defining the\n", "function!) and evaluate it based on specific parameter values.\n", "\n", "An example is shown below; can you figure out what this function does?" 
], "id": "7a305fe1-4273-4e35-a46d-a53b6b3a4c0d" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_function <- function(x, y)\n", " {x = x + y\n", " 2 * x\n", "}" ], "id": "aaa3ef58-4f90-4fee-873e-8c5a141c3def" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The parameters placed into functions can be given **defaults**. Defaults\n", "are specific values for parameters that have been chosen and defined\n", "within the circular brackets of the function definition. For example, we\n", "can define `y = 3` as a default in our `my_function`. When we call our\n", "function, we then do not have to specify an input for `y` unless we want\n", "to." ], "id": "30068b24-5e8f-4827-aa53-638377be9bc7" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_function <- function(x, y = 3)\n", " {x = x + y\n", " 2 * x}\n", "\n", "my_function(2)" ], "id": "f81e12b5-3ad6-4cde-a9b0-81e7699ffc5a" }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, if we want to override this default, we can simply call the\n", "function with a new input for `y`. This is done below for `y=4`,\n", "allowing us to execute our code as though our default was actually\n", "`y=4`." ], "id": "1eaffd4b-dd9b-4614-9ebf-a25620872bd8" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_function <- function(x, y = 3)\n", " {x = x + y\n", " 2 * x}\n", "\n", "my_function(2, 4)" ], "id": "cf6c8a32-892c-4e53-aeec-d34f50100880" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, note that we can **nest** functions within functions, meaning\n", "we can call functions inside of other functions - creating very complex\n", "arrangements. Just be sure that these inner functions have themselves\n", "already been defined." 
], "id": "44eb4614-b082-4e0a-9ff6-b24b80ae26fc" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_function_1 <- function(x, y)\n", " {x = x + y + 2\n", " 2 * x}\n", "\n", "my_function_2 <- function(x, y)\n", " {x = x + y - my_function_1(x, y)\n", " 2 * x}\n", "\n", "my_function_2(2, 3)" ], "id": "975099c6-efa3-4a4a-9884-581ae418fc6e" }, { "cell_type": "markdown", "metadata": {}, "source": [ "Luckily, we usually don’t have to define our own functions, since most\n", "useful built-in functions we need already come with R - although we may\n", "need to import specific packages to access them. We can always use the\n", "help `?` feature in R to learn more about a built-in function if we’re\n", "unsure. For example, `?max` gives us more information about the `max()`\n", "function.\n", "\n", "For more information about how you can read and use different functions,\n", "please refer to the [Function Cheat\n", "Sheet](https://cran.r-project.org/doc/contrib/Short-refcard.pdf).\n", "\n", "## Test Your Knowledge! Functions\n", "\n", "In this exercise:\n", "\n", "1. Create a function `divide` which takes in two arguments, `x` and\n", " `y`. The function should return `x` divided by `y`.\n", "2. Store the solution to `divide(5,3)` in the object `answer7`:" ], "id": "7ac9e3ac-5b52-4412-8800-a9b0df17ee4b" }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "divide <- function(x,y) {\n", " ...\n", " }\n", "\n", "# Your code goes here\n", "\n", "answer7 <- ...(5,3)\n", "\n", "\n", "test_7()" ], "id": "c1afc136-45b9-4f25-a3a5-b4d4188395ee" }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Oops! Dealing with Errors\n", "\n", "Sometimes in our analysis we can run into errors in our code. This\n", "happens to everyone - don’t worry - it’s not a reason to panic.\n", "Understanding the nature of the error we are confronted with can be a\n", "helpful first step to finding a solution. 
There are two common types of\n", "errors:\n", "\n", "- **Syntax errors**: This is the most common error type. These errors\n", " result from invalid code statements/structures that R doesn’t\n", " understand. Suppose R speaks English; asking it to help by speaking\n", " German or broken English certainly would not work! Here are some\n", " examples of common syntax errors: the associated package is not\n", " loaded, misspelling of a command as R is case-sensitive,\n", " unmatched/incomplete parentheses etc. How we handle syntax errors is\n", " case-by-case: we can usually solve syntax errors by reading the\n", " error message and finding what is often a typo or by looking up the\n", " error message on the internet using resources like Stack Overflow.\n", "\n", "- **Semantic errors**: These errors result from valid code that\n", " successfully executes but produces unintended outcomes. Again, let\n", " us suppose R speaks English. Although we asked it to hand us an\n", " apple in English and R successfully understood, it somehow handed us\n", " a banana! This is not okay! How we handle semantic errors is also\n", " case-by-case - because the code runs, there is usually no error\n", " message to read, so we typically find semantic errors by checking\n", " intermediate results against what we expect and reviewing the logic\n", " of our code.\n", "\n", "Now that we have all of these terms and tools at our disposal, we can\n", "begin to load in data and operate on it using what we’ve learned.\n", "\n", "# Wrapping Up\n", "\n", "In this notebook, we have learned the different ways data can be stored\n", "and structured in our R memory. We have also learned how to manipulate,\n", "extract and operate on data from different structures. Finally, we have\n", "learned how to write a function that can perform operations more\n", "efficiently." ], "id": "9abc2de2-5333-4d3d-acad-6caf951d9422" } ], "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "name": "ir", "display_name": "R", "language": "r" } } }