School of Economics and Management
Beihang University
http://yanfei.site

Objectives

• Overview of R
• R nuts and bolts
• Getting data in and out of R
• Subsetting R objects

What is R?

• A freely available language and environment
• Statistical computing and graphics
• Linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.

Installation

Why RStudio?

• Syntax highlighting
• Able to evaluate R code by line, by selection, or as an entire file
• Command auto-completion

Design of the R System

• When you download R from CRAN, you get the "base" system - a substantial amount of functionality.
• CRAN hosts more than 10,000 packages developed by users and programmers around the world.

• People often make packages available on their personal websites.
• There are a number of packages being developed on repositories like GitHub and BitBucket.

1 + 2 + 3
## [1] 6
1 + 2 * 3
## [1] 7

x <- 1
y <- 2
z <- c(x, y)
z
## [1] 1 2

exp(1)
## [1] 2.718282
cos(3.141593)
## [1] -1
log2(1)
## [1] 0

R Objects

R has five basic classes of objects:

1. character
2. numeric (real numbers)
3. integer
4. complex
5. logical (TRUE/FALSE)

Numbers

• Numbers in R are generally treated as numeric objects.
• What is the difference between 1 and 1L? The L suffix explicitly requests an integer.
• Inf is a special number that represents infinity. Try 1/Inf.
• NaN: an undefined value (not a number). Try 0/0. It can also be thought of as a missing value.
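
The points above can be checked directly at the console. The following is a minimal sketch using only base R; the comments describe what each call is expected to return.

```r
# 1 is stored as a double (class "numeric"); the L suffix requests an integer.
class(1)          # "numeric"
class(1L)         # "integer"
identical(1, 1L)  # FALSE: same value, different storage type
1 == 1L           # TRUE: the values compare equal

# Inf behaves like a number in arithmetic.
1 / 0             # Inf
1 / Inf           # 0

# 0/0 is undefined, so R returns NaN.
0 / 0             # NaN
is.nan(0 / 0)     # TRUE
is.na(0 / 0)      # TRUE: NaN also counts as missing
is.nan(NA)        # FALSE: NA is missing but is not NaN
```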

Attributes

Attributes can be accessed by attributes(). Some examples of R object attributes are:

• names, dimnames
• dimensions (e.g. matrices, arrays)
• class (e.g. integer, numeric)
• length
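
As a small illustration, assuming nothing beyond base R, the sketch below shows that a plain vector starts with no attributes set and that assigning names adds one. Length and class are reported by their own accessor functions.

```r
x <- 1:3
attributes(x)                 # NULL: a plain vector has no attributes set

names(x) <- c("a", "b", "c")  # setting names adds a "names" attribute
attributes(x)

# length and class have their own accessor functions
length(x)                     # 3
class(x)                      # "integer"

# attr() reads a single attribute by name
attr(x, "names")
```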

Vectors

The c() function can be used to create vectors of objects by concatenating things together.

x <- c(0.5, 0.6)  ## numeric
x <- c(TRUE, FALSE)  ## logical
x <- c(T, F)  ## logical
x <- c("a", "b", "c")  ## character
x <- 9:29  ## integer
x <- c(1 + (0+0i), 2 + (0+4i))  ## complex

You can also use the vector() function to initialize vectors.

x <- vector("numeric", length = 10)
x
##  [1] 0 0 0 0 0 0 0 0 0 0

Matrices

m <- matrix(c(1:6), 2, 3)
attributes(m)
## $dim
## [1] 2 3
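
A matrix is stored as a vector with a dim attribute, and its values are filled column by column. The following sketch, which uses base R only, illustrates both points; the vector v is introduced here purely for illustration.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)    # 2 3
m[1, ]    # first row: 1 3 5, because values fill down the columns first

# Adding a dim attribute to a plain vector produces the same matrix.
v <- 1:6
dim(v) <- c(2, 3)
identical(m, v)   # TRUE
```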

Factors

Factors are used to represent categorical data.

f <- factor(c("yes", "yes", "no", "yes", "no"))
attributes(f)
## $levels
## [1] "no"  "yes"
##
## $class
## [1] "factor"
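
To make the structure of a factor more concrete, here is a short sketch using the same data. The second factor, g, is an illustrative addition that shows how the levels argument controls level order.

```r
f <- factor(c("yes", "yes", "no", "yes", "no"))
levels(f)       # "no" "yes": levels are sorted alphabetically by default
table(f)        # counts per level: no = 2, yes = 3
as.integer(f)   # 2 2 1 2 1: the underlying integer codes

# The levels argument sets the order explicitly.
g <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
levels(g)       # "yes" "no"
```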

Data Frames

• A special type of list.
• Unlike matrices, data frames can store a different class of object in each column.
• They have column names and row names.

d <- data.frame(x = 1:10, y = letters[1:10])
attributes(d)
## $names
## [1] "x" "y"
##
## $class
## [1] "data.frame"
##
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10
names(d)
## [1] "x" "y"
row.names(d)
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10"

Names

Names are very useful for writing readable code and self-describing objects.

x <- 1:3
names(x)
## NULL
names(x) <- c("New York", "Seattle", "Los Angeles")
x
##    New York     Seattle Los Angeles
##           1           2           3
names(x)
## [1] "New York"    "Seattle"     "Los Angeles"

Lists can also have names, which is often very useful. A name that contains a space must be quoted.

x <- list("Los Angeles" = 1, Boston = 2, London = 3)
x
## $`Los Angeles`
## [1] 1
##
## $Boston
## [1] 2
##
## $London
## [1] 3
names(x)
## [1] "Los Angeles" "Boston"      "London"

Reading and Writing Data

There are a few principal functions for reading data into R.

• read.table, read.csv, for reading tabular data
• readLines, for reading lines of a text file
• source, for reading in R code files (inverse of dump)
• dget, for reading in R code files (inverse of dput)
• load, for reading in saved workspaces

There are analogous functions for writing data to files.

• write.table, for writing tabular data to text files (e.g. CSV) or connections
• writeLines, for writing character data line-by-line to a file or connection
• dump, for dumping a textual representation of multiple R objects
• dput, for outputting a textual representation of an R object
• save, for saving an arbitrary number of R objects in binary format (possibly compressed) to a file

There are many R packages that have been developed to read in all kinds of other datasets (e.g., the readr package).
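
The pairing of reading and writing functions can be sketched as a round trip. The example below is only an illustration: the data frame d and the temporary file names are made up for this purpose, and the files are created with tempfile() so nothing is written to the working directory.

```r
d <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Tabular text: write.table() and read.table()
csv_file <- tempfile(fileext = ".csv")
write.table(d, csv_file, sep = ",", row.names = FALSE)
d1 <- read.table(csv_file, header = TRUE, sep = ",")

# Textual R representation of one object: dput() and dget()
r_file <- tempfile(fileext = ".R")
dput(d, file = r_file)
d2 <- dget(r_file)

# Binary format for one or more objects: save() and load()
rda_file <- tempfile(fileext = ".rda")
save(d, file = rda_file)
rm(d)
load(rda_file)   # restores the object under its original name, d

all.equal(d, d2) # compare the round-tripped copy with the original
```

Note that load() restores objects under the names they had when saved, whereas dget() and read.table() return a value that you assign yourself.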

How to Subset?

There are three operators that can be used to extract subsets of R objects.

• The [ operator always returns an object of the same class as the original. It can be used to select multiple elements of an object.

• The [[ operator is used to extract elements of a list or a data frame. It can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame.

• The $ operator is used to extract elements of a list or data frame by literal name. Its semantics are similar to those of [[.

Subsetting a Vector

Vectors are basic objects in R and they can be subsetted using the [ operator.

x <- c("a", "b", "c", "c", "d", "a")
x[1]  ## Extract the first element
## [1] "a"
x[2]  ## Extract the second element
## [1] "b"

The [ operator can be used to extract multiple elements of a vector by passing the operator an integer sequence. Here we extract the first four elements of the vector.

x[1:4]
## [1] "a" "b" "c" "c"
x[c(1, 3, 4)]
## [1] "a" "c" "c"
x[x > 2]
## [1] "a" "b" "c" "c" "d" "a"

Subsetting a Matrix

Matrices can be subsetted in the usual way with (i,j) type indices.

x <- matrix(1:6, 2, 3)
x
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

We can access the $$(1,2)$$ or the $$(2,1)$$ element of this matrix using the appropriate indices.

x[1, 2]
## [1] 3
x[2, 1]
## [1] 2

Indices can also be missing. This behavior is used to access entire rows or columns of a matrix.

x[1, ]  ## Extract the first row
## [1] 1 3 5
x[, 2]  ## Extract the second column
## [1] 3 4

Subsetting Lists

Lists in R can be subsetted using all three of the operators mentioned above, and all three are used for different purposes.

x <- list(foo = 1:4, bar = 0.6)
x
## $foo
## [1] 1 2 3 4
##
## $bar
## [1] 0.6

The [[ operator can be used to extract single elements from a list. Here we extract the first element of the list.

x[[1]]
## [1] 1 2 3 4

The [[ operator can also use named indices so that you don't have to remember the exact ordering of every element of the list. You can also use the $ operator to extract elements by name.

x[["bar"]]
## [1] 0.6
x$bar
## [1] 0.6

Subsetting Nested Elements of a List

The [[ operator can take an integer sequence if you want to extract a nested element of a list.

x <- list(a = list(10, 12, 14), b = c(3.14, 2.81))
## Get the 3rd element of the 1st element
x[[c(1, 3)]]
## [1] 14
## Same as above
x[[1]][[3]]
## [1] 14
## 1st element of the 2nd element
x[[c(2, 1)]]
## [1] 3.14

Extracting Multiple Elements of a List

The [ operator can be used to extract multiple elements from a list. For example, if you wanted to extract the first and third elements of a list, you would do the following:

x <- list(foo = 1:4, bar = 0.6, baz = "hello")
x[c(1, 3)]
## $foo
## [1] 1 2 3 4
##
## $baz
## [1] "hello"

Note that x[c(1, 3)] is NOT the same as x[[c(1, 3)]].

Remember that the [ operator always returns an object of the same class as the original. Since the original object was a list, the [ operator returns a list. In the above code, we returned a list with two elements (the first and the third).
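
The difference between the operators can be checked by inspecting the class of what each one returns. This is a minimal sketch on the same three-element list; the variable name is a hypothetical helper added for illustration.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

class(x[1])          # "list": [ keeps the container
class(x[[1]])        # "integer": [[ returns the element itself
class(x$bar)         # "numeric"

length(x[c(1, 3)])   # 2: a list holding foo and baz
x[[c(1, 3)]]         # 3: the third element of the first element (foo)

# [[ also accepts a computed name, which $ does not.
name <- "foo"
x[[name]]            # 1 2 3 4
x$name               # NULL: $ looks for an element literally called "name"
```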

Removing NA Values

A common task in data analysis is removing missing values (NAs).

x <- c(1, 2, NA, 4, NA, 5)
is.na(x)
## [1] FALSE FALSE  TRUE FALSE  TRUE FALSE
x[!is.na(x)]
## [1] 1 2 4 5

What if there are multiple R objects and you want to take the subset with no missing values in any of those objects?

head(airquality)
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 5    NA      NA 14.3   56     5   5
## 6    28      NA 14.9   66     5   6
good <- complete.cases(airquality)
head(airquality[good, ])
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 7    23     299  8.6   65     5   7
## 8    19      99 13.8   59     5   8
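
For separate vectors rather than a data frame, complete.cases() also accepts several objects at once. The sketch below uses two illustrative vectors, x and y, that are not part of the lecture data.

```r
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")

good <- complete.cases(x, y)   # TRUE where neither x nor y is missing
good
x[good]   # 1 2 4 5
y[good]   # "a" "b" "d" "f"
```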

Review of this lecture

• Overview of R
• R nuts and bolts
• Getting data in and out of R
• Subsetting R objects

Read and Write Data in R

You'll be working with swimming_pools.csv; it contains data on swimming pools in Brisbane, Australia (Source: data.gov.au). The file contains the column names in the first row. It uses a comma to separate values within rows.

1. Try read.csv() and read.table() to import "swimming_pools.csv" as a data frame with the name pools.
2. Try write.table(), dput(), and save() functions to write pools to files.
3. Restart R and read your saved data back into R.
4. Practice subsetting of a data frame.