sample(x, size, replace = FALSE)
rerun()
map()
In rough order of importance (for us as statisticians/data scientists):
E.g. rbinom() rather than implementing the inverse transform method
See ?Distributions for those available in the stats package.
See https://cran.r-project.org/web/views/Distributions.html for packages that implement other distributions.
Inverse transform (Tue) and Rejection method (Lab) are options when nothing exists for the distribution you need.
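To make the inverse transform option concrete, here is a minimal sketch for the Exponential distribution. The distribution is chosen only because its inverse CDF is simple; in practice rexp() already covers it. The helper name r_exp_inverse is hypothetical, not something from the stats package.

```r
# Inverse transform sketch: draw U ~ Uniform(0, 1), then apply the inverse CDF.
# For Exponential(rate), F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
r_exp_inverse <- function(n, rate = 1) {  # hypothetical helper name
  u <- runif(n)
  -log(1 - u) / rate
}

set.seed(541)
x <- r_exp_inverse(10000, rate = 2)
mean(x)  # should land near the theoretical mean, 1 / rate = 0.5
```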
Consider the 11 letters in the word Mississippi.
library(tidyverse)
(mi_letters <- str_split("Mississippi", "")[[1]])
## [1] "M" "i" "s" "s" "i" "s" "s" "i" "p" "p" "i"
What is the probability that no adjacent letters are the same, in a random reordering of the letters?
Taken from Tijms, Henk. Probability: A Lively Introduction. Cambridge University Press, 2017.
Your Turn:
First, figure out 1. and 2. on a single example.
Then scale up:
rerun()
map(), map_dbl() # depends on goal
rerun() and map() are in the purrr package.
Step 1: How do we get a random reordering of the letters in Mississippi?
sample(mi_letters)
## [1] "p" "i" "M" "s" "i" "p" "s" "s" "i" "s" "i"
sample(x, size, replace = FALSE)
sample takes a sample of the specified size from the elements of x using either with or without replacement.
– ?sample
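A small sketch of how the three arguments interact, using a made-up four-letter vector rather than the Mississippi letters:

```r
set.seed(1)
x <- c("a", "b", "c", "d")

# size defaults to length(x), so this is a random permutation of x
perm <- sample(x)

# Without replacement: two distinct elements of x
two <- sample(x, size = 2)

# With replacement: repeats are allowed, so size can exceed length(x)
ten <- sample(x, size = 10, replace = TRUE)
```

Calling sample(mi_letters) with no other arguments is the first case: every letter is used exactly once, in a random order.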
(one_reordering <- sample(mi_letters))
## [1] "s" "i" "p" "i" "s" "i" "i" "M" "s" "p" "s"
Step 2: Given a reordering, does it have letters next to each other that are the same? We want TRUE when no adjacent letters match.
The rle() function will be very useful.
What does it do?
one_reordering
## [1] "s" "i" "p" "i" "s" "i" "i" "M" "s" "p" "s"
rle(one_reordering)
## Run Length Encoding
## lengths: int [1:10] 1 1 1 1 1 2 1 1 1 1
## values : chr [1:10] "s" "i" "p" "i" "s" "i" "M" "s" "p" "s"
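A smaller, hand-checkable example may help in reading that output. The vector below is made up for illustration:

```r
x <- c("a", "a", "b", "c", "c", "c")
r <- rle(x)

r$lengths       # 2 1 3: how long each run is
r$values        # "a" "b" "c": which value each run repeats
inverse.rle(r)  # rebuilds the original vector from the runs
```

Each run of identical neighbours collapses to a single (value, length) pair, so a length above 1 means at least two adjacent elements matched.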
How can we use it?
Look for any lengths greater than 1.
How do you get out the lengths? Some strategies:
rle(one_reordering)$lengths # guess
?rle # read the Value section
rle_one <- rle(one_reordering) # save
rle_one$lengths # then rely on RStudio completion
# use str()
str(rle(one_reordering))
Now find out if any are greater than 1. My approach:
all(rle(one_reordering)$lengths == 1)
## [1] FALSE
Some other approaches
length(rle(one_reordering)$lengths) == length(one_reordering)
!(mean(rle(one_reordering)$lengths) > 1)
max(rle(one_reordering)$lengths) == 1
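One way to check that these approaches agree is to wrap the test in a function and try it on inputs where the answer is obvious. This is only a sketch; the names no_adjacent_same and no_adjacent_same_diff are hypothetical, and the second version (comparing each element with its neighbour directly) is an extra alternative not taken from the slides.

```r
# TRUE when no two neighbouring elements are equal
no_adjacent_same <- function(x) {  # hypothetical helper name
  all(rle(x)$lengths == 1)
}

# Alternative without rle(): compare x[2..n] against x[1..n-1]
no_adjacent_same_diff <- function(x) {  # hypothetical helper name
  all(x[-1] != x[-length(x)])
}

no_adjacent_same(c("s", "i", "p", "i"))       # TRUE
no_adjacent_same(c("s", "i", "i", "p"))       # FALSE
no_adjacent_same_diff(c("s", "i", "i", "p"))  # FALSE
```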
one_reordering <- sample(mi_letters) # One example
# Fill in this bit in class
all(rle(one_reordering)$lengths == 1)
## [1] FALSE
rerun()
The first argument is the number of times you’d like to repeat the evaluation of the second argument.
many_reorderings <- rerun(.n = 3, sample(mi_letters))
Your turn: Take a look at the object many_reorderings. What kind of object is it? Generate 1000 reorderings instead.
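A caveat if you run this with a recent purrr: rerun() was deprecated in purrr 1.0.0, so it may emit a deprecation warning. The sketch below builds the same kind of object with base R's replicate(), which needs no packages; the map() form in the comment is the purrr equivalent. The count of 5 is arbitrary, chosen to keep the printout small.

```r
mi_letters <- strsplit("Mississippi", "")[[1]]

set.seed(541)
# simplify = FALSE keeps each reordering as its own list element
many_reorderings <- replicate(5, sample(mi_letters), simplify = FALSE)

# With purrr loaded, the analogous call would be:
# many_reorderings <- map(1:5, ~ sample(mi_letters))

class(many_reorderings)   # a list
length(many_reorderings)  # one element per reordering
```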
map()
map() solves iteration problems, like: for each ___ do ___.
First argument is the object you want to iterate over, many_reorderings
Second argument describes what you want to do. One way: specify a formula (starts with ~) using . as a placeholder for a single example: ~ all(rle(.)$lengths == 1)
map(many_reorderings,
~ all(rle(.)$lengths == 1))
Use one of its friends instead: map_dbl(), map_lgl(), map_int(), map_chr() to get an atomic vector.
Your turn: Swap out map() for the appropriate function
map_lgl(many_reorderings, ~ all(rle(.x)$lengths == 1))
num_sims <- 1000
many_reorderings <- rerun(num_sims, sample(mi_letters))
# TRUE when no two adjacent letters are the same
no_adj_same <- map_lgl(many_reorderings,
  ~ all(rle(.x)$lengths == 1))
# Explore
no_adj_same %>% table()
## .
## FALSE TRUE
## 940 60
no_adj_same %>% mean()
## [1] 0.06
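How far can an estimate like 0.06 be trusted with 1000 simulations? A rough sketch, assuming the usual binomial standard error for a simulated proportion and a normal approximation for the interval (neither is stated in the slides):

```r
p_hat <- 0.06      # estimate from the simulation above
num_sims <- 1000

# Standard error of a proportion estimated from num_sims independent trials
se <- sqrt(p_hat * (1 - p_hat) / num_sims)
se  # roughly 0.0075

# Approximate 95% interval for the true probability
ci <- p_hat + c(-1, 1) * 1.96 * se
ci
```

Because the standard error shrinks like 1 / sqrt(num_sims), halving it takes about four times as many simulations.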
A random sequence of H’s and T’s is generated by tossing a fair coin \(n = 20\) times. What’s the expected length of the longest run of consecutive heads or tails?
Taken from Tijms, Henk. Probability: A Lively Introduction. Cambridge University Press, 2017.
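The same two-step recipe (one example first, then scale up) applies here. The following is one possible sketch, not necessarily the intended solution: longest_run is a hypothetical helper, and base R's replicate() stands in for the purrr functions so the block runs without any packages.

```r
set.seed(541)

# One example: toss a fair coin n times, return the longest run of H's or T's
longest_run <- function(n = 20) {  # hypothetical helper name
  tosses <- sample(c("H", "T"), size = n, replace = TRUE)
  max(rle(tosses)$lengths)
}

# Scale up: repeat many times and average to estimate the expected value
num_sims <- 1000
runs <- replicate(num_sims, longest_run(20))
mean(runs)
```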
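Common patterns:
1. Do this thing m times: rerun()
2. Do this thing for each element of x: map()
3. Keep doing this thing until some condition is met: while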
You can do 1. and 2. with for loops, but the purrr functions abstract away the details and let you focus on “this thing”.
(You also don’t run the risk of writing an inefficient for loop.)
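The while pattern fits when the number of repetitions is not known in advance. A minimal sketch, using a made-up example (toss a fair coin until the first head) that is not from the slides:

```r
set.seed(541)

n_tosses <- 0
toss <- "T"  # start in the "not yet a head" state so the loop body runs

# Keep tossing until a head appears, counting tosses along the way
while (toss != "H") {
  toss <- sample(c("H", "T"), size = 1)
  n_tosses <- n_tosses + 1
}

n_tosses  # number of tosses needed to see the first head
```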
But remember R loves working with vectors. Don’t iterate over the elements of a vector, when a function exists to handle the whole vector.
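For instance, taking square roots needs no explicit iteration, because sqrt() already works on a whole vector. In this small sketch, vapply() plays the role of the element-by-element approach:

```r
x <- c(1, 4, 9, 16)

# Element by element: works, but unnecessary here
roots_loop <- vapply(x, sqrt, numeric(1))

# Vectorized: sqrt() handles the whole vector in one call
roots_vec <- sqrt(x)

identical(roots_loop, roots_vec)  # same result either way
```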
num_sims be?
Functions in R4DS 19.1 through and including 19.3
(If you are on a roll keep reading…)
Be prepared to answer: