Monte Carlo simulations

Missisippi

Consider the 11 letters in the word Mississippi.

library(tidyverse)
(mi_letters <- str_split("Mississippi", "")[[1]])

##  [1] "M" "i" "s" "s" "i" "s" "s" "i" "p" "p" "i"

What is the probability that no adjacent letters are the same, in a random reordering of the letters?

Taken from Tijms, Henk. Probability: A Lively Introduction. Cambridge University Press, 2017.

Your Turn:

How would you do a simulation to estimate this probability?
How is this simulation similar to the Russian election example?

General structure of Monte Carlo Simulations

Simulate many examples
Calculate something on each example
Explore the many calculated things

Mississippi

Generate random reorderings of the letters in Mississippi
For each reordering, ask: are all adjacent letters different?
Find the proportion of “Yes”s

Russian Election Fraud

Generate turnout vectors based on no fraud (Binomial counts based on number on voter list and observed turnout, transformed to turnout percentages)
For each turnout vector, count the number of integers.
Find the mean and quantiles of the counts

One pretty flexible approach

First, figure out 1. and 2. on a single example.

Then scale up:

Simulate many examples rerun()
Calculate something on each example map(), map_dbl()
Explore the many calculated things # depends on goal

rerun() and map() are in the purrr package.

One Mississippi

Step 1: How do we get a random reordering of the letters in Mississippi?

sample(mi_letters)

##  [1] "p" "i" "M" "s" "i" "p" "s" "s" "i" "s" "i"

`sample(x, size, replace = FALSE)`

sample takes a sample of the specified size from the elements of x using either with or without replacement.

– ?sample

(one_reordering <- sample(mi_letters))

##  [1] "s" "i" "p" "i" "s" "i" "i" "M" "s" "p" "s"

Two Mississippi

Step 2: Given a reordering, does it have letters next to each other that are the same? We want TRUE when no adjacent letters match.

The rle() function will be very useful.

What does it do?

one_reordering

##  [1] "s" "i" "p" "i" "s" "i" "i" "M" "s" "p" "s"

rle(one_reordering)

## Run Length Encoding
##   lengths: int [1:10] 1 1 1 1 1 2 1 1 1 1
##   values : chr [1:10] "s" "i" "p" "i" "s" "i" "M" "s" "p" "s"

How can we use it?

Look for any lengths greater than 2.

How do you get out the lengths? Some strategies:

rle(one_reordering)$lengths # guess

?rle # read the Value section 

rel_one <- rle(one_reordering) # save  
rel_one$lengths # then rely on RStudio completion

# use str()
str(one_reordering)

Now find out if any are greater than 1. My approach:

all(rle(one_reordering)$lengths == 1)

## [1] FALSE

Some other approaches

length(rle(one_reordering)$lengths) == length(one_reordering)
!(mean(rle(one_reordering)$lengths) > 1)
max(rle(one_reordering)$lengths) == 1

So far

Back at 9:01am

one_reordering <- sample(mi_letters) # One example

# Fill in this bit in class
all(rle(one_reordering)$lengths == 1)

## [1] FALSE

Scaling up - many examples with `rerun()`

The first argument is the number of times you’d like to repeat the evaluation of the second argument.

many_reorderings <- rerun(.n = 1000, sample(mi_letters))

Your turn Take a look at the object many_reorderings. What kind of object is it? Generate 1000 reorderings instead.

Scaling up - for each example, do something with `map()`

map() solves iteration problems, like: for each ___ do ___.

First argument is the object you want to iterate over, many_reorderings
Second argument describes what you want to do. One way, specify a formula (starts with ~) using . as a placeholder for a single example: ~ any(rle(.)$lengths > 1

map(many_reorderings,
  ~ all(rle(.)$lengths == 1))

map() returns a list

Use one of its friends instead: map_dbl(), map_lgl(), map_int(), map_chr() to get an atomic vector.

Your turn: Swap out map() for the appropriate function

map_lgl(many_reorderings, ~ all(rle(.x)$lengths == 1))

All together

num_sims <- 1000
many_reorderings <- rerun(num_sims, sample(mi_letters))
adj_letters_same <- map_lgl(many_reorderings, 
  ~ all(rle(.x)$lengths == 1))

# Explore 
adj_letters_same %>% table()

## .
## FALSE  TRUE 
##   940    60

adj_letters_same %>% mean()

## [1] 0.06

Your turn

A random sequence of H’s and T’s is generated by tossing a fair coin $n = 20$ times. What’s the expected length of the longest run of consecutive heads or tails?

Taken from Tijms, Henk. Probability: A Lively Introduction. Cambridge University Press, 2017.

Iteration

Common patterns:

Do this thing m times, rerun()
Do this thing to each element of x, map()
Do this thing until some condition is satisfied while

You can do 1. and 2. with for loops, but the purrr functions abstract away the details and let you focus on “this thing”.

(You also don’t run the risk of writing an inefficient for loop)

But remember R loves working with vectors. Don’t iterate over the elements of a vector, when a function exists to handle the whole vector.

Questions

How many simulations do we need? I.e. what should num_sims be?
What do we do when the simulation part is complicated? What do we do when the calculation part is complicated?

Reading for next week

Functions in R4DS 19.1 through and including 19.3

(If you are on a roll keep reading…)

Be prepared to answer:

What are the three key steps to writing a function?
Do exercise 19.2.1 #3

Simulation

Charlotte Wickham

Sep 27th 2018

Good code

Pick up from Thursday

Good code

Use existing functions when you can

Monte Carlo simulations

Missisippi

General structure of Monte Carlo Simulations

Mississippi

Russian Election Fraud

One pretty flexible approach

One Mississippi

`sample(x, size, replace = FALSE)`

Two Mississippi

So far

Scaling up - many examples with `rerun()`

Scaling up - for each example, do something with `map()`

map() returns a list

All together

Your turn

Iteration

Questions

Reading for next week

Simulation

Charlotte Wickham

Sep 27th 2018

Good code

Pick up from Thursday

Good code

Use existing functions when you can

Monte Carlo simulations

Missisippi

General structure of Monte Carlo Simulations

Mississippi

Russian Election Fraud

One pretty flexible approach

One Mississippi

sample(x, size, replace = FALSE)

Two Mississippi

So far

Scaling up - many examples with rerun()

Scaling up - for each example, do something with map()

map() returns a list

All together

Your turn

Iteration

Questions

Reading for next week

`sample(x, size, replace = FALSE)`

Scaling up - many examples with `rerun()`

Scaling up - for each example, do something with `map()`