# Load the tidyverse for tibble() and ggplot().
library(tidyverse)

# Set seed for reproducibility.
set.seed(1)

# Set the number of iterations.
n <- 10000

# Define an accumulator vector that will contain
# each iteration's data point.
accum <- c()

# Let's gooooooo
for(i in 1:n){
  a <- rnorm(1, mean = 0, sd = 1.5)
  b <- runif(1, min = 0, max = 5)
  data <- rnorm(1, mean = a, sd = b)
  accum <- c(accum, data)
  ## Uncomment to print out each iteration's sample and likelihood
  ## (best make n smaller first!)
  # print(paste0('Sampled ', round(data, 2), ' from Normal(', round(a, 2), ', ', round(b, 2), ')'))
}

# Make a density plot.
tibble(data = accum) %>%
  ggplot(aes(x = data)) +
  geom_density(fill = 'grey', alpha = .5)
Bayesian models as generative models
Bayesian models can be thought of as models of the generative process that produced the data.
In a way, what we’ve been doing so far is playing the model “backward”: we use observed data to figure out plausible values of the parameters inside the model.
But we can also play the model “forward”, and use the parameter values inside the model to generate data that the model thinks is plausible.
We’ll see tomorrow why this can be a useful thing to do!
Here, we’ll see how it works.
Imagine our model is:
\[ \begin{aligned} \text{data} &\sim Normal(\mu, \sigma) \\ \mu &\sim Normal(0, 1.5)\\ \sigma &\sim Uniform(0, 5)\\ \end{aligned} \]
The following procedure will let us use this model to generate one data point.
- To generate data from the likelihood \(Normal(\mu, \sigma)\), we need to define the currently-unknown parameters \(\mu\) and \(\sigma\).
- For \(\mu\), sample one value \(a\) from \(Normal(0, 1.5)\).
- For \(\sigma\), sample one value \(b\) from \(Uniform(0, 5)\).
- Combine them to define the likelihood as \(Normal(a, b)\).
- Sample one value from this distribution: this is one observation.
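The steps above can be sketched directly in R for a single draw (a minimal sketch; the names a, b, and obs follow the bullet points and are otherwise my own):

```r
# One draw from the model, following the procedure above.
set.seed(1)
a   <- rnorm(1, mean = 0, sd = 1.5)  # sample a value for mu from its prior
b   <- runif(1, min = 0, max = 5)    # sample a value for sigma from its prior
obs <- rnorm(1, mean = a, sd = b)    # one observation from Normal(a, b)
```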
And to generate a whole dataset of size \(n\), we would just repeat this procedure \(n\) times. The distribution of data resulting from this procedure is sometimes called a “predictive distribution”.
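Since R's random-number functions recycle their parameter arguments elementwise, the whole dataset can also be simulated without an explicit loop (a sketch; mus, sigmas, and sims are my own names):

```r
# Vectorized predictive simulation: draw all n parameter values at once.
set.seed(1)
n      <- 10000
mus    <- rnorm(n, mean = 0, sd = 1.5)  # one mu per simulated observation
sigmas <- runif(n, min = 0, max = 5)    # one sigma per simulated observation
# rnorm() recycles mean and sd elementwise, pairing mus[i] with sigmas[i].
sims   <- rnorm(n, mean = mus, sd = sigmas)
```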
The R code at the top of this section implements this procedure and plots the resulting predictive distribution.
So, the model
\[ \begin{aligned} \text{data} &\sim Normal(\mu, \sigma) \\ \mu &\sim Normal(0, 1.5)\\ \sigma &\sim Uniform(0, 5)\\ \end{aligned} \]
considers data in the range of about [–10, 10] to be plausible outcomes.
Whether or not we agree will depend on the data and our real-world knowledge about it!
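One way to check that range empirically is to look at quantiles of the simulated data (a sketch using the same sampling scheme; the exact numbers will vary with the seed):

```r
# Check the plausible range via quantiles of simulated observations.
set.seed(1)
n    <- 10000
sims <- rnorm(n, mean = rnorm(n, 0, 1.5), sd = runif(n, 0, 5))
# The central 99% interval of the simulated observations
# should land roughly within [-10, 10].
quantile(sims, probs = c(0.005, 0.995))
```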
Tomorrow we’ll see how generating predictive distributions fits into the Bayesian modelling workflow.