Bayesian Inference and Graphical Models
Bayesian networks
Consider the following probabilistic narrative about an individual's health outcome.
(i) A person becomes a smoker with probability 18%.
(ii) They exercise regularly with probability 40% if they are a non-smoker or with probability 25% if they are a smoker.
(iii) Independently of the above, with probability 15% they have a gene which predisposes them to lung cancer.
(iv) Their conditional probability of contracting lung cancer, given the indicator random variables of the events described in (i), (ii), and (iii), is a specified function of those three indicators.
We can visualize this story with a diagram in which each of the four indicator random variables is a node, and arrows are drawn to indicate dependencies as specified in the story.
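To make the sampling order concrete, here is a minimal Julia sketch that follows the arrows of the diagram for steps (i) through (iii). The function name `sample_person` and the seed are illustrative choices, not part of the original text. The sketch stops before step (iv), since it does not assume any particular formula for the lung-cancer probability.

```julia
using Random

# Sample the indicators of steps (i)-(iii) in the order given by the story.
function sample_person(rng::AbstractRNG)
    smokes = rand(rng) < 0.18                 # step (i)
    p_exercise = smokes ? 0.25 : 0.40         # step (ii): depends on smoking
    exercises = rand(rng) < p_exercise
    gene = rand(rng) < 0.15                   # step (iii): independent of the above
    return (smokes = smokes, exercises = exercises, gene = gene)
end

rng = MersenneTwister(1)                      # arbitrary seed
people = [sample_person(rng) for _ in 1:100_000]

# Empirical fraction of exercisers; the exact marginal probability is
# 0.82 * 0.40 + 0.18 * 0.25 = 0.373.
frac_exercise = count(p -> p.exercises, people) / length(people)
```

Each node is sampled only after its parents, which is why smoking is drawn before exercising here.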
Exercise
Is this the only such diagram consistent with the specified probability measure on the four random variables?
Solution. No, there's nothing about smoking and exercising that requires that we sample the smoking indicator first and then the exercising indicator from its conditional distribution given smoking. We could have done it the other way around.
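The reversed ordering can be worked out with Bayes' rule. The following sketch uses only the probabilities from the story; the variable names are illustrative. It computes the marginal probability of exercising and the conditional probabilities of smoking given exercise status, then checks that both orderings give the same joint probability.

```julia
# Forward specification from the story.
p_s = 0.18                  # P(smoker)
p_e_given_s = 0.25          # P(exercises | smoker)
p_e_given_not_s = 0.40      # P(exercises | non-smoker)

# Marginal probability of exercising (law of total probability).
p_e = p_s * p_e_given_s + (1 - p_s) * p_e_given_not_s

# Reversed conditionals (Bayes' rule).
p_s_given_e = p_s * p_e_given_s / p_e
p_s_given_not_e = p_s * (1 - p_e_given_s) / (1 - p_e)

# The joint probability of (smoker, exerciser) under each ordering.
joint_forward = p_s * p_e_given_s
joint_reverse = p_e * p_s_given_e
```

Sampling the exercise indicator with probability `p_e` and then the smoking indicator from `p_s_given_e` or `p_s_given_not_e` yields the same joint distribution, with the arrow between the two nodes reversed.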
The diagram tells us that having the gene is independent of smoking and exercising (since those nodes have no common ancestors in the diagram). If we included another descendant of the "smokes" node, like "develops premature wrinkles", then that would be communicating that premature wrinkles and lung cancer, while not independent, are conditionally independent given the smoking random variable.
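A small numeric check may help with the distinction between independence and conditional independence. All conditional probabilities below are hypothetical, chosen only for illustration, and the sketch simplifies the story by letting lung cancer depend on smoking alone. Given smoking status, wrinkles and cancer are treated as independent, so their conditional joint probability is a product.

```julia
p_s = 0.18                                # P(smoker), from the story
p_w = Dict(true => 0.5, false => 0.1)     # P(wrinkles | smokes), hypothetical
p_c = Dict(true => 0.2, false => 0.02)    # P(cancer | smokes), hypothetical

prior(s) = s ? p_s : 1 - p_s

# Marginal and joint probabilities, summing over smoking status.
# Conditional independence given smoking means the joint term is p_w[s] * p_c[s].
P_w  = sum(prior(s) * p_w[s] for s in (true, false))
P_c  = sum(prior(s) * p_c[s] for s in (true, false))
P_wc = sum(prior(s) * p_w[s] * p_c[s] for s in (true, false))

# Unconditionally, the joint probability differs from the product of marginals.
dependent = !isapprox(P_wc, P_w * P_c)
```

With these numbers `P_wc` exceeds `P_w * P_c`, because knowing that someone has wrinkles makes it more likely that they smoke, which in turn raises the probability of cancer.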
Gaussian mixture models
Consider a distribution whose density function can be written as a linear combination (with nonnegative weights that sum to 1) of multivariate Gaussian densities:
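    using Plots, Distributions

    # Mixture density: weights 0.55 and 0.45 on two bivariate Gaussians
    f(x, y) = 0.55 * pdf(MvNormal([2.2, -0.4], [0.4 0.2; 0.2 0.4]), [x, y]) +
              0.45 * pdf(MvNormal([0.1, -4.3], [1.5 -0.1; -0.1 0.5]), [x, y])

    xs = ys = -6:0.05:6
    p1 = heatmap(xs, ys, f)   # top-down view of the density
    p2 = surface(xs, ys, f)   # the same density as a 3D surface
    plot(p1, p2, size = (650, 300))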
Such a distribution is called a Gaussian mixture model (GMM). We can sample from a GMM by simulating a random index which selects each Gaussian component with probability equal to that component's weight in the mixture, and then drawing from a multivariate normal distribution with the mean and covariance of the selected component.
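The two-stage sampling procedure can be sketched as follows, using the weights, means, and covariances from the density plotted above. The function name `sample_gmm` is illustrative, and the Gaussian draw uses a Cholesky factor from the standard library rather than the Distributions package, which is one possible choice among several.

```julia
using Random, LinearAlgebra

# Parameters of the two-component mixture plotted above.
weights = [0.55, 0.45]
means = [[2.2, -0.4], [0.1, -4.3]]
covs = [[0.4 0.2; 0.2 0.4], [1.5 -0.1; -0.1 0.5]]

# Draw one observation: pick a component index z, then sample its Gaussian.
function sample_gmm(rng, weights, means, covs)
    u = rand(rng)
    # First index whose cumulative weight reaches u (fallback guards rounding).
    z = something(findfirst(cumsum(weights) .>= u), length(weights))
    L = cholesky(Symmetric(covs[z])).L        # covs[z] == L * L'
    x = means[z] + L * randn(rng, length(means[z]))
    return x, z
end

rng = MersenneTwister(42)                     # arbitrary seed
draws = [sample_gmm(rng, weights, means, covs) for _ in 1:20_000]
frac_first = count(d -> d[2] == 1, draws) / length(draws)
```

Here `frac_first` should be close to 0.55, the weight of the first component. Multiplying a standard normal vector by the Cholesky factor `L` gives a vector with covariance `L * L'`.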
Exercise
Explain how you might estimate the means, covariances, and mixture weights based on the observations shown. Feel free to use your own visual intuition as part of the algorithm.
Solution. We identify the two clusters visually and assign each point to one of them. Then we estimate the mean and covariance of each component by the sample mean and sample covariance of its cluster, and we estimate the mixture weights as the proportions of points belonging to each cluster.
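Once each point has a cluster label, the estimates described in the solution reduce to a few lines. This is a sketch under the assumption that the observations are stored as the rows of a matrix and that the labels were assigned by hand; the function name `estimate_gmm` and the toy data are hypothetical.

```julia
using Statistics

# points: n×d matrix with one observation per row.
# labels: cluster index for each row (each cluster needs at least two points).
function estimate_gmm(points::AbstractMatrix, labels::AbstractVector{<:Integer})
    ks = sort(unique(labels))
    weights = [count(==(k), labels) / length(labels) for k in ks]
    means = [vec(mean(points[labels .== k, :], dims = 1)) for k in ks]
    covs = [cov(points[labels .== k, :]) for k in ks]   # sample covariance
    return weights, means, covs
end

# Toy data, made up for illustration: four points in one cluster, two in another.
points = [0.0 0.0; 2.0 0.0; 0.0 2.0; 2.0 2.0; 10.0 10.0; 12.0 10.0]
labels = [1, 1, 1, 1, 2, 2]
weights, means, covs = estimate_gmm(points, labels)
```

For the toy data the estimated weights are 2/3 and 1/3, and the first cluster has mean `[1.0, 1.0]`.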
In the next section (on Expectation-Maximization), we'll talk about how to do this in a way that doesn't require a human to hand-pick the cluster for each point.
Solution. It looks like the sequence of hidden states was most likely this path (which switches 8 times):
Furthermore, it appears that the observation noise is probably pretty small, since the differences between the observed values and the hidden states are small.
In the next section we'll talk about a more principled method for inferring model parameters and the conditional distribution of the hidden states given the observed values.
We close this section with an example showing how to use Bayes nets to calculate likelihood values.
Example
Find the likelihood of the following data for the hidden Markov model described above, with , , and . Suppose is uniformly distributed on .
Solution. The probability of observing is . The probability of observing and is . The probability of observing all three of the given values is .
The conditional probability of seeing an value close to 0.2 given is proportional to the value of the standard Gaussian density at , which is . Likewise, the likelihood gets a factor of for and a factor of for , given the values for and under consideration. Altogether, the likelihood is
More generally, we can compute the likelihood for any complete set of values in a Bayes net by traversing the diagram starting from a root node (a node with no incoming arrows) and including a factor for each conditional probability mass or density value encountered at each node.
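The traversal described above can be sketched for a two-state hidden Markov model. The parameterization here is an assumption made for illustration and may differ from the model defined earlier: the first hidden state is uniform on {0, 1}, each later state equals the previous one with probability `q`, and each observation is normal with mean equal to its hidden state and standard deviation `σ`. The function names and the example data are hypothetical.

```julia
# Density of a normal distribution with mean μ and standard deviation σ.
normal_pdf(x, μ, σ) = exp(-((x - μ) / σ)^2 / 2) / (σ * sqrt(2π))

# Likelihood of hidden states z (each 0 or 1) together with observations x,
# under the assumed model described in the lead-in.
function hmm_likelihood(z, x; q, σ)
    L = 0.5                                    # root node: P(z[1])
    for j in 2:length(z)
        L *= (z[j] == z[j-1]) ? q : 1 - q      # factor for each transition arrow
    end
    for j in eachindex(x)
        L *= normal_pdf(x[j], z[j], σ)         # factor for each observation arrow
    end
    return L
end

# Made-up example: three hidden states with one switch, and three observations.
lik = hmm_likelihood([0, 0, 1], [0.2, -0.1, 0.9]; q = 0.8, σ = 1.0)
```

The result is the product of one factor per node: 0.5 for the root, 0.8 and 0.2 for the two transitions, and one Gaussian density value per observation.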