I've just noticed that a fellow blogger is having a bit of trouble getting her head round Bayesian analysis. Actually, it seems she's got the hang of it, but it occurs to me that some background on why Bayesian analysis works might be useful. Being a maths geek and all-round nosey person, I am of course going to get involved with attempting to explain the situation.
Bayesian analysis consists, quite simply, of two parts: a result describing how probabilities behave when our model is accurate, and a method that uses this result to refine our model. I'll briefly cover both.
The way things are
The result I'm going to discuss here is called Bayes' Theorem. It's most easily explained using a visual technique called Venn diagrams, but that involves lots of irritating drawing and uploading so I'll skip it for the moment. Instead I'll briefly run you through the maths.
The first thing you need to know about is conditional probability: a way of quantifying how likely it is that one thing happens given that another thing has already happened. The important result here is: the probability that A happens given that B happens equals the probability that A and B both happen, divided by the probability that B happens. I'll write this as P(A|B) = P(A∩B)/P(B). Stare at this for a while and it'll hopefully start to make sense.
P(A|B) = P(A∩B)/P(B)
This allows us to produce an interesting little equality - Bayes' Theorem. Since P(A|B)P(B) = P(A∩B) = P(B∩A) = P(B|A)P(A), we can write P(A|B) = P(B|A)P(A)/P(B). This is important - memorise it.
P(A|B) = P(B|A)P(A)/P(B)
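To make this concrete, here's a minimal Python sketch that checks both identities on a toy pair of events - the joint probabilities are numbers I've made up purely for illustration:

```python
# Made-up joint distribution over two binary events A and B:
# p[(a, b)] = P(A = a and B = b).
p = {(True, True): 0.2, (True, False): 0.1,
     (False, True): 0.2, (False, False): 0.5}

p_A = p[(True, True)] + p[(True, False)]   # P(A) = 0.3
p_B = p[(True, True)] + p[(False, True)]   # P(B) = 0.4
p_AnB = p[(True, True)]                    # P(A ∩ B) = 0.2

# Conditional probability: P(A|B) = P(A ∩ B)/P(B)
p_A_given_B = p_AnB / p_B                  # 0.5
p_B_given_A = p_AnB / p_A

# Bayes' Theorem: P(A|B) = P(B|A)*P(A)/P(B)
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-12
```

Note that learning B happened raises the probability of A from 0.3 to 0.5 here - that's the sense in which conditioning on data updates our beliefs.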
Let's consider an example. Say we're picking black and white balls out of a pot, with replacement. We know there are 10 balls in the pot; what we don't know is how many of them are black. Our task is to figure that out. I'll need to cover some other topics before we can do so, but I'd like to point out an interesting phenomenon first: if we guess correctly, the data will (on average) match our guess, and no amount of data will change our mind. The posterior probability P(# black | data) will equal the prior probability P(# black).
If our distribution is accurate, P(distribution|data) = P(distribution)
The way things should be
I'm going to leave our concrete example for the moment and briefly cover an important numerical approximation technique: fixed-point iteration. The actual proof takes advanced mathematical skills, but the basic idea is simple. It is this: if an equation can be formulated as x = f(x), then setting x_(n+1) = f(x_n) will result in x_n converging on a solution of the equation as n increases (at least when f doesn't magnify small differences too much near that solution).
If x = f(x), then setting x_(n+1) = f(x_n) will cause x_n to converge to x
Let's take an example. The other day my dad asked me to work out a fairly complicated problem in compound interest for him. It turned out that the answer I was after was a solution to the equation 46.40*x^300 - 23446.40*x - 23400 = 0. Remembering the above technique, I rewrote the equation as x = ((23446.40*x + 23400)/46.40)^(1/300). I then plugged in an initial value of 1.05 and iterated a few times. The algorithm very quickly converged on a solution of 1.0233653112365164.
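Here's that iteration as a short Python sketch; the iteration cap and convergence tolerance are my own choices:

```python
# Fixed-point iteration: repeatedly apply
# f(x) = ((23446.40*x + 23400)/46.40)^(1/300) until x stops moving.
def f(x):
    return ((23446.40 * x + 23400) / 46.40) ** (1 / 300)

x = 1.05                         # initial guess
for _ in range(100):
    x_next = f(x)
    if abs(x_next - x) < 1e-12:  # close enough - stop iterating
        break
    x = x_next

print(x)  # converges to roughly 1.02337 within a handful of iterations
```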
How does this apply to our example? Well, if you look at the equations we derived above, you'll notice something: if our distribution is accurate, we have the equation P(distribution) = P(distribution|data) = P(data|distribution)*P(distribution)/P(data). This is precisely the sort of equation that this algorithm is designed to find solutions for.
Let's get back to the balls. We have eleven possible values for the number of black balls - anything from 0 to 10 - which gives us eleven possible binomial distributions that could be generating the data. To start with, let's assume that these distributions are equally likely: P(# black = k) = 1/11 for all possible k. This distribution over distributions will act as our prior probabilities.
Now let's go out and collect some data. We pick 40 balls and find that 21 are black and 19 are white. What are the associated posterior probabilities?
Well, first we need to work out two things: the probability of getting that data given a value k for the number of black balls, and the overall probability of getting that data. The first is fairly standard mathematics, so I won't go into it here - just google "Binomial Distribution" if you're confused. The answer turns out to be 40!/(21!*19!) * (k/10)^21 * (1 - k/10)^19. So, for example, with k=5 this probability is 0.119.
Now we can work out the overall probability of getting that data - note that P(data) = sum over distributions of P(data ∩ distribution) = sum( P(data|distribution)*P(distribution) ). In the specific example, this turns out to be 1/11*(0 + 1.77*10^-11 + 3.97*10^-6 + 0.00157 + 0.0352 + 0.119 + 0.0792 + 0.00852 + 6.35*10^-5 + 1.44*10^-9 + 0) = 0.244/11. Cool, huh?
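These numbers are straightforward to reproduce in a few lines of Python (the function name and defaults are my own):

```python
from math import comb

def likelihood(k, black=21, white=19):
    """P(data | # black = k): binomial probability of the observed draws."""
    p = k / 10
    return comb(black + white, black) * p**black * (1 - p)**white

prior = 1 / 11                   # uniform prior over k = 0, ..., 10
p_data = sum(likelihood(k) * prior for k in range(11))
print(p_data)  # roughly 0.0222, i.e. 0.244/11
```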
OK, we now have all the parts we need to generate posterior probabilities. Here goes nothing...
P(# black = k | 21 black, 19 white) = P(21 black, 19 white | # black = k)*P(# black = k)/P(21 black, 19 white)
= 40!/(21!*19!) * (k/10)^21 * (1 - k/10)^19 * (1/11) / (0.244/11)
= 40!/(21!*19!) * (k/10)^21 * (1 - k/10)^19 / 0.244
Which gives us the following posterior probabilities:
P(# black = 0 | data) = 0.0
P(# black = 1 | data) = 7.27105875099e-11
P(# black = 2 | data) = 1.62678304084e-05
P(# black = 3 | data) = 0.0064179907059
P(# black = 4 | data) = 0.144252566798
P(# black = 5 | data) = 0.489542214277
P(# black = 6 | data) = 0.324568275296
P(# black = 7 | data) = 0.0349423938432
P(# black = 8 | data) = 0.000260285286534
P(# black = 9 | data) = 5.8895575883e-09
P(# black = 10 | data) = 0.0
If you sum these up you'll find they add up to 1, as necessary. You'll note that our distribution has already started converging on a value for k of either 5 or 6, which is more or less what we'd expect. With sufficient data, we can expect that one distribution will emerge as the clear winner - adding more data is directly equivalent to doing another pass of the approximation algorithm, and could even be formulated in the same way if we were complete masochists.
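To round things off, here's a sketch that computes the posterior table above and checks the incremental-update claim - the split of the 40 draws into batches of 11/9 and 10/10 is an arbitrary choice of mine:

```python
from math import comb

def likelihood(black, white, k):
    """P(seeing `black` black and `white` white draws | # black = k)."""
    p = k / 10
    return comb(black + white, black) * p**black * (1 - p)**white

def update(prior, black, white):
    """One Bayesian update: multiply prior by likelihood, then normalise."""
    unnorm = [likelihood(black, white, k) * prior[k] for k in range(11)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

uniform = [1 / 11] * 11

# All 40 draws in one go...
batch = update(uniform, 21, 19)

# ...versus the same draws fed in as two separate batches.
sequential = update(update(uniform, 11, 9), 10, 10)

assert abs(sum(batch) - 1) < 1e-12          # posteriors sum to 1
assert all(abs(b - s) < 1e-12               # feeding data in stages changes nothing
           for b, s in zip(batch, sequential))
print(max(range(11), key=lambda k: batch[k]))  # 5 - the most probable count
```

The second assertion is the point: the posterior after two small updates is identical to the posterior after one big one, which is why adding more data behaves like another pass of the approximation algorithm.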
Hope that helps everyone else as much as it just helped me.