So far we have been thinking of the probability for getting a result
$n$ if we know that the mean value should be $\mu$. Now suppose we
make a measurement and get $n$ counts, but we don't know anything
about $\mu$, except that it must be nonnegative, of course. We may
turn the question around and ask what is the most likely value for
$\mu$, given the result of our measurement. To make this
turned-around idea more concrete, we use the concept of conditional
probability. We say that the Poisson distribution $P_\mu(n)$ tells
us the probability that we get $n$, on the condition that the mean
value is $\mu$. The notation $P(A|B)$ denotes the
probability for getting $A$, given that $B$ occurs or $B$ is true.
Thus we could write
\begin{displaymath}
  P(n|\mu) = P_\mu(n) = \frac{\mu^n e^{-\mu}}{n!} .
\end{displaymath}
Now the reverse question is, ``What is the probability that the mean
value is $\mu$, given that we just made a measurement and got
$n$?'' This probability would be denoted $P(\mu|n)$. Now a
trivial but important theorem due to Bayes states that
\begin{displaymath}
  P(A|B)\,P(B) = P(B|A)\,P(A) ,
\end{displaymath}
where $P(A)$ is the {\em a priori} probability for $A$ to occur,
regardless of whether the event $B$ occurs, and $P(B)$ is the {\em a
priori} probability for $B$ to occur, regardless of whether the event
$A$ occurs. From this theorem we conclude that
\begin{displaymath}
  P(\mu|n) = \frac{P(n|\mu)\,P(\mu)}{P(n)} .
\end{displaymath}
So we need to know $P(\mu)$ and $P(n)$ to make progress. The first
is the {\em a priori} probability for getting a particular value for
$\mu$. If we don't know anything about $\mu$, except that it is
nonnegative, then we must say that any nonnegative value whatsoever is
equally probable. Thus without benefit of knowing the outcome of the
measurement, we say $P(\mu)$ is constant, independent of $\mu$
for nonnegative $\mu$, and it is zero for negative $\mu$. So
the rhs of this equation reduces simply to
\begin{displaymath}
  P(\mu|n) = N \frac{\mu^n e^{-\mu}}{n!} ,
\end{displaymath}
where the normalization factor $N = P(\mu)/P(n)$ can be determined
by requiring that the total probability for having any $\mu$ is 1.
In fact it turns out that $N = 1$, so
\begin{displaymath}
  P(\mu|n) = \frac{\mu^n e^{-\mu}}{n!} .
\end{displaymath}
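The claim that the normalization factor works out to 1 is easy to check
numerically. The following Python sketch (the function name
\verb|posterior| is mine, not part of the text) integrates
$\mu^n e^{-\mu}/n!$ over nonnegative $\mu$ for a small example count:

```python
import math

# Numerical check of the normalization: integrate mu^n e^{-mu} / n! over mu.
# A flat prior on nonnegative mu is assumed, as in the text.
def posterior(mu, n):
    """P(mu|n) = mu^n e^{-mu} / n! for the Poisson mean after observing n counts."""
    return mu ** n * math.exp(-mu) / math.factorial(n)

n = 5
d_mu = 0.001
# Midpoint-rule sum; the integrand is utterly negligible beyond mu = 50 for n = 5.
total = sum(posterior((i + 0.5) * d_mu, n) for i in range(int(50 / d_mu))) * d_mu
print(f"integral of P(mu|{n}) over mu = {total:.4f}")
```

The integral comes out equal to 1 to the accuracy of the grid, in agreement
with $\int_0^\infty \mu^n e^{-\mu}\,d\mu = n!$.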
This distribution is called the likelihood function for the parameter
$\mu$. Notice that we are now thinking of the rhs as a continuous
function of $\mu$ with fixed $n$. This result is very remarkable,
since a single measurement is giving us the {\em whole} probability
distribution! Recall that if we were to measure the length of a table
top, even if we started by assuming we were going to get a Gaussian
distribution, a single measurement would allow us only to guess the
mean $\bar x$ and would tell us nothing about $\sigma$. To get $\sigma$ takes at
least two measurements, and even then we would be putting ourselves at
the mercy of the gods of statistics for taking a chance with only two
measurements. If we weren't so rash as to assume a Gaussian, we would
have to make many measurements of the length of the table top to get
the probability distribution in the measured length.
We now ask, what is the most probable value of $\mu$, given that we
just found $n$? This is the value with maximum likelihood. If we
examine the probability distribution, we see that it peaks at $\mu = n$,
just as we might have expected. We may then ask, what is the error in
the determination of this value? This is a tricky question, because
the Poisson distribution is not shaped like a Gaussian distribution.
However, for large $n$ it looks more and more like a Gaussian.
Expanding the log of the likelihood about its peak for large $n$ and fixed
$n$ gives
\begin{displaymath}
  \ln P(\mu|n) \approx {\rm const} - \frac{(\mu - n)^2}{2n} ,
\end{displaymath}
so for large $n$ the error is
\begin{displaymath}
  \sigma = \sqrt{n} .
\end{displaymath}
To summarize, a single measurement yields the entire probability distribution.
For large enough $n$ we can say that
\begin{displaymath}
  \mu = n \pm \sqrt{n} .
\end{displaymath}
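The Gaussian approximation at large $n$ can be checked numerically. In the
Python sketch below (the function name is mine), the expansion predicts that
the log likelihood drops by exactly $1/2$ one standard deviation
$\sqrt{n}$ away from the peak:

```python
import math

# Check the Gaussian approximation to the Poisson likelihood at large n.
# Work with the log of mu^n e^{-mu}/n!, using lgamma to avoid overflowing n!.
def log_posterior(mu, n):
    return n * math.log(mu) - mu - math.lgamma(n + 1)

n = 10_000
sigma = math.sqrt(n)
# The expansion ln P = const - (mu - n)^2 / (2n) predicts a drop of 1/2
# at mu = n + sigma.
drop = log_posterior(n + sigma, n) - log_posterior(n, n)
print(f"log-likelihood drop one sigma from the peak: {drop:.4f}")
```

The drop comes out close to $-1/2$, with a residual of order $1/(3\sqrt{n})$
from the next term in the expansion.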
To see how Bayesian statistics works, suppose we repeated the
experiment and got a new value $n_2$. What is the probability
distribution for $\mu$ in light of the new result? Now things have
changed, since the {\em a priori} probability for $\mu$ is no
longer constant because we already made one measurement and got $n_1$.
Instead we have
\begin{displaymath}
  P(\mu) = \frac{\mu^{n_1} e^{-\mu}}{n_1!} ,
\end{displaymath}
so
\begin{displaymath}
  P(\mu|n_1,n_2) = N\,\frac{\mu^{n_1} e^{-\mu}}{n_1!}\,
                     \frac{\mu^{n_2} e^{-\mu}}{n_2!} .
\end{displaymath}
Notice that the likelihood function is now the product of the
individual likelihood functions. A more systematic notation would
write this function as $P(\mu|n_1,n_2)$, i.e.\ the
probability for $\mu$ having a particular value, given that we made
two measurements and found $n_1$ and $n_2$. The normalization
factor $N$ is obtained by requiring the total probability to be 1.
The most likely value of $\mu$ is easily shown to be just the
average
\begin{displaymath}
  \mu = \frac{n_1 + n_2}{2} ,
\end{displaymath}
as we should have expected.
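A quick numerical check of this result (a Python sketch with invented
example counts): the product of the two Poisson likelihoods is maximized at
the average of the two measurements.

```python
import math

# Log of the product of two Poisson likelihoods for the same mean mu,
# with the mu-independent constants (the factorials) dropped.
def log_likelihood(mu, n1, n2):
    return (n1 + n2) * math.log(mu) - 2 * mu

n1, n2 = 7, 11   # invented example counts
grid = [i * 0.001 for i in range(1, 30_000)]
peak = max(grid, key=lambda mu: log_likelihood(mu, n1, n2))
print(f"maximum-likelihood mu = {peak:.3f}, average = {(n1 + n2) / 2}")
```

Setting the derivative $(n_1+n_2)/\mu - 2$ to zero gives the same answer,
$\mu = (n_1+n_2)/2$, analytically.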
The Bayesian approach insists that we fold together all of our
knowledge about a parameter in constructing its likelihood function.
Thus a generalization of these results would state that the likelihood
function for the parameter set $a$, given the independently measured
results $x_1$, $x_2$, $x_3$, etc.\ is just
\begin{displaymath}
  P(a|x_1,x_2,x_3,\ldots) = N \prod_i P(x_i|a) ,
\end{displaymath}
where $N$ is a normalization factor. Again, this is just the product
of the separate likelihood functions. The result is completely
general and applies to any probability distribution, not just a
Poisson distribution. We will use this result in discussing
$\chi^2$ fits to data as a maximum likelihood search.
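As a preview of that connection (an illustrative Python sketch, with invented
data): for independent Gaussian measurements with a common $\sigma$,
$-2\ln$ of the product of likelihoods is, up to a constant, the familiar
$\chi^2$ sum, so maximizing the likelihood is the same as minimizing $\chi^2$.

```python
# Illustrative sketch: for Gaussian measurements x_i with a common sigma,
# -2 ln(product of likelihoods) = sum((x_i - a)^2 / sigma^2) + const,
# so the maximum-likelihood parameter a minimizes chi^2. The data are invented.
xs = [1.2, 0.9, 1.4, 1.1]
sigma = 0.1

def chi2(a):
    return sum(((x - a) / sigma) ** 2 for x in xs)

grid = [i * 0.0001 for i in range(5_000, 20_000)]
best = min(grid, key=chi2)
print(f"chi^2 minimum at a = {best:.4f}; average of data = {sum(xs) / len(xs):.4f}")
```

The minimum lands at the plain average of the data, as expected for equal
errors; unequal errors would give the familiar weighted average instead.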