# Conditional Entropy

Conditional entropy is a measure of the average uncertainty about the value one random variable will take, given that the value of another random variable has been observed. The resulting quantity is averaged over all values of the second random variable. Conditional entropy, like entropy, is measured in bits.

An entropy of *H* bits means that, in order to convey the value of the as yet unobserved random variable, it takes, on the average, an *H* bit message, which specifies a choice of 1 out of
*2 ^{H}* possibilities.
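The "1 out of *2 ^{H}* possibilities" reading can be checked numerically. The sketch below (the function name `entropy` is mine, not from the text) computes Shannon entropy in bits and confirms that a uniform choice among 8 possibilities carries *H* = 3 bits:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p * log2(p) over outcomes with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform choice among 2^3 = 8 equally likely possibilities
# requires H = 3 bits to specify.
uniform8 = [1 / 8] * 8
print(entropy(uniform8))  # 3.0
```

A certain outcome (a single possibility with probability 1) correspondingly has entropy 0: no message is needed at all.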

One way to explain the meaning of an *H* bit conditional entropy is by the following game played between
person A and person B. Persons A and B sample a value of random variable *X* according to the
probability *P(X)*. They both observe the sampled value *u*.
Then, conditioned on the value *u*, person A samples a value *v* of random variable *Y*. This sampling
takes place with respect to the conditional probability *P(Y | X)*. Person B does not observe the value *v*.
Taking the weighted average, specified by *P(X)*, over all possible values that random variable
*X* can take, the average uncertainty that person B has with respect to the value *v* is
the conditional entropy of *Y* given *X*. This is denoted by *H(Y | X)*.
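The game above can be sketched as a simulation. The distributions below are made-up illustrations (the text specifies none), with *X* and *Y* each taking values in {0, 1}:

```python
import random

# Illustrative distributions, not from the text: P(X) and P(Y | X)
# are chosen arbitrarily for the sketch.
P_X = {0: 0.5, 1: 0.5}
P_Y_GIVEN_X = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}

def play_round(rng):
    # Persons A and B both observe u, sampled according to P(X)...
    u = rng.choices(list(P_X), weights=list(P_X.values()))[0]
    # ...then person A samples v from P(Y | X = u); B does not observe v.
    cond = P_Y_GIVEN_X[u]
    v = rng.choices(list(cond), weights=list(cond.values()))[0]
    return u, v

rng = random.Random(0)
u, v = play_round(rng)
print(u, v)
```

Person B's residual uncertainty about *v*, averaged over the values of *u* weighted by *P(X)*, is exactly what the definition below quantifies.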

Suppose the discrete random variable *X* takes possible values
*{x _{1},...,x _{M}}* and
the discrete random variable *Y* takes possible values
*{y _{1},...,y _{N}}*. Then the conditional entropy *H(Y | X)* is defined by

*H(Y | X) = -E _{X}[E _{Y}[log_2 P(Y | X)]] = - Σ ^{M}_{m=1} Σ ^{N}_{n=1} P(x _{m}) P(y _{n} | x _{m}) log_2 P(y _{n} | x _{m})*.
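The double sum translates directly into code. The sketch below (function name and example numbers are mine) represents *P(X)* and *P(Y | X)* as nested dictionaries and evaluates the definition term by term:

```python
import math

def conditional_entropy(p_x, p_y_given_x):
    """H(Y | X) = -sum_m P(x_m) sum_n P(y_n | x_m) log2 P(y_n | x_m), in bits."""
    h = 0.0
    for x, px in p_x.items():
        for y, pyx in p_y_given_x[x].items():
            if pyx > 0:  # terms with zero probability contribute nothing
                h -= px * pyx * math.log2(pyx)
    return h

# Made-up example: X is a fair coin; Y tends to copy X.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1},
               1: {0: 0.2, 1: 0.8}}
print(conditional_entropy(p_x, p_y_given_x))  # about 0.595 bits
```

Note that each inner sum is just the ordinary entropy of *Y* for a fixed *x _{m}*; the outer sum averages these entropies with weights *P(x _{m})*.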

If person B were to use his knowledge of the observed value *u* and of the conditional probability
function *P(Y | X)* in the most effective way possible, it would take person B, on the average,
*2 ^{H(Y | X)}* guesses to correctly guess the value *v* that person A had sampled. Or, saying this
another way, it would take, on the average, a message of length *H(Y | X)* bits for person A to tell
person B the value of *Y* that he had sampled.