Conditional Entropy

Conditional Entropy is a measure of the average uncertainty about the value one random variable will take, before it is observed, given the value of another random variable. The resulting quantity is averaged over all values of the second (conditioning) random variable. Conditional Entropy, like entropy, is measured in bits.

An entropy of H bits means that specifying the value of the as yet unobserved random variable requires, on the average, an H-bit message. For example, an H-bit message specifies a choice of 1 out of 2^H possibilities.
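As a small sketch of this counting argument (the helper entropy_bits and the choice H = 3 are illustrative assumptions, not from the text), the entropy of a uniform choice among 2^H equally likely outcomes works out to exactly H bits:

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An H-bit message distinguishes 2**H equally likely possibilities,
# and a uniform distribution over 2**H outcomes has entropy exactly H bits.
H = 3
uniform = [1 / 2**H] * (2**H)
print(entropy_bits(uniform))  # 3.0
```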

One way to explain the meaning of an H-bit conditional entropy is by the following game played between person A and person B. Persons A and B sample a value of random variable X according to the probability P(X), and both observe the sampled value u. Then, conditioned on the value u, person A samples a value v of random variable Y. This sampling takes place with respect to the conditional probability P(Y | X). Person B does not observe the value v. Taking the weighted average, specified by P(X), over all possible values that random variable X can take, of the uncertainty that person B has about the value v gives the conditional entropy of Y given X. This is denoted by H(Y | X).
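A minimal simulation of this game (the distributions P_X and P_Y_given_X and all numbers here are hypothetical, chosen only for illustration): person B knows u but not v, and his average surprise about v, measured in bits, converges to H(Y | X).

```python
import math
import random

# Hypothetical distributions for the game: P(X) over {x1, x2}, P(Y | X) over {y1, y2, y3}.
P_X = {"x1": 0.5, "x2": 0.5}
P_Y_given_X = {
    "x1": {"y1": 0.5, "y2": 0.25, "y3": 0.25},
    "x2": {"y1": 1.0},
}

def play_round(rng):
    """One round: both players see u ~ P(X); A then draws v ~ P(Y | X = u).

    Returns person B's surprise about v in bits, -log2 P(v | u), since B knows u but not v.
    """
    u = rng.choices(list(P_X), weights=list(P_X.values()))[0]
    cond = P_Y_given_X[u]
    v = rng.choices(list(cond), weights=list(cond.values()))[0]
    return -math.log2(cond[v])

rng = random.Random(0)
rounds = 100_000
print(sum(play_round(rng) for _ in range(rounds)) / rounds)  # close to H(Y | X) = 0.75 bits
```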

Suppose the discrete random variable X takes possible values {x_1, ..., x_M} and the discrete random variable Y takes possible values {y_1, ..., y_N}. Then the conditional entropy H(Y | X) is defined by

H(Y \mid X) = -E_X\!\left[ E_Y\!\left[ \log_2 P(Y \mid X) \right] \right] = -\sum_{m=1}^{M} \left[ \sum_{n=1}^{N} P(y_n \mid x_m) \log_2 P(y_n \mid x_m) \right] P(x_m) .
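A direct Python sketch of this double-sum definition (the helper conditional_entropy and the example distributions are assumptions for illustration, reusing the hypothetical game distributions above):

```python
import math

def conditional_entropy(P_X, P_Y_given_X):
    """Compute H(Y | X) in bits from P(X) and P(Y | X), per the double-sum definition above."""
    h = 0.0
    for x, px in P_X.items():
        for y, p_y_given_x in P_Y_given_X[x].items():
            if p_y_given_x > 0:  # terms with zero probability contribute nothing
                h -= px * p_y_given_x * math.log2(p_y_given_x)
    return h

# Same hypothetical distributions as in the game sketch.
P_X = {"x1": 0.5, "x2": 0.5}
P_Y_given_X = {
    "x1": {"y1": 0.5, "y2": 0.25, "y3": 0.25},
    "x2": {"y1": 1.0},
}
print(conditional_entropy(P_X, P_Y_given_X))  # 0.75
```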

If person B were to use his knowledge of the conditional probability P(Y | X) in the most effective way possible, it would take person B, on the average, 2^H(Y | X) guesses to correctly guess the value v that person A had sampled. Said another way, it would take, on the average, a message of length H(Y | X) bits for person A to tell person B the value of Y that he had sampled.
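As a worked instance of both statements, using the hypothetical distributions from the sketches above (an assumption, not part of the original text):

H(Y \mid X) = \tfrac{1}{2}\left( \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 \right) + \tfrac{1}{2}\left( 1 \cdot \log_2 1 \right) = 0.75 \text{ bits},

so in this setting the message from person A to person B averages 0.75 bits per round, and the corresponding number of guesses in the sense above is 2^0.75 ≈ 1.68.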