Conditional Entropy Per Letter

Conditional entropy per letter is a measure of the average uncertainty about the identity of the current letter, given knowledge of the N previous letters.
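
In symbols, writing X_1, ..., X_N for the previous letters and X_{N+1} for the current letter (this notation is introduced here only for illustration), the quantity in question is the standard conditional entropy of information theory:

$$
H(X_{N+1} \mid X_1,\dots,X_N) = -\sum_{x_1,\dots,x_{N+1}} p(x_1,\dots,x_{N+1})\,\log_2 p(x_{N+1} \mid x_1,\dots,x_N)
$$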

Written English uses 26 letters, if we do not distinguish between upper and lower case and omit digits, punctuation marks, and special symbols. The space character used to demarcate words brings the total to 27 characters. If each of the 27 characters occurred with equal probability, the entropy would be about 4.76 bits per letter. However, the 27 characters do not occur with equal probability, and the entropy per letter is actually about 4.03 bits.
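
As a rough illustration, the sketch below computes the single-letter entropy over the 27-character alphabet from an empirical letter distribution. The corpus string here is only a toy placeholder; the value of about 4.03 bits per letter comes from large samples of English text.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # 26 letters plus the space character

def letter_entropy(text: str) -> float:
    """Entropy in bits per letter of the empirical single-letter distribution."""
    # Keep only the 27-character alphabet, folding upper case onto lower case.
    letters = [c for c in text.lower() if c in ALPHABET]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Uniform distribution over 27 characters: log2(27), about 4.75 bits per letter.
print(math.log2(27))

# With a large English corpus in place of this toy string, the estimate
# approaches roughly 4.03 bits per letter.
print(letter_entropy("the quick brown fox jumps over the lazy dog"))
```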

Now suppose that we know the previous letter. What is the average uncertainty about the current letter? The question we are asking is: what is the entropy per letter, conditioned on knowledge of the previous letter? This is a conditional entropy question. For the English language it is about 2.8 bits. If we are given knowledge of the previous 2 letters, the conditional entropy per letter is about 1.3 bits. This means that a person who knew the probability of the current letter given the previous 2 letters would, on average, face the equivalent of choosing among about 2.5 equally likely letters (2^1.3 ≈ 2.5). Or, equivalently, given the previous two letters, it would take a message of about 1.3 bits, on average, to communicate what the current letter is.
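
The following sketch estimates the conditional entropy per letter given the previous N letters by averaging -log2 p(current letter | previous N letters) over the empirical distribution of a corpus. It is only an illustration: the filename is a placeholder, the 2.8-bit and 1.3-bit figures above come from large samples of English, and a small corpus will give badly biased estimates.

```python
import math
from collections import Counter

ALPHABET = set("abcdefghijklmnopqrstuvwxyz ")  # 26 letters plus space

def conditional_entropy(text: str, n: int) -> float:
    """Estimate H(current letter | previous n letters) in bits from a corpus."""
    letters = [c for c in text.lower() if c in ALPHABET]
    context_counts = Counter()   # counts of n-letter contexts
    joint_counts = Counter()     # counts of (context, next letter) pairs
    for i in range(n, len(letters)):
        context = tuple(letters[i - n:i])
        joint_counts[(context, letters[i])] += 1
        context_counts[context] += 1
    total = sum(joint_counts.values())
    # H = -sum over (context, x) of p(context, x) * log2 p(x | context)
    return -sum(
        (count / total) * math.log2(count / context_counts[context])
        for (context, _), count in joint_counts.items()
    )

corpus = open("corpus.txt").read()    # placeholder path for a large English text
print(conditional_entropy(corpus, 1)) # roughly 2.8 bits per the figures above
print(conditional_entropy(corpus, 2)) # roughly 1.3 bits per the figures above
```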