Some time ago I wrote about an inequality between conditional entropy and error probability. Specifically, one can show that if \hat{x}(Y) is the maximum a posteriori estimate of a discrete random variable X given the random observation Y, then the probability of making an error is bounded from above by
P_e = \mathbb{P}(X\neq \hat{x}(Y)) \le 1- 2^{-H(X|Y)}.
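As a quick sanity check, the bound can be verified numerically. The following sketch draws a random joint pmf over small alphabets (the alphabet sizes and the seed are arbitrary choices) and compares the MAP error probability with 1- 2^{-H(X|Y)}:

```python
import math
import random

random.seed(0)
nx, ny = 3, 4

# draw a random joint pmf p(x, y)
w = [[random.random() for _ in range(ny)] for _ in range(nx)]
total = sum(sum(row) for row in w)
p = [[v / total for v in row] for row in w]

# marginal p(y)
p_y = [sum(p[x][y] for x in range(nx)) for y in range(ny)]

# MAP estimator: pick the most likely x for each y
p_correct = sum(max(p[x][y] for x in range(nx)) for y in range(ny))
p_e = 1.0 - p_correct

# conditional entropy H(X|Y) in bits
h_xy = 0.0
for y in range(ny):
    for x in range(nx):
        post = p[x][y] / p_y[y]
        if post > 0:
            h_xy -= p[x][y] * math.log2(post)

bound = 1.0 - 2 ** (-h_xy)
print(p_e <= bound)  # True
```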
This inequality is not new. I was not aware of any previous references, however, until I met Igal Sason at the International Zurich Seminar earlier this year. He was speaking about a recently published paper of his that includes this inequality (eq. (2)), together with many more general results relating the error probability to the Arimoto–Rényi conditional entropy.
I was hoping to generalize the above inequality to the mismatched case. Specifically, let p_{X|Y} be the conditional distribution of X given Y. The MAP estimator is \hat{x}(y) = \arg\max_x p_{X|Y}(x|y). Now suppose that the MAP estimator does not have access to p_{X|Y} but only to a surrogate distribution q_{X|Y}, in which case the estimator becomes \tilde{x}(y) = \arg\max_x q_{X|Y}(x|y). I hoped to obtain a bound similar to the one above that includes this surrogate distribution and that bounds the error probability \tilde{P}_e = \mathbb{P}(X\neq \tilde{x}(Y)). To be more precise, I was hoping to replace the conditional entropy above by some conditional cross entropy, i.e., to obtain a bound similar to
\tilde{P}_e \le 1- 2^{-\mathbb{E}(H^\times(p_{X|Y}(\cdot|Y)\Vert q_{X|Y}(\cdot|Y)))}
possibly with the order of the distributions reversed. Since cross entropy is an upper bound on entropy, this bound would be larger than the bound for P_e. (Interestingly, even if p_{X|Y} and q_{X|Y} are different, we may have \tilde{P}_e = P_e: it suffices that, for every y, both conditional distributions have their modes at the same values of x.)
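The parenthetical remark can be illustrated with a small guessing example (the numbers below are arbitrary): the two distributions differ, but they share the same mode, so the mismatched estimator makes the same decision as the MAP estimator and the two error probabilities coincide.

```python
# p_X and q_X over {0, 1, 2}: different distributions, same mode (x = 0)
p_x = [0.5, 0.3, 0.2]
q_x = [0.6, 0.1, 0.3]

map_guess = max(range(3), key=lambda x: p_x[x])         # argmax of p_X
mismatched_guess = max(range(3), key=lambda x: q_x[x])  # argmax of q_X

# error probabilities are computed under the true distribution p_X
p_e = 1.0 - p_x[map_guess]
p_e_tilde = 1.0 - p_x[mismatched_guess]
print(p_e == p_e_tilde)  # True: both estimators guess x = 0
```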
Unfortunately, my hope was not justified. We show this by a counterexample, i.e., we show that there exist distributions for which the inequality on \tilde{P}_e is violated (for either order of the distributions). For the sake of simplicity, we now look at guessing rather than estimation, i.e., the conditional distributions satisfy p_{X|Y}=p_X and q_{X|Y}=q_{X}. Now suppose that X is binary, i.e., it takes values 0 and 1. Furthermore, let p_X(0)=q_X(1)=\epsilon\ll 0.5. We have that \tilde{x}=\arg\max_x q_X(x) = 0, thus
\tilde{P}_e=\mathbb{P}(X\neq 0)=1-\epsilon.
Since p_X and q_X are permutations of each other, we have that H(q_X)=H(p_X) and that D(p_X\Vert q_X) = D(q_X\Vert p_X), and thus that
H^\times(p_X\Vert q_X) = H^\times(q_X\Vert p_X) = -\epsilon\log(1-\epsilon)-(1-\epsilon)\log\epsilon = -\log \epsilon - \epsilon\log\left(\frac{1-\epsilon}{\epsilon}\right).
With this we get
2^{-H^\times(p_X\Vert q_X)} = \epsilon\cdot 2^{\epsilon\log\left(\frac{1-\epsilon}{\epsilon}\right)} > \epsilon
and thus 1- 2^{-H^\times(p_X\Vert q_X)} < 1-\epsilon = \tilde{P}_e, i.e., the hoped-for bound \tilde{P}_e \le 1- 2^{-H^\times(p_X\Vert q_X)} cannot hold.
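The counterexample is easy to check numerically; the sketch below uses \epsilon = 0.1 (an arbitrary choice with \epsilon \ll 0.5):

```python
import math

eps = 0.1
p_x = [eps, 1 - eps]  # true distribution
q_x = [1 - eps, eps]  # surrogate: a permutation of p_x

guess = max(range(2), key=lambda x: q_x[x])  # argmax of q_X -> 0
p_e_tilde = 1.0 - p_x[guess]                 # P(X != 0) = 1 - eps

# cross entropy H^x(p_X || q_X) in bits (symmetric here, by the permutation)
h_cross = -sum(pi * math.log2(qi) for pi, qi in zip(p_x, q_x))
bound = 1.0 - 2 ** (-h_cross)

print(p_e_tilde > bound)  # True: the hoped-for bound is violated
```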