Recently, my colleague Georg came to my office to show me an exciting result he had found in a book: a bound on mutual information (MI) based on the variational distance (VD) between the joint distribution and the product of the marginal distributions. In other words, if $X$ and $Y$ are random variables with probability mass functions (PMFs) $p_X$ and $p_Y$, and a joint PMF $p_{XY}$, one can write

$$ I(X;Y) \le C \Vert p_{XY}-p_Xp_Y\Vert, $$

where $C$ only depends on the alphabet sizes of $X$ and $Y$ and where

$$\Vert p_{XY}-p_Xp_Y\Vert = \sum_{x,y} |p_{XY}(x,y) - p_X(x)p_Y(y)| .$$

This of course implies that if the variational distance goes to zero, then so does the mutual information. The result is exciting because, in general, the Kullback-Leibler divergence (KLD; mutual information is a special case of a KLD) cannot be bounded via the variational distance, at least not by a bound that does not depend on the distributions themselves. The reverse is always possible: the KLD bounds the variational distance via the well-known Pinsker inequality.
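Pinsker's inequality is easy to check numerically. Below is a minimal sketch (the helper names `kld` and `variational_distance` are mine) that verifies $\Vert p - q\Vert \le \sqrt{2 D(p\Vert q)}$, with the KLD measured in nats, on random PMFs:

```python
import numpy as np

rng = np.random.default_rng(0)

def kld(p, q):
    # Kullback-Leibler divergence in nats; assumes q > 0 wherever p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def variational_distance(p, q):
    # L1 distance between two PMFs on the same alphabet
    return float(np.sum(np.abs(p - q)))

# Pinsker: ||p - q|| <= sqrt(2 D(p||q)), checked on random PMFs
# drawn from a Dirichlet distribution over a 5-letter alphabet.
for _ in range(1000):
    p = rng.dirichlet(np.ones(5))
    q = rng.dirichlet(np.ones(5))
    assert variational_distance(p, q) <= np.sqrt(2 * kld(p, q)) + 1e-12
```

The converse direction fails precisely because $D(p\Vert q)$ blows up when $q$ puts tiny mass where $p$ does not, which the variational distance barely notices.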

While I was surprised by the existence of such a bound, I was not surprised by the weaker result that if the VD goes to zero, then so does the MI. At least if $X$ and $Y$ have finite alphabets, the entropies $H(X)$, $H(Y)$, and $H(X,Y)$ are continuous in the PMFs; hence convergence of the PMFs implies convergence of the entropies and, therefore, of the MI.
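This continuity of entropy is easy to illustrate numerically. Here is a small sketch (the helper `entropy` is mine, entropy in nats) showing $H(p_n)\to H(p)=0$ for $p_n=(1-1/n,\,1/n)\to(1,0)$:

```python
import numpy as np

def entropy(p):
    # Shannon entropy in nats; uses the convention 0 * log 0 = 0
    mask = p > 0
    return float(-np.sum(p[mask] * np.log(p[mask])))

# p_n converges pointwise to the point mass (1, 0), and H(p_n) -> 0 with it.
for n in [10, 100, 1000, 10000]:
    p_n = np.array([1 - 1/n, 1/n])
    print(n, entropy(p_n))
```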

Very interestingly, however, it turned out that KLD is NOT continuous, but only lower semi-continuous (LSC), even if the supports of the distributions are finite:

A sequence of (probability) measures $\mu_n$ is said to converge setwise or strongly to another probability measure $\mu$ iff for every measurable set $A$ one has $\mu_n(A)\to \mu(A)$. I think that if the $\mu_n$ are discrete probability measures with finite alphabets, i.e., they can be described by PMFs $p_n$ with finite support, then setwise convergence is equivalent to pointwise convergence: for every point $x$ in the support, $p_n(x)\to p(x)$. I furthermore think that for such finite PMFs setwise convergence is equivalent to $\Vert p_n-p\Vert\to 0$. (I should check that.) Now let us take two sequences $p_n$ and $q_n$ of PMFs, both with finite support. One would guess that if $p_n \to p$ and $q_n\to p$, then the KLD

$$ D(p_n\Vert q_n) = \sum_x p_n(x)\log\frac{p_n(x)}{q_n(x)} \to 0.$$

But that's not true. While entropies are continuous w.r.t. setwise convergence, the KLD is only LSC. It follows that $D(p_n\Vert q_n)$ can remain strictly positive, even if both $p_n$ and $q_n$ converge to the same $p$. Here is an example that you can easily work out yourself: $p_n(0) = 1-1/n$, $p_n(1)=1/n$ and $q_n(0)=1-e^{-n^2}$, $q_n(1)=e^{-n^2}$. The second term of the sum is $\frac{1}{n}\log\frac{1/n}{e^{-n^2}} = n - \frac{\log n}{n}$ (in nats), while the first term vanishes, so $D(p_n\Vert q_n)$ even diverges to infinity.
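A quick numerical check of this example (the helper `kld` is mine, KLD in nats) confirms that the divergence grows roughly like $n$ even though both sequences of PMFs converge to the point mass at $0$:

```python
import numpy as np

def kld(p, q):
    # Kullback-Leibler divergence in nats; assumes q > 0 wherever p > 0
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Both p_n and q_n converge to (1, 0), yet D(p_n || q_n) grows without bound:
# the second term of the sum is (1/n) * log((1/n) / e^{-n^2}) = n - log(n)/n.
for n in [2, 5, 10, 20]:
    p_n = np.array([1 - 1/n, 1/n])
    q_n = np.array([1 - np.exp(-n**2), np.exp(-n**2)])
    print(n, kld(p_n, q_n))
```

(For much larger $n$, $e^{-n^2}$ underflows in double precision, so the sketch stops at $n=20$.)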