$\begingroup$

I am looking for concrete, preferably elementary, examples of sequences of pairs of probability measures $(\mu_n,\nu_n)$ on a common metric space (e.g. $\mathbb{R}^d$) that explicitly demonstrate the non-equivalence between the Wasserstein distances $W_1$, $W_2$ and the Jensen–Shannon divergence $\mathrm{JS}$.

I am interested in examples exhibiting at least one of the following behaviors:

  1. $\mathrm{JS}(\mu_n,\nu_n)\to 0$ while $W_p(\mu_n,\nu_n)\not\to 0$ for $p\in \{1,2\}$.

  2. $W_p(\mu_n,\nu_n)\to 0$ while $\mathrm{JS}(\mu_n,\nu_n)\not\to 0$.

Ideally, the examples would illustrate which structural properties of the distances (transport vs. information-theoretic) are responsible for the discrepancy (e.g. mass splitting, support separation, heavy tails).

I know that Pinsker’s inequality bounds total variation in terms of KL divergence, and that Jensen–Shannon controls total variation as well. By contrast, I do not believe that Wasserstein distances are, in general, comparable with the KL divergence beyond very indirect dimension-dependent bounds. I am therefore looking specifically for explicit and hopefully elementary counterexamples rather than abstract non-equivalence results.

$\endgroup$

1 Answer

$\begingroup$

$\newcommand\JS{\text{JS}}\newcommand\de{\delta}$The Wasserstein distances are incomparable with the Jensen–Shannon divergence, because the former scale linearly with the underlying metric of the metric space (multiplying the metric by $t$ multiplies $W_p$ by $t$), whereas the latter does not depend on the metric at all.

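Specifically, for real $t>0$ and $r\in(0,1/2)$, let $$\mu_{r,t}:=\mu_t:=\tfrac12\,\de_0+\tfrac12\,\de_t,$$ $$\nu_{r,t}:=(\tfrac12-r)\,\de_0+(\tfrac12+r)\,\de_t,$$ where $\de_a$ is the Dirac probability measure supported on $\{a\}$. Since $\JS$ depends only on the weights of the two atoms and not on their positions, $$\JS(\mu_{r,t},\nu_{r,t})=\JS(\mu_{r,1},\nu_{r,1})\sim\tfrac12\,r^2$$ as $r\downarrow0$ (with natural logarithms; for another base the constant changes but the order $r^2$ does not). On the other hand, for all real $p\ge1$ and the standard metric on $\Bbb R$, an optimal coupling moves mass $r$ from $0$ to $t$ and leaves the rest in place, at cost $r\,t^p$, so that $$W_p(\mu_{r,t},\nu_{r,t})=tW_p(\mu_{r,1},\nu_{r,1})=tr^{1/p}.$$

So, if $r\downarrow0$ but (say) $tr^{1/p}\to1$, then $\JS(\mu_{r,t},\nu_{r,t})\to0$ but $W_p(\mu_{r,t},\nu_{r,t})=tr^{1/p}\to1\ne0$. For instance, $t=1/r$ works for every $p\ge1$ at once, since then $W_p(\mu_{r,t},\nu_{r,t})=r^{1/p-1}\ge1$; in particular $W_1=1$ and $W_2=r^{-1/2}\to\infty$.

If now $t\downarrow0$ and $r\in(0,1/2)$ is fixed, then $W_p(\mu_{r,t},\nu_{r,t})=tr^{1/p}\to0$ but $\JS(\mu_{r,t},\nu_{r,t})=\JS(\mu_{r,1},\nu_{r,1})>0$ does not depend on $t$, so it $\not\to0$.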
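The two regimes can also be checked numerically. The following is only a minimal sketch in plain Python: the function names `js_two_point`, `wasserstein_1d` and `demo` are made up for this illustration, JS is computed with natural logarithms, and $W_p$ is computed by the monotone (quantile) coupling, which is optimal on $\Bbb R$ for $p\ge1$.

```python
import math


def js_two_point(r):
    """JS divergence (natural log) between the weight vectors
    (1/2, 1/2) and (1/2 - r, 1/2 + r). The atom positions 0 and t
    do not enter, which is why JS is independent of t."""
    p = (0.5, 0.5)
    q = (0.5 - r, 0.5 + r)
    m = tuple((a + b) / 2 for a, b in zip(p, q))

    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def wasserstein_1d(xs, ps, ys, qs, p=1):
    """W_p between two discrete measures on the real line, obtained by
    matching mass in increasing order of position (quantile coupling)."""
    a = sorted(zip(xs, ps))
    b = sorted(zip(ys, qs))
    i = j = 0
    wa, wb = a[0][1], b[0][1]
    cost = 0.0
    while True:
        moved = min(wa, wb)
        cost += moved * abs(a[i][0] - b[j][0]) ** p
        wa -= moved
        wb -= moved
        if wa <= 1e-15:  # current source atom exhausted
            i += 1
            if i == len(a):
                break
            wa = a[i][1]
        if wb <= 1e-15:  # current target atom filled
            j += 1
            if j == len(b):
                break
            wb = b[j][1]
    return cost ** (1.0 / p)


def demo():
    # Regime 1: r -> 0 with t = 1/r, so JS -> 0 while W_1 and W_2 stay >= 1.
    for n in (10, 100, 1000):
        r, t = 1.0 / n, float(n)
        w = [wasserstein_1d([0, t], [0.5, 0.5], [0, t], [0.5 - r, 0.5 + r], p)
             for p in (1, 2)]
        print("regime 1:", r, js_two_point(r), w)
    # Regime 2: r fixed, t -> 0, so W_p -> 0 while JS stays constant.
    r = 0.25
    for n in (10, 100, 1000):
        t = 1.0 / n
        w = [wasserstein_1d([0, t], [0.5, 0.5], [0, t], [0.5 - r, 0.5 + r], p)
             for p in (1, 2)]
        print("regime 2:", t, js_two_point(r), w)


if __name__ == "__main__":
    demo()
```

Because `js_two_point` takes only the weights as input, the independence of $\JS$ from $t$ is built in, while `wasserstein_1d` is a generic routine that does not use the closed form $tr^{1/p}$ and so serves as an independent check of it.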

$\endgroup$
