I am looking for concrete, preferably elementary, examples of sequences of pairs of probability measures $(\mu_n,\nu_n)$ on a common metric space (e.g. $\mathbb{R}^d$) that explicitly demonstrate the non-equivalence between the Wasserstein distances $W_1$ and $W_2$ and the Jensen–Shannon divergence $\mathrm{JS}$.
I am interested in examples exhibiting at least one of the following behaviors (a numerical sanity check on candidate pairs is sketched after the list):

1. $\mathrm{JS}(\mu_n,\nu_n)\to 0$ while $W_p(\mu_n,\nu_n)\not\to 0$ for $p\in\{1,2\}$;
2. $W_p(\mu_n,\nu_n)\to 0$ while $\mathrm{JS}(\mu_n,\nu_n)\not\to 0$.
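For what it is worth, here is the minimal numerical sanity check I run on one-dimensional candidates. The two families in the script (a vanishing amount of mass escaping to infinity, and two nearby point masses with disjoint supports) are my own guesses at behaviors 1 and 2, not established examples; I am assuming that `scipy.stats.wasserstein_distance` computes $W_1$ and that `scipy.spatial.distance.jensenshannon` returns the *square root* of the JS divergence.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

def js_div(p, q):
    # jensenshannon returns the JS *distance*, i.e. the square root
    # of the JS divergence; square it (natural log) to recover JS.
    return jensenshannon(p, q, base=np.e) ** 2

for n in (10, 100, 1000):
    # Candidate for behavior 1 (mass splitting):
    # mu_n = delta_0,  nu_n = (1 - 1/n) delta_0 + (1/n) delta_n.
    pts = np.array([0.0, float(n)])
    mu = np.array([1.0, 0.0])
    nu = np.array([1.0 - 1.0 / n, 1.0 / n])
    w1_split = wasserstein_distance(pts, pts, mu, nu)
    js_split = js_div(mu, nu)

    # Candidate for behavior 2 (support separation):
    # mu_n = delta_0,  nu_n = delta_{1/n}; the supports stay disjoint,
    # so JS stays at log 2 while the transport cost 1/n vanishes.
    pts = np.array([0.0, 1.0 / n])
    w1_shift = wasserstein_distance(pts, pts, [1.0, 0.0], [0.0, 1.0])
    js_shift = js_div([1.0, 0.0], [0.0, 1.0])

    print(f"n={n:4d}  split: W1={w1_split:.3f} JS={js_split:.5f}"
          f"   shift: W1={w1_shift:.4f} JS={js_shift:.5f}")
```

Numerically, the first family shows $W_1 \equiv 1$ with $\mathrm{JS} \to 0$, and the second shows $W_1 \to 0$ with $\mathrm{JS} \equiv \log 2$, which is the kind of structural contrast (transport cost vs. overlap of supports) I am after.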
Ideally, the examples would illustrate which structural properties of the distances (transport vs. information-theoretic) are responsible for the discrepancy (e.g. mass splitting, support separation, heavy tails).
I know that Pinsker’s inequality bounds total variation in terms of the KL divergence, and that the Jensen–Shannon divergence controls total variation as well (the bounds I mean are recorded below). By contrast, I do not believe that the Wasserstein distances are, in general, comparable to the KL divergence beyond very indirect dimension-dependent bounds. I am therefore looking specifically for explicit and hopefully elementary counterexamples rather than abstract non-equivalence results.
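For reference, with total variation normalized as $\delta(\mu,\nu)=\sup_A|\mu(A)-\nu(A)|\in[0,1]$ and natural logarithms, the bounds in question read

$$
\delta(\mu,\nu)\;\le\;\sqrt{\tfrac12\,\mathrm{KL}(\mu\,\|\,\nu)},
\qquad
\mathrm{JS}(\mu,\nu)\;\ge\;\tfrac12\,\delta(\mu,\nu)^2,
$$

where the second inequality follows by applying Pinsker’s inequality to each term of $\mathrm{JS}(\mu,\nu)=\tfrac12\mathrm{KL}(\mu\,\|\,m)+\tfrac12\mathrm{KL}(\nu\,\|\,m)$ with $m=\tfrac12(\mu+\nu)$, together with $\delta(\mu,m)=\delta(\nu,m)=\tfrac12\delta(\mu,\nu)$.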