Misc exercises in HDP chapter 3

Exercises from HDP chapter 3

$$ \DeclareMathOperator{\E}{\mathbb{E}} \newcommand{\R}{\mathbb{R}} \newcommand{\P}{\mathbb{P}} \newcommand{\dist}{\text{dist}} \newcommand{\Rn}{\mathbb{R}^n} \DeclareMathOperator{\Var}{Var} \DeclareMathOperator{\tr}{tr} \newcommand{\Norm}[1]{\left\|#1\right\|} \newcommand{\SGNorm}[1]{\left\|#1\right\|_{\psi_2}} $$

Expected $\ell^p$ norm of a random vector

Upper bound

\[\begin{aligned} \E\Norm{X}_p = \E\left[(\sum_{i=1}^n |X_i|^p)^{1/p}\right] \leq \left(\E \sum_{i=1}^n |X_i|^p\right)^{1/p} \stackrel{*}{\leq} C \left(\sum_{i=1}^n \SGNorm{X_i}^p p^{p/2}\right)^{1/p} \leq C K n^{1/p} \sqrt{p}. \end{aligned}\]

where $(*)$ follows from moment bounds of sugaussian norms. If $p \asymp \log n$, then $n \asymp e^p$, so

\[\sqrt{p} n^{1/p} \asymp \sqrt{\log n} (e^p)^{1/p} = \sqrt{\log n}.\]

Also the function $f(p) = \sqrt{p} n^{1/p}$ has derivative $\sqrt{p}\,n^{1/p} \left(\frac{1}{2p}-\frac{\log n}{p^{2}}\right)$, which is negative when $p \lesssim \log n$ and positive otherwise. So for $p \ge \log n$, we have

\[\E\Norm{X}_p \le C K \sqrt{\log n}.\]

Tightness of the upper bound for Gaussian vectors


Small ball probability

For all $\lambda>0$, we have

\[\begin{aligned} \P \{ \Norm{X-a}_2 \leq \varepsilon \sqrt{n}\} &= \P \{ \Norm{X-a}_2^2 \leq \varepsilon^2 n\} = \P \{ e^{- \Norm{X-a}_2^2/\lambda} \ge e^{- \varepsilon^2 n/\lambda}\} \\ & \leq e^{-\varepsilon^2 n / \lambda} \E [e^{- \Norm{X-a}_2^2/\lambda}] \\ & = e^{- \varepsilon^2 n/\lambda} \prod_{i=1}^n \E [e^{- (X_i-a)^2/\lambda}] \\ &= e^{- \varepsilon^2 n/\lambda} \prod_{i=1}^n \int_{\R} f_i(x) e^{- (x_i-a)^2/\lambda} dx \\ & \leq e^{- \varepsilon^2 n} K^n\prod_{i=1}^n \int_\R e^{- (x_i-a)^2/\lambda} dx \\ & = e^{- \varepsilon^2 n/\lambda} K^n \left(\frac{\pi}{\lambda}\right)^n. \end{aligned}\]

Solving $\inf_{\lambda >0} e^{- \varepsilon^2 n / \lambda} K^n (\sqrt{\pi \lambda})^n$, we take $\lambda=2\varepsilon^2$ to obtain the bound.

For the integral bound, we have

\[\begin{aligned} \E\left[\frac{1}{\Norm{X}_2}\right] &=\frac{1}{\sqrt{n}}\E\left[\frac{\sqrt{n}}{\Norm{X}_2}\right] \\ & = \frac{1}{\sqrt{n}}\int_0^\infty \P \{ \sqrt{n}/\Norm{X}_2 \ge \varepsilon\} d\varepsilon \\ & = \int_0^\infty \P \{ \Norm{X}_2 / \sqrt{n}\leq 1/\varepsilon\} d\varepsilon \\ & \leq \frac{1}{\sqrt{n}} \int_0^\infty (\sqrt{2\pi e} K \varepsilon^{-1})^n d\varepsilon \\ & = \frac{CK}{\sqrt{n}}. \end{aligned}\]

The expectation can be infinite for $n=1$ because $x^{-1}$ is not integrable on $(0, \infty)$, in which case the bound is trivial.

Expectation of an isotropic random vector

Let $X \in \mathbb{R}^n$ be isotropic.

For any vector $v \in \mathbb{R}^n$, by Jensen’s inequality,

\[|v^\top \mathbb{E} X|^2 = |\mathbb{E}[v^\top X]|^2 \le \mathbb{E}[(v^\top X)^2] = \|v\|_2^2.\]

Take $v = \E X$. If $\E X \neq 0$, then

\[\|\E X\|_2^2 \le \|\mathbb{E} X\|_2,\]

which implies

\[\|\E X\|_2 \le 1.\]

If $\E X = 0$, the inequality is trivially true.

Take $X_1 = 1$ a.s., and $X_2, \dots, X_n$ to be indpendent Rademacher. We have $\E X=1$ and $\E XX^\top = I_n$.

Delocalization of random vector on the sphere

Show that a random vector $X$ uniformly distributed on the unit Euclidean sphere in $\R^n$ $satisfies

\[\E \Norm{X}_\infty \asymp \sqrt{\log n / n}.\]

Write \(\Norm{X}_\infty = \Norm{g}_{\infty}/ \Norm{g}_2\). So

\[\E \frac{\Norm{g}_{\infty}}{\Norm{g}_2} \leq (\E\Norm{g}_{\infty}^2)^{1/2} \cdot \E\left[\frac{1}{\Norm{g}_2^2}\right]^{1/2} \leq C \sqrt{\frac{\log n}{n}}.\]

Note that we have

\[\P\{|\Norm{g}_2 - \sqrt{n} \ge t\} \leq 2 \exp\{-ct^2\}.\]

Consider the event

\[A:= \{\Norm{g}_2 \leq 2\sqrt{n}\}.\]

We have

\[\P\{A\} \ge 1-e^{-cn}.\]

So

\[\E\left[\frac{\|g\|_\infty}{\|g\|_2}\right] \ge \E\left[\frac{\|g\|_\infty}{\|g\|_2}\boldsymbol{1}\{A\}\right] \ge \frac{1}{2\sqrt{n}}\E[\|g\|_\infty \boldsymbol{1}\{A\}] \ge \frac{1}{2\sqrt{n}} \sqrt{\log n} \P\{A \cap \{\|g\|_\infty \ge \sqrt{\log n}\}\}.\]

Let \(B=\{\|g\|\ge \sqrt{\log n}\}\). Then

\[1-\P \{B\} \leq (1-c_1 e^{-(\sqrt{\log n})^2/2})^n \leq \exp\{-c_1 n^{-1/2}\}.\]

So

\[\P\{A \cap B\} = 1- \P(A^c) - \P(B^c) = 1-o(1).\]

Large cube probability

Let $K = [-a, a]^n$. We have:

\[\text{dist}(g, K)^2 = \|g - \pi_K(g)\|_2^2 = \sum_{i=1}^n (g_i - \text{sign}(g_i)\min (|g_i|, a))^2=\sum_{i=1}^n \left( (|g_i| - a)_+ \right)^2\]

where \((x)_+ = \max(0, x)\). Let \(Y_i = (\vert g_i \vert - a)_+^2\). Since \(g_i \sim N(0, 1)\), the $Y_i$ are i.i.d. random variables.

First, we analyze the expected contribution of a single coordinate to the squared distance.

\[\E[Y_1] = \E\left[ (|g_1| - a)_+^2 \right] = 2 \int_a^\infty (x - a)^2 \phi(x) \, dx\]

where $\phi(x)$ is the standard normal PDF. Since the Gaussian tail decays super-exponentially, \(\lim_{a \to \infty} \E[Y_1] = 0\). Therefore, we can choose a constant $a$ sufficiently large such that:

\[\E[Y_1] \le 10^{-6}\]

Let $D_n = \dist(g, K)^2 = \sum_{i=1}^n Y_i$. By linearity of expectation:

\[\E[D_n] = n \E[Y_1] \le 10^{-6} n\]

By Markov’s inequality, the probability that the distance is large is small:

\[\P(D_n \ge 10^{-5} n) \le \frac{\E[D_n]}{10^{-5} n} \le \frac{10^{-6} n}{10^{-5} n} = 0.1\]

(Note: We can choose an even larger $a$ to make this probability arbitrarily small, say $\le 0.005$.)

Next, we consider the norm \(\|g\|_2\). The quantity \(\|g\|_2^2\) follows a Chi-squared distribution with $n$ degrees of freedom ($\chi^2_n$). By the Law of Large Numbers, $|g|_2^2 / n \to 1$ almost surely. For sufficiently large $n$, we have concentration around the mean:

By the ones-sided version of Proposiiton 2.6.1 in , we have

\[\P(\Norm{g}-\sqrt{n} \le -\frac{3}{4} \sqrt{n}) = \P(\Norm{g} \le \sqrt{n}/4) \le e^{-cn/16} \leq 0.0025,\]

for $n$ large enough.

If $|g|_2^2 > 0.5 n$ and $D_n < 10^{-5} n$, then:

\[\frac{\dist(g, K)^2}{\|g\|_2^2} < \frac{10^{-5} n}{0.5 n} = 2 \cdot 10^{-5}\]

Taking the square root:

\[\frac{\dist(g, K)}{\|g\|_2} < \sqrt{2 \cdot 10^{-5}} \approx 0.0045 < 0.01\]

Thus, Condition 1 holds with high probability.

Finally, we must show $g \notin [-100a, 100a]^n$. This occurs if at least one coordinate exceeds $100a$ in magnitude. Consider the complement event: $g \in [-100a, 100a]^n$.

\[P\left( \max_{i} |g_i| \le 100a \right) = \prod_{i=1}^n P(|g_i| \le 100a) = (1 - p)^n\]

where $p = P(|g_1| > 100a) > 0$. Since $a$ is fixed, $p$ is a fixed positive constant. As $n \to \infty$, $(1-p)^n \to 0$. Thus, there exists an $n_0$ such that for all $n > n_0$:

\[P(g \in [-100a, 100a]^n) \le 0.0025.\]

By the union bound, the probability that any of the failure conditions occur is at most:

\[P(\text{fail}) \le P(D_n \text{ large}) + P(\|g\|_2^2 \text{ small}) + P(\text{stays in large box})\]

By choosing a sufficiently large $a$ (to control the first term) and a sufficiently large $n_0$ (to control the second and third terms), we can ensure:

\[P(\text{fail}) \le 0.005 + 0.0025 + 0.0025 = 0.01\]

Thus, with probability at least $0.99$, the conditions hold.