Empirical Bayes Confidence Intervals

In progess. This note summarizes the methods proposed by Ignadiatis and Wager 2022.

Setup

Suppose we have following model

\[\begin{equation} Z_i = \mu_i + \epsilon_i, \hspace{1cm} \mu_i \stackrel{iid}{\sim} G\in \mathcal{G}, \hspace{1cm} \epsilon_i \stackrel{iid}{\sim} \mathcal{N}(0,1), \end{equation}\]

for some prior class $\mathcal{G}$. We are interested in estimating $\theta_G(z)=\mathbb{E}_G[h(\mu)\mid Z=z]$ for some function \(h\) and forming confidence intervals for the estimates. Ignadiatis and Wager propose two approaches for this task: \(F\)-localization and AMARI.

$F$-localization

Idea: find a level-$\alpha$ set of distribution functions $\mathcal{F}_n(\alpha)$ for the true marginal distribution function $F_G$, i.e.,

\[\lim \inf_{n\to \infty} \mathbb{P}_G\Big[ F_G \in \mathcal{F}_n(\alpha)\Big] \geq 1-\alpha.\]

For $z$ of interest, identify all $\hat{G}$ such that $F_{\widehat{G}}$ lies inside \(\mathcal{F}_n(\alpha)\). Now that we have a set of $\theta_{\widehat{G}}(z)$’s, we can simply take and maximum and minimum to form a confidence band.

AMARI

Prerequisite: bias-aware confidence intervals

Suppose we have some estimate $\hat{m}$ of some functional $m \in \mathcal{M}$ with standard error ${se}(\hat{m}(x_0))$. Denote the worse case bias by $B = \text{sup}_{m \in \mathcal{M}}|\hat{m}(x_0) - m(x_0)|$. Then a naive confidence interval for $m(x_0)$ is

\[\hat{m}(x_0) \pm (B + t_{1-\alpha/2} se(\hat{m}(x_0))).\]

To make the length of the CI as short as possible, a better one is given by

\[\hat{m}(x_0) \pm t_\alpha (se(\hat{m}(x_0)), B),\]

where

\[t_\alpha (se(\hat{m}(x_0)), B) = \inf \big\{ t > 0: \forall |b| \leq B: \mathbb{P}[|Z_b| \leq t] \geq \alpha\big\},\]

and $Z_b \sim \mathcal{N}(b, se(\hat{m}(x_0))^2)$.

Non-identifiability in the Bernoulli Model

Consider the model $Z_i \mid \mu_i \sim \text{Bernoulli}(\mu_i)$, and $\mu_i\sim G\in\mathcal{P}([0,1])$. Then we have $p(Z_i \mid \mu_i) = \mu_i^{z_i}(1-\mu_i)^{1-z_i}$, $ f_G(1) = \int \mu dG(\mu)$, and $f_G(0) = 1 - f_G(1)$. So the marginal distribution \(f_G\) of \(Z\) is determined by \(f_G(1)\).

The second moment \(L(G)=\int \mu^2 dG(\mu)\) is not point-identified. Different priors \(G\) with the same \(f_G(1)\), \(L(G)\) could have different \(L(G)\). The maximum is attained at \(G\) such that \(\PP_G[\{1\}]=f_G(1)\) and \(\PP_G[\{0\}] = f_G(0)\). In this case, \(L(G) = \text{Var}_G(\mu)+(f_G(1))^2 = f_G(1)\). The minimum is attained at \(\PP_G[\{f_G(1)\}]=1\), with \(L(G) = f_G(1)^2\).