\[ \newcommand{\PP}{\mathbb{P}} \newcommand{\EE}{\mathbb{E}} \]
In progress. This note summarizes the methods proposed by Ignatiadis and Wager (2022).
Setup
Suppose we have the following model
\[ \begin{equation} Z_i = \mu_i + \epsilon_i, \hspace{1cm} \mu_i \stackrel{iid}{\sim} G\in \mathcal{G}, \hspace{1cm} \epsilon_i \stackrel{iid}{\sim} \mathcal{N}(0,1), \end{equation} \]
for some prior class \(\mathcal{G}\). We are interested in estimating \(\theta_G(z)=\mathbb{E}_G[h(\mu)\mid Z=z]\) for some function \(h\) and in forming confidence intervals for it. Ignatiadis and Wager propose two approaches for this task: \(F\)-localization and AMARI.
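To make the target concrete, it may help to write it out under the Gaussian model above. The display below is a sketch that follows from Bayes' rule; the symbol \(\varphi\) for the standard normal density is notation introduced here, not taken from the original note.
\[ \theta_G(z) = \frac{\int h(\mu)\,\varphi(z-\mu)\,dG(\mu)}{\int \varphi(z-\mu)\,dG(\mu)}, \hspace{1cm} \varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}. \]
Both the numerator and the denominator are linear in \(G\), so \(\theta_G(z)\) is a ratio of linear functionals of the prior. For example, \(h(\mu)=\mu\) gives the posterior mean \(\mathbb{E}_G[\mu \mid Z=z]\).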
\(F\)-localization
Idea: find an asymptotic level-\((1-\alpha)\) confidence set of distribution functions \(\mathcal{F}_n(\alpha)\) for the true marginal distribution function \(F_G\), i.e.,
\[ \liminf_{n\to \infty} \mathbb{P}_G\Big[ F_G \in \mathcal{F}_n(\alpha)\Big] \geq 1-\alpha. \]
For \(z\) of interest, identify all priors \(\widehat{G} \in \mathcal{G}\) such that \(F_{\widehat{G}}\) lies inside \(\mathcal{F}_n(\alpha)\). This gives a set of values \(\theta_{\widehat{G}}(z)\), and we can simply take its minimum and maximum as the endpoints of a confidence interval for \(\theta_G(z)\).
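As an illustration of the membership check, here is a minimal sketch that assumes one standard choice of \(\mathcal{F}_n(\alpha)\): the Dvoretzky-Kiefer-Wolfowitz (DKW) band around the empirical CDF, with half-width \(\sqrt{\log(2/\alpha)/(2n)}\). It also assumes the candidate prior is discrete (atoms and weights) and that the data have no ties. The function names are hypothetical, and the sketch only tests whether one candidate \(\widehat{G}\) is compatible with the band; it does not carry out the optimization over all such priors.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal noise, as in the model above


def dkw_half_width(n, alpha=0.05):
    """Half-width of the DKW band: sqrt(log(2/alpha) / (2n))."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def marginal_cdf(t, atoms, weights):
    """F_G(t) = sum_j w_j * Phi(t - mu_j) for a discrete prior G."""
    return sum(w * STD_NORMAL.cdf(t - m) for m, w in zip(atoms, weights))


def in_dkw_band(z, atoms, weights, alpha=0.05):
    """Check whether F_G lies within the DKW band around the empirical CDF of z."""
    n = len(z)
    c = dkw_half_width(n, alpha)
    sup_dist = 0.0
    for i, t in enumerate(sorted(z)):
        F = marginal_cdf(t, atoms, weights)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic
        # (assuming no ties), so the supremum is attained at one of these two sides.
        sup_dist = max(sup_dist, abs(F - i / n), abs(F - (i + 1) / n))
    return sup_dist <= c
```

A candidate prior for which `in_dkw_band` returns `True` contributes its value \(\theta_{\widehat{G}}(z)\) to the set whose minimum and maximum form the interval.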
AMARI
Prerequisite: bias-aware confidence intervals
Suppose we have an estimate \(\hat{m}(x_0)\) of \(m(x_0)\) for some function \(m \in \mathcal{M}\), with standard error \(se(\hat{m}(x_0))\). Denote the worst-case bias by \(B = \sup_{m \in \mathcal{M}}|\mathbb{E}_m[\hat{m}(x_0)] - m(x_0)|\). Then a naive confidence interval for \(m(x_0)\) is
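\[ \hat{m}(x_0) \pm (B + z_{1-\alpha/2}\, se(\hat{m}(x_0))), \]
where \(z_{1-\alpha/2}\) is the \(1-\alpha/2\) quantile of the standard normal distribution. This interval is valid but conservative, since it adds the full worst-case bias to the usual half-width. A shorter interval is given by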
\[ \hat{m}(x_0) \pm t_\alpha (se(\hat{m}(x_0)), B), \]
where
\[ t_\alpha (se(\hat{m}(x_0)), B) = \inf \big\{ t > 0: \forall |b| \leq B: \mathbb{P}[|Z_b| \leq t] \geq 1-\alpha\big\}, \]
and \(Z_b \sim \mathcal{N}(b, se(\hat{m}(x_0))^2)\).
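The critical value has no closed form, but it is easy to compute numerically. The sketch below assumes the definition is read as the smallest \(t\) whose coverage \(\mathbb{P}[|Z_b| \leq t]\) is at least \(1-\alpha\) for every \(|b| \leq B\). Because the coverage of a symmetric interval decreases in \(|b|\), the worst case is \(b = B\), so a one-dimensional bisection suffices. The function names are hypothetical and only the Python standard library is used.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def coverage(t, se, bias):
    """P[|Z_b| <= t] for Z_b ~ N(bias, se^2)."""
    return normal_cdf((t - bias) / se) - normal_cdf((-t - bias) / se)


def bias_aware_critical_value(se, B, alpha=0.05, tol=1e-10):
    """Smallest t with coverage >= 1 - alpha at the worst-case bias |b| = B."""
    lo, hi = 0.0, B + 10.0 * se  # coverage at hi is essentially 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coverage(mid, se, B) >= 1.0 - alpha:
            hi = mid
        else:
            lo = mid
    return hi
```

With `B = 0` this recovers the usual normal critical value times `se`. With `B > 0` the result lies strictly below the naive half-width `B + z * se`, which is the sense in which the bias-aware interval is shorter.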
Non-identifiability in the Bernoulli Model
Consider the model \(Z_i \mid \mu_i \sim \text{Bernoulli}(\mu_i)\), and \(\mu_i\sim G\in\mathcal{P}([0,1])\). Then we have \(p(Z_i \mid \mu_i) = \mu_i^{z_i}(1-\mu_i)^{1-z_i}\), \(f_G(1) = \int \mu dG(\mu)\), and \(f_G(0) = 1 - f_G(1)\). So the marginal distribution \(f_G\) of \(Z\) is determined by \(f_G(1)\).
The second moment \(L(G)=\int \mu^2 dG(\mu)\) is not point-identified: different priors \(G\) with the same \(f_G(1)\) can have different \(L(G)\). Since \(\mu^2 \leq \mu\) on \([0,1]\), the maximum is attained at the two-point prior \(G\) such that \(\PP_G[\{1\}]=f_G(1)\) and \(\PP_G[\{0\}] = f_G(0)\). In this case, \(L(G) = \text{Var}_G(\mu)+(f_G(1))^2 = f_G(1)\). By Jensen's inequality, the minimum is attained at the point mass \(\PP_G[\{f_G(1)\}]=1\), with \(L(G) = f_G(1)^2\).
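A small numerical check can make the non-identifiability concrete. The sketch below represents a discrete prior by its atoms and weights (a representation chosen here for illustration, with a hypothetical helper name) and compares three priors that share the same \(f_G(1) = 0.3\) but have different second moments.

```python
def marginal_and_second_moment(atoms, weights):
    """Return (f_G(1), L(G)) for a discrete prior G on [0, 1]."""
    f1 = sum(w * m for m, w in zip(atoms, weights))          # f_G(1) = E_G[mu]
    second = sum(w * m ** 2 for m, w in zip(atoms, weights))  # L(G) = E_G[mu^2]
    return f1, second


p = 0.3
# Two-point prior on {0, 1}: attains the maximum L(G) = p.
two_point = marginal_and_second_moment([0.0, 1.0], [1.0 - p, p])
# Point mass at p: attains the minimum L(G) = p^2.
point_mass = marginal_and_second_moment([p], [1.0])
# An intermediate prior with the same mean.
mixed = marginal_and_second_moment([0.1, 0.5], [0.5, 0.5])
```

All three priors induce the same distribution of \(Z\), so no amount of data can distinguish their second moments, which range over \([f_G(1)^2, f_G(1)] = [0.09, 0.3]\).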