Identification under NR | RDP 2023-07: Identification and Inference under Narrative Restrictions

4. Identification under NR

Raffaella Giacomini, Toru Kitagawa and Matthew Read

October 2023


This section formally analyses identification in the SVAR under NR. Section 4.1 considers whether NR are point or set identifying in a frequentist sense. Section 4.2 introduces the notion of a ‘conditional identified set’, which extends the standard notion of an identified set to the setting where the mapping from reduced-form to structural parameters depends on the realisation of the data. This provides an interpretation of the set-valued mapping induced by the NR. Additionally, we make use of the conditional identified set when investigating the frequentist properties of our robust Bayesian procedure in Section 6.

4.1 Point identification under NR

Denoting the true parameter value by $(ϕ_{0}, Q_{0})$, point identification for the parametric model (Equation (24)), which is based on the unconditional likelihood, requires that there is no other parameter value $(ϕ, Q) \neq (ϕ_{0}, Q_{0})$ that is observationally equivalent to $(ϕ_{0}, Q_{0})$.^[12]

To assess the existence of observationally equivalent parameters, we analyse a statistical distance between $p (y^{T}, D_{N} = d | ϕ, Q)$ and $p (y^{T}, D_{N} = d | ϕ_{0}, Q_{0})$ that metrises observational equivalence. Since the support of the distribution of observables can depend on the parameters, it is convenient to work with the Hellinger distance:

(27)

\begin{array}{l} H D (ϕ, Q) \equiv {\left(\sum_{d = 0, 1} \int_{Y} {\left(p^{1 / 2} (y^{T}, D_{N} = d | ϕ, Q) - p^{1 / 2} (y^{T}, D_{N} = d | ϕ_{0}, Q_{0})\right)}^{2} d y^{T}\right)}^{\frac{1}{2}} \\ = \sqrt{2} {(1 - ℋ (ϕ, Q))}^{\frac{1}{2}}, \text{ where} \\ ℋ (ϕ, Q) \equiv \sum_{d = 0, 1} \int_{Y} p^{1 / 2} (y^{T}, D_{N} = d | ϕ, Q) \cdot p^{1 / 2} (y^{T}, D_{N} = d | ϕ_{0}, Q_{0}) d y^{T} \end{array}

and Y is the sample space for Y^T. As is known in the literature on minimum distance estimation, $(ϕ, Q)$ and $(ϕ_{0}, Q_{0})$ are observationally equivalent if and only if $H D (ϕ, Q) = 0$ or, equivalently, $ℋ (ϕ, Q) = 1$ (e.g. Basu, Shioya and Park 2011).
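As a purely illustrative numerical check of the identity in Equation (27), the sketch below computes the Hellinger distance and the affinity $ℋ$ for two univariate normal densities tabulated on a grid. The densities, the grid and the function names are assumptions made for illustration only; they stand in for the joint density of the observables and are not the SVAR likelihood of the paper.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def hellinger(p, q, dx):
    """Return (HD, affinity) for densities p and q tabulated on a uniform grid."""
    hd = np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)
    affinity = np.sum(np.sqrt(p * q)) * dx
    return hd, affinity

# Illustrative densities: a "true" density and one alternative (assumed values).
x = np.linspace(-12.0, 12.0, 24001)
dx = x[1] - x[0]
p0 = normal_pdf(x, 0.0, 1.0)
p1 = normal_pdf(x, 0.5, 1.0)

hd_same, aff_same = hellinger(p0, p0, dx)  # identical densities
hd_diff, aff_diff = hellinger(p0, p1, dx)  # distinct densities

print(hd_same, aff_same)
print(hd_diff, np.sqrt(2.0) * np.sqrt(1.0 - aff_diff))
```

In this toy setting the distance is zero and the affinity is one only when the two densities coincide, which is the sense in which the Hellinger distance metrises observational equivalence.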

We similarly define the Hellinger distance for the conditional likelihood as

(28)

\begin{array}{l} H D_{c} (ϕ, Q) \equiv \sqrt{2} {(1 - ℋ_{c} (ϕ, Q))}^{\frac{1}{2}}, \text{ where} \\ ℋ_{c} (ϕ, Q) \equiv {\left(\int_{Y} p^{1 / 2} (y^{T} | D_{N} = 1, ϕ, Q) \cdot p^{1 / 2} (y^{T} | D_{N} = 1, ϕ_{0}, Q_{0}) d y^{T}\right)}^{\frac{1}{2}} \end{array}

The next proposition analyses the conditions for $ℋ (ϕ, Q) = 1$ and $ℋ_{c} (ϕ, Q) = 1$ , and shows that observational equivalence of $(ϕ, Q)$ and $(ϕ_{0}, Q_{0})$ boils down to geometric equivalence of the set of reduced-form VAR innovations satisfying the NR.

Proposition 4.1. Let $(ϕ_{0}, Q_{0})$ be the true parameter value and let $U \equiv U (y^{T}; ϕ) = {({u^{'}}_{1}, ..., {u^{'}}_{T})}^{'}$ collect the reduced-form VAR innovations. Define

𝒬^{*} \equiv \left\{ \begin{matrix} Q \in 𝒪 (n) : \{U : N (ϕ_{0}, Q, Y^{T}) \geq 0_{s \times 1}\} = \{U : N (ϕ_{0}, Q_{0}, Y^{T}) \geq 0_{s \times 1}\} \\ \text{up to } f (Y^{T} | ϕ_{0})\text{-null set}, \ \text{diag} (Q^{'} Σ_{t r}^{- 1}) \geq 0_{n \times 1} \end{matrix} \right\}

The unconditional likelihood model (Equation (24)) and the conditional likelihood model (Equation (23)) are globally identified (i.e. there are no observationally equivalent parameter points to $(ϕ_{0}, Q_{0})$ ) if and only if $𝒬^{*}$ is a singleton. If the parameter of interest is an impulse response to the jth structural shock, $η_{i, j, h} (ϕ, Q)$ as defined in Equation (15), then $η_{i, j, h} (ϕ, Q)$ is point identified if the projection of $𝒬^{*}$ onto its jth column vector is a singleton.

This proposition provides a necessary and sufficient condition for global identification of SVARs by NR. As shown in the proof in Appendix B, $𝒬^{*}$ defined in this proposition corresponds to the set of observationally equivalent values of Q given $ϕ = ϕ_{0}$ , but, importantly, it does not correspond to any flat region of the observed likelihood (the conditional identified set in Definition 4.1 below).

To illustrate this point, consider the bivariate model of Section 2 with the shock-sign restriction (Equation (3)), where y_t itself is the reduced-form error, so U in Proposition 4.1 can be set to y_k. Given $ϕ$ , the set of $y_{k} \in ℝ^{2}$ satisfying the NR is the half-space

(29)

\{y_{k} \in ℝ^{2} : {(σ_{11} σ_{22})}^{- 1} (σ_{22} \cos θ - σ_{21} \sin θ, σ_{11} \sin θ) y_{k} \geq 0\}

The condition for point identification shown in Proposition 4.1 is satisfied if no $θ^{'} \neq θ$ can generate a half-space identical to Equation (29). Such $θ^{'}$ cannot exist: a half-space $(a_{1}, a_{2}) y_{k} \geq 0$ whose boundary passes through the origin can be indexed uniquely by the slope $a_{1} / a_{2}$, and Equation (29) implies that the slope, $σ_{11}^{- 1} (σ_{22} {(\tan θ)}^{- 1} - σ_{21})$, is a bijective map of $θ$ on the domain constrained by the sign normalisation. Figure 3 plots the squared Hellinger distances in the bivariate model under the shock-sign restriction (top panel) and the historical decomposition restriction (bottom panel). For both the conditional and unconditional likelihood, the squared Hellinger distances are minimised uniquely at the true $θ$, which is consistent with our point-identification claim for $θ$.^[13]

Figure 3: Squared Hellinger Distance in Bivariate Model
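The half-space argument around Equation (29) can be checked numerically. The sketch below assumes illustrative values for $Σ_{t r}$ and takes the first column of Q to be $(\cos θ, \sin θ)^{'}$; the function name and parameter values are hypothetical. It computes the unit normal vector of the half-space for each $θ$ on a grid and confirms that no two grid points share a normal, so no two values of $θ$ generate the same half-space. The sign normalisation, which is not imposed here, only restricts the admissible $θ$ further.

```python
import numpy as np

def halfspace_normal(theta, sigma_tr):
    """Unit normal of the half-space {y_k : q_1(theta)' Sigma_tr^{-1} y_k >= 0}."""
    q1 = np.array([np.cos(theta), np.sin(theta)])
    a = np.linalg.solve(sigma_tr.T, q1)  # equals (q_1' Sigma_tr^{-1})'
    return a / np.linalg.norm(a)

# Assumed lower-triangular Cholesky factor (illustrative values only).
sigma_tr = np.array([[1.0, 0.0],
                     [0.5, 2.0]])

thetas = np.linspace(-np.pi, np.pi, 360, endpoint=False)
normals = np.array([halfspace_normal(t, sigma_tr) for t in thetas])

# Two half-spaces through the origin coincide only if their unit normals coincide,
# so a strictly positive minimum pairwise gap means every theta gives a distinct half-space.
gaps = np.linalg.norm(normals[:, None, :] - normals[None, :, :], axis=2)
np.fill_diagonal(gaps, np.inf)
min_gap = gaps.min()
print(min_gap > 0.0)
```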

Proposition 4.1 also provides conditions under which $(ϕ, Q)$ is not globally identified, but a particular impulse response is. To give an example, consider an SVAR with n > 2 and with a shock-sign restriction on the first shock in period k. Given $ϕ$, the set of $u_{k} \in ℝ^{n}$ satisfying the NR is a half-space defined by ${q^{'}}_{1} Σ_{t r}^{- 1} u_{k} \geq 0$. The set of values of $u_{k}$ satisfying this inequality is indexed uniquely by $q_{1}$ given $Σ_{t r}$ at its true value, so there are no values of Q that are observationally equivalent to $Q_{0}$ with $q_{1} \neq Q_{0} e_{1, n}$. Any value for the remaining n – 1 columns of Q such that they are orthogonal to $Q_{0} e_{1, n}$ will generate the same half-space for $u_{k}$, so $𝒬^{*}$ is not a singleton and the SVAR is not globally identified. However, the projection of $𝒬^{*}$ onto its first column is a singleton, so the impulse response to the first shock, $η_{i, 1, h} (ϕ, Q)$, is globally identified for all i and h.
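The n > 2 example can be illustrated with a small numerical sketch. It assumes n = 3, an arbitrary lower-triangular $Σ_{t r}$ and an arbitrary unit vector for the first column of Q; the helper functions are hypothetical and written only for this illustration. Two orthonormal matrices that share the first column but differ in the remaining columns are constructed, and the shock-sign restriction on the first shock is evaluated on simulated innovations.

```python
import numpy as np

def complete_to_orthonormal(q1, sigma_tr, seed):
    """Hypothetical helper: orthonormal Q with first column q1 and a random completion."""
    rng = np.random.default_rng(seed)
    m = np.column_stack([q1, rng.standard_normal((q1.size, q1.size - 1))])
    q, r = np.linalg.qr(m)
    q = q * np.sign(np.diag(r))  # QR fixes columns only up to sign; restore q1 exactly
    # Impose the sign normalisation diag(Q' Sigma_tr^{-1}) >= 0 by flipping columns.
    d = np.diag(q.T @ np.linalg.inv(sigma_tr))
    return q * np.where(d < 0, -1.0, 1.0)

def first_shock(Q, sigma_tr, u):
    """First structural shock q_1' Sigma_tr^{-1} u_k for each row of u."""
    return np.linalg.solve(sigma_tr, u.T).T @ Q[:, 0]

# Assumed values; q1 is chosen so that it already satisfies the sign normalisation.
sigma_tr = np.array([[1.0, 0.0, 0.0],
                     [0.3, 1.5, 0.0],
                     [-0.2, 0.4, 0.8]])
q1 = np.array([2.0, 1.0, 2.0]) / 3.0

Q_a = complete_to_orthonormal(q1, sigma_tr, seed=0)
Q_b = complete_to_orthonormal(q1, sigma_tr, seed=1)

rng = np.random.default_rng(42)
u_k = rng.standard_normal((1000, 3)) @ sigma_tr.T  # simulated reduced-form innovations

satisfied_a = first_shock(Q_a, sigma_tr, u_k) >= 0
satisfied_b = first_shock(Q_b, sigma_tr, u_k) >= 0
print(np.array_equal(satisfied_a, satisfied_b), np.allclose(Q_a, Q_b))
```

Both matrices admit exactly the same innovations even though they differ as matrices, because the restriction involves only the shared first column.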

Although a single NR can deliver global identification in the frequentist sense, the practical implication of this theoretical claim is not obvious. The observed unconditional likelihood is almost always flat at the maximum, so we cannot obtain a unique maximum likelihood estimator for the structural parameter. As a result, the standard asymptotic approximation of the sampling distribution of the maximum likelihood estimator is not applicable. The SVAR model with NR possesses features of set-identified models from the Bayesian standpoint (i.e. flat regions of the likelihood). However, strictly speaking, it can be classified as a globally identified model in the frequentist sense when the condition of Proposition 4.1 holds.

4.2 Conditional identified set

It is well known that traditional sign restrictions $S (ϕ, Q) \geq 0_{\tilde{s} \times 1}$ set identify Q or, equivalently, the structural parameters. Given the reduced-form parameters $ϕ$, which are point identified, there are multiple observationally equivalent values of Q, in the sense that there exist Q and $\tilde{Q} \neq Q$ such that $p (y^{T} | ϕ, Q) = p (y^{T} | ϕ, \tilde{Q})$ for every y^T in the sample space. The identified set for Q given $ϕ$ contains all such observationally equivalent parameter points, and is defined as

(30)

𝒬 (ϕ | S) = \{Q \in 𝒪 (n) : S (ϕ, Q) \geq 0_{\tilde{s} \times 1}\}

The identified set is a set-valued map only of $ϕ$ , which carries all the information about Q contained in the data.

The complication in applying this definition of the identified set in SVARs when there are NR is that $ϕ$ no longer represents all information about Q contained in the data; by truncating the likelihood, the realisations of the data entering the NR contain additional information about Q. To address this, we introduce a refinement of the definition of an identified set.

Definition 4.1. Let $N \equiv N (ϕ, Q, y^{T}) \geq 0_{s \times 1}$ represent a set of NR in terms of the parameters and the data.

(i) The conditional identified set for Q under NR is

(31)

𝒬 (ϕ | y^{T}, N) = \{Q \in 𝒪 (n) : N (ϕ, Q, y^{T}) \geq 0_{s \times 1}\}

The conditional identified set for the impulse response $η = η_{i, j, h} (ϕ, Q)$ under NR is defined by projecting $𝒬 (ϕ | y^{T}, N)$ via $η_{i, j, h} (ϕ, Q)$:

(32)

C I S_{η} (ϕ | y^{T}, N) = \{η_{i, j, h} (ϕ, Q) : Q \in 𝒬 (ϕ | y^{T}, N)\}

(ii) Let $s : Y \to ℝ^{S}$ be a statistic. We call s(y^T) a sufficient statistic for the conditional identified set $𝒬 (ϕ | y^{T}, N)$ if the conditional identified set for Q depends on the sample y^T only through s(y^T); that is, there exists a set-valued map $\tilde{𝒬} (ϕ | \cdot, N)$ such that

(33)

𝒬 (ϕ | y^{T}, N) = \tilde{𝒬} (ϕ | s (y^{T}), N)

holds for all $ϕ \in Φ$ and $y^{T} \in Y$ .

Unlike the standard identified set $𝒬 (ϕ | S)$ , the conditional identified set $𝒬 (ϕ | y^{T}, N)$ depends on the sample y^T because of the aforementioned data-dependent support of the likelihood. In terms of the observed likelihood, however, they share the property that the likelihood is flat on the (conditional) identified set. Hence, given the sample y^T and the reduced-form parameters $ϕ$ , all values of Q in $𝒬 (ϕ | y^{T}, N)$ fit the data equally well and, in this particular sense, they are observationally equivalent.
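To make Definition 4.1 concrete, the following sketch approximates the conditional identified set on a grid in a bivariate model with a shock-sign restriction on the first shock in period k. It is a minimal illustration under stated assumptions: Q is taken to be a rotation matrix indexed by $θ$, the data are the reduced-form errors (so $u_{k} = y_{k}$), the values of $Σ_{t r}$ and y_k are made up, and the impulse response of interest is the impact response of the first variable to the first shock, $σ_{11} \cos θ$. All function names are hypothetical.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix used to parameterise Q (an assumption of this sketch)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def in_conditional_identified_set(theta, sigma_tr, y_k):
    """True if Q(theta) satisfies the sign normalisation and the shock-sign NR at y_k."""
    a0 = rotation(theta).T @ np.linalg.inv(sigma_tr)  # A_0 = Q' Sigma_tr^{-1}
    sign_normalisation = np.all(np.diag(a0) >= 0)
    shock_sign = (a0 @ y_k)[0] >= 0                   # epsilon_{1k} >= 0
    return bool(sign_normalisation and shock_sign)

def conditional_identified_set(sigma_tr, y_k, n_grid=3600):
    """Grid approximation to the set of admissible theta and the implied impact responses."""
    thetas = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    keep = np.array([in_conditional_identified_set(t, sigma_tr, y_k) for t in thetas])
    theta_set = thetas[keep]
    eta_set = sigma_tr[0, 0] * np.cos(theta_set)  # impact response of y_1 to shock 1
    return theta_set, eta_set

sigma_tr = np.array([[1.0, 0.0],
                     [0.5, 2.0]])

# Same reduced-form parameters, two different realisations of y_k (assumed values).
theta_a, eta_a = conditional_identified_set(sigma_tr, np.array([1.0, 0.5]))
theta_b, eta_b = conditional_identified_set(sigma_tr, np.array([-1.0, 0.5]))

print(theta_a.min(), theta_a.max())
print(theta_b.min(), theta_b.max())
```

Holding $ϕ$ fixed, the two realisations of y_k produce different sets, which illustrates the dependence of the conditional identified set on the sample.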

When the NR involve shocks in only a subset of time periods (as is typically the case), the conditional identified set depends on the sample only through the observations entering the NR, which are represented by the sufficient statistic s(y^T) in Definition 4.1(ii). For instance, in the example of Section 2.1, s(y^T) = y_k. If we extend the example to the SVAR(p), the shock-sign restriction in Equation (3) is

(34)

ε_{1 k} = {e^{'}}_{1, 2} A_{0} u_{k} = {e^{'}}_{1, 2} Q^{'} Σ_{t r}^{- 1} (y_{k} - B x_{k}) \geq 0

Hence, the conditional identified set $𝒬 (ϕ | y^{T}, N)$ depends on the data only through ${({y^{'}}_{k}, {x^{'}}_{k})}^{'} = {({y^{'}}_{k}, {y^{'}}_{k - 1}, ..., {y^{'}}_{k - p})}^{'}$, so we can set $s (y^{T}) = {({y^{'}}_{k}, {y^{'}}_{k - 1}, ..., {y^{'}}_{k - p})}^{'}$.
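The role of the sufficient statistic in Equation (34) can be sketched numerically. The example below assumes a bivariate VAR(1) without an intercept, so that $x_{k} = y_{k - 1}$, parameterises Q as a rotation matrix and uses arbitrary data that are not simulated from the VAR; these choices and the function names are illustrative assumptions. It checks that replacing every observation other than y_k and y_{k-1} leaves the set of admissible $θ$ unchanged.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix used to parameterise Q (an assumption of this sketch)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def shock_sign_holds(y, k, B, sigma_tr, Q):
    """Check epsilon_{1k} = e_1' Q' Sigma_tr^{-1} (y_k - B x_k) >= 0 with x_k = y_{k-1}."""
    u_k = y[k] - B @ y[k - 1]
    eps_k = Q.T @ np.linalg.solve(sigma_tr, u_k)
    return bool(eps_k[0] >= 0)

def nr_indicator_on_grid(y, k, B, sigma_tr, thetas):
    """Indicator of the shock-sign NR for each theta on the grid."""
    return np.array([shock_sign_holds(y, k, B, sigma_tr, rotation(t)) for t in thetas])

rng = np.random.default_rng(0)
T, k = 50, 20
B = np.array([[0.5, 0.1],
              [0.0, 0.3]])
sigma_tr = np.array([[1.0, 0.0],
                     [0.5, 2.0]])
y = rng.standard_normal((T, 2))  # arbitrary illustrative data

# Alternative sample that agrees with y only in the rows entering the NR.
y_alt = rng.standard_normal((T, 2))
y_alt[[k - 1, k]] = y[[k - 1, k]]

thetas = np.linspace(-np.pi, np.pi, 720, endpoint=False)
same_set = np.array_equal(nr_indicator_on_grid(y, k, B, sigma_tr, thetas),
                          nr_indicator_on_grid(y_alt, k, B, sigma_tr, thetas))
print(same_set)
```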

If the conditional distribution of Y^T given s(Y^T) = s(y^T) is non-degenerate, we can consider a frequentist sampling experiment (repeated sampling of Y^T) conditional on the sufficient statistics set to their observed values. We can then view the conditional identified set $𝒬 (ϕ | y^{T}, N)$ as the standard identified set in set-identified models, since it no longer depends on the data in the conditional experiment where s(y^T) is fixed. This motivates referring to $𝒬 (ϕ | y^{T}, N)$ as the conditional identified set.

The conditional identified set resembles the finite-sample identified set introduced by Rosen and Ura (2020) in the context of maximum score estimation (Manski 1975, 1985). Their set corresponds to the plateau of the population objective function in the conditional frequentist sampling experiment given the regressors. If we impose only the shock-sign restrictions, and given knowledge of the true data-generating processes, the construction of the conditional identified set coincides with the construction of the finite-sample identified set for the scale-normalised coefficients, as they both solve the system of inequalities in Equations (3) or (34).^[14] Despite these common geometric features, there are several differences between the SVAR under NR and maximum score estimation. First, the SVAR under NR is a likelihood-based parametric model, while maximum score estimation is a semi-parametric binary regression without a likelihood. Second, NR directly trim the support of the sample objective function (the likelihood) by the intersection of inequalities, while the maximum score objective function counts the number of inequalities satisfied in the sample. Third, the number of NR depends on the researcher's choice, while the number of inequalities in maximum score estimation is driven by the support points of the regressors observed in the sample.

Footnotes

$(ϕ, Q) \neq (ϕ_{0}, Q_{0})$ is observationally equivalent to $(ϕ_{0}, Q_{0})$ if $p (y^{T}, D_{N} = d | ϕ, Q) = p (y^{T}, D_{N} = d | ϕ_{0}, Q_{0})$ holds for all y^T and $d \in \{0, 1\}$. [12]

Under the restriction on the historical decomposition, a notable difference between the conditional and unconditional likelihood cases is the slope of the squared Hellinger distance around the minimum. The squared Hellinger distance of the unconditional likelihood has a steeper slope than the conditional likelihood. This indicates the loss of information for $θ$ in the conditional likelihood due to conditioning on a non-ancillary event. [13]

See also Komarova (2013) for the construction of identified sets for maximum score coefficients with discrete regressors. [14]