RDP 2023-07: Identification and Inference under Narrative Restrictions

2. Bivariate Example

We illustrate the econometric issues that arise when imposing NR in the context of the following bivariate SVAR(0): $A_0 y_t = \epsilon_t$, for t = 1,…,T, where $y_t = (y_{1t}, y_{2t})'$ and $\epsilon_t = (\epsilon_{1t}, \epsilon_{2t})'$, with $\epsilon_t \overset{iid}{\sim} \mathcal{N}(0_{2\times 1}, I_2)$. We abstract from dynamics for ease of exposition, but this is without loss of generality. The orthogonal reduced form of the model reparameterises $A_0$ as $Q'\Sigma_{tr}^{-1}$, where $\Sigma_{tr}$ is the lower-triangular Cholesky factor (with positive diagonal elements) of $\Sigma = \mathbb{E}(y_t y_t') = A_0^{-1}(A_0^{-1})'$. We parameterise $\Sigma_{tr}$ as

(1) $\Sigma_{tr} = \begin{bmatrix} \sigma_{11} & 0 \\ \sigma_{21} & \sigma_{22} \end{bmatrix}, \quad \sigma_{11}, \sigma_{22} > 0$

and denote the vector of reduced-form parameters as $\phi = \text{vech}(\Sigma_{tr})$. $Q$ is an element of $\mathcal{O}(2)$, the space of 2 × 2 orthonormal matrices:

(2) $Q \in \mathcal{O}(2) = \left\{ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \right\} \cup \left\{ \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \right\}$

where $\theta \in \left[-\pi ,\pi \right]$. This formulation of the model, which follows Baumeister and Hamilton (2015), means that the structural parameters can be expressed as functions of the reduced-form parameters and $\theta$. Restrictions on the structural parameters and/or functions of the structural shocks can then be interpreted as restricting $\theta$ to some set. In what follows, we discuss properties of this set that are key for analysing identification and inference under NR.
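To make the reparameterisation concrete, the following sketch builds $A_0 = Q'\Sigma_{tr}^{-1}$ from $\theta$ and $\Sigma_{tr}$ and checks that it reproduces the reduced-form covariance. The parameter values are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Illustrative values (assumptions, not taken from the paper)
Sigma_tr = np.array([[1.0, 0.0],
                     [-0.5, 0.8]])   # lower triangular, positive diagonal
theta = 0.3

# Rotation branch of O(2)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Structural matrix A0 = Q' Sigma_tr^{-1}
A0 = Q.T @ np.linalg.inv(Sigma_tr)

# The reduced-form covariance Sigma = A0^{-1} (A0^{-1})' is recovered exactly
Sigma = Sigma_tr @ Sigma_tr.T
A0_inv = np.linalg.inv(A0)
assert np.allclose(A0_inv @ A0_inv.T, Sigma)
```

Because $QQ' = I_2$, any $\theta$ delivers the same $\Sigma$, which is what makes the model set identified absent further restrictions.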

2.1 Shock-sign restrictions

Consider the ‘shock-sign restriction’ that ${\epsilon }_{1k}$ is non-negative for some $k\in \left\{1,...,T\right\}$:

(3) $\epsilon_{1k} = e_{1,2}' A_0 y_k = (\sigma_{11}\sigma_{22})^{-1}\left(\sigma_{22} y_{1k}\cos\theta + (\sigma_{11} y_{2k} - \sigma_{21} y_{1k})\sin\theta\right) \ge 0$

Equation (3) implies that the restricted structural shock can be written as a function ${\epsilon }_{1k}\left(\theta ,\varphi ,{y}_{k}\right)$. Along with the ‘sign normalisation’ $\text{diag}\left({A}_{0}\right)\ge {0}_{2×1}$, the shock-sign restriction implies that $\theta$ is restricted to the set

(4) $\theta \in \left\{\theta : \sigma_{21}\sin\theta \le \sigma_{22}\cos\theta,\ \cos\theta \ge 0,\ \sigma_{22} y_{1k}\cos\theta \ge (\sigma_{21} y_{1k} - \sigma_{11} y_{2k})\sin\theta \right\} \cup \left\{\theta : \sigma_{21}\sin\theta \le \sigma_{22}\cos\theta,\ \cos\theta \le 0,\ \sigma_{22} y_{1k}\cos\theta \ge (\sigma_{21} y_{1k} - \sigma_{11} y_{2k})\sin\theta \right\}$
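The set in Equation (4) can be traced out numerically: on a grid of $\theta$, keep the values satisfying the sign normalisation and the shock-sign inequality. The reduced-form values and the realisation of $y_k$ below are illustrative assumptions:

```python
import numpy as np

# Illustrative reduced-form values and realisation of y_k (assumptions)
s11, s21, s22 = 1.0, -0.5, 0.8
y1k, y2k = 0.7, 0.2

theta = np.linspace(-np.pi, np.pi, 200_001)
# Sign normalisation (the two branches of Equation (4) together cover both
# signs of cos(theta)): s21*sin(theta) <= s22*cos(theta)
sign_norm = s21 * np.sin(theta) <= s22 * np.cos(theta)
# Shock-sign restriction: eps_1k >= 0
shock_sign = s22 * y1k * np.cos(theta) >= (s21 * y1k - s11 * y2k) * np.sin(theta)
in_set = sign_norm & shock_sign
restricted = theta[in_set]   # the set of theta consistent with the NR
```

For these values the restricted set is a single interval containing $\theta = 0$; different draws of $y_k$ move its endpoints.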

The restriction induces a set-valued mapping from $\varphi$ to $\theta$ that depends on the realisation of yk. Giacomini et al (2022a) characterise this mapping in the case where ${\sigma }_{21}<0$. For example, if $h\left(\varphi ,{y}_{k}\right)={\sigma }_{21}{y}_{1k}-{\sigma }_{11}{y}_{2k}<0$, then

(5) $\theta \in \left[\arctan\left(\max\left\{\frac{\sigma_{22}}{\sigma_{21}}, C(\phi, y_k)\right\}\right),\ \pi + \arctan\left(\min\left\{\frac{\sigma_{22}}{\sigma_{21}}, C(\phi, y_k)\right\}\right)\right]$

where $C\left(\varphi ,{y}_{k}\right)={\sigma }_{22}{y}_{1k}/h\left(\varphi ,{y}_{k}\right)$. The direct dependence of this mapping on the realisation of the data implies that the standard notion of an identified set – the set of observationally equivalent structural parameters given the reduced-form parameters – does not apply. Consequently, it is not obvious whether the restrictions are, in fact, set identifying in a formal frequentist sense, nor whether existing frequentist procedures for conducting inference in set-identified models are valid. We analyse identification under these restrictions in Section 4.
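As a check on Equation (5), the closed-form interval endpoints can be compared with a grid search. The values of $\phi$ and $y_k$ are illustrative assumptions chosen so that $\sigma_{21} < 0$ and $h(\phi, y_k) < 0$, the case the formula covers:

```python
import numpy as np

# Illustrative values (assumptions) with sigma_21 < 0 and h(phi, y_k) < 0
s11, s21, s22 = 1.0, -0.5, 0.8
y1k, y2k = 0.7, 0.2
h = s21 * y1k - s11 * y2k
assert s21 < 0 and h < 0   # the case covered by Equation (5)

C = s22 * y1k / h
lo = np.arctan(max(s22 / s21, C))
hi = np.pi + np.arctan(min(s22 / s21, C))

# Trace out the same set on a fine grid of theta
theta = np.linspace(-np.pi, np.pi, 2_000_001)
in_set = ((s21 * np.sin(theta) <= s22 * np.cos(theta)) &
          (s22 * y1k * np.cos(theta) >= (s21 * y1k - s11 * y2k) * np.sin(theta)))
grid_lo, grid_hi = theta[in_set].min(), theta[in_set].max()
assert abs(grid_lo - lo) < 1e-4 and abs(grid_hi - hi) < 1e-4
```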

When conducting Bayesian inference, AR18 construct the posterior using the conditional likelihood – the likelihood of observing the data conditional on the NR holding. Letting $y^T = (y_1', ..., y_T')'$, the conditional likelihood is

(6) $p\left(y^T \mid \theta, \phi, \epsilon_{1k}(\theta, \phi, y_k) \ge 0\right) = \frac{\prod_{t=1}^{T}(2\pi)^{-1}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}y_t'\Sigma^{-1}y_t\right)}{\Pr\left(\epsilon_{1k} \ge 0 \mid \theta, \phi\right)}\, 1\left(\epsilon_{1k}(\theta, \phi, y_k) \ge 0\right)$

The denominator in the first term – the ex ante probability that the NR is satisfied – equals ½, because $\epsilon_{1k}$ is standard normal. The conditional likelihood therefore depends on $\theta$ only through the indicator function $1\left(\epsilon_{1k}(\theta, \phi, y_k) \ge 0\right)$, which truncates the likelihood, with the truncation points depending on $y_k$. To illustrate, the left panel of Figure 1 plots the conditional likelihood as a function of $\theta$ given two realisations of a data-generating process, fixing $\phi$ at its true value.[5] The conditional likelihood is flat over the interval of $\theta$ satisfying the shock-sign restriction and is zero outside this interval. The location of the non-zero region depends on $y_k$.
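Because the Gaussian factor in Equation (6) does not involve $\theta$ once $\phi$ is fixed, the flatness of the conditional likelihood can be verified directly. The sketch below simulates a short sample from hypothetical parameter values (assumptions, not the paper's data-generating process):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values (assumptions, not the paper's DGP)
s11, s21, s22 = 1.0, -0.5, 0.8
Sigma_tr = np.array([[s11, 0.0], [s21, s22]])
Sigma = Sigma_tr @ Sigma_tr.T
Sigma_inv = np.linalg.inv(Sigma)
det = np.linalg.det(Sigma)

# Simulate a short sample and restrict the shock in period k
T, k = 3, 0
Y = (Sigma_tr @ rng.standard_normal((2, T))).T
y1k, y2k = Y[k]

# Gaussian factor of Equation (6): free of theta once phi is fixed
gauss = np.prod([(2 * np.pi) ** -1 * det ** -0.5 *
                 np.exp(-0.5 * y @ Sigma_inv @ y) for y in Y])

theta = np.linspace(-np.pi, np.pi, 2001)
eps1k = (s22 * y1k * np.cos(theta) +
         (s11 * y2k - s21 * y1k) * np.sin(theta)) / (s11 * s22)
cond_lik = gauss / 0.5 * (eps1k >= 0)   # Pr(eps_1k >= 0 | theta, phi) = 1/2

nonzero = cond_lik[cond_lik > 0]
assert np.allclose(nonzero, nonzero[0])   # flat wherever non-zero
```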

The flat likelihood implies that the posterior for $\theta$ is proportional to the prior in the region where the likelihood is non-zero, and is zero outside this region. The standard approach to Bayesian inference in SVARs under sign restrictions assumes a uniform prior over Q, as do AR18.[6] In the bivariate example, this is equivalent to a prior for $\theta$ that is uniform (Baumeister and Hamilton 2015). This prior implies that the posterior for $\theta$ is also uniform over the interval for $\theta$ where the likelihood is non-zero.

The impact impulse response of y1t to a positive standard-deviation shock $\epsilon_{1t}$ is $\eta \equiv \sigma_{11}\cos\theta$. The right panel of Figure 1 plots the posterior for $\eta$ induced by a uniform prior over $\theta$, given the same realisations of the data used in the left panel. The posterior for $\eta$ assigns more probability mass to values of $\eta$ towards the boundary of its support. This highlights that even a uniform prior may be informative about parameters of interest, as also occurs under traditional sign restrictions (Baumeister and Hamilton 2015). One difference is that the prior under sign restrictions is never updated by the data, whereas under NR the support and shape of the posterior for $\eta$ may depend on the realisation of $y_k$ through its effect on the truncation points of the likelihood, so there may be some updating of the prior. However, the prior is not updated at values of $\theta$ within the flat region of the likelihood. Posterior inference about $\eta$ may therefore still be sensitive to the choice of prior, as in standard set-identified SVARs.
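The informativeness of the uniform prior for $\eta$ can be seen by pushing uniform draws of $\theta$ through $\eta = \sigma_{11}\cos\theta$. The truncation interval below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
s11 = 1.0
lo, hi = -0.795, 2.130   # an illustrative truncation interval for theta

theta = rng.uniform(lo, hi, 200_000)   # uniform prior truncated by the NR
eta = s11 * np.cos(theta)              # impact impulse response of y_1t

counts, edges = np.histogram(eta, bins=30)
# Mass piles up near the extreme eta = s11, where cos is flat in theta
assert counts[-1] > counts[len(counts) // 2]
```

The uniform prior over $\theta$ is thus far from uniform over $\eta$: the Jacobian of the transformation concentrates mass where $|\sin\theta|$ is small.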

2.2 Historical decomposition restrictions

The historical decomposition is the contribution of a particular structural shock to the observed unexpected change in a particular variable over some horizon. The contribution of the first shock to the change in the first variable in the kth period is

(7) $H_{1,1,k}(\theta, \phi, y_k) = \sigma_{22}^{-1}\left(\sigma_{22} y_{1k}\cos^2\theta + (\sigma_{11} y_{2k} - \sigma_{21} y_{1k})\cos\theta\sin\theta\right)$

while the contribution of the second shock is

(8) $H_{1,2,k}(\theta, \phi, y_k) = \sigma_{22}^{-1}\left(\sigma_{22} y_{1k}\sin^2\theta + (\sigma_{21} y_{1k} - \sigma_{11} y_{2k})\cos\theta\sin\theta\right)$

Consider the restriction that the first structural shock in period k was positive and (in the language of AR18) the ‘most important contributor’ to the change in the first variable, which requires that $|{H}_{1,1,k}\left(\theta ,\varphi ,{y}_{k}\right)|\ge |{H}_{1,2,k}\left(\theta ,\varphi ,{y}_{k}\right)|$. Under these restrictions, $\theta$ must satisfy a set of inequalities that depends on $\varphi$ and yk. As in the case of the shock-sign restriction, this set of inequalities generates a set-valued mapping from $\varphi$ to $\theta$ that depends on yk.
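One useful check on Equations (7) and (8) is that the two contributions sum to the observed change $y_{1k}$ for every $\theta$, since the decomposition exhausts the unexpected change. A sketch with illustrative values (assumptions):

```python
import numpy as np

# Illustrative values (assumptions)
s11, s21, s22 = 1.0, -0.5, 0.8
y1k, y2k = 0.7, 0.2

theta = np.linspace(-np.pi, np.pi, 1001)
c, s = np.cos(theta), np.sin(theta)
H11 = (s22 * y1k * c**2 + (s11 * y2k - s21 * y1k) * c * s) / s22  # Equation (7)
H12 = (s22 * y1k * s**2 + (s21 * y1k - s11 * y2k) * c * s) / s22  # Equation (8)

# The two contributions exhaust the observed change for every theta
assert np.allclose(H11 + H12, y1k)
```

Algebraically, the cross terms cancel and $\cos^2\theta + \sin^2\theta = 1$, leaving $H_{1,1,k} + H_{1,2,k} = y_{1k}$.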

Let $\mathcal{D}(\theta, \phi, y_k) = 1\left\{\epsilon_{1k}(\theta, \phi, y_k) \ge 0, |H_{1,1,k}(\theta, \phi, y_k)| \ge |H_{1,2,k}(\theta, \phi, y_k)|\right\}$ be the indicator function equal to one when the NR are satisfied and zero otherwise, and let $\tilde{\mathcal{D}}(\theta, \phi, \epsilon_k) = 1\left\{\epsilon_{1k} \ge 0, |\tilde{H}_{1,1,k}(\theta, \phi, \epsilon_{1k})| \ge |\tilde{H}_{1,2,k}(\theta, \phi, \epsilon_{2k})|\right\}$ denote the indicator function for the same event written in terms of the structural shocks rather than the data. The conditional likelihood given the restrictions is then

(9) $p\left(y^T \mid \theta, \phi, \mathcal{D}(\theta, \phi, y_k) = 1\right) = \frac{\prod_{t=1}^{T}(2\pi)^{-1}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}y_t'\Sigma^{-1}y_t\right)}{\Pr\left(\tilde{\mathcal{D}}(\theta, \phi, \epsilon_k) = 1 \mid \theta, \phi\right)}\, \mathcal{D}(\theta, \phi, y_k)$

In contrast to the case of shock-sign restrictions, the probability in the denominator now depends on $\theta$ through the historical decomposition. Intuitively, changing $\theta$ changes the impulse responses of y1t to the two shocks and thus changes the ex ante probability that $|{\stackrel{˜}{H}}_{1,1,k}\left(\theta ,\varphi ,{\epsilon }_{1k}\right)|\ge |{\stackrel{˜}{H}}_{1,2,k}\left(\theta ,\varphi ,{\epsilon }_{2k}\right)|$. Consequently, the likelihood is not necessarily flat when it is non-zero.

To illustrate, the top panel of Figure 2 plots the conditional likelihood under the historical decomposition NR using the same data-generating process as in Figure 1. The bottom panel plots the probability in the denominator of the conditional likelihood. The likelihood is again truncated, but it is no longer flat – it has a maximum at the value of $\theta$ that minimises the ex ante probability that the NR are satisfied (within the set of values of $\theta$ that are consistent with the restriction given the realisation of the data). The posterior for $\theta$ induced by the usual uniform prior will therefore assign greater posterior probability to values of $\theta$ that yield a lower ex ante probability of satisfying the NR.
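The dependence of the denominator on $\theta$ can be illustrated by Monte Carlo. Under the rotation branch, $\tilde{H}_{1,1,k} = \sigma_{11}\cos\theta\,\epsilon_{1k}$ and $\tilde{H}_{1,2,k} = -\sigma_{11}\sin\theta\,\epsilon_{2k}$, so $\sigma_{11}$ cancels from the inequality and the ex ante probability depends on $\theta$ alone. A sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = rng.standard_normal((2, 1_000_000))   # draws of (eps_1k, eps_2k)

def prob_nr(theta):
    """Monte Carlo estimate of Pr(D-tilde = 1 | theta, phi) under the
    rotation branch: eps_1k >= 0 and |cos(theta)*eps_1k| >= |sin(theta)*eps_2k|
    (sigma_11 cancels from the inequality)."""
    ok = (eps[0] >= 0) & (np.abs(np.cos(theta) * eps[0]) >=
                          np.abs(np.sin(theta) * eps[1]))
    return ok.mean()

p0 = prob_nr(0.0)          # HD restriction never binds here: probability 1/2
p45 = prob_nr(np.pi / 4)
assert abs(p0 - 0.5) < 0.01
assert p45 < p0            # the ex ante probability varies with theta
```

At $\theta = 0$ only the shock-sign restriction binds, so the probability is ½; as $|\theta|$ grows towards $\pi/2$ the historical decomposition restriction becomes ever less likely to hold ex ante.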

If the narrative event (i.e. $\tilde{\mathcal{D}}(\theta, \phi, \epsilon_k) = 1$) is observable and its probability of occurring depends on the parameter of interest, then conditioning on the narrative event means conditioning on a non-ancillary statistic. This is undesirable when conducting likelihood-based inference, because it discards information about the parameter of interest. Unlike the shock-sign restriction, the probability that the historical decomposition restriction is satisfied depends on $\theta$, so the event that the NR are satisfied is not ancillary. Conditioning on this event means that the shape of the likelihood (within the non-zero region) is driven entirely by the inverse probability of the conditioning event.

We therefore advocate constructing the posterior using the joint (or unconditional) likelihood of observing the data and the NR holding:

(10) $p\left(y^T, \mathcal{D}(\theta, \phi, y_k) = 1 \mid \theta, \phi\right) = \prod_{t=1}^{T}(2\pi)^{-1}|\Sigma|^{-\frac{1}{2}}\exp\left(-\frac{1}{2}y_t'\Sigma^{-1}y_t\right)\mathcal{D}(\theta, \phi, y_k)$

For all types of NR, the unconditional likelihood is flat with respect to $\theta$ (when it is non-zero) and depends on $\theta$ only through the points of truncation. Of course, this means that posterior inference based on the unconditional likelihood may be sensitive to the choice of prior, as when using the conditional likelihood under shock-sign restrictions. In Section 5.2, we propose how to deal with this posterior sensitivity.
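To see the contrast with Equation (9), the unconditional likelihood in Equation (10) under the historical decomposition NR can be evaluated on a grid of $\theta$: wherever $\mathcal{D} = 1$ it takes the same value. The parameter values below are illustrative assumptions, and the Gaussian factor is replaced by a placeholder constant since it does not vary with $\theta$ once $\phi$ is fixed:

```python
import numpy as np

# Illustrative values (assumptions); the Gaussian factor is constant in
# theta once phi is fixed, so a placeholder value suffices here
s11, s21, s22 = 1.0, -0.5, 0.8
y1k, y2k = 0.7, 0.2
gauss = 0.1

theta = np.linspace(-np.pi, np.pi, 4001)
c, s = np.cos(theta), np.sin(theta)
eps1k = (s22 * y1k * c + (s11 * y2k - s21 * y1k) * s) / (s11 * s22)
H11 = (s22 * y1k * c**2 + (s11 * y2k - s21 * y1k) * c * s) / s22
H12 = (s22 * y1k * s**2 + (s21 * y1k - s11 * y2k) * c * s) / s22
D = (eps1k >= 0) & (np.abs(H11) >= np.abs(H12))   # the NR indicator

uncond = gauss * D   # Equation (10) at fixed phi: no theta-dependent denominator
nz = uncond[uncond > 0]
assert nz.size > 0 and np.allclose(nz, nz[0])   # flat on its support
```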

Footnotes

The data-generating process assumes vec(A0) = (1,0.2,0.5,1.2)′, which implies that ${\sigma }_{21}<0$ and $\theta =\mathrm{arcsin}\left(0.5{\sigma }_{22}\right)$, with Q equal to the rotation matrix. We assume a time series of length T = 3 and draw sequences of structural shocks such that ${\epsilon }_{1,1}\ge 0$. We use a small value of T to control Monte Carlo sampling error. Setting $\varphi$ to its true value replicates the large-sample situation in which the likelihood for $\varphi$ concentrates at the truth; it also facilitates visualising the likelihood, which is otherwise a function of four parameters. [5]

See, for example, Uhlig (2005), Rubio-Ramírez, Waggoner and Zha (2010) and Arias, Rubio-Ramírez and Waggoner (2018). [6]