RDP 2019-08: The Well-meaning Economist Appendix A: Technical Material

A.1 Propositions

Proposition 1. If $E[f(Y)^2] < \infty$, then for any predictor g(X) and for any strictly positive and fixed λ,

(A1)  $E[\lambda(f(Y)-f(g(X)))^2] \;\geq\; E[\lambda(f(Y)-f(M_f[Y|X]))^2]$

Proof. Define random variable e as equal to $f(Y)-f(M_f[Y|X])$ or, equivalently, to $f(Y)-E[f(Y)|X]$. Then

(A2)  $E[\lambda(f(Y)-f(g(X)))^2] = \lambda E[(e + f(M_f[Y|X]) - f(g(X)))^2]$
(A3)  $= \lambda E[e^2] + 2\lambda E[e(f(M_f[Y|X]) - f(g(X)))] + \lambda E[(f(M_f[Y|X]) - f(g(X)))^2]$
(A4)  $= \lambda E[e^2] + \lambda E[(f(M_f[Y|X]) - f(g(X)))^2]$
(A5)  $\geq \lambda E[e^2]$
(A6)  $= E[\lambda(f(Y)-f(M_f[Y|X]))^2]$

The step from Equation (A3) to (A4) uses the definition for e, which implies that e is uncorrelated with any function of X and has an arithmetic mean of 0. The proof just generalises a version for the arithmetic mean in Hansen (2019, p 24).
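For concreteness, here is a minimal numerical sketch of Proposition 1 (illustrative only; the choice f(·) = ln(·), the lognormal data and all names are assumptions, not from the paper). With f(·) = ln(·), $M_f[Y]$ is the geometric mean, and a grid search confirms that no constant predictor attains a lower transformed quadratic loss:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.8, size=100_000)  # strictly positive Y

# Quasilinear mean for f = ln: the geometric mean, f^(-1)(E[f(Y)])
m_f = np.exp(np.mean(np.log(y)))

def loss(c, lam=2.0):
    """Transformed quadratic loss E[lam*(f(Y) - f(c))^2] with f = ln."""
    return lam * np.mean((np.log(y) - np.log(c)) ** 2)

# The geometric mean should (to grid precision) minimise the loss
grid = np.linspace(0.5 * m_f, 2.0 * m_f, 1001)
best = grid[np.argmin([loss(c) for c in grid])]
print(f"geometric mean: {m_f:.4f}  grid minimiser: {best:.4f}")
```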

Proposition 2. Let π be a vector of parameters, and define $g(X;\pi)$ as a predictor for f(Y) that is constrained to take some pre-specified form g(X;·). Likewise, define $h(X;\pi) = f^{-1}(g(X;\pi))$ as a predictor for Y that is constrained to take the pre-specified form h(X;·). Then for any strictly positive λ,

(A7)  $\pi^* = \arg\min_{\pi} E[\lambda(f(Y)-g(X;\pi))^2]$
(A8)  $\Longleftrightarrow\; \pi^* = \arg\min_{\pi} E[\lambda(f(Y)-f(h(X;\pi)))^2]$

Proof. Trivially,

(A9)   $\pi^* = \arg\min_{\pi} E[\lambda(f(Y)-g(X;\pi))^2]$
(A10)  $\Longleftrightarrow\; \pi^* = \arg\min_{\pi} E[\lambda(f(Y)-f(f^{-1}(g(X;\pi))))^2]$

Proposition 3. Let $f(\widehat{g(\chi)})$ be some estimator for $E[f(Y)|X=\chi]$. Then

(A11)  $E[f(\widehat{g(\chi)})] = E[f(Y)|X=\chi]$
(A12)  $\Longleftrightarrow\; M_f[\widehat{g(\chi)}] = M_f[Y|X=\chi]$

where Equation (A12) defines quasi-unbiasedness of $\widehat{g(\chi)}$ for $M_f[Y|X=\chi]$.

Proof. The result is a direct application of the definition in Equation (3).
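Spelled out (a short step added here for clarity): applying the strictly monotone $f^{-1}(\cdot)$ to both sides of Equation (A11) preserves the equality, and each side then collapses to a quasilinear mean by the definition in Equation (3),

$$E[f(\widehat{g(\chi)})] = E[f(Y)|X=\chi] \;\Longleftrightarrow\; \underbrace{f^{-1}\left(E[f(\widehat{g(\chi)})]\right)}_{M_f[\widehat{g(\chi)}]} = \underbrace{f^{-1}\left(E[f(Y)|X=\chi]\right)}_{M_f[Y|X=\chi]}$$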

Proposition 4. Let $\widetilde{g(\chi)}$ be an adjusted version of some fitted value $\widehat{g(\chi)}$, such that $f(\widetilde{g(\chi)}) = f(\widehat{g(\chi)}) - \Psi(\chi)$, where $\Psi(\chi) = E[f(\widehat{g(\chi)})] - E[f(Y)|X=\chi]$. Hence $f^{-1}(E[f(\widetilde{g(\chi)})]) = M_f[Y|X=\chi]$. If $E[f(Y)^2] < \infty$, then for all $\widehat{g(\chi)}$, for all strictly positive λ, and for all χ,

(A13)  $E[\lambda(f(Y)-f(\widehat{g(X)}))^2 \,|\, X=\chi] \;\geq\; E[\lambda(f(Y)-f(\widetilde{g(X)}))^2 \,|\, X=\chi]$

So a fitted value that is quasi-unbiased for a quasilinear mean minimises the same loss function that justifies learning about the quasilinear mean in the first place.

Proof. Define e as equal to $f(Y)-E[f(Y)|X]$. Then

(A14)  $E[\lambda(f(Y)-f(\widehat{g(X)}))^2 \,|\, X=\chi] = E[\lambda(f(Y)-f(\widetilde{g(X)})-\Psi(X))^2 \,|\, X=\chi]$
(A15)  $= \lambda E[(f(Y)-E[f(Y)|X] - (f(\widetilde{g(X)})-E[f(Y)|X]+\Psi(X)))^2 \,|\, X=\chi]$
(A16)  $= \lambda E[(e - (f(\widetilde{g(X)})-E[f(Y)|X]+\Psi(X)))^2 \,|\, X=\chi]$
(A17)  $= \lambda E[e^2 - 2e(f(\widetilde{g(X)})-E[f(Y)|X]+\Psi(X)) + (f(\widetilde{g(X)})-E[f(Y)|X]+\Psi(X))^2 \,|\, X=\chi]$
(A18)  $= \lambda E[e^2 + (f(\widetilde{g(X)})-E[f(Y)|X]+\Psi(X))^2 \,|\, X=\chi]$
(A19)  $\geq \lambda E[e^2 \,|\, X=\chi] + \lambda E[(f(\widetilde{g(X)})-E[f(Y)|X])^2 \,|\, X=\chi]$
(A20)  $= E[\lambda(f(Y)-f(\widetilde{g(X)}))^2 \,|\, X=\chi]$

The step from Equation (A17) to (A18) relies on the definition for e, which implies that e must be uncorrelated with any function of X and has an arithmetic mean of 0. The step from Equation (A18) to (A19) is a repeated application of Proposition 1.
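To see the adjustment at work, here is a minimal simulation sketch (illustrative only; the choice f(·) = ln(·), the lognormal set-up and all names are assumptions). A deliberately biased fitted value, the sample arithmetic mean pressed into service as an estimate of a conditional geometric mean, is adjusted by Ψ(χ) on the log scale, and the adjusted version attains a lower transformed quadratic loss, as Proposition 4 predicts:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 1.0, 0.8, 20  # conditional distribution of Y at X = chi
reps = 50_000

# A fitted value that is biased on the log scale: the sample arithmetic
# mean, used as an estimate of the conditional geometric mean
samples = rng.lognormal(mu, sigma, size=(reps, n))
g_hat = samples.mean(axis=1)

# Psi(chi) = E[ln(g_hat)] - E[ln(Y)|X=chi], estimated by Monte Carlo
psi = np.mean(np.log(g_hat)) - mu
g_tilde = np.exp(np.log(g_hat) - psi)  # so f(g_tilde) = f(g_hat) - Psi(chi)

# Transformed quadratic losses against fresh draws of Y (lambda = 1)
y_new = rng.lognormal(mu, sigma, size=reps)
loss_hat = np.mean((np.log(y_new) - np.log(g_hat)) ** 2)
loss_tilde = np.mean((np.log(y_new) - np.log(g_tilde)) ** 2)
print(f"unadjusted loss {loss_hat:.4f} >= adjusted loss {loss_tilde:.4f}")
```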

Proposition 5. Let $[\hat{a}, \hat{b}]$ be some confidence interval for $E[f(Y)|X=\chi]$. Then

(A21)  $\mathrm{prob}(\hat{a} > E[f(Y)|X=\chi]) = \mathrm{prob}(f^{-1}(\hat{a}) > M_f[Y|X=\chi])$ and
(A22)  $\mathrm{prob}(\hat{b} < E[f(Y)|X=\chi]) = \mathrm{prob}(f^{-1}(\hat{b}) < M_f[Y|X=\chi])$

Proof. Since f(·) is continuous and strictly monotone over the domain of Y (taken here to be strictly increasing; for strictly decreasing f(·) the inequalities reverse and the endpoint roles swap),

(A23)  $\mathrm{prob}(\hat{a} > E[f(Y)|X=\chi]) = \mathrm{prob}(f^{-1}(\hat{a}) > f^{-1}(E[f(Y)|X=\chi]))$ and
(A24)  $\mathrm{prob}(\hat{b} < E[f(Y)|X=\chi]) = \mathrm{prob}(f^{-1}(\hat{b}) < f^{-1}(E[f(Y)|X=\chi]))$

where $f^{-1}(E[f(Y)|X=\chi]) = M_f[Y|X=\chi]$ by definition.

Note that de Carvalho (2016) also provides a central limit theorem for quasilinear means, using the delta method.
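As a practical illustration of Proposition 5 (a sketch under assumed names, again with f(·) = ln(·)): a standard t-interval for E[ln(Y)] maps, endpoint by endpoint, into an interval for the geometric mean with identical coverage, because exp(·) is strictly increasing.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.8, size=200)
log_y = np.log(y)

# Standard t-interval for E[ln(Y)] ...
a_hat, b_hat = stats.t.interval(0.95, df=len(y) - 1,
                                loc=log_y.mean(), scale=stats.sem(log_y))

# ... maps endpoint by endpoint into an interval for the geometric
# mean M_ln[Y], with identical coverage (exp is strictly increasing)
print(f"95% CI for E[ln Y]:        ({a_hat:.3f}, {b_hat:.3f})")
print(f"95% CI for geometric mean: ({np.exp(a_hat):.3f}, {np.exp(b_hat):.3f})")
```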

Proposition 6. Let f(·) be continuous and strictly monotone over all possible values of random variable Y. Then

(A25)  $g(X) = E[Y|X]$
(A26)  $\Longrightarrow\; f^{-1}(g(X)) = M_f[f^{-1}(Y)|X]$

Proof.

(A27)  $g(X) = E[Y|X]$
(A28)  $= E[f(f^{-1}(Y))|X]$
(A29)  $\Longrightarrow\; f^{-1}(g(X)) = f^{-1}(E[f(f^{-1}(Y))|X])$
(A30)  $= M_f[f^{-1}(Y)|X]$

A.2 Gravity Estimators Target Different Means

A.2.1 The Tinbergen (1962) method

Tinbergen (1962) and many subsequent papers take logs of the gravity equation and estimate the parameters with OLS. Econometrically the method is designed to target $\ln(h(X;\Delta))$ as defined in

(A31)  $\ln(T_{ij}) = \ln(\delta_0) + \delta_1\ln(S_i) + \delta_2\ln(S_j) + \delta_3\ln(D_{ij}) + \ln(\varepsilon_{ij})$
(A32)  $\equiv \ln(h(X;\Delta)) + \ln(\varepsilon_{ij})$

for which

(A33)  $E[\ln(\varepsilon_{ij})|X] = 0 \quad\text{and/or}\quad E[\ln(\varepsilon_{ij})\ln(X)] = 0$

X is vector shorthand for the independent variables in the gravity equation and Δ is vector shorthand for the $\delta_k$ parameters.

The two error conditions in Equation (A33) distinguish whether $\ln(h(X;\Delta))$ is exact for $E[\ln(T_{ij})|X]$ or just an arithmetic approximation of it. If the first error condition holds, the second does too, and $\ln(h(X;\Delta))$ is exact. The OLS estimates $\ln(h(\chi;\hat{\Delta}))$ are then consistent and unbiased for $\ln(h(\chi;\Delta))$, for all χ. If only the second error condition holds, $\ln(h(X;\Delta))$ is an arithmetic approximation, because Δ still minimises the standard quadratic loss function. The OLS estimates are then only consistent. So far these are standard results from the literature.

The first error specification implies

(A34)  $E[\ln(T_{ij})|X] = \ln(\delta_0) + \delta_1\ln(S_i) + \delta_2\ln(S_j) + \delta_3\ln(D_{ij})$
(A35)  $\Longrightarrow\; \exp(E[\ln(T_{ij})|X]) = \delta_0 S_i^{\delta_1} S_j^{\delta_2} D_{ij}^{\delta_3}$
(A36)  $\equiv h(X;\Delta)$

So $h(X;\Delta) \equiv \delta_0 S_i^{\delta_1} S_j^{\delta_2} D_{ij}^{\delta_3}$ is defined to describe the conditional geometric mean of trade. Alternatively, under only the second error specification, $h(X;\Delta)$ is defined to describe a conditional geometric approximation of trade (by Proposition 2).

Regarding estimation, the exp(·) transformations of the fitted values from OLS have the form $h(\chi;\hat{\Delta})$ and, by the continuous mapping theorem and Proposition 4, are attractive estimators of the geometric mean (or approximation) of trade. The vector $\hat{\Delta}$ is thus effective for estimating Δ.
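To make the mechanics concrete, here is a minimal simulation sketch of the Tinbergen method (illustrative only; the data-generating process, parameter values and names are assumptions, not from the paper). OLS on the logged equation recovers Δ, and exp(·) of the fitted values estimates the conditional geometric mean of trade:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

# Simulated gravity data: sizes S_i, S_j, distance D_ij, log-normal errors
S_i, S_j = rng.lognormal(size=n), rng.lognormal(size=n)
D_ij = rng.lognormal(size=n)
X = np.column_stack([np.ones(n), np.log(S_i), np.log(S_j), np.log(D_ij)])
delta = np.array([0.5, 1.0, 1.0, -1.0])           # ln(delta_0), delta_1..3
log_T = X @ delta + rng.normal(0.0, 0.5, size=n)  # E[ln(eps)|X] = 0

# Tinbergen method: OLS on the logged gravity equation
delta_hat, *_ = np.linalg.lstsq(X, log_T, rcond=None)

# exp(.) of the fitted values estimates the conditional geometric mean
geo_mean_hat = np.exp(X @ delta_hat)
print("delta_hat:", np.round(delta_hat, 3))
```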

When Tinbergen used OLS on a logged gravity equation he focused his analysis on a small set of countries. A challenge when working with a large or full set of countries is that half of the sample can record zero trade values (see, for instance, Santos Silva and Tenreyro (2006) and Helpman, Melitz and Rubinstein (2008)). Logs are undefined for those observations and OLS estimation is impossible. One option is to truncate the sample. However, the truncated sample over-represents observations with positive errors, creating an endogeneity problem.

Some recent studies retain the Tinbergen method in a full-country analysis by explicitly modelling the zeros. Estimation in these cases has been conducted with, say, a two-stage Heckman-type procedure (Helpman et al 2008) or maximum likelihood (Eaton and Kortum 2001).[19] The target is still a geometric mean.

A.2.2 The inverse hyperbolic sine method

The IHS method uses OLS after log-transforming the right-hand side of the gravity equation and IHS-transforming the dependent variable. Since the IHS transformation is close to the log transformation, the idea is to solve the zeros problem without materially compromising the original functional relationship. Econometrically, this method targets $\ln(h(X;\Delta))$ as defined in

(A37)  $\sinh^{-1}(T_{ij}) = \ln(\delta_0) + \delta_1\ln(S_i) + \delta_2\ln(S_j) + \delta_3\ln(D_{ij}) + \ln(\varepsilon_{ij})$
(A38)  $\equiv \ln(h(X;\Delta)) + \ln(\varepsilon_{ij})$

for which

(A39)  $E[\ln(\varepsilon_{ij})|X] = 0 \quad\text{and/or}\quad E[\ln(\varepsilon_{ij})\ln(X)] = 0$

The error conditions in Equation (A39) look the same as those in Equation (A33). But they imply a different interpretation of Δ, because the IHS transformation has been applied to the left-hand side of the gravity equation. For instance, the first error specification in Equation (A39) implies that

(A40)  $E[\sinh^{-1}(T_{ij})|X] = \ln(\delta_0) + \delta_1\ln(S_i) + \delta_2\ln(S_j) + \delta_3\ln(D_{ij})$
(A41)  $\Longrightarrow\; \sinh(E[\sinh^{-1}(T_{ij})|X]) = \sinh(\ln(\delta_0) + \ldots + \delta_3\ln(D_{ij}))$
(A42)  $\equiv \sinh(\ln(h(X;\Delta)))$

which also implies that

(A43)  $\exp(E[\ln(T_{ij})|X]) \neq h(X;\Delta)$ and
(A44)  $E[T_{ij}|X] \neq h(X;\Delta)$

In other words, $h(X;\Delta)$ here is not defined to describe a geometric or arithmetic mean of trade, as it was before. But the method does target an IHS mean of trade, with $\sinh(\ln(h(X;\Delta)))$.

Alternatively, under only the second error specification, $\sinh(\ln(h(X;\Delta)))$ is defined to describe a conditional IHS approximation of trade (by Proposition 2). Strictly speaking, Δ no longer contains elasticities, which now depend on X. The elasticities can still be read off the function $\sinh(\ln(h(X;\Delta)))$, though. They will generally be very close to a straight read of Δ, because of the similarity between the IHS and log transformations.

Regarding estimation, sinh(·) transformations of the fitted values from OLS produce estimates of the form $\sinh(\ln(h(\chi;\hat{\Delta})))$. By the continuous mapping theorem and Proposition 4, these are attractive estimators of the conditional IHS mean (or approximation) of trade. Bellemare and Wichman (forthcoming) show how to infer elasticities from this messy, estimated function. Otherwise it is common to base elasticity estimates, crudely, on a straight read of $\hat{\Delta}$.
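A parallel sketch for the IHS method (again illustrative; the simulated data and names are assumptions). The dependent variable is arcsinh-transformed before OLS, sinh(·) of the fitted values estimates the conditional IHS mean, and the distance elasticity of the back-transformed function is $\delta_3\coth(x'\Delta)$, which approaches the straight read $\delta_3$ as fitted trade grows:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000

# Re-create the simulated gravity data from the Tinbergen sketch above
S_i, S_j = rng.lognormal(size=n), rng.lognormal(size=n)
D_ij = rng.lognormal(size=n)
X = np.column_stack([np.ones(n), np.log(S_i), np.log(S_j), np.log(D_ij)])
log_T = X @ np.array([0.5, 1.0, 1.0, -1.0]) + rng.normal(0.0, 0.5, size=n)
T_ij = np.exp(log_T)

# IHS method: OLS with an arcsinh-transformed dependent variable
delta_ihs, *_ = np.linalg.lstsq(X, np.arcsinh(T_ij), rcond=None)

# sinh(.) of the fitted values estimates the conditional IHS mean
ihs_mean_hat = np.sinh(X @ delta_ihs)

# Distance elasticity of the back-transformed function at the median fit:
# delta_3 * coth(x'delta); it approaches delta_3 as fitted trade grows
xb_med = np.median(X @ delta_ihs)
print("straight read:", round(delta_ihs[3], 3),
      "elasticity at median fit:", round(delta_ihs[3] / np.tanh(xb_med), 3))
```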

A.2.3 The Gamma-shifted log method

This method is identical to the IHS method except that it GSL-transforms the dependent variable.[20] By the same logic, it targets parameters that define the conditional GSL mean of trade.
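For completeness, the transformation itself (a sketch; the shift value γ = 1 is purely illustrative, whereas Eaton and Tamura (1994), per footnote [20], treat it as a parameter to estimate):

```python
import numpy as np

gamma = 1.0                             # illustrative shift parameter
T_ij = np.array([0.0, 0.0, 2.5, 40.0])  # zero trade flows now survive
lhs = np.log(gamma + T_ij)              # GSL-transformed dependent variable
```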

Footnotes

Helpman et al (2008) investigate the intensive margin of trade only. [19]

A sophisticated version of this method is Eaton and Tamura (1994). They treat γ as a threshold value below which trade is censored at zero, and use maximum likelihood estimation. [20]