-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong F-statistics for robust SE #631
Comments
Wu-Hausman is off, and the f.stat for first-stage diagnostics in IV2SLS is twice as high as it should be with robust SE. It is also using chi2-distribution instead of F-distribution to compute the p-values. |
Can you provide a code example that shows what you are seeing? |
I can't reproduce. When I change As for chi2 vs F, I always use chi2(df) distributions rather than using the "asymptotic F" which is just chi2(df) / df. |
Sure. Take this data set (the source is ivreg, see below): Q P D F A
0 98.48 100.32 87.40 98.00 1
1 99.19 104.26 97.60 99.10 2
2 102.16 103.44 96.70 99.10 3
3 101.50 104.51 98.20 98.10 4
4 104.24 98.00 99.80 110.80 5
5 103.24 99.46 100.50 108.20 6
6 103.99 101.07 103.20 105.60 7
7 99.90 104.76 107.80 109.80 8
8 100.35 96.45 96.60 108.70 9
9 102.82 91.23 88.90 100.60 10
10 95.44 93.08 75.10 81.00 11
11 92.42 98.80 76.90 68.60 12
12 94.53 102.91 84.60 70.90 13
13 98.76 98.76 90.60 81.40 14
14 105.80 95.12 103.10 102.30 15
15 100.22 98.45 105.10 105.00 16
16 103.52 86.50 96.40 110.50 17
17 99.93 104.02 104.40 92.50 18
18 105.22 105.77 110.70 89.30 19
19 106.23 113.49 127.10 93.00 20 Run this: def run_diagnostics(mods):
print("\n----- Coef. SE ------")
for cov, res in mods.items():
print(f"{cov:10s}: {res.std_errors.tolist()}")
print("\n----- Breusch-Pagan Test for Heteroskedasticity ------")
for cov, res in mods.items():
endo = res.model.endog.pandas.assign(Intercept=1)
residuals = res.resids
bp_test = sms.het_breuschpagan(residuals, endo)
print(f"{cov:10s}: {bp_test}")
print("\n----- Wu-Hausman Test ------")
for cov, res in mods.items():
print(f"{cov:10s}:\n {res.wu_hausman()}")
print("\n----- Sargan ------")
for cov, res in mods.items():
print(f"{cov:10s}:\n {res.sargan}")
print("\n----- Basmann ------")
for cov, res in mods.items():
print(f"{cov:10s}:\n {res.basmann}")
print("\n----- First-stage diagnostics ------")
for cov, res in mods.items():
print(f"{cov:10s}:\n {res.first_stage.diagnostics}")
res_unadj = ivreg.from_formula("Q ~ 1 + [P ~ F + A] + D", data=df).fit(cov_type='unadjusted')
res_robust = ivreg.from_formula("Q ~ 1 + [P ~ F + A] + D", data=df).fit(cov_type='robust')
mods = {'Unadjusted': res_unadj,
'Robust' : res_robust}
run_diagnostics(mods) You will get ----- Coef. SE ------
Unadjusted: [7.3026520951185425, 0.04327991369214111, 0.08895412123517202]
Robust : [5.1474532209912285, 0.04292534502530685, 0.07589901329404808]
----- Breusch-Pagan Test for Heteroskedasticity ------
Unadjusted: (0.01697789593123389, 0.8963296692336946, 0.015293088561413301, 0.9029507926709841)
Robust : (0.01697789593123389, 0.8963296692336946, 0.015293088561413301, 0.9029507926709841)
----- Wu-Hausman Test ------
Unadjusted:
Wu-Hausman test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 11.7617
P-value: 0.0034
Distributed: F(1,16)
Robust :
Wu-Hausman test of exogeneity
H0: All endogenous variables are exogenous
Statistic: 11.7617
P-value: 0.0034
Distributed: F(1,16)
----- Sargan ------
Unadjusted:
Sargan's test of overidentification
H0: The model is not overidentified.
Statistic: 2.9831
P-value: 0.0841
Distributed: chi2(1)
Robust :
Sargan's test of overidentification
H0: The model is not overidentified.
Statistic: 2.9831
P-value: 0.0841
Distributed: chi2(1)
----- Basmann ------
Unadjusted:
Basmann's test of overidentification
H0: The model is not overidentified.
Statistic: 2.8049
P-value: 0.0940
Distributed: chi2(1)
Robust :
Basmann's test of overidentification
H0: The model is not overidentified.
Statistic: 2.8049
P-value: 0.0940
Distributed: chi2(1)
----- First-stage diagnostics ------
Unadjusted:
rsquared partial.rsquared shea.rsquared f.stat f.pval f.dist
P 0.94 0.92 0.92 110.03 0.00 F(2,16)
Robust :
rsquared partial.rsquared shea.rsquared f.stat f.pval f.dist
P 0.94 0.92 0.92 284.68 0.00 chi2(2) So, the only adjustment happens for the SE of the coefficients and the first-starge F-stat, but the latter does not match other implementations. See ivreg in R, for instance. data("Kmenta", package = "ivreg")
res <- ivreg::ivreg(Q ~ D + P | D + F + A, data=df)
summary(res, vcov=sandwich::sandwich) You get: Call:
ivreg(formula = Q ~ D + P | D + F + A, data = df)
Residuals:
Min 1Q Median 3Q Max
-3.430 -1.243 -0.189 1.576 2.492
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 94.6333 5.1475 18.38 0.0000000000012 ***
D 0.3140 0.0429 7.31 0.0000012080527 ***
P -0.2436 0.0759 -3.21 0.0051 **
Diagnostic tests:
df1 df2 statistic p-value
Weak instruments 2 16 142.34 0.000000000064 ***
Wu-Hausman 1 16 21.90 0.00025 ***
Sargan 1 NA 2.98 0.08414 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 1.97 on 17 degrees of freedom
Multiple R-Squared: 0.755, Adjusted R-squared: 0.726
Wald test: 34.4 on 2 and 17 DF, p-value: 0.00000105
Your F-stat is twice the one here (see line "weak instruments") |
The F-stats for the diagnostics are all wrong when using robust/clustered SE. They are simply not being updated with robust/clustered cases. And the documentation is incorrect for cov_type='cluster'. It should be "clustered".
The text was updated successfully, but these errors were encountered: