# -*- coding: utf-8 -*-
"""Sandwich covariance estimators


Created on Sun Nov 27 14:10:57 2011

Author: Josef Perktold
Author: Skipper Seabold for HCxxx in linear_model.RegressionResults
License: BSD-3

Notes
-----

For calculating the sandwich covariance matrix, we have two versions.

version 1: use pinv
    pinv(x) scale pinv(x)' is what linear_model currently uses, where scale
    is 1d (or a diagonal matrix), i.e.
    (x'x)^(-1) x' scale x (x'x)^(-1); scale in general is (nobs, nobs) and so
    pretty large. General formulas for scale in the cluster case are in [4],
    which can be found (as of 2017-05-20) at
    http://www.tandfonline.com/doi/abs/10.1198/jbes.2010.07136
    This paper also has the second version.

version 2:
    (x'x)^(-1) S (x'x)^(-1)  with  S = x' scale x;  S is (k_vars, k_vars) and
    (x'x)^(-1) is available as normalized_cov_params.

S = sum (x*u) dot (x*u)' = sum x*u*u'*x'  where the sum can aggregate over
observations or over groups. u is the regression residual.

x is (nobs, k_vars)
u is (nobs, 1)
x*u is (nobs, k_vars)

For cluster robust standard errors, we first sum (x*u) over the other groups
(including time) and then take the dot product (sum of outer products)

S = sum_g (x*u)' dot sum_g (x*u)

For HAC by clusters, we first sum over groups for each time period, and then
use HAC on the group sums of (x*u).
If we have several groups, we have to sum first over all relevant groups, and
then take the outer product sum. This can be done by summing with indicator
functions or matrices or with explicit loops. Alternatively, we calculate
separate covariance matrices for each group, sum them, and subtract the
double-counted intersection.

Not yet checked in detail: degrees of freedom or small sample correction
factors, see (two) references (?)

This is the general case for MLE and GMM as well.

In MLE, with hessian H and the outer product of the jacobian S, the sandwich
is cov_hjjh = HJJH, which reduces to the above in the linear case, but can be
used generally, e.g. in discrete models, and is misnamed in
GenericLikelihoodModel.

In GMM it is similar, but I would have to look up the details (it comes out
in sandwich form by default; it's in the sandbox); standard Newey-West or
similar corrections are applied to the covariance matrix of the moment
conditions.

quasi-MLE: MLE with a mis-specified model where the parameter estimates are
fine (consistent?) but cov_params needs to be adjusted similarly to, or the
same as, the sandwiches here. (I did not go through any details yet.)

TODO
----
* small sample correction factors, Done for cluster, not yet for HAC
* automatic lag-length selection for Newey-West HAC,
  -> added: nlags = floor[4(T/100)^(2/9)]  Reference: xtscc paper, Newey-West
     note this will not be optimal in the panel context, see Petersen
* HAC should maybe return the chosen nlags
* get consistent notation, varies by paper, S, scale, sigma?
* replace diag(hat_matrix) calculations in cov_hc2, cov_hc3

References
----------
[1] John C. Driscoll and Aart C. Kraay, "Consistent Covariance Matrix Estimation
    with Spatially Dependent Panel Data," Review of Economics and Statistics 80,
    no. 4 (1998): 549-560.

[2] Daniel Hoechle, "Robust Standard Errors for Panel Regressions with
    Cross-Sectional Dependence," The Stata Journal.

[3] Mitchell A. Petersen, "Estimating Standard Errors in Finance Panel Data
    Sets: Comparing Approaches," Review of Financial Studies 22, no. 1
    (January 1, 2009): 435-480.

[4] A. Colin Cameron, Jonah B. Gelbach, and Douglas L. Miller, "Robust Inference
    With Multiway Clustering," Journal of Business and Economic Statistics 29
    (April 2011): 238-249.

not used yet:
A.C. Cameron, J.B. Gelbach, and D.L. Miller, "Bootstrap-based improvements
for inference with clustered errors," The Review of Economics and
Statistics 90, no. 3 (2008): 414-427.

"""

import numpy as np

from statsmodels.tools.grouputils import combine_indices, group_sums
from statsmodels.stats.moment_helpers import se_cov

__all__ = ['cov_cluster', 'cov_cluster_2groups', 'cov_hac', 'cov_nw_panel',
           'cov_white_simple',
           'cov_hc0', 'cov_hc1', 'cov_hc2', 'cov_hc3',
           'se_cov', 'weights_bartlett', 'weights_uniform']
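
# Illustrative sketch (not part of the original module): the two sandwich
# "versions" described in the module notes coincide for a full column rank
# design.  All names below (`x`, `u`, `scale`) are hypothetical example data.
#
#     import numpy as np
#     rng = np.random.RandomState(12345)
#     x = rng.standard_normal((100, 3))       # design matrix (nobs, k_vars)
#     u = rng.standard_normal(100)            # residuals
#     scale = u**2                            # heteroscedasticity scale, (nobs,)
#     pinv_x = np.linalg.pinv(x)              # (X'X)^(-1) X'
#     xxi = np.linalg.inv(x.T @ x)            # (X'X)^(-1)
#     # version 1: pinv(x) diag(scale) pinv(x)'
#     h1 = pinv_x @ (scale[:, None] * pinv_x.T)
#     # version 2: (X'X)^(-1) S (X'X)^(-1) with S = X' diag(scale) X = (x*u)'(x*u)
#     s = (x * u[:, None]).T @ (x * u[:, None])
#     h2 = xxi @ s @ xxi
#     assert np.allclose(h1, h2)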


#----------- from linear_model.RegressionResults
'''
    HC0_se
        White's (1980) heteroskedasticity robust standard errors.
        Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)) X (X.T X)^(-1)))
        where e_i = resid[i]
        HC0_se is a property.  It is not evaluated until it is called.
        When it is called the RegressionResults instance will then have
        another attribute cov_HC0, which is the full heteroskedasticity
        consistent covariance matrix and also `het_scale`, which is in
        this case just resid**2.  HCCM matrices are only appropriate for OLS.
    HC1_se
        MacKinnon and White's (1985) alternative heteroskedasticity robust
        standard errors.
        Defined as sqrt(diag(n/(n-p)*HC_0))
        HC1_se is a property.  It is not evaluated until it is called.
        When it is called the RegressionResults instance will then have
        another attribute cov_HC1, which is the full HCCM and also `het_scale`,
        which is in this case n/(n-p)*resid**2.  HCCM matrices are only
        appropriate for OLS.
    HC2_se
        MacKinnon and White's (1985) alternative heteroskedasticity robust
        standard errors.
        Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)/(1-h_ii)) X (X.T X)^(-1)))
        where h_ii = x_i (X.T X)^(-1) x_i.T
        HC2_se is a property.  It is not evaluated until it is called.
        When it is called the RegressionResults instance will then have
        another attribute cov_HC2, which is the full HCCM and also `het_scale`,
        which in this case is resid^(2)/(1-h_ii).  HCCM matrices are only
        appropriate for OLS.
    HC3_se
        MacKinnon and White's (1985) alternative heteroskedasticity robust
        standard errors.
        Defined as sqrt(diag((X.T X)^(-1) X.T diag(e_i^(2)/(1-h_ii)^(2)) X (X.T X)^(-1)))
        where h_ii = x_i (X.T X)^(-1) x_i.T
        HC3_se is a property.  It is not evaluated until it is called.
        When it is called the RegressionResults instance will then have
        another attribute cov_HC3, which is the full HCCM and also `het_scale`,
        which in this case is resid^(2)/(1-h_ii)^(2).  HCCM matrices are
        only appropriate for OLS.

'''


# Note: HCCM stands for Heteroskedasticity Consistent Covariance Matrix
def _HCCM(results, scale):
    '''
    sandwich with pinv(x) * diag(scale) * pinv(x).T

    where pinv(x) = (X'X)^(-1) X'
    and scale is (nobs,)
    '''
    H = np.dot(results.model.pinv_wexog,
               scale[:, None] * results.model.pinv_wexog.T)
    return H


def cov_hc0(results):
    """
    See statsmodels.RegressionResults
    """

    het_scale = results.resid**2  # or whitened residuals? only OLS?
    cov_hc0 = _HCCM(results, het_scale)

    return cov_hc0


def cov_hc1(results):
    """
    See statsmodels.RegressionResults
    """

    het_scale = results.nobs / (results.df_resid) * (results.resid**2)
    cov_hc1 = _HCCM(results, het_scale)
    return cov_hc1


def cov_hc2(results):
    """
    See statsmodels.RegressionResults
    """

    # probably could be optimized
    h = np.diag(np.dot(results.model.exog,
                       np.dot(results.normalized_cov_params,
                              results.model.exog.T)))
    het_scale = results.resid**2 / (1 - h)
    cov_hc2_ = _HCCM(results, het_scale)
    return cov_hc2_


def cov_hc3(results):
    """
    See statsmodels.RegressionResults
    """

    # as above, probably could be optimized to only calc the diag
    h = np.diag(np.dot(results.model.exog,
                       np.dot(results.normalized_cov_params,
                              results.model.exog.T)))
    het_scale = (results.resid / (1 - h))**2
    cov_hc3_ = _HCCM(results, het_scale)
    return cov_hc3_
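
# Illustrative usage sketch (not part of the original module): the cov_hc*
# functions take a fitted OLS results instance; `se_cov` turns the covariance
# matrix into standard errors.  `y` and `x` are hypothetical example data.
#
#     import numpy as np
#     import statsmodels.api as sm
#     rng = np.random.RandomState(12345)
#     x = sm.add_constant(rng.standard_normal((200, 2)))
#     y = x @ np.array([1.0, 0.5, -0.2]) + rng.standard_normal(200)
#     res = sm.OLS(y, x).fit()
#     se_hc0 = se_cov(cov_hc0(res))   # same as res.HC0_se
#     se_hc3 = se_cov(cov_hc3(res))   # same as res.HC3_se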


#---------------------------------------


def _get_sandwich_arrays(results, cov_type=''):
    """Helper function to get scores and inverse hessian from results

    Parameters
    ----------
    results : results instance or tuple of (jac, hessian_inv)
        If a results instance is given, the score observations (or
        exog * resid in the linear model case) and the inverse hessian
        (or normalized_cov_params) are taken from it.
    cov_type : str
        only used to skip the freq_weights adjustment when computing
        cluster robust ('clu') covariances

    Returns
    -------
    xu : ndarray, (nobs, k_params)
        score observations, or whitened exog times residual
    hessian_inv : ndarray, (k_params, k_params)
        inverse hessian, or normalized_cov_params
    """
    if isinstance(results, tuple):
        # assume we have jac and hessian_inv
        jac, hessian_inv = results
        xu = jac = np.asarray(jac)
        hessian_inv = np.asarray(hessian_inv)
    elif hasattr(results, 'model'):
        if hasattr(results, '_results'):
            # remove wrapper
            results = results._results
        # assume we have a results instance
        if hasattr(results.model, 'jac'):
            xu = results.model.jac(results.params)
            hessian_inv = np.linalg.inv(results.model.hessian(results.params))
        elif hasattr(results.model, 'score_obs'):
            xu = results.model.score_obs(results.params)
            hessian_inv = np.linalg.inv(results.model.hessian(results.params))
        else:
            xu = results.model.wexog * results.wresid[:, None]

            hessian_inv = np.asarray(results.normalized_cov_params)

        # experimental support for freq_weights
        if hasattr(results.model, 'freq_weights') and not cov_type == 'clu':
            # we do not want to square the weights in the covariance calculations
            # assumes that freq_weights are incorporated in score_obs or equivalent
            # assumes xu/score_obs is 2D
            # temporary asarray
            xu /= np.sqrt(np.asarray(results.model.freq_weights)[:, None])

    else:
        raise ValueError('need either tuple of (jac, hessian_inv) or results '
                         'instance')

    return xu, hessian_inv


def _HCCM1(results, scale):
    '''
    sandwich with pinv(x) * scale * pinv(x).T

    where pinv(x) = (X'X)^(-1) X'
    and scale is (nobs, nobs), or (nobs,) treated as the diagonal matrix
    diag(scale)

    Parameters
    ----------
    results : result instance
        need to contain regression results, uses results.model.pinv_wexog
    scale : ndarray (nobs,) or (nobs, nobs)
        scale matrix, treated as diagonal matrix if scale is one-dimensional

    Returns
    -------
    H : ndarray (k_vars, k_vars)
        robust covariance matrix for the parameter estimates

    '''
    if scale.ndim == 1:
        H = np.dot(results.model.pinv_wexog,
                   scale[:, None] * results.model.pinv_wexog.T)
    else:
        H = np.dot(results.model.pinv_wexog,
                   np.dot(scale, results.model.pinv_wexog.T))
    return H


def _HCCM2(hessian_inv, scale):
    '''
    sandwich with (X'X)^(-1) * scale * (X'X)^(-1)

    scale is (k_vars, k_vars)
    the bread (X'X)^(-1) is passed in as `hessian_inv`, e.g.
    results.normalized_cov_params in the linear model case

    Parameters
    ----------
    hessian_inv : ndarray (k_vars, k_vars)
        inverse hessian or normalized_cov_params, used as the bread of the
        sandwich
    scale : ndarray (k_vars, k_vars)
        scale matrix

    Returns
    -------
    H : ndarray (k_vars, k_vars)
        robust covariance matrix for the parameter estimates

    '''
    if scale.ndim == 1:
        scale = scale[:, None]

    xxi = hessian_inv
    H = np.dot(np.dot(xxi, scale), xxi.T)
    return H
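
# Illustrative sketch (not part of the original module): the public cov_*
# functions below all follow the same two-step pattern, e.g. for the White
# case (assuming `res` is a fitted OLS results instance):
#
#     xu, hessian_inv = _get_sandwich_arrays(res)    # meat ingredients, bread
#     S = S_white_simple(xu)                         # inner (meat) matrix
#     cov = _HCCM2(hessian_inv, S)                   # bread * meat * bread
#     # equals cov_hc0(res) and cov_white_simple(res, use_correction=False)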


#TODO: other kernels, move ?
def weights_bartlett(nlags):
    '''Bartlett weights for HAC

    this will be moved to another module

    Parameters
    ----------
    nlags : int
        highest lag in the kernel window, this does not include the zero lag

    Returns
    -------
    kernel : ndarray, (nlags+1,)
        weights for Bartlett kernel

    '''

    #with lag zero
    return 1 - np.arange(nlags + 1) / (nlags + 1.)


def weights_uniform(nlags):
    '''uniform weights for HAC

    this will be moved to another module

    Parameters
    ----------
    nlags : int
        highest lag in the kernel window, this does not include the zero lag

    Returns
    -------
    kernel : ndarray, (nlags+1,)
        weights for uniform kernel

    '''

    #with lag zero
    return np.ones(nlags + 1)


kernel_dict = {'bartlett': weights_bartlett,
               'uniform': weights_uniform}
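
# Concrete illustration (not part of the original module): for nlags=4 the
# Bartlett kernel weights decline linearly from the zero lag, while the
# uniform kernel weights all included lags equally.
#
#     weights_bartlett(4)   # array([1. , 0.8, 0.6, 0.4, 0.2])
#     weights_uniform(4)    # array([1., 1., 1., 1., 1.])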


def S_hac_simple(x, nlags=None, weights_func=weights_bartlett):
    '''inner covariance matrix for HAC (Newey, West) sandwich

    assumes we have a single time series where axis 0 runs over consecutive,
    equally spaced time periods


    Parameters
    ----------
    x : ndarray (nobs,) or (nobs, k_var)
        data, for HAC this is the array of x_i * u_i
    nlags : int or None
        highest lag to include in kernel window. If None, then
        nlags = floor(4(T/100)^(2/9)) is used.
    weights_func : callable
        weights_func is called with nlags as argument to get the kernel
        weights. default are Bartlett weights

    Returns
    -------
    S : ndarray, (k_vars, k_vars)
        inner covariance matrix for sandwich

    Notes
    -----
    used by cov_hac_simple

    options might change when other kernels besides Bartlett are available.

    '''

    if x.ndim == 1:
        x = x[:, None]
    n_periods = x.shape[0]
    if nlags is None:
        nlags = int(np.floor(4 * (n_periods / 100.)**(2. / 9.)))

    weights = weights_func(nlags)

    S = weights[0] * np.dot(x.T, x)  # weights[0] just for completeness, is 1

    for lag in range(1, nlags + 1):
        s = np.dot(x[lag:].T, x[:-lag])
        S += weights[lag] * (s + s.T)

    return S


def S_white_simple(x):
    '''inner covariance matrix for White heteroscedasticity sandwich


    Parameters
    ----------
    x : ndarray (nobs,) or (nobs, k_var)
        data, for HAC this is the array of x_i * u_i

    Returns
    -------
    S : ndarray, (k_vars, k_vars)
        inner covariance matrix for sandwich

    Notes
    -----
    this is just dot(X.T, X)

    '''
    if x.ndim == 1:
        x = x[:, None]

    return np.dot(x.T, x)
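
# Illustrative check (not part of the original module): with nlags=0 the HAC
# inner matrix collapses to the White inner matrix, since only the lag-zero
# term survives.  `xu` is a hypothetical (nobs, k_vars) array of x_i * u_i.
#
#     import numpy as np
#     xu = np.random.RandomState(0).standard_normal((50, 3))
#     assert np.allclose(S_hac_simple(xu, nlags=0), S_white_simple(xu))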


def S_hac_groupsum(x, time, nlags=None, weights_func=weights_bartlett):
    '''inner covariance matrix for HAC over group sums sandwich

    This assumes we have complete, equally spaced time periods.
    The number of time periods per group need not be the same, but we need
    at least one observation for each time period.

    For a single categorical group only, i.e. everything except the time
    dimension. This first aggregates x over groups for each time period, then
    applies HAC to the sum per period.

    Parameters
    ----------
    x : ndarray (nobs,) or (nobs, k_var)
        data, for HAC this is the array of x_i * u_i
    time : ndarray, (nobs,)
        time indices, assumed to be integers in range(n_periods)
    nlags : int or None
        highest lag to include in kernel window. If None, then
        nlags = floor[4(T/100)^(2/9)] is used.
    weights_func : callable
        weights_func is called with nlags as argument to get the kernel
        weights. default are Bartlett weights

    Returns
    -------
    S : ndarray, (k_vars, k_vars)
        inner covariance matrix for sandwich

    References
    ----------
    Daniel Hoechle, xtscc paper
    Driscoll and Kraay

    '''
    #needs groupsums

    x_group_sums = group_sums(x, time).T  # TODO: transpose return in group_sums

    return S_hac_simple(x_group_sums, nlags=nlags, weights_func=weights_func)


def S_crosssection(x, group):
    '''inner covariance matrix for White on group sums sandwich

    Intended for a single categorical group only; the group can also be the
    product/intersection of several groups.

    This is used by cov_cluster and indirectly verified

    '''
    x_group_sums = group_sums(x, group).T  # TODO: why transposed

    return S_white_simple(x_group_sums)


def cov_crosssection_0(results, group):
    '''this one is still wrong, use cov_cluster instead'''

    #TODO: currently used version of groupsums requires 2d resid
    scale = S_crosssection(results.resid[:, None], group)

    scale = np.squeeze(scale)
    cov = _HCCM1(results, scale)
    return cov


def cov_cluster(results, group, use_correction=True):
    '''cluster robust covariance matrix

    Calculates sandwich covariance matrix for a single cluster, i.e. grouped
    variables.

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    group : ndarray, (nobs,)
        cluster labels for each observation; converted to integer codes if
        not already integers
    use_correction : bool
        If True (default), then the small sample correction factor is used.

    Returns
    -------
    cov : ndarray, (k_vars, k_vars)
        cluster robust covariance matrix for parameter estimates

    Notes
    -----
    same result as Stata in UCLA example and same as Petersen

    '''
    #TODO: currently used version of groupsums requires 2d resid
    xu, hessian_inv = _get_sandwich_arrays(results, cov_type='clu')

    if not hasattr(group, 'dtype') or group.dtype != np.dtype('int'):
        clusters, group = np.unique(group, return_inverse=True)
    else:
        clusters = np.unique(group)

    scale = S_crosssection(xu, group)

    nobs, k_params = xu.shape
    n_groups = len(clusters)  # replace with stored group attributes if available

    cov_c = _HCCM2(hessian_inv, scale)

    if use_correction:
        cov_c *= (n_groups / (n_groups - 1.) *
                  ((nobs - 1.) / float(nobs - k_params)))

    return cov_c
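
# Illustrative usage sketch (not part of the original module): cluster robust
# standard errors for an OLS fit.  `firm_id` is a hypothetical integer array
# of cluster labels, one per observation.
#
#     import numpy as np
#     import statsmodels.api as sm
#     rng = np.random.RandomState(12345)
#     firm_id = np.repeat(np.arange(50), 10)          # 50 clusters of 10 obs
#     x = sm.add_constant(rng.standard_normal((500, 2)))
#     y = x @ np.array([1.0, 0.5, -0.2]) + rng.standard_normal(500)
#     res = sm.OLS(y, x).fit()
#     cov = cov_cluster(res, firm_id)
#     bse_cluster = se_cov(cov)                       # cluster robust std. errors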


def cov_cluster_2groups(results, group, group2=None, use_correction=True):
    '''cluster robust covariance matrix for two groups/clusters

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    group : ndarray, (nobs,) or (nobs, 2)
        labels of the first cluster, or, if group2 is None, an array with two
        columns containing the labels of both clusters
    group2 : ndarray, (nobs,) or None
        labels of the second cluster; required unless group has two columns
    use_correction : bool
        If True (default), then the small sample correction factor is used.

    Returns
    -------
    cov_both : ndarray, (k_vars, k_vars)
        cluster robust covariance matrix for parameter estimates, for both
        clusters
    cov_0 : ndarray, (k_vars, k_vars)
        cluster robust covariance matrix for parameter estimates for first
        cluster
    cov_1 : ndarray, (k_vars, k_vars)
        cluster robust covariance matrix for parameter estimates for second
        cluster

    Notes
    -----

    verified against Petersen's table (4 decimal print precision)
    '''

    if group2 is None:
        if group.ndim != 2 or group.shape[1] != 2:
            raise ValueError('if group2 is not given, then group needs to be '
                             'an array with two columns')
        group0 = group[:, 0]
        group1 = group[:, 1]
    else:
        group0 = group
        group1 = group2
        group = (group0, group1)


    cov0 = cov_cluster(results, group0, use_correction=use_correction)
    #[0] because cov_cluster used to also return bse
    cov1 = cov_cluster(results, group1, use_correction=use_correction)

    # cov of cluster formed by intersection of two groups
    cov01 = cov_cluster(results,
                        combine_indices(group)[0],
                        use_correction=use_correction)

    #robust cov matrix for union of groups
    cov_both = cov0 + cov1 - cov01

    #return all three (for now?)
    return cov_both, cov0, cov1
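
# Illustrative usage sketch (not part of the original module): two-way (firm
# and time) clustered covariance, as in the Petersen test data.  `firm_id`
# and `res` are hypothetical, as in the cov_cluster sketch above; `year` is a
# second hypothetical label array.
#
#     year = np.tile(np.arange(10), 50)               # second cluster dimension
#     cov_both, cov_firm, cov_year = cov_cluster_2groups(res, firm_id,
#                                                        group2=year)
#     bse_two_way = se_cov(cov_both)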


def cov_white_simple(results, use_correction=True):
    '''
    heteroscedasticity robust covariance matrix (White)

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    use_correction : bool
        If True (default), then the small sample correction factor
        nobs / (nobs - k_params) is used.

    Returns
    -------
    cov : ndarray, (k_vars, k_vars)
        heteroscedasticity robust covariance matrix for parameter estimates

    Notes
    -----
    With use_correction=False, this produces the same result as cov_hc0;
    the default applies the small sample correction factor above.

    verified (against LinearRegressionResults and Petersen)

    See Also
    --------
    cov_hc1, cov_hc2, cov_hc3 : heteroscedasticity robust covariance matrices
        with small sample corrections

    '''
    xu, hessian_inv = _get_sandwich_arrays(results)
    sigma = S_white_simple(xu)

    cov_w = _HCCM2(hessian_inv, sigma)  # add bread to sandwich

    if use_correction:
        nobs, k_params = xu.shape
        cov_w *= nobs / float(nobs - k_params)

    return cov_w


def cov_hac_simple(results, nlags=None, weights_func=weights_bartlett,
                   use_correction=True):
    '''
    heteroscedasticity and autocorrelation robust covariance matrix (Newey-West)

    Assumes we have a single time series where axis 0 runs over consecutive,
    equally spaced time periods.

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    nlags : int or None
        highest lag to include in kernel window. If None, then
        nlags = floor[4(T/100)^(2/9)] is used.
    weights_func : callable
        weights_func is called with nlags as argument to get the kernel
        weights. default are Bartlett weights
    use_correction : bool
        If True (default), then the small sample correction factor
        nobs / (nobs - k_params) is used.

    Returns
    -------
    cov : ndarray, (k_vars, k_vars)
        HAC robust covariance matrix for parameter estimates

    Notes
    -----
    verified only for nlags=0, which is just White
    just guessing on correction factor, need reference

    options might change when other kernels besides Bartlett are available.

    '''
    xu, hessian_inv = _get_sandwich_arrays(results)
    sigma = S_hac_simple(xu, nlags=nlags, weights_func=weights_func)

    cov_hac = _HCCM2(hessian_inv, sigma)

    if use_correction:
        nobs, k_params = xu.shape
        cov_hac *= nobs / float(nobs - k_params)

    return cov_hac


cov_hac = cov_hac_simple  # alias for users
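
# Illustrative usage sketch (not part of the original module): Newey-West HAC
# standard errors for a single time series regression.  `y` and `x` are
# hypothetical example data ordered by time.
#
#     import numpy as np
#     import statsmodels.api as sm
#     rng = np.random.RandomState(12345)
#     x = sm.add_constant(np.arange(200.))
#     y = x @ np.array([1.0, 0.05]) + rng.standard_normal(200)
#     res = sm.OLS(y, x).fit()
#     cov = cov_hac(res, nlags=4)      # or nlags=None for the automatic rule
#     bse_hac = se_cov(cov)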


#---------------------- use time lags corrected for groups
#the following were copied from a different experimental script,
#groupidx is a tuple, observations are assumed to be stacked by group member
#and sorted by time; an equal number of periods is not required, but equal
#spacing is.
#I think this is pure within-group HAC: apply HAC to each group member
#separately


def lagged_groups(x, lag, groupidx):
    '''
    assumes sorted by time, groupidx is tuple of start and end values
    not optimized, just to get a working version, loop over groups
    '''
    out0 = []
    out_lagged = []
    for l, u in groupidx:
        if l + lag < u:  # group is longer than lag
            out0.append(x[l + lag:u])
            out_lagged.append(x[l:u - lag])

    if out0 == []:
        raise ValueError('all groups are empty after taking lags')
    #return out0, out_lagged
    return np.vstack(out0), np.vstack(out_lagged)


def S_nw_panel(xw, weights, groupidx):
    '''inner covariance matrix for HAC for panel data

    no denominator nobs used

    no reference for this, just accounting for time indices
    '''
    nlags = len(weights) - 1

    S = weights[0] * np.dot(xw.T, xw)  # weights[0] just for completeness
    for lag in range(1, nlags + 1):
        xw0, xwlag = lagged_groups(xw, lag, groupidx)
        s = np.dot(xw0.T, xwlag)
        S += weights[lag] * (s + s.T)
    return S


def cov_nw_panel(results, nlags, groupidx, weights_func=weights_bartlett,
                 use_correction='hac'):
    '''Panel HAC robust covariance matrix

    Assumes we have a panel of time series with consecutive, equally spaced
    time periods. Data is assumed to be in long format with the time series
    of each individual stacked into one array. The panel can be unbalanced.

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    nlags : int or None
        Highest lag to include in kernel window. Currently, no default
        because the optimal length will depend on the number of observations
        per cross-sectional unit.
    groupidx : list of tuple
        each tuple should contain the start and end index for an individual.
        (groupidx might change in future).
    weights_func : callable
        weights_func is called with nlags as argument to get the kernel
        weights. default are Bartlett weights
    use_correction : 'cluster' or 'hac' or False
        If False, then no small sample correction is used.
        If 'hac' (default), then the same correction as in the single time
        series case, cov_hac, is used.
        If 'cluster', then the same correction as in cov_cluster is used.

    Returns
    -------
    cov : ndarray, (k_vars, k_vars)
        HAC robust covariance matrix for parameter estimates

    Notes
    -----
    For nlags=0, this is just White covariance, cov_white.
    If the kernel is uniform, `weights_uniform`, with nlags equal to the
    number of observations per unit in a balanced panel, then cov_cluster and
    cov_hac_panel are identical.

    Tested against STATA `newey` command with same defaults.

    Options might change when other kernels besides Bartlett and uniform are
    available.

    '''
    if nlags == 0:  # so we can reproduce HC0 White
        weights = [1, 0]  # to avoid the scalar check in hac_nw
    else:
        weights = weights_func(nlags)

    xu, hessian_inv = _get_sandwich_arrays(results)

    S_hac = S_nw_panel(xu, weights, groupidx)
    cov_hac = _HCCM2(hessian_inv, S_hac)
    if use_correction:
        nobs, k_params = xu.shape
        if use_correction == 'hac':
            cov_hac *= nobs / float(nobs - k_params)
        elif use_correction in ['c', 'clu', 'cluster']:
            n_groups = len(groupidx)
            cov_hac *= n_groups / (n_groups - 1.)
            cov_hac *= ((nobs - 1.) / float(nobs - k_params))

    return cov_hac
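
# Illustrative usage sketch (not part of the original module): panel HAC for
# long-format data stacked by individual and sorted by time within each
# individual.  `n_units` and `n_periods` describe hypothetical balanced data
# and `res` is the corresponding fitted OLS results instance.
#
#     n_units, n_periods = 50, 10
#     groupidx = [(i * n_periods, (i + 1) * n_periods) for i in range(n_units)]
#     cov = cov_nw_panel(res, 3, groupidx, use_correction='hac')
#     bse_panel_hac = se_cov(cov)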


def cov_nw_groupsum(results, nlags, time, weights_func=weights_bartlett,
                    use_correction=0):
    '''Driscoll and Kraay panel robust covariance matrix

    Robust covariance matrix for panel data of Driscoll and Kraay.

    Assumes we have a panel of time series where the time index is available.
    The time index is assumed to represent equally spaced periods. At least
    one observation per period is required.

    Parameters
    ----------
    results : result instance
        result of a regression, uses results.model.exog and results.resid
        TODO: this should use wexog instead
    nlags : int or None
        Highest lag to include in kernel window. Currently, no default
        because the optimal length will depend on the number of observations
        per cross-sectional unit.
    time : ndarray of int
        this should contain the coding for the time period of each
        observation. time periods should be integers in range(maxT), where
        maxT is the number of time periods.
    weights_func : callable
        weights_func is called with nlags as argument to get the kernel
        weights. default are Bartlett weights
    use_correction : 'cluster' or 'hac' or False
        If False (default), then no small sample correction is used.
        If 'hac', then the same correction as in the single time series case,
        cov_hac, is used.
        If 'cluster', then the same correction as in cov_cluster is used.

    Returns
    -------
    cov : ndarray, (k_vars, k_vars)
        HAC robust covariance matrix for parameter estimates

    Notes
    -----
    Tested against STATA xtscc package, which uses no small sample correction

    This first averages relevant variables for each time period over all
    individuals/groups, and then applies the same kernel weighted averaging
    over time as in HAC.

    Warning:
    In the example with a short panel (few time periods and many individuals)
    with mainly across-individual variation, this estimator did not produce
    reasonable results.

    Options might change when other kernels besides Bartlett and uniform are
    available.

    References
    ----------
    Daniel Hoechle, xtscc paper
    Driscoll and Kraay

    '''
    xu, hessian_inv = _get_sandwich_arrays(results)

    #S_hac = S_nw_panel(xw, weights, groupidx)
    S_hac = S_hac_groupsum(xu, time, nlags=nlags, weights_func=weights_func)
    cov_hac = _HCCM2(hessian_inv, S_hac)
    if use_correction:
        nobs, k_params = xu.shape
        if use_correction == 'hac':
            cov_hac *= nobs / float(nobs - k_params)
        elif use_correction in ['c', 'cluster']:
            n_groups = len(np.unique(time))
            cov_hac *= n_groups / (n_groups - 1.)
            cov_hac *= ((nobs - 1.) / float(nobs - k_params))

    return cov_hac
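
# Illustrative usage sketch (not part of the original module): Driscoll-Kraay
# standard errors using an integer time code per observation.  `res` is a
# hypothetical fitted OLS results instance on balanced long-format data, as
# in the panel HAC sketch above.
#
#     import numpy as np
#     n_units, n_periods = 50, 10
#     time = np.tile(np.arange(n_periods), n_units)   # time code per observation
#     cov = cov_nw_groupsum(res, 3, time, use_correction='hac')
#     bse_dk = se_cov(cov)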