Functions to smooth demographic data — smooth

These functions smooth a variable in a vital object. The vital object is returned with additional columns describing the smoothed variable: usually .smooth, containing the smoothed values, and .smooth_se, containing the corresponding standard errors.

Usage

smooth_spline(.data, .var, age_spacing = 1, k = -1)

smooth_mortality(.data, .var, age_spacing = 1, b = 65, power = 0.4, k = 30)

smooth_fertility(.data, .var, age_spacing = 1, lambda = 1e-10)

smooth_loess(.data, .var, age_spacing = 1, span = 0.2)

Arguments

.data: A vital object
.var: name of variable to smooth
age_spacing: Spacing between ages for smoothed vital. Default is 1.
k: Number of knots to use for the penalized regression spline estimate. Default is -1 for smooth_spline() and 30 for smooth_mortality().
b: Lower age for monotonicity. Above this age, the smooth curve is assumed to be monotonically increasing. Default is 65.
power: Power transformation for age variable before smoothing. Default is 0.4 (for mortality data).
lambda: Penalty for the constrained regression spline. Default is 1e-10.
span: Span for the loess smooth. Default is 0.2.
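
As a minimal sketch of how these arguments are passed, the call below overrides two of the defaults listed under Usage. It assumes the vital and dplyr packages are installed; the values b = 70 and k = 20 are illustrative assumptions, not recommendations.

```r
library(vital)
library(dplyr)

# Same subset as the package example, but with non-default settings.
# b = 70 and k = 20 are illustrative values chosen for this sketch.
smoothed <- aus_mortality |>
  filter(State == "Victoria", Sex == "female", Year > 2000) |>
  smooth_mortality(Mortality, b = 70, k = 20)

# The smoothed values and standard errors are added as new columns.
names(smoothed)
```

Arguments that are not named in the call (here age_spacing and power) keep their defaults.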

Value

A vital object with added columns containing the smoothed values and their standard errors.

Details

smooth_mortality() uses penalized regression splines applied to log mortality with a monotonicity constraint above age b. The methodology is based on Wood (1994). smooth_fertility() uses weighted regression B-splines with a concavity constraint, based on He and Ng (1999). smooth_loess() uses locally quadratic regression, while smooth_spline() uses penalized regression splines.
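
To make "locally quadratic regression" concrete, the following base R sketch fits a degree-2 loess curve to synthetic log-mortality rates. It is not the package's implementation: the simulated data, the choice to smooth on the log scale, and the delta-method standard errors are all assumptions made for illustration. Only the span of 0.2 and the output column names are taken from this page.

```r
# Synthetic data for illustration only: a roughly log-linear mortality
# curve with added noise. These are not real rates.
set.seed(1)
age <- 0:100
log_rate <- -9 + 0.085 * age + rnorm(length(age), sd = 0.2)

# Locally quadratic regression: degree = 2, with the same span as the
# smooth_loess() default shown under Usage.
fit <- loess(log_rate ~ age, span = 0.2, degree = 2)
pred <- predict(fit, newdata = data.frame(age = age), se = TRUE)

# Back-transform to the rate scale. The standard error uses a
# delta-method approximation: se(exp(x)) is roughly exp(x) * se(x).
smoothed <- data.frame(
  age = age,
  rate = exp(log_rate),
  .smooth = exp(pred$fit),
  .smooth_se = exp(pred$fit) * pred$se.fit
)
head(smoothed)
```

A smaller span follows the raw rates more closely, while a larger span gives a smoother curve at the cost of flattening local features.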

References

Hyndman, R.J., and Ullah, S. (2007) Robust forecasting of mortality and fertility rates: a functional data approach. Computational Statistics & Data Analysis, 51, 4942-4956. https://robjhyndman.com/publications/funcfor/

Author

Rob J Hyndman

Examples

library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
aus_mortality |>
  filter(State == "Victoria", Sex == "female", Year > 2000) |>
  smooth_mortality(Mortality)
#> # A vital: 2,020 x 10 [1Y]
#> # Key:     Age x (Sex, State, Code) [101 x 1]
#>     Year   Age Sex    State   Code  Mortality Exposure Deaths .smooth .smooth_se
#>    <int> <dbl> <chr>  <chr>   <chr>     <dbl>    <dbl>  <dbl> <dbl[1>  <dbl[1d]>
#>  1  2001     0 female Victor… VIC   0.00404     29229. 118.   4.02e-3  0.000347 
#>  2  2001     1 female Victor… VIC   0.000405    29654.  12    3.90e-4  0.0000789
#>  3  2001     2 female Victor… VIC   0.000201    29832.   6    2.16e-4  0.0000404
#>  4  2001     3 female Victor… VIC   0.000134    29859.   4.01 1.54e-4  0.0000287
#>  5  2001     4 female Victor… VIC   0.000165    30328.   5.01 1.25e-4  0.0000233
#>  6  2001     5 female Victor… VIC   0.0000652   30698.   2    1.09e-4  0.0000204
#>  7  2001     6 female Victor… VIC   0.0000959   31286.   3    1.02e-4  0.0000189
#>  8  2001     7 female Victor… VIC   0.0000945   31748.   3    9.95e-5  0.0000181
#>  9  2001     8 female Victor… VIC   0.0000943   31810.   3    1.01e-4  0.0000178
#> 10  2001     9 female Victor… VIC   0.0000627   31893.   2    1.05e-4  0.0000180
#> # ℹ 2,010 more rows
aus_fertility |>
  filter(Year > 2000) |>
  smooth_fertility(Fertility)
#> # A vital: 210 x 7 [1Y]
#> # Key:     Age [35 x 1]
#>     Year   Age Fertility Exposure Births .smooth .smooth_se
#>    <int> <dbl>     <dbl>    <dbl>  <dbl>   <dbl>      <dbl>
#>  1  2001    15   0.00320   132027   423. 0.00320    0.00333
#>  2  2001    16   0.00728   133096   969. 0.00755    0.00709
#>  3  2001    17   0.0158    131433  2075. 0.0158     0.0133 
#>  4  2001    18   0.0249    133123  3313. 0.0260     0.0196 
#>  5  2001    19   0.0372    132398  4931. 0.0357     0.0239 
#>  6  2001    20   0.0435    131377  5721. 0.0435     0.0258 
#>  7  2001    21   0.0485    127985  6202. 0.0499     0.0261 
#>  8  2001    22   0.0581    126901  7373. 0.0572     0.0262 
#>  9  2001    23   0.0656    127134  8336. 0.0656     0.0263 
#> 10  2001    24   0.0749    128239  9599. 0.0749     0.0264 
#> # ℹ 200 more rows