A multivariate version of base::scale(), that takes account
of the covariance matrix of the data. By default, robust estimates are used:
the centers are removed using medians,
the scale function for univariate data is s_Qn,
and the covariance matrix for multivariate data is estimated using a robust MCD estimate.
The data are scaled using the Cholesky decomposition of
the inverse (co)variance. Then the scaled data are returned.
Details of the methods are provided by Hyndman (2026).
Arguments
- object
A vector, matrix, or data frame containing some numerical data.
- center
A function to compute the center of each numerical variable. Set to NULL if no centering is required.
- scale
A function to scale each numerical variable. When
cov = robustbase::covOGK(),scaleis passed as thesigmamuargument. Whencov = robustbase::covMcd(),scaleis passed as thescalefnargument.- cov
A function to compute the covariance matrix. Set to NULL if no rotation required.
cov()must either return the matrix directly, or a list containing a matrix namedcov.- alpha
When
cov = robustbase::covMcd(),alphacontrols the size of the subsets over which the determinant is minimized. Otherwise it is ignored. Set to 0.9 by default.- warning
Should a warning be issued if non-numeric columns are ignored?
- ...
Other arguments are passed to
cov().
Value
A vector, matrix or data frame of the same size and class as object,
but with numerical variables replaced by scaled versions (renamed if they have been rotated).
Details
Optionally, the centering and scaling can be done for each variable
separately, by setting cov = NULL, so there is no rotation of the data,
Also optionally, non-robust methods can be used by specifying center = mean,
scale = stats::sd(), and cov = stats::cov(). Any non-numeric columns are retained
with a warning.
Missing values are removed before the centers, scale and cov are estimated.
Examples
# Univariate z-scores
z <- mvscale(oldfaithful$duration, center = mean, scale = sd)
# Non-robust scaling with no rotation
oldfaithful |>
mvscale(center = mean, scale = sd, cov = NULL, warning = FALSE)
#> # A tibble: 2,097 × 4
#> time recorded_duration duration waiting
#> <dttm> <chr> <dbl> <dbl>
#> 1 2017-01-14 00:06:00 3m 16s -0.651 0.455
#> 2 2017-01-26 14:27:00 ~4m 0.291 0.290
#> 3 2017-01-27 23:57:00 2m 1s -2.26 -2.35
#> 4 2017-01-30 15:09:00 ~4m 0.291 -0.453
#> 5 2017-01-31 13:27:00 ~3.5m -0.352 -0.0404
#> 6 2017-01-31 15:00:00 ~4m 0.291 0.207
#> 7 2017-02-03 23:13:00 3m 25s -0.459 -0.618
#> 8 2017-02-04 22:14:00 3m 34s -0.266 -0.288
#> 9 2017-02-05 17:19:00 4m 0s 0.291 0.620
#> 10 2017-02-05 19:00:00 4m 2s 0.334 0.620
#> # ℹ 2,087 more rows
# Non-robust scaling with rotation
oldfaithful |>
mvscale(center = mean, scale = sd, cov = stats::cov, warning = FALSE)
#> # A tibble: 2,097 × 4
#> time recorded_duration z1 z2
#> <dttm> <chr> <dbl> <dbl>
#> 1 2017-01-14 00:06:00 3m 16s -1.65 0.455
#> 2 2017-01-26 14:27:00 ~4m 0.101 0.290
#> 3 2017-01-27 23:57:00 2m 1s -0.653 -2.35
#> 4 2017-01-30 15:09:00 ~4m 1.06 -0.453
#> 5 2017-01-31 13:27:00 ~3.5m -0.521 -0.0404
#> 6 2017-01-31 15:00:00 ~4m 0.207 0.207
#> 7 2017-02-03 23:13:00 3m 25s 0.0483 -0.618
#> 8 2017-02-04 22:14:00 3m 34s -0.0627 -0.288
#> 9 2017-02-05 17:19:00 4m 0s -0.324 0.620
#> 10 2017-02-05 19:00:00 4m 2s -0.254 0.620
#> # ℹ 2,087 more rows
# Robust scaling and rotation
oldfaithful |>
mvscale(warning = FALSE)
#> # A tibble: 2,097 × 4
#> time recorded_duration z1 z2
#> <dttm> <chr> <dbl> <dbl>
#> 1 2017-01-14 00:06:00 3m 16s -2.34 0.394
#> 2 2017-01-26 14:27:00 ~4m -0.0524 0.131
#> 3 2017-01-27 23:57:00 2m 1s -4.28 -4.07
#> 4 2017-01-30 15:09:00 ~4m 0.419 -1.05
#> 5 2017-01-31 13:27:00 ~3.5m -1.33 -0.394
#> 6 2017-01-31 15:00:00 ~4m 0 0
#> 7 2017-02-03 23:13:00 3m 25s -1.21 -1.31
#> 8 2017-02-04 22:14:00 3m 34s -0.976 -0.787
#> 9 2017-02-05 17:19:00 4m 0s -0.262 0.656
#> 10 2017-02-05 19:00:00 4m 2s -0.163 0.656
#> # ℹ 2,087 more rows