A multivariate version of base::scale() that takes account of the covariance
matrix of the data, and uses robust estimates of center, scale and covariance
by default. The centers are removed using medians, the scale function is the
IQR, and the covariance matrix is estimated using a robust OGK estimate. The
data are scaled using the Cholesky decomposition of the inverse covariance,
and the scaled data are returned. This is useful for computing pairwise
Mahalanobis distances, because Euclidean distances between the scaled
observations equal Mahalanobis distances between the original observations.
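The link between the Cholesky scaling and Mahalanobis distances can be checked with a minimal base R sketch. This is not the package's implementation: it assumes the non-robust estimates colMeans() and stats::cov() in place of the robust defaults, and it uses the built-in datasets::faithful data rather than oldfaithful.

```r
# Sketch only: non-robust estimates stand in for the robust defaults.
x <- as.matrix(datasets::faithful)  # built-in data: eruptions, waiting
mu <- colMeans(x)
S <- stats::cov(x)

# chol() returns an upper-triangular U with t(U) %*% U equal to its argument,
# so here t(U) %*% U = solve(S), the inverse covariance.
U <- chol(solve(S))

# Center each row, then apply U to it: z_i = U (x_i - mu).
z <- sweep(x, 2, mu) %*% t(U)

# Euclidean distances between the first five scaled rows.
d_scaled <- as.matrix(dist(z[1:5, ]))

# Mahalanobis distances from row 1, computed directly from S.
# mahalanobis() returns squared distances, hence the sqrt().
d_direct <- sqrt(mahalanobis(x[1:5, ], center = x[1, ], cov = S))

# Compare the two sets of distances from row 1.
all.equal(unname(d_scaled[, 1]), unname(d_direct))
```

The identity behind the comparison is (x - y)' S^{-1} (x - y) = ||U(x - y)||^2 whenever t(U) %*% U = S^{-1}, so the same check should apply with any positive-definite covariance estimate, robust or not.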
Arguments
- object
A vector, matrix, or data frame containing some numerical data.
- center
A function to compute the center of each numerical variable. Set to NULL if no centering is required.
- scale
A function to scale each numerical variable. When cov = robustbase::covOGK, it is passed as the sigmamu argument.
- cov
A function to compute the covariance matrix. Set to NULL if no rotation is required.
- warning
Should a warning be issued if non-numeric columns are ignored?
Value
A vector, matrix or data frame of the same size and class as object, but with
numerical variables replaced by scaled versions.
Details
Optionally, the centering and scaling can be done for each variable
separately, so there is no rotation of the data, by setting cov = NULL.
Also optionally, non-robust methods can be used by specifying center = mean,
scale = stats::sd, and cov = stats::cov. Any non-numeric columns are retained
unchanged, with a warning unless warning = FALSE.
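What "no rotation" means can be illustrated with a short base R sketch. The helper scale_columns() is hypothetical and is not part of any package. It assumes each column is simply centered and then divided by its scale estimate. Whether mvscale() applies any further normalization to the IQR is not stated here, so the raw IQR is used.

```r
# Hypothetical helper (not from any package): center and scale each column
# separately, with no rotation of the data.
scale_columns <- function(x, center = stats::median, scale = stats::IQR) {
  x <- as.matrix(x)
  mu <- apply(x, 2, center)  # one center per column
  s <- apply(x, 2, scale)    # one scale per column
  sweep(sweep(x, 2, mu, "-"), 2, s, "/")
}

x <- datasets::faithful  # built-in data, used here instead of oldfaithful

# Robust version: medians and IQRs.
z_robust <- scale_columns(x)

# Non-robust version: means and standard deviations.
z_classic <- scale_columns(x, center = mean, scale = stats::sd)

# Compare the non-robust version with base::scale(), ignoring attributes.
all.equal(z_classic, base::scale(x), check.attributes = FALSE)
```

Because each column is handled on its own, correlation between the variables is left untouched. Removing it is the job of the cov argument.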
Examples
# Univariate z-scores (no rotation)
mvscale(oldfaithful, center = mean, scale = sd, cov = NULL, warning = FALSE)
#> # A tibble: 2,261 × 3
#> time duration waiting
#> <dttm> <dbl> <dbl>
#> 1 2015-01-02 14:53:00 0.261 -0.258
#> 2 2015-01-09 23:55:00 0.104 -0.0337
#> 3 2015-02-07 00:49:00 -0.185 -0.166
#> 4 2015-02-14 01:09:00 -0.237 -0.218
#> 5 2015-02-21 01:12:00 -0.139 -0.179
#> 6 2015-02-28 01:11:00 -0.303 -0.153
#> 7 2015-03-07 00:50:00 -0.467 -0.205
#> 8 2015-03-13 21:57:00 -0.0340 -0.0469
#> 9 2015-03-13 23:37:00 -0.270 -0.192
#> 10 2015-03-20 22:26:00 -0.847 -0.496
#> # ℹ 2,251 more rows
# Non-robust scaling with rotation
mvscale(oldfaithful, center = mean, cov = stats::cov, warning = FALSE)
#> # A tibble: 2,261 × 3
#> time z1 z2
#> <dttm> <dbl> <dbl>
#> 1 2015-01-02 14:53:00 0.266 -0.258
#> 2 2015-01-09 23:55:00 0.104 -0.0337
#> 3 2015-02-07 00:49:00 -0.182 -0.166
#> 4 2015-02-14 01:09:00 -0.234 -0.218
#> 5 2015-02-21 01:12:00 -0.136 -0.179
#> 6 2015-02-28 01:11:00 -0.300 -0.153
#> 7 2015-03-07 00:50:00 -0.463 -0.205
#> 8 2015-03-13 21:57:00 -0.0332 -0.0469
#> 9 2015-03-13 23:37:00 -0.267 -0.192
#> 10 2015-03-20 22:26:00 -0.839 -0.496
#> # ℹ 2,251 more rows
# Robust scaling with rotation (default settings)
mvscale(oldfaithful, warning = FALSE)
#> # A tibble: 2,261 × 3
#> time z1 z2
#> <dttm> <dbl> <dbl>
#> 1 2015-01-02 14:53:00 1.93 -1.25
#> 2 2015-01-09 23:55:00 0.0615 0.684
#> 3 2015-02-07 00:49:00 -1.55 -0.456
#> 4 2015-02-14 01:09:00 -1.74 -0.910
#> 5 2015-02-21 01:12:00 -1.18 -0.568
#> 6 2015-02-28 01:11:00 -2.43 -0.342
#> 7 2015-03-07 00:50:00 -3.42 -0.796
#> 8 2015-03-13 21:57:00 -0.873 0.570
#> 9 2015-03-13 23:37:00 -2.07 -0.682
#> 10 2015-03-20 22:26:00 -5.15 -3.30
#> # ℹ 2,251 more rows
# Robust Mahalanobis distances
oldfaithful |>
  dplyr::select(-time) |>
  mvscale() |>
  head(5) |>
  dist()
#>           1         2         3         4
#> 2 2.6919702
#> 3 3.5671576 1.9718769
#> 4 3.6897826 2.4089549 0.4950822
#> 5 3.1820624 1.7618897 0.3861243 0.6617578