A multivariate version of base::scale() that takes account of the covariance matrix of the data and, by default, uses robust estimates of center, scale and covariance. With the default arguments, each variable is centered by its median, the scale function is the IQR, and the covariance matrix is estimated using the robust OGK estimator. The centered data are then rotated and scaled using the Cholesky decomposition of the inverse covariance matrix, and the scaled data are returned. This is useful for computing pairwise Mahalanobis distances, because Euclidean distances between rows of the scaled data equal Mahalanobis distances (with respect to the estimated covariance) between rows of the original data.
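The rotation step can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes the non-robust estimates (colMeans() and stats::cov()) in place of the robust defaults, and the simulated matrix x is purely illustrative. It checks that the squared Euclidean distance between two scaled rows matches the squared Mahalanobis distance between the original rows.

```r
# Sketch only: classical (non-robust) estimates stand in for the defaults.
set.seed(1)
# Simulated correlated bivariate data (illustrative, not from the package)
x <- matrix(rnorm(200), ncol = 2) %*% matrix(c(2, 1, 0, 1), 2)

S <- stats::cov(x)                      # covariance estimate
centered <- sweep(x, 2, colMeans(x))    # remove the column centers

# chol() returns an upper-triangular U with t(U) %*% U equal to solve(S)
U <- chol(solve(S))
z <- centered %*% t(U)                  # rotated and scaled data

# Squared Euclidean distance between scaled rows 1 and 2 ...
d2_scaled <- sum((z[1, ] - z[2, ])^2)
# ... and the squared Mahalanobis distance between the original rows
d2_mahal <- stats::mahalanobis(x[1, ], center = x[2, ], cov = S)

all.equal(d2_scaled, d2_mahal)
```

With these classical estimates, the scaled data z also have an identity sample covariance matrix, which is why no further weighting is needed when computing distances.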

Usage

mvscale(
  object,
  center = stats::median,
  scale = robustbase::s_IQR,
  cov = robustbase::covOGK,
  warning = TRUE
)

Arguments

object

A vector, matrix, or data frame containing some numerical data.

center

A function to compute the center of each numerical variable. Set to NULL if no centering is required.

scale

A function to scale each numerical variable. When cov = robustbase::covOGK, it is passed as the sigmamu argument.

cov

A function to compute the covariance matrix. Set to NULL if no rotation is required.

warning

Should a warning be issued if non-numeric columns are ignored?
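To show how a scale function is supplied to the OGK estimator, here is a small sketch that calls robustbase::covOGK() directly with sigmamu = robustbase::s_IQR. It assumes the robustbase package is installed, uses simulated data with a few planted outliers, and is not a description of what mvscale() does internally.

```r
# Sketch only: requires the robustbase package; the data are simulated.
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(2)
  x <- matrix(rnorm(300), ncol = 3)
  x[1:5, ] <- x[1:5, ] + 10             # plant a few gross outliers

  # The scale function is supplied through the sigmamu argument
  fit <- robustbase::covOGK(x, sigmamu = robustbase::s_IQR)

  print(fit$cov)        # robust covariance estimate
  print(stats::cov(x))  # classical estimate, for comparison
}
```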

Value

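A vector, matrix or data frame of the same size and class as object, but with numerical variables replaced by scaled versions. When a rotation is applied, the scaled columns are named z1, z2, and so on, as shown in the examples.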

Details

Setting cov = NULL centers and scales each variable separately, so the data are not rotated. Non-robust methods can be used instead by specifying center = mean, scale = stats::sd, and cov = stats::cov. Non-numeric columns are not scaled; they are retained unchanged in the output, and a warning is issued unless warning = FALSE.
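The per-variable case (no rotation) amounts to subtracting a center and dividing by a scale. The sketch below uses base R only and a made-up vector x. The robust version assumes the IQR is rescaled by 2 * qnorm(0.75) so that it is consistent with the standard deviation for normal data; that constant is an assumption of this sketch rather than something stated on this page.

```r
# Sketch only: x is an illustrative vector with one large value.
x <- c(2, 4, 4, 5, 7, 9, 30)

# Classical z-scores: mean and standard deviation
z_classic <- (x - mean(x)) / stats::sd(x)

# Robust analogue: median and a rescaled IQR (rescaling constant assumed)
robust_scale <- stats::IQR(x) / (2 * stats::qnorm(0.75))
z_robust <- (x - stats::median(x)) / robust_scale

# The classical version agrees with base::scale()
all.equal(z_classic, as.numeric(base::scale(x)))
```

The large value inflates mean(x) and sd(x), so it is less extreme on the classical scale than on the robust one.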

Author

Rob J Hyndman

Examples

# Univariate z-scores (no rotation)
mvscale(oldfaithful, center = mean, scale = sd, cov = NULL, warning = FALSE)
#> # A tibble: 2,261 × 3
#>    time                duration waiting
#>    <dttm>                 <dbl>   <dbl>
#>  1 2015-01-02 14:53:00   0.261  -0.258 
#>  2 2015-01-09 23:55:00   0.104  -0.0337
#>  3 2015-02-07 00:49:00  -0.185  -0.166 
#>  4 2015-02-14 01:09:00  -0.237  -0.218 
#>  5 2015-02-21 01:12:00  -0.139  -0.179 
#>  6 2015-02-28 01:11:00  -0.303  -0.153 
#>  7 2015-03-07 00:50:00  -0.467  -0.205 
#>  8 2015-03-13 21:57:00  -0.0340 -0.0469
#>  9 2015-03-13 23:37:00  -0.270  -0.192 
#> 10 2015-03-20 22:26:00  -0.847  -0.496 
#> # ℹ 2,251 more rows
# Non-robust scaling with rotation
mvscale(oldfaithful, center = mean, cov = stats::cov, warning = FALSE)
#> # A tibble: 2,261 × 3
#>    time                     z1      z2
#>    <dttm>                <dbl>   <dbl>
#>  1 2015-01-02 14:53:00  0.266  -0.258 
#>  2 2015-01-09 23:55:00  0.104  -0.0337
#>  3 2015-02-07 00:49:00 -0.182  -0.166 
#>  4 2015-02-14 01:09:00 -0.234  -0.218 
#>  5 2015-02-21 01:12:00 -0.136  -0.179 
#>  6 2015-02-28 01:11:00 -0.300  -0.153 
#>  7 2015-03-07 00:50:00 -0.463  -0.205 
#>  8 2015-03-13 21:57:00 -0.0332 -0.0469
#>  9 2015-03-13 23:37:00 -0.267  -0.192 
#> 10 2015-03-20 22:26:00 -0.839  -0.496 
#> # ℹ 2,251 more rows
# Robust scaling with rotation (default arguments)
mvscale(oldfaithful, warning = FALSE)
#> # A tibble: 2,261 × 3
#>    time                    z1     z2
#>    <dttm>               <dbl>  <dbl>
#>  1 2015-01-02 14:53:00  1.91  -1.42 
#>  2 2015-01-09 23:55:00  0.149  0.777
#>  3 2015-02-07 00:49:00 -1.71  -0.518
#>  4 2015-02-14 01:09:00 -1.97  -1.03 
#>  5 2015-02-21 01:12:00 -1.33  -0.645
#>  6 2015-02-28 01:11:00 -2.63  -0.388
#>  7 2015-03-07 00:50:00 -3.74  -0.904
#>  8 2015-03-13 21:57:00 -0.862  0.647
#>  9 2015-03-13 23:37:00 -2.29  -0.775
#> 10 2015-03-20 22:26:00 -5.90  -3.75 
#> # ℹ 2,251 more rows
# Robust Mahalanobis distances
oldfaithful |>
  select(-time) |>
  mvscale() |>
  head(5) |>
  dist()
#>           1         2         3         4
#> 2 2.8170543                              
#> 3 3.7249234 2.2623734                    
#> 4 3.8979507 2.7884156 0.5800668          
#> 5 3.3248738 2.0486310 0.4013823 0.7538288