Introduction to the compenginets package
Yangzhuoran Fin Yang
2024-02-28
Source: vignettes/compenginets.Rmd
The goal of compenginets is to provide access from R to the time series hosted on CompEngine (https://www.comp-engine.org/).
Installation
compenginets is not currently on CRAN. You can install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("robjhyndman/compenginets")
CompEngine: A self-organizing database of time-series data
CompEngine is an online time-series database that allows users to upload data and interactively compare it with similar time series. The website was built by Nick Jones and Ben Fulcher, based on earlier work by Ben D. Fulcher, Max A. Little, and Nick S. Jones (2013). To find time series similar to the data a user uploads, it computes features of the uploaded data and finds existing time series that match those features. The feature list and detailed descriptions can be found on the CompEngine website.
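The feature-matching idea described above can be illustrated with a minimal sketch in base R. The three features, the toy database, and the Euclidean distance used here are all assumptions chosen for illustration; they are not CompEngine's actual feature set or matching procedure.

```r
# Illustrative only: three hand-picked summary features,
# NOT the feature set CompEngine computes.
simple_features <- function(x) {
  x <- as.numeric(x)
  c(
    mean = mean(x),
    sd   = sd(x),
    acf1 = acf(x, lag.max = 1, plot = FALSE)$acf[2]  # lag-1 autocorrelation
  )
}

set.seed(1)

# A toy "database" of three series (hypothetical data)
database <- list(
  noise      = rnorm(200),
  randomwalk = cumsum(rnorm(200)),
  sine       = sin(seq(0, 20 * pi, length.out = 200))
)

# Feature matrix: one row per stored series, one column per feature
feature_matrix <- t(sapply(database, simple_features))

# An "uploaded" series to match against the database
uploaded <- cumsum(rnorm(200))
uploaded_features <- simple_features(uploaded)

# Standardise each feature with the database's column means and sds,
# so that no single feature dominates the distance
centre <- colMeans(feature_matrix)
spread <- apply(feature_matrix, 2, sd)
scaled_db  <- sweep(sweep(feature_matrix, 2, centre), 2, spread, "/")
scaled_new <- (uploaded_features - centre) / spread

# Euclidean distance from the uploaded series to each stored series
distances <- sqrt(rowSums(sweep(scaled_db, 2, scaled_new)^2))
nearest <- names(which.min(distances))
nearest
```

The stored series with the smallest distance in feature space is reported as the closest match. A real system would use many more features and a larger database.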
Usage
This package aims to provide an easy way to access data from CompEngine in R. The function get_cets returns time series with a specified name or within a certain category. By default, get_cets returns the first 10 pages of time series (at most 10 series per page) within the category that matches the argument key.
library(compenginets)
# Get series within Finance category (including subcategory)
cets_finance <- get_cets("finance")
length(cets_finance)
#> [1] 100
str(cets_finance[[1]])
#> Time-Series [1:4197] from 1 to 4197: 6890 6783 6553 6453 6508 ...
#> - attr(*, "name")= chr "M4_D3333_Finance_1"
#> - attr(*, "description")= chr ""
#> - attr(*, "samplingInformation.name")= chr "M4_D3333_Finance_1"
#> - attr(*, "samplingInformation.description")= chr ""
#> - attr(*, "samplingInformation.samplingInformation")='data.frame': 1 obs. of 2 variables:
#> ..$ samplingRate: chr "1.00"
#> ..$ samplingUnit: chr "/day"
#> - attr(*, "tags")= chr [1:3] "finance" "M4" "Daily"
#> - attr(*, "category.name")= chr "Finance"
#> - attr(*, "category.uri")= chr "real/finance/"
#> - attr(*, "sfi.name")= chr [1:22] "DN_HistogramMode_5" "DN_HistogramMode_10" "CO_Embed2_Dist_tau_d_expfit_meandiff" "CO_f1ecac" ...
#> - attr(*, "sfi.prettyName")= chr [1:22] "DN_HistogramMode_5" "DN_HistogramMode_10" "CO_Embed2_Dist_tau_d_expfit_meandiff" "CO_f1ecac" ...
#> - attr(*, "sfi.value")= num [1:22] 36 56.9 72.9 56.8 50 ...
#> - attr(*, "source")= chr NA
# Set the number of pages to retrieve with the maxpage argument
# Each page holds at most 10 time series
cets_finance_20 <- get_cets("finance", maxpage = 2)
length(cets_finance_20)
#> [1] 20
# Set category = FALSE to get the time series matching a name
W138_finance_m4 <- get_cets("M4_W138_Finance_1", category = FALSE)
str(W138_finance_m4)
#> Time-Series [1:1044] from 1 to 1044: 2062 2086 2026 2076 2077 ...
#> - attr(*, "name")= chr "M4_W138_Finance_1"
#> - attr(*, "description")= chr ""
#> - attr(*, "samplingInformation.samplingRate")= chr "1.00"
#> - attr(*, "samplingInformation.samplingUnit")= chr "/week"
#> - attr(*, "tags")= chr [1:3] "finance" "M4" "weekly"
#> - attr(*, "category.name")= chr "Finance"
#> - attr(*, "category.uri")= chr "real/finance/"
#> - attr(*, "sfi.name")= chr [1:22] "DN_HistogramMode_5" "DN_HistogramMode_10" "CO_Embed2_Dist_tau_d_expfit_meandiff" "CO_f1ecac" ...
#> - attr(*, "sfi.prettyName")= chr [1:22] "DN_HistogramMode_5" "DN_HistogramMode_10" "CO_Embed2_Dist_tau_d_expfit_meandiff" "CO_f1ecac" ...
#> - attr(*, "sfi.value")= num [1:22] 84 86.8 65.2 14.1 23.8 ...
#> - attr(*, "source")= logi NA
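The str() output above suggests that the metadata travels with each series as attributes. As a sketch of how that metadata might be read back, the example below builds a mock ts object with a similar attribute layout, so it runs without network access. The object mock_series, its values, and the helper has_tag are hypothetical and not part of compenginets.

```r
# Mock object mimicking the attribute layout shown in the str() output
# (values are made up for illustration; no call to CompEngine is made)
mock_series <- structure(
  ts(c(2062, 2086, 2026, 2076, 2077)),
  name = "mock_series",
  tags = c("finance", "M4", "weekly"),
  category.name = "Finance",
  sfi.name = c("DN_HistogramMode_5", "DN_HistogramMode_10"),
  sfi.value = c(84, 86.8)
)

# Read a single attribute; exact = TRUE avoids partial name matching
series_tags <- attr(mock_series, "tags", exact = TRUE)

# Combine the feature names and values into one data frame
features <- data.frame(
  feature = attr(mock_series, "sfi.name", exact = TRUE),
  value   = attr(mock_series, "sfi.value", exact = TRUE)
)

# Hypothetical helper: does a series carry a given tag?
has_tag <- function(x, tag) tag %in% attr(x, "tags", exact = TRUE)

# Keep only the series tagged "weekly" from a list of series
series_list <- list(mock_series)
weekly_only <- Filter(function(x) has_tag(x, "weekly"), series_list)
```

The same pattern should apply to a list returned by get_cets, provided its elements carry the attributes shown in the output above.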
A list of categories can be obtained with category_scraping().
cate_path <- category_scraping()
str(cate_path, list.len = 10)
#> List of 195
#> $ real : chr [1:55] "real" "audio" "ecology" "economics" ...
#> $ synthetic : chr [1:139] "synthetic" "flow" "iterative map" "periodic" ...
#> $ unassigned : chr "unassigned"
#> $ audio : chr [1:7] "audio" "animal sounds" "human speech" "music" ...
#> $ ecology : chr [1:2] "ecology" "zooplankton growth"
#> $ economics : chr "economics"
#> $ finance : chr [1:8] "finance" "crude oil prices" "exchange rate" "gas prices" ...
#> $ industry : chr "industry"
#> $ medical : chr [1:10] "medical" "boc" "chest volume" "ecg" ...
#> $ meteorology : chr [1:14] "meteorology" "air pressure" "air temperature" "carbon dioxide" ...
#> [list output truncated]
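Judging from the output above, each element of the list appears to hold a category name followed by the categories nested under it. Under that assumption, the sketch below shows two ways such a list might be queried. The list cate_path_demo is a hand-copied, truncated subset of the printed output, and the helpers subcategories and containing are hypothetical, not part of compenginets.

```r
# Hand-copied subset of the printed output (truncated entries left incomplete)
cate_path_demo <- list(
  finance = c("finance", "crude oil prices", "exchange rate", "gas prices"),
  medical = c("medical", "boc", "chest volume", "ecg"),
  ecology = c("ecology", "zooplankton growth")
)

# Hypothetical helper: categories nested under `category`,
# assuming the element also lists the category itself
subcategories <- function(paths, category) {
  setdiff(paths[[category]], category)
}

# Hypothetical helper: names of the elements that list `category`
containing <- function(paths, category) {
  names(Filter(function(members) category %in% members, paths))
}

finance_children <- subcategories(cate_path_demo, "finance")
ecg_parents <- containing(cate_path_demo, "ecg")
```

A name found this way could then be tried as the key passed to get_cets, although whether every listed name is accepted as a key is not confirmed here.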