Title: | Latent Dirichlet Allocation Coupled with Time Series Analyses |
---|---|
Description: | Combines Latent Dirichlet Allocation (LDA) and Bayesian multinomial time series methods in a two-stage analysis to quantify dynamics in high-dimensional temporal data. LDA decomposes multivariate data into lower-dimension latent groupings, whose relative proportions are modeled using generalized Bayesian time series models that include abrupt changepoints and smooth dynamics. The methods are described in Blei et al. (2003) <doi:10.1162/jmlr.2003.3.4-5.993>, Western and Kleykamp (2004) <doi:10.1093/pan/mph023>, Venables and Ripley (2002, ISBN-13:978-0387954578), and Christensen et al. (2018) <doi:10.1002/ecy.2373>. |
Authors: | Juniper L. Simonis [aut, cre] |
Maintainer: | Juniper L. Simonis <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.7 |
Built: | 2025-03-03 06:05:45 UTC |
Source: | https://github.com/weecology/ldats |
Calculate the small sample size correction of AIC for the input object.
AICc(object)
object |
numeric value of AICc.
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(dat)
AICc(mod)
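The correction itself can be sketched in base R. This is a minimal illustration that assumes the standard small-sample formula, AICc = AIC + 2k(k + 1) / (n - k - 1), with k and n read from the df and nobs attributes of logLik. The name aicc_sketch is hypothetical and is not the package function.

```r
# Hypothetical helper illustrating the small-sample correction of AIC.
# Assumes logLik(object) carries "df" and "nobs" attributes.
aicc_sketch <- function(object) {
  ll <- logLik(object)
  k <- attr(ll, "df")    # number of estimated parameters
  n <- attr(ll, "nobs")  # number of observations
  AIC(object) + (2 * k * (k + 1)) / (n - k - 1)
}

set.seed(1)
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(y ~ x, data = dat)
aicc_sketch(mod)
```

The penalty term shrinks toward 0 as n grows relative to k, so the corrected value approaches plain AIC for large samples.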
Produce a vanilla ACF plot using acf for the parameter of interest (rho or eta) as part of TS_diagnostics_plot.
autocorr_plot(x)
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
NULL.
autocorr_plot(rnorm(100, 0, 1))
Check that the change point locations are numeric and conformable to integer values.
check_changepoints(changepoints = NULL)
changepoints |
Change point locations to evaluate. |
An error message is thrown if changepoints are not proper, else NULL.
check_changepoints(100)
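As a rough illustration of what "numeric and conformable to integer" can mean, the sketch below tests that a value is numeric and has no fractional part. The helper is_integer_conformable is a hypothetical name, and the package's own check may differ in detail (for example, in how it treats NULL or missing values).

```r
# Hypothetical check: numeric input whose values have no fractional part.
is_integer_conformable <- function(x) {
  is.numeric(x) && all(x %% 1 == 0)
}

is_integer_conformable(100)          # whole number stored as a double
is_integer_conformable(c(10.5, 20))  # has a fractional value
is_integer_conformable("100")        # not numeric
```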
Check that a list of controls is of the right class.
check_control(control, eclass = "list")
control |
Control list to evaluate. |
eclass |
Expected class of the list to be evaluated. |
An error message is thrown if the input is improper, otherwise NULL.
check_control(list())
Check that the table of document-level covariates is conformable to a data frame and of the right size (correct number of documents) for the document-topic output from the LDA models.
check_document_covariate_table(document_covariate_table, LDA_models = NULL, document_term_table = NULL)
document_covariate_table |
Document covariate table to evaluate. |
LDA_models |
Reference LDA model list (class |
document_term_table |
Optional input for checking when
|
An error message is thrown if document_covariate_table is not proper, else NULL.
data(rodents)
check_document_covariate_table(rodents$document_covariate_table)
Check that the table of observations is conformable to a matrix of integers.
check_document_term_table(document_term_table)
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
An error message is thrown if the input is improper, otherwise NULL.
data(rodents)
check_document_term_table(rodents$document_term_table)
Check that formula is actually a formula and that the response and predictor variables are all included in data.
check_formula(data, formula)
data |
|
formula |
|
An error message is thrown if formula is not proper, else NULL.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
check_formula(data, gamma ~ 1)
Check that the vector of formulas is actually formatted as a vector of formula objects and that the predictor variables are all included in the document covariate table.
check_formulas(formulas, document_covariate_table, control = list())
formulas |
Vector of the formulas to evaluate. |
document_covariate_table |
Document covariate table used to evaluate the availability of the data required by the formula inputs. |
control |
A |
An error message is thrown if formulas is not proper, else NULL.
data(rodents)
check_formulas(~ 1, rodents$document_covariate_table)
Check that the LDA_models input is either a set of LDA models (class LDA_set, produced by LDA_set) or a single LDA model (class LDA, produced by LDA).
check_LDA_models(LDA_models)
LDA_models |
List of LDA models or singular LDA model to evaluate. |
An error message is thrown if LDA_models is not proper, else NULL.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2, nseeds = 1)
LDA_models <- select_LDA(LDAs)
check_LDA_models(LDA_models)
Check that the vector of numbers of changepoints is conformable to non-negative integers (0 is a valid value, as in the example).
check_nchangepoints(nchangepoints)
nchangepoints |
Vector of the number of changepoints to evaluate. |
An error message is thrown if nchangepoints is not proper, else NULL.
check_nchangepoints(0)
check_nchangepoints(2)
Check that the vector of numbers of seeds is conformable to integers greater than 0.
check_seeds(nseeds)
nseeds |
|
An error message is thrown if the input is improper, otherwise NULL.
check_seeds(1)
check_seeds(2)
Check that the vector of time values is included in the document covariate table and that it is either integer-conformable or a date. If it is a date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not the desired behavior.
check_timename(document_covariate_table, timename)
document_covariate_table |
Document covariate table used to query for the time column. |
timename |
Column name for the time variable to evaluate. |
An error message is thrown if timename is not proper, else NULL.
data(rodents)
check_timename(rodents$document_covariate_table, "newmoon")
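The consequence of converting dates to integers can be seen with base R alone. The sketch below uses made-up monthly sampling dates; indexing by sample order at the end is only one possible alternative, shown as an assumption rather than package behavior.

```r
# Illustrative monthly sampling dates (not from the package data).
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Converting a Date to integer gives days since 1970-01-01,
# so the implied timestep is 1 day.
steps <- as.integer(dates)
diff(steps)  # gaps between samples, in days

# One possible alternative: index samples by their order instead.
seq_along(dates)
```

With day-based steps, monthly samples end up separated by unequal gaps (31 and 29 days here), which may or may not be what the analysis intends.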
Check that the vector of numbers of topics is conformable to integers greater than 1.
check_topics(topics)
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
An error message is thrown if the input is improper, otherwise NULL.
check_topics(2)
Check that the vector of document weights is numeric and positive and inform the user if the average weight isn't 1.
check_weights(weights)
weights |
Vector of the document weights to evaluate, or |
An error message is thrown if weights is not proper, else NULL.
check_weights(1)
wts <- runif(100, 0.1, 100)
check_weights(wts)
wts2 <- wts / mean(wts)
check_weights(wts2)
check_weights(TRUE)
Count the full trips (from one extreme temperature chain to the other and back again; Katzgraber et al. 2006) for each of the ptMCMC particles, as identified by their id on initialization. This function was designed to work within TS and process the output of est_changepoints as a component of diagnose_ptMCMC, but has been generalized and would work with any output from a ptMCMC as long as ids is formatted properly.
count_trips(ids)
ids |
|
list of [1] vector of within-particle trip counts ($trip_counts), and [2] vector of within-particle average trip rates ($trip_rates).
Katzgraber, H. G., S. Trebst, D. A. Huse, and M. Troyer. 2006. Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment 3:P03018.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
count_trips(rho_dist$ids)
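To make the definition of a trip concrete, the sketch below counts round trips for a single particle from the sequence of chain indices it occupied. It is a simplified illustration under stated assumptions: count_round_trips is a hypothetical helper, its input is a plain vector of positions rather than the ids object the package function takes, and the package may count trips differently at the edges.

```r
# Hypothetical helper: count round trips for ONE particle, given the chain
# (temperature) index it occupied at each iteration.
count_round_trips <- function(pos, n_chains) {
  # Keep only visits to the two extreme chains, in order of occurrence.
  extremes <- pos[pos == 1 | pos == n_chains]
  if (length(extremes) == 0) return(0L)
  # Collapse consecutive repeats so the sequence alternates between extremes.
  visits <- extremes[c(TRUE, diff(extremes) != 0)]
  # Each round trip needs two switches: out to the far extreme and back.
  as.integer((length(visits) - 1) %/% 2)
}

pos <- c(1, 2, 3, 2, 3, 3, 2, 1, 2, 3, 2, 1)
trips <- count_round_trips(pos, n_chains = 3)
trips                # number of completed round trips
trips / length(pos)  # per-iteration trip rate
```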
Summarize the step and swap acceptance rates as well as trip metrics from the saved output of a ptMCMC estimation.
diagnose_ptMCMC(ptMCMCout)
ptMCMCout |
Named |
Within-chain step acceptance rates are averaged for each of the chains from the raw step acceptance histories (ptMCMCout$step_accepts) and between-chain swap acceptance rates are similarly averaged for each of the neighboring pairs of chains from the raw swap acceptance histories (ptMCMCout$swap_accepts).
Trips are defined as movement from one extreme chain to the other and back again (Katzgraber et al. 2006). Trips are counted and converted to per-iteration rates using count_trips.
This function was first designed to work within TS and process the output of est_changepoints, but has been generalized and would work with any output from a ptMCMC as long as ptMCMCout is formatted properly.
list of [1] within-chain average step acceptance rates ($step_acceptance_rate), [2] average between-chain swap acceptance rates ($swap_acceptance_rate), [3] within-particle trip counts ($trip_counts), and [4] within-particle average trip rates ($trip_rates).
Katzgraber, H. G., S. Trebst, D. A. Huse, and M. Troyer. 2006. Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment 3:P03018.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
diagnose_ptMCMC(rho_dist)
Simple calculation of document weights based on the average number of words in a document within the corpus (mean value = 1).
document_weights(document_term_table)
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
Vector of weights, one for each document, with the average sample receiving a weight of 1.0.
data(rodents)
document_weights(rodents$document_term_table)
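The weighting described above can be sketched as each document's total count divided by the mean total across documents. The small table below is made up for illustration, and the package function may additionally round or otherwise post-process the result.

```r
# Made-up document term table: 3 documents (rows) by 3 terms (columns).
dtt <- matrix(c(10, 0, 5,
                 2, 3, 1,
                20, 4, 6), nrow = 3, byrow = TRUE)

totals <- rowSums(dtt)            # words per document
weights <- totals / mean(totals)  # scale so the average weight is 1
weights
mean(weights)
```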
Produce a vanilla ECDF (empirical cumulative distribution function) plot using ecdf for the parameter of interest (rho or eta) as part of TS_diagnostics_plot. A horizontal line is added to show the median of the posterior.
ecdf_plot(x, xlab = "parameter value")
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
xlab |
|
NULL.
ecdf_plot(rnorm(100, 0, 1))
This function executes ptMCMC-based estimation of the change point location distributions for multinomial Time Series analyses.
est_changepoints(data, formula, nchangepoints, timename, weights, control = list())
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
List of saved data objects from the ptMCMC estimation of change point locations (unless nchangepoints is 0, then NULL is returned).
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon", weights, control)
This function uses the marginal posterior distributions of the change point locations (estimated by est_changepoints) in combination with the conditional (on the change point locations) posterior distributions of the regressors (estimated by multinom_TS) to estimate the marginal posterior distribution of the regressors, unconditional on the change point locations.
est_regressors(rho_dist, data, formula, timename, weights, control = list())
rho_dist |
List of saved data objects from the ptMCMC estimation of
change point locations (unless |
data |
|
formula |
|
timename |
|
weights |
Optional class |
control |
A |
The general approach follows that of Western and Kleykamp (2004), although we note some important differences. Our regression models are fit independently for each chunk (segment of time), and therefore the variance-covariance matrix for the full model has 0 entries for covariances between regressors in different chunks of the time series. Further, because the regression model here is a standard (non-hierarchical) softmax (Ripley 1996, Venables and Ripley 2002, Bishop 2006), there is no error term in the regression (as there is in the normal model used by Western and Kleykamp 2004), and so the posterior distribution used here is a multivariate normal, as opposed to a multivariate t, as used by Western and Kleykamp (2004).
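The marginalization described above can be sketched in base R: draw regressors from a multivariate normal conditional on each change point value, in proportion to how often that value appears among the change point draws. Everything in the sketch is an assumption made for illustration, including the helper rmvn, the stand-in conditional_fit (which replaces an actual model fit with made-up means and a diagonal covariance), and the hypothetical change point draws.

```r
set.seed(42)

# Draw n samples from a multivariate normal via the Cholesky factor.
rmvn <- function(n, mu, Sigma) {
  z <- matrix(rnorm(n * length(mu)), nrow = n)
  sweep(z %*% chol(Sigma), 2, mu, `+`)
}

# Hypothetical posterior draws of a single change point location.
rho_draws <- sample(c(20, 21, 25), size = 500, replace = TRUE,
                    prob = c(0.6, 0.3, 0.1))

# Hypothetical stand-in for a conditional fit: one intercept per segment.
# The covariance has 0 off-diagonal entries between segments because the
# segments are fit independently.
conditional_fit <- function(rho) {
  list(mu = c(seg1 = rho / 100, seg2 = -rho / 100),
       Sigma = diag(c(0.01, 0.02)))
}

# Mix the conditional draws in proportion to each change point's frequency.
counts <- table(rho_draws)
eta_draws <- do.call(rbind, lapply(names(counts), function(r) {
  fit <- conditional_fit(as.numeric(r))
  rmvn(counts[[r]], fit$mu, fit$Sigma)
}))
dim(eta_draws)  # draws (rows) by coefficients (columns)
```

Grouping the change point draws by unique value means each conditional model only needs to be fit once, however many times that value was sampled.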
matrix of draws (rows) from the marginal posteriors of the coefficients across the segments (columns).
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY, USA.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon", weights, control)
eta_dist <- est_regressors(rho_dist, data, formula, "newmoon", weights, control)
Expand the completely crossed combination of model inputs: LDA model results, formulas, and number of change points.
expand_TS(LDA_models, formulas, nchangepoints)
LDA_models |
List of LDA models (class |
formulas |
Vector of |
nchangepoints |
Vector of |
Expanded data.frame table of the three values (columns) for each unique model run (rows): [1] the LDA model (indicated as a numeric element reference to the LDA_models object), [2] the regressor formula, and [3] the number of changepoints.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
nchangepoints <- 0:1
expand_TS(LDA_models, formulas, nchangepoints)
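A completely crossed table of this kind can be sketched with base R's expand.grid. The column names and the character representation of the formulas below are assumptions for illustration and may not match the columns the package returns.

```r
# Cross two LDA model references, two formulas, and two changepoint counts.
grid <- expand.grid(LDA = 1:2,
                    formula = c("~ 1", "~ newmoon"),
                    nchangepoints = 0:1,
                    stringsAsFactors = FALSE)
nrow(grid)  # 2 * 2 * 2 combinations
grid
```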
If the focal input is TRUE, replace it with the alternative.
iftrue(x = TRUE, alt = NULL)
x |
Focal input. |
alt |
Alternative value. |
x if not TRUE, alt otherwise.
iftrue()
iftrue(TRUE, 1)
iftrue(2, 1)
iftrue(FALSE, 1)
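The replacement logic can be sketched in a few lines. The function iftrue_sketch is a hypothetical stand-in, and its guard against non-scalar or missing input is an added assumption rather than documented package behavior.

```r
# Hypothetical stand-in: return alt only when x is a single logical TRUE.
iftrue_sketch <- function(x = TRUE, alt = NULL) {
  if (is.logical(x) && length(x) == 1 && !is.na(x) && x) alt else x
}

iftrue_sketch()          # x defaults to TRUE, so alt (NULL) is returned
iftrue_sketch(TRUE, 1)   # replaced by alt
iftrue_sketch(2, 1)      # not TRUE, so x is returned
iftrue_sketch(FALSE, 1)  # not TRUE, so x is returned
```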
Counts of 17 rodent species across 24 sampling events, with the count being the total number observed across three trapping webs (146 traps in total) (Lightfoot et al. 2012).
jornada
A list of two data.frame-class objects with rows corresponding to documents (sampling events). One element is the document term table (called document_term_table), which contains counts of the species (terms) in each sample (document), and the other is the document covariate table (called document_covariate_table) with columns of covariates (time step, year, season).
https://jornada.nmsu.edu/lter/dataset/49798/view
Lightfoot, D. C., A. D. Davidson, D. G. Parker, L. Hernandez, and J. W. Laundre. 2012. Bottom-up regulation of desert grassland and shrubland rodent communities: implications of species-specific reproductive potentials. Journal of Mammalogy 93:1017-1028.
Produce and print the message for a given LDA model.
LDA_msg(mod_topics, mod_seeds, control = list())
mod_topics |
|
mod_seeds |
|
control |
Class |
LDA_msg(mod_topics = 4, mod_seeds = 2)
For a given dataset consisting of counts of words across multiple documents in a corpus, fit multiple Latent Dirichlet Allocation (LDA) models (using the Variational Expectation Maximization (VEM) algorithm; Blei et al. 2003) to account for [1] uncertainty in the number of latent topics and [2] the impact of initial values in the estimation procedure. LDA_set is a list wrapper of LDA in the topicmodels package (Grun and Hornik 2011). check_LDA_set_inputs checks that all of the inputs are proper for LDA_set (that the table of observations is conformable to a matrix of integers, the number of topics is an integer, the number of seeds is an integer, and the controls list is proper).
LDA_set(document_term_table, topics = 2, nseeds = 1, control = list())
check_LDA_set_inputs(document_term_table, topics, nseeds, control)
document_term_table |
Table of observation count data (rows:
documents, columns: terms. May be a class |
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
nseeds |
Number of seeds (replicate starts) to use for each
value of |
control |
A |
LDA_set: list (class: LDA_set) of LDA models (class: LDA_VEM).
check_LDA_set_inputs: an error message is thrown if any input is improper, otherwise NULL.
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
Create and define the list used to control the set of LDA models. It is designed to work easily with the existing control capacity of LDA.
LDA_set_control(quiet = FALSE, measurer = AIC, selector = min, iseed = 2, ...)
quiet |
|
measurer, selector |
Function names for use in evaluation of the LDA
models. |
iseed |
|
... |
Additional arguments to be passed to
|
list for controlling the LDA model fit.
LDA_set_control()
Conduct a complete LDATS analysis (Christensen et al. 2018), including running a suite of Latent Dirichlet Allocation (LDA) models (Blei et al. 2003, Grun and Hornik 2011) via LDA_set, selecting LDA model(s) via select_LDA, running a complete set of Bayesian Time Series (TS) models (Western and Kleykamp 2004) via TS_on_LDA on the chosen LDA model(s), and selecting the best TS model via select_TS. conform_LDA_TS_data converts the data input to match internal and sub-function specifications. check_LDA_TS_inputs checks that the inputs to LDA_TS are of proper classes for a full analysis.
LDA_TS(data, topics = 2, nseeds = 1, formulas = ~1, nchangepoints = 0, timename = "time", weights = TRUE, control = list())
conform_LDA_TS_data(data, quiet = FALSE)
check_LDA_TS_inputs(data = NULL, topics = 2, nseeds = 1, formulas = ~1, nchangepoints = 0, timename = "time", weights = TRUE, control = list())
data |
Either a document term table or a list including at least
a document term table (with the word "term" in the name of the element)
and optionally also a document covariate table (with the word
"covariate" in the name of the element).
|
topics |
Vector of the number of topics to evaluate for each model.
Must be conformable to |
nseeds |
|
formulas |
Vector of |
nchangepoints |
Vector of |
timename |
|
weights |
Optional input for overriding standard weighting for
documents in the time series. Defaults to |
control |
A |
quiet |
|
LDA_TS: a class LDA_TS list object including all fitted LDA and TS models and selected models specifically as elements "LDA models" (from LDA_set), "Selected LDA model" (from select_LDA), "TS models" (from TS_on_LDA), and "Selected TS model" (from select_TS).
conform_LDA_TS_data: a data list that is ready for analyses using the stage-specific functions.
check_LDA_TS_inputs: an error message is thrown if any input is improper, otherwise NULL.
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1, nchangepoints = 1, timename = "newmoon")
conform_LDA_TS_data(rodents)
check_LDA_TS_inputs(rodents, timename = "newmoon")
Create and define a list of control options used to run the LDATS model, as implemented by LDA_TS.
LDA_TS_control(quiet = FALSE, measurer_LDA = AIC, selector_LDA = min, iseed = 2, memoise = TRUE, response = "gamma", lambda = 0, measurer_TS = AIC, selector_TS = min, ntemps = 6, penultimate_temp = 2^6, ultimate_temp = 1e+10, q = 0, nit = 10000, magnitude = 12, burnin = 0, thin_frac = 1, summary_prob = 0.95, seed = NULL, ...)
quiet |
|
measurer_LDA, selector_LDA |
Function names for use in evaluation of
the LDA models. |
iseed |
|
memoise |
|
response |
|
lambda |
|
measurer_TS, selector_TS |
Function names for use in evaluation of the
TS models. |
ntemps |
|
penultimate_temp |
Penultimate temperature in the ptMCMC sequence. |
ultimate_temp |
Ultimate temperature in the ptMCMC sequence. |
q |
Exponent controlling the ptMCMC temperature sequence from the focal chain (reference with temperature = 1) to the penultimate chain. 0 (default) implies a geometric sequence. 1 implies squaring before exponentiating. |
nit |
|
magnitude |
Average magnitude (defining a geometric distribution) for the proposed step size in the ptMCMC algorithm. |
burnin |
|
thin_frac |
Fraction of iterations to retain, from the ptMCMC. Must be
|
summary_prob |
Probability used for summarizing the posterior
distributions (via the highest posterior density interval, see
|
seed |
Input to |
... |
Additional arguments to be passed to
|
list of control lists, with named elements LDAcontrol, TScontrol, and quiet.
LDA_TS_control()
Performs two-stage analysis of multivariate temporal data using a combination of Latent Dirichlet Allocation (Blei et al. 2003) and Bayesian Time Series models (Western and Kleykamp 2004) that we extend for multinomial data using softmax regression (Venables and Ripley 2002) following Christensen et al. (2018).
Technical mathematical manuscript
End-user-focused vignette worked example
Computational pipeline vignette
Comparison to Christensen et al.
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.
Imported but updated calculations from the topicmodels package, as applied to Latent Dirichlet Allocation fit with Variational Expectation Maximization via LDA.
## S3 method for class 'LDA_VEM'
logLik(object, ...)
object |
A |
... |
Not used, simply included to maintain method compatibility. |
The number of degrees of freedom is 1 (for alpha) plus the number of entries in the document-topic matrix. The number of observations is the number of entries in the document-term matrix.
Log likelihood of the model (logLik), with df (degrees of freedom) and nobs (number of observations) values.
Buntine, W. 2002. Variational extensions to EM and multinomial PCA. European Conference on Machine Learning, Lecture Notes in Computer Science 2430:23-34.
Grun B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
Hoffman, M. D., D. M. Blei, and F. Bach. 2010. Online learning for latent Dirichlet allocation. Advances in Neural Information Processing Systems 23:856-864.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2)
logLik(r_LDA[[1]])
Convenience function to simply extract the logLik element (and df and nobs) from a multinom_TS_fit object fit by multinom_TS. Extends logLik from multinom to multinom_TS_fit objects.
## S3 method for class 'multinom_TS_fit'
logLik(object, ...)
object |
A |
... |
Not used, simply included to maintain method compatibility. |
Log likelihood of the model, as class logLik, with attributes df (degrees of freedom) and nobs (the number of weighted observations, accounting for size differences among documents).
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20,50), timename = "newmoon", weights = weights)
logLik(mts)
Convenience function to extract and format the log likelihood of a TS_fit-class object fit by multinom_TS.
## S3 method for class 'TS_fit'
logLik(object, ...)
object |
Class |
... |
Not used, simply included to maintain method compatibility. |
Log likelihood of the model (logLik), with df (degrees of freedom) and nobs (number of observations) values.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
logLik(TSmod)
Calculate the exponent of a vector (offset by the max), sum the elements, calculate the log, remove the offset.
logsumexp(x)
x |
|
The LSE.
logsumexp(1:10)
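The four steps in the description (offset by the max, exponentiate and sum, take the log, remove the offset) can be sketched in a few lines of base R. The function name lse_sketch is hypothetical and is not part of the package; it only illustrates why the offset matters for large inputs.

```r
# Hypothetical helper illustrating the log-sum-exp steps described above.
lse_sketch <- function(x) {
  offset <- max(x)                      # largest element, used to avoid overflow
  log(sum(exp(x - offset))) + offset    # shift, exponentiate, sum, log, shift back
}

x <- c(1000, 1001, 1002)
naive <- log(sum(exp(x)))   # exp(1000) overflows, so this is Inf
stable <- lse_sketch(x)     # finite, because the largest exponent is exp(0)
```

Without the offset the naive version overflows, while the shifted version stays finite.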
This function provides a simple, logical toggle control on
whether the function fun
should be memoised via
memoise
or not.
memoise_fun(fun, memoise_tf = TRUE)
fun |
Function name to (potentially) be memoised. |
memoise_tf |
|
fun
, memoised if desired.
sum_memo <- memoise_fun(sum)
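As a rough illustration of what a logical memoisation toggle does, the sketch below uses a hand-rolled cache in base R. It is not the package's implementation, which relies on the memoise package; the names toggle_memo_sketch, slow_square, and fast_square are hypothetical.

```r
# Hypothetical base-R stand-in for a memoisation toggle; the package itself
# uses the memoise package rather than this hand-rolled cache.
toggle_memo_sketch <- function(fun, memoise_tf = TRUE) {
  if (!memoise_tf) {
    return(fun)                       # toggle off: hand back the original function
  }
  cache <- new.env(parent = emptyenv())
  function(...) {
    # crude cache key built from the deparsed arguments
    key <- paste(deparse(list(...)), collapse = "")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, fun(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0
slow_square <- function(x) {
  calls <<- calls + 1                 # count how often the real function runs
  x^2
}
fast_square <- toggle_memo_sketch(slow_square, TRUE)
first <- fast_square(4)
second <- fast_square(4)              # served from the cache
```

The second call returns the cached value, so the underlying function runs only once.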
Given the input to quiet
, generate the message(s)
in msg
or not.
messageq(msg = NULL, quiet = FALSE)
msg |
|
quiet |
|
messageq("hello")
messageq("hello", TRUE)
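The behaviour described above amounts to a guard around message. A minimal sketch, assuming nothing beyond base R (the name quiet_message_sketch is hypothetical and is not the package's function):

```r
# Hypothetical sketch of a quiet-aware message wrapper.
quiet_message_sketch <- function(msg = NULL, quiet = FALSE) {
  if (!quiet) {
    message(msg)      # emitted on stderr as a message condition
  }
  invisible(NULL)
}

quiet_message_sketch("hello")                 # emits the message
quiet_message_sketch("hello", quiet = TRUE)   # emits nothing
```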
A wrapper on vcov
to produce a symmetric
matrix. If the default matrix returned by vcov
is
symmetric it is returned simply. If it is not, in fact, symmetric
(as occurs occasionally with multinom
applied to
proportions), the matrix is made symmetric by averaging the lower and
upper triangles. If the relative difference between the upper and lower
triangles for any entry is more than 0.1
mirror_vcov(x)
x |
Model object that has a defined method for
|
Properly symmetric variance covariance matrix
.
dat <- data.frame(y = rnorm(50), x = rnorm(50))
mod <- lm(dat)
mirror_vcov(mod)
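Averaging the lower and upper triangles of a nearly symmetric matrix can be sketched as below. This covers only the averaging step, not the tolerance check mentioned in the description, and the name symmetrize_sketch is hypothetical rather than a package function.

```r
# Hypothetical sketch: return a symmetric matrix, averaging the two
# triangles when the input is not already symmetric.
symmetrize_sketch <- function(m) {
  if (isSymmetric(m)) {
    return(m)
  }
  (m + t(m)) / 2     # each off-diagonal pair becomes the mean of its two entries
}

m <- matrix(c(1.00, 0.21,
              0.19, 2.00), nrow = 2, byrow = TRUE)
sym <- symmetrize_sketch(m)
```

The diagonal is unchanged, and each off-diagonal pair is replaced by its mean.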
Find the most common entry in a vector. Ties are not allowed: if several values tie, the first value encountered within the modal set is deemed the mode.
modalvalue(x)
x |
|
Numeric value of the mode.
d1 <- c(1, 1, 1, 2, 2, 3)
modalvalue(d1)
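One way to read the tie rule is sketched below: count each distinct value in order of first appearance and take the first maximum. This is an illustrative interpretation and may differ from the package's tie handling; modal_sketch is a hypothetical name.

```r
# Hypothetical sketch of a mode finder that breaks ties by first appearance.
modal_sketch <- function(x) {
  values <- unique(x)                    # distinct values, in order of first appearance
  counts <- tabulate(match(x, values))   # how often each distinct value occurs
  values[which.max(counts)]              # which.max returns the first maximum
}

mode_clear <- modal_sketch(c(1, 1, 1, 2, 2, 3))
mode_tied <- modal_sketch(c(3, 3, 1, 1))   # 3 and 1 tie; 3 appears first
```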
Fit a set of multinomial regression models (via
multinom
, Venables and Ripley 2002) to a time series
of data divided into multiple segments (a.k.a. chunks) based on given
locations for a set of change points. check_multinom_TS_inputs
checks that the inputs to
multinom_TS
are of proper classes for an analysis.
multinom_TS(data, formula, changepoints = NULL, timename = "time",
            weights = NULL, control = list())

check_multinom_TS_inputs(data, formula = gamma ~ 1, changepoints = NULL,
                         timename = "time", weights = NULL,
                         control = list())
data |
|
formula |
|
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
weights |
Optional class |
control |
A |
multinom_TS
: Object of class multinom_TS_fit
,
which is a list of [1]
chunk-level model fits ("chunk models"
), [2] the total log
likelihood combined across all chunks ("logLik"
), and [3] a
data.frame
of chunk beginning and ending times ("chunks"
with columns "start"
and "end"
). check_multinom_TS_inputs
: an error message is thrown if any
input is improper, otherwise NULL
.
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
check_multinom_TS_inputs(dct, timename = "newmoon")
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20,50),
                   timename = "newmoon", weights = weights)
Fit a multinomial regression model (via
multinom
, Ripley 1996, Venables and Ripley 2002)
to a defined chunk of time (a.k.a. segment)
[chunk$start, chunk$end]
within a time series.
multinom_TS_chunk(data, formula, chunk, timename = "time", weights = NULL,
                  control = list())
data |
Class |
formula |
Formula as a |
chunk |
Length-2 vector of times: [1] |
timename |
|
weights |
Optional class |
control |
A |
Fitted model object for the chunk, of classes multinom
and
nnet
.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge.
Venables, W. N. and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth edition. Springer.
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
chunk <- c(start = 0, end = 100)
mtsc <- multinom_TS_chunk(dct, formula = gamma ~ 1, chunk = chunk,
                          timename = "newmoon", weights = weights)
Normalize a numeric
vector to be on the scale of [0,1].
normalize(x)
x |
|
Normalized x
.
normalize(1:10)
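Rescaling a numeric vector onto [0,1] is commonly done with min-max scaling. The sketch below assumes that approach; the name minmax_sketch is hypothetical and is not the package function.

```r
# Hypothetical sketch of min-max scaling onto [0, 1].
# Note: a constant vector has zero range and would yield NaN.
minmax_sketch <- function(x) {
  rng <- range(x)                       # c(min, max)
  (x - rng[1]) / (rng[2] - rng[1])      # min maps to 0, max maps to 1
}

scaled <- minmax_sketch(1:10)
```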
Takes the list of fitted chunk-level models returned from
TS_chunk_memo
(the memoised version of
multinom_TS_chunk
) and packages it as a
multinom_TS_fit
object. This involves naming the model fits based
on the chunk time windows, combining the log likelihood values across the
chunks, and setting the class of the output object.
package_chunk_fits(chunks, fits)
chunks |
Data frame of |
fits |
List of chunk-level fits returned by |
Object of class multinom_TS_fit
, which is a list of [1]
chunk-level model fits, [2] the total log likelihood combined across
all chunks, and [3] the chunk time data table.
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
weights <- document_weights(dtt)
formula <- gamma ~ 1
changepoints <- c(20,50)
timename <- "newmoon"
TS_chunk_memo <- memoise_fun(multinom_TS_chunk, TRUE)
chunks <- prep_chunks(dct, changepoints, timename)
nchunks <- nrow(chunks)
fits <- vector("list", length = nchunks)
for (i in 1:nchunks){
  fits[[i]] <- TS_chunk_memo(dct, formula, chunks[i, ], timename,
                             weights, TS_control())
}
package_chunk_fits(chunks, fits)
Name the elements (LDA models) and set the class
(LDA_set
) of the models returned by LDA_set
.
package_LDA_set(mods, mod_topics, mod_seeds)
mods |
Fitted models returned from |
mod_topics |
Vector of |
mod_seeds |
Vector of |
list
(class: LDA_set
) of LDA models (class:
LDA_VEM
).
data(rodents)
document_term_table <- rodents$document_term_table
topics <- 2
nseeds <- 2
control <- LDA_set_control()
mod_topics <- rep(topics, each = length(seq(2, nseeds * 2, 2)))
iseed <- control$iseed
mod_seeds <- rep(seq(iseed, iseed + (nseeds - 1)* 2, 2), length(topics))
nmods <- length(mod_topics)
mods <- vector("list", length = nmods)
for (i in 1:nmods){
  LDA_msg(mod_topics[i], mod_seeds[i], control)
  control_i <- prep_LDA_control(seed = mod_seeds[i], control = control)
  mods[[i]] <- topicmodels::LDA(document_term_table, k = mod_topics[i],
                                control = control_i)
}
package_LDA_set(mods, mod_topics, mod_seeds)
Combine the objects returned by LDA_set
,
select_LDA
, TS_on_LDA
, and
select_TS
, name them as elements of the list, and
set the class of the list as LDA_TS
, for the return from
LDA_TS
.
package_LDA_TS(LDAs, sel_LDA, TSs, sel_TSs)
LDAs |
List (class: |
sel_LDA |
A reduced version of |
TSs |
Class |
sel_TSs |
A reduced version of |
LDA_TS
-class object including all fitted models and
selected models specifically, ready to be returned from
LDA_TS
.
data(rodents)
data <- rodents
control <- LDA_TS_control()
dtt <- data$document_term_table
dct <- data$document_covariate_table
weights <- document_weights(dtt)
LDAs <- LDA_set(dtt, 2, 1, control$LDA_set_control)
sel_LDA <- select_LDA(LDAs, control$LDA_set_control)
TSs <- TS_on_LDA(sel_LDA, dct, ~1, 1, "newmoon", weights, control$TS_control)
sel_TSs <- select_TS(TSs, control$TS_control)
package_LDA_TS(LDAs, sel_LDA, TSs, sel_TSs)
Calculate relevant summaries for the run of a Time Series
model within TS
and package the output as a
TS_fit
-class object.
package_TS(data, formula, timename, weights, control, rho_dist, eta_dist)
data |
|
formula |
|
timename |
|
weights |
Optional class |
control |
A |
rho_dist |
List of saved data objects from the ptMCMC estimation of
change point locations returned by |
eta_dist |
Matrix of draws (rows) from the marginal posteriors of the
coefficients across the segments (columns), as estimated by
|
TS_fit
-class list containing the following elements, many of
which are hidden for print
ing, but are accessible:
data
input to the function.
formula
input to the function.
nchangepoints
input to the function.
weights
input to the function.
timename
input to the function.
control
input to the function.
Iteration-by-iteration
logLik values for the
full time series fit by multinom_TS
.
Iteration-by-iteration change point estimates from
est_changepoints
.
Iteration-by-iteration marginal regressor estimates from
est_regressors
, which have been
unconditioned with respect to the change point locations.
ptMCMC diagnostics,
see diagnose_ptMCMC.
Summary table describing rhos
(the change
point locations),
see summarize_rhos
.
Variance-covariance matrix for the estimates of
rhos
(the change point locations), see
measure_rho_vcov
.
Summary table describing etas
(the
regressors),
see summarize_etas
.
Variance-covariance matrix for the estimates of
etas
(the regressors), see
measure_eta_vcov
.
Across-iteration average of log-likelihoods
(lls
).
Total number of parameters in the full model, including the change point locations and regressors.
Penalized negative log-likelihood, based on
logLik
and nparams
.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
formula <- gamma ~ 1
nchangepoints <- 1
control <- TS_control()
data <- data[order(data[,"newmoon"]), ]
rho_dist <- est_changepoints(data, formula, nchangepoints, "newmoon",
                             weights, control)
eta_dist <- est_regressors(rho_dist, data, formula, "newmoon", weights,
                           control)
package_TS(data, formula, "newmoon", weights, control, rho_dist, eta_dist)
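The penalized negative log-likelihood listed among the returned elements is built from the log-likelihood and the parameter count. Assuming it follows the standard AIC form (an assumption, since the element's label is not shown here), the arithmetic can be sketched as follows; aic_sketch is a hypothetical name, and the check against stats::AIC uses an ordinary linear model rather than a TS_fit object.

```r
# Hypothetical sketch, assuming the standard AIC definition:
# AIC = -2 * logLik + 2 * (number of parameters).
aic_sketch <- function(loglik, nparams) {
  -2 * loglik + 2 * nparams
}

# Cross-check against stats::AIC on a built-in data set.
fit <- lm(dist ~ speed, data = cars)
ll <- logLik(fit)
by_hand <- aic_sketch(as.numeric(ll), attr(ll, "df"))
from_stats <- AIC(fit)
```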
Set the class and name the elements of the results list
returned from applying TS
to the combination of TS models
requested for the LDA model(s) input.
package_TS_on_LDA(TSmods, LDA_models, models)
TSmods |
list of results from |
LDA_models |
List of LDA models (class |
models |
|
Class TS_on_LDA
list of results from TS
applied for each model on each LDA model input.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
mods <- expand_TS(LDA_models, c(~ 1, ~ newmoon), 0:1)
nmods <- nrow(mods)
TSmods <- vector("list", nmods)
for(i in 1:nmods){
  formula_i <- mods$formula[[i]]
  nchangepoints_i <- mods$nchangepoints[i]
  data_i <- prep_TS_data(document_covariate_table, LDA_models, mods, i)
  TSmods[[i]] <- TS(data_i, formula_i, nchangepoints_i, "newmoon", weights,
                    TS_control())
}
package_TS_on_LDA(TSmods, LDA_models, mods)
Generalization of the plot
function to
work on a list of LDA topic models (class LDA_set
).
## S3 method for class 'LDA_set'
plot(x, ...)
x |
An |
... |
Additional arguments to be passed to subfunctions. |
NULL
.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
plot(r_LDA)
Generalization of the plot
function to
work on fitted LDA_TS model objects (class LDA_TS
) returned by
LDA_TS
.
## S3 method for class 'LDA_TS'
plot(x, ..., cols = set_LDA_TS_plot_cols(), bin_width = 1, xname = NULL,
     border = NA, selection = "median")
x |
A |
... |
Additional arguments to be passed to subfunctions. Not currently
used, just retained for alignment with |
cols |
|
bin_width |
Width of the bins used in the histograms of the summary time series plot, in units of the time variable used to fit the model (the x-axis). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use in the time series
summary plot. Currently only defined for |
NULL
.
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
              nchangepoints = 1, timename = "newmoon")
plot(mod, bin_width = 5, xname = "New moon")
Create an LDATS LDA summary plot, with a top panel showing
the topic proportions for each word and a bottom panel showing the topic
proportions of each document/over time. The plot function is defined for
class LDA_VEM
specifically (see LDA
).
LDA_plot_top_panel
creates an LDATS LDA summary plot
top panel showing the topic proportions word-by-word. LDA_plot_bottom_panel
creates an LDATS LDA summary plot
bottom panel showing the topic proportions over time/documents.
## S3 method for class 'LDA_VEM'
plot(x, ..., xtime = NULL, xname = NULL, cols = NULL, option = "C",
     alpha = 0.8, LDATS = FALSE)

LDA_plot_top_panel(x, cols = NULL, option = "C", alpha = 0.8,
                   together = FALSE, LDATS = FALSE)

LDA_plot_bottom_panel(x, xtime = NULL, xname = NULL, cols = NULL,
                      option = "C", alpha = 0.8, together = FALSE,
                      LDATS = FALSE)
x |
Object of class |
... |
Not used, retained for alignment with base function. |
xtime |
Optional x values used to plot the topic proportions according to a specific time value (rather than simply the order of observations). |
xname |
Optional name for the x values used in plotting the topic proportions (otherwise defaults to "Document"). |
cols |
Colors to be used to plot the topics.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
LDATS |
|
together |
|
NULL
.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 4, nseeds = 10)
best_lda <- select_LDA(r_LDA)[[1]]
plot(best_lda, option = "cividis")
LDA_plot_top_panel(best_lda, option = "cividis")
LDA_plot_bottom_panel(best_lda, option = "cividis")
Generalization of the plot
function to
work on fitted TS model objects (class TS_fit
) returned from
TS
.
## S3 method for class 'TS_fit'
plot(x, ..., plot_type = "summary", interactive = FALSE,
     cols = set_TS_summary_plot_cols(), bin_width = 1, xname = NULL,
     border = NA, selection = "median", LDATS = FALSE)
x |
A |
... |
Additional arguments to be passed to subfunctions. Not currently
used, just retained for alignment with |
plot_type |
"diagnostic" or "summary". |
interactive |
|
cols |
|
bin_width |
Width of the bins used in the histograms of the summary time series plot, in units of the x-axis (the time variable used to fit the model). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use in the time series
summary plot. Currently only defined for |
LDATS |
|
NULL
.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
plot(TSmod)
Produce a vanilla histogram plot using hist
for the
parameter of interest (rho or eta) as part of
TS_diagnostics_plot
. A vertical line is added to show the
median of the posterior.
posterior_plot(x, xlab = "parameter value")
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
xlab |
|
NULL
.
posterior_plot(rnorm(100, 0, 1))
Creates the table containing the start and end times for each
chunk within a time series, based on the change points (used to break up
the time series) and the range of the time series. If there are no
change points (i.e. changepoints
is NULL
), there is still a
single chunk defined by the start and end of the time series.
prep_chunks(data, changepoints = NULL, timename = "time")
data |
Class |
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
data.frame
of start
and end
times (columns)
for each chunk (rows).
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
chunks <- prep_chunks(dct, changepoints = 100, timename = "newmoon")
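To make the chunk table concrete, the sketch below builds start and end times from a vector of times and a set of change points. It assumes each change point is the last time step of its chunk, so the next chunk starts one step later; that convention and the name chunk_table_sketch are assumptions, not taken from the package source.

```r
# Hypothetical sketch: build a start/end table of chunks.
# Assumes a change point marks the final time of the chunk it closes.
chunk_table_sketch <- function(times, changepoints = NULL) {
  start <- c(min(times), changepoints + 1)   # first time, then one past each change point
  end <- c(changepoints, max(times))         # each change point, then the last time
  data.frame(start = start, end = end)
}

three_chunks <- chunk_table_sketch(1:100, changepoints = c(20, 50))
one_chunk <- chunk_table_sketch(1:100)       # NULL change points give a single chunk
```

With no change points the table still has one row, spanning the whole series.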
Each of the chains is initialized by prep_cpts
using a
draw from the available times (i.e. assuming a uniform prior). The draw with the best
fit (by likelihood) is put in the focal chain, with each subsequently
worse fit placed into the subsequently hotter chain. update_cpts
updates the change points after every iteration in the ptMCMC algorithm.
prep_cpts(data, formula, nchangepoints, timename, weights, control = list())

update_cpts(cpts, swaps)
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
cpts |
The existing matrix of change points. |
swaps |
Chain configuration after among-temperature swaps. |
list
of [1] matrix
of change points (rows) for
each temperature (columns) and [2] vector
of log-likelihood
values for each of the chains.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
                             TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
  steps <- step_chains(i, cpts, inputs)
  swaps <- swap_chains(steps, inputs, ids)
  saves <- update_saves(i, saves, steps, swaps)
  cpts <- update_cpts(cpts, swaps)
  ids <- update_ids(ids, swaps)
}
prep_ids
creates and update_ids
updates
the active vector of identities (ids) for each of the chains in the
ptMCMC algorithm. These ids are used to track trips of the particles
among chains.
These functions were designed to work within TS
and
specifically est_changepoints
, but have been generalized
and would work within any general ptMCMC as long as control
,
ids
, and swaps
are formatted properly.
prep_ids(control = list())

update_ids(ids, swaps)
control |
A |
ids |
The existing vector of chain ids. |
swaps |
Chain configuration after among-temperature swaps. |
The vector of chain ids.
prep_ids()
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
                             TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
  steps <- step_chains(i, cpts, inputs)
  swaps <- swap_chains(steps, inputs, ids)
  saves <- update_saves(i, saves, steps, swaps)
  cpts <- update_cpts(cpts, swaps)
  ids <- update_ids(ids, swaps)
}
Update the control list for the LDA model with the specific seed as indicated, and remove controls not used within the LDA itself.
prep_LDA_control(seed, control = list())
seed |
|
control |
Named list of control parameters to be used in
|
list
of controls to be used in the LDA.
prep_LDA_control(seed = 1)
prep_pbar
creates and update_pbar
steps
through the progress bars (if desired) in TS.
prep_pbar(control = list(), bar_type = "rho", nr = NULL)

update_pbar(pbar, control = list())
control |
A |
bar_type |
"rho" (for change point locations) or "eta" (for regressors). |
nr |
|
pbar |
The progress bar object returned from |
prep_pbar
: the initialized progress bar object. update_pbar
: the ticked-forward pbar
.
pb <- prep_pbar(control = list(nit = 2)); pb
pb <- update_pbar(pb); pb
pb <- update_pbar(pb); pb
Calculate the proposal distribution in advance of actually running the ptMCMC algorithm in order to decrease computation time. The proposal distribution is a joint of three distributions: [1] a multinomial distribution selecting among the change points within the chain, [2] a binomial distribution selecting the direction of the step of the change point (earlier or later in the time series), and [3] a geometric distribution selecting the magnitude of the step.
prep_proposal_dist(nchangepoints, control = list())
nchangepoints |
Integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a singular time series model), and the current implementation can reasonably include up to 6 change points. The number of change points is used to dictate the segmentation of the data for each continuous model and each LDA model. |
control |
A |
list
of two matrix
elements: [1] the size of the
proposed step for each iteration of each chain and [2] the identity of
the change point location to be shifted by the step for each iteration of
each chain.
prep_proposal_dist(nchangepoints = 2)
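The three-part proposal described above can be illustrated by drawing all proposals up front, one per iteration and chain. The sketch below is an assumption-laden illustration, not the package's code: the name proposal_sketch, the arguments nit, ntemps, and mean_step, the equal selection probabilities, and the shift of the geometric draw by one are all hypothetical choices.

```r
# Hypothetical sketch: pre-compute proposal steps for every iteration (rows)
# and every chain (columns). Argument names and defaults are illustrative.
proposal_sketch <- function(nchangepoints, nit = 100, ntemps = 4,
                            mean_step = 12) {
  stopifnot(nchangepoints >= 1)
  n <- nit * ntemps
  # [1] which change point moves (equal probabilities assumed)
  which_cpt <- sample(seq_len(nchangepoints), n, replace = TRUE)
  # [2] direction of the move: earlier (-1) or later (+1)
  direction <- sample(c(-1, 1), n, replace = TRUE)
  # [3] size of the move: geometric draw shifted so every step is at least 1
  magnitude <- rgeom(n, prob = 1 / mean_step) + 1
  list(steps = matrix(direction * magnitude, nrow = nit, ncol = ntemps),
       which_steps = matrix(which_cpt, nrow = nit, ncol = ntemps))
}

set.seed(1)
prop <- proposal_sketch(nchangepoints = 2, nit = 50, ntemps = 3)
```

Drawing everything in advance means the sampler only has to index into two matrices at each iteration.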
Package the static inputs (controls and data structures) used
by the ptMCMC algorithm in the context of estimating change points.
This function was designed to work within TS
and
specifically est_changepoints
. It is still hardcoded to do
so, but has the capacity to be generalized to work with any estimation
via ptMCMC with additional coding work.
prep_ptMCMC_inputs(data, formula, nchangepoints, timename, weights = NULL,
                   control = list())
data |
Class |
formula |
|
nchangepoints |
Integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a singular time series model), and the current implementation can reasonably include up to 6 change points. The number of change points is used to dictate the segmentation of the data for each continuous model and each LDA model. |
timename |
|
weights |
Optional class |
control |
A |
Class ptMCMC_inputs
list
, containing the static
inputs for use within the ptMCMC algorithm for estimating change points.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
                             TS_control())
prep_saves creates the data structure used to save the output from each iteration of the ptMCMC algorithm, which is added via update_saves. Once the ptMCMC is complete, the saved data objects are then processed (burn-in iterations are dropped and the remaining iterations are thinned) via process_saves.
This set of functions was designed to work within TS and specifically est_changepoints. They are still hardcoded to do so, but have the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
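The burn-in and thinning step mentioned above can be illustrated with a minimal base R sketch. The helper `drop_and_thin` and its arguments `burnin` and `thin_every` are hypothetical names chosen for illustration; they are not the LDATS implementation, and the layout (iterations as rows) is an assumption.

```r
# Hypothetical helper, not part of LDATS: drop burn-in iterations, then thin.
# Assumes iterations are stored as rows and parameters as columns.
drop_and_thin <- function(draws, burnin = 0, thin_every = 1) {
  draws <- as.matrix(draws)
  n <- nrow(draws)
  stopifnot(burnin >= 0, burnin < n, thin_every >= 1)
  # Keep every thin_every-th iteration after the burn-in period
  kept <- seq(from = burnin + 1, to = n, by = thin_every)
  draws[kept, , drop = FALSE]
}

draws <- matrix(seq_len(20), nrow = 10, ncol = 2)  # 10 iterations, 2 parameters
out <- drop_and_thin(draws, burnin = 4, thin_every = 2)
out  # rows correspond to iterations 5, 7, and 9
```

Dropping the early iterations removes draws taken before the chains settle, and thinning reduces autocorrelation among the retained draws at the cost of a smaller sample.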
prep_saves(nchangepoints, control = list())
update_saves(i, saves, steps, swaps)
process_saves(saves, control = list())
nchangepoints |
|
control |
A |
i |
|
saves |
The existing list of saved data objects. |
steps |
Chain configuration after within-temperature steps. |
swaps |
Chain configuration after among-temperature swaps. |
list of ptMCMC objects: change points ($cpts), log-likelihoods ($lls), chain ids ($ids), step acceptances ($step_accepts), and swap acceptances ($swap_accepts).
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
  steps <- step_chains(i, cpts, inputs)
  swaps <- swap_chains(steps, inputs, ids)
  saves <- update_saves(i, saves, steps, swaps)
  cpts <- update_cpts(cpts, swaps)
  ids <- update_ids(ids, swaps)
}
process_saves(saves, TS_control())
Create the series of temperatures used in the ptMCMC algorithm.
This function was designed to work within TS and est_changepoints specifically, but has been generalized and would work with any ptMCMC model as long as control includes the relevant control parameters (and provided that the check_control function and its use here are generalized).
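As a rough illustration of what a temperature series for parallel tempering can look like, the sketch below builds a geometric ladder starting at 1 (the untempered chain). The function `temp_ladder` and its arguments `ntemps` and `max_temp` are hypothetical; the actual sequence produced by prep_temp_sequence depends on the control parameters and may be constructed differently.

```r
# Hypothetical sketch, not the LDATS implementation:
# a geometric ladder of temperatures from 1 (cold chain) up to max_temp.
temp_ladder <- function(ntemps = 6, max_temp = 64) {
  stopifnot(ntemps >= 2, max_temp > 1)
  max_temp ^ (seq(0, ntemps - 1) / (ntemps - 1))
}

temps <- temp_ladder(ntemps = 4, max_temp = 8)
temps  # approximately 1, 2, 4, 8
```

Geometric spacing keeps the ratio between neighboring temperatures constant, which is a common starting point when tuning swap acceptance rates.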
prep_temp_sequence(control = list())
control |
A |
vector of temperatures.
prep_temp_sequence()
Append the estimated topic proportions from a fitted LDA model to the document covariate table to create the data structure needed for TS.
prep_TS_data(document_covariate_table, LDA_models, mods, i = 1)
document_covariate_table |
Document covariate table (rows: documents,
columns: time index and covariate options). Every model needs a
covariate to describe the time value for each document (in whatever
units and whose name in the table is input in |
LDA_models |
List of LDA models (class |
mods |
The |
i |
|
Class data.frame object including [1] the time variable (indicated in control), [2] the predictor variables (required by formula), and [3] the multinomial response variable (indicated in formula), ready for input into TS.
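The resulting structure can be pictured with a small base R sketch: a covariate table that carries the topic proportions as a matrix-valued column. The toy values and the names `covariates` and `ts_data` are made up for illustration; only the pattern of attaching `gamma` to the covariate table mirrors the package examples.

```r
# Toy illustration (made-up values): covariate table plus a matrix column
# of topic proportions, one row per document.
covariates <- data.frame(newmoon = 1:4, sin_year = sin(1:4))
gamma <- matrix(c(0.2, 0.8,
                  0.5, 0.5,
                  0.7, 0.3,
                  0.9, 0.1), nrow = 4, ncol = 2, byrow = TRUE)

ts_data <- covariates
ts_data$gamma <- gamma  # stored as a single matrix-valued column

str(ts_data)
```

Because `gamma` is kept as one matrix column, a formula such as `gamma ~ 1` can refer to the whole multinomial response by a single name.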
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- expand_TS(LDA_models, formulas = ~1, nchangepoints = 0)
data1 <- prep_TS_data(document_covariate_table, LDA_models, mods)
If desired, print a message at the beginning of every model combination stating the TS model and the LDA model being evaluated.
print_model_run_message(models, i, LDA_models, control)
models |
|
i |
|
LDA_models |
List of LDA models (class |
control |
A |
NULL.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
nchangepoints <- 0:1
mods <- expand_TS(LDA_models, formulas, nchangepoints)
print_model_run_message(mods, 1, LDA_models, TS_control())
Convenience function to print only the selected elements of an LDA_TS-class object returned by LDA_TS.
## S3 method for class 'LDA_TS'
print(x, ...)
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
The selected models in x as a two-element list with the TS component only returning the non-hidden components.
data(rodents)
mod <- LDA_TS(data = rodents, topics = 2, nseeds = 1, formulas = ~1,
              nchangepoints = 1, timename = "newmoon")
print(mod)
Convenience function to print only the most important components of a TS_fit-class object fit by TS.
## S3 method for class 'TS_fit'
print(x, ...)
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
The non-hidden parts of x as a list.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
print(TSmod)
Convenience function to print only the names of a TS_on_LDA-class object generated by TS_on_LDA.
## S3 method for class 'TS_on_LDA'
print(x, ...)
x |
Class |
... |
Not used, simply included to maintain method compatibility. |
character vector of the names of x's models.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
                  nchangepoints = 0:1, timename = "newmoon", weights)
print(mods)
This function wraps around TS_memo (optionally memoised multinom_TS) to provide a simpler interface within the ptMCMC algorithm and is implemented within propose_step.
proposed_step_mods(prop_changepts, inputs)
prop_changepts |
|
inputs |
Class |
List of models associated with the proposed step, with an element for each chain.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
i <- 1
pdist <- inputs$pdist
ntemps <- length(inputs$temps)
selection <- cbind(pdist$which_steps[i, ], 1:ntemps)
prop_changepts <- cpts$changepts
curr_changepts_s <- cpts$changepts[selection]
prop_changepts_s <- curr_changepts_s + pdist$steps[i, ]
if(all(is.na(prop_changepts_s))){
  prop_changepts_s <- NULL
}
prop_changepts[selection] <- prop_changepts_s
mods <- proposed_step_mods(prop_changepts, inputs)
Adds vertical lines to the plot of the time series of fitted proportions associated with the change points of interest.
rho_lines(spec_rhos)
spec_rhos |
|
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
pred_gamma_TS_plot(TSmod)
rho_lines(200)
An example LDATS dataset, functionally that used in Christensen et al. (2018). The data are counts of 21 rodent species across 436 sampling events, with the count being the total number observed across eight 50 m x 50 m plots, each sampled using 49 live traps (Brown 1998, Ernest et al. 2016).
rodents
A list of two data.frame-class objects with rows corresponding to documents (sampling events). One element is the document term table (called document_term_table), which contains counts of the species (terms) in each sample (document), and the other is the document covariate table (called document_covariate_table) with columns of covariates (newmoon number, sin and cos of the fraction of the year).
https://github.com/weecology/PortalData/tree/master/Rodents
Brown, J. H. 1998. The desert granivory experiments at Portal. Pages 71-95 in W. J. Resetarits Jr. and J. Bernardo, editors, Experimental Ecology. Oxford University Press, New York, New York, USA.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.
Ernest, S. K. M., et al. 2016. Long-term monitoring and experimental manipulation of a Chihuahuan desert ecosystem near Portal, Arizona (1977-2013). Ecology 97:1082.
Select the best model(s) of interest from an LDA_set object, based on a set of user-provided functions. The functions default to choosing the model with the lowest AIC value.
select_LDA(LDA_models = NULL, control = list())
LDA_models |
An object of class |
control |
A |
A reduced version of LDA_models that only includes the selected LDA model(s). The returned object is still an object of class LDA_set.
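The default selection rule (keep the model or models with the lowest AIC) can be sketched generically in base R. This example uses ordinary `lm` fits on the built-in `mtcars` data purely as stand-ins; it does not use LDA models or the select_LDA control list, and the object names are illustrative.

```r
# Generic illustration of "measure each model, keep those at the minimum".
# lm fits on mtcars stand in for the candidate models.
fits <- list(
  intercept = lm(mpg ~ 1, data = mtcars),
  weight    = lm(mpg ~ wt, data = mtcars),
  weight_hp = lm(mpg ~ wt + hp, data = mtcars)
)

aics <- vapply(fits, AIC, numeric(1))  # measurer function applied to each model
best <- fits[aics == min(aics)]        # selector: all models tied at the minimum

names(best)
```

Splitting the rule into a measurer (here `AIC`) and a selector (here `min`) mirrors the idea of user-provided functions: either piece can be swapped without changing the surrounding code.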
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
select_LDA(r_LDA)
Select the best model of interest from a TS_on_LDA object generated by TS_on_LDA, based on a set of user-provided functions. The functions default to choosing the model with the lowest AIC value.
Presently, the set of functions should result in a single selected model. If multiple models are chosen via the selection, only the first is returned.
select_TS(TS_models, control = list())
TS_models |
An object of class |
control |
A |
A reduced version of TS_models that only includes the selected TS model. The returned object is a single TS model object of class TS_fit.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
                  nchangepoints = 0:1, timename = "newmoon", weights)
select_TS(mods)
Based on the inputs, create the set of colors to be used in the time series of the fitted gamma (topic proportion) values.
set_gamma_colors(x, cols = NULL, option = "D", alpha = 1)
x |
Object of class |
cols |
Colors to be used to plot the time series of fitted topic proportions. |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
Vector of character
hex codes indicating colors to use.
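To show the kind of output involved (one hex color per topic, with transparency), here is a small sketch using base R's `grDevices::hcl.colors`. This is a stand-in chosen so the example has no extra dependencies; it is not how set_gamma_colors is implemented, and the wrapper name `topic_colors` is hypothetical.

```r
# Hypothetical wrapper around a base R palette function (R >= 3.6):
# one color per topic, with an alpha (transparency) channel.
topic_colors <- function(ntopics, palette = "viridis", alpha = 1) {
  stopifnot(ntopics >= 1, alpha >= 0, alpha <= 1)
  grDevices::hcl.colors(ntopics, palette = palette, alpha = alpha)
}

cols <- topic_colors(4, alpha = 0.8)
cols  # character vector of hex codes, one per topic
```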
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
set_gamma_colors(TSmod)
Based on the inputs, create the set of colors to be used in the LDA plots made by plot.LDA_TS.
set_LDA_plot_colors(x, cols = NULL, option = "C", alpha = 0.8)
x |
Object of class |
cols |
Colors to be used to plot the topics.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
vector of character hex codes indicating colors to use.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 4, nseeds = 10)
set_LDA_plot_colors(r_LDA[[1]])
A default list generator function that produces the options for the colors controlling the panels of the LDATS summary plots, needed because the change point histogram panel should be in a different color scheme than the LDA and fitted time series model panels, which should be in a matching color scheme. See set_LDA_plot_colors, set_TS_summary_plot_cols, set_gamma_colors, and set_rho_hist_colors for specific details on usage.
set_LDA_TS_plot_cols(rho_cols = NULL, rho_option = "D", rho_alpha = 0.4, gamma_cols = NULL, gamma_option = "C", gamma_alpha = 0.8)
rho_cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
rho_option |
A |
rho_alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
gamma_cols |
Colors to be used to plot the LDA topic proportions,
time series of observed topic proportions, and time series of fitted
topic proportions. Any valid color values (e.g., see
|
gamma_option |
A |
gamma_alpha |
Numeric value [0,1] that indicates the transparency of
the colors used. Supported only on some devices, see
|
list of elements used to define the colors for the two panels of the summary plot, as generated simply using set_LDA_TS_plot_cols. cols has two elements: LDA and TS, each corresponding to the set of plots for its stage in the full model. LDA contains entries cols and options (see set_LDA_plot_colors). TS contains two entries, rho and gamma, each corresponding to the related panel, and each containing default values for entries named cols, option, and alpha (see set_TS_summary_plot_cols, set_gamma_colors, and set_rho_hist_colors).
set_LDA_TS_plot_cols()
Based on the inputs, create the set of colors to be used in the change point histogram.
set_rho_hist_colors(x = NULL, cols = NULL, option = "D", alpha = 1)
x |
|
cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
option |
A |
alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
Vector of character
hex codes indicating colors to use.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
set_rho_hist_colors(TSmod$rhos)
A default list generator function that produces the options for the colors controlling the panels of the TS summary plots, needed because the panels should be in different color schemes. See set_gamma_colors and set_rho_hist_colors for specific details on usage.
set_TS_summary_plot_cols(rho_cols = NULL, rho_option = "D", rho_alpha = 0.4, gamma_cols = NULL, gamma_option = "C", gamma_alpha = 0.8)
rho_cols |
Colors to be used to plot the histograms of change points.
Any valid color values (e.g., see |
rho_option |
A |
rho_alpha |
Numeric value [0,1] that indicates the transparency of the
colors used. Supported only on some devices, see
|
gamma_cols |
Colors to be used to plot the LDA topic proportions,
time series of observed topic proportions, and time series of fitted
topic proportions. Any valid color values (e.g., see
|
gamma_option |
A |
gamma_alpha |
Numeric value [0,1] that indicates the transparency of
the colors used. Supported only on some devices, see
|
list of elements used to define the colors for the two panels. Contains two elements rho and gamma, each corresponding to the related panel, and each containing default values for entries named cols, option, and alpha.
set_TS_summary_plot_cols()
For a given set of parameters alpha and Beta and document-specific total word counts, simulate a document-by-term matrix. Additional structuring variables (the numbers of topics (k), documents (M), terms (V)) are inferred from input objects.
sim_LDA_data(N, Beta, alpha = NULL, Theta = NULL, seed = NULL)
N |
A vector of document sizes (total word counts). Must be integer conformable. Is used to infer the total number of documents. |
Beta |
|
alpha |
Single positive numeric value for the Dirichlet distribution
parameter defining topics within documents. To specifically define
document topic probabilities, use |
Theta |
|
seed |
Input to |
A document-by-term matrix
of counts (dim: M x V).
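The generative idea behind this kind of simulation can be sketched in base R: draw document-level topic proportions from a symmetric Dirichlet, mix the topic-level term distributions, and draw multinomial counts. The function `sim_counts_sketch` is a hypothetical illustration under those assumptions and is not the code used by sim_LDA_data.

```r
# Hypothetical sketch of LDA-style count simulation (not the LDATS code).
# N: document sizes (length M); Beta: k x V topic-by-term probabilities;
# alpha: single positive Dirichlet parameter.
sim_counts_sketch <- function(N, Beta, alpha, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  k <- nrow(Beta)
  V <- ncol(Beta)
  M <- length(N)
  # Symmetric Dirichlet(alpha) draws via normalized gamma variates
  g <- matrix(rgamma(M * k, shape = alpha), nrow = M, ncol = k)
  Theta <- g / rowSums(g)            # M x k document-by-topic proportions
  term_probs <- Theta %*% Beta       # M x V document-by-term probabilities
  # One multinomial draw per document, stacked into an M x V matrix
  t(vapply(seq_len(M), function(d) {
    as.integer(rmultinom(1, size = N[d], prob = term_probs[d, ]))
  }, integer(V)))
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
counts <- sim_counts_sketch(N, Beta, alpha = 1.2, seed = 1)
counts
```

Each row of `counts` sums to the corresponding entry of `N`, which is a quick sanity check on any simulated document-by-term matrix.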
N <- c(10, 22, 15, 31)
alpha <- 1.2
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
sim_LDA_data(N, Beta, alpha = alpha)
Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2,
                byrow = TRUE)
sim_LDA_data(N, Beta, Theta = Theta)
For a given set of covariates X; parameters Beta, Eta, rho, and err; and document-specific time stamps tD and lengths N, simulate a document-by-term matrix. Additional structuring variables (the numbers of topics (k), terms (V), documents (M), segments (S), and covariates per segment (C)) are inferred from input objects.
sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err = 0, seed = NULL)
N |
A vector of document sizes (total word counts). Must be integer conformable. Is used to infer the total number of documents. |
Beta |
|
X |
|
Eta |
|
rho |
Vector of integer-conformable time locations of changepoints or
|
tD |
Vector of integer-conformable times of the documents. Must be
of length M (as determined by |
err |
Additive error on the link-scale. Must be a non-negative
|
seed |
Input to |
A document-by-term matrix of counts (dim: M x V).
N <- c(10, 22, 15, 31)
tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
err <- 1
sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err)
For a given set of covariates X; parameters Eta, rho, and err; and document-specific time stamps tD, simulate a document-by-topic matrix. Additional structuring variables (numbers of topics (k), documents (M), segments (S), and covariates per segment (C)) are inferred from input objects.
sim_TS_data(X, Eta, rho, tD, err = 0, seed = NULL)
X |
|
Eta |
|
rho |
Vector of integer-conformable time locations of changepoints or
|
tD |
Vector of integer-conformable times of the documents. Must be
of length M (as determined by |
err |
Additive error on the link-scale. Must be a non-negative
|
seed |
Input to |
A document-by-topic matrix of probabilities (dim: M x k).
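A simplified sketch of the segment-wise idea: each document is assigned to a segment by its time relative to the change points, the segment's block of Eta maps covariates to the link scale, and a softmax converts that to topic probabilities. The function `sim_props_sketch` is hypothetical; it omits the `err` term, and both the row layout of Eta (one block of C rows per segment) and the handling of a document whose time equals a change point (assigned to the earlier segment) are assumptions.

```r
# Hypothetical, simplified sketch (no error term; assumed Eta layout).
# X: M x C covariates; Eta: (S * C) x k coefficients, stacked by segment;
# rho: change point times (or NULL); tD: document times (length M).
sim_props_sketch <- function(X, Eta, rho, tD) {
  C <- ncol(X)
  k <- ncol(Eta)
  # Segment index: number of change points strictly before each time, plus 1
  segment <- vapply(tD, function(t) sum(t > rho), integer(1)) + 1L
  props <- matrix(NA_real_, nrow = length(tD), ncol = k)
  for (d in seq_along(tD)) {
    rows <- ((segment[d] - 1L) * C + 1L):(segment[d] * C)
    link <- drop(X[d, , drop = FALSE] %*% Eta[rows, , drop = FALSE])
    z <- exp(link - max(link))  # numerically stable softmax
    props[d, ] <- z / sum(z)
  }
  props
}

tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
props <- sim_props_sketch(X, Eta, rho, tD)
props
```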
tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
sim_TS_data(X, Eta, rho, tD, err = 1)
Calculate the softmax (normalized exponential) of a vector of values or a set of vectors stacked rowwise.
softmax(x)
x |
|
The softmax of x.
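For reference, the softmax of a vector x has elements exp(x_i) / sum_j exp(x_j). A minimal base R sketch follows; the name `softmax_sketch` is hypothetical and the max-subtraction step is a standard numerical safeguard, not necessarily what the package does.

```r
# Hypothetical sketch of a row-wise softmax (not the LDATS implementation).
softmax_sketch <- function(x) {
  if (is.matrix(x)) {
    # Apply to each row; apply() returns results as columns, so transpose
    t(apply(x, 1, softmax_sketch))
  } else {
    z <- exp(x - max(x))  # subtracting the max avoids overflow
    z / sum(z)
  }
}

m <- matrix(c(1, 2, 3,
              1, 1, 1), nrow = 2, ncol = 3, byrow = TRUE)
p <- softmax_sketch(m)
p                        # each row sums to 1; the second row is 1/3 each
softmax_sketch(c(1000, 1000))  # stays finite: 0.5 0.5
```

The matrix branch assumes at least two columns; a one-column matrix would need separate handling because `apply` then returns a plain vector.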
dat <- matrix(runif(100, -1, 1), 25, 4)
softmax(dat)
softmax(dat[,1])
This set of functions steps the chains forward one iteration of the within-chain component of the ptMCMC algorithm. step_chains is the main function, comprising a proposal (made by propose_step), an evaluation of that proposal (made by eval_step), and then an update of the configuration (made by take_step).
This set of functions was designed to work within TS and specifically est_changepoints. They are still hardcoded to do so, but have the capacity to be generalized to work with any estimation via ptMCMC with additional coding work.
step_chains(i, cpts, inputs)
propose_step(i, cpts, inputs)
eval_step(i, cpts, prop_step, inputs)
take_step(cpts, prop_step, accept_step)
i |
|
cpts |
|
inputs |
Class |
prop_step |
Proposed step output from |
accept_step |
|
For each iteration of the ptMCMC algorithm, all of the chains have the potential to take a step. The possible step is proposed under a proposal distribution (here for change points we use a symmetric geometric distribution), the proposed step is then evaluated and either accepted or not (following the Metropolis-Hastings rule; Metropolis et al. 1953, Hastings 1970, Gupta et al. 2018), and then accordingly taken or not (the configurations are updated).
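The acceptance rule can be sketched for a set of tempered chains. With a symmetric proposal, a chain at temperature T accepts a proposal with probability min(1, exp((ll_prop - ll_curr) / T)). The function `accept_tempered` below is a hypothetical illustration of that rule, assuming the target is the tempered likelihood alone; it is not the eval_step code.

```r
# Hypothetical sketch of tempered Metropolis acceptance (symmetric proposal).
# ll_curr, ll_prop: current and proposed log-likelihoods, one per chain.
# temps: chain temperatures; u: uniform draws (exposed so the example is
# reproducible).
accept_tempered <- function(ll_curr, ll_prop, temps,
                            u = runif(length(temps))) {
  log_ratio <- (ll_prop - ll_curr) / temps
  log(u) < log_ratio  # TRUE where the proposal is accepted
}

ll_curr <- c(-100, -100)
ll_prop <- c(-90, -130)
u <- c(0.5, 0.5)

accept_tempered(ll_curr, ll_prop, temps = c(1, 1), u = u)
# TRUE FALSE: the worse proposal is rejected at temperature 1
accept_tempered(ll_curr, ll_prop, temps = c(1, 64), u = u)
# TRUE TRUE: the same worse proposal is accepted by the hot chain
```

Higher temperatures flatten the likelihood differences, which is why hot chains move more freely across the surface.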
step_chains: list of change points, log-likelihoods, and logical indicators of acceptance for each chain.
propose_step: list of change points and log-likelihood values for the proposal.
eval_step: logical vector indicating if each chain's proposal was accepted.
take_step: list of change points, log-likelihoods, and logical indicators of acceptance for each chain.
Gupta, S., L. Hainsworth, J. S. Hogg, R. E. C. Lee, and J. R. Faeder. 2018. Evaluation of parallel tempering to accelerate Bayesian parameter estimation in systems biology.
Hastings, W. K. 1970. Monte Carlo sampling methods using Markov Chains and their applications. Biometrika 57:97-109.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1092.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
  steps <- step_chains(i, cpts, inputs)
  swaps <- swap_chains(steps, inputs, ids)
  saves <- update_saves(i, saves, steps, swaps)
  cpts <- update_cpts(cpts, swaps)
  ids <- update_ids(ids, swaps)
}
# within step_chains()
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
i <- 1
prop_step <- propose_step(i, cpts, inputs)
accept_step <- eval_step(i, cpts, prop_step, inputs)
take_step(cpts, prop_step, accept_step)
summarize_etas calculates summary statistics for each of the chunk-level regressors.
measure_eta_vcov generates the variance-covariance matrix for the regressors.
summarize_etas(etas, control = list())
measure_eta_vcov(etas)
etas |
Matrix of regressors (columns) across iterations of the
ptMCMC (rows), as returned from |
control |
A |
summarize_etas: table of summary statistics for chunk-level regressors including mean, median, mode, posterior interval, standard deviation, MCMC error, autocorrelation, and effective sample size for each regressor.
measure_eta_vcov: variance-covariance matrix for chunk-level regressors.
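Several of the listed statistics can be computed directly from a matrix of posterior draws with base R. The sketch below covers only a subset (mean, median, standard deviation, a 95% equal-tailed interval, lag-1 autocorrelation) plus the variance-covariance matrix; the object names are illustrative, the column names `eta_1` and `eta_2` are made up, and the interval choice is an assumption rather than the package's definition.

```r
# Illustrative summaries of posterior draws (rows: iterations, cols: regressors).
set.seed(1)
etas <- matrix(rnorm(200), nrow = 100, ncol = 2,
               dimnames = list(NULL, c("eta_1", "eta_2")))

summary_sketch <- t(apply(etas, 2, function(draws) {
  c(mean     = mean(draws),
    median   = median(draws),
    sd       = sd(draws),
    lower    = unname(quantile(draws, 0.025)),   # 95% equal-tailed interval
    upper    = unname(quantile(draws, 0.975)),
    lag1_acf = acf(draws, lag.max = 1, plot = FALSE)$acf[2])
}))

vcov_sketch <- var(etas)  # variance-covariance across regressors

summary_sketch
vcov_sketch
```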
etas <- matrix(rnorm(100), 50, 2)
summarize_etas(etas)
measure_eta_vcov(etas)
summarize_rhos calculates summary statistics for each of the change point locations.
measure_rho_vcov generates the variance-covariance matrix for the change point locations.
summarize_rhos(rhos, control = list())
measure_rho_vcov(rhos)
rhos |
Matrix of change point locations (columns) across iterations of
the ptMCMC (rows) or |
control |
A |
summarize_rhos: table of summary statistics for change point locations including mean, median, mode, posterior interval, standard deviation, MCMC error, autocorrelation, and effective sample size for each change point location.
measure_rho_vcov: variance-covariance matrix for change point locations.
rhos <- matrix(sample(80:100, 100, TRUE), 50, 2)
summarize_rhos(rhos)
measure_rho_vcov(rhos)
This function handles the among-chain swapping based on
temperatures and likelihood differentials.
This function was designed to work within TS
and
specifically est_changepoints
. It is still hardcoded to do
so, but has the capacity to be generalized to work with any estimation
via ptMCMC with additional coding work.
swap_chains(chainsin, inputs, ids)
chainsin |
Chain configuration to be evaluated for swapping. |
inputs |
Class |
ids |
The vector of integer chain ids. |
The ptMCMC algorithm couples the chains (which are taking their own walks on the distribution surface) through "swaps", where neighboring chains exchange configurations (Geyer 1991, Falcioni and Deem 1999) following the Metropolis criterion (Metropolis et al. 1953). This allows them to share information and search the surface in combination (Earl and Deem 2005).
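The Metropolis criterion for a swap between two neighboring chains can be sketched in a few lines of base R. This is a generic parallel tempering illustration under the usual assumption that each chain samples the likelihood raised to the power 1/temperature; `swap_accept_prob` is a hypothetical helper and is not the code used inside `swap_chains`.

```r
# Hypothetical helper (not part of LDATS): probability of accepting a swap
# between a colder chain and its hotter neighbor, given their current
# log-likelihoods and temperatures.
swap_accept_prob <- function(ll_cold, ll_hot, temp_cold, temp_hot) {
  # Log of the Metropolis ratio for exchanging the two configurations
  log_ratio <- (1 / temp_cold - 1 / temp_hot) * (ll_hot - ll_cold)
  min(1, exp(log_ratio))
}

# The hotter chain has found a better configuration: the swap is always accepted.
p_up <- swap_accept_prob(ll_cold = -100, ll_hot = -90, temp_cold = 1, temp_hot = 2)

# The hotter chain is worse off: the swap is accepted only occasionally.
p_down <- swap_accept_prob(ll_cold = -90, ll_hot = -100, temp_cold = 1, temp_hot = 2)

# A swap is then taken if a uniform draw falls below the probability.
set.seed(1)
accept <- runif(1) < p_down
```

Swaps that move a higher-likelihood configuration toward the colder chain are always accepted, which is how information found by the hot, exploratory chains reaches the focal chain.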
list
of updated change points, log-likelihoods, and chain
ids, as well as a vector of acceptance indicators for each swap.
Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7: 3910-3916. link.
Falcioni, M. and M. W. Deem. 1999. A biased Monte Carlo scheme for zeolite structure solution. Journal of Chemical Physics 110: 1754-1766. link.
Geyer, C. J. 1991. Markov Chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface. pp 156-163. American Statistical Association, New York, USA. link.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equations of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087-1092. link.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
data <- data[order(data[,"newmoon"]), ]
saves <- prep_saves(1, TS_control())
inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
ids <- prep_ids(TS_control())
for(i in 1:TS_control()$nit){
  steps <- step_chains(i, cpts, inputs)
  swaps <- swap_chains(steps, inputs, ids)
  saves <- update_saves(i, saves, steps, swaps)
  cpts <- update_cpts(cpts, swaps)
  ids <- update_ids(ids, swaps)
}
Produce a trace plot for the parameter of interest (rho or
eta) as part of TS_diagnostics_plot
. A horizontal line
is added to show the median of the posterior.
trace_plot(x, ylab = "parameter value")
x |
Vector of parameter values drawn from the posterior distribution, indexed to the iteration by the order of the vector. |
ylab |
|
NULL
.
trace_plot(rnorm(100, 0, 1))
This is the main interface function for the LDATS application
of Bayesian change point Time Series analyses (Christensen et al.
2018), which extends the model of Western and Kleykamp (2004;
see also Ruggieri 2013) to multinomial (proportional) response data via
softmax regression (Ripley 1996, Venables and Ripley 2002, Bishop 2006)
within a generalized linear modeling framework (McCullagh and Nelder 1989).
The models are fit using parallel tempering Markov Chain Monte Carlo
(ptMCMC) methods (Earl and Deem 2005) to locate change points and
neural networks (Ripley 1996, Venables and Ripley 2002, Bishop 2006) to
estimate regressors. check_TS_inputs
checks that the inputs to
TS
are of proper classes for a full analysis.
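To make the softmax link concrete, the sketch below maps a vector of linear predictors onto topic proportions that sum to one. It is a generic illustration, not the package's fitting code: the `softmax` helper and the example values are hypothetical, and fixing the first topic's predictor at 0 as a reference level is an assumption about the parameterization.

```r
# Hypothetical helper (not part of LDATS): softmax transform.
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  z / sum(z)
}

# Linear predictors for three topics within one time chunk;
# the first topic is treated as the reference (assumption).
eta <- c(0, 0.5, -1)
gamma_hat <- softmax(eta)
```

Larger predictors translate into larger fitted proportions, and the proportions always sum to one regardless of the scale of `eta`.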
TS(data, formula = gamma ~ 1, nchangepoints = 0, timename = "time", weights = NULL, control = list())
check_TS_inputs(data, formula = gamma ~ 1, nchangepoints = 0, timename = "time", weights = NULL, control = list())
data |
|
formula |
|
nchangepoints |
|
timename |
|
weights |
Optional class |
control |
A |
TS
: TS_fit
-class list containing the following
elements, many of
which are hidden when printing, but are accessible:
data
input to the function.
formula
input to the function.
nchangepoints
input to the function.
weights
input to the function.
control
input to the function.
Iteration-by-iteration
logLik values for the
full time series fit by multinom_TS
.
Iteration-by-iteration change point estimates from
est_changepoints
.
Iteration-by-iteration marginal regressor estimates from
est_regressors
, which have been
unconditioned with respect to the change point locations.
ptMCMC diagnostics,
see diagnose_ptMCMC
Summary table describing rhos
(the change
point locations),
see summarize_rhos
.
Variance-covariance matrix for the estimates of
rhos
(the change point locations), see
measure_rho_vcov
.
Summary table describing etas
(the
regressors),
see summarize_etas
.
Variance-covariance matrix for the estimates of
etas
(the regressors), see
measure_eta_vcov
.
Across-iteration average of log-likelihoods
(lls
).
Total number of parameters in the full model, including the change point locations and regressors.
Penalized negative log-likelihood, based on
logLik
and nparams
.
check_TS_inputs
: An error message is thrown if any input
is not proper, else NULL
.
Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY, USA.
Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529. link.
Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7: 3910-3916. link.
McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models. 2nd Edition. Chapman and Hall, New York, NY, USA.
Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.
Ruggieri, E. 2013. A Bayesian approach to detecting change points in climatic records. International Journal of Climatology 33:520-528. link.
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374. link.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
check_TS_inputs(data, timename = "newmoon")
This function provides a simple creation and definition of a
list used to control the time series model fit occurring within
TS
.
TS_control(memoise = TRUE, response = "gamma", lambda = 0, measurer = AIC, selector = min, ntemps = 6, penultimate_temp = 2^6, ultimate_temp = 1e+10, q = 0, nit = 10000, magnitude = 12, quiet = FALSE, burnin = 0, thin_frac = 1, summary_prob = 0.95, seed = NULL)
memoise |
|
response |
|
lambda |
|
measurer , selector
|
Function names for use in evaluation of the TS
models. |
ntemps |
|
penultimate_temp |
Penultimate temperature in the ptMCMC sequence. |
ultimate_temp |
Ultimate temperature in the ptMCMC sequence. |
q |
Exponent controlling the ptMCMC temperature sequence from the focal chain (reference with temperature = 1) to the penultimate chain. 0 (default) implies a geometric sequence. 1 implies squaring before exponentiating. |
nit |
|
magnitude |
Average magnitude (defining a geometric distribution) for the proposed step size in the ptMCMC algorithm. |
quiet |
|
burnin |
|
thin_frac |
Fraction of iterations to retain, must be |
summary_prob |
Probability used for summarizing the posterior
distributions (via the highest posterior density interval, see
|
seed |
Input to |
list
, with named elements corresponding to the arguments.
TS_control()
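The temperature arguments can be easier to read with a concrete sequence in hand. The sketch below builds a geometric ladder from the focal chain (temperature 1) up to `penultimate_temp` and then appends `ultimate_temp`, which is one way to interpret the default `q = 0`. The helper `geometric_temps` is hypothetical and the spacing rule is an assumption, not the package's internal code.

```r
# Hypothetical helper (not part of LDATS): geometric temperature ladder.
geometric_temps <- function(ntemps = 6, penultimate_temp = 2^6,
                            ultimate_temp = 1e10) {
  # Evenly spaced exponents give a constant ratio between neighboring temperatures
  exponents <- seq(0, log2(penultimate_temp), length.out = ntemps - 1)
  c(2^exponents, ultimate_temp)
}

temps <- geometric_temps()
ratios <- temps[2:5] / temps[1:4]  # constant for the geometric part
```

With the default arguments this yields six temperatures: five geometrically spaced values from 1 to 64, followed by a very hot final chain at 1e10.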
Plot 4-panel figures (showing trace plots, posterior ECDF,
posterior density, and iteration autocorrelation) for each of the
parameters (change point locations and regressors) fitted within a
multinomial time series model (fit by TS
). eta_diagnostics_plots
creates the diagnostic plots
for the regressors (etas) of a time series model. rho_diagnostics_plots
creates the diagnostic plots
for the change point locations (rho) of a time series model.
TS_diagnostics_plot(x, interactive = TRUE)
eta_diagnostics_plots(x, interactive)
rho_diagnostics_plots(x, interactive)
x |
Object of class |
interactive |
|
NULL
.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
TS_diagnostics_plot(TSmod)
This is a wrapper function that expands the main Time Series
analyses function (TS
) across the LDA models (estimated
using LDA
or LDA_set
) and the
Time Series models, with respect to both continuous time formulas and the
number of discrete changepoints. This function allows direct passage of
the control parameters for the parallel tempering MCMC through to the
main Time Series function, TS
, via the
control
argument. check_TS_on_LDA_inputs
checks that the inputs to
TS_on_LDA
are of proper classes for a full analysis.
TS_on_LDA(LDA_models, document_covariate_table, formulas = ~1, nchangepoints = 0, timename = "time", weights = NULL, control = list())
check_TS_on_LDA_inputs(LDA_models, document_covariate_table, formulas = ~1, nchangepoints = 0, timename = "time", weights = NULL, control = list())
LDA_models |
List of LDA models (class |
document_covariate_table |
Document covariate table (rows: documents,
columns: time index and covariate options). Every model needs a
covariate to describe the time value for each document (in whatever
units and whose name in the table is input in |
formulas |
Vector of |
nchangepoints |
Vector of |
timename |
|
weights |
Optional class |
control |
A |
TS_on_LDA
: TS_on_LDA
-class list
of results
from TS
applied for each model on each LDA model input.
check_TS_on_LDA_inputs
: An error message is thrown if any input
is not proper, else NULL
.
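The expansion across LDA models, formulas, and numbers of change points can be pictured as a full cross of the three inputs. The base R sketch below builds such a grid; it only illustrates the combinatorics and does not call the package. The object names and the count of two LDA models are made up.

```r
formulas <- c(~ 1, ~ newmoon)   # candidate continuous-time formulas
nchangepoints <- 0:1            # candidate numbers of change points
n_LDA_models <- 2               # made-up count of selected LDA models

# Turn each formula into a text label so it can sit in a data frame column
formula_labels <- vapply(formulas,
                         function(f) paste(deparse(f), collapse = ""),
                         character(1))

# One row per (LDA model, formula, nchangepoints) combination
model_grid <- expand.grid(LDA = seq_len(n_LDA_models),
                          formula = formula_labels,
                          nchangepoints = nchangepoints,
                          stringsAsFactors = FALSE)
```

Here two LDA models, two formulas, and two change point counts give eight rows, one for each time series model that would be fit.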
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
LDA_models <- select_LDA(LDAs)
weights <- document_weights(document_term_table)
formulas <- c(~ 1, ~ newmoon)
mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas, nchangepoints = 0:1, timename = "newmoon", weights)
Produces a two-panel figure of [1] the change point
distributions as histograms over time and [2] the time series of the
fitted topic proportions over time, based on a selected set of
change point locations. pred_gamma_TS_plot
produces a time series of the
fitted topic proportions over time, based on a selected set of change
point locations. rho_hist
: make a plot of the change point
distributions as histograms over time.
TS_summary_plot(x, cols = set_TS_summary_plot_cols(), bin_width = 1, xname = NULL, border = NA, selection = "median", LDATS = FALSE)
pred_gamma_TS_plot(x, selection = "median", cols = set_gamma_colors(x), xname = NULL, together = FALSE, LDATS = FALSE)
rho_hist(x, cols = set_rho_hist_colors(x$rhos), bin_width = 1, xname = NULL, border = NA, together = FALSE, LDATS = FALSE)
x |
Object of class |
cols |
|
bin_width |
Width of the bins used in the histograms, in units of the x-axis (the time variable used to fit the model). |
xname |
Label for the x-axis in the summary time series plot. Defaults
to |
border |
Border for the histogram, default is |
selection |
Indicator of the change points to use. Currently only defined for "median" and "mode". |
LDATS |
|
together |
|
NULL
.
data(rodents)
document_term_table <- rodents$document_term_table
document_covariate_table <- rodents$document_covariate_table
LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
data <- document_covariate_table
data$gamma <- LDA_models@gamma
weights <- document_weights(document_term_table)
TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
TS_summary_plot(TSmod)
pred_gamma_TS_plot(TSmod)
rho_hist(TSmod)
Verify that a time series can be broken into a set of chunks based on input change points.
verify_changepoint_locations(data, changepoints = NULL, timename = "time")
data |
Class |
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to |
timename |
|
Logical indicator of whether the check passes (TRUE) or fails (FALSE).
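One plausible reading of "can be broken into a set of chunks" is that the change points are strictly increasing and fall strictly inside the observed time range, so that every chunk is non-empty. The sketch below implements that reading in base R; `can_chunk` is a hypothetical helper and the exact rules are an assumption, not a copy of the package's check.

```r
# Hypothetical helper (not part of LDATS): can the time series be split
# into chunks at the given change points?
can_chunk <- function(times, changepoints = NULL) {
  # No change points means a single chunk, which is always valid
  if (is.null(changepoints)) {
    return(TRUE)
  }
  # Every change point must lie strictly inside the observed time range
  inside <- all(changepoints > min(times) & changepoints < max(times))
  # Change points must be in strictly increasing order (no ties)
  ordered <- !is.unsorted(changepoints, strictly = TRUE)
  inside && ordered
}

times <- 1:200
ok_single <- can_chunk(times, 100)            # inside the range
bad_order <- can_chunk(times, c(150, 100))    # out of order
bad_range <- can_chunk(times, 300)            # beyond the last time step
no_cpts   <- can_chunk(times, NULL)           # nothing to check
```

Returning a logical rather than throwing an error lets a sampler reject an invalid proposal and carry on.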
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
verify_changepoint_locations(dct, changepoints = 100, timename = "newmoon")