This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

Name | Type | Short Description
nmit | method | Nonparametric microbial interdependence test
first-differences | method | Compute first differences or difference from baseline between sequential states
first-distances | method | Compute first distances or distance from baseline between sequential states
pairwise-differences | visualizer | Paired difference testing and boxplots
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots
linear-mixed-effects | visualizer | Linear mixed effects modeling
anova | visualizer | ANOVA test
volatility | visualizer | Generate interactive volatility plot
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances
feature-volatility | pipeline | Feature volatility analysis
maturity-index | pipeline | Microbial maturity index prediction

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]
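The NMIT procedure can be sketched in two steps. First, for each subject, correlate every pair of features across that subject's time points. Second, compare each pair of subjects by the norm of the difference between their feature-correlation matrices. The minimal pure-Python sketch below makes some assumptions. It uses Pearson correlation, although the action's default corr_method is 'kendall'. It uses the Frobenius norm (dist_method='fro'). It sets the correlation of a constant feature to 0. The function names are hypothetical, and this is not the plugin's implementation.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    if len(set(x)) == 1 or len(set(y)) == 1:
        return 0.0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_correlations(rows):
    """rows: one abundance vector per time point for a single subject."""
    columns = list(zip(*rows))  # each column holds one feature's values over time
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


def frobenius_distance(a, b):
    """Frobenius norm of the element-wise difference of two matrices."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def nmit_distances(subjects):
    """subjects: {subject_id: [per-time-point abundance vectors]}.
    Returns {(subject_a, subject_b): distance}, stored in both key orders."""
    corr = {sid: feature_correlations(rows) for sid, rows in subjects.items()}
    dm = {}
    for a, b in combinations(sorted(corr), 2):
        dm[(a, b)] = dm[(b, a)] = frobenius_distance(corr[a], corr[b])
    return dm
```

The output is indexed by subject rather than by sample. In this toy example, subject A's two features move in opposite directions over time and subject B's move together, so the two subjects end up far apart (distance √8).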


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]
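The difference logic described above fits in a short sketch. Each subject's samples are ordered by state, and each difference is keyed by the SampleID of the later sample. With a baseline, every non-baseline sample is compared against the baseline sample instead. This is a hypothetical pure-Python illustration, not the plugin's code. It mimics only replicate_handling="error". Skipping subjects that lack a baseline sample is an assumption.

```python
from collections import defaultdict


def first_differences(records, baseline=None):
    """records: iterable of (sample_id, subject_id, state, metric_value).
    Returns {sample_id: difference}, keyed by the later (or non-baseline) sample."""
    by_subject = defaultdict(list)
    for sample_id, subject, state, value in records:
        by_subject[subject].append((state, sample_id, value))
    result = {}
    for subject, rows in by_subject.items():
        rows.sort(key=lambda r: r[0])  # order each subject's samples by state
        states = [r[0] for r in rows]
        if len(set(states)) != len(states):
            # mirrors replicate_handling="error"
            raise ValueError(f"replicate samples for subject {subject!r}")
        if baseline is None:
            # first differences: each state minus the previous state
            for (_, _, prev), (_, sid, cur) in zip(rows, rows[1:]):
                result[sid] = cur - prev
        else:
            values = {state: value for state, _, value in rows}
            if baseline not in values:
                continue  # assumption: subjects without a baseline sample are skipped
            for state, sid, cur in rows:
                if state != baseline:
                    result[sid] = cur - values[baseline]
    return result
```

For a subject measured at states 0, 1 and 2, the first differences are keyed by the samples at states 1 and 2. Passing baseline=0 turns them into static differences from state 0.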


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]
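First distances follow the same pairing scheme as first differences, but each value is looked up in a distance matrix rather than subtracted. In this hypothetical sketch, a dict keyed by `frozenset` sample pairs stands in for a DistanceMatrix artifact. Replicate handling is omitted, and skipping subjects without a baseline sample is an assumption.

```python
from collections import defaultdict


def first_distances(dist, metadata, baseline=None):
    """dist: {frozenset({id_a, id_b}): distance} (stand-in for a DistanceMatrix).
    metadata: {sample_id: (subject_id, state)}.
    Returns {sample_id: distance}, keyed by the later (or non-baseline) sample."""
    by_subject = defaultdict(list)
    for sample_id, (subject, state) in metadata.items():
        by_subject[subject].append((state, sample_id))
    result = {}
    for rows in by_subject.values():
        rows.sort()  # order each subject's samples by state
        if baseline is None:
            pairs = list(zip(rows, rows[1:]))  # consecutive states
        else:
            base = [r for r in rows if r[0] == baseline]
            if not base:
                continue  # assumption: subjects lacking a baseline sample are skipped
            pairs = [(base[0], r) for r in rows if r[0] != baseline]
        for (_, reference), (_, sample) in pairs:
            result[sample] = dist[frozenset((reference, sample))]
    return result
```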


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: replicates for an individual ID at either state_1 or state_2 are handled according to replicate_handling; with the default ("error") the action fails. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]
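The pairing step described above can be sketched as follows. For each subject with a sample at both state_1 and state_2, take the difference state_2 minus state_1 and collect the differences by group. A one-sample t statistic on those differences then tests whether the mean change differs from zero, which is equivalent to a paired t-test. This is a hypothetical pure-Python sketch. For p-values, the differences could be passed to `scipy.stats.ttest_1samp` (parametric) or `scipy.stats.wilcoxon` (non-parametric). Which exact tests the plugin runs is given only by the parameter description above.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def paired_differences(records, state_1, state_2):
    """records: iterable of (subject_id, group, state, metric_value).
    Returns {group: [state_2 value - state_1 value for each complete pair]}."""
    values = defaultdict(dict)
    groups = {}
    for subject, group, state, value in records:
        if state in values[subject]:
            # mirrors replicate_handling="error"
            raise ValueError(f"replicate samples for subject {subject!r}")
        values[subject][state] = value
        groups[subject] = group
    diffs = defaultdict(list)
    for subject, by_state in values.items():
        if state_1 in by_state and state_2 in by_state:  # unpaired subjects are skipped
            diffs[groups[subject]].append(by_state[state_2] - by_state[state_1])
    return dict(diffs)


def paired_t_statistic(diffs):
    """One-sample t statistic for H0: mean paired difference == 0."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
```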


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: replicates for an individual ID at either state_1 or state_2 are handled according to replicate_handling; with the default ("error") the action fails. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]
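In outline, this action looks up, for each subject, the distance between its state_1 and state_2 samples and groups those distances by group_column. The groups are then compared; for two groups the non-parametric option includes a Mann-Whitney U test. The hypothetical sketch below builds the per-group distance lists and computes the U statistic directly. A `frozenset`-keyed dict stands in for the DistanceMatrix, and replicate handling is omitted.

```python
from collections import defaultdict


def paired_distances(dist, metadata, state_1, state_2):
    """dist: {frozenset({id_a, id_b}): distance}.
    metadata: {sample_id: (subject_id, group, state)}.
    Returns {group: [distance between each subject's state_1 and state_2 samples]}."""
    samples = defaultdict(dict)
    groups = {}
    for sample_id, (subject, group, state) in metadata.items():
        samples[subject][state] = sample_id
        groups[subject] = group
    out = defaultdict(list)
    for subject, by_state in samples.items():
        if state_1 in by_state and state_2 in by_state:  # incomplete pairs are skipped
            key = frozenset((by_state[state_1], by_state[state_2]))
            out[groups[subject]].append(dist[key])
    return dict(out)


def mann_whitney_u(x, y):
    """U statistic for x: count of pairs with x > y, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
```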


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action performs LME modeling and draws line plots for each group column. A feature table artifact is an optional input; "metric" may be derived from either the feature table or the metadata.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae should be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes so that the shell passes them as a single argument and does not interpret characters such as "*" or "~".[optional]

Outputs

visualization: Visualization

<no description>[required]
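The comma-separated group_columns and random_effects strings map naturally onto statsmodels formulas. The sketch below makes two assumptions: the helper names are hypothetical, and crossing state_column with every group column using "*" is a guess about how the mean structure is specified, not a description of the plugin's internals. In statsmodels, `mixedlm` fits a random intercept per `groups` level when `re_formula` is None. An `re_formula` such as "~month" adds a random slope alongside that intercept, which matches the random_effects description above. `fit_lme` imports statsmodels only when called.

```python
def build_lme_formulas(metric, state_column, group_columns="", random_effects=""):
    """Hypothetical helper: turn comma-separated parameter strings into
    (fixed-effects formula, random-effects formula or None)."""
    fixed = [state_column] + [c for c in (group_columns or "").split(",") if c]
    formula = f"{metric} ~ {' * '.join(fixed)}"  # "*" = main effects + interactions
    re_terms = [c for c in (random_effects or "").split(",") if c]
    re_formula = f"~{' + '.join(re_terms)}" if re_terms else None
    return formula, re_formula


def fit_lme(df, individual_id_column, formula, re_formula=None):
    """Fit the model with statsmodels (requires pandas and statsmodels)."""
    import statsmodels.formula.api as smf

    model = smf.mixedlm(formula, df, groups=df[individual_id_column],
                        re_formula=re_formula)
    return model.fit()
```

For example, `build_lme_formulas("shannon", "month", "delivery,diet", "month")` yields the fixed-effects formula "shannon ~ month * delivery * diet" and a random slope on month.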


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae should be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes so that the shell passes them as a single argument and does not interpret characters such as "*" or "~".[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual IDs with repeated measures to account for. This column should not be included in the formula.[optional]

rm_aggregate: Bool

If True and the data set contains more than one observation per individual ID and model cell, those observations are aggregated by their mean before the ANOVA is run. Only applicable to repeated measures ANOVA.[default: False]

Outputs

visualization: Visualization

<no description>[required]
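In the simplest case, a formula such as "metric ~ group" with one categorical factor, the test reduces to a classic one-way ANOVA. The F statistic is the between-group mean square divided by the within-group mean square. For a single factor, the type I, II and III sums of squares coincide. The hypothetical pure-Python sketch below covers only that case. Multi-term formulae need a model-based approach, for example fitting `statsmodels.formula.api.ols` and passing the result to `statsmodels.stats.anova.anova_lm` with the desired `typ`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """groups: {label: [metric values]}. Returns the one-way ANOVA F statistic."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # between-group sum of squares (df = k - 1)
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    # within-group sum of squares (df = n - k)
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With groups {"a": [1, 2, 3], "b": [4, 5, 6]}, the between-group sum of squares is 13.5 on 1 degree of freedom and the within-group sum of squares is 4 on 4 degrees of freedom, so F = 13.5.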


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (or metadata-transformable artifact, e.g., alpha diversity results) can be plotted on the y-axis and selected using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
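The control chart summarizes each group's mean metric value at each state and draws it against reference lines computed from all values. This hypothetical sketch computes those summaries. It assumes the reference lines are the global mean with warning limits at ±2 and control limits at ±3 standard deviations, and it uses the population standard deviation; the plugin's exact choices may differ.

```python
from collections import defaultdict
from statistics import mean, pstdev


def volatility_summary(records):
    """records: iterable of (state, group, metric_value).
    Returns ({(group, state): mean}, {"mean", "warning", "control"} limits)."""
    records = list(records)
    cells = defaultdict(list)
    for state, group, value in records:
        cells[(group, state)].append(value)
    group_means = {key: mean(vals) for key, vals in sorted(cells.items())}
    values = [value for _, _, value in records]
    mu, sd = mean(values), pstdev(values)  # population SD (assumption)
    limits = {
        "mean": mu,
        "warning": (mu - 2 * sd, mu + 2 * sd),  # +/-2 SD
        "control": (mu - 3 * sd, mu + 3 * sd),  # +/-3 SD
    }
    return group_means, limits
```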

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]
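The importance_threshold and feature_count filters compose as follows: drop features scoring below the threshold, then keep the top N of what remains. The hypothetical sketch below makes one assumption: quartile thresholds use linear interpolation (`statistics.quantiles` with `method="inclusive"`), which may not match the plugin's quartile computation exactly.

```python
from statistics import quantiles


def filter_importances(importances, threshold=None, feature_count=100):
    """importances: {feature_id: importance score}.
    Returns retained feature IDs, most important first."""
    if threshold in ("q1", "q2", "q3"):
        cutoffs = quantiles(importances.values(), n=4, method="inclusive")
        threshold = cutoffs[int(threshold[1]) - 1]  # "q2" -> median
    # exclude features whose importance is below the threshold
    kept = {f: s for f, s in importances.items() if threshold is None or s >= threshold}
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked if feature_count == "all" else ranked[:feature_count]
```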


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]
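The cv and random_state parameters control how samples are split when the regressor is evaluated. The plugin relies on scikit-learn for this, and its exact splitting strategy is not specified here. The hypothetical pure-Python sketch below shows what k-fold splitting means: every sample lands in exactly one test fold, and a fixed seed makes the split reproducible.

```python
import random


def kfold_indices(n_samples, cv=5, random_state=None):
    """Yield (train, test) index lists for k-fold cross-validation."""
    order = list(range(n_samples))
    random.Random(random_state).shuffle(order)  # reproducible when a seed is given
    folds = [order[k::cv] for k in range(cv)]  # deal shuffled indices into cv folds
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield sorted(train), sorted(test)
```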


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for testing the regression model.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration.[default: 0.05]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
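The microbiota-for-age z-score (MAZ) follows the definition in Subramanian et al. (2014): a sample's predicted value minus the median prediction for control samples at the same state, divided by the standard deviation of those control predictions. This is a hypothetical sketch of that formula only. How the plugin groups states and handles sparse control data may differ, and skipping states with fewer than two control samples is an assumption.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(predictions, control):
    """predictions: iterable of (sample_id, group, state, predicted_value).
    control: the group_by value used as the control group.
    Returns {sample_id: MAZ relative to control samples at the same state}."""
    predictions = list(predictions)
    reference = defaultdict(list)
    for _, group, state, pred in predictions:
        if group == control:
            reference[state].append(pred)
    scores = {}
    for sample_id, _, state, pred in predictions:
        controls = reference.get(state, [])
        if len(controls) < 2:
            continue  # assumption: no score without >= 2 control samples at this state
        scores[sample_id] = (pred - median(controls)) / stdev(controls)
    return scores
```

A treatment sample predicted two control standard deviations below the control median at its state receives a MAZ of -2, i.e., it appears developmentally "delayed" relative to controls.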

This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

NameTypeShort Description
nmitmethodNonparametric microbial interdependence test
first-differencesmethodCompute first differences or difference from baseline between sequential states
first-distancesmethodCompute first distances or distance from baseline between sequential states
pairwise-differencesvisualizerPaired difference testing and boxplots
pairwise-distancesvisualizerPaired pairwise distance testing and boxplots
linear-mixed-effectsvisualizerLinear mixed effects modeling
anovavisualizerANOVA test
volatilityvisualizerGenerate interactive volatility plot
plot-feature-volatilityvisualizerPlot longitudinal feature volatility and importances
feature-volatilitypipelineFeature volatility analysis
maturity-indexpipelineMicrobial maturity index prediction.

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
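A comparable sketch for first distances follows. It assumes the distance matrix is held as a nested dict (`dm[a][b]`) rather than a DistanceMatrix artifact. The function `first_distances` and the input layout are hypothetical.

```python
from collections import defaultdict


def first_distances(samples, dm, baseline=None):
    """samples: iterable of (sample_id, subject_id, state) tuples.

    dm: full symmetric distance matrix as a nested dict, dm[a][b].
    Returns {sample_id: distance}, keyed by the later sample's ID.
    """
    by_subject = defaultdict(list)
    for sample_id, subject, state in samples:
        by_subject[subject].append((state, sample_id))

    dists = {}
    for records in by_subject.values():
        records.sort()  # order each subject's samples by state
        if baseline is None:
            # Distance between each sample and the subject's previous sample.
            for (_, prev_id), (_, curr_id) in zip(records, records[1:]):
                dists[curr_id] = dm[prev_id][curr_id]
        else:
            # Distance between each sample and the subject's baseline sample.
            base_ids = [sid for state, sid in records if state == baseline]
            if not base_ids:
                continue
            for state, sid in records:
                if state != baseline:
                    dists[sid] = dm[base_ids[0]][sid]
    return dists


dm = {
    "s1": {"s1": 0.0, "s2": 0.2, "s3": 0.5},
    "s2": {"s1": 0.2, "s2": 0.0, "s3": 0.4},
    "s3": {"s1": 0.5, "s2": 0.4, "s3": 0.0},
}
samples = [("s1", "A", 0), ("s2", "A", 1), ("s3", "A", 2)]
print(first_distances(samples, dm))              # {'s2': 0.2, 's3': 0.4}
print(first_distances(samples, dm, baseline=0))  # {'s2': 0.2, 's3': 0.5}
```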

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the method to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among the replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
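The pairing step and the parametric one-sample test can be illustrated as follows. This sketch assumes differences are taken as state_2 minus state_1. It computes only the t statistic (no p-value) and does not show the default non-parametric path (Wilcoxon signed-rank). The function names are hypothetical.

```python
import math
import statistics


def paired_differences(samples, state_1, state_2):
    """samples: iterable of (subject_id, state, value), one per subject per state.

    Returns {subject_id: value at state_2 - value at state_1} for subjects
    sampled at both states; subjects missing either state are skipped.
    """
    values = {(subject, state): value for subject, state, value in samples}
    subjects = sorted({subject for subject, _ in values})
    return {
        s: values[(s, state_2)] - values[(s, state_1)]
        for s in subjects
        if (s, state_1) in values and (s, state_2) in values
    }


def one_sample_t(diffs):
    """t statistic testing whether the mean paired difference differs from zero."""
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


samples = [
    ("A", "pre", 1.0), ("A", "post", 2.0),
    ("B", "pre", 2.0), ("B", "post", 4.0),
    ("C", "pre", 3.0), ("C", "post", 6.0),
    ("D", "pre", 5.0),  # no "post" sample, so D is not paired
]
diffs = paired_differences(samples, "pre", "post")
print(diffs)                                         # {'A': 1.0, 'B': 2.0, 'C': 3.0}
print(round(one_sample_t(list(diffs.values())), 4))  # 3.4641
```

Comparing differences between groups (the group_column test) would apply a two-sample or multi-group test to these per-subject differences; that step is not shown.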

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling; with the default ("error"), the action fails. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the method to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among the replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the distance between the paired samples of each subject differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired distance distributions for each group.
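The per-subject distance extraction can be sketched as follows. The sketch assumes a nested-dict distance matrix and a group label that is constant for each subject. The helper names are hypothetical, and the between-group statistical tests are not shown.

```python
from collections import defaultdict


def symmetric(pairs):
    """Build a nested-dict distance matrix from {(a, b): distance} pairs."""
    dm = defaultdict(dict)
    for (a, b), d in pairs.items():
        dm[a][b] = dm[b][a] = d
    return dm


def paired_distances_by_group(samples, dm, state_1, state_2):
    """samples: iterable of (sample_id, subject_id, state, group) tuples.

    Returns {group: [distance, ...]}, one distance per subject that has a
    sample at both state_1 and state_2 (assumes group is constant per subject).
    """
    ids, groups = {}, {}
    for sample_id, subject, state, group in samples:
        ids[(subject, state)] = sample_id
        groups[subject] = group
    result = defaultdict(list)
    for subject in sorted(groups):
        a, b = ids.get((subject, state_1)), ids.get((subject, state_2))
        if a is not None and b is not None:
            result[groups[subject]].append(dm[a][b])
    return dict(result)


dm = symmetric({("s1", "s2"): 0.3, ("s3", "s4"): 0.5, ("s5", "s6"): 0.9})
samples = [
    ("s1", "A", "pre", "ctrl"), ("s2", "A", "post", "ctrl"),
    ("s3", "B", "pre", "ctrl"), ("s4", "B", "post", "ctrl"),
    ("s5", "C", "pre", "trt"), ("s6", "C", "post", "trt"),
]
print(paired_distances_by_group(samples, dm, "pre", "post"))
# {'ctrl': [0.3, 0.5], 'trt': [0.9]}
```

The per-group lists are the distributions that the boxplots summarize and that a between-group test (e.g., Mann-Whitney U or a t-test, depending on "parametric") would compare.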

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling; with the default ("error"), the action fails. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the method to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among the replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects (LME) models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits an LME model and draws line plots for each group column. A feature table artifact is optional input; "metric" may be derived from either the feature table or the metadata.
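The sketch below shows one plausible way that "group_columns" and "random_effects" map onto R-style formula strings. It assumes the fixed effects cross state_column with each group column using "*" and that random effects are joined with "+". The plugin's exact construction may differ, and `build_lme_formulas` is hypothetical. Strings of this kind could be passed to statsmodels' `statsmodels.formula.api.mixedlm` as `formula` and `re_formula`, with `groups` set to the individual ID column to obtain the per-individual random intercept.

```python
def build_lme_formulas(metric, state_column, group_columns=None,
                       random_effects=None):
    """Return (formula, re_formula) as R-style strings.

    group_columns / random_effects: comma-separated column names without
    spaces, as the parameters above expect.
    ASSUMPTION: fixed effects cross state_column with every group column via
    "*", and random effects are joined with "+". The plugin may differ.
    """
    fixed = [state_column]
    if group_columns:
        fixed += group_columns.split(",")
    formula = f"{metric} ~ {' * '.join(fixed)}"
    re_formula = None  # None: random intercept per individual only
    if random_effects:
        re_formula = "~" + " + ".join(random_effects.split(","))
    return formula, re_formula


print(build_lme_formulas("shannon", "month", "delivery,diet", "month"))
# ('shannon ~ month * delivery * diet', '~month')
```

Passing the state column as a random effect ("~month" here) corresponds to the random slope described under random_effects.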

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the variance and covariance structure (random effects) of "metric". To add a random slope, pass the same value given to "state_column" here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be provided if the "metric" parameter is not set. Note that the metric and group columns specified in the formula override any metric and group columns passed separately as parameters to this method. Formulae take the form "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include variables and all of their interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes on the command line so that the shell does not interpret characters such as "*" or "~".[optional]

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
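For intuition, the F statistic of a one-way ANOVA with a single categorical factor can be computed by hand as below. This is a simplified sketch. The action itself evaluates the full R-style formula, with sstype selecting the sum-of-squares type, and then runs pairwise t-tests; none of that is reproduced here. `one_way_anova_f` is hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: dict mapping group label -> list of metric values.
    F = (SS_between / (k - 1)) / (SS_within / (N - k))
    """
    values = [x for vals in groups.values() for x in vals]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(v) * (statistics.mean(v) - grand_mean) ** 2
                     for v in groups.values())
    # Variation of observations around their own group mean.
    ss_within = sum((x - statistics.mean(v)) ** 2
                    for v in groups.values() for x in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}))  # 27.0
```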

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae take the form "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include variables and all of their interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes on the command line so that the shell does not interpret characters such as "*" or "~".[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: currently, only fully balanced within-subject designs are supported, and calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

Metadata column containing the IDs of individuals with repeated measures to account for. This column should not be included in the formula.[optional]

rm_aggregate: Bool

If True and the data set contains more than one observation per individual ID and cell of the specified model, the observations are aggregated by their mean before running the ANOVA. Only applicable to repeated measures ANOVA.[default: False]
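The aggregation described for rm_aggregate can be sketched as collapsing repeated observations to their mean for each individual and model cell. The row layout (a list of dicts) and `aggregate_by_mean` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_mean(rows, subject_key, factor_keys, value_key):
    """Collapse repeated observations to one mean per subject and model cell.

    rows: list of dicts, one per sample.
    factor_keys: within-subject factor columns defining the model cells.
    """
    buckets = defaultdict(list)
    for row in rows:
        cell = tuple(row[k] for k in factor_keys)
        buckets[(row[subject_key], cell)].append(row[value_key])
    aggregated = []
    for (subject, cell), values in buckets.items():
        record = {subject_key: subject, value_key: mean(values)}
        record.update(zip(factor_keys, cell))
        aggregated.append(record)
    return aggregated


rows = [
    {"subject": "A", "time": 0, "shannon": 2.0},
    {"subject": "A", "time": 0, "shannon": 4.0},  # replicate at time 0
    {"subject": "A", "time": 1, "shannon": 5.0},
]
print(aggregate_by_mean(rows, "subject", ["time"], "shannon"))
# [{'subject': 'A', 'shannon': 3.0, 'time': 0}, {'subject': 'A', 'shannon': 5.0, 'time': 1}]
```

After this step each individual contributes exactly one value per cell, which is what a balanced repeated measures design requires.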

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (including metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. If the "individual_id_column" parameter is used, longitudinal volatility for each individual subject sampled over time is co-plotted as "spaghetti" plots. state_column will typically be a measure of time, but any numeric metadata column can be used.
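The averaged group lines in the chart can be approximated by the per-group, per-state mean computed below. This sketches only the averaging step described above, not the interactive rendering or the spaghetti lines; `group_means_by_state` and its input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_means_by_state(samples):
    """samples: iterable of (group, state, value) tuples.

    Returns {group: [(state, mean value), ...]} ordered by state.
    """
    buckets = defaultdict(list)
    for group, state, value in samples:
        buckets[(group, state)].append(value)
    lines = defaultdict(list)
    # Sorting the (group, state) keys orders each group's points by state.
    for (group, state), values in sorted(buckets.items()):
        lines[group].append((state, mean(values)))
    return dict(lines)


samples = [
    ("ctrl", 0, 1.0), ("ctrl", 0, 3.0), ("ctrl", 1, 4.0),
    ("trt", 0, 2.0), ("trt", 1, 6.0), ("trt", 1, 8.0),
]
print(group_means_by_state(samples))
# {'ctrl': [(0, 2.0), (1, 4.0)], 'trt': [(0, 2.0), (1, 7.0)]}
```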

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
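The combined effect of importance_threshold and feature_count can be sketched as follows. The quartile cutoffs here use linear interpolation (`statistics.quantiles(..., method="inclusive")`, Python 3.8+), which is an assumption about how "q1" to "q3" are computed. In this sketch the threshold filter is applied before the top-N cut, and `filter_by_importance` is hypothetical.

```python
import statistics


def filter_by_importance(importances, threshold=None, feature_count=100):
    """Return feature IDs, most important first, after filtering.

    importances: dict mapping feature ID -> importance score.
    threshold: positive float, "q1"/"q2"/"q3" (quartile cutoff), or None.
    feature_count: int N to keep the top N features, or "all".
    """
    ranked = sorted(importances, key=importances.get, reverse=True)
    if threshold is not None:
        if isinstance(threshold, str):
            # ASSUMPTION: quartiles by linear interpolation ("inclusive").
            cuts = statistics.quantiles(
                list(importances.values()), n=4, method="inclusive")
            threshold = cuts[{"q1": 0, "q2": 1, "q3": 2}[threshold]]
        # Exclude features scoring below the threshold.
        ranked = [f for f in ranked if importances[f] >= threshold]
    if feature_count != "all":
        ranked = ranked[:feature_count]
    return ranked


importances = {"f1": 0.1, "f2": 0.4, "f3": 0.2, "f4": 0.3, "f5": 0.0}
print(filter_by_importance(importances, threshold="q1", feature_count=3))
# ['f2', 'f4', 'f3']
print(filter_by_importance(importances, threshold=0.25, feature_count="all"))
# ['f2', 'f4']
```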

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
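The z-score step can be sketched following the definition in Subramanian et al. (2014), using the control group's predictions at each state as the reference: MAZ = (predicted value - control median) / control standard deviation at the same state. The per-state binning, the use of the sample standard deviation, and `maz_scores` itself are assumptions for illustration, not the plugin's code.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control):
    """samples: list of (sample_id, group, state, predicted_value) tuples.

    Reference statistics come from the control group's predictions at each
    state; every sample is scored against the reference for its own state.
    Assumes each state has at least two control samples with nonzero spread.
    """
    reference = defaultdict(list)
    for _, group, state, predicted in samples:
        if group == control:
            reference[state].append(predicted)
    stats = {s: (median(v), stdev(v)) for s, v in reference.items()}

    scores = {}
    for sample_id, _, state, predicted in samples:
        ref_median, ref_sd = stats[state]
        # maturity = predicted - ref_median; MAZ scales it by the control SD
        scores[sample_id] = (predicted - ref_median) / ref_sd
    return scores


samples = [
    ("c1", "ctrl", 6, 5.0), ("c2", "ctrl", 6, 6.0), ("c3", "ctrl", 6, 7.0),
    ("t1", "trt", 6, 4.0),
]
print(maz_scores(samples, control="ctrl"))
# {'c1': -1.0, 'c2': 0.0, 'c3': 1.0, 't1': -2.0}
```

A negative score means a sample's predicted value lags behind the control group at the same state.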

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for model testing.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration (e.g., 0.05 removes 5% of features per iteration).[default: 0.05]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]

This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

NameTypeShort Description
nmitmethodNonparametric microbial interdependence test
first-differencesmethodCompute first differences or difference from baseline between sequential states
first-distancesmethodCompute first distances or distance from baseline between sequential states
pairwise-differencesvisualizerPaired difference testing and boxplots
pairwise-distancesvisualizerPaired pairwise distance testing and boxplots
linear-mixed-effectsvisualizerLinear mixed effects modeling
anovavisualizerANOVA test
volatilityvisualizerGenerate interactive volatility plot
plot-feature-volatilityvisualizerPlot longitudinal feature volatility and importances
feature-volatilitypipelineFeature volatility analysis
maturity-indexpipelineMicrobial maturity index prediction.

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

group_column: Str

Metadata column on which to separate groups for comparison[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the distance between the paired samples of each subject differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
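The per-subject distance lookup can be sketched as follows, again assuming one sample per subject per state. The dict-based distance matrix, the metadata layout, and the helper name paired_distances are hypothetical stand-ins; the action itself takes a DistanceMatrix artifact and sample metadata.

```python
from collections import defaultdict


def paired_distances(distances, sample_meta, state_1, state_2):
    """Hypothetical helper: per-group distances between each subject's two samples.

    distances: {(sample_a, sample_b): distance}; either key order is accepted.
    sample_meta: {sample_id: (individual_id, group, state)}.
    """
    lookup = {}
    for (a, b), d in distances.items():
        lookup[(a, b)] = lookup[(b, a)] = d  # distance matrices are symmetric
    samples_of = defaultdict(dict)  # individual -> {state: sample_id}
    group_of = {}
    for sample_id, (individual, group, state) in sample_meta.items():
        samples_of[individual][state] = sample_id
        group_of[individual] = group
    result = defaultdict(list)
    for individual, by_state in samples_of.items():
        if state_1 in by_state and state_2 in by_state:
            pair = (by_state[state_1], by_state[state_2])
            result[group_of[individual]].append(lookup[pair])
    return dict(result)


distances = {("a0", "a1"): 0.4, ("b1", "b0"): 0.7, ("a0", "b0"): 0.9}
sample_meta = {
    "a0": ("A", "control", "pre"), "a1": ("A", "control", "post"),
    "b0": ("B", "treated", "pre"), "b1": ("B", "treated", "post"),
}
print(paired_distances(distances, sample_meta, "pre", "post"))
# {'control': [0.4], 'treated': [0.7]}
```

Only within-subject distances are collected; between-subject entries such as ("a0", "b0") are ignored.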

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling; with the default ("error"), the action fails. Set replicate_handling="drop" to discard the replicated samples, or "random" to instead randomly select one member.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action performs LME and plots line plots of each group column. A feature table artifact is an optional input, so "metric" may be derived from either the feature table or the metadata.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the variance and covariance structure (random effects) of "metric". To add a random slope, pass the same value given to "state_column" here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
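To make the "*" operator concrete: in patsy (as in R), "a * b" is shorthand for "a + b + a:b", that is, both main effects plus their interaction. The short Python sketch below spells out that expansion for any number of terms. The column names month and delivery are hypothetical, and the helpers are only an illustration, not part of the plugin or of patsy.

```python
from itertools import combinations


def expand_star(terms):
    """Expand 'x1 * x2 * ...' into main effects plus every interaction."""
    expanded = []
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            expanded.append(":".join(combo))
    return expanded


def star_formula(metric, terms):
    """Write the expanded right-hand side explicitly with '+' and ':'."""
    return f"{metric} ~ " + " + ".join(expand_star(terms))


print(star_formula("shannon", ["month", "delivery"]))
# shannon ~ month + delivery + month:delivery
```

Subtracting the interaction, as in "shannon ~ month * delivery - month:delivery", leaves only the two main effects.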

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
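For intuition, the F statistic in the simplest case, a one-way ANOVA on a single categorical column, is the between-group mean square divided by the within-group mean square. This Python sketch covers only that case; the action itself handles full formulae (multiple factors, interactions, and the chosen sstype), and the group labels below are hypothetical.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: dict mapping group label -> list of numeric observations.
    """
    values = [x for obs in groups.values() for x in obs]
    grand_mean = fmean(values)
    k, n = len(groups), len(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(obs) * (fmean(obs) - grand_mean) ** 2 for obs in groups.values())
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - fmean(obs)) ** 2 for obs in groups.values() for x in obs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f({"control": [1.0, 2.0, 3.0], "treated": [4.0, 5.0, 6.0]}))
# 13.5
```

Here the between-group sum of squares is 13.5 on 1 degree of freedom and the within-group sum of squares is 4.0 on 4 degrees of freedom, giving F = 13.5 / 1.0.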

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual IDs with repeated measures to account for. This column should not be included in the formula.[optional]

rm_aggregate: Bool

If True and the data set contains more than one observation per individual ID and cell of the specified model, the observations are aggregated by their mean before running the ANOVA. Only applicable to repeated measures ANOVA.[default: False]
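Aggregation of this kind amounts to a group-by-mean over the individual ID plus the model cell (the combination of factor levels). A minimal sketch with hypothetical column names subject, month, and shannon, using plain dicts in place of sample metadata:

```python
from collections import defaultdict
from statistics import fmean


def aggregate_by_mean(rows, individual_key, cell_keys, value_key):
    """Average value_key over rows sharing the same individual and model cell."""
    buckets = defaultdict(list)
    for row in rows:
        key = (row[individual_key],) + tuple(row[k] for k in cell_keys)
        buckets[key].append(row[value_key])
    return {key: fmean(values) for key, values in buckets.items()}


rows = [
    {"subject": "A", "month": 1, "shannon": 2.0},
    {"subject": "A", "month": 1, "shannon": 4.0},  # second observation in the same cell
    {"subject": "A", "month": 2, "shannon": 5.0},
]
print(aggregate_by_mean(rows, "subject", ["month"], "shannon"))
# {('A', 1): 3.0, ('A', 2): 5.0}
```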

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (including metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
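The group averaging behind the control chart can be sketched as the mean of the metric for each group at each state. This is an illustration under assumptions, not the plugin's code: the records are hypothetical (group, state, value) tuples, and the chart's control limits and per-subject "spaghetti" lines are not reproduced.

```python
from collections import defaultdict
from statistics import fmean


def group_means_by_state(records):
    """Return {group: [(state, mean metric), ...]} ordered by state."""
    buckets = defaultdict(lambda: defaultdict(list))
    for group, state, value in records:
        buckets[group][state].append(value)
    return {
        group: sorted((state, fmean(values)) for state, values in by_state.items())
        for group, by_state in buckets.items()
    }


records = [("control", 0, 1.0), ("control", 0, 3.0), ("control", 1, 4.0), ("treated", 0, 2.0)]
print(group_means_by_state(records))
# {'control': [(0, 2.0), (1, 4.0)], 'treated': [(0, 2.0)]}
```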

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

# Download the example metadata file
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

# Build the interactive volatility chart, using "month" as the state (x-axis) column
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
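The interplay of importance_threshold and feature_count can be sketched in Python as below. Assumptions: the threshold filter is applied before the top-N cut, and quartiles use linear interpolation (statistics.quantiles with method="inclusive"); the plugin's exact quantile method and ordering may differ. The helper name and feature IDs are hypothetical.

```python
from statistics import quantiles


def filter_features(importances, threshold=None, feature_count=100):
    """Hypothetical helper: apply importance_threshold, then feature_count.

    importances: {feature_id: importance score}.
    threshold: a positive float, "q1", "q2", "q3", or None (no filter).
    feature_count: an int >= 1 or "all".
    """
    if threshold in ("q1", "q2", "q3"):
        # Three quartile cut points; pick the first, second, or third.
        cuts = quantiles(importances.values(), n=4, method="inclusive")
        threshold = cuts[int(threshold[1]) - 1]
    kept = {f: s for f, s in importances.items() if threshold is None or s >= threshold}
    ranked = sorted(kept, key=kept.get, reverse=True)  # most important first
    return ranked if feature_count == "all" else ranked[:feature_count]


importances = {"f1": 0.1, "f2": 0.2, "f3": 0.3, "f4": 0.4, "f5": 0.5}
print(filter_features(importances, "q2", 2))  # ['f5', 'f4']
```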

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]
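As a rough illustration of k-fold cross-validation, this sketch splits sample indices into k contiguous folds of near-equal size; each fold serves once as the test set while the remaining folds form the training set. It mirrors the unshuffled scheme of scikit-learn's KFold, but the helper is hypothetical, requires k >= 2, and the plugin may shuffle or stratify samples differently.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # The first n_samples % k folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```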

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes microbiota-for-age z-scores (MAZ) to compare relative "maturity" between groups, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
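The MAZ calculation described in doi:10.1038/nature13421 standardizes each sample's predicted value against control-group predictions at the same state: MAZ = (predicted value - median of control predictions at that state) / standard deviation of control predictions at that state. A minimal Python sketch, assuming predictions are already available and states are used as-is (the plugin may bin or group states differently); the tuple layout, helper name, and group labels are hypothetical.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control_group):
    """Hypothetical MAZ helper.

    samples: list of (sample_id, group, state, predicted_state) tuples.
    Returns {sample_id: z-score} for samples whose state has at least two
    control predictions with nonzero spread.
    """
    reference = defaultdict(list)  # state -> control-group predictions
    for _, group, state, predicted in samples:
        if group == control_group:
            reference[state].append(predicted)
    scores = {}
    for sample_id, _, state, predicted in samples:
        ref = reference.get(state, [])
        if len(ref) < 2:
            continue  # standard deviation undefined with fewer than two controls
        spread = stdev(ref)
        if spread == 0:
            continue  # avoid division by zero
        scores[sample_id] = (predicted - median(ref)) / spread
    return scores


samples = [
    ("c1", "control", 6, 5.0), ("c2", "control", 6, 6.0), ("c3", "control", 6, 7.0),
    ("t1", "treated", 6, 4.0),
    ("t2", "treated", 12, 9.0),  # no control samples at state 12, so no score
]
print(maz_scores(samples, "control"))
# {'c1': -1.0, 'c2': 0.0, 'c3': 1.0, 't1': -2.0}
```

A negative MAZ means a sample's predicted state lags behind the control group at the same actual state.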

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for testing the regression model.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration.[default: 0.05]
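To see what step means in practice, the sketch below lists the feature counts visited during recursive feature elimination. It assumes scikit-learn's RFE/RFECV convention, in which a fractional step is converted once into a fixed number of features, max(1, int(step * n_features)), removed per iteration. Whether this pipeline's feature selection follows exactly that convention is an assumption, and the helper name is hypothetical.

```python
def rfe_feature_counts(n_features, step, min_features=1):
    """Feature counts visited by recursive feature elimination with a fractional step.

    Assumes scikit-learn's convention: a step in (0, 1) becomes a fixed
    number of features, max(1, int(step * n_features)), removed per iteration.
    """
    per_iteration = max(1, int(step * n_features))
    counts = [n_features]
    while counts[-1] > min_features:
        # Never drop below the minimum number of features to keep.
        counts.append(max(min_features, counts[-1] - per_iteration))
    return counts


print(rfe_feature_counts(100, 0.05)[:4])  # [100, 95, 90, 85]
print(rfe_feature_counts(10, 0.05)[:3])   # 5% of 10 rounds down, so 1 per step: [10, 9, 8]
```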

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]

This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

NameTypeShort Description
nmitmethodNonparametric microbial interdependence test
first-differencesmethodCompute first differences or difference from baseline between sequential states
first-distancesmethodCompute first distances or distance from baseline between sequential states
pairwise-differencesvisualizerPaired difference testing and boxplots
pairwise-distancesvisualizerPaired pairwise distance testing and boxplots
linear-mixed-effectsvisualizerLinear mixed effects modeling
anovavisualizerANOVA test
volatilityvisualizerGenerate interactive volatility plot
plot-feature-volatilityvisualizerPlot longitudinal feature volatility and importances
feature-volatilitypipelineFeature volatility analysis
maturity-indexpipelineMicrobial maturity index prediction.

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

group_column: Str

Metadata column on which to separate groups for comparison[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study, e.g., samples collected pre- and post-treatment; paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the variance and covariance structure (random effects) of "metric". To add a random slope, pass the same value given to "state_column" here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
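
One way to picture how these comma-separated parameters could map onto a statsmodels mixed model is sketched below. It assumes, hypothetically, that group_columns are crossed with state_column in the fixed-effects formula and that random_effects become the re_formula argument of statsmodels' mixedlm, with individual_id_column as the grouping factor. build_lme_spec and the column names are illustrative, not part of the plugin.

```python
def build_lme_spec(metric, state_column, individual_id_column,
                   group_columns=None, random_effects=None):
    """Translate LME parameters into mixedlm-style arguments (hypothetical).

    Assumes comma-separated column lists without spaces, as documented above.
    """
    groups = group_columns.split(",") if group_columns else []
    # Fixed effects: state crossed with every group column (an assumption).
    rhs = " * ".join([state_column] + groups)
    spec = {"formula": f"{metric} ~ {rhs}", "groups": individual_id_column}
    if random_effects:
        # statsmodels includes a random intercept by default; naming a column
        # in re_formula adds a random slope for that column.
        spec["re_formula"] = "~" + " + ".join(random_effects.split(","))
    return spec


spec = build_lme_spec("shannon", "month", "studyid",
                      group_columns="delivery,diet", random_effects="month")
print(spec)
# {'formula': 'shannon ~ month * delivery * diet', 'groups': 'studyid', 're_formula': '~month'}
```

With statsmodels installed and a DataFrame holding these columns, such a spec could be passed as `smf.mixedlm(spec["formula"], df, groups=spec["groups"], re_formula=spec.get("re_formula"))`.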

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing (LOWESS). Note that this disables confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula override any metric and group columns passed separately as parameters to this method. Formulae should be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes so that the shell passes them as a single argument and does not expand characters such as "*" or "~".[optional]
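
In patsy's formula language, "a * b" is shorthand for "a + b + a:b". The toy expander below illustrates that rule for right-hand sides built only from "+" and "*"; it is a hypothetical teaching aid, not patsy itself, and it ignores "-", parentheses, and other operators.

```python
from itertools import combinations


def expand_rhs(rhs):
    """Expand '+'/'*' terms of a formula right-hand side into patsy-style terms."""
    terms = []
    for chunk in rhs.split("+"):
        factors = [f.strip() for f in chunk.split("*") if f.strip()]
        # 'a * b * c' yields every non-empty combination: a, b, c, a:b, a:c, b:c, a:b:c
        for size in range(1, len(factors) + 1):
            for combo in combinations(factors, size):
                term = ":".join(combo)
                if term not in terms:
                    terms.append(term)
    return terms


print(expand_rhs("month * delivery"))
# ['month', 'delivery', 'month:delivery']
print(expand_rhs("month + delivery + month:delivery"))
# ['month', 'delivery', 'month:delivery']
```

Both spellings describe the same model, which is why "*" is the compact way to request all interactions.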

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
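
For a single categorical factor, the ANOVA F statistic is the ratio of between-group to within-group mean squares. The stdlib sketch below computes it for a one-way design only; the action itself accepts arbitrary formulae, which this sketch does not attempt, and one_way_anova_f is a hypothetical name.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric samples."""
    values = [v for g in groups for v in g]
    grand_mean = fmean(values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```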

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae should be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes so that the shell passes them as a single argument and does not expand characters such as "*" or "~".[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform a repeated measures ANOVA. This is implemented via statsmodels, which has the following limitations: only fully balanced within-subject designs are currently supported, and calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing the individual IDs whose repeated measures should be accounted for. This column should not be included in the formula.[optional]

rm_aggregate: Bool

If True and the data set contains more than a single observation per individual ID and cell of the specified model, the data are aggregated by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
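
The mean aggregation and the "fully balanced" requirement can be sketched as follows. The record layout and the name aggregate_and_check are hypothetical; statsmodels' AnovaRM exposes an aggregate_func argument for this purpose, but exactly how the plugin passes it is an assumption.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_and_check(records):
    """Average replicate observations per (individual, cell) and check balance.

    records: iterable of (individual_id, cell, value) tuples, where "cell" is
    the combination of within-subject factor levels (hypothetical layout).
    """
    buckets = defaultdict(list)
    for individual, cell, value in records:
        buckets[(individual, cell)].append(value)
    # rm_aggregate=True behavior as described above: mean of replicates.
    means = {key: fmean(vals) for key, vals in buckets.items()}

    individuals = {ind for ind, _ in means}
    cells = {cell for _, cell in means}
    # A fully balanced design has one value for every individual x cell pair.
    balanced = len(means) == len(individuals) * len(cells)
    return means, balanced


records = [("A", "t1", 1.0), ("A", "t1", 3.0), ("A", "t2", 4.0),
           ("B", "t1", 2.0), ("B", "t2", 5.0)]
means, balanced = aggregate_and_check(records)
print(means[("A", "t1")], balanced)  # 2.0 True
```

Dropping any one individual x cell combination (for example, B at t2) would make the design unbalanced, which the repeated measures ANOVA cannot handle.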

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (including metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged within the groups of any categorical metadata column, selectable using the "group_column" selector. If the "individual_id_column" parameter is used, the longitudinal volatility of each subject sampled over time is co-plotted as "spaghetti" plots. The state_column will typically be a measure of time, but any numeric metadata column can be used.
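
The summary behind such a chart can be sketched as per-group, per-state means plus reference lines. The sketch below assumes global mean lines with limits at 2 and 3 standard deviations, a common control-chart convention that is not stated above; control_chart and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, stdev


def control_chart(rows):
    """Summarize (state, group, value) rows for a volatility-style control chart.

    Returns per-(group, state) means plus the global mean and +/-2 SD and
    +/-3 SD limits (the limit choice is an assumption, not from the plugin).
    """
    cells = defaultdict(list)
    for state, group, value in rows:
        cells[(group, state)].append(value)
    # Metric values averaged within each group at each state.
    means = {key: fmean(vals) for key, vals in cells.items()}

    values = [value for _, _, value in rows]
    mu, sd = fmean(values), stdev(values)
    limits = {
        "warning": (mu - 2 * sd, mu + 2 * sd),
        "control": (mu - 3 * sd, mu + 3 * sd),
    }
    return means, mu, limits


rows = [(0, "a", 1.0), (0, "a", 3.0), (1, "a", 5.0), (0, "b", 2.0), (1, "b", 4.0)]
means, mu, limits = control_chart(rows)
print(means[("a", 0)], mu)  # 2.0 3.0
```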

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to plot by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to use the first, second, or third quartile of importance scores as the threshold. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include only the top N most important features. Set to "all" to include all features.[default: 100]
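
The combined effect of importance_threshold and feature_count can be sketched as a threshold filter followed by a top-N cut. filter_features is a hypothetical helper, and computing quartiles with the "inclusive" method (linear interpolation, as in numpy.percentile's default) is an assumption about how the plugin derives "q1" to "q3".

```python
from statistics import quantiles


def filter_features(importances, importance_threshold=None, feature_count=100):
    """Select features by importance (hypothetical helper).

    importances: {feature_id: importance score}.
    importance_threshold: float, "q1"/"q2"/"q3", or None (no threshold).
    feature_count: int N for the top-N features, or "all".
    """
    if importance_threshold in ("q1", "q2", "q3"):
        # Quartile cut points of the scores (interpolation method assumed).
        cuts = quantiles(importances.values(), n=4, method="inclusive")
        importance_threshold = cuts[int(importance_threshold[1]) - 1]
    # Exclude features scoring below the threshold.
    kept = {f: s for f, s in importances.items()
            if importance_threshold is None or s >= importance_threshold}
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked if feature_count == "all" else ranked[:feature_count]


imp = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4, "e": 0.5}
print(filter_features(imp, "q2", 2))  # ['e', 'd']
```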

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
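
The two missing_samples modes can be sketched as a set comparison between the table's sample IDs and the metadata's sample IDs. align_samples is a hypothetical name, and treating only table samples absent from the metadata as "missing" is an assumption based on the wording above.

```python
def align_samples(table_ids, metadata_ids, missing_samples="error"):
    """Return the sample IDs to keep, following the missing_samples option."""
    table_ids, metadata_ids = set(table_ids), set(metadata_ids)
    missing = table_ids - metadata_ids  # in the table but absent from metadata
    if missing and missing_samples == "error":
        raise ValueError(f"Missing samples in metadata: {sorted(missing)}")
    # "ignore": keep only samples found in both the table and the metadata.
    return sorted(table_ids & metadata_ids)


print(align_samples(["s1", "s4"], ["s1", "s2"], missing_samples="ignore"))  # ['s1']
```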

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of folds to use for k-fold cross-validation.[default: 5]
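
To make the meaning of cv concrete: in k-fold cross-validation the samples are split into k folds, and each fold serves once as the test set while the model trains on the remaining samples. The unshuffled sketch below follows the fold-size convention of scikit-learn's KFold and requires k of at least 2; whether the plugin shuffles or stratifies its folds is not covered here, and kfold_indices is a hypothetical name.

```python
def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for unshuffled k-fold cross-validation.

    The first n_samples % k folds receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k-fold cross-validation needs 2 <= k <= n_samples")
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size


for train, test in kfold_indices(10, 3):
    print(len(train), test)
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```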

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to use the first, second, or third quartile of importance scores as the threshold. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include only the top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples and then predicts the column value for all samples. This pipeline computes maturity index z-scores (MAZ) to compare relative "maturity" between groups, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
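
Following the MAZ definition of Subramanian et al. (2014), a sample's maturity score is its predicted value minus the control group's median prediction at the same state, and its MAZ is that difference divided by the control group's standard deviation at that state. The sketch below applies this definition; maz_scores and the tuple layout are hypothetical, and whether the plugin uses exactly these reference statistics is an assumption.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control):
    """Compute (maturity, MAZ) per sample (hypothetical helper).

    samples: list of (group, state, predicted_value) tuples. Every state must
    have at least two control samples so that a standard deviation exists.
    """
    reference = defaultdict(list)
    for group, state, predicted in samples:
        if group == control:
            reference[state].append(predicted)
    # Control-group median and sample standard deviation at each state.
    stats = {s: (median(v), stdev(v)) for s, v in reference.items()}

    results = []
    for group, state, predicted in samples:
        ref_median, ref_sd = stats[state]
        maturity = predicted - ref_median
        maz = maturity / ref_sd if ref_sd else 0.0
        results.append((maturity, maz))
    return results


samples = [("control", 1, 1.0), ("control", 1, 2.0), ("control", 1, 3.0),
           ("treated", 1, 4.0)]
print(maz_scores(samples, "control")[-1])  # (2.0, 2.0)
```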

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for testing the estimator.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features (e.g., 0.05 for 5%) to remove at each iteration.[default: 0.05]
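
Assuming feature selection follows scikit-learn's recursive feature elimination rule for a fractional step, the number of features removed per iteration is int(max(1, step * n_features)), fixed from the initial feature count, and elimination continues down to a minimum number of features. The sketch below lists the feature counts visited under that assumption; rfe_schedule is a hypothetical name.

```python
def rfe_schedule(n_features, step, min_features=1):
    """Feature counts visited by recursive feature elimination (sketch).

    For 0 < step < 1, removes int(max(1, step * n_features)) features per
    iteration, computed once from the starting feature count.
    """
    per_iter = int(max(1, step * n_features))
    sizes = [n_features]
    while sizes[-1] > min_features:
        # Never drop below min_features on the final iteration.
        sizes.append(sizes[-1] - min(per_iter, sizes[-1] - min_features))
    return sizes


sizes = rfe_schedule(100, 0.05)
print(sizes[:4], sizes[-2:], len(sizes))
# [100, 95, 90, 85] [5, 1] 21
```

Because the step is computed once, a small feature table (e.g., 10 features at step 0.05) falls back to removing one feature per iteration.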

cv: Int % Range(1, None)

Number of folds to use for k-fold cross-validation.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, each value in the column must occur in at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]

This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

NameTypeShort Description
nmitmethodNonparametric microbial interdependence test
first-differencesmethodCompute first differences or difference from baseline between sequential states
first-distancesmethodCompute first distances or distance from baseline between sequential states
pairwise-differencesvisualizerPaired difference testing and boxplots
pairwise-distancesvisualizerPaired pairwise distance testing and boxplots
linear-mixed-effectsvisualizerLinear mixed effects modeling
anovavisualizerANOVA test
volatilityvisualizerGenerate interactive volatility plot
plot-feature-volatilityvisualizerPlot longitudinal feature volatility and importances
feature-volatilitypipelineFeature volatility analysis
maturity-indexpipelineMicrobial maturity index prediction.

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

group_column: Str

Metadata column on which to separate groups for comparison[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study, e.g., samples collected pre- and post-treatment; paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual ID with repeated measures to account for.This should not be included in the formula.[optional]

rm_aggregate: Bool

If the data set contains more than a single observation per individual id and cell of the specified model, this function will be used to aggregate the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA. [default: False]

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to plot by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
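The volatility chart summarizes two things: the mean of the selected metric for each group at each state and, when individual_id_column is given, one line per subject. The following sketch computes both from an in-memory list of sample records. It illustrates what the chart displays rather than how the plugin computes it. Only the month state column comes from the example above; studyid, delivery, and shannon are hypothetical column names.

```python
from collections import defaultdict
from statistics import mean


def group_means_by_state(samples, metric, group_column, state_column):
    """Average `metric` for each (group, state) pair."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[(s[group_column], s[state_column])].append(s[metric])
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


def spaghetti(samples, metric, individual_id_column, state_column):
    """Per-individual (state, metric) trajectories, ordered by state."""
    lines = defaultdict(list)
    for s in sorted(samples, key=lambda s: s[state_column]):
        lines[s[individual_id_column]].append((s[state_column], s[metric]))
    return dict(lines)


# Hypothetical metadata rows; one dict per sample.
samples = [
    {"studyid": "A", "delivery": "Vaginal", "month": 0, "shannon": 2.0},
    {"studyid": "A", "delivery": "Vaginal", "month": 1, "shannon": 3.0},
    {"studyid": "B", "delivery": "Cesarean", "month": 0, "shannon": 1.0},
    {"studyid": "B", "delivery": "Cesarean", "month": 1, "shannon": 1.5},
    {"studyid": "C", "delivery": "Vaginal", "month": 0, "shannon": 4.0},
]
means = group_means_by_state(samples, "shannon", "delivery", "month")
lines = spaghetti(samples, "shannon", "studyid", "month")
print(means[("Vaginal", 0)])  # 3.0
print(lines["A"])             # [(0, 2.0), (1, 3.0)]
```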

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
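The importance_threshold and feature_count filters can be sketched as follows. The code assumes that quartiles are computed with linear interpolation (the numpy/pandas default, reproduced here by statistics.quantiles with method="inclusive") and that the threshold is applied before the top-N cut. Both are assumptions about the plugin's behavior, and filter_features is a hypothetical name.

```python
from statistics import quantiles


def filter_features(importances, importance_threshold=None, feature_count=100):
    """Select feature IDs by importance.

    importances: dict mapping feature ID -> importance score.
    importance_threshold: None, a positive float, or "q1"/"q2"/"q3".
    feature_count: positive int, or "all".
    """
    if isinstance(importance_threshold, str):
        # Assumption: linear-interpolation quartiles ("inclusive" method).
        q1, q2, q3 = quantiles(list(importances.values()), n=4,
                               method="inclusive")
        importance_threshold = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]
    # Exclude features whose score is below the threshold.
    kept = {f: v for f, v in importances.items()
            if importance_threshold is None or v >= importance_threshold}
    # Rank remaining features from most to least important.
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked if feature_count == "all" else ranked[:feature_count]


imp = {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.10}
print(filter_features(imp, "q2", "all"))  # ['f1', 'f2']
```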

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of folds (k) to use for k-fold cross-validation.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
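The cv parameter sets the number of folds: samples are partitioned into cv groups, and each group is held out once while the model is trained on the rest. The plugin presumably delegates this to scikit-learn; the stdlib sketch below only illustrates the partitioning, and its fold assignments will not match the plugin's for a given random_state. The function name kfold_indices is hypothetical.

```python
import random


def kfold_indices(n_samples, cv=5, random_state=None):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // cv + (1 if i < n_samples % cv else 0)
                  for i in range(cv)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield sorted(train), sorted(test)
        start += size


for train, test in kfold_indices(10, cv=3, random_state=0):
    print(len(train), len(test))  # 6 4, then 7 3, then 7 3
```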

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]
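The missing_samples option, shared by feature-volatility, plot-feature-volatility, and maturity-index, amounts to comparing the sample IDs in the feature table with those in the metadata. A minimal sketch of the documented "error" and "ignore" behavior follows; the function name and error message are hypothetical.

```python
def reconcile_samples(table_ids, metadata_ids, missing_samples="error"):
    """Return the sample IDs to analyze, per the missing_samples option."""
    table_ids, metadata_ids = set(table_ids), set(metadata_ids)
    # Samples present in the feature table but absent from the metadata.
    missing = table_ids - metadata_ids
    if missing and missing_samples == "error":
        raise ValueError(f"Samples missing from metadata: {sorted(missing)}")
    # "ignore" (or nothing missing): keep samples found in both inputs.
    return sorted(table_ids & metadata_ids)


print(reconcile_samples(["s1", "s2", "s3"], ["s1", "s2", "s4"], "ignore"))
# ['s1', 's2']
```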


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for model testing.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration (e.g., 0.05 removes 5% of features per iteration).[default: 0.05]

cv: Int % Range(1, None)

Number of folds (k) to use for k-fold cross-validation.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
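As described above, maturity-index trains its regressor only on control-group samples and then predicts the state for every sample. The sketch below shows one way to hold out test_size of the control samples. It rounds the test count up, as scikit-learn's train_test_split does, but the plugin's actual splitting (including the stratify option) may differ; the function name and the treatment column are hypothetical.

```python
import math
import random


def control_train_test_split(samples, group_by, control, test_size=0.5,
                             random_state=None):
    """Split control-group samples into (train, test) lists."""
    # Only control samples are eligible for training or held-out testing.
    controls = [s for s in samples if s[group_by] == control]
    random.Random(random_state).shuffle(controls)
    # Round the number of test samples up (scikit-learn's convention).
    n_test = math.ceil(len(controls) * test_size)
    return controls[n_test:], controls[:n_test]


samples = [{"id": f"c{i}", "treatment": "control"} for i in range(5)]
samples += [{"id": "t1", "treatment": "antibiotic"}]
train, test = control_train_test_split(samples, "treatment", "control",
                                       test_size=0.5, random_state=42)
print(len(train), len(test))  # 2 3
```

The non-control sample t1 never enters training; in the pipeline, such samples are only scored by the trained model.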

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
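MAZ scores express each sample's predicted state relative to control samples at the same state. Following the definition in Subramanian et al. (2014), the sketch below subtracts the control median and divides by the control standard deviation at each state. The fallback to a standard deviation of 1 when it is zero or undefined, the record layout, and the function name are assumptions, not the plugin's documented behavior.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, state_column, predicted, group_by, control):
    """Microbiota-for-age Z (MAZ) score for each sample, keyed by sample ID."""
    # Collect control-group predictions at each state.
    reference = defaultdict(list)
    for s in samples:
        if s[group_by] == control:
            reference[s[state_column]].append(s[predicted])
    ref_stats = {}
    for state, values in reference.items():
        sd = stdev(values) if len(values) > 1 else 0.0
        ref_stats[state] = (median(values), sd or 1.0)  # avoid division by zero
    # Samples at states with no control samples cannot be scored; skip them.
    return {
        s["id"]: (s[predicted] - ref_stats[s[state_column]][0])
        / ref_stats[s[state_column]][1]
        for s in samples
        if s[state_column] in ref_stats
    }


samples = [
    {"id": "c1", "treatment": "control", "month": 6, "predicted_month": 6.0},
    {"id": "c2", "treatment": "control", "month": 6, "predicted_month": 7.0},
    {"id": "c3", "treatment": "control", "month": 6, "predicted_month": 5.0},
    {"id": "t1", "treatment": "antibiotic", "month": 6, "predicted_month": 4.0},
]
scores = maz_scores(samples, "month", "predicted_month", "treatment", "control")
print(scores["t1"])  # -2.0 (two control SDs below the control median)
```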

This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.

version: 2024.10.0
website: https://github.com/qiime2/q2-longitudinal
user support:
Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
citations:
Bokulich et al., 2018

Actions

NameTypeShort Description
nmitmethodNonparametric microbial interdependence test
first-differencesmethodCompute first differences or difference from baseline between sequential states
first-distancesmethodCompute first distances or distance from baseline between sequential states
pairwise-differencesvisualizerPaired difference testing and boxplots
pairwise-distancesvisualizerPaired pairwise distance testing and boxplots
linear-mixed-effectsvisualizerLinear mixed effects modeling
anovavisualizerANOVA test
volatilityvisualizerGenerate interactive volatility plot
plot-feature-volatilityvisualizerPlot longitudinal feature volatility and importances
feature-volatilitypipelineFeature volatility analysis
maturity-indexpipelineMicrobial maturity index prediction.

Artifact Classes

SampleData[FirstDifferences]

Formats

FirstDifferencesFormat
FirstDifferencesDirectoryFormat


longitudinal nmit

Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

group_column: Str

Metadata column on which to separate groups for comparison[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study, e.g., samples collected pre- and post-treatment; paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual ID with repeated measures to account for.This should not be included in the formula.[optional]

rm_aggregate: Bool

If the data set contains more than a single observation per individual id and cell of the specified model, this function will be used to aggregate the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA. [default: False]

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

[Command Line]
[Python API]
[Galaxy]
[R API]
[View Source]
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]
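The estimator choices for this action are scikit-learn regressor names. To illustrate what k-fold cross-validation means, the sketch below splits sample indices into cv contiguous, unshuffled folds, giving the first n_samples % cv folds one extra sample (the scikit-learn KFold convention). This reference does not state whether q2-longitudinal shuffles or groups samples when splitting, so treat `kfold_indices` as a hypothetical illustration.

```python
def kfold_indices(n_samples, cv=5):
    """Yield (train, test) index lists for unshuffled k-fold cross-validation."""
    if cv > n_samples:
        raise ValueError("cv cannot exceed the number of samples")
    # The first n_samples % cv folds receive one extra sample
    sizes = [n_samples // cv + (1 if i < n_samples % cv else 0) for i in range(cv)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        # Every index outside the current test fold is used for training
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size
```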

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline then computes maturity index z-scores to compare relative "maturity" among groups, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory along any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
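A minimal sketch of the maturity and z-score step, following the definition in Subramanian et al. (2014): for each state, take the median and standard deviation of the control group's predicted values, then express every sample's prediction relative to them. It assumes predictions have already been produced by the trained regressor, that every state also occurs in the control group, and that a zero standard deviation falls back to 1.0. `maz_scores` is a hypothetical helper, not the plugin's code.

```python
import statistics
from collections import defaultdict


def maz_scores(samples, control):
    """samples: list of (group, state, predicted). Returns (maturity, maz) per sample."""
    # Collect control-group predictions at each true state
    by_state = defaultdict(list)
    for group, state, predicted in samples:
        if group == control:
            by_state[state].append(predicted)
    reference = {}
    for state, values in by_state.items():
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        # Assumption: fall back to 1.0 when the control SD is zero or undefined
        reference[state] = (statistics.median(values), sd or 1.0)
    results = []
    for group, state, predicted in samples:
        median, sd = reference[state]
        maturity = predicted - median  # relative maturity
        results.append((maturity, maturity / sd))  # z-score (MAZ)
    return results
```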

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for testing the regression model.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features (between 0 and 1) to remove at each iteration.[default: 0.05]
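The effect of step on recursive feature elimination can be sketched as the sequence of feature counts visited. This assumes the scikit-learn RFE convention that a fractional step is converted once, from the starting feature count, into a fixed number of features to drop per iteration (rounded down, at least one), and that elimination stops at one feature. `rfe_schedule` is a hypothetical helper.

```python
def rfe_schedule(n_features, step=0.05, min_features=1):
    """Feature counts visited by recursive feature elimination (illustrative)."""
    # Assumption: the fractional step is converted once, from the starting count
    n_remove = max(1, int(step * n_features))
    counts = [n_features]
    while counts[-1] > min_features:
        # Never drop below the minimum number of features
        counts.append(max(min_features, counts[-1] - n_remove))
    return counts
```

With 100 features and the default step of 0.05, five features are removed per iteration (100, 95, 90, ..., 5, 1); with only 10 features, the step rounds down and is clamped to one feature per iteration.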

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]


longitudinal nmit

Perform a nonparametric microbial interdependence test (NMIT) to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065.
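Following Zhang et al. (2017), NMIT summarizes each subject by a feature-by-feature correlation matrix computed across that subject's time points, then compares subjects by a norm of the difference between their matrices. The sketch below is a minimal stdlib illustration that uses Pearson correlation (one of the corr_method choices; the default is 'kendall') and the Frobenius norm. The input layout and function names are hypothetical, and a feature that is constant within a subject would need special handling (this sketch would divide by zero).

```python
import math
from itertools import combinations


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def feature_correlations(profile):
    """profile: time-ordered list of samples, each a list of feature abundances."""
    columns = list(zip(*profile))  # one tuple of abundances per feature
    return [[pearson(a, b) for b in columns] for a in columns]


def frobenius(a, b):
    """Frobenius norm of the difference between two equally sized matrices."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def nmit(subjects):
    """subjects: dict of subject ID -> profile. Returns symmetric pairwise distances."""
    corr = {s: feature_correlations(p) for s, p in subjects.items()}
    dist = {}
    for s1, s2 in combinations(sorted(corr), 2):
        dist[s1, s2] = dist[s2, s1] = frobenius(corr[s1], corr[s2])
    return dist
```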

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method: 'fro' (Frobenius norm) or 'nuc' (nuclear norm); see numpy.linalg.norm for details.[default: 'fro']
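For reference, the two choices correspond to the following matrix norms (as defined for numpy.linalg.norm), written here for the difference D between two subjects' correlation matrices, where the sigma_r are the singular values of D. The 2-by-2 case is an illustrative example only.

```latex
\|D\|_{\mathrm{fro}} = \sqrt{\sum_{k,l} d_{kl}^{2}} = \sqrt{\sum_{r} \sigma_r^{2}},
\qquad
\|D\|_{\mathrm{nuc}} = \sum_{r} \sigma_r

D = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix},\quad \sigma_1 = \sigma_2 = 2
\;\Rightarrow\;
\|D\|_{\mathrm{fro}} = 2\sqrt{2} \approx 2.83,\quad \|D\|_{\mathrm{nuc}} = 4
```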

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
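A minimal sketch of both modes, assuming one sample per subject per state (see replicate_handling) and samples given as hypothetical (sample_id, subject, state, value) tuples. As described above, each difference is keyed by the SampleID of the later state; `first_differences` is a hypothetical helper.

```python
from collections import defaultdict


def first_differences(samples, baseline=None):
    """Return {sample_id_of_later_state: difference} for each subject."""
    by_subject = defaultdict(list)
    for sample_id, subject, state, value in samples:
        by_subject[subject].append((state, sample_id, value))
    result = {}
    for records in by_subject.values():
        records.sort()  # order each subject's samples by state
        if baseline is None:
            # Difference between each pair of sequential states
            for (_, _, prev), (_, sid, curr) in zip(records, records[1:]):
                result[sid] = curr - prev
        else:
            # Static difference from the baseline state
            base = [v for s, _, v in records if s == baseline]
            if not base:
                continue  # subject was not sampled at the baseline state
            for state, sid, value in records:
                if state != baseline:
                    result[sid] = value - base[0]
    return result
```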

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the action to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
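The three replicate_handling modes can be sketched as follows, treating samples that share a subject and a state as replicates. The (sample_id, subject, state) tuple layout and `handle_replicates` are hypothetical, and the seed argument exists only to make the "random" choice reproducible in this sketch.

```python
import random
from collections import defaultdict


def handle_replicates(samples, replicate_handling="error", seed=None):
    """Return at most one sample ID per (subject, state) pair."""
    groups = defaultdict(list)
    for sample_id, subject, state in samples:
        groups[subject, state].append(sample_id)
    rng = random.Random(seed)
    kept = []
    for (subject, state), ids in groups.items():
        if len(ids) == 1:
            kept.append(ids[0])
        elif replicate_handling == "error":
            raise ValueError(f"Replicates for {subject!r} at state {state!r}: {ids}")
        elif replicate_handling == "random":
            kept.append(rng.choice(ids))  # one representative at random
        # "drop": every replicated sample is discarded
    return kept
```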

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
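A minimal sketch of first distances and distances from baseline. The plugin takes a QIIME 2 DistanceMatrix artifact; here, as a hypothetical stand-in, distances are a dict keyed by sample-ID pairs and looked up in either order. It assumes one sample per subject per state.

```python
def first_distances(samples, distances, baseline=None):
    """samples: list of (sample_id, subject, state); distances: {(id1, id2): d}."""
    def dist(a, b):
        # The matrix is symmetric, so accept either key order
        return distances[a, b] if (a, b) in distances else distances[b, a]

    by_subject = {}
    for sample_id, subject, state in samples:
        by_subject.setdefault(subject, []).append((state, sample_id))
    result = {}
    for records in by_subject.values():
        records.sort()  # order each subject's samples by state
        if baseline is None:
            # Distance between sequential states, keyed by the later sample
            for (_, prev), (_, curr) in zip(records, records[1:]):
                result[curr] = dist(prev, curr)
        else:
            # Distance from the subject's baseline sample
            base = [sid for state, sid in records if state == baseline]
            for state, sid in records:
                if base and state != baseline:
                    result[sid] = dist(base[0], sid)
    return result
```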

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the action to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
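The pairing step can be sketched as follows: for each subject sampled at both states, compute the change in the metric and collect it per group. These per-group values are what the boxplots summarize and what the significance tests operate on. The tuple layout, the helper name, and the direction of the difference (state_2 minus state_1) are assumptions of this sketch; subjects missing either state are skipped.

```python
from collections import defaultdict


def paired_differences(samples, state_1, state_2):
    """samples: list of (subject, group, state, value). Returns {group: [differences]}."""
    values, groups = {}, {}
    for subject, group, state, value in samples:
        values[subject, state] = value
        groups[subject] = group
    diffs = defaultdict(list)
    for subject, group in groups.items():
        # Only subjects sampled at both states form a pair
        if (subject, state_1) in values and (subject, state_2) in values:
            diffs[group].append(values[subject, state_2] - values[subject, state_1])
    return dict(diffs)
```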

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, which by default ("error") causes the action to fail. Set replicate_handling="random" to instead randomly select one member.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the action to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
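A minimal sketch of how one distance per subject can be extracted and grouped before testing. As a hypothetical stand-in for the DistanceMatrix artifact, distances are a dict keyed by sample-ID pairs and looked up in either order; the tuple layout and `paired_distances` are also hypothetical, and subjects missing either state are skipped.

```python
from collections import defaultdict


def paired_distances(samples, distances, state_1, state_2):
    """samples: list of (sample_id, subject, group, state). Returns {group: [distances]}."""
    by_subject = defaultdict(dict)
    group_of = {}
    for sample_id, subject, group, state in samples:
        by_subject[subject][state] = sample_id
        group_of[subject] = group
    result = defaultdict(list)
    for subject, states in by_subject.items():
        if state_1 in states and state_2 in states:
            a, b = states[state_1], states[state_2]
            # The matrix is symmetric, so accept either key order
            d = distances.get((a, b), distances.get((b, a)))
            result[group_of[subject]].append(d)
    return dict(result)
```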

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, which by default ("error") causes the action to fail. Set replicate_handling="random" to instead randomly select one member.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes the action to fail; "drop" discards all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action performs LME modeling and produces line plots for each group column. A feature table artifact is optional input; "metric" may be derived from either the feature table or the metadata.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the variance and covariance structure (random effects) of "metric". To add a random slope, pass the same value given to "state_column". A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
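As an illustration of how these parameters combine, the formula "metric ~ state * group" with random_effects set to the state column corresponds, in standard linear mixed model notation, to the model below for observation j of individual i. patsy expands `state * group` into `state + group + state:group`, and the per-individual random intercept is included by default, as the random_effects description states. The symbols are illustrative: t is the state value, g a two-level group indicator, the beta terms are fixed effects, and the b terms are random effects.

```latex
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 g_i + \beta_3 \, t_{ij} g_i
       + b_{0i} + b_{1i} t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \Sigma_b),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```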

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
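For a single categorical factor (a formula such as "metric ~ group"), the ANOVA F statistic compares between-group to within-group variance. With only one factor, the sstype choices give the same sum of squares for that factor; they differ once several correlated terms are present. The minimal stdlib sketch below computes only the F statistic and its degrees of freedom (no p-value and no pairwise t-tests); `one_way_anova_f` is a hypothetical helper.

```python
def one_way_anova_f(groups):
    """groups: dict of group label -> list of values. Returns (F, df_between, df_within)."""
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in groups.items())
    # Variation of observations around their own group mean
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```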

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual IDs with repeated measures to account for. This should not be included in the formula.[optional]

rm_aggregate: Bool

If the data set contains more than a single observation per individual ID and cell of the specified model, the observations are aggregated by their mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
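A minimal stdlib sketch of mean aggregation, where a "cell" is one individual at one combination of within-subject factor levels. Rows are represented as plain dicts, and `aggregate_by_mean` and the column names in the example are hypothetical.

```python
from collections import defaultdict


def aggregate_by_mean(rows, individual_id_column, within_columns, metric):
    """Collapse replicate observations per individual and cell to their mean."""
    cells = defaultdict(list)
    for row in rows:
        # A cell is keyed by the individual plus its within-subject factor levels
        key = (row[individual_id_column],) + tuple(row[c] for c in within_columns)
        cells[key].append(row[metric])
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}
```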

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
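The averaging described above can be sketched as computing, for each group, the mean metric value at each state, which gives the series plotted as that group's line. The (group, state, value) tuple layout and `group_means_by_state` are hypothetical, and this sketch omits the chart's other elements (such as individual spaghetti lines).

```python
from collections import defaultdict


def group_means_by_state(samples):
    """samples: list of (group, state, value). Returns {group: [(state, mean), ...]}."""
    sums = defaultdict(lambda: [0.0, 0])  # running [total, count] per (group, state)
    for group, state, value in samples:
        acc = sums[group, state]
        acc[0] += value
        acc[1] += 1
    series = defaultdict(list)
    # Sorting by (group, state) yields each group's series in state order
    for (group, state), (total, n) in sorted(sums.items()):
        series[group].append((state, total / n))
    return dict(series)
```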

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This visualization computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

<no description>[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for model testing.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration.[default: 0.05]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]


longitudinal nmit

Performs a nonparametric microbial interdependence test (NMIT) to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065.
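A minimal NumPy sketch of the NMIT idea follows. For each subject, it computes a feature-by-feature correlation matrix across that subject's samples over time. The distance between two subjects is then a matrix norm of the difference between their correlation matrices. The sketch rests on these assumptions:

* Each subject is given as a (time points × features) array of relative frequencies with at least two time points and two features.
* Only Pearson and Spearman correlation are implemented; Kendall is omitted.
* Spearman ranks here break ties by order instead of averaging them.
* NaN correlations from constant features are set to 0. This is an assumption, not documented plugin behavior.

The function names are hypothetical.

```python
import numpy as np


def _ranks(x):
    # Column-wise 0-based ranks; ties are broken by order, not averaged.
    return np.argsort(np.argsort(x, axis=0), axis=0).astype(float)


def subject_correlation(x, corr_method="pearson"):
    """Feature-by-feature correlation across one subject's samples.

    x: array of shape (n_timepoints, n_features).
    """
    if corr_method == "spearman":
        x = _ranks(x)
    elif corr_method != "pearson":
        raise ValueError("this sketch supports only 'pearson' and 'spearman'")
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(x, rowvar=False)
    # Constant features yield NaN correlations; treat them as 0 (assumption).
    return np.nan_to_num(corr)


def nmit_distances(subjects, corr_method="pearson", dist_method="fro"):
    """Pairwise subject distances: norm of differences of correlation matrices."""
    corrs = [subject_correlation(np.asarray(x, dtype=float), corr_method)
             for x in subjects]
    n = len(corrs)
    dm = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # dist_method is passed as numpy.linalg.norm's `ord`: 'fro' or 'nuc'.
            d = np.linalg.norm(corrs[i] - corrs[j], ord=dist_method)
            dm[i, j] = dm[j, i] = d
    return dm
```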

Citations

Bokulich et al., 2018; Zhang et al., 2017

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to use for microbial interdependence test.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

corr_method: Str % Choices('kendall', 'pearson', 'spearman')

The temporal correlation test to be applied.[default: 'kendall']

dist_method: Str % Choices('fro', 'nuc')

Temporal distance method (matrix norm): "fro" is the Frobenius norm and "nuc" the nuclear norm; see numpy.linalg.norm for details.[default: 'fro']

Outputs

distance_matrix: DistanceMatrix

The resulting distance matrix.[required]


longitudinal first-differences

Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
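The bookkeeping behind first differences and differences from baseline can be sketched in plain Python as follows. This is a hypothetical illustration, not the plugin's code:

* Samples are passed as a dict mapping sample ID to (individual ID, state, metric value).
* Replicate states raise an error, mirroring replicate_handling="error".
* Subjects without a baseline sample are skipped; this is an assumption.

```python
from collections import defaultdict


def first_differences(samples, baseline=None):
    """Hypothetical sketch of first differences (or differences from baseline).

    samples: dict mapping sample ID -> (individual ID, state, metric value).
    Returns a dict mapping the later sample's ID -> difference.
    """
    by_subject = defaultdict(list)
    for sample_id, (subject, state, value) in samples.items():
        by_subject[subject].append((state, sample_id, value))

    diffs = {}
    for subject, rows in by_subject.items():
        rows.sort()  # order each subject's samples by state (time)
        states = [state for state, _, _ in rows]
        if len(states) != len(set(states)):
            # Mirrors replicate_handling='error'.
            raise ValueError(f"replicate states found for subject {subject!r}")
        if baseline is None:
            # Each state minus the previous one, labeled by the later sample.
            for (_, _, prev), (_, sample_id, curr) in zip(rows, rows[1:]):
                diffs[sample_id] = curr - prev
        else:
            base = [value for state, _, value in rows if state == baseline]
            if not base:
                continue  # assumption: subjects without a baseline sample are skipped
            for state, sample_id, value in rows:
                if state != baseline:
                    diffs[sample_id] = value - base[0]
    return diffs
```

For example, a subject measured as 1.0, 3.0, and 2.5 at states 0, 1, and 2 yields first differences 2.0 and -0.5. With baseline=0, the same subject yields differences from baseline of 2.0 and 1.5, labeled by the state-1 and state-2 sample IDs.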

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for computing first differences.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

Outputs

first_differences: SampleData[FirstDifferences]

Series of first differences.[required]


longitudinal first-distances

Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
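First distances pair samples the same way as first differences, but look each value up in the distance matrix instead of subtracting metric values. The plain-Python sketch below makes these assumptions:

* The distance matrix is a nested list ordered like a list of sample IDs.
* Metadata maps sample ID to (individual ID, state).
* Replicate handling is omitted.
* Subjects without a baseline sample are skipped; this is an assumption.

The function name `first_distances` is hypothetical.

```python
from collections import defaultdict


def first_distances(dm, ids, metadata, baseline=None):
    """Hypothetical sketch of first distances (or distances from baseline).

    dm: square, symmetric distance matrix (nested lists) ordered like `ids`.
    metadata: dict mapping sample ID -> (individual ID, state).
    Returns a dict mapping the later sample's ID -> distance.
    """
    index = {sample_id: i for i, sample_id in enumerate(ids)}
    by_subject = defaultdict(list)
    for sample_id, (subject, state) in metadata.items():
        if sample_id in index:  # keep only samples present in the matrix
            by_subject[subject].append((state, sample_id))

    dists = {}
    for rows in by_subject.values():
        rows.sort()  # order each subject's samples by state (time)
        if baseline is None:
            pairs = list(zip(rows, rows[1:]))  # consecutive states
        else:
            base = [row for row in rows if row[0] == baseline]
            if not base:
                continue  # assumption: subjects without a baseline sample are skipped
            pairs = [(base[0], row) for row in rows if row[0] != baseline]
        for (_, earlier), (_, later) in pairs:
            dists[later] = dm[index[earlier]][index[later]]
    return dists
```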

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

baseline: Float

A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

first_distances: SampleData[FirstDifferences]

Series of first distances.[required]


longitudinal pairwise-differences

Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
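The pairing step that precedes the statistical tests can be sketched as follows. This hypothetical plain-Python illustration makes these assumptions:

* Samples map sample ID to (individual ID, group, state, metric value).
* Subjects missing either state are dropped.
* A replicate at state_1 or state_2 raises an error, as with replicate_handling="error".

```python
from collections import defaultdict


def paired_differences(samples, state_1, state_2):
    """Hypothetical sketch: per-group lists of (state_2 - state_1) differences.

    samples: dict mapping sample ID -> (individual ID, group, state, value).
    """
    values = defaultdict(dict)  # subject -> {state: metric value}
    groups = {}
    for subject, group, state, value in samples.values():
        if state in (state_1, state_2):
            if state in values[subject]:
                # Mirrors replicate_handling='error'.
                raise ValueError(f"replicate at {state!r} for subject {subject!r}")
            values[subject][state] = value
            groups[subject] = group
    out = defaultdict(list)
    for subject, by_state in values.items():
        if state_1 in by_state and state_2 in by_state:  # complete pairs only
            out[groups[subject]].append(by_state[state_2] - by_state[state_1])
    return dict(out)
```

The per-group lists are what the tests described under the parametric parameter would operate on. Those tests are, for example, a Wilcoxon signed-rank test against zero within each group, and a Kruskal-Wallis or Mann-Whitney U test between groups.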

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table to optionally use for paired comparisons.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

metric: Str

Numerical metadata or artifact column to test.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, which by default ("error") causes the method to fail. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[optional]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal pairwise-distances

Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
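The lookup that produces one distance per subject can be sketched in plain Python. This is a hypothetical illustration that makes these assumptions:

* The distance matrix is a nested list ordered like a list of sample IDs.
* Metadata maps sample ID to (individual ID, group, state).
* Subjects missing either state are dropped.
* Replicate handling is omitted; a later replicate silently overwrites an earlier one.

```python
from collections import defaultdict


def paired_distances(dm, ids, metadata, state_1, state_2):
    """Hypothetical sketch: per-group lists of state_1-to-state_2 distances.

    dm: square distance matrix (nested lists) ordered like `ids`.
    metadata: dict mapping sample ID -> (individual ID, group, state).
    """
    index = {sample_id: i for i, sample_id in enumerate(ids)}
    samples = defaultdict(dict)  # subject -> {state: sample ID}
    groups = {}
    for sample_id, (subject, group, state) in metadata.items():
        if sample_id in index and state in (state_1, state_2):
            samples[subject][state] = sample_id
            groups[subject] = group
    out = defaultdict(list)
    for subject, by_state in samples.items():
        if state_1 in by_state and state_2 in by_state:  # complete pairs only
            i, j = index[by_state[state_1]], index[by_state[state_2]]
            out[groups[subject]].append(dm[i][j])
    return dict(out)
```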

Citations

Bokulich et al., 2018

Inputs

distance_matrix: DistanceMatrix

Matrix of distances between pairs of samples.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

group_column: Str

Metadata column on which to separate groups for comparison.[required]

state_column: Str

Metadata column containing state (e.g., Time) across which samples are paired.[required]

state_1: Str

Baseline state column value.[required]

state_2: Str

State column value to pair with baseline.[required]

individual_id_column: Str

Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, which by default ("error") causes the method to fail. Set replicate_handling="random" to instead randomly select one member, or "drop" to discard the replicated samples.[required]

parametric: Bool

Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating boxplots.[default: 'Set1']

replicate_handling: Str % Choices('error', 'random', 'drop')

Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]


longitudinal linear-mixed-effects

Linear mixed effects (LME) models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits an LME model and draws line plots for each group column. A feature table artifact is optional input; "metric" may be derived from either the feature table or the metadata.

Citations

Bokulich et al., 2018; Seabold & Perktold, 2010

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metric.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[required]

metric: Str

Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]

group_columns: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the mean structure of "metric".[optional]

random_effects: Str

Comma-separated list (without spaces) of metadata columns to use as independent covariates that determine the variance and covariance structure (random effects) of "metric". To add a random slope, pass the same value given to "state_column" here. A random intercept for each individual is set by default and does not need to be passed here.[optional]

palette: Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')

Color palette to use for generating line plots.[default: 'Set1']

lowess: Bool

Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]

ci: Float % Range(0, 100)

Size of the confidence interval for the regression estimate.[default: 95]

formula: Str

R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]

Outputs

visualization: Visualization

<no description>[required]


longitudinal anova

Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
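With a single categorical factor, a formula such as "metric ~ group" reduces to a one-way ANOVA. In that case the sstype choice does not change the test for the factor. The sketch below computes the one-way F statistic by hand with NumPy to show what the test compares. It is an illustration, not the plugin's statsmodels-based implementation, and `one_way_anova` is a hypothetical name. It omits the p-value and the follow-up pairwise t-tests.

```python
import numpy as np


def one_way_anova(values, groups):
    """Hypothetical sketch: F statistic and degrees of freedom for one factor."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    grand_mean = values.mean()
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(
        values[groups == g].size * (values[groups == g].mean() - grand_mean) ** 2
        for g in labels
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        ((values[groups == g] - values[groups == g].mean()) ** 2).sum()
        for g in labels
    )
    df_between = labels.size - 1
    df_within = values.size - labels.size
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

If SciPy is available, a p-value could then be obtained from the F distribution with scipy.stats.f.sf(f, df_between, df_within).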

Citations

Bokulich et al., 2018

Parameters

metadata: Metadata

Sample metadata containing formula terms.[required]

formula: Str

R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]

sstype: Str % Choices('I', 'II', 'III')

Type of sum of squares calculation to perform (I, II, or III).[default: 'II']

repeated_measures: Bool

Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]

individual_id_column: Str

The column containing individual IDs with repeated measures to account for. This column should not be included in the formula.[optional]

rm_aggregate: Bool

If True and the data set contains more than one observation per individual ID and cell of the specified model, the observations are aggregated by their mean before running the ANOVA. Only applicable to repeated measures ANOVA.[default: False]

Outputs

visualization: Visualization

<no description>[required]


longitudinal volatility

Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
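The summary behind a volatility chart can be sketched in plain Python. The sketch averages the metric for each (group, state) cell and computes a global mean with limits at ±2 and ±3 sample standard deviations. The limit multipliers are an assumption based on common control-chart convention and are not taken from this documentation. `volatility_summary` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean, stdev


def volatility_summary(rows):
    """Hypothetical sketch of a control-chart summary.

    rows: iterable of (group, state, metric value).
    Returns per-(group, state) means and global chart lines.
    """
    cells = defaultdict(list)
    all_values = []
    for group, state, value in rows:
        cells[(group, state)].append(value)
        all_values.append(value)
    group_means = {key: mean(vals) for key, vals in sorted(cells.items())}
    mu, sd = mean(all_values), stdev(all_values)  # sample standard deviation
    limits = {
        "global_mean": mu,
        "warning": (mu - 2 * sd, mu + 2 * sd),  # assumed ±2 SD
        "control": (mu - 3 * sd, mu + 3 * sd),  # assumed ±3 SD
    }
    return group_means, limits
```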

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing metrics.[optional]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

default_metric: Str

Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

Outputs

visualization: Visualization

<no description>[required]

Examples

longitudinal_volatility

wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

longitudinal plot-feature-volatility

Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[RelativeFrequency]

Feature table containing features found in importances.[required]

importances: FeatureData[Importance]

Feature importance scores.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing state (time) variable information.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

default_group_column: Str

The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]

yscale: Str % Choices('linear', 'pow', 'sqrt', 'log')

y-axis scaling strategy to apply.[default: 'linear']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

Outputs

visualization: Visualization

<no description>[required]
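The importance_threshold and feature_count filters described above can be sketched together with NumPy. In this hypothetical illustration, features scoring below the threshold are excluded and the top N of the remainder are kept. Converting "q1", "q2", and "q3" to values with numpy.quantile (linear interpolation) is an assumption; the plugin may compute quartiles differently. `filter_features` is a hypothetical name.

```python
import numpy as np


def filter_features(importances, importance_threshold=None, feature_count=100):
    """Hypothetical sketch of importance-based feature filtering.

    importances: dict mapping feature ID -> importance score.
    Returns the kept feature IDs, most important first.
    """
    scores = np.array(list(importances.values()), dtype=float)
    if isinstance(importance_threshold, str):
        q = {"q1": 0.25, "q2": 0.5, "q3": 0.75}[importance_threshold]
        importance_threshold = np.quantile(scores, q)  # linear interpolation (assumption)
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    if importance_threshold is not None:
        # Exclude features scoring below the threshold.
        ranked = [(f, s) for f, s in ranked if s >= importance_threshold]
    if feature_count != "all":
        ranked = ranked[:feature_count]  # keep the top N
    return [f for f, _ in ranked]
```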


longitudinal feature-volatility

Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.

Citations

Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata file containing individual_id_column.[required]

state_column: Str

Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]

individual_id_column: Str

Metadata column containing IDs for individual subjects.[optional]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Estimator method to use for sample prediction.[default: 'RandomForestRegressor']

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

importance_threshold: Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')

Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]

feature_count: Int % Range(1, None) | Str % Choices('all')

Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]

Outputs

filtered_table: FeatureTable[RelativeFrequency]

Feature table containing only important features.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

volatility_plot: Visualization

Interactive volatility plot visualization.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

sample_estimator: SampleEstimator[Regressor]

Trained sample regressor.[required]


longitudinal maturity-index

Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. The pipeline then computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.

Citations

Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018

Inputs

table: FeatureTable[Frequency]

Feature table containing all features that should be used for target prediction.[required]

Parameters

metadata: Metadata

Sample metadata containing state_column and group_by (and individual_id_column, if provided).[required]

state_column: Str

Numeric metadata column containing sampling time (state) data to use as prediction target.[required]

group_by: Str

Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]

control: Str

Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]

individual_id_column: Str

Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]

estimator: Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')

Regression model to use for prediction.[default: 'RandomForestRegressor']

n_estimators: Int % Range(1, None)

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]

test_size: Float % Range(0.0, 1.0)

Fraction of input samples to exclude from the training set and use for model testing.[default: 0.5]

step: Float % Range(0.0, 1.0, inclusive_start=False)

If optimize_feature_selection is True, step is the fraction of features to remove at each iteration.[default: 0.05]

cv: Int % Range(1, None)

Number of k-fold cross-validations to perform.[default: 5]

random_state: Int

Seed used by random number generator.[optional]

n_jobs: Threads

Number of jobs to run in parallel.[default: 1]

parameter_tuning: Bool

Automatically tune hyperparameters using random grid search.[default: False]

optimize_feature_selection: Bool

Automatically optimize input feature selection using recursive feature elimination.[default: False]

stratify: Bool

Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]

missing_samples: Str % Choices('error', 'ignore')

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']

feature_count: Int % Range(0, None)

Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]

Outputs

sample_estimator: SampleEstimator[Regressor]

Trained sample estimator.[required]

feature_importance: FeatureData[Importance]

Importance of each input feature to model accuracy.[required]

predictions: SampleData[RegressorPredictions]

Predicted target values for each input sample.[required]

model_summary: Visualization

Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]

accuracy_results: Visualization

Accuracy results visualization.[required]

maz_scores: SampleData[RegressorPredictions]

Microbiota-for-age z-score predictions.[required]

clustermap: Visualization

Heatmap of important feature abundance at each time point in each group.[required]

volatility_plots: Visualization

Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]