This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-longitudinal
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
- citations: Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str % Choices('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default: 'kendall']
- dist_method:
Str % Choices('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
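The idea behind the test can be illustrated with a small, self-contained sketch: for each subject, build a feature-by-feature correlation matrix across that subject's time points, then treat a matrix norm of the difference between two subjects' correlation matrices as their distance. This is not the plugin's implementation. It uses Pearson correlation and the Frobenius norm for brevity (the plugin default for corr_method is 'kendall'), the function names and input values are hypothetical, and treating an undefined correlation as 0 is an assumption of the sketch.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption of this sketch: undefined correlations count as 0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series_by_feature):
    """series_by_feature: one time series per feature, all for a single subject."""
    k = len(series_by_feature)
    return [[pearson(series_by_feature[i], series_by_feature[j]) for j in range(k)]
            for i in range(k)]


def frobenius_distance(m1, m2):
    """Frobenius norm of the element-wise difference of two matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(m1, m2)
                         for a, b in zip(row1, row2)))


# Hypothetical relative frequencies: two features, three time points per subject.
subject_a = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]  # features move in opposite directions
subject_b = [[0.1, 0.2, 0.3], [0.2, 0.4, 0.6]]  # features move together

dist_ab = frobenius_distance(correlation_matrix(subject_a),
                             correlation_matrix(subject_b))
print(round(dist_ab, 6))
```

Subjects whose features co-vary in the same way over time end up close together, regardless of the absolute abundances involved.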
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
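The calculation described above can be sketched in plain Python. This is an illustration under simplifying assumptions, not the plugin's code: the function name and the tuple layout are hypothetical, replicates are assumed to be absent, and subjects without a baseline sample are simply skipped.

```python
from collections import defaultdict


def first_differences(samples, baseline=None):
    """samples: list of (sample_id, individual_id, state, metric_value) tuples.

    Returns {sample_id: difference}, keyed by the sample ID of the later state.
    If baseline is given, every other state is compared against that state
    instead of the immediately preceding one.
    """
    by_subject = defaultdict(list)
    for sample_id, individual, state, value in samples:
        by_subject[individual].append((state, sample_id, value))

    diffs = {}
    for rows in by_subject.values():
        rows.sort()  # order each subject's samples by state
        if baseline is None:
            for (_, _, prev), (_, sid, cur) in zip(rows, rows[1:]):
                diffs[sid] = cur - prev
        else:
            ref = [v for s, _, v in rows if s == baseline]
            if not ref:
                continue  # assumption: skip subjects lacking a baseline sample
            for state, sid, cur in rows:
                if state != baseline:
                    diffs[sid] = cur - ref[0]
    return diffs


# Hypothetical data: two subjects sampled at states 0, 1 (and 2).
samples = [
    ("s1", "subjA", 0, 2.0), ("s2", "subjA", 1, 3.5), ("s3", "subjA", 2, 3.0),
    ("s4", "subjB", 0, 1.0), ("s5", "subjB", 1, 1.0),
]
sequential = first_differences(samples)          # each state vs. the previous state
static = first_differences(samples, baseline=0)  # each state vs. state 0
```

Note that no entry is produced for the first (or baseline) sample of each subject, since it has nothing to be compared against.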
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
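For a single subject, looking up first distances in a distance matrix might look like the following sketch. The dictionary-based matrix, the function name, and the `baseline_id` argument are hypothetical simplifications (the plugin takes a baseline state value rather than a sample ID), and the distances are made-up numbers.

```python
def first_distances(dist, ordered_ids, baseline_id=None):
    """dist: {frozenset({id1, id2}): distance} for one subject's samples.
    ordered_ids: that subject's sample IDs, sorted by state.

    Returns {sample_id: distance}, keyed by the sample ID of the later state.
    """
    def d(a, b):
        return dist[frozenset((a, b))]

    if baseline_id is None:
        # Distance between each sample and the one immediately before it.
        return {cur: d(prev, cur) for prev, cur in zip(ordered_ids, ordered_ids[1:])}
    # Distance between each sample and the fixed baseline sample.
    return {sid: d(baseline_id, sid) for sid in ordered_ids if sid != baseline_id}


# Hypothetical pairwise distances among three samples from one subject.
dist = {
    frozenset(("t0", "t1")): 0.2,
    frozenset(("t1", "t2")): 0.5,
    frozenset(("t0", "t2")): 0.6,
}
ids = ["t0", "t1", "t2"]

sequential = first_distances(dist, ids)
from_baseline = first_distances(dist, ids, baseline_id="t0")
```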
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
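The three replicate_handling choices and the paired difference itself can be illustrated with a short sketch. It is not the plugin's implementation: the function names and data are hypothetical, the `seed` argument is added only to make the random choice reproducible, and the direction of the difference (state_2 minus state_1) is an assumption.

```python
import random
from collections import defaultdict


def resolve_replicates(rows, replicate_handling="error", seed=None):
    """rows: list of (individual_id, state, value).

    Returns {(individual_id, state): value} with at most one value per key.
    """
    groups = defaultdict(list)
    for individual, state, value in rows:
        groups[(individual, state)].append(value)

    rng = random.Random(seed)
    resolved = {}
    for key, values in groups.items():
        if len(values) == 1:
            resolved[key] = values[0]
        elif replicate_handling == "error":
            raise ValueError(f"replicate samples detected for {key}")
        elif replicate_handling == "random":
            resolved[key] = rng.choice(values)  # keep one representative
        elif replicate_handling == "drop":
            continue  # discard every replicated sample
        else:
            raise ValueError(f"unknown replicate_handling: {replicate_handling!r}")
    return resolved


def paired_differences(resolved, state_1, state_2):
    """state_2 minus state_1 for each subject that has a value at both states."""
    subjects = {individual for individual, _ in resolved}
    return {ind: resolved[(ind, state_2)] - resolved[(ind, state_1)]
            for ind in sorted(subjects)
            if (ind, state_1) in resolved and (ind, state_2) in resolved}


# Hypothetical data: subjB has two "post" samples (replicates).
rows = [("subjA", "pre", 4.0), ("subjA", "post", 6.5),
        ("subjB", "pre", 5.0), ("subjB", "post", 5.5), ("subjB", "post", 7.0)]

dropped = paired_differences(resolve_replicates(rows, "drop"), "pre", "post")
sampled = paired_differences(resolve_replicates(rows, "random", seed=0), "pre", "post")
```

With "drop", subjB loses its replicated "post" samples and therefore cannot be paired; with "random", subjB is paired using whichever replicate was chosen.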
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
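As a small illustration of the formula syntax described for the "formula" parameter, the sketch below assembles a formula string from its parts and quotes it for use on a command line. The helper name and the column names (shannon, month, diet) are hypothetical; only the operators themselves come from the parameter description.

```python
import shlex


def build_formula(metric, covariates, interactions=()):
    """Join covariates with "+" and append "a:b" interaction terms."""
    terms = list(covariates) + [f"{a}:{b}" for a, b in interactions]
    return f"{metric} ~ " + " + ".join(terms)


# Two main effects plus their interaction, written out explicitly.
formula = build_formula("shannon", ["month", "diet"],
                        interactions=[("month", "diet")])

# "month * diet" is shorthand for the same three terms.
shorthand = "shannon ~ month * diet"

# Quote the formula before placing it on a command line, so that the shell
# does not try to interpret characters such as "~" or "*".
quoted = shlex.quote(shorthand)
print(formula)
print(quoted)
```

Quoting matters because an unquoted "*" would be expanded by most shells into a list of file names before the command ever sees it.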
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual IDs with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than one observation per individual ID and cell of the specified model, aggregate the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
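The repeated measures requirements (a fully balanced within-subject design, with optional aggregation by the mean via rm_aggregate) can be checked on a data set before running the action. The following is a minimal sketch with hypothetical function names and data; it only illustrates what "balanced" and "aggregate by the mean" mean here and does not reproduce the statsmodels implementation.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_mean(rows):
    """rows: list of (individual_id, cell, value), where cell is a
    within-subject factor level. Returns {(individual_id, cell): mean value}."""
    groups = defaultdict(list)
    for individual, cell, value in rows:
        groups[(individual, cell)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def is_balanced(aggregated):
    """True if every individual has a value in every cell."""
    individuals = {individual for individual, _ in aggregated}
    cells = {cell for _, cell in aggregated}
    return all((ind, cell) in aggregated for ind in individuals for cell in cells)


# Hypothetical data: subjA was measured twice at week0.
rows = [("subjA", "week0", 1.0), ("subjA", "week0", 3.0), ("subjA", "week4", 5.0),
        ("subjB", "week0", 2.0), ("subjB", "week4", 4.0)]

aggregated = aggregate_by_mean(rows)
print(aggregated[("subjA", "week0")], is_balanced(aggregated))
```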
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from urllib import request

from qiime2 import Metadata
import qiime2.plugins.longitudinal.actions as longitudinal_actions

# Download the example metadata file.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)

# Load the metadata and generate the volatility plot.
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
    metadata=metadata_md,
    state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
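How importance_threshold and feature_count could interact is sketched below. This is an illustration only: the function name and scores are hypothetical, and both the quartile interpolation method and the order in which the two filters are applied are assumptions that may differ from the plugin.

```python
from statistics import quantiles


def filter_features(importances, importance_threshold=None, feature_count=100):
    """importances: {feature_id: importance score}.

    Returns the retained feature IDs, most important first.
    """
    threshold = importance_threshold
    if threshold in ("q1", "q2", "q3"):
        # Quartile cut points; the interpolation method is an assumption here.
        q1, q2, q3 = quantiles(importances.values(), n=4, method="inclusive")
        threshold = {"q1": q1, "q2": q2, "q3": q3}[threshold]

    ranked = sorted(importances, key=importances.get, reverse=True)
    if threshold is not None:
        # Exclude features scoring below the threshold.
        ranked = [f for f in ranked if importances[f] >= threshold]
    if feature_count != "all":
        # Keep only the top N remaining features.
        ranked = ranked[:feature_count]
    return ranked


# Hypothetical importance scores for five features.
importances = {"f1": 0.40, "f2": 0.25, "f3": 0.20, "f4": 0.10, "f5": 0.05}

above_median = filter_features(importances, importance_threshold="q2")
top_two = filter_features(importances, feature_count=2)
```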
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This visualization computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
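The z-score idea behind the maz_scores output can be sketched as follows, using the commonly cited definition of a microbiota-for-age z-score: a sample's predicted value minus the median predicted value of control samples of the same age, divided by the standard deviation of those control predictions. This is a hedged illustration, not the plugin's code: the function name and data are hypothetical, ages are matched exactly rather than binned, and samples whose age has fewer than two control samples (or zero spread) are skipped.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control):
    """samples: list of (sample_id, group, age, predicted_age) tuples.

    Returns {sample_id: z-score relative to control samples of the same age}.
    """
    control_by_age = defaultdict(list)
    for _, group, age, predicted in samples:
        if group == control:
            control_by_age[age].append(predicted)

    scores = {}
    for sample_id, _, age, predicted in samples:
        ref = control_by_age.get(age, [])
        if len(ref) < 2:
            continue  # too few control samples at this age to estimate spread
        sd = stdev(ref)
        if sd == 0:
            continue  # assumption: skip ages where controls show no variation
        scores[sample_id] = (predicted - median(ref)) / sd
    return scores


# Hypothetical predictions for samples that were all collected at age 6.
samples = [
    ("c1", "control", 6, 5.0), ("c2", "control", 6, 6.0), ("c3", "control", 6, 7.0),
    ("t1", "treated", 6, 4.0),
]
scores = maz_scores(samples, control="control")
```

A negative score indicates a sample predicted to be "younger" than control samples of the same age, and a positive score the opposite.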
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str % Choices('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default: 'kendall']
- dist_method:
Str % Choices('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
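The idea behind the test can be sketched in plain Python: build one feature-by-feature correlation matrix per subject from that subject's time series, then take a matrix norm of the difference between two subjects' matrices as their distance. This is a minimal illustration under stated assumptions, not the plugin's implementation: it uses Pearson correlation and the Frobenius norm only, treats constant series as uncorrelated, and the subject data and helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant (assumption)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series_by_feature):
    """Feature-by-feature correlation matrix for one subject.

    series_by_feature maps a feature ID to its relative frequencies across
    the subject's time points (same time order for every feature).
    """
    features = sorted(series_by_feature)
    return [[pearson(series_by_feature[f], series_by_feature[g])
             for g in features] for f in features]


def frobenius_distance(m1, m2):
    """Frobenius norm of the difference between two equal-sized matrices."""
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(m1, m2)
                         for a, b in zip(row1, row2)))


# Hypothetical subjects: two features followed over four time points.
subject_a = {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [0.9, 0.8, 0.7, 0.6]}
subject_b = {"f1": [0.1, 0.2, 0.3, 0.4], "f2": [0.2, 0.4, 0.6, 0.8]}

dist_ab = frobenius_distance(correlation_matrix(subject_a),
                             correlation_matrix(subject_b))
dist_aa = frobenius_distance(correlation_matrix(subject_a),
                             correlation_matrix(subject_a))
```

In subject_a the two features move in opposite directions and in subject_b they move together, so the two correlation matrices differ only in their off-diagonal entries.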
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
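The difference between sequential first differences and differences from a baseline can be illustrated with a short pure-Python sketch. It assumes one sample per subject and state (no replicates) and a hypothetical tuple layout for the input; it is not the plugin's code, only a small model of the labeling rule described above (each difference is keyed by the SampleID of the later state).

```python
def first_differences(samples, baseline=None):
    """Return {sample_id: difference} for each subject's ordered states.

    samples: iterable of (sample_id, individual_id, state, value) tuples.
    Assumes at most one sample per subject and state (no replicates).
    """
    by_subject = {}
    for sample_id, subject, state, value in samples:
        by_subject.setdefault(subject, []).append((state, sample_id, value))

    differences = {}
    for rows in by_subject.values():
        rows.sort()  # order this subject's samples by state
        if baseline is None:
            # First differences: compare each state with the previous one.
            pairs = list(zip(rows, rows[1:]))
        else:
            # Static differences: compare every other state with the baseline.
            reference = [row for row in rows if row[0] == baseline]
            if not reference:
                continue  # this subject has no baseline sample
            pairs = [(reference[0], row) for row in rows if row[0] != baseline]
        for (_, _, earlier), (_, sample_id, later) in pairs:
            differences[sample_id] = later - earlier
    return differences


# Hypothetical metadata: (sample_id, individual_id, state, metric value).
samples = [
    ("s1", "A", 0, 5.0), ("s2", "A", 1, 7.0), ("s3", "A", 2, 4.0),
    ("s4", "B", 0, 1.0), ("s5", "B", 1, 1.5),
]
sequential = first_differences(samples)
from_baseline = first_differences(samples, baseline=0)
```

With no baseline, sample s3 is compared with s2 (the previous state); with baseline=0 it is compared with s1 instead.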
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
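As a rough illustration of how first distances are read out of a distance matrix, the sketch below looks up, for each subject, the distance between each sample and either the previous state or a fixed baseline state. The dictionary-of-pairs representation of the matrix and the sample tuples are assumptions made for this example, and replicates are assumed to be absent.

```python
def first_distances(distances, samples, baseline=None):
    """Return {sample_id: distance to the previous (or baseline) state}.

    distances: {frozenset({id_a, id_b}): distance} for pairs of samples.
    samples: iterable of (sample_id, individual_id, state) tuples.
    """
    by_subject = {}
    for sample_id, subject, state in samples:
        by_subject.setdefault(subject, []).append((state, sample_id))

    result = {}
    for rows in by_subject.values():
        rows.sort()  # order this subject's samples by state
        for index, (state, sample_id) in enumerate(rows):
            if baseline is None:
                if index == 0:
                    continue  # the first state has no predecessor
                reference_id = rows[index - 1][1]
            else:
                if state == baseline:
                    continue
                matches = [sid for st, sid in rows if st == baseline]
                if not matches:
                    continue  # this subject has no baseline sample
                reference_id = matches[0]
            result[sample_id] = distances[frozenset((reference_id, sample_id))]
    return result


# Hypothetical distances between three samples from one subject.
distances = {
    frozenset(("s1", "s2")): 0.2,
    frozenset(("s2", "s3")): 0.5,
    frozenset(("s1", "s3")): 0.6,
}
samples = [("s1", "A", 0), ("s2", "A", 1), ("s3", "A", 2)]

sequential = first_distances(distances, samples)
from_baseline = first_distances(distances, samples, baseline=0)
```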
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
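The pairing step and the three replicate_handling choices can be sketched as follows. This is an illustrative model only: the tuple layout, the function name, and the direction of the difference (state_2 minus state_1) are assumptions, and the statistical tests and boxplots are out of scope here.

```python
import random


def paired_differences(samples, state_1, state_2,
                       replicate_handling="error", seed=None):
    """Return {individual_id: value at state_2 minus value at state_1}.

    samples: iterable of (individual_id, state, value) tuples.
    """
    rng = random.Random(seed)
    by_key = {}
    for subject, state, value in samples:
        if state in (state_1, state_2):
            by_key.setdefault((subject, state), []).append(value)

    # Subjects with more than one sample at either state have replicates.
    replicated = {subj for (subj, _), vals in by_key.items() if len(vals) > 1}
    if replicated and replicate_handling == "error":
        raise ValueError(f"replicate samples detected for: {sorted(replicated)}")

    differences = {}
    for subject in sorted({subj for subj, _ in by_key}):
        if replicate_handling == "drop" and subject in replicated:
            # Discarding the replicated samples leaves this subject unpaired.
            continue
        before = by_key.get((subject, state_1))
        after = by_key.get((subject, state_2))
        if before is None or after is None:
            continue  # the subject lacks one of the two states
        # With "random", one representative is drawn from any replicates.
        differences[subject] = rng.choice(after) - rng.choice(before)
    return differences


# Hypothetical samples: (individual_id, state, metric value).
samples = [("A", "pre", 1.0), ("A", "post", 3.0),
           ("B", "pre", 2.0), ("B", "post", 2.5)]
replicated = samples + [("C", "pre", 4.0), ("C", "pre", 6.0), ("C", "post", 7.0)]

clean = paired_differences(samples, "pre", "post")
dropped = paired_differences(replicated, "pre", "post", replicate_handling="drop")
sampled = paired_differences(replicated, "pre", "post",
                             replicate_handling="random", seed=1)
```

Passing a seed makes the "random" choice reproducible in this sketch; whether and how the plugin seeds its own selection is not stated in this reference.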
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits a linear mixed effects model and draws line plots for each group column. A feature table artifact is an optional input: "metric" may be derived from either the feature table or the metadata.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
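To make the formula operators concrete, the sketch below expands the "*" shorthand into the explicit "+" and ":" terms it stands for. The helper names and the column names ("shannon", "month", "diet") are hypothetical and used only for illustration; the plugin itself hands the formula string to its modeling backend rather than expanding it this way.

```python
from itertools import combinations


def expand_star(terms):
    """Expand "a * b * ..." into main effects plus every interaction term."""
    expanded = []
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            expanded.append(":".join(combo))
    return expanded


def build_formula(metric, terms):
    """Join a dependent variable and its terms into an R-style formula."""
    return f"{metric} ~ " + " + ".join(terms)


# "shannon ~ month * diet" written out with explicit terms.
two_way = build_formula("shannon", expand_star(["month", "diet"]))

# Removing the interaction again mirrors the "-" operator.
main_effects_only = build_formula(
    "shannon", [t for t in expand_star(["month", "diet"]) if ":" not in t])
```

Writing the terms out explicitly can help when only some interactions are wanted, since "*" always brings in all of them.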
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual ID with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than a single observation per individual ID and cell of the specified model, setting this to True aggregates the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
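For orientation, the F statistic behind the simplest case (one categorical factor, no repeated measures) can be computed by hand as below. This is a textbook one-way ANOVA sketch with hypothetical data, not the plugin's code path, which is described above as implemented via statsmodels and which supports multi-term formulae.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of value lists."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    # Ratio of mean squares: (SSB / (k - 1)) / (SSW / (n - k)).
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical metric values for three sample groups.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```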
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv

from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
  metadata=metadata_md,
  state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
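How importance_threshold and feature_count interact can be sketched in a few lines. The function below is hypothetical and only mirrors the parameter descriptions above: features scoring below the threshold are excluded, then the top N are kept. The quartile calculation uses Python's statistics.quantiles with linear interpolation, which is an assumption and may differ from how the plugin computes quartiles.

```python
from statistics import quantiles


def filter_by_importance(importances, importance_threshold=None,
                         feature_count=100):
    """Return feature IDs that pass the filters, most important first.

    importances: {feature_id: importance score}.
    importance_threshold: a number, "q1"/"q2"/"q3", or None to disable.
    feature_count: a positive integer, or "all" to keep every feature.
    """
    quartile_names = ("q1", "q2", "q3")
    if importance_threshold in quartile_names:
        cuts = quantiles(importances.values(), n=4, method="inclusive")
        importance_threshold = cuts[quartile_names.index(importance_threshold)]

    # Exclude features scoring below the threshold (if one is set).
    kept = {feature: score for feature, score in importances.items()
            if importance_threshold is None or score >= importance_threshold}

    ranked = sorted(kept, key=kept.get, reverse=True)
    if feature_count != "all":
        ranked = ranked[:feature_count]
    return ranked


# Hypothetical importance scores for five features.
importances = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1, "f5": 0.0}

above_median = filter_by_importance(importances, importance_threshold="q2")
top_two = filter_by_importance(importances, feature_count=2)
everything = filter_by_importance(importances, feature_count="all")
```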
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
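The input here is a FeatureTable[Frequency] while the filtered_table output is a FeatureTable[RelativeFrequency], which suggests that counts are converted to per-sample proportions somewhere along the way. The sketch below shows that conversion on a nested dictionary; the data layout and function name are assumptions for illustration, and the exact point at which the pipeline normalizes is not stated in this reference.

```python
def to_relative_frequency(table):
    """Convert per-sample counts to proportions that sum to 1 per sample.

    table: {sample_id: {feature_id: count}}.
    Samples with a total count of zero are left as all zeros.
    """
    relative = {}
    for sample_id, counts in table.items():
        total = sum(counts.values())
        relative[sample_id] = {
            feature: (count / total if total else 0.0)
            for feature, count in counts.items()
        }
    return relative


# Hypothetical count table with two samples and two features.
counts = {"s1": {"f1": 3, "f2": 1}, "s2": {"f1": 0, "f2": 0}}
relative = to_relative_frequency(counts)
```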
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This visualization computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
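A z-score of the kind described for maturity-index can be sketched as follows: each sample's predicted value is compared with the median prediction of control samples at the same state and scaled by the controls' standard deviation at that state. This follows the general idea in the cited microbiota-for-age work as understood here; the choice of the sample standard deviation, the exact-state matching (no binning), and the tuple layout are assumptions, and the pipeline's actual calculation may differ.

```python
from statistics import median, stdev


def maz_scores(samples, control):
    """Return {sample_id: z-score relative to same-state control samples}.

    samples: iterable of (sample_id, group, state, predicted_value) tuples.
    control: the group label whose samples define the reference.
    """
    samples = list(samples)
    control_by_state = {}
    for _, group, state, predicted in samples:
        if group == control:
            control_by_state.setdefault(state, []).append(predicted)

    scores = {}
    for sample_id, _, state, predicted in samples:
        reference = control_by_state.get(state, [])
        if len(reference) < 2:
            continue  # a standard deviation needs at least two controls
        spread = stdev(reference)
        if spread == 0:
            continue  # avoid dividing by zero when controls are identical
        scores[sample_id] = (predicted - median(reference)) / spread
    return scores


# Hypothetical predictions: (sample_id, group, state, predicted state).
samples = [
    ("c1", "control", 1, 1.0), ("c2", "control", 1, 2.0),
    ("c3", "control", 1, 3.0), ("t1", "treated", 1, 4.0),
    ("t2", "treated", 2, 5.0),
]
scores = maz_scores(samples, control="control")
```

Sample t2 receives no score in this sketch because no control samples exist at its state.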
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str
%
Choices
('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default:
'kendall'
]- dist_method:
Str
%
Choices
('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default:
'fro'
]
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
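The labeling rule and the effect of "baseline" described above can be illustrated with a short pure-Python sketch. It is only an illustration under stated assumptions, not the plugin's code: the function name, the tuple layout, the sign convention (later state minus earlier state), and the choice to skip subjects that lack a baseline sample are all hypothetical.

```python
from collections import defaultdict

def first_differences(samples, baseline=None):
    """samples: (sample_id, individual_id, state, value) tuples without replicates.
    Returns {sample_id: difference}, keyed by the sample ID of the later state."""
    by_subject = defaultdict(list)
    for sample_id, individual, state, value in samples:
        by_subject[individual].append((state, sample_id, value))

    result = {}
    for records in by_subject.values():
        records.sort()  # order each subject's samples by state
        if baseline is None:
            # First differences: compare each state with the previous state.
            for (_, _, previous), (_, sample_id, current) in zip(records, records[1:]):
                result[sample_id] = current - previous
        else:
            # Static differences: compare every other state with the baseline state.
            base_values = [value for state, _, value in records if state == baseline]
            if not base_values:
                continue  # assumption: subjects without a baseline sample are skipped
            for state, sample_id, value in records:
                if state != baseline:
                    result[sample_id] = value - base_values[0]
    return result

# Hypothetical toy data: subject A sampled at states 0, 1, 2; subject B at 0, 1.
samples = [
    ("s1", "A", 0, 1.0), ("s2", "A", 1, 3.0), ("s3", "A", 2, 2.0),
    ("s4", "B", 0, 5.0), ("s5", "B", 1, 4.0),
]
print(first_differences(samples))
print(first_differences(samples, baseline=0))
```

Note that the state-0 samples never appear as keys: each difference is labeled by the second sample of its pair.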
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
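As a rough illustration of how first distances differ from first differences, the sketch below looks up the distance between successive samples of each subject in a precomputed distance matrix instead of subtracting metric values. It is a hypothetical pure-Python sketch: the function name, the dictionary-of-pairs representation of the distance matrix, and the toy values are assumptions, not the plugin's data structures.

```python
from collections import defaultdict

def first_distances(distances, samples, baseline=None):
    """distances: {frozenset({id_a, id_b}): distance} for pairs of samples.
    samples: (sample_id, individual_id, state) tuples without replicates.
    Returns {sample_id: distance}, keyed by the sample ID of the later state."""
    by_subject = defaultdict(list)
    for sample_id, individual, state in samples:
        by_subject[individual].append((state, sample_id))

    result = {}
    for records in by_subject.values():
        records.sort()  # order each subject's samples by state
        if baseline is None:
            # Distance between each sample and the same subject's previous sample.
            pairs = [(prev_id, cur_id)
                     for (_, prev_id), (_, cur_id) in zip(records, records[1:])]
        else:
            # Distance between each sample and the same subject's baseline sample.
            base_ids = [sample_id for state, sample_id in records if state == baseline]
            pairs = [(base_ids[0], sample_id)
                     for state, sample_id in records
                     if base_ids and state != baseline]
        for reference_id, sample_id in pairs:
            result[sample_id] = distances[frozenset((reference_id, sample_id))]
    return result

# Hypothetical toy data: one subject sampled at three states.
toy_distances = {
    frozenset(("s1", "s2")): 0.3,
    frozenset(("s2", "s3")): 0.5,
    frozenset(("s1", "s3")): 0.7,
}
toy_samples = [("s1", "A", 0), ("s2", "A", 1), ("s3", "A", 2)]
print(first_distances(toy_distances, toy_samples))
print(first_distances(toy_distances, toy_samples, baseline=0))
```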
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, whose default ("error") causes the action to fail. Set replicate_handling="drop" to discard replicated samples, or replicate_handling="random" to randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
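The three replicate_handling choices used by this and other actions can be pictured with the following pure-Python sketch. It is a hypothetical illustration rather than the plugin's code: the function name, the tuple layout, and the `seed` argument are assumptions introduced here.

```python
import random
from collections import defaultdict

def handle_replicates(samples, mode="error", seed=None):
    """samples: (sample_id, individual_id, state) tuples.
    Replicates are samples sharing the same individual ID and state."""
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample[1], sample[2])].append(sample)

    rng = random.Random(seed)  # hypothetical seed, for reproducible "random" picks
    kept = []
    for key, members in groups.items():
        if len(members) == 1:
            kept.extend(members)              # no replicates: keep the sample
        elif mode == "error":
            raise ValueError(f"Replicate samples detected for {key}")
        elif mode == "drop":
            continue                          # discard all replicated samples
        elif mode == "random":
            kept.append(rng.choice(members))  # keep one representative at random
        else:
            raise ValueError(f"Unknown mode: {mode}")
    return kept

# Hypothetical toy data: subject A has two samples at state 0.
replicated = [("s1", "A", 0), ("s1b", "A", 0), ("s2", "A", 1)]
print(handle_replicates(replicated, mode="drop"))
print(handle_replicates(replicated, mode="random", seed=0))
```

With "drop", subject A loses its state-0 samples entirely, so it can no longer contribute a pair across states 0 and 1. "random" keeps the pair at the cost of an arbitrary choice.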
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, they are handled according to replicate_handling, whose default ("error") causes the action to fail. Set replicate_handling="drop" to discard replicated samples, or replicate_handling="random" to randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits an LME model and draws line plots for each group column. The feature table is an optional input: "metric" may be derived from either the feature table or the metadata.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: currently, only fully balanced within-subject designs are supported, and calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual IDs with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than a single observation per individual ID and cell of the specified model, enabling this option aggregates the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (including metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'

qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
    metadata=metadata_md,
    state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
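The interaction between importance_threshold and feature_count can be sketched in pure Python as follows. This is an illustrative sketch, not the plugin's code: the function name is hypothetical, and the quartile interpolation (Python's "inclusive" method) and the order in which the two filters are applied are assumptions.

```python
from statistics import quantiles

def filter_by_importance(importances, importance_threshold=None, feature_count=100):
    """importances: {feature_id: importance score}.
    Returns the retained feature IDs, most important first."""
    if isinstance(importance_threshold, str):
        # "q1", "q2", "q3" select the first, second, or third quartile of the scores.
        q1, q2, q3 = quantiles(importances.values(), n=4, method="inclusive")
        importance_threshold = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]

    # Exclude features scoring below the threshold (None disables this filter).
    kept = {feature: score for feature, score in importances.items()
            if importance_threshold is None or score >= importance_threshold}

    # Keep the top N features by importance ("all" keeps every remaining feature).
    ranked = sorted(kept, key=kept.get, reverse=True)
    return ranked if feature_count == "all" else ranked[:feature_count]

# Hypothetical importance scores for five features.
toy_importances = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1, "f5": 0.0}
print(filter_by_importance(toy_importances, "q3", "all"))
print(filter_by_importance(toy_importances, None, 3))
```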
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
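The z-score idea behind the maz_scores output can be sketched as follows, based on the definition in Subramanian et al., 2014: a sample's predicted value is compared with the median prediction for control samples at the same state, scaled by the standard deviation of those control predictions. This is a hedged pure-Python sketch, not the pipeline's implementation. The function name, the tuple layout, the exact-match grouping by state, and the handling of states with too few controls are all assumptions.

```python
from collections import defaultdict
from statistics import median, stdev

def maz_scores(samples, control):
    """samples: (sample_id, group, state, predicted_state) tuples.
    Returns {sample_id: z-score relative to control predictions at the same state}."""
    control_predictions = defaultdict(list)
    for _, group, state, prediction in samples:
        if group == control:
            control_predictions[state].append(prediction)

    scores = {}
    for sample_id, _, state, prediction in samples:
        reference = control_predictions.get(state, [])
        if len(reference) < 2:
            continue  # assumption: skip states without enough controls for an SD
        spread = stdev(reference)
        if spread == 0:
            continue  # assumption: skip states where control predictions are identical
        scores[sample_id] = (prediction - median(reference)) / spread
    return scores

# Hypothetical predictions: three control samples and one treated sample at state 1.
predicted = [
    ("c1", "control", 1, 1.0), ("c2", "control", 1, 2.0), ("c3", "control", 1, 3.0),
    ("t1", "treated", 1, 4.0),
]
print(maz_scores(predicted, control="control"))
```

A positive score means the sample is predicted to be "older" (further along the gradient) than controls at the same state, and a negative score means it lags behind them.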
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str
%
Choices
('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default:
'kendall'
]- dist_method:
Str
%
Choices
('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default:
'fro'
]
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
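The difference between sequential and baseline comparison can be illustrated with a small standard-library sketch. It is not the plugin's implementation: the function name `first_distances`, the tuple layout of `samples`, and the dictionary-based distance lookup are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def first_distances(
    distances: Dict[Tuple[str, str], float],
    samples: List[Tuple[str, str, float]],
    baseline: Optional[float] = None,
) -> Dict[str, float]:
    """Return {sample_id: distance}, labeled by the later sample's ID.

    distances: lookup keyed by an unordered pair of sample IDs (hypothetical layout).
    samples:   (sample_id, individual_id, state) rows, one per sample.
    baseline:  if given, compare every other state against this state
               instead of against the previous state.
    """
    def dist(a: str, b: str) -> float:
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    # Group samples by subject so each subject is handled independently.
    by_subject: Dict[str, List[Tuple[float, str]]] = {}
    for sample_id, subject, state in samples:
        by_subject.setdefault(subject, []).append((state, sample_id))

    result: Dict[str, float] = {}
    for rows in by_subject.values():
        rows.sort()  # order each subject's samples by state
        if baseline is None:
            # First distances: each state versus the state just before it.
            for (_, prev_id), (_, cur_id) in zip(rows, rows[1:]):
                result[cur_id] = dist(prev_id, cur_id)
        else:
            # Static distances: each state versus the baseline state.
            base_ids = [sid for state, sid in rows if state == baseline]
            if not base_ids:
                continue  # this subject has no baseline sample
            for state, sid in rows:
                if state != baseline:
                    result[sid] = dist(base_ids[0], sid)
    return result


samples = [("s1", "A", 0.0), ("s2", "A", 1.0), ("s3", "A", 2.0)]
distances = {("s1", "s2"): 0.2, ("s2", "s3"): 0.5, ("s1", "s3"): 0.6}
sequential = first_distances(distances, samples)
from_baseline = first_distances(distances, samples, baseline=0.0)
```

In this toy data, only the value for `s3` changes between the two modes, because `s2` is compared against `s1` in both cases.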
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
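The interaction between pairing and the three replicate_handling choices can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's code: the helper names, the nested-dictionary input layout, and the sign convention (state_2 minus state_1) are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def resolve_replicates(values: List[float], handling: str = "error",
                       rng: Optional[random.Random] = None) -> Optional[float]:
    """Collapse the values one subject has at one state to a single value."""
    if not values:
        return None  # subject was not sampled at this state
    if len(values) == 1:
        return values[0]
    if handling == "error":
        raise ValueError("replicate samples detected")
    if handling == "drop":
        return None  # discard all replicated samples
    if handling == "random":
        return (rng or random.Random()).choice(values)
    raise ValueError(f"unknown replicate_handling: {handling!r}")


def paired_differences(observations: Dict[str, Dict[str, List[float]]],
                       state_1: str, state_2: str, handling: str = "error",
                       rng: Optional[random.Random] = None) -> Dict[str, float]:
    """Return {subject: value at state_2 - value at state_1} (assumed sign)."""
    diffs: Dict[str, float] = {}
    for subject, by_state in observations.items():
        before = resolve_replicates(by_state.get(state_1, []), handling, rng)
        after = resolve_replicates(by_state.get(state_2, []), handling, rng)
        if before is None or after is None:
            continue  # an unpaired subject contributes no difference
        diffs[subject] = after - before
    return diffs


# Subject B has two replicate measurements at the "pre" state.
observations = {
    "A": {"pre": [1.0], "post": [3.0]},
    "B": {"pre": [2.0, 2.5], "post": [2.0]},
}
dropped = paired_differences(observations, "pre", "post", handling="drop")
sampled = paired_differences(observations, "pre", "post", handling="random",
                             rng=random.Random(0))
```

With "drop", subject B is left without a pair and is omitted; with "random", one of B's two replicates is chosen, so B reappears in the result.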
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits a linear mixed effects (LME) model and draws line plots for each group column. A feature table is an optional input: "metric" may be taken from either the feature table or the metadata.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
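The "*" operator described for the formula parameter is shorthand for main effects plus all of their interactions. The sketch below spells out that expansion as a string; `expand_star` and `build_formula` are hypothetical helpers, and the column names `shannon`, `month`, and `diet` are placeholders rather than columns the plugin expects.

```python
from itertools import combinations
from typing import List


def expand_star(variables: List[str]) -> List[str]:
    """Expand 'a * b * ...' into main effects and every interaction term."""
    terms: List[str] = []
    for order in range(1, len(variables) + 1):
        # order 1 gives main effects, order 2 two-way interactions, and so on.
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


def build_formula(metric: str, variables: List[str]) -> str:
    """Write 'metric ~ a * b' in its fully expanded form."""
    return f"{metric} ~ {' + '.join(expand_star(variables))}"


formula = build_formula("shannon", ["month", "diet"])
print(formula)
```

Writing the expanded form makes it easier to see which term to remove with "-", for example dropping `month:diet` to fit main effects only.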
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual ID with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than a single observation per individual ID and cell of the specified model, aggregate the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from urllib import request

from qiime2 import Metadata
import qiime2.plugins.longitudinal.actions as longitudinal_actions

# Download the example sample metadata and load it.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
metadata_md = Metadata.load(fn)

# Build the volatility chart with 'month' as the state (time) column.
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
  metadata=metadata_md,
  state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
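How importance_threshold and feature_count might combine can be sketched with the standard library. This is only an illustration: `select_features` is a hypothetical helper, the order of the two filters is an assumption, and the quartile method (`statistics.quantiles` with `method="inclusive"`) may not match how the plugin computes quartiles.

```python
import statistics
from typing import Dict, List, Union


def select_features(importances: Dict[str, float],
                    importance_threshold: Union[float, str, None] = None,
                    feature_count: Union[int, str] = 100) -> List[str]:
    """Return feature IDs that pass both filters, most important first."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

    if importance_threshold is not None:
        if isinstance(importance_threshold, str):
            # Translate 'q1', 'q2', or 'q3' into a numeric cutoff.
            q1, q2, q3 = statistics.quantiles(
                importances.values(), n=4, method="inclusive")
            cutoff = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]
        else:
            cutoff = importance_threshold
        # Exclude features scoring below the cutoff.
        ranked = [(fid, score) for fid, score in ranked if score >= cutoff]

    if feature_count != "all":
        ranked = ranked[:feature_count]  # keep only the top N
    return [fid for fid, _ in ranked]


importances = {"f1": 0.5, "f2": 0.25, "f3": 0.125, "f4": 0.0625, "f5": 0.0}
top_quartile = select_features(importances, "q3", "all")
top_three = select_features(importances, None, 3)
```

Applying the threshold before the count means feature_count acts as an upper bound: fewer than N features are returned when the threshold is strict.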
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
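The z-score idea behind the maz_scores output can be sketched as follows. This follows one common formulation of the microbiota-for-age z-score (predicted value minus the control-group median at the same state, divided by the control-group standard deviation at that state); the plugin's exact computation may differ. The function name, the row layout, and the use of the sample standard deviation are assumptions.

```python
import statistics
from typing import Dict, List, Tuple


def maz_scores(rows: List[Tuple[str, str, float, float]],
               control: str) -> Dict[str, float]:
    """rows: (sample_id, group, state, predicted_state) for every sample."""
    # Collect the control group's predictions at each observed state.
    control_by_state: Dict[float, List[float]] = {}
    for _, group, state, predicted in rows:
        if group == control:
            control_by_state.setdefault(state, []).append(predicted)

    scores: Dict[str, float] = {}
    for sample_id, _, state, predicted in rows:
        reference = control_by_state.get(state, [])
        if len(reference) < 2:
            continue  # a standard deviation needs at least two controls
        sd = statistics.stdev(reference)  # sample SD (an assumption)
        if sd == 0:
            continue  # avoid dividing by zero when controls are identical
        scores[sample_id] = (predicted - statistics.median(reference)) / sd
    return scores


rows = [
    ("c1", "control", 6.0, 5.0),
    ("c2", "control", 6.0, 6.0),
    ("c3", "control", 6.0, 7.0),
    ("t1", "treated", 6.0, 4.0),
]
scores = maz_scores(rows, control="control")
```

A negative score, as for the treated sample here, indicates a prediction below what the control group shows at the same state.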
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str
%
Choices
('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default:
'kendall'
]- dist_method:
Str
%
Choices
('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default:
'fro'
]
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default:
False
]
- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
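The pairing step and the three "replicate_handling" choices can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's code: the `paired_differences` helper, its `seed` argument, and the toy records are hypothetical.

```python
import random
from collections import defaultdict


def paired_differences(records, state_1, state_2,
                       replicate_handling="error", seed=0):
    """Return {subject_id: value at state_2 - value at state_1}.

    records: list of (subject_id, state, value) tuples.
    """
    if replicate_handling not in ("error", "random", "drop"):
        raise ValueError("unknown replicate_handling value")
    rng = random.Random(seed)
    values = defaultdict(lambda: defaultdict(list))
    for subject, state, value in records:
        values[subject][state].append(value)

    diffs = {}
    for subject, by_state in values.items():
        chosen = {}
        for state in (state_1, state_2):
            reps = by_state.get(state, [])
            if len(reps) > 1:
                if replicate_handling == "error":
                    raise ValueError(f"replicates for {subject!r} at {state!r}")
                if replicate_handling == "drop":
                    reps = []  # discard all replicated samples
                else:  # "random": keep one representative
                    reps = [rng.choice(reps)]
            if reps:
                chosen[state] = reps[0]
        if len(chosen) == 2:  # a usable sample exists at both states
            diffs[subject] = chosen[state_2] - chosen[state_1]
    return diffs


# Hypothetical toy data: subject B has two replicates at "post".
records = [
    ("A", "pre", 1.0), ("A", "post", 3.0),
    ("B", "pre", 2.0), ("B", "post", 2.5), ("B", "post", 4.0),
]
dropped = paired_differences(records, "pre", "post", "drop")
sampled = paired_differences(records, "pre", "post", "random")
```

In this toy data, "drop" leaves only subject A, while "random" keeps subject B with one of its two "post" values.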
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default:
False
]
- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects (LME) models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits an LME model and draws line plots for each group column. The "metric" may be derived from either the feature table or the sample metadata; the feature table input itself is optional.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default:
False
]
- ci:
Float
%
Range
(0, 100)
Size of the confidence interval for the regression estimate.[default:
95
]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
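The formula syntax described for the "formula" parameter can be made concrete with a short Python sketch that assembles formula strings from column names. The `build_formula` helper and the column names (`shannon`, `month`, `diet`) are hypothetical and only illustrate the "a ~ b + c" and "a:b" notation; they are not part of the plugin.

```python
def build_formula(metric, covariates, interactions=()):
    """Compose an R-style formula string such as "a ~ b + c + b:c".

    metric:       name of the dependent variable column
    covariates:   names of the independent covariate columns
    interactions: pairs of column names to add as "left:right" terms
    """
    if not covariates:
        raise ValueError("at least one covariate is required")
    terms = list(covariates)
    terms += [f"{left}:{right}" for left, right in interactions]
    return f"{metric} ~ " + " + ".join(terms)


# Hypothetical column names, for illustration only.
additive = build_formula("shannon", ["month", "diet"])
with_interaction = build_formula(
    "shannon", ["month", "diet"], interactions=[("month", "diet")]
)
```

Here `additive` is "shannon ~ month + diet" and `with_interaction` is "shannon ~ month + diet + month:diet"; in patsy's formula language the latter is equivalent to the shorthand "shannon ~ month * diet". Because the string contains spaces and operator characters, it should be quoted when passed on a command line, as the parameter description advises.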
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str
%
Choices
('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default:
'II'
]
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default:
False
]
- individual_id_column:
Str
The column containing individual ID with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than one observation per individual ID and cell of the specified model, aggregate those observations by their mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default:
False
]
Outputs¶
- visualization:
Visualization
<no description>[required]
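What aggregating by the mean amounts to for the "rm_aggregate" option can be sketched in a few lines of Python. This is only an illustration of the idea, not statsmodels or plugin code; the `aggregate_by_mean` helper and the toy observations are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_mean(observations):
    """Collapse repeated observations to one mean per (individual, cell).

    observations: list of (individual_id, cell, value) tuples, where
    "cell" stands for a combination of within-subject factor levels.
    """
    grouped = defaultdict(list)
    for individual, cell, value in observations:
        grouped[(individual, cell)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Hypothetical toy data: subject A has two observations in cell "week1".
observations = [
    ("A", "week1", 2.0),
    ("A", "week1", 4.0),
    ("A", "week2", 5.0),
    ("B", "week1", 1.0),
    ("B", "week2", 3.0),
]
aggregated = aggregate_by_mean(observations)
```

After aggregation each individual contributes exactly one value per cell, which is the balanced layout a repeated measures ANOVA expects.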
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str
%
Choices
('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default:
'linear'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
# Download the example metadata file.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
# Load the metadata and generate the volatility visualization.
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
  metadata=metadata_md,
  state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str
%
Choices
('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default:
'linear'
]
- importance_threshold:
Float
%
Range
(0, None, inclusive_start=False)
|
Str
%
Choices
('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int
%
Range
(1, None)
|
Str
%
Choices
('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default:
100
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
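The combined effect of "importance_threshold" and "feature_count" can be sketched as a small filter over importance scores. This is a sketch under stated assumptions rather than the plugin's implementation: the `select_features` helper and the toy scores are hypothetical, and the quartile interpolation (`method="inclusive"`) is an assumption that may differ from what the plugin computes.

```python
from statistics import quantiles


def select_features(importances, importance_threshold=None, feature_count=100):
    """Return feature ids, most important first, after both filters.

    importances:          dict mapping feature_id -> importance score
    importance_threshold: a number, one of "q1"/"q2"/"q3", or None
    feature_count:        keep the top N features, or "all"
    """
    scores = dict(importances)
    if isinstance(importance_threshold, str):
        # "q1", "q2", "q3" -> first, second, third quartile of the scores
        q1, q2, q3 = quantiles(scores.values(), n=4, method="inclusive")
        importance_threshold = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]
    if importance_threshold is not None:
        # exclude features scoring less than the threshold
        scores = {f: s for f, s in scores.items() if s >= importance_threshold}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked if feature_count == "all" else ranked[:feature_count]


# Hypothetical importance scores.
importances = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1}
above_median = select_features(importances, "q2", "all")
top_three = select_features(importances, None, 3)
```

In this toy example the "q2" threshold keeps the two features scoring at or above the median, while `feature_count=3` with no threshold keeps the three highest-scoring features.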
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int
%
Range
(1, None)
Number of k-fold cross-validations to perform.[default:
5
]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default:
1
]
- n_estimators:
Int
%
Range
(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default:
100
]
- estimator:
Str
%
Choices
('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default:
'RandomForestRegressor'
]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default:
False
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
- importance_threshold:
Float
%
Range
(0, None, inclusive_start=False)
|
Str
%
Choices
('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int
%
Range
(1, None)
|
Str
%
Choices
('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default:
100
]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between groups, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str
%
Choices
('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default:
'RandomForestRegressor'
]
- n_estimators:
Int
%
Range
(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default:
100
]
- test_size:
Float
%
Range
(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default:
0.5
]
- step:
Float
%
Range
(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default:
0.05
]
- cv:
Int
%
Range
(1, None)
Number of k-fold cross-validations to perform.[default:
5
]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default:
1
]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default:
False
]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default:
False
]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default:
False
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
- feature_count:
Int
%
Range
(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default:
50
]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
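The z-score idea behind the maz_scores output can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: it assumes a z-score of the form (predicted value minus the control-group median at the same state) divided by the control-group standard deviation at that state, and the `maz_scores` helper and toy records are hypothetical.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control):
    """Return {sample_id: z-score relative to the control group}.

    samples: list of (sample_id, group, state, predicted) tuples, where
             "predicted" is the model's predicted state for that sample.
    control: value of the group field that marks control samples.
    """
    control_by_state = defaultdict(list)
    for _, group, state, predicted in samples:
        if group == control:
            control_by_state[state].append(predicted)

    scores = {}
    for sample_id, _, state, predicted in samples:
        reference = control_by_state.get(state, [])
        if len(reference) < 2:
            continue  # need at least two control values for a deviation
        sd = stdev(reference)
        if sd == 0:
            continue  # z-score is undefined without control variation
        scores[sample_id] = (predicted - median(reference)) / sd
    return scores


# Hypothetical toy data: three controls and one treated sample at state 1.
samples = [
    ("c1", "control", 1, 1.0),
    ("c2", "control", 1, 2.0),
    ("c3", "control", 1, 3.0),
    ("t1", "treated", 1, 4.0),
]
scores = maz_scores(samples, control="control")
```

In this toy data the controls have a median of 2.0 and a standard deviation of 1.0, so the treated sample sits two control standard deviations above the control median.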
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str
%
Choices
('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default:
'kendall'
]- dist_method:
Str
%
Choices
('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default:
'fro'
]
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default:
False
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study, e.g., samples collected pre- and post-treatment; paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
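As a rough illustration of what "pairwise distance between each subject pair" means, the sketch below looks up, for each individual, the distance between its state_1 sample and its state_2 sample. The data layout (a dict keyed by frozenset pairs, and a metadata dict) and the function name `paired_distances` are hypothetical and chosen only to keep the example self-contained; replicates are assumed to have been resolved beforehand.

```python
def paired_distances(distances, metadata, state_1, state_2):
    """Return {individual_id: distance between its state_1 and state_2 samples}.

    `distances` maps frozenset({sample_a, sample_b}) -> distance.
    `metadata` maps sample_id -> (individual_id, state).
    Individuals lacking a sample at either state are skipped.
    """
    by_individual = {}
    for sample_id, (individual, state) in metadata.items():
        by_individual.setdefault(individual, {})[state] = sample_id

    result = {}
    for individual, states in by_individual.items():
        if state_1 in states and state_2 in states:
            pair = frozenset({states[state_1], states[state_2]})
            result[individual] = distances[pair]
    return result


metadata = {
    "s1": ("A", "pre"), "s2": ("A", "post"),
    "s3": ("B", "pre"), "s4": ("B", "post"),
    "s5": ("C", "pre"),  # no "post" sample, so C is skipped
}
distances = {
    frozenset({"s1", "s2"}): 0.25,
    frozenset({"s3", "s4"}): 0.5,
}
print(paired_distances(distances, metadata, "pre", "post"))
```

The resulting per-individual distances are the values that would then be compared between groups.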
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is an optional input; "metric" may be derived from either the feature table or the metadata.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
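The "a ~ b + c" formula layout and the comma-separated (no spaces) list convention used by "group_columns" and "random_effects" can be sketched with plain string handling. The helpers `split_columns` and `build_formula` below are hypothetical and exist only for illustration; they are not part of the plugin, and the column names in the usage lines are made up.

```python
def split_columns(value):
    """Parse a comma-separated list (without spaces) into column names."""
    if not value:
        return []
    return [name for name in value.split(",") if name]


def build_formula(metric, covariates, interactions=()):
    """Compose a formula string of the form "metric ~ a + b + a:b"."""
    terms = list(covariates) + [f"{a}:{b}" for a, b in interactions]
    if not terms:
        raise ValueError("At least one covariate is required")
    return f"{metric} ~ " + " + ".join(terms)


group_columns = split_columns("diet,delivery")
formula = build_formula("shannon", ["month"] + group_columns, [("month", "diet")])
print(formula)  # shannon ~ month + diet + delivery + month:diet
```

On the command line, the resulting string should be wrapped in quotes, as noted above, so that the shell does not interpret characters such as "~" or "*".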
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual ID with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than a single observation per individual id and cell of the specified model, this function will be used to aggregate the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
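The aggregation that "rm_aggregate" describes, collapsing multiple observations per individual and model cell to their mean, can be sketched as follows. The function name `aggregate_by_mean` and the tuple layout are assumptions for illustration; the plugin delegates this step to statsmodels rather than to code like this.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_mean(observations):
    """Average repeated observations per (individual, cell).

    `observations` is a list of (individual_id, cell, value) tuples, where
    `cell` identifies one combination of within-subject factor levels.
    """
    grouped = defaultdict(list)
    for individual, cell, value in observations:
        grouped[(individual, cell)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


observations = [
    ("A", "week1", 1.0),
    ("A", "week1", 3.0),  # second observation in the same cell
    ("A", "week2", 5.0),
]
print(aggregate_by_mean(observations))
```

After aggregation, each individual contributes exactly one value per cell, which is what a balanced repeated measures design requires.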
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
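The per-group averaging that the volatility chart performs ("metric values are averaged to compare across any categorical metadata column") amounts to a mean per group and state. The sketch below shows that calculation on toy data; the function name `group_means_by_state` and the row layout are hypothetical, and the interactive visualization itself is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def group_means_by_state(rows):
    """Average a metric per (group, state).

    `rows` is a list of (state, group, value) tuples; `state` is numeric.
    Returns {group: [(state, mean_value), ...]} with points sorted by state.
    """
    buckets = defaultdict(list)
    for state, group, value in rows:
        buckets[(group, state)].append(value)

    series = defaultdict(list)
    for (group, state), values in buckets.items():
        series[group].append((state, mean(values)))
    return {group: sorted(points) for group, points in series.items()}


rows = [
    (0, "control", 1.0), (0, "control", 3.0), (1, "control", 4.0),
    (0, "treated", 2.0), (1, "treated", 6.0),
]
print(group_means_by_state(rows))
```

Each group's list of (state, mean) points corresponds to one averaged line in the chart; the "spaghetti" lines would instead plot each individual's raw values.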
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
    metadata=metadata_md,
    state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
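The combined effect of "importance_threshold" and "feature_count" can be sketched as a two-step filter: drop features scoring below the threshold, then keep the top N. This is an illustration only. The function name `filter_importances` is hypothetical, and the quartile method used here (Python's "inclusive" interpolation) is an assumption that may differ from how the plugin computes quartiles.

```python
from statistics import quantiles


def filter_importances(importances, importance_threshold=None, feature_count=100):
    """Select features by importance score.

    `importances` maps feature_id -> importance score.
    `importance_threshold` is a number, "q1"/"q2"/"q3", or None (no filter).
    `feature_count` is a positive int or "all".
    """
    if isinstance(importance_threshold, str):
        # Quartile cut points; "inclusive" treats the scores as the full population.
        q1, q2, q3 = quantiles(importances.values(), n=4, method="inclusive")
        importance_threshold = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]

    kept = {
        feature: score
        for feature, score in importances.items()
        if importance_threshold is None or score >= importance_threshold
    }
    # Rank by descending importance, then keep the top N if requested.
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    if feature_count != "all":
        ranked = ranked[:feature_count]
    return dict(ranked)


importances = {"f1": 0.4, "f2": 0.3, "f3": 0.2, "f4": 0.1}
print(filter_importances(importances, "q2", "all"))  # keeps f1 and f2
```

Both filters apply together, so a strict threshold can leave fewer features than "feature_count" allows.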
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
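The two "missing_samples" behaviours can be sketched with set operations on sample IDs. The function name `reconcile_samples` is hypothetical, and the sketch assumes that "missing" means a sample present in the feature table but absent from the metadata; the plugin's own check may be defined differently.

```python
def reconcile_samples(table_ids, metadata_ids, missing_samples="error"):
    """Return the sorted sample IDs to keep, following `missing_samples`."""
    if missing_samples not in ("error", "ignore"):
        raise ValueError(f"Unknown missing_samples: {missing_samples!r}")

    table_ids, metadata_ids = set(table_ids), set(metadata_ids)
    # Assumption: a sample is "missing" if the table has it but the metadata does not.
    missing = table_ids - metadata_ids
    if missing and missing_samples == "error":
        raise ValueError(f"Missing samples in metadata: {sorted(missing)}")

    # "ignore": retain only samples found in both the table and the metadata.
    return sorted(table_ids & metadata_ids)


print(reconcile_samples(["s1", "s2", "s4"], ["s1", "s2", "s3"], "ignore"))
```

With "ignore", both inputs are reduced to the shared samples; with "error", the same inputs raise because s4 has no metadata.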
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
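The z-score idea behind the "maz_scores" output can be sketched as follows: each sample's predicted value is compared against the predictions for control samples at the same state. The exact formula is an assumption here (median of control predictions as the center and their sample standard deviation as the scale, in the spirit of doi:10.1038/nature13421); the function name `maz_scores` and the tuple layout are hypothetical, and the pipeline's own calculation may differ in detail.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(predictions, control):
    """Z-score each predicted value against control predictions at the same state.

    `predictions` is a list of (sample_id, group, state, predicted_value).
    Returns {sample_id: z-score}; samples whose state has fewer than two
    control predictions, or zero control spread, are omitted.
    """
    control_by_state = defaultdict(list)
    for _, group, state, predicted in predictions:
        if group == control:
            control_by_state[state].append(predicted)

    scores = {}
    for sample_id, _, state, predicted in predictions:
        reference = control_by_state.get(state, [])
        if len(reference) < 2:
            continue  # not enough control samples to estimate spread
        spread = stdev(reference)
        if spread == 0:
            continue  # avoid division by zero
        scores[sample_id] = (predicted - median(reference)) / spread
    return scores


predictions = [
    ("c1", "control", 1, 2.0),
    ("c2", "control", 1, 4.0),
    ("t1", "treated", 1, 5.0),
]
print(maz_scores(predictions, control="control"))
```

A positive score indicates a sample predicted to be "older" than control samples at the same state, and a negative score indicates the opposite.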
This QIIME 2 plugin supports methods for analysis of time series data, involving either paired sample comparisons or longitudinal study designs.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -longitudinal - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org - citations:
- Bokulich et al., 2018
Actions¶
Name | Type | Short Description |
---|---|---|
nmit | method | Nonparametric microbial interdependence test |
first-differences | method | Compute first differences or difference from baseline between sequential states |
first-distances | method | Compute first distances or distance from baseline between sequential states |
pairwise-differences | visualizer | Paired difference testing and boxplots |
pairwise-distances | visualizer | Paired pairwise distance testing and boxplots |
linear-mixed-effects | visualizer | Linear mixed effects modeling |
anova | visualizer | ANOVA test |
volatility | visualizer | Generate interactive volatility plot |
plot-feature-volatility | visualizer | Plot longitudinal feature volatility and importances |
feature-volatility | pipeline | Feature volatility analysis |
maturity-index | pipeline | Microbial maturity index prediction. |
Artifact Classes¶
SampleData[FirstDifferences] |
Formats¶
FirstDifferencesFormat |
FirstDifferencesDirectoryFormat |
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str
%
Choices
('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default:
'kendall'
]- dist_method:
Str
%
Choices
('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default:
'fro'
]
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default:
False
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study, e.g., samples collected pre- and post-treatment; paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default:
False
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default:
'Set1'
]- replicate_handling:
Str
%
Choices
('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default:
'error'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". Perform LME and plot line plots of each group column. A feature table artifact is required input, though whether "metric" is derived from the feature table or metadata is optional.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
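The formula syntax described for the "formula" parameter can be assembled programmatically. The sketch below is only an illustration of the "+" and ":" operators and of splitting a comma-separated (no spaces) parameter value; the helpers `build_formula` and `split_columns` and the column names (`shannon`, `month`, `delivery`, `diet`) are hypothetical, and this is not how the plugin itself combines its parameters.

```python
def split_columns(value):
    """Split a comma-separated parameter value (no spaces) into column names."""
    return [name for name in value.split(",") if name]


def build_formula(metric, covariates, interactions=()):
    """Assemble an R-style formula string such as 'a ~ b + c + b:c'."""
    if not metric or not covariates:
        raise ValueError("A metric and at least one covariate are required.")
    terms = list(covariates)
    # Each interaction is written as 'left:right', as in the text above.
    terms += [f"{left}:{right}" for left, right in interactions]
    return f"{metric} ~ {' + '.join(terms)}"


# Hypothetical column names, for illustration only.
group_columns = split_columns("delivery,diet")
additive = build_formula("shannon", ["month"] + group_columns)
with_interaction = build_formula(
    "shannon", ["month"] + group_columns, interactions=[("month", "diet")]
)
```

The resulting string is what would be supplied as the formula value, enclosed in quotes as the parameter description advises.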
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str % Choices('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default: 'II']
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: currently, only fully balanced within-subject designs are supported, and calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default: False]
- individual_id_column:
Str
The column containing individual ID with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
If the data set contains more than a single observation per individual ID and cell of the specified model, setting this to True aggregates the data by the mean before running the ANOVA. Only applicable for repeated measures ANOVA.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis and is selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. The state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
--m-metadata-file metadata.tsv \
--p-state-column month \
--o-visualization volatility-plot.qzv
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
metadata=metadata_md,
state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)

History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
metadata=metadata_md,
state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
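The averaging step behind the volatility chart (metric values averaged per group at each state) can be sketched in plain Python. The reference lines computed by `control_limits` follow a common control-chart convention (mean plus or minus a multiple of the standard deviation); whether the visualization draws exactly these lines is an assumption, and both function names and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_means_by_state(rows):
    """Average a metric per (group, state); rows are (state, group, value)."""
    buckets = defaultdict(list)
    for state, group, value in rows:
        buckets[(group, state)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def control_limits(values, n_sd=3):
    """Return (lower, center, upper) as the mean -/+ n_sd population SDs."""
    center = mean(values)
    spread = pstdev(values)
    return center - n_sd * spread, center, center + n_sd * spread


# Hypothetical observations: (state, group, metric value).
rows = [
    (0, "control", 2.0), (0, "control", 4.0),
    (0, "treated", 3.0), (1, "control", 6.0),
    (1, "treated", 5.0), (1, "treated", 7.0),
]
means = group_means_by_state(rows)
lower, center, upper = control_limits([value for _, _, value in rows], n_sd=2)
```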
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str % Choices('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default: 'linear']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default: 'RandomForestRegressor']
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- importance_threshold:
Float % Range(0, None, inclusive_start=False) | Str % Choices('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int % Range(1, None) | Str % Choices('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default: 100]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
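The "importance_threshold" and "feature_count" filters described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: the helper name `filter_importances` is hypothetical, the quartiles are computed with the standard library's inclusive method (the plugin may use a different quantile definition), and the threshold is applied before the top-N cut (the actual order is not stated on this page).

```python
from statistics import quantiles


def filter_importances(importances, importance_threshold=None, feature_count=100):
    """Return the retained {feature: importance}, most important first."""
    scores = dict(importances)
    if isinstance(importance_threshold, str):
        # Map 'q1'/'q2'/'q3' onto quartile cut points of the scores
        # (requires at least two scores).
        q1, q2, q3 = quantiles(scores.values(), n=4, method="inclusive")
        importance_threshold = {"q1": q1, "q2": q2, "q3": q3}[importance_threshold]
    if importance_threshold is not None:
        # Exclude features scoring less than the threshold.
        scores = {f: s for f, s in scores.items() if s >= importance_threshold}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if feature_count != "all":
        ranked = ranked[:feature_count]  # keep the top N features
    return dict(ranked)


# Hypothetical importance scores.
importances = {"f1": 0.40, "f2": 0.30, "f3": 0.15, "f4": 0.10, "f5": 0.05}
above_median = filter_importances(importances, "q2", "all")
top_two = filter_importances(importances, None, 2)
```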
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This visualization computes maturity index z-scores to compare relative "maturity" between each group, as described in doi:10.1038/nature13421. This method can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str % Choices('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default: 'RandomForestRegressor']
- n_estimators:
Int % Range(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default: 100]
- test_size:
Float % Range(0.0, 1.0)
Fraction of input samples to exclude from training set and use for classifier testing.[default: 0.5]
- step:
Float % Range(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the percentage of features to remove at each iteration.[default: 0.05]
- cv:
Int % Range(1, None)
Number of k-fold cross-validations to perform.[default: 5]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default: 1]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default: False]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default: False]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default: False]
- missing_samples:
Str % Choices('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default: 'error']
- feature_count:
Int % Range(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default: 50]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
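The z-score idea behind the "maz_scores" output can be sketched in a few lines. The sketch assumes that each sample's predicted value is compared with the median prediction of control samples at the same state and scaled by the control standard deviation at that state; this reading of the cited definition, the helper name `maz_scores`, and the data are assumptions, and the pipeline's own binning and edge-case handling may differ.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(predictions, control):
    """Compute z-scores of predicted values relative to the control group.

    `predictions` is a list of (sample_id, group, state, predicted_value).
    Returns {sample_id: z_score}; states with fewer than two control
    samples, or with zero control spread, are skipped.
    """
    control_by_state = defaultdict(list)
    for _, group, state, predicted in predictions:
        if group == control:
            control_by_state[state].append(predicted)

    scores = {}
    for sample_id, _, state, predicted in predictions:
        reference = control_by_state.get(state, [])
        if len(reference) < 2:
            continue
        spread = stdev(reference)
        if spread == 0:
            continue
        scores[sample_id] = (predicted - median(reference)) / spread
    return scores


# Hypothetical predictions: (sample_id, group, state, predicted value).
predictions = [
    ("c1", "control", 6, 5.0),
    ("c2", "control", 6, 6.0),
    ("c3", "control", 6, 7.0),
    ("t1", "treated", 6, 4.0),
    ("t2", "treated", 12, 9.0),  # no control samples at state 12
]
scores = maz_scores(predictions, control="control")
```

A negative score, as for `t1` here, indicates a predicted value below the control median at that state.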
longitudinal nmit¶
Perform nonparametric microbial interdependence test to determine longitudinal sample similarity as a function of temporal microbial composition. For more details and citation, please see doi.org/10.1002/gepi.22065
Citations¶
Bokulich et al., 2018; Zhang et al., 2017
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to use for microbial interdependence test.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- corr_method:
Str % Choices('kendall', 'pearson', 'spearman')
The temporal correlation test to be applied.[default: 'kendall']
- dist_method:
Str % Choices('fro', 'nuc')
Temporal distance method, see numpy.linalg.norm for details.[default: 'fro']
Outputs¶
- distance_matrix:
DistanceMatrix
The resulting distance matrix.[required]
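The general shape of the test can be sketched with the standard library: each subject gets a feature-by-feature correlation matrix computed across its time points, and two subjects are compared by a matrix norm of the difference. The sketch uses Pearson correlation and the Frobenius norm (the 'pearson' and 'fro' choices above) for brevity; the function names, the taxa, and the choice to treat constant features as having zero correlation are assumptions, not the plugin's implementation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: undefined correlations are set to zero
    return cov / sqrt(var_x * var_y)


def correlation_matrix(series):
    """Feature-by-feature correlations for one subject.

    `series` maps feature name -> list of relative frequencies over time.
    """
    features = sorted(series)
    return [[pearson(series[a], series[b]) for b in features] for a in features]


def frobenius_distance(matrix_a, matrix_b):
    """Frobenius norm of the element-wise difference of two matrices."""
    return sqrt(sum(
        (a - b) ** 2
        for row_a, row_b in zip(matrix_a, matrix_b)
        for a, b in zip(row_a, row_b)
    ))


# Hypothetical subjects: the two taxa move in opposite directions in
# subject_1 and in the same direction in subject_2.
subject_1 = {"taxon_x": [0.1, 0.2, 0.3], "taxon_y": [0.3, 0.2, 0.1]}
subject_2 = {"taxon_x": [0.1, 0.2, 0.3], "taxon_y": [0.2, 0.4, 0.6]}
distance = frobenius_distance(
    correlation_matrix(subject_1), correlation_matrix(subject_2)
)
```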
longitudinal first-differences¶
Calculates first differences in "metric" between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. First differences can be performed on a metadata column (including artifacts that can be input as metadata) or a feature in a feature table. Outputs a data series of first differences for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired differences between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first differences across time or among groups of subjects. Also supports differences from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for computing first differences.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static differences instead of first differences (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample differences at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
Outputs¶
- first_differences:
SampleData[FirstDifferences]
Series of first differences.[required]
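The two modes described above (sequential first differences, and static differences from a "baseline" state) can be sketched as follows. The function name and record layout are hypothetical, replicates are assumed to have been resolved already, and the sketch is not the plugin's implementation. Each difference is labeled by the sample ID of the later state, as the description specifies.

```python
from collections import defaultdict


def first_differences(records, baseline=None):
    """Return {sample_id: difference} for each subject.

    `records` is a list of (sample_id, subject, state, value), with one
    sample per subject and state.
    """
    by_subject = defaultdict(list)
    for sample_id, subject, state, value in records:
        by_subject[subject].append((state, sample_id, value))

    differences = {}
    for observations in by_subject.values():
        observations.sort()  # order each subject's samples by state
        if baseline is None:
            # Sequential mode: compare each state with the previous one.
            pairs = zip(observations, observations[1:])
        else:
            # Static mode: compare every other state with the baseline state.
            reference = [obs for obs in observations if obs[0] == baseline]
            if not reference:
                continue  # subject has no sample at the baseline state
            pairs = [(reference[0], obs) for obs in observations
                     if obs[0] != baseline]
        for (_, _, earlier), (_, sample_id, later) in pairs:
            differences[sample_id] = later - earlier
    return differences


# Hypothetical records: (sample_id, subject, state, metric value).
records = [
    ("a0", "A", 0, 1.0), ("a1", "A", 1, 3.0), ("a2", "A", 2, 6.0),
    ("b0", "B", 0, 2.0), ("b2", "B", 2, 1.0),
]
sequential = first_differences(records)
from_baseline = first_differences(records, baseline=0)
```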
longitudinal first-distances¶
Calculates first distances between sequential states for samples collected from individual subjects sampled repeatedly at two or more states. This method is similar to the "first differences" method, except that it requires a distance matrix as input and calculates first differences as distances between successive states. Outputs a data series of first distances for each individual subject at each sequential pair of states, labeled by the SampleID of the second state (e.g., paired distances between time 0 and time 1 would be labeled by the SampleIDs at time 1). This file can be used as input to linear mixed effects models or other longitudinal or diversity methods to compare changes in first distances across time or among groups of subjects. Also supports distance from baseline (or other static comparison state) by setting the "baseline" parameter.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- baseline:
Float
A value listed in the state_column metadata column against which all other states should be compared. Toggles calculation of static distances instead of first distances (which are calculated if no value is given for baseline). If a "baseline" value is provided, sample distances at each state are compared against the baseline state, instead of the previous state. Must be a value listed in the state_column.[optional]
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- first_distances:
SampleData[FirstDifferences]
Series of first distances.[required]
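The same idea applies to distances: instead of subtracting metric values, the distance between a subject's samples at successive states (or between each state and the baseline) is looked up in the distance matrix. The sketch below uses a plain dictionary in place of a DistanceMatrix artifact; the function name, data layout, and values are hypothetical, and it is not the plugin's implementation.

```python
def first_distances(distance, samples_by_subject, baseline=None):
    """Return {sample_id: distance to the previous (or baseline) state}.

    `distance` maps frozenset({id_1, id_2}) -> distance.
    `samples_by_subject` maps subject -> {state: sample_id}.
    """
    result = {}
    for states in samples_by_subject.values():
        ordered = sorted(states)
        if baseline is None:
            # Sequential mode: pair each state with the one before it.
            pairs = list(zip(ordered, ordered[1:]))
        elif baseline in states:
            # Static mode: pair every other state with the baseline state.
            pairs = [(baseline, state) for state in ordered if state != baseline]
        else:
            pairs = []  # subject has no sample at the baseline state
        for reference_state, state in pairs:
            key = frozenset((states[reference_state], states[state]))
            # Label each distance by the sample ID of the later state.
            result[states[state]] = distance[key]
    return result


# Hypothetical distances between one subject's three samples.
distance = {
    frozenset(("a0", "a1")): 0.2,
    frozenset(("a1", "a2")): 0.5,
    frozenset(("a0", "a2")): 0.6,
}
samples_by_subject = {"A": {0: "a0", 1: "a1", 2: "a2"}}
sequential = first_distances(distance, samples_by_subject)
from_baseline = first_distances(distance, samples_by_subject, baseline=0)
```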
longitudinal pairwise-differences¶
Performs paired difference testing between samples from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving different treatments. This action tests whether the change in a numeric metadata value "metric" differs from zero and differs between groups (e.g., groups of subjects receiving different treatments), and produces boxplots of paired difference distributions for each group. Note that "metric" can be derived from a feature table or metadata.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table to optionally use for paired comparisons.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- metric:
Str
Numerical metadata or artifact column to test.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[optional]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal pairwise-distances¶
Performs pairwise distance testing between sample pairs from each subject. Sample pairs may represent a typical intervention study (e.g., samples collected pre- and post-treatment), paired samples from two different timepoints (e.g., in a longitudinal study design), or identical samples receiving two different treatments. This action tests whether the pairwise distance between each subject pair differs between groups (e.g., groups of subjects receiving different treatments) and produces boxplots of paired distance distributions for each group.
Citations¶
Inputs¶
- distance_matrix:
DistanceMatrix
Matrix of distances between pairs of samples.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- group_column:
Str
Metadata column on which to separate groups for comparison.[required]
- state_column:
Str
Metadata column containing state (e.g., Time) across which samples are paired.[required]
- state_1:
Str
Baseline state column value.[required]
- state_2:
Str
State column value to pair with baseline.[required]
- individual_id_column:
Str
Metadata column containing subject IDs to use for pairing samples. WARNING: if replicates exist for an individual ID at either state_1 or state_2, that subject will be dropped and reported in standard output by default. Set replicate_handling="random" to instead randomly select one member.[required]
- parametric:
Bool
Perform parametric (ANOVA and t-tests) or non-parametric (Kruskal-Wallis, Wilcoxon, and Mann-Whitney U tests) statistical tests.[default: False]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- replicate_handling:
Str % Choices('error', 'random', 'drop')
Choose how replicate samples are handled. If replicates are detected, "error" causes method to fail; "drop" will discard all replicated samples; "random" chooses one representative at random from among replicates.[default: 'error']
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal linear-mixed-effects¶
Linear mixed effects models evaluate the contribution of exogenous covariates "group_columns" and "random_effects" to a single dependent variable, "metric". This action fits a linear mixed effects (LME) model and draws line plots for each group column. The feature table input is optional: "metric" can be taken from either the feature table or the metadata.
Citations¶
Bokulich et al., 2018; Seabold & Perktold, 2010
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metric.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[required]
- metric:
Str
Dependent variable column name. Must be a column name located in the metadata or feature table files.[optional]
- group_columns:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine mean structure of "metric".[optional]
- random_effects:
Str
Comma-separated list (without spaces) of metadata columns to use as independent covariates used to determine the variance and covariance structure (random effects) of "metric". To add a random slope, the same value passed to "state_column" should be passed here. A random intercept for each individual is set by default and does not need to be passed here.[optional]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow', 'cividis')
Color palette to use for generating boxplots.[default: 'Set1']
- lowess:
Bool
Estimate locally weighted scatterplot smoothing. Note that this will eliminate confidence interval plotting.[default: False]
- ci:
Float % Range(0, 100)
Size of the confidence interval for the regression estimate.[default: 95]
- formula:
Str
R-style formula to use for model specification. A formula must be used if the "metric" parameter is None. Note that the metric and group columns specified in the formula will override metric and group columns that are passed separately as parameters to this method. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal anova¶
Perform an ANOVA test on any factors present in a metadata file and/or metadata-transformable artifacts. This is followed by pairwise t-tests to examine pairwise differences between categorical sample groups.
Citations¶
Parameters¶
- metadata:
Metadata
Sample metadata containing formula terms.[required]
- formula:
Str
R-style formula specifying the model. All terms must be present in the sample metadata or metadata-transformable artifacts and can be continuous or categorical metadata columns. Formulae will be in the format "a ~ b + c", where "a" is the metric (dependent variable) and "b" and "c" are independent covariates. Use "+" to add a variable; "+ a:b" to add an interaction between variables a and b; "*" to include a variable and all interactions; and "-" to subtract a particular term (e.g., an interaction term). See https://patsy.readthedocs.io/en/latest/formulas.html for full documentation of valid formula operators. Always enclose formulae in quotes to avoid unpleasant surprises.[required]
- sstype:
Str
%
Choices
('I', 'II', 'III')
Type of sum of squares calculation to perform (I, II, or III).[default:
'II'
]
- repeated_measures:
Bool
Perform ANOVA as a repeated measures ANOVA. Implemented via statsmodels, which has the following limitations: Currently, only fully balanced within-subject designs are supported. Calculation of between-subject effects and corrections for violation of sphericity are not yet implemented.[default:
False
]
- individual_id_column:
Str
The column containing the individual IDs with repeated measures to account for. This should not be included in the formula.[optional]
- rm_aggregate:
Bool
When enabled, if the data set contains more than one observation per individual ID and cell of the specified model, the observations are aggregated by their mean before running the ANOVA. Only applicable to repeated measures ANOVA.[default:
False
]
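The mean aggregation described for "rm_aggregate" can be sketched in plain Python. This is an illustrative assumption about what "aggregate by the mean" amounts to, not the plugin's or statsmodels' implementation; the helper `aggregate_by_mean` and the column names (`subject`, `month`, `shannon`) are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_mean(rows, individual_key, cell_keys, value_key):
    """Average repeated observations within each (individual, cell) pair.

    `rows` is a list of dicts. Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for row in rows:
        # A "cell" is one combination of the within-subject factor levels.
        cell = tuple(row[k] for k in cell_keys)
        groups[(row[individual_key], cell)].append(row[value_key])
    # One aggregated value per individual and model cell.
    return {key: mean(values) for key, values in groups.items()}


rows = [
    {"subject": "s1", "month": 1, "shannon": 2.0},
    {"subject": "s1", "month": 1, "shannon": 4.0},  # second observation, same cell
    {"subject": "s1", "month": 2, "shannon": 5.0},
    {"subject": "s2", "month": 1, "shannon": 1.0},
    {"subject": "s2", "month": 2, "shannon": 3.0},
]
aggregated = aggregate_by_mean(rows, "subject", ["month"], "shannon")
print(aggregated)
```

After aggregation every subject contributes exactly one value per cell, which is the balanced layout that the repeated measures option requires.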
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal volatility¶
Generate an interactive control chart depicting the longitudinal volatility of sample metadata and/or feature frequencies across time (as set using the "state_column" parameter). Any numeric metadata column (and metadata-transformable artifacts, e.g., alpha diversity results) can be plotted on the y-axis, and are selectable using the "metric_column" selector. Metric values are averaged to compare across any categorical metadata column using the "group_column" selector. Longitudinal volatility for individual subjects sampled over time is co-plotted as "spaghetti" plots if the "individual_id_column" parameter is used. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing metrics.[optional]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- default_metric:
Str
Numeric metadata or artifact column to test by default (all numeric metadata columns will be available in the visualization).[optional]
- yscale:
Str
%
Choices
('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default:
'linear'
]
Outputs¶
- visualization:
Visualization
<no description>[required]
Examples¶
longitudinal_volatility¶
wget -O 'metadata.tsv' \
  'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
qiime longitudinal volatility \
  --m-metadata-file metadata.tsv \
  --p-state-column month \
  --o-visualization volatility-plot.qzv
from qiime2 import Metadata
from urllib import request
import qiime2.plugins.longitudinal.actions as longitudinal_actions
# Download the example metadata file.
url = 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn = 'metadata.tsv'
request.urlretrieve(url, fn)
# Load the metadata and generate the volatility visualization.
metadata_md = Metadata.load(fn)
volatility_plot_viz, = longitudinal_actions.volatility(
    metadata=metadata_md,
    state_column='month',
)
- Using the Upload Data tool:
  - On the first tab (Regular), press the Paste/Fetch data button at the bottom.
    - Set "Name" (first text-field) to: metadata.tsv
    - In the larger text-area, copy-and-paste: https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv
    - ("Type", "Genome", and "Settings" can be ignored)
  - Press the Start button at the bottom.
- Using the qiime2 longitudinal volatility tool:
  - For "metadata":
    - Perform the following steps.
      - Leave as Metadata from TSV
      - Set "Metadata Source" to metadata.tsv
  - Set "state_column" to month
  - Press the Execute button.
- Once completed, for the new entry in your history, use the Edit button to set the name as follows:
  - (Renaming is optional, but it will make any subsequent steps easier to complete.)
History Name | "Name" to set (be sure to press [Save]) |
---|---|
#: qiime2 longitudinal volatility [...] : visualization.qzv | volatility-plot.qzv |
library(reticulate)
Metadata <- import("qiime2")$Metadata
longitudinal_actions <- import("qiime2.plugins.longitudinal.actions")
request <- import("urllib")$request
url <- 'https://amplicon-docs.qiime2.org/en/latest/data/examples/longitudinal/volatility/1/metadata.tsv'
fn <- 'metadata.tsv'
request$urlretrieve(url, fn)
metadata_md <- Metadata$load(fn)
action_results <- longitudinal_actions$volatility(
    metadata=metadata_md,
    state_column='month',
)
volatility_plot_viz <- action_results$visualization
from q2_longitudinal._examples import longitudinal_volatility
longitudinal_volatility(use)
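The per-group averaging that the volatility description mentions can be sketched without QIIME 2. The block below is only an approximation of the idea: the helpers `group_means_by_state` and `control_limits` and the column names are hypothetical, and drawing limits at the global mean plus or minus three standard deviations is a common control-chart convention assumed here, not something this page states about the visualizer.

```python
from collections import defaultdict
from statistics import mean, stdev


def group_means_by_state(rows, state_key, group_key, metric_key):
    """Return {group: [(state, mean metric), ...]} with points sorted by state."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row[group_key], row[state_key])].append(row[metric_key])
    lines = defaultdict(list)
    for (group, state), values in cells.items():
        lines[group].append((state, mean(values)))
    return {group: sorted(points) for group, points in lines.items()}


def control_limits(values, n_sd=3):
    """Global mean and mean +/- n_sd sample standard deviations (assumed convention)."""
    center = mean(values)
    spread = stdev(values)
    return center - n_sd * spread, center, center + n_sd * spread


rows = [
    {"month": 0, "diet": "a", "shannon": 2.0},
    {"month": 0, "diet": "a", "shannon": 4.0},
    {"month": 1, "diet": "a", "shannon": 5.0},
    {"month": 0, "diet": "b", "shannon": 1.0},
    {"month": 1, "diet": "b", "shannon": 3.0},
]
lines = group_means_by_state(rows, "month", "diet", "shannon")
lower, center, upper = control_limits([row["shannon"] for row in rows])
print(lines)
print(lower, center, upper)
```

Each entry in `lines` corresponds to one group's averaged trajectory across states; individual "spaghetti" lines would instead keep one point per subject and state.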
longitudinal plot-feature-volatility¶
Plots an interactive control chart of feature abundances (y-axis) in each sample across time (or state; x-axis). Feature importance scores and descriptive statistics for each feature are plotted in interactive bar charts below the control chart, facilitating exploration of longitudinal feature data. This visualization is intended for use with the feature-volatility pipeline; use that pipeline to access this visualization.
Citations¶
Inputs¶
- table:
FeatureTable[RelativeFrequency]
Feature table containing features found in importances.[required]
- importances:
FeatureData[Importance]
Feature importance scores.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing state (time) variable information.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- default_group_column:
Str
The default metadata column on which to separate groups for comparison (all categorical metadata columns will be available in the visualization).[optional]
- yscale:
Str
%
Choices
('linear', 'pow', 'sqrt', 'log')
y-axis scaling strategy to apply.[default:
'linear'
]
- importance_threshold:
Float
%
Range
(0, None, inclusive_start=False)
|
Str
%
Choices
('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int
%
Range
(1, None)
|
Str
%
Choices
('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default:
100
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
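The interplay of "importance_threshold" and "feature_count" can be sketched as a two-step filter. This is a hedged illustration, not the plugin's code: the helper `select_features` is hypothetical, the order of the two steps is an assumption, and the quartile cut points come from Python's `statistics.quantiles`, which may differ from the quantile method the plugin uses.

```python
from statistics import quantiles


def select_features(importances, importance_threshold=None, feature_count=100):
    """Filter a {feature_id: importance} mapping and return ranked feature IDs.

    Hypothetical helper for illustration only.
    """
    selected = dict(importances)
    if importance_threshold is not None:
        if importance_threshold in ("q1", "q2", "q3"):
            # Three cut points splitting the scores into quartiles.
            cuts = quantiles(list(selected.values()), n=4, method="inclusive")
            threshold = cuts[int(importance_threshold[1]) - 1]
        else:
            threshold = float(importance_threshold)
        # Exclude features scoring below the threshold.
        selected = {f: s for f, s in selected.items() if s >= threshold}
    # Rank by importance, highest first, then keep the top N.
    ranked = sorted(selected.items(), key=lambda item: item[1], reverse=True)
    if feature_count != "all":
        ranked = ranked[:feature_count]
    return [feature for feature, _ in ranked]


importances = {"f1": 0.1, "f2": 0.2, "f3": 0.3, "f4": 0.4, "f5": 0.5}
above_median = select_features(importances, "q2", "all")
top_two = select_features(importances, None, 2)
print(above_median)
print(top_two)
```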
Outputs¶
- visualization:
Visualization
<no description>[required]
longitudinal feature-volatility¶
Identify features that are predictive of a numeric metadata column, state_column (e.g., time), and plot their relative frequencies across states using interactive feature volatility plots. A supervised learning regressor is used to identify important features and assess their ability to predict sample states. state_column will typically be a measure of time, but any numeric metadata column can be used.
Citations¶
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
Sample metadata file containing individual_id_column.[required]
- state_column:
Str
Metadata column containing collection time (state) values for each sample. Must contain exclusively numeric values.[required]
- individual_id_column:
Str
Metadata column containing IDs for individual subjects.[optional]
- cv:
Int
%
Range
(1, None)
Number of k-fold cross-validations to perform.[default:
5
]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default:
1
]
- n_estimators:
Int
%
Range
(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default:
100
]
- estimator:
Str
%
Choices
('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Estimator method to use for sample prediction.[default:
'RandomForestRegressor'
]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default:
False
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
- importance_threshold:
Float
%
Range
(0, None, inclusive_start=False)
|
Str
%
Choices
('q1', 'q2', 'q3')
Filter feature table to exclude any features with an importance score less than this threshold. Set to "q1", "q2", or "q3" to select the first, second, or third quartile of values. Set to "None" to disable this filter.[optional]
- feature_count:
Int
%
Range
(1, None)
|
Str
%
Choices
('all')
Filter feature table to include top N most important features. Set to "all" to include all features.[default:
100
]
Outputs¶
- filtered_table:
FeatureTable[RelativeFrequency]
Feature table containing only important features.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- volatility_plot:
Visualization
Interactive volatility plot visualization.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- sample_estimator:
SampleEstimator[Regressor]
Trained sample regressor.[required]
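The two "missing_samples" choices can be sketched as a small alignment step. This is an illustration under stated assumptions rather than the plugin's code: the helper `align_samples` is hypothetical, and it assumes that a "missing" sample is one present in the feature table but absent from the metadata.

```python
def align_samples(table_ids, metadata_ids, missing_samples="error"):
    """Return the sorted sample IDs shared by the feature table and the metadata.

    Hypothetical helper mirroring the documented "error"/"ignore" choices.
    """
    if missing_samples not in ("error", "ignore"):
        raise ValueError(f"unknown option: {missing_samples!r}")
    table_ids, metadata_ids = set(table_ids), set(metadata_ids)
    # Assumption: "missing" means in the table but not in the metadata.
    missing = table_ids - metadata_ids
    if missing and missing_samples == "error":
        raise ValueError(f"samples missing from metadata: {sorted(missing)}")
    # "ignore": keep only samples found in both inputs.
    return sorted(table_ids & metadata_ids)


shared = align_samples(["s1", "s2", "s3"], ["s1", "s2", "s9"], "ignore")
print(shared)

try:
    align_samples(["s1", "s3"], ["s1"], "error")
    failed = False
except ValueError:
    failed = True
print(failed)
```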
longitudinal maturity-index¶
Calculates a "microbial maturity" index from a regression model trained on feature data to predict a given continuous metadata column, e.g., to predict age as a function of microbiota composition. The model is trained on a subset of control group samples, then predicts the column value for all samples. This pipeline computes maturity index z-scores to compare relative "maturity" among groups, as described in doi:10.1038/nature13421. It can be used to predict between-group differences in relative trajectory across any type of continuous metadata gradient, e.g., intestinal microbiome development by age, microbial succession during wine fermentation, or microbial community differences along environmental gradients, as a function of two or more different "treatment" groups.
Citations¶
Bokulich et al., 2018; Subramanian et al., 2014; Bokulich et al., 2018
Inputs¶
- table:
FeatureTable[Frequency]
Feature table containing all features that should be used for target prediction.[required]
Parameters¶
- metadata:
Metadata
<no description>[required]
- state_column:
Str
Numeric metadata column containing sampling time (state) data to use as prediction target.[required]
- group_by:
Str
Categorical metadata column to use for plotting and significance testing between main treatment groups.[required]
- control:
Str
Value of group_by to use as control group. The regression model will be trained using only control group data, and the maturity scores of other groups consequently will be assessed relative to this group.[required]
- individual_id_column:
Str
Optional metadata column containing IDs for individual subjects. Adds individual subject (spaghetti) vectors to volatility charts if a column name is provided.[optional]
- estimator:
Str
%
Choices
('RandomForestRegressor', 'ExtraTreesRegressor', 'GradientBoostingRegressor', 'AdaBoostRegressor[DecisionTree]', 'AdaBoostRegressor[ExtraTrees]', 'ElasticNet', 'Ridge', 'Lasso', 'KNeighborsRegressor', 'LinearSVR', 'SVR')
Regression model to use for prediction.[default:
'RandomForestRegressor'
]
- n_estimators:
Int
%
Range
(1, None)
Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting.[default:
100
]
- test_size:
Float
%
Range
(0.0, 1.0)
Fraction of input samples to exclude from the training set and use for model testing.[default:
0.5
]
- step:
Float
%
Range
(0.0, 1.0, inclusive_start=False)
If optimize_feature_selection is True, step is the fraction of features to remove at each iteration.[default:
0.05
]
- cv:
Int
%
Range
(1, None)
Number of k-fold cross-validations to perform.[default:
5
]
- random_state:
Int
Seed used by random number generator.[optional]
- n_jobs:
Threads
Number of jobs to run in parallel.[default:
1
]
- parameter_tuning:
Bool
Automatically tune hyperparameters using random grid search.[default:
False
]
- optimize_feature_selection:
Bool
Automatically optimize input feature selection using recursive feature elimination.[default:
False
]
- stratify:
Bool
Evenly stratify training and test data among metadata categories. If True, all values in column must match at least two samples.[default:
False
]
- missing_samples:
Str
%
Choices
('error', 'ignore')
How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained.[default:
'error'
]
- feature_count:
Int
%
Range
(0, None)
Filter feature table to include top N most important features. Set to zero to include all features.[default:
50
]
Outputs¶
- sample_estimator:
SampleEstimator[Regressor]
Trained sample estimator.[required]
- feature_importance:
FeatureData[Importance]
Importance of each input feature to model accuracy.[required]
- predictions:
SampleData[RegressorPredictions]
Predicted target values for each input sample.[required]
- model_summary:
Visualization
Summarized parameter and (if enabled) feature selection information for the trained estimator.[required]
- accuracy_results:
Visualization
Accuracy results visualization.[required]
- maz_scores:
SampleData[RegressorPredictions]
Microbiota-for-age z-score predictions.[required]
- clustermap:
Visualization
Heatmap of important feature abundance at each time point in each group.[required]
- volatility_plots:
Visualization
Interactive volatility plots of MAZ and maturity scores, target (column) predictions, and the sample metadata.[required]
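The control-referenced z-scores behind the "maz_scores" output can be sketched as follows. This is a minimal illustration of the general idea (each prediction compared with the control group's predictions at the same state), assuming a median center and a sample standard deviation; the helper `maz_scores` and the dictionary keys are hypothetical, and the plugin's exact computation may differ.

```python
from collections import defaultdict
from statistics import median, stdev


def maz_scores(samples, control):
    """Compute (maturity, MAZ) per sample relative to control samples.

    `samples` is a list of dicts with hypothetical keys "id", "group",
    "state", and "predicted". Every state must include control samples.
    """
    by_state = defaultdict(list)
    for s in samples:
        if s["group"] == control:
            by_state[s["state"]].append(s["predicted"])
    # Median and sample standard deviation of control predictions per state.
    reference = {
        state: (median(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for state, vals in by_state.items()
    }
    scores = {}
    for s in samples:
        center, spread = reference[s["state"]]
        maturity = s["predicted"] - center
        # Avoid dividing by zero when the controls do not vary.
        maz = maturity / spread if spread else 0.0
        scores[s["id"]] = (maturity, maz)
    return scores


samples = [
    {"id": "c1", "group": "control", "state": 1, "predicted": 1.0},
    {"id": "c2", "group": "control", "state": 1, "predicted": 2.0},
    {"id": "c3", "group": "control", "state": 1, "predicted": 3.0},
    {"id": "t1", "group": "treated", "state": 1, "predicted": 4.0},
]
scores = maz_scores(samples, control="control")
print(scores)
```

A positive score indicates a sample predicted to be further along the gradient than control samples at the same state, and a negative score indicates the opposite.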
- Available Distros
- 2024.10
- 2024.10/amplicon
- 2024.10/metagenome
- 2024.10/pathogenome
- 2024.5
- 2024.5/amplicon
- 2024.5/metagenome
- 2024.2
- 2024.2/amplicon
- 2023.9
- 2023.9/amplicon
- 2023.7
- 2023.7/core