This QIIME 2 plugin supports methods for assessing and controlling the quality of feature and sequence data.
- version: 2024.10.0
- website: https://github.com/qiime2/q2-quality-control
- user support: Please post to the QIIME 2 forum for help with this plugin: https://forum.qiime2.org
Actions¶
Name | Type | Short Description |
---|---|---|
exclude-seqs | method | Exclude sequences by alignment |
filter-reads | method | Filter demultiplexed sequences by alignment to reference database. |
bowtie2-build | method | Build bowtie2 index from reference sequences. |
decontam-identify | method | Identify contaminants |
decontam-remove | method | Remove contaminants |
evaluate-composition | visualizer | Evaluate expected vs. observed taxonomic composition of samples |
evaluate-seqs | visualizer | Compare query (observed) vs. reference (expected) sequences. |
evaluate-taxonomy | visualizer | Evaluate expected vs. observed taxonomic assignments |
decontam-score-viz | visualizer | Generate a histogram representation of the scores |
decontam-identify-batches | pipeline | Identify contaminants in Batch Mode |
Artifact Classes¶
FeatureData[DecontamScore] |
Formats¶
DecontamScoreFormat |
DecontamScoreDirFmt |
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str % Choices('blast', 'blastn-short') | Str % Choices('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default: 'blast']
- perc_identity:
Float % Range(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default: 0.97]
- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default: 0.97]
- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default: 1]
- left_justify:
Bool % Choices(False) | Bool
Reject match if the pairwise alignment begins with gaps[default: False]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
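The hit/miss decision described for exclude-seqs can be sketched in plain Python. This only illustrates the threshold logic and is not the plugin's implementation: the `Alignment` record and the `partition_hits` helper are hypothetical names, and in practice the alignment statistics come from BLAST or vsearch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alignment:
    """Hypothetical summary of a query's best alignment to the reference."""
    query_id: str
    identity: float        # fraction of identical positions, 0.0-1.0
    query_aligned: float   # fraction of the query covered by the alignment
    evalue: Optional[float] = None  # only meaningful for BLAST-style searches


def partition_hits(alignments, all_query_ids, perc_identity=0.97,
                   perc_query_aligned=0.97, evalue=None):
    """Split query IDs into hits and misses using the three thresholds."""
    hits = set()
    for aln in alignments:
        if aln.identity < perc_identity:
            continue  # identity lower than the threshold: reject
        if aln.query_aligned < perc_query_aligned:
            continue  # too little of the query aligned: reject
        if evalue is not None and aln.evalue is not None and aln.evalue > evalue:
            continue  # E value higher than the threshold: reject
        hits.add(aln.query_id)
    # Queries with no accepted alignment are misses.
    misses = set(all_query_ids) - hits
    return hits, misses


alignments = [
    Alignment("seq1", identity=0.99, query_aligned=1.00),
    Alignment("seq2", identity=0.90, query_aligned=1.00),
    Alignment("seq3", identity=0.99, query_aligned=0.50),
]
hits, misses = partition_hits(alignments, ["seq1", "seq2", "seq3", "seq4"])
```

Passing `perc_identity=0.0` and `perc_query_aligned=0.0` to the sketch accepts every aligned query, which mirrors how the description says these thresholds are disabled.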
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be filtered.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default: 1]
- mode:
Str % Choices('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
- sensitivity:
Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
- ref_gap_open_penalty:
Int % Range(1, None)
Reference gap open penalty.[default: 5]
- ref_gap_ext_penalty:
Int % Range(1, None)
Reference gap extend penalty.[default: 3]
- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default: True]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
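The effect of exclude_seqs can be sketched as a simple set filter. This is a simplified illustration, not the plugin's bowtie2/samtools workflow; the `filter_reads` helper and the read IDs are hypothetical, and the handling of read pairs is not modelled.

```python
def filter_reads(read_ids, aligned_ids, exclude_seqs=True):
    """Return the read IDs that survive filtering.

    read_ids: all read IDs in a sample (hypothetical input).
    aligned_ids: IDs of reads that aligned to the reference database.
    """
    aligned = set(aligned_ids)
    if exclude_seqs:
        # Default: drop reads that align (e.g., human DNA or other contaminants).
        return [r for r in read_ids if r not in aligned]
    # exclude_seqs=False: keep only the reads that do align.
    return [r for r in read_ids if r in aligned]


reads = ["r1", "r2", "r3", "r4"]
aligned_to_reference = ["r2", "r4"]
kept_default = filter_reads(reads, aligned_to_reference)
kept_inverted = filter_reads(reads, aligned_to_reference, exclude_seqs=False)
```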
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default: 1]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences in an OTU or ASV table and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified[required]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to those in the table input.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Method used to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the Prevalence and Frequency methods when identifying contaminants.[default: 'prevalence']
- freq_concentration_column:
Str
Name of the metadata column that holds concentration information for the samples[optional]
- prev_control_column:
Str
Name of the metadata column that distinguishes experimental from control samples[optional]
- prev_control_indicator:
Str
Value that identifies control samples in the control column (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
Table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence[required]
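To make the roles of prev_control_column and prev_control_indicator concrete, the sketch below tabulates, for each feature, how many control and non-control samples it appears in. It does not compute the decontam score itself (decontam derives its score from a statistical test that is not reproduced here); the `prevalence_counts` helper and the dictionary layouts are hypothetical.

```python
def prevalence_counts(table, metadata, control_column, control_indicator):
    """Count, per feature, the control and non-control samples it occurs in.

    table: {sample_id: {feature_id: count}} (hypothetical layout)
    metadata: {sample_id: {column_name: value}} (hypothetical layout)
    """
    counts = {}
    for sample_id, features in table.items():
        # A sample is a control when its column value equals the indicator.
        is_control = metadata[sample_id][control_column] == control_indicator
        for feature_id, count in features.items():
            entry = counts.setdefault(feature_id, {"control": 0, "sample": 0})
            if count > 0:
                entry["control" if is_control else "sample"] += 1
    return counts


table = {
    "s1": {"f1": 10, "f2": 0},
    "s2": {"f1": 5, "f2": 3},
    "blank1": {"f1": 0, "f2": 7},
}
metadata = {
    "s1": {"type": "experimental"},
    "s2": {"type": "experimental"},
    "blank1": {"type": "control"},
}
counts = prevalence_counts(table, metadata, "type", "control")
```

In this toy input, `f2` appears in the blank while `f1` does not, which is the kind of signal the prevalence method relies on.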
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Per-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default: 0.1]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
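The threshold rule above (remove features whose score is less than or equal to the threshold) can be sketched as follows. The `remove_contaminants` helper and the dictionary layouts are hypothetical stand-ins for the QIIME 2 artifacts, and features without a score are not considered.

```python
def remove_contaminants(scores, table, rep_seqs, threshold=0.1):
    """Drop features whose decontam score is <= threshold.

    scores: {feature_id: score}
    table: {sample_id: {feature_id: count}}
    rep_seqs: {feature_id: sequence}
    All three layouts are hypothetical.
    """
    contaminants = {fid for fid, score in scores.items() if score <= threshold}
    filtered_table = {
        sample_id: {fid: c for fid, c in feats.items() if fid not in contaminants}
        for sample_id, feats in table.items()
    }
    filtered_rep_seqs = {
        fid: seq for fid, seq in rep_seqs.items() if fid not in contaminants
    }
    return filtered_table, filtered_rep_seqs


scores = {"f1": 0.05, "f2": 0.1, "f3": 0.8}
table = {"s1": {"f1": 3, "f2": 1, "f3": 40}, "s2": {"f1": 0, "f2": 2, "f3": 12}}
rep_seqs = {"f1": "ACGT", "f2": "GGCC", "f3": "TTAA"}
filtered_table, filtered_rep_seqs = remove_contaminants(scores, table, rep_seqs)
```

Because the comparison is inclusive, `f2` (score exactly 0.1) is removed along with `f1` at the default threshold.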
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. It is also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default: 7]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default: 'Set1']
- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default: True]
- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default: True]
- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default: False]
- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default: True]
- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default: False]
- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distance scores on score plot.[default: False]
- plot_observed_features:
Bool
Plot observed features count on score plot.[default: False]
- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default: True]
- metadata:
MetadataColumn[Categorical]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
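The TAR and TDR formulas given for plot_tar and plot_tdr can be worked through on a toy mock community. The sketch below applies them to sets of semicolon-delimited annotations truncated to a given depth; the `truncate` and `taxon_rates` helpers and the example taxa are hypothetical, and abundances are ignored.

```python
def truncate(taxon, depth):
    """Keep the first `depth` semicolon-delimited ranks of an annotation."""
    return ";".join(taxon.split(";")[:depth])


def taxon_rates(expected, observed, depth):
    """Return (TAR, TDR) for two collections of annotations at one depth."""
    exp = {truncate(t, depth) for t in expected}
    obs = {truncate(t, depth) for t in observed}
    tp = len(exp & obs)   # expected features that were observed
    fp = len(obs - exp)   # observed features that were not expected
    fn = len(exp - obs)   # expected features that were missed
    tar = tp / (tp + fp) if (tp + fp) else 0.0
    tdr = tp / (tp + fn) if (tp + fn) else 0.0
    return tar, tdr


expected = [
    "k__Bacteria;p__Firmicutes;g__Bacillus",
    "k__Bacteria;p__Firmicutes;g__Listeria",
    "k__Bacteria;p__Proteobacteria;g__Escherichia",
]
observed = [
    "k__Bacteria;p__Firmicutes;g__Bacillus",
    "k__Bacteria;p__Firmicutes;g__Staphylococcus",
    "k__Bacteria;p__Proteobacteria;g__Escherichia",
    "k__Bacteria;p__Actinobacteria;g__Cutibacterium",
]
tar_3, tdr_3 = taxon_rates(expected, observed, depth=3)
tar_2, tdr_2 = taxon_rates(expected, observed, depth=2)
```

At depth 3 there are 2 true positives, 2 false positives, and 1 false negative, so TAR = 2/4 and TDR = 2/3. At depth 2 every expected phylum is detected, so TDR rises to 1 while the unexpected phylum keeps TAR at 2/3.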
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Query (e.g., observed) sequences to evaluate against the reference sequences[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
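The idea of counting mismatches between each observed sequence and its most similar expected sequence can be illustrated with a deliberately simplified sketch. It assumes equal-length, already-aligned sequences and uses a plain position-by-position comparison, whereas the action itself performs a real alignment; the helper names and sequences are hypothetical.

```python
def mismatches(a, b):
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(1 for x, y in zip(a, b) if x != y)


def closest_reference(query, references):
    """Return (reference_id, mismatch_count) for the most similar reference."""
    return min(
        ((ref_id, mismatches(query, seq)) for ref_id, seq in references.items()),
        key=lambda pair: pair[1],
    )


references = {"exp1": "ACGTACGT", "exp2": "TTGTACGA"}
observed = {"obs1": "ACGTACGA", "obs2": "TTGTACGA"}
report = {
    query_id: closest_reference(seq, references)
    for query_id, seq in observed.items()
}
```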
quality-control evaluate-taxonomy¶
This visualizer compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to the maximum level specified by the depth parameter. These metrics are calculated at each semicolon-delimited rank. This action is useful for comparing the accuracy of taxonomic assignment, e.g., between different taxonomy classifiers or other bioinformatics methods. Expected taxonomies should be derived from simulated or mock community sequences that have known taxonomic affiliations.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_taxa:
FeatureData[Taxonomy]
Expected taxonomic assignments[required]
- observed_taxa:
FeatureData[Taxonomy]
Observed taxonomic assignments[required]
- feature_table:
FeatureTable[RelativeFrequency]
Optional feature table containing relative frequency of each feature, used to weight accuracy scores by frequency. Must contain all features found in expected and/or observed taxa. Features found in the table but not the expected/observed taxa will be dropped prior to analysis.[optional]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[required]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default: 'Set1']
- require_exp_ids:
Bool
Require that all features found in observed taxa must be found in expected taxa or raise error.[default: True]
- require_obs_ids:
Bool
Require that all features found in expected taxa must be found in observed taxa or raise error.[default: True]
- sample_id:
Str
Optional sample ID to use for extracting frequency data from feature table, and for labeling accuracy results. If no sample_id is provided, feature frequencies are derived from the sum of all samples present in the feature table.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
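Per-level precision, recall, and F-measure can be sketched as below. The bookkeeping is an assumption made for illustration and may differ from the plugin's: a correct assignment at the level counts as a true positive, a wrong assignment as a false positive, and an observed assignment that stops short of the level as a false negative. The `prf_at_level` helper and the taxa are hypothetical, and no frequency weighting is applied.

```python
def prf_at_level(expected, observed, level):
    """Return (precision, recall, F-measure) at one taxonomic level.

    expected / observed: {feature_id: "rank1;rank2;..."} (hypothetical layout)
    """
    tp = fp = fn = 0
    for feature_id, exp_taxon in expected.items():
        exp_ranks = [r for r in exp_taxon.split(";") if r][:level]
        obs_ranks = [r for r in observed.get(feature_id, "").split(";") if r][:level]
        if len(exp_ranks) < level:
            continue  # nothing is expected at this depth for this feature
        if len(obs_ranks) < level:
            fn += 1   # assumption: under-classification counts as a false negative
        elif obs_ranks == exp_ranks:
            tp += 1
        else:
            fp += 1   # assumption: a wrong label counts as a false positive
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


expected = {
    "f1": "k__Bacteria;p__Firmicutes;g__Bacillus",
    "f2": "k__Bacteria;p__Firmicutes;g__Listeria",
    "f3": "k__Bacteria;p__Proteobacteria;g__Escherichia",
}
observed = {
    "f1": "k__Bacteria;p__Firmicutes;g__Bacillus",
    "f2": "k__Bacteria;p__Firmicutes;g__Staphylococcus",
    "f3": "k__Bacteria;p__Proteobacteria",
}
scores_by_level = {level: prf_at_level(expected, observed, level) for level in (1, 2, 3)}
```

In this toy input all three features are correct through level 2; at level 3 one is correct, one is mislabelled, and one is unclassified, so the scores drop.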
quality-control decontam-score-viz¶
Creates a histogram based on the output of decontam-identify.
Inputs¶
- decontam_scores:
Collection[FeatureData[DecontamScore]]
Output from decontam-identify to be visualized[required]
- table:
Collection[FeatureTable[Frequency]]
Raw OTU/ASV table that was used as input to decontam-identify[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences from which contaminant sequences will be removed[optional]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Select threshold cutoff for decontam algorithm scores[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number[default: True]
- bin_size:
Float % Range(0.0, 1.0, inclusive_end=True)
Select bin size for the histogram[default: 0.02]
Outputs¶
- visualization:
Visualization
<no description>[required]
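How bin_size and weighted interact can be sketched with a small binning routine. This is an illustration under assumed semantics (scores fall in [0, 1]; a weighted histogram adds each feature's read count to its bin rather than 1), not the visualizer's plotting code; the `score_histogram` helper and inputs are hypothetical.

```python
def score_histogram(scores, read_counts, bin_size=0.02, weighted=True):
    """Bin decontam scores in [0, 1] into fixed-width bins.

    scores: {feature_id: score}; read_counts: {feature_id: total reads}.
    Returns a list with one total per bin.
    """
    n_bins = round(1.0 / bin_size)
    bins = [0] * n_bins
    for feature_id, score in scores.items():
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        index = min(int(score / bin_size), n_bins - 1)
        bins[index] += read_counts[feature_id] if weighted else 1
    return bins


scores = {"f1": 0.05, "f2": 0.31, "f3": 0.99}
read_counts = {"f1": 100, "f2": 20, "f3": 5}
weighted_bins = score_histogram(scores, read_counts)
unweighted_bins = score_histogram(scores, read_counts, weighted=False)
```

With the default bin size there are 50 bins; weighting makes the low-scoring but abundant `f1` dominate the histogram, whereas the unweighted version counts each feature once.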
quality-control decontam-identify-batches¶
This pipeline splits an OTU or ASV table into batches based on the given metadata, identifies contaminant sequences in each batch, and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences from which contaminant sequences will be removed[optional]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to those in the table input.[required]
- split_column:
Str
Metadata column(s) by which to subset the ASV table. Note: Column names must be in quotes and delimited by a space[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Method used to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the Prevalence and Frequency methods when identifying contaminants.[required]
- filter_empty_features:
Bool
If true, features which are not present in a split feature table are dropped.[optional]
- freq_concentration_column:
Str
Name of the metadata column that holds concentration information for the samples[optional]
- prev_control_column:
Str
Name of the metadata column that distinguishes experimental from control samples[optional]
- prev_control_indicator:
Str
Value that identifies control samples in the control column (e.g. "control" or "blank")[optional]
- threshold:
Float
Select threshold cutoff for decontam algorithm scores[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number[default: True]
- bin_size:
Float
Select bin size for the histogram[default: 0.02]
Outputs¶
- batch_subset_tables:
Collection[FeatureTable[Frequency]]
Directory where feature tables split based on metadata and parameter split_column values should be written.[required]
- decontam_scores:
Collection[FeatureData[DecontamScore]]
Tables of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence[required]
- score_histograms:
Visualization
The visualizer histograms for all decontam score objects generated from the pipeline[required]
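The batch-splitting step can be sketched as grouping samples by the value of split_column, optionally dropping features that are absent from a batch. This is a simplified illustration with hypothetical names and dictionary layouts, not the pipeline's implementation, and it handles a single split column.

```python
from collections import defaultdict


def split_table(table, metadata, split_column, filter_empty_features=False):
    """Group samples into per-batch tables keyed by the split column value.

    table: {sample_id: {feature_id: count}} (hypothetical layout)
    metadata: {sample_id: {column_name: value}} (hypothetical layout)
    """
    batches = defaultdict(dict)
    for sample_id, features in table.items():
        batch = metadata[sample_id][split_column]
        batches[batch][sample_id] = dict(features)
    if filter_empty_features:
        for samples in batches.values():
            # Features with a nonzero count somewhere in this batch.
            present = {
                fid for feats in samples.values()
                for fid, count in feats.items() if count > 0
            }
            for sample_id, feats in samples.items():
                samples[sample_id] = {
                    fid: c for fid, c in feats.items() if fid in present
                }
    return dict(batches)


table = {
    "s1": {"f1": 4, "f2": 0},
    "s2": {"f1": 1, "f2": 0},
    "s3": {"f1": 0, "f2": 9},
}
metadata = {"s1": {"run": "A"}, "s2": {"run": "A"}, "s3": {"run": "B"}}
batches = split_table(table, metadata, "run", filter_empty_features=True)
```

Each resulting batch table would then be scored separately, which is why decontam_scores and batch_subset_tables are collections.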
This QIIME 2 plugin supports methods for assessing and controlling the quality of feature and sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -quality -control - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org
Actions¶
Name | Type | Short Description |
---|---|---|
exclude-seqs | method | Exclude sequences by alignment |
filter-reads | method | Filter demultiplexed sequences by alignment to reference database. |
bowtie2-build | method | Build bowtie2 index from reference sequences. |
decontam-identify | method | Identify contaminants |
decontam-remove | method | Remove contaminants |
evaluate-composition | visualizer | Evaluate expected vs. observed taxonomic composition of samples |
evaluate-seqs | visualizer | Compare query (observed) vs. reference (expected) sequences. |
evaluate-taxonomy | visualizer | Evaluate expected vs. observed taxonomic assignments |
decontam-score-viz | visualizer | Generate a histogram representation of the scores |
decontam-identify-batches | pipeline | Identify contaminants in Batch Mode |
Artifact Classes¶
FeatureData[DecontamScore] |
Formats¶
DecontamScoreFormat |
DecontamScoreDirFmt |
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str
%
Choices
('blast', 'blastn-short')
|
Str
%
Choices
('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default:
'blast'
]- perc_identity:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default:
0.97
]- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default:
0.97
]- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default:
1
]- left_justify:
Bool
%
Choices
(False)
|
Bool
Reject match if the pairwise alignment begins with gaps[default:
False
]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be trimmed.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default:
1
]- mode:
Str
%
Choices
('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default:
'local'
]- sensitivity:
Str
%
Choices
('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default:
'sensitive'
]- ref_gap_open_penalty:
Int
%
Range
(1, None)
Reference gap open penalty.[default:
5
]- ref_gap_ext_penalty:
Int
%
Range
(1, None)
Reference gap extend penalty.[default:
3
]- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default:
True
]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default:
1
]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user
Inputs¶
- table:
FeatureTable[Frequency]
Feature table which contaminate sequences will be identified from[required]
Parameters¶
- metadata:
Metadata
metadata file indicating which samples in the experiment are control samples, assumes sample names in file correspond to the
table
input parameter[required]- method:
Str
%
Choices
('combined', 'frequency', 'prevalence')
Select how to which method to id contaminants with; Prevalence: Utilizes control ASVs/OTUs to identify contaminants, Frequency: Utilizes sample concentration information to identify contaminants, Combined: Utilizes both Prevalence and Frequency methods when identifying contaminants[default:
'prevalence'
]- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
indicate the control sample identifier (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm which scores each feature on how likely they are to be a contaminant sequence[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Pre-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default:
0.1
]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and plots of per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. Also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default:
7
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default:
True
]- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default:
True
]- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default:
False
]- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default:
True
]- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default:
False
]- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distances scores on score plot.[default:
False
]- plot_observed_features:
Bool
Plot observed features count on score plot.[default:
False
]- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default:
True
]- metadata:
MetadataColumn
[
Categorical
]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default:
False
]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-taxonomy¶
This visualizer compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to maximum level specified by the depth parameter. These metrics are calculated at each semicolon-delimited rank. This action is useful for comparing the accuracy of taxonomic assignment, e.g., between different taxonomy classifiers or other bioinformatics methods. Expected taxonomies should be derived from simulated or mock community sequences that have known taxonomic affiliations.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_taxa:
FeatureData[Taxonomy]
Expected taxonomic assignments[required]
- observed_taxa:
FeatureData[Taxonomy]
Observed taxonomic assignments[required]
- feature_table:
FeatureTable[RelativeFrequency]
Optional feature table containing relative frequency of each feature, used to weight accuracy scores by frequency. Must contain all features found in expected and/or observed taxa. Features found in the table but not the expected/observed taxa will be dropped prior to analysis.[optional]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[required]
- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- require_exp_ids:
Bool
Require that all features found in observed taxa must be found in expected taxa or raise error.[default:
True
]- require_obs_ids:
Bool
Require that all features found in expected taxa must be found in observed taxa or raise error.[default:
True
]- sample_id:
Str
Optional sample ID to use for extracting frequency data from feature table, and for labeling accuracy results. If no sample_id is provided, feature frequencies are derived from the sum of all samples present in the feature table.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control decontam-score-viz¶
Creates histogram based on the output of decontam identify
Inputs¶
- decontam_scores:
Collection
[
FeatureData[DecontamScore]
]
Output from decontam identify to be visualized[required]
- table:
Collection
[
FeatureTable[Frequency]
]
Raw OTU/ASV table that was used as input to decontam-identify[required]
- rep_seqs:
FeatureData[Sequence]
Representative Sequences table which contaminate sequences will be removed from[optional]
Parameters¶
- threshold:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Select threshold cutoff for decontam algorithm scores[default:
0.1
]- weighted:
Bool
weight the decontam scores by their associated read number[default:
True
]- bin_size:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Select bin size for the histogram[default:
0.02
]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control decontam-identify-batches¶
This method breaks an ASV table into batches based on the given metadata and identifies contaminant sequences from an OTU or ASV table and reports them to the user
Inputs¶
- table:
FeatureTable[Frequency]
Feature table which contaminate sequences will be identified from[required]
- rep_seqs:
FeatureData[Sequence]
Representative Sequences table which contaminate seqeunces will be removed from[optional]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Assumes that sample names in the file correspond to those in the table input. [required]
- split_column:
Str
Input metadata columns by which to subset the ASV table. Note: column names must be in quotes and delimited by a space. [required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Method used to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the prevalence and frequency methods. [required]
- filter_empty_features:
Bool
If true, features that are not present in a split feature table are dropped. [optional]
- freq_concentration_column:
Str
Input column name that has concentration information for the samples. [optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata. [optional]
- prev_control_indicator:
Str
Identifier that marks control samples (e.g. "control" or "blank"). [optional]
- threshold:
Float
Threshold cutoff for decontam algorithm scores. [default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number. [default: True]
- bin_size:
Float
Bin size for the histogram. [default: 0.02]
Outputs¶
- batch_subset_tables:
Collection[FeatureTable[Frequency]]
Directory where feature tables split based on metadata and parameter split_column values should be written. [required]
- decontam_scores:
Collection[FeatureData[DecontamScore]]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence. [required]
- score_histograms:
Visualization
The visualizer histograms for all decontam score objects generated by the pipeline. [required]
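The batching step described above can be pictured with a small, self-contained Python sketch. It is an illustration under assumptions, not the pipeline's actual code: the function split_table_by_column and the nested-dict representation of the table and metadata are hypothetical, and only a single split column is handled. It groups samples by their value in the split column and, when filter_empty_features is true, drops features that have no reads in a given batch.

```python
from collections import defaultdict


def split_table_by_column(table, metadata, split_column,
                          filter_empty_features=True):
    """Split a feature table into per-batch tables (hypothetical helper).

    table:    dict mapping sample ID -> {feature ID: count}
    metadata: dict mapping sample ID -> {column name: value}
    Returns a dict mapping each value of split_column to a sub-table
    containing only the samples that carry that value.
    """
    batches = defaultdict(dict)
    for sample_id, counts in table.items():
        batch = metadata[sample_id][split_column]
        batches[batch][sample_id] = dict(counts)

    if filter_empty_features:
        for samples in batches.values():
            # Features with at least one read somewhere in this batch.
            present = {
                feature_id
                for counts in samples.values()
                for feature_id, count in counts.items()
                if count > 0
            }
            for sample_id in list(samples):
                samples[sample_id] = {
                    f: c for f, c in samples[sample_id].items()
                    if f in present
                }
    return dict(batches)


# Toy data, invented for illustration only.
table = {
    "s1": {"a": 3, "b": 0},
    "s2": {"a": 1, "b": 0},
    "s3": {"a": 0, "b": 7},
}
metadata = {
    "s1": {"run": "r1"},
    "s2": {"run": "r1"},
    "s3": {"run": "r2"},
}
batches = split_table_by_column(table, metadata, "run")
unfiltered = split_table_by_column(table, metadata, "run",
                                   filter_empty_features=False)
```

In this sketch each batch would then be scored independently, which is consistent with decontam_scores and batch_subset_tables being returned as collections with one entry per batch.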
This QIIME 2 plugin supports methods for assessing and controlling the quality of feature and sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -quality -control - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org
Actions¶
Name | Type | Short Description |
---|---|---|
exclude-seqs | method | Exclude sequences by alignment |
filter-reads | method | Filter demultiplexed sequences by alignment to reference database. |
bowtie2-build | method | Build bowtie2 index from reference sequences. |
decontam-identify | method | Identify contaminants |
decontam-remove | method | Remove contaminants |
evaluate-composition | visualizer | Evaluate expected vs. observed taxonomic composition of samples |
evaluate-seqs | visualizer | Compare query (observed) vs. reference (expected) sequences. |
evaluate-taxonomy | visualizer | Evaluate expected vs. observed taxonomic assignments |
decontam-score-viz | visualizer | Generate a histogram representation of the scores |
decontam-identify-batches | pipeline | Identify contaminants in Batch Mode |
Artifact Classes¶
FeatureData[DecontamScore] |
Formats¶
DecontamScoreFormat |
DecontamScoreDirFmt |
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str
%
Choices
('blast', 'blastn-short')
|
Str
%
Choices
('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default:
'blast'
]- perc_identity:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default:
0.97
]- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default:
0.97
]- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default:
1
]- left_justify:
Bool
%
Choices
(False)
|
Bool
Reject match if the pairwise alignment begins with gaps[default:
False
]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be trimmed.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default:
1
]- mode:
Str
%
Choices
('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default:
'local'
]- sensitivity:
Str
%
Choices
('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default:
'sensitive'
]- ref_gap_open_penalty:
Int
%
Range
(1, None)
Reference gap open penalty.[default:
5
]- ref_gap_ext_penalty:
Int
%
Range
(1, None)
Reference gap extend penalty.[default:
3
]- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default:
True
]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default:
1
]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user
Inputs¶
- table:
FeatureTable[Frequency]
Feature table which contaminate sequences will be identified from[required]
Parameters¶
- metadata:
Metadata
metadata file indicating which samples in the experiment are control samples, assumes sample names in file correspond to the
table
input parameter[required]- method:
Str
%
Choices
('combined', 'frequency', 'prevalence')
Select how to which method to id contaminants with; Prevalence: Utilizes control ASVs/OTUs to identify contaminants, Frequency: Utilizes sample concentration information to identify contaminants, Combined: Utilizes both Prevalence and Frequency methods when identifying contaminants[default:
'prevalence'
]- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
indicate the control sample identifier (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm which scores each feature on how likely they are to be a contaminant sequence[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Pre-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default:
0.1
]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and plots of per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. Also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default:
7
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default:
True
]- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default:
True
]- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default:
False
]- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default:
True
]- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default:
False
]- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distances scores on score plot.[default:
False
]- plot_observed_features:
Bool
Plot observed features count on score plot.[default:
False
]- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default:
True
]- metadata:
MetadataColumn
[
Categorical
]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default:
False
]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-taxonomy¶
This visualizer compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to maximum level specified by the depth parameter. These metrics are calculated at each semicolon-delimited rank. This action is useful for comparing the accuracy of taxonomic assignment, e.g., between different taxonomy classifiers or other bioinformatics methods. Expected taxonomies should be derived from simulated or mock community sequences that have known taxonomic affiliations.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_taxa:
FeatureData[Taxonomy]
Expected taxonomic assignments[required]
- observed_taxa:
FeatureData[Taxonomy]
Observed taxonomic assignments[required]
- feature_table:
FeatureTable[RelativeFrequency]
Optional feature table containing relative frequency of each feature, used to weight accuracy scores by frequency. Must contain all features found in expected and/or observed taxa. Features found in the table but not the expected/observed taxa will be dropped prior to analysis.[optional]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[required]
- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- require_exp_ids:
Bool
Require that all features found in observed taxa must be found in expected taxa or raise error.[default:
True
]- require_obs_ids:
Bool
Require that all features found in expected taxa must be found in observed taxa or raise error.[default:
True
]- sample_id:
Str
Optional sample ID to use for extracting frequency data from feature table, and for labeling accuracy results. If no sample_id is provided, feature frequencies are derived from the sum of all samples present in the feature table.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control decontam-score-viz¶
Creates histogram based on the output of decontam identify
Inputs¶
- decontam_scores:
Collection
[
FeatureData[DecontamScore]
]
Output from decontam identify to be visualized[required]
- table:
Collection
[
FeatureTable[Frequency]
]
Raw OTU/ASV table that was used as input to decontam-identify[required]
- rep_seqs:
FeatureData[Sequence]
Representative Sequences table which contaminate sequences will be removed from[optional]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Select threshold cutoff for decontam algorithm scores.[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number.[default: True]
- bin_size:
Float % Range(0.0, 1.0, inclusive_end=True)
Select bin size for the histogram.[default: 0.02]
Outputs¶
- visualization:
Visualization
<no description>[required]
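To make the bin_size and weighted parameters concrete, the following Python sketch bins scores in the range [0, 1] and counts either features or reads per bin. It is only an illustration under stated assumptions: the function name score_histogram, the dictionary inputs, and the example values are hypothetical and do not reflect the plugin's internal code.

```python
def score_histogram(scores, reads, bin_size=0.02, weighted=True):
    """Count features (or reads, when weighted) per score bin over [0, 1].

    scores: {feature_id: decontam score between 0 and 1}
    reads:  {feature_id: total read count for that feature}
    """
    n_bins = max(1, round(1.0 / bin_size))
    counts = [0] * n_bins
    for feature_id, score in scores.items():
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score / bin_size), n_bins - 1)
        counts[index] += reads[feature_id] if weighted else 1
    return counts


# Hypothetical scores and read counts for three features.
scores = {"f1": 0.1, "f2": 0.3, "f3": 0.9}
reads = {"f1": 10, "f2": 5, "f3": 1}

print(score_histogram(scores, reads, bin_size=0.25))
print(score_histogram(scores, reads, bin_size=0.25, weighted=False))
```

With weighting enabled, a single abundant feature can dominate its bin, which is why the weighted and unweighted histograms may look quite different for the same scores.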
quality-control decontam-identify-batches¶
This pipeline splits an OTU or ASV table into batches based on the given metadata, identifies contaminant sequences in each batch, and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified.[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences from which contaminant sequences will be removed.[optional]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to the table input parameter.[required]
- split_column:
Str
Input metadata columns by which to subset the ASV table. Note: column names must be in quotes and delimited by a space.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Method used to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the prevalence and frequency methods when identifying contaminants.[required]
- filter_empty_features:
Bool
If true, features which are not present in a split feature table are dropped.[optional]
- freq_concentration_column:
Str
Input column name that has concentration information for the samples.[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata.[optional]
- prev_control_indicator:
Str
Indicate the control sample identifier (e.g. "control" or "blank").[optional]
- threshold:
Float
Select threshold cutoff for decontam algorithm scores.[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number.[default: True]
- bin_size:
Float
Select bin size for the histogram.[default: 0.02]
Outputs¶
- batch_subset_tables:
Collection[FeatureTable[Frequency]]
Directory where feature tables split based on metadata and parameter split_column values should be written.[required]
- decontam_scores:
Collection[FeatureData[DecontamScore]]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence.[required]
- score_histograms:
Visualization
The visualizer histograms for all decontam score objects generated by the pipeline.[required]
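The batch-splitting step described above can be sketched in plain Python. This is a simplified illustration, not the pipeline's implementation: the nested-dictionary table layout, the function name split_table, and the sample and batch names are all assumptions made for the example.

```python
from collections import defaultdict


def split_table(table, sample_batches, filter_empty_features=True):
    """Split a feature table into one sub-table per batch.

    table:          {sample_id: {feature_id: count}}
    sample_batches: {sample_id: batch label taken from the split column}
    """
    batches = defaultdict(dict)
    for sample_id, counts in table.items():
        batches[sample_batches[sample_id]][sample_id] = dict(counts)

    if filter_empty_features:
        for samples in batches.values():
            # Features with a non-zero count in at least one sample of the batch.
            present = {
                feature
                for counts in samples.values()
                for feature, count in counts.items()
                if count > 0
            }
            for sample_id in list(samples):
                samples[sample_id] = {
                    feature: count
                    for feature, count in samples[sample_id].items()
                    if feature in present
                }
    return dict(batches)


# Hypothetical table with three samples spread over two sequencing runs.
table = {
    "s1": {"f1": 5, "f2": 0},
    "s2": {"f1": 0, "f2": 0},
    "s3": {"f1": 0, "f2": 3},
}
sample_batches = {"s1": "run1", "s2": "run1", "s3": "run2"}

print(split_table(table, sample_batches))
```

Each resulting sub-table would then be scored separately, so that controls from one batch are only compared against samples from the same batch.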
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str % Choices('blast', 'blastn-short') | Str % Choices('vsearch')
Alignment method to use for matching feature sequences against reference sequences.[default: 'blast']
- perc_identity:
Float % Range(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0].[default: 0.97]
- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default: 0.97]
- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default: 1]
- left_justify:
Bool % Choices(False) | Bool
Reject match if the pairwise alignment begins with gaps.[default: False]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
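The way the perc_identity and perc_query_aligned thresholds divide queries into hits and misses can be sketched as follows. This is a hedged Python illustration that assumes alignment statistics are already available; the Alignment record, the partition function, and the example values are hypothetical and are not part of the plugin.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    query_id: str
    perc_identity: float       # fraction of identical positions, 0.0 to 1.0
    perc_query_aligned: float  # fraction of the query that aligned, 0.0 to 1.0


def partition(query_ids, alignments, perc_identity=0.97, perc_query_aligned=0.97):
    """Return (hits, misses); a query is a hit if any alignment passes both thresholds."""
    passing = {
        a.query_id
        for a in alignments
        if a.perc_identity >= perc_identity
        and a.perc_query_aligned >= perc_query_aligned
    }
    hits = [q for q in query_ids if q in passing]
    misses = [q for q in query_ids if q not in passing]
    return hits, misses


# Hypothetical alignment results; q3 has no alignment at all.
query_ids = ["q1", "q2", "q3"]
alignments = [
    Alignment("q1", 0.99, 1.0),
    Alignment("q2", 0.90, 1.0),
]

print(partition(query_ids, alignments))
print(partition(query_ids, alignments, perc_identity=0.0, perc_query_aligned=0.0))
```

Setting either threshold to 0 means every alignment passes that check, which matches the note above about disabling the filters; a query with no alignment remains a miss either way.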
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be filtered.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default: 1]
- mode:
Str % Choices('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
- sensitivity:
Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
- ref_gap_open_penalty:
Int % Range(1, None)
Reference gap open penalty.[default: 5]
- ref_gap_ext_penalty:
Int % Range(1, None)
Reference gap extend penalty.[default: 3]
- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default: True]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
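Because the exclude_seqs flag reverses the meaning of the filter, a small Python sketch may help. It assumes the set of read IDs that aligned to the reference is already known; the function name select_reads and the read IDs are hypothetical, and the real method works on FASTQ data through bowtie2 and samtools rather than on ID lists.

```python
def select_reads(read_ids, aligned_ids, exclude_seqs=True):
    """Keep reads according to whether they aligned to the reference.

    exclude_seqs=True  keeps reads that did NOT align (e.g., removing host reads).
    exclude_seqs=False keeps only reads that DID align.
    """
    aligned = set(aligned_ids)
    if exclude_seqs:
        return [read for read in read_ids if read not in aligned]
    return [read for read in read_ids if read in aligned]


# Hypothetical read IDs; r2 aligned to the reference database.
read_ids = ["r1", "r2", "r3"]
aligned_ids = ["r2"]

print(select_reads(read_ids, aligned_ids))
print(select_reads(read_ids, aligned_ids, exclude_seqs=False))
```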
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default: 1]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified.[required]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to the table input parameter.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Method used to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the prevalence and frequency methods when identifying contaminants.[default: 'prevalence']
- freq_concentration_column:
Str
Input column name that has concentration information for the samples.[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata.[optional]
- prev_control_indicator:
Str
Indicate the control sample identifier (e.g. "control" or "blank").[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence.[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Per-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default: 0.1]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
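The threshold rule above (features scoring at or below the threshold are removed) can be sketched in Python. This is an illustration only: the dictionary layouts, the function name remove_contaminants, and the example data are assumptions, and the sketch assumes every feature has a score.

```python
def remove_contaminants(scores, table, rep_seqs, threshold=0.1):
    """Drop features whose decontam score is less than or equal to `threshold`.

    scores:   {feature_id: score}
    table:    {sample_id: {feature_id: count}}
    rep_seqs: {feature_id: sequence}
    """
    contaminants = {f for f, score in scores.items() if score <= threshold}
    filtered_table = {
        sample_id: {f: c for f, c in counts.items() if f not in contaminants}
        for sample_id, counts in table.items()
    }
    filtered_rep_seqs = {
        f: seq for f, seq in rep_seqs.items() if f not in contaminants
    }
    return filtered_table, filtered_rep_seqs


# Hypothetical scores: f1 and f2 fall at or below the default threshold.
scores = {"f1": 0.05, "f2": 0.1, "f3": 0.8}
table = {"s1": {"f1": 4, "f2": 2, "f3": 9}}
rep_seqs = {"f1": "ACGT", "f2": "GGCC", "f3": "TTAA"}

print(remove_contaminants(scores, table, rep_seqs))
```

Filtering the table and the representative sequences with the same contaminant set keeps the two outputs consistent with each other.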
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. It is also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the Greengenes reference sequence database).[default: 7]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to use for plotting.[default: 'Set1']
- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default: True]
- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default: True]
- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default: False]
- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default: True]
- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default: False]
- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distance scores on score plot.[default: False]
- plot_observed_features:
Bool
Plot observed features count on score plot.[default: False]
- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default: True]
- metadata:
MetadataColumn[Categorical]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
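The TAR and TDR formulas given in the parameter descriptions can be worked through with a short Python sketch. It treats the expected and observed features at one rank as plain sets; the function name tar_tdr and the example taxa are hypothetical, and the visualizer itself may handle edge cases differently.

```python
def tar_tdr(expected, observed):
    """Compute taxon accuracy rate (TAR) and taxon detection rate (TDR).

    TAR = TP / (TP + FP), TDR = TP / (TP + FN), where a true positive is a
    feature present in both the expected and observed sets.
    """
    expected, observed = set(expected), set(observed)
    true_pos = len(expected & observed)
    false_pos = len(observed - expected)
    false_neg = len(expected - observed)
    tar = true_pos / (true_pos + false_pos) if observed else 0.0
    tdr = true_pos / (true_pos + false_neg) if expected else 0.0
    return tar, tdr


# Hypothetical mock community: four expected taxa, three observed.
expected = {"A", "B", "C", "D"}
observed = {"A", "B", "E"}

tar, tdr = tar_tdr(expected, observed)
print(f"TAR={tar:.3f} TDR={tdr:.3f}")
```

Here two of the three observed taxa are expected (TAR = 2/3) and two of the four expected taxa are detected (TDR = 1/2), so the two rates capture false positives and false negatives separately.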
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default: False]
Outputs¶
- visualization:
Visualization
<no description>[required]
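As a rough picture of what "frequency of mismatches" means for a query and its most similar reference, the Python sketch below counts differing positions in a pair of already-aligned sequences. It is an assumption-laden illustration: the function name count_mismatches is hypothetical, gap characters are simply counted as mismatches, and the action itself performs the alignment rather than expecting pre-aligned input.

```python
def count_mismatches(query: str, reference: str) -> int:
    """Count positions that differ between two aligned, equal-length sequences."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for q, r in zip(query, reference) if q != r)


# Hypothetical aligned pair with one substitution and one gap.
query = "ACGTAC-T"
reference = "ACGAACGT"

mismatches = count_mismatches(query, reference)
print(mismatches, mismatches / len(reference))
```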
This QIIME 2 plugin supports methods for assessing and controlling the quality of feature and sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -quality -control - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org
Actions¶
Name | Type | Short Description |
---|---|---|
exclude-seqs | method | Exclude sequences by alignment |
filter-reads | method | Filter demultiplexed sequences by alignment to reference database. |
bowtie2-build | method | Build bowtie2 index from reference sequences. |
decontam-identify | method | Identify contaminants |
decontam-remove | method | Remove contaminants |
evaluate-composition | visualizer | Evaluate expected vs. observed taxonomic composition of samples |
evaluate-seqs | visualizer | Compare query (observed) vs. reference (expected) sequences. |
evaluate-taxonomy | visualizer | Evaluate expected vs. observed taxonomic assignments |
decontam-score-viz | visualizer | Generate a histogram representation of the scores |
decontam-identify-batches | pipeline | Identify contaminants in Batch Mode |
Artifact Classes¶
FeatureData[DecontamScore] |
Formats¶
DecontamScoreFormat |
DecontamScoreDirFmt |
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str
%
Choices
('blast', 'blastn-short')
|
Str
%
Choices
('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default:
'blast'
]- perc_identity:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default:
0.97
]- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default:
0.97
]- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default:
1
]- left_justify:
Bool
%
Choices
(False)
|
Bool
Reject match if the pairwise alignment begins with gaps[default:
False
]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be trimmed.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default:
1
]- mode:
Str
%
Choices
('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default:
'local'
]- sensitivity:
Str
%
Choices
('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default:
'sensitive'
]- ref_gap_open_penalty:
Int
%
Range
(1, None)
Reference gap open penalty.[default:
5
]- ref_gap_ext_penalty:
Int
%
Range
(1, None)
Reference gap extend penalty.[default:
3
]- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default:
True
]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default:
1
]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user
Inputs¶
- table:
FeatureTable[Frequency]
Feature table which contaminate sequences will be identified from[required]
Parameters¶
- metadata:
Metadata
metadata file indicating which samples in the experiment are control samples, assumes sample names in file correspond to the
table
input parameter[required]- method:
Str
%
Choices
('combined', 'frequency', 'prevalence')
Select how to which method to id contaminants with; Prevalence: Utilizes control ASVs/OTUs to identify contaminants, Frequency: Utilizes sample concentration information to identify contaminants, Combined: Utilizes both Prevalence and Frequency methods when identifying contaminants[default:
'prevalence'
]- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
indicate the control sample identifier (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm which scores each feature on how likely they are to be a contaminant sequence[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Pre-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default:
0.1
]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and plots of per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. Also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default:
7
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default:
True
]- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default:
True
]- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default:
False
]- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default:
True
]- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default:
False
]- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distances scores on score plot.[default:
False
]- plot_observed_features:
Bool
Plot observed features count on score plot.[default:
False
]- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default:
True
]- metadata:
MetadataColumn
[
Categorical
]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default:
False
]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-taxonomy¶
This visualizer compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to maximum level specified by the depth parameter. These metrics are calculated at each semicolon-delimited rank. This action is useful for comparing the accuracy of taxonomic assignment, e.g., between different taxonomy classifiers or other bioinformatics methods. Expected taxonomies should be derived from simulated or mock community sequences that have known taxonomic affiliations.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_taxa:
FeatureData[Taxonomy]
Expected taxonomic assignments[required]
- observed_taxa:
FeatureData[Taxonomy]
Observed taxonomic assignments[required]
- feature_table:
FeatureTable[RelativeFrequency]
Optional feature table containing relative frequency of each feature, used to weight accuracy scores by frequency. Must contain all features found in expected and/or observed taxa. Features found in the table but not the expected/observed taxa will be dropped prior to analysis.[optional]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[required]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default: 'Set1']
- require_exp_ids:
Bool
Require that all features found in observed taxa must be found in expected taxa or raise error.[default: True]
- require_obs_ids:
Bool
Require that all features found in expected taxa must be found in observed taxa or raise error.[default: True]
- sample_id:
Str
Optional sample ID to use for extracting frequency data from feature table, and for labeling accuracy results. If no sample_id is provided, feature frequencies are derived from the sum of all samples present in the feature table.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control decontam-score-viz¶
Creates a histogram of the scores output by decontam-identify.
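To make the roles of the bin_size and weighted parameters of this visualizer concrete, here is a minimal binning sketch. It is an assumption-laden illustration rather than the plugin's code: the function name `score_histogram` is hypothetical, scores are assumed to lie in [0, 1], bin_size is assumed to divide 1.0 evenly, and "weighted" is interpreted as adding each feature's read count to its bin instead of 1.

```python
def score_histogram(scores, read_counts, bin_size=0.02, weighted=True):
    """Bin decontam scores in [0, 1] into equal-width bins."""
    # Assumes bin_size divides 1.0 evenly (e.g. 0.02 gives 50 bins).
    n_bins = max(1, round(1.0 / bin_size))
    hist = [0] * n_bins
    for feature_id, score in scores.items():
        # A score of exactly 1.0 is placed in the last bin.
        index = min(int(score / bin_size), n_bins - 1)
        hist[index] += read_counts[feature_id] if weighted else 1
    return hist


# Hypothetical scores and total read counts per feature.
scores = {"asv1": 0.05, "asv2": 0.3, "asv3": 0.6, "asv4": 0.9, "asv5": 0.08}
reads = {"asv1": 100, "asv2": 10, "asv3": 40, "asv4": 50, "asv5": 5}

print(score_histogram(scores, reads, bin_size=0.25, weighted=True))
print(score_histogram(scores, reads, bin_size=0.25, weighted=False))
```

Comparing the weighted and unweighted histograms shows whether low-scoring features account for many reads or only a few.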
Inputs¶
- decontam_scores:
Collection[FeatureData[DecontamScore]]
Output from decontam-identify to be visualized[required]
- table:
Collection[FeatureTable[Frequency]]
Raw OTU/ASV table that was used as input to decontam-identify[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences from which contaminant sequences will be removed[optional]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Select threshold cutoff for decontam algorithm scores[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number[default: True]
- bin_size:
Float % Range(0.0, 1.0, inclusive_end=True)
Select bin size for the histogram[default: 0.02]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control decontam-identify-batches¶
This pipeline splits an OTU or ASV table into batches based on the given metadata, identifies contaminant sequences in each batch, and reports them to the user.
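The batching step can be pictured with a small sketch. This is only an illustration under stated assumptions: the table is modelled as a nested dict (sample ID to feature counts), the metadata as a dict of per-sample dicts, and the function name `split_table` is hypothetical. The real pipeline operates on QIIME 2 artifacts.

```python
from collections import defaultdict


def split_table(table, metadata, split_column, filter_empty_features=True):
    """Group samples into batches by the value of one metadata column."""
    batches = defaultdict(dict)
    for sample_id, counts in table.items():
        batch = metadata[sample_id][split_column]
        batches[batch][sample_id] = dict(counts)
    if filter_empty_features:
        for samples in batches.values():
            # Features with a nonzero count in at least one sample of the batch.
            present = {f for counts in samples.values()
                       for f, n in counts.items() if n > 0}
            for sample_id, counts in samples.items():
                samples[sample_id] = {f: n for f, n in counts.items()
                                      if f in present}
    return dict(batches)


# Hypothetical three-sample table split by a hypothetical "run" column.
table = {
    "s1": {"asv1": 10, "asv2": 0},
    "s2": {"asv1": 5, "asv2": 0},
    "s3": {"asv1": 0, "asv2": 7},
}
metadata = {"s1": {"run": "A"}, "s2": {"run": "A"}, "s3": {"run": "B"}}
print(split_table(table, metadata, "run"))
```

Each resulting batch would then be scored separately, which is why a feature absent from a batch can be dropped from that batch's table.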
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences from which contaminant sequences will be removed[optional]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to those in the table input.[required]
- split_column:
Str
Input metadata columns by which to subset the ASV table. Note: column names must be in quotes and delimited by a space.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Select which method to use to identify contaminants. Prevalence: utilizes control ASVs/OTUs to identify contaminants. Frequency: utilizes sample concentration information to identify contaminants. Combined: utilizes both the Prevalence and Frequency methods when identifying contaminants.[required]
- filter_empty_features:
Bool
If true, features which are not present in a split feature table are dropped.[optional]
- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
Indicate the control sample identifier (e.g. "control" or "blank")[optional]
- threshold:
Float
Select threshold cutoff for decontam algorithm scores[default: 0.1]
- weighted:
Bool
Weight the decontam scores by their associated read number[default: True]
- bin_size:
Float
Select bin size for the histogram[default: 0.02]
Outputs¶
- batch_subset_tables:
Collection[FeatureTable[Frequency]]
Directory where feature tables split based on metadata and parameter split_column values should be written.[required]
- decontam_scores:
Collection[FeatureData[DecontamScore]]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence[required]
- score_histograms:
Visualization
The visualizer histograms for all decontam score objects generated from the pipeline[required]
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
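The hit/miss decision described above can be sketched as a simple threshold check. This is a hedged illustration, not the plugin's implementation: it assumes each query already has a best alignment summarised as a dict with hypothetical keys `identity`, `query_aligned`, and `evalue`, and that a query with no alignment is a miss. The function names `is_hit` and `partition` are likewise made up for the example.

```python
def is_hit(aln, perc_identity=0.97, perc_query_aligned=0.97, evalue=None):
    """Return True if a best alignment passes every enabled threshold."""
    if aln is None:
        return False                      # nothing aligned at all
    if aln["identity"] < perc_identity:
        return False
    if aln["query_aligned"] < perc_query_aligned:
        return False
    # The E-value check applies only when a threshold is supplied.
    if evalue is not None and aln["evalue"] > evalue:
        return False
    return True


def partition(best_alignments, **thresholds):
    """Split query IDs into (hits, misses)."""
    hits, misses = [], []
    for seq_id, aln in best_alignments.items():
        (hits if is_hit(aln, **thresholds) else misses).append(seq_id)
    return hits, misses


# Hypothetical best alignments for four query sequences.
alignments = {
    "seq1": {"identity": 0.99, "query_aligned": 1.00, "evalue": 1e-50},
    "seq2": {"identity": 0.90, "query_aligned": 1.00, "evalue": 1e-20},
    "seq3": {"identity": 0.99, "query_aligned": 0.60, "evalue": 1e-10},
    "seq4": None,
}
print(partition(alignments))
print(partition(alignments, perc_identity=0, perc_query_aligned=0))
```

Setting a threshold to 0 makes its comparison always pass, which is why 0 effectively disables that filter. For a negative filter (e.g., contaminant removal) the misses are kept; for a positive filter the hits are kept.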
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str % Choices('blast', 'blastn-short') | Str % Choices('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default: 'blast']
- perc_identity:
Float % Range(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default: 0.97]
- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default: 0.97]
- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default: 1]
- left_justify:
Bool % Choices(False) | Bool
Reject match if the pairwise alignment begins with gaps[default: False]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be filtered.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default: 1]
- mode:
Str % Choices('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default: 'local']
- sensitivity:
Str % Choices('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default: 'sensitive']
- ref_gap_open_penalty:
Int % Range(1, None)
Reference gap open penalty.[default: 5]
- ref_gap_ext_penalty:
Int % Range(1, None)
Reference gap extend penalty.[default: 3]
- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default: True]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default: 1]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified[required]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to those in the table input.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Select which method to use to identify contaminants. Prevalence: utilizes control ASVs/OTUs to identify contaminants. Frequency: utilizes sample concentration information to identify contaminants. Combined: utilizes both the Prevalence and Frequency methods when identifying contaminants.[default: 'prevalence']
- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
Indicate the control sample identifier (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
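The effect of this action's threshold parameter can be sketched in a few lines. The example is illustrative only: it models the score table, feature table, and representative sequences as plain dicts, and the function name `remove_contaminants` is hypothetical. It follows the rule stated in the parameter description, namely that features scoring less than or equal to the threshold are removed.

```python
def remove_contaminants(scores, table, rep_seqs, threshold=0.1):
    """Drop features whose decontam score is at or below the threshold."""
    contaminants = {feature_id for feature_id, score in scores.items()
                    if score <= threshold}
    filtered_table = {
        sample_id: {f: n for f, n in counts.items() if f not in contaminants}
        for sample_id, counts in table.items()
    }
    filtered_seqs = {f: seq for f, seq in rep_seqs.items()
                     if f not in contaminants}
    return filtered_table, filtered_seqs


# Hypothetical scores, counts, and sequences for three features.
scores = {"asv1": 0.01, "asv2": 0.1, "asv3": 0.8}
table = {"s1": {"asv1": 3, "asv2": 4, "asv3": 20}}
rep_seqs = {"asv1": "ACGT", "asv2": "TTGA", "asv3": "GGCC"}

print(remove_contaminants(scores, table, rep_seqs))
```

Because the comparison is inclusive, a feature scoring exactly 0.1 is removed at the default threshold.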
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Per-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default: 0.1]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. It is also suitable for sanity checks of bioinformatics pipeline performance.
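Taxon accuracy rate (TAR) and taxon detection rate (TDR) at a given rank can be sketched as set arithmetic over collapsed lineages: TAR = TP / (TP + FP) and TDR = TP / (TP + FN). The snippet below is a minimal illustration for a single sample, ignoring abundances; the helper names `collapse` and `tar_tdr` and the example taxa are assumptions, not part of the plugin.

```python
def collapse(lineages, level):
    """Truncate each lineage to `level` ranks and return the unique set."""
    return {";".join(part.strip() for part in lineage.split(";")[:level])
            for lineage in lineages}


def tar_tdr(expected, observed, level):
    """Return (TAR, TDR) for one sample at one semicolon-delimited rank."""
    exp = collapse(expected, level)
    obs = collapse(observed, level)
    tp = len(exp & obs)        # expected taxa that were observed
    fp = len(obs - exp)        # observed taxa that were not expected
    fn = len(exp - obs)        # expected taxa that were missed
    tar = tp / (tp + fp) if tp + fp else 0.0
    tdr = tp / (tp + fn) if tp + fn else 0.0
    return tar, tdr


# Hypothetical mock community (expected) and sequencing result (observed).
expected = [
    "k__Bacteria;p__Firmicutes;g__Bacillus",
    "k__Bacteria;p__Firmicutes;g__Listeria",
    "k__Bacteria;p__Proteobacteria;g__Escherichia",
]
observed = [
    "k__Bacteria;p__Firmicutes;g__Bacillus",
    "k__Bacteria;p__Firmicutes;g__Staphylococcus",
    "k__Bacteria;p__Proteobacteria;g__Escherichia",
    "k__Bacteria;p__Actinobacteria;g__Kocuria",
]
for level in (1, 2, 3):
    print(level, tar_tdr(expected, observed, level))
```

Scores typically fall at deeper ranks, as in this toy example, because false positives that share a shallow lineage with an expected taxon only become distinguishable further down.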
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default: 7]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default: 'Set1']
- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default: True]
- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default: True]
- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default: False]
- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default: True]
- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default: False]
- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distance scores on score plot.[default: False]
- plot_observed_features:
Bool
Plot observed features count on score plot.[default: False]
- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default: True]
- metadata:
MetadataColumn[Categorical]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
This QIIME 2 plugin supports methods for assessing and controlling the quality of feature and sequence data.
- version:
2024.10.0
- website: https://
github .com /qiime2 /q2 -quality -control - user support:
- Please post to the QIIME 2 forum for help with this plugin: https://
forum .qiime2 .org
Actions¶
Name | Type | Short Description |
---|---|---|
exclude-seqs | method | Exclude sequences by alignment |
filter-reads | method | Filter demultiplexed sequences by alignment to reference database. |
bowtie2-build | method | Build bowtie2 index from reference sequences. |
decontam-identify | method | Identify contaminants |
decontam-remove | method | Remove contaminants |
evaluate-composition | visualizer | Evaluate expected vs. observed taxonomic composition of samples |
evaluate-seqs | visualizer | Compare query (observed) vs. reference (expected) sequences. |
evaluate-taxonomy | visualizer | Evaluate expected vs. observed taxonomic assignments |
decontam-score-viz | visualizer | Generate a histogram representation of the scores |
decontam-identify-batches | pipeline | Identify contaminants in Batch Mode |
Artifact Classes¶
FeatureData[DecontamScore] |
Formats¶
DecontamScoreFormat |
DecontamScoreDirFmt |
quality-control exclude-seqs¶
This method aligns feature sequences to a set of reference sequences to identify sequences that hit/miss the reference within a specified perc_identity, evalue, and perc_query_aligned. This method could be used to define a positive filter, e.g., extract only feature sequences that align to a certain clade of bacteria; or to define a negative filter, e.g., identify sequences that align to contaminant or human DNA sequences that should be excluded from subsequent analyses. Note that filtering is performed based on the perc_identity, perc_query_aligned, and evalue thresholds (the latter only if method==BLAST and an evalue is set). Set perc_identity==0 and/or perc_query_aligned==0 to disable these filtering thresholds as necessary.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- method:
Str
%
Choices
('blast', 'blastn-short')
|
Str
%
Choices
('vsearch')
Alignment method to use for matching feature sequences against reference sequences[default:
'blast'
]- perc_identity:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Reject match if percent identity to reference is lower. Must be in range [0.0, 1.0][default:
0.97
]- evalue:
Float
BLAST expectation (E) value threshold for saving hits. Reject if E value is higher than threshold. This threshold is disabled by default.[optional]
- perc_query_aligned:
Float
Percent of query sequence that must align to reference in order to be accepted as a hit.[default:
0.97
]- threads:
Threads
Number of threads to use. Only applies to vsearch method.[default:
1
]- left_justify:
Bool
%
Choices
(False)
|
Bool
Reject match if the pairwise alignment begins with gaps[default:
False
]
Outputs¶
- sequence_hits:
FeatureData[Sequence]
Subset of feature sequences that align to reference sequences[required]
- sequence_misses:
FeatureData[Sequence]
Subset of feature sequences that do not align to reference sequences[required]
quality-control filter-reads¶
Filter out (or keep) demultiplexed single- or paired-end sequences that align to a reference database, using bowtie2 and samtools. This method can be used to filter out human DNA sequences and other contaminants in any FASTQ sequence data (e.g., shotgun genome or amplicon sequence data), or alternatively (when exclude_seqs is False) to only keep sequences that do align to the reference.
Citations¶
Langmead & Salzberg, 2012; Li et al., 2009
Inputs¶
- demultiplexed_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The sequences to be trimmed.[required]
- database:
Bowtie2Index
Bowtie2 indexed database.[required]
Parameters¶
- n_threads:
Threads
Number of alignment threads to launch.[default:
1
]- mode:
Str
%
Choices
('local', 'global')
Bowtie2 alignment settings. See bowtie2 manual for more details.[default:
'local'
]- sensitivity:
Str
%
Choices
('very-fast', 'fast', 'sensitive', 'very-sensitive')
Bowtie2 alignment sensitivity. See bowtie2 manual for details.[default:
'sensitive'
]- ref_gap_open_penalty:
Int
%
Range
(1, None)
Reference gap open penalty.[default:
5
]- ref_gap_ext_penalty:
Int
%
Range
(1, None)
Reference gap extend penalty.[default:
3
]- exclude_seqs:
Bool
Exclude sequences that align to reference. Set this option to False to exclude sequences that do not align to the reference database.[default:
True
]
Outputs¶
- filtered_sequences:
SampleData[SequencesWithQuality¹ | PairedEndSequencesWithQuality²]
The resulting filtered sequences.[required]
quality-control bowtie2-build¶
Build bowtie2 index from reference sequences.
Citations¶
Langmead & Salzberg, 2012
Inputs¶
- sequences:
FeatureData[Sequence]
Reference sequences used to build bowtie2 index.[required]
Parameters¶
- n_threads:
Threads
Number of threads to launch.[default:
1
]
Outputs¶
- database:
Bowtie2Index
Bowtie2 index.[required]
quality-control decontam-identify¶
This method identifies contaminant sequences from an OTU or ASV table and reports them to the user
Inputs¶
- table:
FeatureTable[Frequency]
Feature table which contaminate sequences will be identified from[required]
Parameters¶
- metadata:
Metadata
metadata file indicating which samples in the experiment are control samples, assumes sample names in file correspond to the
table
input parameter[required]- method:
Str
%
Choices
('combined', 'frequency', 'prevalence')
Select how to which method to id contaminants with; Prevalence: Utilizes control ASVs/OTUs to identify contaminants, Frequency: Utilizes sample concentration information to identify contaminants, Combined: Utilizes both Prevalence and Frequency methods when identifying contaminants[default:
'prevalence'
]- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
indicate the control sample identifier (e.g. "control" or "blank")[optional]
Outputs¶
- decontam_scores:
FeatureData[DecontamScore]
The resulting table of scores from the decontam algorithm which scores each feature on how likely they are to be a contaminant sequence[required]
quality-control decontam-remove¶
Remove contaminant sequences from a feature table and the associated representative sequences.
Inputs¶
- decontam_scores:
FeatureData[DecontamScore]
Pre-feature decontam scores.[required]
- table:
FeatureTable[Frequency]
Feature table from which contaminants will be removed.[required]
- rep_seqs:
FeatureData[Sequence]
Feature representative sequences from which contaminants will be removed.[required]
Parameters¶
- threshold:
Float
%
Range
(0.0, 1.0, inclusive_end=True)
Decontam score threshold. Features with a score less than or equal to this threshold will be removed.[default:
0.1
]
Outputs¶
- filtered_table:
FeatureTable[Frequency]
Feature table with contaminants removed.[required]
- filtered_rep_seqs:
FeatureData[Sequence]
Feature representative sequences with contaminants removed.[required]
quality-control evaluate-composition¶
This visualizer compares the feature composition of pairs of observed and expected samples containing the same sample ID in two separate feature tables. Typically, feature composition will consist of taxonomy classifications or other semicolon-delimited feature annotations. Taxon accuracy rate, taxon detection rate, and linear regression scores between expected and observed observations are calculated at each semicolon-delimited rank, and plots of per-level accuracy and observation correlations are plotted. A histogram of distance between false positive observations and the nearest expected feature is also generated, where distance equals the number of rank differences between the observed feature and the nearest common lineage in the expected feature. This visualizer is most suitable for testing per-run data quality on sequencing runs that contain mock communities or other samples with known composition. Also suitable for sanity checks of bioinformatics pipeline performance.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_features:
FeatureTable[RelativeFrequency]
Expected feature compositions[required]
- observed_features:
FeatureTable[RelativeFrequency]
Observed feature compositions[required]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the greengenes reference sequence database).[default:
7
]- palette:
Str
%
Choices
('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- plot_tar:
Bool
Plot taxon accuracy rate (TAR) on score plot. TAR is the number of true positive features divided by the total number of observed features (TAR = true positives / (true positives + false positives)).[default:
True
]- plot_tdr:
Bool
Plot taxon detection rate (TDR) on score plot. TDR is the number of true positive features divided by the total number of expected features (TDR = true positives / (true positives + false negatives)).[default:
True
]- plot_r_value:
Bool
Plot expected vs. observed linear regression r value on score plot.[default:
False
]- plot_r_squared:
Bool
Plot expected vs. observed linear regression r-squared value on score plot.[default:
True
]- plot_bray_curtis:
Bool
Plot expected vs. observed Bray-Curtis dissimilarity scores on score plot.[default:
False
]- plot_jaccard:
Bool
Plot expected vs. observed Jaccard distances scores on score plot.[default:
False
]- plot_observed_features:
Bool
Plot observed features count on score plot.[default:
False
]- plot_observed_features_ratio:
Bool
Plot ratio of observed:expected features on score plot.[default:
True
]- metadata:
MetadataColumn
[
Categorical
]
Optional sample metadata that maps observed_features sample IDs to expected_features sample IDs.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
quality-control evaluate-seqs¶
This action aligns a set of query (e.g., observed) sequences against a set of reference (e.g., expected) sequences to evaluate the quality of alignment. The intended use is to align observed sequences against expected sequences (e.g., from a mock community) to determine the frequency of mismatches between observed sequences and the most similar expected sequences, e.g., as a measure of sequencing/method error. However, any sequences may be provided as input to generate a report on pairwise alignment quality against a set of reference sequences.
Citations¶
Inputs¶
- query_sequences:
FeatureData[Sequence]
Sequences to test for exclusion[required]
- reference_sequences:
FeatureData[Sequence]
Reference sequences to align against feature sequences[required]
Parameters¶
- show_alignments:
Bool
Option to plot pairwise alignments of query sequences and their top hits.[default:
False
]
Outputs¶
- visualization:
Visualization
<no description>[required]
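The idea of reporting mismatches between each query sequence and its most similar reference can be sketched in plain Python. This toy example does not perform a real alignment: an ungapped, position-by-position comparison stands in for the alignment step, and the helper names and sequences are hypothetical.

```python
def hamming_mismatches(a, b):
    """Count mismatched positions; any length difference counts as mismatches."""
    shared = sum(x != y for x, y in zip(a, b))
    return shared + abs(len(a) - len(b))


def closest_reference(query, references):
    """Return (reference_id, mismatches) for the most similar reference."""
    hits = ((ref_id, hamming_mismatches(query, seq))
            for ref_id, seq in references.items())
    return min(hits, key=lambda hit: hit[1])


# Hypothetical toy sequences standing in for expected and observed features.
references = {"ref1": "ACGTACGTAC", "ref2": "TTGCATGCAA"}
queries = {"q1": "ACGTACGTAC", "q2": "TTGCATGGAA", "q3": "ACGAACGTTC"}

report = {qid: closest_reference(seq, references) for qid, seq in queries.items()}
for qid, (ref_id, mismatches) in report.items():
    print(qid, ref_id, mismatches)
```

A real run would rely on the action's own alignment, which also handles gaps and partial matches that this sketch ignores.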
quality-control evaluate-taxonomy¶
This visualizer compares a pair of observed and expected taxonomic assignments to calculate precision, recall, and F-measure at each taxonomic level, up to the maximum level specified by the depth parameter. These metrics are calculated at each semicolon-delimited rank. This action is useful for comparing the accuracy of taxonomic assignment, e.g., between different taxonomy classifiers or other bioinformatics methods. Expected taxonomies should be derived from simulated or mock community sequences that have known taxonomic affiliations.
Citations¶
Bokulich et al., 2018
Inputs¶
- expected_taxa:
FeatureData[Taxonomy]
Expected taxonomic assignments[required]
- observed_taxa:
FeatureData[Taxonomy]
Observed taxonomic assignments[required]
- feature_table:
FeatureTable[RelativeFrequency]
Optional feature table containing relative frequency of each feature, used to weight accuracy scores by frequency. Must contain all features found in expected and/or observed taxa. Features found in the table but not the expected/observed taxa will be dropped prior to analysis.[optional]
Parameters¶
- depth:
Int
Maximum depth of semicolon-delimited taxonomic ranks to test (e.g., 1 = root, 7 = species for the Greengenes reference sequence database).[required]
- palette:
Str % Choices('Set1', 'Set2', 'Set3', 'Pastel1', 'Pastel2', 'Paired', 'Accent', 'Dark2', 'tab10', 'tab20', 'tab20b', 'tab20c', 'viridis', 'plasma', 'inferno', 'magma', 'terrain', 'rainbow')
Color palette to utilize for plotting.[default:
'Set1'
]- require_exp_ids:
Bool
Require that all features found in observed taxa also be found in expected taxa; otherwise raise an error.[default:
True
]- require_obs_ids:
Bool
Require that all features found in expected taxa also be found in observed taxa; otherwise raise an error.[default:
True
]- sample_id:
Str
Optional sample ID to use for extracting frequency data from feature table, and for labeling accuracy results. If no sample_id is provided, feature frequencies are derived from the sum of all samples present in the feature table.[optional]
Outputs¶
- visualization:
Visualization
<no description>[required]
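Per-rank precision, recall, and F-measure can be sketched as follows. The scoring convention here is an assumption, not necessarily the plugin's: at each level, a matching lineage is a true positive, a differing lineage is a false positive, and an observed lineage that is not classified that deep is a false negative. Function names and taxonomies are hypothetical, and frequency weighting is left out.

```python
def split_ranks(taxonomy):
    """Split a semicolon-delimited taxonomy string into stripped ranks."""
    return [rank.strip() for rank in taxonomy.split(";") if rank.strip()]


def score_level(expected, observed, level):
    """Return (precision, recall, f_measure) at one taxonomic level."""
    tp = fp = fn = 0
    for feature_id, exp_tax in expected.items():
        exp_ranks = split_ranks(exp_tax)
        obs_ranks = split_ranks(observed.get(feature_id, ""))
        if len(exp_ranks) < level:
            continue  # nothing expected at this depth (simplification)
        if len(obs_ranks) < level:
            fn += 1   # observed lineage stops above this level
        elif obs_ranks[:level] == exp_ranks[:level]:
            tp += 1   # lineage matches down to this level
        else:
            fp += 1   # classified at this level, but incorrectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical three-rank taxonomies keyed by feature ID.
expected = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria",
    "f3": "k__Bacteria; p__Firmicutes; c__Clostridia",
}
observed = {
    "f1": "k__Bacteria; p__Firmicutes; c__Bacilli",
    "f2": "k__Bacteria; p__Proteobacteria",
    "f3": "k__Bacteria; p__Firmicutes; c__Bacilli",
}

depth = 3
scores = {level: score_level(expected, observed, level)
          for level in range(1, depth + 1)}
for level, (p, r, f) in scores.items():
    print(level, p, r, f)
```

In this example all three features agree at levels 1 and 2; at level 3 one matches, one is misclassified, and one is unclassified, so precision, recall, and F-measure each drop to 0.5.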
quality-control decontam-score-viz¶
Creates a histogram based on the output of decontam-identify.
Inputs¶
- decontam_scores:
Collection[FeatureData[DecontamScore]]
Output from decontam identify to be visualized[required]
- table:
Collection[FeatureTable[Frequency]]
Raw OTU/ASV table that was used as input to decontam-identify[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences table from which contaminant sequences will be removed[optional]
Parameters¶
- threshold:
Float % Range(0.0, 1.0, inclusive_end=True)
Select threshold cutoff for decontam algorithm scores[default:
0.1
]- weighted:
Bool
Weight the decontam scores by their associated read number.[default:
True
]- bin_size:
Float % Range(0.0, 1.0, inclusive_end=True)
Select bin size for the histogram[default:
0.02
]
Outputs¶
- visualization:
Visualization
<no description>[required]
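How threshold, weighted, and bin_size interact can be sketched in plain Python. This is not the visualizer's code: the function name, scores, and read counts are hypothetical, and the sketch assumes, following the decontam R package, that features scoring below the threshold are the ones flagged as likely contaminants.

```python
import math
from collections import Counter


def score_histogram(scores, bin_size=0.02, weights=None):
    """Return {bin_index: total weight} for scores in [0, 1].

    Bin i covers [i * bin_size, (i + 1) * bin_size); a score of exactly
    1.0 falls into the last bin. Without weights, each feature counts once.
    """
    n_bins = math.ceil(1.0 / bin_size - 1e-9)
    hist = Counter()
    for feature_id, score in scores.items():
        # The small epsilon guards against floating-point round-down.
        index = min(int(score / bin_size + 1e-9), n_bins - 1)
        hist[index] += weights[feature_id] if weights else 1
    return dict(hist)


# Hypothetical decontam scores and per-feature read counts.
scores = {"asv1": 0.01, "asv2": 0.05, "asv3": 0.45, "asv4": 0.93}
reads = {"asv1": 120, "asv2": 30, "asv3": 500, "asv4": 2000}

threshold = 0.1
unweighted = score_histogram(scores)
weighted = score_histogram(scores, weights=reads)
flagged = sorted(f for f, s in scores.items() if s < threshold)
print(unweighted, weighted, flagged)
```

With weighting, each bin height reflects the reads carried by its features rather than the number of features, so a few abundant features can dominate the plot.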
quality-control decontam-identify-batches¶
This pipeline splits an OTU or ASV table into batches based on the given metadata, identifies contaminant sequences in each batch, and reports them to the user.
Inputs¶
- table:
FeatureTable[Frequency]
Feature table from which contaminant sequences will be identified[required]
- rep_seqs:
FeatureData[Sequence]
Representative sequences table from which contaminant sequences will be removed[optional]
Parameters¶
- metadata:
Metadata
Metadata file indicating which samples in the experiment are control samples. Sample names in the file are assumed to correspond to those in the table input.[required]
- split_column:
Str
Input metadata columns by which to subset the ASV table. Note: Column names must be in quotes and delimited by a space.[required]
- method:
Str % Choices('combined', 'frequency', 'prevalence')
Select which method to use to identify contaminants. Prevalence: uses control ASVs/OTUs to identify contaminants. Frequency: uses sample concentration information to identify contaminants. Combined: uses both the Prevalence and Frequency methods to identify contaminants.[required]
- filter_empty_features:
Bool
If true, features which are not present in a split feature table are dropped.[optional]
- freq_concentration_column:
Str
Input column name that has concentration information for the samples[optional]
- prev_control_column:
Str
Input column name containing experimental or control sample metadata[optional]
- prev_control_indicator:
Str
Indicate the control sample identifier (e.g., "control" or "blank")[optional]
- threshold:
Float
Select threshold cutoff for decontam algorithm scores[default:
0.1
]- weighted:
Bool
Weight the decontam scores by their associated read number.[default:
True
]- bin_size:
Float
Select bin size for the histogram[default:
0.02
]
Outputs¶
- batch_subset_tables:
Collection[FeatureTable[Frequency]]
Directory where feature tables split based on metadata and parameter split_column values should be written.[required]
- decontam_scores:
Collection[FeatureData[DecontamScore]]
The resulting table of scores from the decontam algorithm, which scores each feature on how likely it is to be a contaminant sequence[required]
- score_histograms:
Visualization
The visualizer histograms for all decontam score objects generated from the pipeline[required]
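The batch-splitting step controlled by split_column and filter_empty_features can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline's code: the table is modeled as a nested dict of sample to feature counts, the metadata as a dict of sample to column values, and the function name and the `run` column are hypothetical.

```python
from collections import defaultdict


def split_table(table, metadata, split_column, filter_empty_features=True):
    """Split {sample: {feature: count}} into one table per metadata value."""
    batches = defaultdict(dict)
    for sample_id, counts in table.items():
        batch = metadata[sample_id][split_column]
        batches[batch][sample_id] = dict(counts)
    if filter_empty_features:
        for batch_table in batches.values():
            # Features with a nonzero count somewhere in this batch.
            present = {feature
                       for counts in batch_table.values()
                       for feature, count in counts.items() if count > 0}
            for sample_id, counts in batch_table.items():
                batch_table[sample_id] = {
                    f: c for f, c in counts.items() if f in present}
    return dict(batches)


# Hypothetical feature table and metadata with a "run" column.
table = {
    "s1": {"asv1": 10, "asv2": 0},
    "s2": {"asv1": 5, "asv2": 0},
    "s3": {"asv1": 0, "asv2": 7},
}
metadata = {"s1": {"run": "A"}, "s2": {"run": "A"}, "s3": {"run": "B"}}

batches = split_table(table, metadata, "run")
print(batches)
```

Each resulting batch table would then be scored separately, which is why the pipeline returns collections of tables and scores rather than single artifacts.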